Limitations of Web Logs as a Data Collection Mechanism
Anyone who works with clickstream data (the record of what a user clicks on while browsing the Web or using another software application) knows that there are four main ways of capturing it: web logs, web beacons, JavaScript tags, and packet sniffing.
Web logs were the original source of data collection, dating back to the earliest days of the Web. Although they have many advantages, web logs also have real limitations as a data collection mechanism.
Web logs were designed primarily to capture technical information: 404 errors, server usage trends, browser types, and so on. They are therefore not ideally suited to capturing business or marketing information. When business or marketing information is required, close collaboration with the IT team is needed, which creates a dependency on its release schedules; other data capture mechanisms can reduce that dependency and allow faster movement.
Web logs record every hit made on the server. When using logs, you therefore have to be very careful and deliberate about applying the right filters, removing image requests, page errors, robot traffic, requests for Cascading Style Sheets (CSS) files, and so on, in order to get accurate traffic trends and behavior.
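A minimal filtering sketch in Python is shown below. It assumes the common combined log format; the file name access.log, the extension list, and the bot signatures are illustrative assumptions, not an exhaustive or official set of rules.

import re

# Hypothetical, minimal filter for raw web-log lines before analysis.
NON_PAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
BOT_SIGNATURES = ("googlebot", "bingbot", "crawler", "spider")

def is_page_view(log_line: str) -> bool:
    """Return True only for lines that look like real page views."""
    match = re.search(r'"(?:GET|POST) (\S+) HTTP/[\d.]+" (\d{3})', log_line)
    if not match:
        return False                         # malformed or unexpected line
    path, status = match.group(1), match.group(2)
    if not status.startswith("2"):
        return False                         # drop errors and redirects
    if path.lower().endswith(NON_PAGE_EXTENSIONS):
        return False                         # drop images, CSS, JS, favicons
    if any(bot in log_line.lower() for bot in BOT_SIGNATURES):
        return False                         # drop known robot traffic
    return True

with open("access.log") as f:
    page_views = [line for line in f if is_page_view(line)]
print(len(page_views), "page views after filtering")

Commercial log analyzers apply far more elaborate rule sets, but the principle is the same: most lines in a raw log file are not page views.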
Also, if the web server is not setting cookies, identifying visitors with any reasonable degree of accuracy becomes very challenging.
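As an illustration, the hypothetical sketch below shows how a server-set cookie makes visitor identification possible: a persistent visitor ID is issued once and then accompanies every later request, so it can be written into the logs. The cookie name visitor_id and the one-year lifetime are assumptions made for the example only.

import uuid
from http import cookies
from typing import Optional

def build_set_cookie_header(existing_cookie_header: Optional[str]) -> Optional[str]:
    """Return a Set-Cookie header value if the request carries no visitor ID."""
    jar = cookies.SimpleCookie(existing_cookie_header or "")
    if "visitor_id" in jar:
        return None                                  # returning visitor, keep the same ID
    jar["visitor_id"] = uuid.uuid4().hex             # new visitor, assign a persistent ID
    jar["visitor_id"]["max-age"] = 60 * 60 * 24 * 365    # roughly one year
    jar["visitor_id"]["path"] = "/"
    return jar["visitor_id"].OutputString()

# A new visitor (no Cookie header) gets an ID; a returning visitor does not.
print(build_set_cookie_header(None))
print(build_set_cookie_header("visitor_id=abc123"))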
Page caching by ISPs and proxy servers can also make some of a website's traffic (10 percent or more) invisible. Because page caching is quite common, frequently requested pages such as the home page or product pages end up cached at ISP or proxy servers. When someone on that ISP's network requests your home page, it is served from the ISP's cache rather than from your web server, so no entry for that request ever appears in your log files.
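The short Python sketch below illustrates the mechanism by inspecting the response headers that shared caches obey; the URL and header values are examples only.

import urllib.request

# A Cache-Control value such as "public, max-age=3600" allows any shared
# cache (ISP or proxy) to serve the page without contacting the origin
# server; a non-zero Age header means the response came from a cache, so
# the request never produced an entry in the origin server's log files.
with urllib.request.urlopen("https://example.com/") as response:
    print("Cache-Control:", response.headers.get("Cache-Control", "(not set)"))
    print("Age:", response.headers.get("Age", "(not set)"))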