Data Analysis: Beyond Simple Raw Counts
Effective, multi-level analysis of Web data is critical to the survival of many Web-oriented businesses. Yet designing and selecting data-analysis tests often falls to systems administrators and in-house application designers whose knowledge of statistics may not extend beyond tabulating raw counts. In this article, Paul Meagher presents the skills and concepts Web developers need to apply inferential statistics to their Web data streams.
Dynamic Web sites generate an enormous amount of data (access logs, poll and survey results, customer profiles and orders, and more). Increasingly, then, a Web developer's job is not just to build the applications that generate this data but also to develop applications and approaches that make sense of these data streams.
Web developers often respond inadequately to the growing data-analysis demands of managing their sites. For the most part, they have not progressed much beyond reporting descriptive statistics that characterize their data streams. A whole array of inferential statistical procedures (methods for estimating population parameters from sample data) could be put to good use, but at present these procedures go largely unused.
For example, Web-access statistics (as currently compiled) are little more than frequency counts grouped in various ways. The results of polls and surveys are too often expressed in terms of simple raw counts and percentages.
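To make the contrast concrete, here is a minimal sketch of how a poll result could be reported inferentially rather than as a bare percentage. It assumes the respondents are a simple random sample of the population of interest and that the sample is large enough for the normal (Wald) approximation; the function name `proportion_ci` and the poll figures are hypothetical, chosen only for illustration.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (p_hat, lower, upper) for a Wald confidence interval on a proportion.

    Assumes a simple random sample and n large enough for the normal
    approximation; z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                      # descriptive: the sample percentage
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of the sample proportion
    # Clamp the interval to the valid range [0, 1].
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical poll: 120 of 400 respondents answered "yes".
p, lo, hi = proportion_ci(120, 400)
print(f"descriptive: {p:.1%} of respondents said yes")
print(f"inferential: population share plausibly between {lo:.1%} and {hi:.1%}")
```

The raw percentage describes only the people who answered; the interval is a statement about the wider population they were sampled from, which is the kind of question inferential statistics is designed to address. For small samples or proportions near 0 or 1, other interval methods (such as the Wilson score interval) behave better than this simple approximation.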
Maybe developers shouldn't be expected to deal with the statistical analysis of data streams except in superficial ways. After all, some people devote entire careers to more complex data-stream analysis; they're called statisticians and trained analysts, and they can be brought in when an organization needs more than descriptive statistics.
The alternative response is to acknowledge that a growing command of inferential statistics is becoming part of the Web developer's job description. Dynamic sites generate ever more data, and it is arguably the responsibility of Web developers and systems administrators to find ways of turning this data into actionable knowledge.
I advocate the latter response; this article is intended to help Web developers and systems administrators learn (or activate, in the case of inert knowledge) the design and analysis skills necessary to apply inferential statistics to their Web data streams.