
Take Web data analysis to the next level with PHP

By Paul Meagher - 2004-04-12

Apply the knowledge

In this article, you have learned how to apply inferential statistics to the ubiquitous frequency data used to summarize Web data streams, focusing on the analysis of Web poll data. However, the one-way Chi Square procedure discussed here can be applied just as fruitfully to other types of data streams (access logs, survey results, customer profiles, customer orders) to turn raw data into actionable knowledge.
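As a reminder of what the one-way procedure computes, here is a minimal PHP sketch. It assumes the null hypothesis that every response category is equally likely unless expected frequencies are passed in, and the function name chiSquareStatistic() is hypothetical rather than one of the classes developed on earlier pages. The statistic sums (observed - expected)^2 / expected over the categories, with df equal to the number of categories minus one.

```php
<?php
// Sketch: one-way Chi Square statistic for poll (frequency) data.
// chiSquareStatistic() is a hypothetical helper, not the article's own class.
// If no expected frequencies are given, each category is assumed equally likely.
function chiSquareStatistic(array $observed, ?array $expected = null): float
{
    $observed = array_values($observed);
    $total = array_sum($observed);
    $k = count($observed);
    if ($expected === null) {
        // Equal-probability null hypothesis: total votes spread evenly.
        $expected = array_fill(0, $k, $total / $k);
    }
    $expected = array_values($expected);

    $chi2 = 0.0;
    foreach ($observed as $i => $o) {
        // Each category contributes (O - E)^2 / E.
        $chi2 += ($o - $expected[$i]) ** 2 / $expected[$i];
    }
    return $chi2;
}

// Hypothetical poll: 60 votes spread over three answers.
$votes = [30, 20, 10];
$chi2 = chiSquareStatistic($votes);
$df = count($votes) - 1;
printf("Chi Square = %.2f with df = %d\n", $chi2, $df); // prints: Chi Square = 10.00 with df = 2
```

The same function works unchanged on access-log or order counts; only the array of category frequencies differs. For df = 2 the tabled .05 critical value is 5.99, so this hypothetical result would be judged significant.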

I also discussed why, when applying inferential statistics to Web data, it is desirable to regard data streams as the outcomes of Web experiments: doing so makes it more likely that you will bring experimental design considerations to bear on your inferences. Often you cannot make inferences because you do not have adequate controls in your data-collection process. This can change, however, if you become more proactive in applying experimental design tenets to your Web data collection procedures (for example, by randomizing the selection of voters in your Web polls).

Finally, I demonstrated how to simulate the Chi Square sampling distribution for different degrees of freedom, going beyond simply commenting on its derivation. In doing so, I also demonstrated a workaround (simulating the sampling distribution for experiments with a small $NTrials value) for the prohibition against using the Chi Square test when the expected frequency of any measurement category is less than 5 (in other words, in a small N experiment). So for small numbers of trials, rather than relying only on the study's df to compute the probability of a sample outcome, you might also need to use the $NTrials value as a parameter when evaluating the probability of the observed Chi Square result.
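The simulation workaround can be sketched in a few lines of PHP. This is an illustration under stated assumptions, not the article's own code: it assumes equally likely categories under the null hypothesis, draws $NTrials votes at random with mt_rand() many times, and estimates the probability of the observed result as the share of simulated Chi Square values at least as large as it. The functions chiSquare(), simulateChiSquare(), and empiricalPValue() and the 10-vote poll are hypothetical.

```php
<?php
// Sketch: simulate the one-way Chi Square sampling distribution for a small
// number of trials and estimate an empirical p-value. All function names and
// the sample poll are hypothetical; categories are assumed equally likely.
function chiSquare(array $observed, float $expected): float
{
    $chi2 = 0.0;
    foreach ($observed as $o) {
        $chi2 += ($o - $expected) ** 2 / $expected;
    }
    return $chi2;
}

// Cast $NTrials votes uniformly over $k categories, $NSimulations times,
// and collect the Chi Square value of each simulated poll.
function simulateChiSquare(int $k, int $NTrials, int $NSimulations): array
{
    $expected = $NTrials / $k;
    $results = [];
    for ($s = 0; $s < $NSimulations; $s++) {
        $counts = array_fill(0, $k, 0);
        for ($t = 0; $t < $NTrials; $t++) {
            $counts[mt_rand(0, $k - 1)]++;
        }
        $results[] = chiSquare($counts, $expected);
    }
    return $results;
}

// Empirical p-value: proportion of simulated values >= the observed value.
function empiricalPValue(float $observedChi2, array $simulated): float
{
    $extreme = 0;
    foreach ($simulated as $value) {
        if ($value >= $observedChi2) {
            $extreme++;
        }
    }
    return $extreme / count($simulated);
}

mt_srand(42);                 // fixed seed so a run can be repeated
$observed = [7, 2, 1];        // hypothetical small-N poll: 10 votes, 3 options
$NTrials  = array_sum($observed);
$k        = count($observed);
$chi2     = chiSquare($observed, $NTrials / $k);
$sim      = simulateChiSquare($k, $NTrials, 10000);
printf("Chi Square = %.2f, simulated p = %.4f\n", $chi2, empiricalPValue($chi2, $sim));
```

Because every expected frequency here is 10 / 3, well under 5, the tabled Chi Square probability for df = 2 would be suspect; the simulated distribution conditions on $NTrials directly, which is the point of the workaround. Increasing the number of simulations narrows the sampling error of the estimated p-value.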

It is worth pondering how you might analyze small N experiments, because you will often want to analyze your data before data collection is complete: when each observation is costly, when observations take a long time to obtain, or simply because you are curious. Keep these two questions in mind when attempting this level of Web data analysis:

Are you justified in making inferences under conditions of small N or not?
Can simulation help you determine what inferences to draw under these circumstances?



First published by IBM developerWorks