Apply the knowledge
In this article, you have learned how to apply inferential statistics to the ubiquitous frequency data used to summarize Web data streams, focusing on the analysis of Web poll data. However, the simple one-way Chi Square analysis procedure discussed can be fruitfully applied to other types of data streams (access logs, survey results, customer profiles, customer orders) to turn raw data into actionable knowledge.
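As a reminder of what the one-way procedure computes, here is a minimal Python sketch. The article's own code is not part of this section, so this is an illustration rather than the original implementation. It assumes equal expected frequencies across answer options, which is the usual null hypothesis for a poll with no preferred answer. The vote counts and the function name `chi_square_statistic` are hypothetical.

```python
def chi_square_statistic(observed, expected=None):
    """One-way (goodness-of-fit) Chi Square: sum of (O - E)**2 / E over categories.
    If expected is omitted, assume equal expected frequencies (no preference)."""
    n = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [n / k] * k
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical Web poll: votes for 4 answer options
votes = [30, 20, 25, 25]
chi2 = chi_square_statistic(votes)
df = len(votes) - 1  # degrees of freedom = number of categories - 1
print(f"Chi Square = {chi2:.2f}, df = {df}")
```

With 4 categories, df = 3. A statistic of 2.0 is well below 7.815, the critical value at the .05 level for df = 3, so these hypothetical votes give no evidence of a preference.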
I also discussed why, when applying inferential statistics to Web data, it is desirable to regard data streams as outcomes of Web experiments: doing so increases the likelihood that you will bring experimental design considerations to bear on your inferences. Often you cannot make inferences because your data-collection process lacks adequate controls. This can change, however, if you become more proactive in applying experimental design tenets to your Web data-collection procedures (for example, randomizing the selection of voters in your Web polls).
Finally, I demonstrated how to simulate the Chi Square sampling distribution for different degrees of freedom, rather than simply commenting on its derivation. In doing so, I also demonstrated a workaround to the rule against using the Chi Square test when the expected frequency of a measurement category is less than 5 (in other words, a small N experiment): simulate the sampling distribution for experiments with the same $NTrials value as your study. So, for small numbers of trials, instead of using only the df from the study to compute the probability of a sample outcome, you might also need to use the $NTrials value as a parameter when evaluating the probability of the observed Chi Square result.
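The simulation idea can be sketched as follows. This is a Python sketch under stated assumptions, not the article's implementation. Each simulated experiment distributes `n_trials` votes uniformly at random across `n_categories` options (the null hypothesis). The simulated p-value is the fraction of simulated Chi Square values at least as large as the observed one. The names `simulate_chi_square` and `simulated_p_value` are hypothetical; `n_trials` plays the role of `$NTrials`.

```python
import random


def chi_square_statistic(observed):
    """One-way Chi Square against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def simulate_chi_square(n_categories, n_trials, n_experiments=10000, seed=None):
    """Simulate the Chi Square sampling distribution for df = n_categories - 1
    when each experiment has exactly n_trials observations (the $NTrials value)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_experiments):
        counts = [0] * n_categories
        for _ in range(n_trials):
            # Under the null hypothesis every category is equally likely.
            counts[rng.randrange(n_categories)] += 1
        stats.append(chi_square_statistic(counts))
    return stats


def simulated_p_value(observed_chi2, stats, tol=1e-9):
    """Fraction of simulated results at least as extreme as the observed one.
    The small tolerance guards against floating-point ties."""
    return sum(1 for s in stats if s >= observed_chi2 - tol) / len(stats)


# Hypothetical small N poll: 8 votes over 4 options, so every expected frequency is 2.
votes = [5, 1, 1, 1]
observed = chi_square_statistic(votes)  # 6.0
stats = simulate_chi_square(n_categories=4, n_trials=sum(votes), seed=1)
print(f"Chi Square = {observed:.2f}, simulated p = {simulated_p_value(observed, stats):.3f}")
```

Rerunning `simulate_chi_square` with a different `n_trials` but the same number of categories shows how the sampling distribution for a fixed df changes with the number of trials, which is why `$NTrials` matters as a parameter.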
It is worth pondering how you might analyze small N experiments, because you will often want to analyze your data before data collection is complete: when each observation is costly, when observations take a long time to obtain, or simply because you are curious. Keep these two questions in mind when attempting this level of Web-data analysis:
- Are you justified in making inferences under conditions of small N or not?
- Can simulation help you determine what inferences to draw under these circumstances?
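One way to start exploring the second question is to compare a simulated tail probability with the one from the theoretical Chi Square distribution. The Python sketch below is an illustration, not the article's code. It estimates how often the Chi Square value reaches 7.815, the .05 critical value for df = 3, when 4 options are equally likely, once for a small number of trials and once for a larger one. `chi_square_sf` and `simulated_tail` are hypothetical helpers; `chi_square_sf` computes the theoretical upper-tail probability with the standard closed-form recurrence in df. If SciPy is available, `scipy.stats.chi2.sf` gives the same theoretical quantity.

```python
import math
import random


def chi_square_statistic(observed):
    """One-way Chi Square against equal expected frequencies."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi_square_sf(x, df):
    """Theoretical P(X >= x) for a Chi Square variable with df >= 1, using
    Q(df + 2, x) = Q(df, x) + (x/2)**(df/2) * exp(-x/2) / Gamma(df/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2  # Q(2, x)
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1  # Q(1, x)
    while nu < df:
        q += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def simulated_tail(critical, n_categories, n_trials, n_experiments=4000, seed=0):
    """Fraction of simulated null experiments whose Chi Square is >= critical."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        counts = [0] * n_categories
        for _ in range(n_trials):
            counts[rng.randrange(n_categories)] += 1
        if chi_square_statistic(counts) >= critical:
            hits += 1
    return hits / n_experiments


critical = 7.815  # .05 critical value for df = 3
print(f"theoretical: {chi_square_sf(critical, 3):.3f}")
for n_trials in (8, 200):
    print(f"N = {n_trials}: simulated {simulated_tail(critical, 4, n_trials):.3f}")
```

If the simulated proportion for the small N case differs noticeably from the theoretical value, that suggests the standard Chi Square table is a poor guide at that number of trials, and a distribution simulated for your own `$NTrials` value may be the better reference.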