Take Web data analysis to the next level with PHP

By Paul Meagher - 2004-04-12

Repoll

Another interesting application of the one-way Chi Square test is to repoll to see if responses have changed.

Imagine that you were to do another Web poll of Nova Scotia beer drinkers after a period had elapsed. You again ask about their favorite brand of beer and now observe the following:

Table 4. A new beer poll

Keiths	Olands	Schooner	Other
385 (27.50%)	350 (25.00%)	315 (22.50%)	350 (25.00%)

Recall that the past data looked like this:

Table 1. The old beer poll, yet again

Keiths	Olands	Schooner	Other
285 (28.50%)	250 (25.00%)	215 (21.50%)	250 (25.00%)

The obvious difference between the poll outcomes is that the first poll had 1,000 observations and the second one had 1,400 observations. The main effect of these additional observations is a 100-point increase in the frequency count for each response alternative.

When you are ready to analyze the new poll, you can either use the default method of computing the expected frequencies or initialize the analysis with the expected probability of each outcome, based on the proportions observed in the previous poll. In the second case, you load the previously obtained proportions into an expected probability array ($ExpProb) and use them to compute the expected frequency values for each response option.
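As a rough illustration of the second case, each expected frequency is the previous poll's proportion multiplied by the total number of new observations. The sketch below assumes nothing about the internals of the ChiSquare1D_HTML class; expectedFrequencies() is a hypothetical helper written only for this example.

```php
<?php
// Sketch only: expectedFrequencies() is a hypothetical helper,
// not a function from the chi-square package used in Listing 6.
function expectedFrequencies(array $obsFreq, array $expProb): array
{
    // Total number of responses in the new poll
    $total = array_sum($obsFreq);

    $expected = array();
    foreach ($expProb as $i => $p) {
        // Expected count = expected probability * total observations
        $expected[$i] = $p * $total;
    }
    return $expected;
}

$ObsFreq = array(385, 350, 315, 350);          // new poll (Table 4)
$ExpProb = array(.285, .250, .215, .250);      // old poll proportions (Table 1)

$ExpFreq = expectedFrequencies($ObsFreq, $ExpProb);
print_r(array_map('round', $ExpFreq));
```

With 1,400 new observations, the old proportions yield expected counts of 399, 350, 301, and 350 for the four response options.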

Listing 6 shows the beer-poll analysis code for detecting changing preferences:

Listing 6. Detecting changing preferences


<?php

// beer_repoll_analysis.php

require_once "../init.php";

require PHP_MATH . "chi/ChiSquare1D_HTML.php";

$Headings = array("Keiths", "Olands", "Schooner", "Other");

// Observed frequencies from the new poll (Table 4)
$ObsFreq  = array(385, 350, 315, 350);
// Significance level for the test
$Alpha    = 0.05;
// Expected probabilities: the proportions observed in the old poll (Table 1)
$ExpProb  = array(.285, .250, .215, .250);

$Chi = new ChiSquare1D_HTML($ObsFreq, $Alpha, $ExpProb);

$Chi->showTableSummary($Headings);
echo "<br><br>";
$Chi->showChiSquareStats();

?>

Tables 5 and 6 show the HTML output that the beer_repoll_analysis.php script generates:

Table 5. Expected frequencies and variances from running beer_repoll_analysis.php

	Keiths	Olands	Schooner	Other	Totals
Observed	385	350	315	350	1400
Expected	399	350	301	350	1400
Variance	0.49	0.00	0.65	0.00	1.14
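The Variance row in Table 5 can be reproduced by hand: each entry is the squared difference between the observed and expected count, divided by the expected count, and the row total is the obtained Chi Square value. The following is a minimal sketch of that arithmetic; chiSquareTerms() is a hypothetical helper and is not taken from the package.

```php
<?php
// Sketch only: chiSquareTerms() is a hypothetical helper for illustration.
function chiSquareTerms(array $obs, array $exp): array
{
    $terms = array();
    foreach ($obs as $i => $o) {
        // Contribution of one response option: (O - E)^2 / E
        $terms[$i] = pow($o - $exp[$i], 2) / $exp[$i];
    }
    return $terms;
}

$Obs = array(385, 350, 315, 350);   // Observed row of Table 5
$Exp = array(399, 350, 301, 350);   // Expected row of Table 5

$Terms = chiSquareTerms($Obs, $Exp);
foreach ($Terms as $t) {
    printf("%.2f ", $t);
}
// The sum of the contributions is the obtained Chi Square statistic
printf("| total %.2f\n", array_sum($Terms));
// prints "0.49 0.00 0.65 0.00 | total 1.14"
```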

Table 6. Various Chi Square statistics from running beer_repoll_analysis.php

Statistic	DF	Obtained	Prob	Critical
Chi Square	3	1.14	0.77	7.81

Table 6 shows that, under the null hypothesis, you have a 77 percent probability of obtaining a Chi Square value of 1.14 or larger. Because the obtained value is well below the critical value of 7.81, you cannot reject the null hypothesis that the preferences of Nova Scotia beer drinkers have not changed since your last poll. Any discrepancies between the observed and expected frequencies can be accounted for as expected sampling variability from the same population of Nova Scotia beer drinkers. This null finding should not be a surprise, given that the new poll results were produced simply by adding a constant of 100 to each outcome of the previous poll.
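To see where a probability of 0.77 can come from, the sketch below approximates the upper-tail probability of the Chi Square distribution with 3 degrees of freedom by numerically integrating its density with Simpson's rule. This is an illustrative stand-in, not the method the package uses; chiSquareTailDf3() is a hypothetical helper and handles only the 3-degrees-of-freedom case.

```php
<?php
// Sketch only: chiSquareTailDf3() is a hypothetical helper, valid for DF = 3.
// Substituting x = t^2 in the DF = 3 density gives a smooth integrand:
//   P(X <= x) = integral from 0 to sqrt(x) of 2 t^2 exp(-t^2 / 2) / sqrt(2 pi) dt
function chiSquareTailDf3(float $x, int $n = 1000): float
{
    if ($x <= 0) {
        return 1.0;
    }
    $upper = sqrt($x);
    $h = $upper / $n;               // step size; $n must be even for Simpson's rule
    $f = function (float $t): float {
        return 2 * $t * $t * exp(-$t * $t / 2) / sqrt(2 * M_PI);
    };

    // Composite Simpson's rule: weights 1, 4, 2, 4, ..., 4, 1
    $sum = $f(0.0) + $f($upper);
    for ($i = 1; $i < $n; $i++) {
        $sum += $f($i * $h) * ($i % 2 === 1 ? 4 : 2);
    }
    $cdf = $sum * $h / 3;

    // Probability of a value at least as large as $x
    return 1.0 - $cdf;
}

$Obtained = 1.14;
$Alpha    = 0.05;
$Prob     = chiSquareTailDf3($Obtained);

printf("Prob = %.2f\n", $Prob);
echo $Prob < $Alpha
    ? "Reject the null hypothesis\n"
    : "Do not reject the null hypothesis\n";
```

For the obtained value of 1.14, the approximation rounds to 0.77, which is far above the alpha level of 0.05, so the script reports that the null hypothesis is not rejected.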

You can imagine, however, that the results might have been different and might have suggested that a different brand of beer was becoming more popular. You could spot such a shift by noting the size of the variance reported at the bottom of each column in Table 5. You can further imagine that such a finding would have significant financial implications for the breweries in question, since bar owners tend to stock the most popular beer in their locality.

These results would be subjected to intense scrutiny by brewery owners who would question the appropriateness of the analytic procedures and experimental methodology; in particular, they would question the representativeness of the samples. If you plan to conduct a Web experiment that may have significant practical implications, you need to pay equal attention to the experimental methodologies you use to collect the data and the analysis techniques you employ to make inferences from your data.

So this article not only gives you a good grounding for understanding Web data more effectively; it also offers some advice on how to defend your choice of statistical test and lend additional legitimacy to the conclusions you draw from the data.

First published by IBM developerWorks