Handle Output Issues
The code in Listing 5 shows how easy it is to perform a Chi Square analysis using the
ChiSquare1D.php class. It also demonstrates the handling of output issues.
The script invokes a wrapper script called
ChiSquare1D_HTML.php. The purpose of this wrapper script is to separate the logic of the Chi Square procedure from its presentational aspects. The _HTML suffix indicates that the output is intended for a standard Web browser or other HTML-rendering device.
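The separation described above can be sketched as a small pair of classes: one that only computes, and a subclass that only formats. This is a minimal illustration, not the contents of ChiSquare1D.php or ChiSquare1D_HTML.php. The class names GoodnessOfFit and GoodnessOfFitHtml, the renderSummaryTable method, the equal-expected-frequency assumption, and the counts are all hypothetical.

```php
<?php
// Hypothetical sketch: names and data are illustrative only and are NOT
// taken from ChiSquare1D.php or ChiSquare1D_HTML.php.

// Logic layer: computes the statistics and knows nothing about HTML.
class GoodnessOfFit
{
    public array $observed = [];
    public array $expected = [];
    public array $cellScores = [];
    public float $chiSquare = 0.0;

    public function __construct(array $observed)
    {
        // Assumption: every category has the same expected frequency.
        $this->observed = $observed;
        $e = array_sum($observed) / count($observed);
        foreach ($observed as $label => $o) {
            $this->expected[$label] = $e;
            $this->cellScores[$label] = (($o - $e) ** 2) / $e;
        }
        $this->chiSquare = array_sum($this->cellScores);
    }
}

// Presentation layer: formats what the logic layer computed.
class GoodnessOfFitHtml extends GoodnessOfFit
{
    public function renderSummaryTable(): string
    {
        $html = "<table>\n<tr><th>Category</th><th>Observed</th>"
              . "<th>Expected</th><th>(O - E)^2 / E</th></tr>\n";
        foreach ($this->observed as $label => $o) {
            $html .= sprintf(
                "<tr><td>%s</td><td>%d</td><td>%.2f</td><td>%.2f</td></tr>\n",
                htmlspecialchars((string) $label),
                $o,
                $this->expected[$label],
                $this->cellScores[$label]
            );
        }
        $html .= sprintf(
            "<tr><td colspan=\"3\">Chi Square</td><td>%.2f</td></tr>\n</table>\n",
            $this->chiSquare
        );
        return $html;
    }
}

// Made-up counts, used only to exercise the sketch.
$report = new GoodnessOfFitHtml(['A' => 10, 'B' => 20, 'C' => 30]);
echo $report->renderSummaryTable();
```

With this layout, a different output target (plain text, say) would need only another subclass; the calculation code would stay untouched.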
Another purpose of the wrapper script is to organize the output in
ways that facilitate understanding the data. Toward this end, the
wrapper class contains two methods for displaying the results of the Chi Square
analysis. The showTableSummary method displays the first output table shown following the code (Table 2), while the
showChiSquareStats method displays the second output table (Table 3).
The script generates the following output:
Table 2. Expected frequencies and variances from running the wrapper script
Table 3. Various Chi Square statistics from running the wrapper script
Table 2 displays the expected frequencies and the variance measure for each cell, (O - E)^2 / E. The sum of the variance scores is equal to the obtained Chi Square value (9.80) that is reported in the lower right cell of the summary table.
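Written out as a formula, the relationship between the per-cell scores and the obtained value is shown below. The subscripted symbols are notation introduced here for illustration: O_i and E_i are the observed and expected frequencies in cell i, and k is the number of cells.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```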
Table 3 reports various Chi Square statistics. It includes the degrees of freedom used in the analysis and reports the obtained Chi Square value again. The obtained Chi Square value is then re-expressed as a tail probability value -- in this case, 0.02. This means that the probability of observing a Chi Square value at least as extreme as 9.80 under the null hypothesis is 2 percent (which is quite a low probability).
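For readers curious how an obtained value becomes a tail probability, the following is one possible sketch. It evaluates the regularized incomplete gamma function by its power series, and it is not the routine that ChiSquare1D.php uses. The function names are hypothetical, and the 3 degrees of freedom in the usage line are an assumption that is consistent with the reported 9.80 and 0.02 but is not stated in this section.

```php
<?php
// Sketch only: an independent series implementation, not the code
// inside ChiSquare1D.php. Function names are hypothetical.

/** Gamma(df / 2) for a positive integer df, built up by recurrence. */
function gammaOfHalf(int $df): float
{
    // Start at Gamma(1) = 1 for even df, or Gamma(1/2) = sqrt(pi) for odd df.
    $a = ($df % 2 === 0) ? 1.0 : 0.5;
    $g = ($df % 2 === 0) ? 1.0 : sqrt(M_PI);
    while ($a < $df / 2 - 1e-9) {
        $g *= $a;       // Gamma(a + 1) = a * Gamma(a)
        $a += 1.0;
    }
    return $g;
}

/** Upper-tail probability P(X >= x) for a Chi Square variable with df degrees of freedom. */
function chiSquareTailProbability(float $x, int $df): float
{
    if ($x <= 0.0) {
        return 1.0;
    }
    $a = $df / 2;
    $h = $x / 2;

    // Power series for the lower incomplete gamma function:
    // gamma(a, h) = h^a * e^(-h) * sum_{n>=0} h^n / (a (a+1) ... (a+n))
    $term = 1.0 / $a;
    $sum = $term;
    for ($n = 1; $n < 1000; $n++) {
        $term *= $h / ($a + $n);
        $sum += $term;
        if ($term < $sum * 1e-15) {
            break;
        }
    }
    $lower = $sum * exp(-$h + $a * log($h)) / gammaOfHalf($df);

    // Adequate for moderate x; for very large x the subtraction loses precision.
    return max(0.0, 1.0 - $lower);
}

// df = 3 is an assumption (see the text above the listing).
printf("%.2f\n", chiSquareTailProbability(9.80, 3)); // prints 0.02
```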
Most statisticians would not argue if you decided to reject the null hypothesis that the results can be accounted for in terms of random sampling variability from the null distribution. It is more likely that your poll results reflect a real difference in brand preference among the population of Nova Scotia beer drinkers.
Just to confirm this conclusion, you can compare the obtained Chi Square value to the Critical value.
Why is the Critical value important? The Critical value is based upon the significance level (or alpha-cutoff level) set for the analysis. The alpha-cutoff value is conventionally set at 0.05, which is the value used for the above analysis. This setting is used to find the location (the critical value) on the Chi Square sampling distribution that cuts off an upper tail area equal to the alpha-cutoff value (0.05).
In this study, the obtained Chi Square value was larger than the Critical value. This means that the threshold for retaining the null hypothesis explanation was exceeded. The alternative hypothesis -- that a difference in proportions exists in the population -- is the better-supported explanation of the data.
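The comparison just described amounts to a one-line decision rule. The sketch below uses a small lookup of upper-tail critical values at alpha = 0.05 taken from standard statistical tables; the constant and function names are hypothetical, and the 3 degrees of freedom are again an assumption rather than a figure given in this section.

```php
<?php
// Upper-tail critical values of the Chi Square distribution at alpha = 0.05,
// from standard statistical tables (degrees of freedom => critical value).
const CRITICAL_05 = [1 => 3.841, 2 => 5.991, 3 => 7.815, 4 => 9.488, 5 => 11.070];

/** True when the obtained value exceeds the critical value, so the null hypothesis is rejected. */
function rejectsNull(float $obtained, int $df): bool
{
    if (!array_key_exists($df, CRITICAL_05)) {
        throw new InvalidArgumentException("No critical value tabulated for df = $df");
    }
    return $obtained > CRITICAL_05[$df];
}

// df = 3 is an assumption; 9.80 is the obtained value reported in the text.
var_dump(rejectsNull(9.80, 3)); // bool(true), since 9.80 > 7.815
```

Comparing the obtained value with the critical value and comparing the tail probability with alpha are two views of the same test, so they always lead to the same decision.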
In the automated analysis of data streams, an alpha-cutoff setting could be used to set an output filter for a knowledge-discovery algorithm (such as Chi Square Automatic Interaction Detection, or CHAID) that does not have the benefit of detailed human guidance in discovering real and useful patterns.
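As a rough illustration of such an output filter, the sketch below keeps only the findings whose tail probability falls below the alpha-cutoff. The function name, the structure of each finding, and the sample values are all made up for the example; a real CHAID implementation involves considerably more than this.

```php
<?php
/**
 * Keep only the findings whose tail probability is below the alpha-cutoff.
 * Each finding is assumed to be an array with 'label' and 'p' keys
 * (a hypothetical structure chosen for this sketch).
 */
function filterSignificant(array $findings, float $alpha = 0.05): array
{
    $kept = array_filter($findings, function (array $finding) use ($alpha): bool {
        return $finding['p'] < $alpha;
    });
    return array_values($kept); // reindex from 0
}

// Made-up findings, used only to exercise the filter.
$findings = [
    ['label' => 'pattern A', 'p' => 0.02],
    ['label' => 'pattern B', 'p' => 0.40],
    ['label' => 'pattern C', 'p' => 0.049],
];

foreach (filterSignificant($findings) as $finding) {
    echo $finding['label'], "\n"; // pattern A, then pattern C
}
```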