Take Web data analysis to the next level with PHP
By Paul Meagher - 2004-04-12

Handle output issues

The code in Listing 5 shows how easy it is to perform a Chi Square analysis using the ChiSquare1D.php class. It also demonstrates the handling of output issues.

The script invokes a wrapper script called ChiSquare1D_HTML.php. The purpose of this wrapper script is to separate the logic of the Chi Square procedure from its presentational aspects. The _HTML suffix indicates that the output is intended for a standard Web browser or other HTML-rendering device.

Another purpose of the wrapper script is to organize the output in ways that make the data easier to understand. To this end, the wrapper class contains two methods for displaying the results of the Chi Square analysis. The showTableSummary method displays the first output table shown following the code (Table 2), while the showChiSquareStats method displays the second output table (Table 3).
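To make the separation of logic and presentation concrete, here is a minimal sketch of the pattern. The class names SimpleTally and SimpleTally_HTML and the renderTable method are hypothetical stand-ins chosen for illustration; they are not the actual ChiSquare1D or ChiSquare1D_HTML source, which may be organized differently.

```php
<?php
// Sketch only: SimpleTally and SimpleTally_HTML are hypothetical names,
// not the actual ChiSquare1D / ChiSquare1D_HTML classes.

// Logic layer: holds the data and computes values, emits no markup.
class SimpleTally
{
    public $ObsFreq;
    public $Total;

    public function __construct(array $ObsFreq)
    {
        $this->ObsFreq = $ObsFreq;
        $this->Total   = array_sum($ObsFreq);
    }
}

// Presentation layer: inherits the computed values and only formats them.
class SimpleTally_HTML extends SimpleTally
{
    // Returns an HTML table with one column per heading plus a total column.
    public function renderTable(array $Headings)
    {
        $Html = "<table border='1'><tr>";
        foreach ($Headings as $Heading) {
            $Html .= "<th>" . htmlspecialchars($Heading) . "</th>";
        }
        $Html .= "<th>Total</th></tr><tr>";
        foreach ($this->ObsFreq as $Freq) {
            $Html .= "<td>" . $Freq . "</td>";
        }
        $Html .= "<td>" . $this->Total . "</td></tr></table>";
        return $Html;
    }
}

$Tally     = new SimpleTally_HTML(array(285, 250, 215, 250));
$TableHtml = $Tally->renderTable(array("Keiths", "Olands", "Schooner", "Other"));
echo $TableHtml;
```

With this arrangement, a different output target (plain text, for example) would only need another subclass; the calculation code would stay untouched.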

Listing 5. Organizing data with a wrapper script


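<?php
// beer_poll_analysis.php

require_once "../init.php";

require_once PHP_MATH . "chi/ChiSquare1D_HTML.php";

// Column headings for the summary table, one per beer brand in the poll.
$Headings = array("Keiths", "Olands", "Schooner", "Other");

// Observed vote counts per brand and the significance level for the test.
$ObsFreq  = array(285, 250, 215, 250);
$Alpha    = 0.05;
$Chi      = new ChiSquare1D_HTML($ObsFreq, $Alpha);

// Table 2: expected frequencies and variances.
$Chi->showTableSummary($Headings);

echo "<br><br>";

// Table 3: Chi Square statistics.
$Chi->showChiSquareStats();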


The script generates the following output:

Table 2. Expected frequencies and variances from running the wrapper script


Table 3. Various Chi Square statistics from running the wrapper script

              DF    Obtained    Probability    Critical
Chi Square     3        9.80           0.02        7.81

Table 2 displays the expected frequencies and the variance measure for each cell, (O - E)^2 / E, where O is the observed frequency and E is the expected frequency. The sum of the variance scores equals the obtained Chi Square value (9.80), which is reported in the lower right cell of the summary table.
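The arithmetic behind Table 2 can be sketched in a few lines of plain PHP. This sketch assumes that the expected frequencies are equal across the four brands, which is consistent with the reported value of 9.80; it does not use the ChiSquare1D class itself.

```php
<?php
// Sketch of the Table 2 arithmetic; assumes equal expected frequencies.
$ObsFreq = array(285, 250, 215, 250);

$Total    = array_sum($ObsFreq);        // total number of responses
$Expected = $Total / count($ObsFreq);   // same expected count for each brand

// Variance measure for each cell: (O - E)^2 / E
$Variances = array();
foreach ($ObsFreq as $Observed) {
    $Variances[] = pow($Observed - $Expected, 2) / $Expected;
}

// The obtained Chi Square is the sum of the cell variances.
$ChiSquare = array_sum($Variances);

printf("Expected per cell: %.2f\n", $Expected);
printf("Obtained Chi Square: %.2f\n", $ChiSquare);
```

Only the Keiths and Schooner cells deviate from the expected count, so they alone contribute to the obtained value.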

Table 3 reports various Chi Square statistics. It includes the degrees of freedom used in the analysis (3, the number of categories minus one) and repeats the obtained Chi Square value. The obtained value is also re-expressed as a tail probability -- in this case, 0.02. This means that the probability of observing a Chi Square value at least as extreme as 9.80 under the null hypothesis is about 2 percent (which is quite a low probability).
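PHP has no built-in Chi Square distribution function, so the tail probability has to be computed. One possible approach, sketched below, uses the series expansion of the regularized lower incomplete gamma function. The function names are hypothetical and are not part of the package used in this article, which may compute the probability differently.

```php
<?php
// Sketch: upper-tail probability of a Chi Square value, computed with the
// series expansion of the regularized lower incomplete gamma function.
// Function names are hypothetical; intended for modest Chi Square values.

// Gamma(k/2) for a positive integer k, using Gamma(z + 1) = z * Gamma(z).
function gammaOfHalf($k)
{
    if ($k == 1) {
        return sqrt(M_PI);   // Gamma(1/2)
    }
    if ($k == 2) {
        return 1.0;          // Gamma(1)
    }
    return ($k / 2 - 1) * gammaOfHalf($k - 2);
}

function chiSquareTailProbability($ChiSquare, $DF)
{
    if ($ChiSquare <= 0) {
        return 1.0;
    }
    $a = $DF / 2;
    $x = $ChiSquare / 2;

    // First series term: x^a * e^(-x) / Gamma(a + 1)
    $Term = pow($x, $a) * exp(-$x) / gammaOfHalf($DF + 2);
    $Sum  = $Term;
    for ($n = 1; $n <= 1000; $n++) {
        $Term *= $x / ($a + $n);   // ratio of consecutive series terms
        $Sum  += $Term;
        if ($n > $x && $Term < 1e-15) {
            break;                 // terms are shrinking and negligible
        }
    }
    return 1.0 - $Sum;             // upper tail = 1 - lower tail
}

$Probability = chiSquareTailProbability(9.80, 3);
printf("Tail probability: %.2f\n", $Probability);
```

For the poll data, the result rounds to the 0.02 reported in Table 3.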

Most statisticians would not argue if you decided to reject the null hypothesis, which holds that the results can be accounted for by random sampling variability from the null distribution. It is more likely that your poll results reflect a real difference in brand preference among the population of Nova Scotia beer drinkers.

Just to confirm this conclusion, you can compare the obtained Chi Square value to the Critical value.

Why is the Critical value important? The Critical value is based upon the significance level (or alpha-cutoff level) set for the analysis. The alpha-cutoff value is conventionally set at 0.05, which is the value used in the analysis above. This setting is used to find the location (the critical value) on the Chi Square sampling distribution beyond which the tail area equals the alpha-cutoff value (0.05).

In this study, the obtained Chi Square value (9.80) was larger than the Critical value (7.81). This means that the threshold for retaining the null hypothesis explanation was exceeded. The alternative hypothesis -- that a difference in proportions exists in the population -- is statistically more likely to be true.
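The comparison against the Critical value amounts to a simple decision rule. The sketch below hard-codes standard Chi Square critical values for an alpha-cutoff of 0.05 and 1 to 5 degrees of freedom; the function name is hypothetical, and a fuller implementation would compute the critical value for any alpha setting rather than look it up.

```php
<?php
// Sketch of the decision rule. The lookup table holds standard Chi Square
// critical values for alpha = 0.05 only; the function name is hypothetical.
function exceedsCriticalValue($ChiSquare, $DF)
{
    $CriticalAt05 = array(
        1 => 3.841,
        2 => 5.991,
        3 => 7.815,
        4 => 9.488,
        5 => 11.070,
    );
    if (!isset($CriticalAt05[$DF])) {
        throw new InvalidArgumentException("No critical value tabulated for DF = $DF");
    }
    // Reject the null hypothesis only when the obtained value is larger.
    return $ChiSquare > $CriticalAt05[$DF];
}

$Reject = exceedsCriticalValue(9.80, 3);
echo $Reject ? "Reject the null hypothesis\n" : "Retain the null hypothesis\n";
```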

In the automated analysis of data streams, an alpha-cutoff setting could be used to set an output filter for a knowledge-discovery algorithm (such as Chi-squared Automatic Interaction Detection, or CHAID) that does not have the benefit of detailed human guidance in discovering real and useful patterns.

First published by IBM developerWorks

Copyright 2004-2017 All rights reserved.
Article copyright and all rights retained by the author.