Developer Forums | About Us | Site Map
Search  
HOME > TUTORIALS > SERVER SIDE CODING > PHP TUTORIALS > TAKE WEB DATA ANALYSIS TO THE NEXT LEVEL WITH PHP


Sponsors





Useful Lists

Web Host
site hosted by netplex

Online Manuals

Take Web data analysis to the next level with PHP
By Paul Meagher - 2004-04-12 Page:  1 2 3 4 5 6 7 8 9 10 11

Start with the sampling

Imagine you run a weekly poll on your site -- www.NovaScotiaBeerDrinkers.com -- asking members for their opinions on various topics. You've created a poll that asks members their favorite brand of beer (in Nova Scotia, there are three well-known brands: Keiths, Olands, and Schooner). So the survey is as inclusive as possible, you include "Other" among the responses.

You receive 1,000 responses and observe the results in Table 1. (The results shown in this article are for demonstration purposes only and not based on actual surveys.)

Table 1. The beer poll

KeithsOlandsSchoonerOther
285 (28.50%)250 (25.00%)215 (21.50%)250 (25.00%)

The data appears to support the conclusion that Keiths is the most popular brand among Nova Scotia residents. Based on these numbers, can you draw this conclusion? In other words, can you make an inference about the population of Nova Scotia beer drinkers on the basis of results obtained from the sample?

Many factors related to how the sample was collected could render your relative popularity inferences incorrect. Perhaps the sample consists of an inordinate number of employees of Keith's Brewery; perhaps you didn't properly guard against multiple votes by one person who may have biased the outcome; perhaps those who elected to vote are different from those who elected not to vote; perhaps the online voters are different from the offline voters.

Most Web polls are subject to such interpretive difficulties. These interpretive difficulties arise when you try to draw conclusions about a population parameter from a sample statistic. From an experimental design point of view, one of the first questions to ask before you collect data is whether you can take steps to help ensure that your sample is representative of the population of interest.

If drawing conclusions about the population of interest is your motivation for a Web poll (versus entertainment for site visitors), then you should implement techniques to ensure one vote per person (so that, they must login with a unique ID to vote) and randomize the selection sample of voters (for instance, select a random subset of members and e-mail them encouragement to vote).

Ultimately, the aim is to eliminate, or at least reduce, various biases that might impair your ability to draw inferences about your population of interest.



View Take Web data analysis to the next level with PHP Discussion

Page:  1 2 3 4 5 6 7 8 9 10 11 Next Page: Test the hypothesis

First published by IBM developerWorks


Copyright 2004-2017 GrindingGears.com. All rights reserved.
Article copyright and all rights retained by the author.