Apply probability models to Web data using PHP
By Paul Meagher - 2004-04-14

## Is mt_rand() really random?

To obtain a pseudo-random number from PHP's random-number generator, call the `mt_rand()` function. With no arguments, it returns an integer between 0 and a system-defined upper limit, which you can inspect by calling the `mt_getrandmax()` function. You can also pass a minimum and a maximum, as in `mt_rand(0, 9)`, to restrict the result to a range of your choosing.
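As a quick check of these calls, here is a minimal sketch that prints the upper limit, one unrestricted draw, and one draw restricted to the digits 0 through 9. The variable names are only illustrative.

```php
<?php
// Largest value mt_rand() can return when called without arguments.
$max = mt_getrandmax();

// One draw from the full range 0..$max.
$raw = mt_rand();

// One draw restricted to 0..9 (both bounds are inclusive).
$digit = mt_rand(0, 9);

echo "max = $max, raw = $raw, digit = $digit\n";
```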

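The `mt_rand()` function uses the Mersenne Twister algorithm. According to the PHP manual, it produces random numbers about four times faster than the average libc `rand()`, and its statistical properties are better characterized than those of PHP's older `rand()` function.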

Before you use PHP's `mt_rand()` in your probability models, you might want to convince yourself that the `mt_rand()` function works correctly. How could you do this?

Most developers are content to write a script, get it to generate a few random values, and then accept that it is working correctly if they don't notice any obvious biases in the numbers that are appearing. This eyeball analysis might convince you, but it won't, as they say, convince the lawyers.

One approach to finding more convincing evidence is to define precisely what it means for a sequence of numbers to be random. A random sequence of numbers should have many properties, but one of the most important is that each number in the range of possible values should have an equal likelihood of appearing at each point in the sequence.

A way to measure whether this is true is by counting the number of times each value occurs and graphing the frequency counts for each value. The resulting graph should approximate a uniform distribution of counts for each value in your range. If you limit the range of allowable sequence numbers from 0 to 9 and generate a sequence of 1,000 numbers, then the graph should approximate the discrete uniform distribution depicted in Figure 5.
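The counting step can be sketched as follows, assuming PHP 7 or later. The function name `digitFrequencies()` is hypothetical and the text histogram is only a stand-in for a proper graph.

```php
<?php
/**
 * Count how often each value in 0..($numValues - 1) appears
 * in $numDraws calls to mt_rand().
 */
function digitFrequencies(int $numDraws, int $numValues = 10): array
{
    $counts = array_fill(0, $numValues, 0);
    for ($i = 0; $i < $numDraws; $i++) {
        $counts[mt_rand(0, $numValues - 1)]++;
    }
    return $counts;
}

$counts = digitFrequencies(1000);

// Crude text histogram: one '*' for every 5 occurrences.
foreach ($counts as $value => $count) {
    printf("%d | %-30s %d\n", $value, str_repeat('*', intdiv($count, 5)), $count);
}
```

If the generator is behaving, each bar should hover around 100 occurrences, with some run-to-run scatter.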

To test whether PHP's `mt_rand()` function generates a uniform distribution of random values, I've created a script that applies the Chi Square test. The first half of the script is primarily concerned with creating a frequency distribution from the output of `mt_rand()`. The second half performs the Chi Square test.

The test involves choosing an alpha cutoff (the significance level), which determines the critical Chi Square value. If the obtained Chi Square value exceeds the critical Chi Square value, you reject the null hypothesis that the `mt_rand()` values come from a uniform distribution. If `mt_rand()` is working as it should, you would expect not to reject the null hypothesis.
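The decision rule can be sketched without any supporting class. The code below is not the script from Listing 2; it is a minimal stand-in that assumes PHP 7 or later, ten equally likely values, and an alpha of 0.05, for which the critical value with 9 degrees of freedom is 16.92. The name `chiSquareUniform()` is hypothetical.

```php
<?php
/**
 * Chi Square statistic for observed counts, assuming every
 * category is expected to occur equally often.
 */
function chiSquareUniform(array $observed): float
{
    $expected = array_sum($observed) / count($observed);
    $chiSquare = 0.0;
    foreach ($observed as $count) {
        $chiSquare += ($count - $expected) ** 2 / $expected;
    }
    return $chiSquare;
}

// Critical value for alpha = 0.05 with 9 degrees of freedom (10 values - 1).
const CRITICAL_VALUE = 16.92;

// Build a frequency table from 1,000 draws in the range 0..9.
$observed = array_fill(0, 10, 0);
for ($i = 0; $i < 1000; $i++) {
    $observed[mt_rand(0, 9)]++;
}

$chiSquare = chiSquareUniform($observed);
$reject = $chiSquare > CRITICAL_VALUE;

printf(
    "Chi Square = %.2f, critical = %.2f: %s\n",
    $chiSquare,
    CRITICAL_VALUE,
    $reject ? "reject the null hypothesis" : "do not reject the null hypothesis"
);
```

Hard-coding the critical value keeps the sketch short. A fuller implementation would look it up from the chosen alpha and the degrees of freedom.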

Listing 2. PHP Chi Square script to determine the accuracy of mt_rand()
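```php
<?php
// ... (the earlier part of this listing, and the HTML wrapped
// around the output below, is not preserved in this copy)

echo $Chi->showTableSummary($Headings);
echo $Chi->showChiSquareStats();
?>
```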

The following table shows sample output from this script. Because the obtained Chi Square value of 7.90 is less than the critical value of 16.92, you cannot reject the null hypothesis that your observed frequencies match the frequencies expected when sampling from a uniform distribution.

Table 1. Output from PHP Chi Square script

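|          | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | Totals |
|----------|------|------|------|------|------|------|------|------|------|------|--------|
| Observed | 91   | 115  | 90   | 104  | 101  | 95   | 105  | 113  | 88   | 98   | 1000   |
| Expected | 100  | 100  | 100  | 100  | 100  | 100  | 100  | 100  | 100  | 100  | 1000   |
| Variance | 0.81 | 2.25 | 1    | 0.16 | 0.01 | 0.25 | 0.25 | 1.69 | 1.44 | 0.04 | 7.90   |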

| Statistic  | DF | Obtained | Prob | Critical |
|------------|----|----------|------|----------|
| Chi Square | 9  | 7.90     | 0.54 | 16.92    |
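To see where the obtained value comes from, note that each entry in the Variance row of Table 1 is one value's contribution to the statistic: the squared difference between the observed and expected counts, divided by the expected count. Writing $O_i$ and $E_i$ for the observed and expected counts of value $i$ (symbols introduced here for illustration), the arithmetic works out as:

$$
\chi^2 = \sum_{i=0}^{9} \frac{(O_i - E_i)^2}{E_i}
= \frac{(91 - 100)^2}{100} + \frac{(115 - 100)^2}{100} + \cdots + \frac{(98 - 100)^2}{100}
= 0.81 + 2.25 + \cdots + 0.04 = 7.90
$$

With 10 possible values there are $10 - 1 = 9$ degrees of freedom, which is the DF reported alongside the statistic.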

It can be instructive to run this script a number of times and observe that on some occasions you reject the null hypothesis. Why do you think this occurs? How often can this occur before you need to reject the null hypothesis? And is there a tool to help make these determinations?
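One way to explore these questions is to automate the repetition. The sketch below assumes PHP 7 or later, the same ten-value setup, and the 16.92 critical value. The function names are hypothetical. It repeats the whole test many times and reports the fraction of runs in which the null hypothesis is rejected. With an alpha of 0.05, a generator that really is uniform should still be rejected in roughly 5% of runs, purely by chance.

```php
<?php
/** Run one Chi Square test on $numDraws values in 0..9 and return the statistic. */
function chiSquareForOneRun(int $numDraws = 1000): float
{
    $observed = array_fill(0, 10, 0);
    for ($i = 0; $i < $numDraws; $i++) {
        $observed[mt_rand(0, 9)]++;
    }
    $expected = $numDraws / 10;
    $chiSquare = 0.0;
    foreach ($observed as $count) {
        $chiSquare += ($count - $expected) ** 2 / $expected;
    }
    return $chiSquare;
}

/** Fraction of $numRuns independent runs whose statistic exceeds $critical. */
function rejectionRate(int $numRuns, float $critical = 16.92): float
{
    $rejections = 0;
    for ($run = 0; $run < $numRuns; $run++) {
        if (chiSquareForOneRun() > $critical) {
            $rejections++;
        }
    }
    return $rejections / $numRuns;
}

mt_srand(42); // fixed seed so the experiment is repeatable
$rate = rejectionRate(2000);
printf("Rejected the null hypothesis in %.1f%% of runs\n", 100 * $rate);
```

A rejection rate far above the alpha level over many runs would be better evidence of a problem than any single rejection.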
