Implement Bayesian Inference Using PHP: Part 2 Beta distribution sampling model

Implement Bayesian inference using PHP: Part 2

By Paul Meagher - 2004-05-19 Page: 1 2 3 4 5 6 7 8 9 10

Beta distribution sampling model

A random variable is said to have the standard beta distribution with parameters a and b if its probability density function is:

f() = ^{a - 1} * (1 - ) ^{b - 1} / B(a, b)

Rather than explain the formula by resorting to more mathematics, I will discuss PHP code that you can use to compute f() for various values of a and b. Towards this end I created a class called BetaDistribution.php and added it to a probability distributions package I developed for a previous article (see Resources). This class supplies methods that accept the a, b, and parameters.

The class constructor is first called with the a and b parameters as follows:

Listing 4. Instantiating the BetaDistribution class


<?php

// Demonstration of how to instantiate Beta Distribution
// class.

require_once "../BetaDistribution.php"; 

$a = 1;  // num successes
$b = 4;  // num failures

$beta = new BetaDistribution($a, $b); 

?>

In this example, the number of success events (previously k) is denoted by a. The number of failure events is equal to n - k and is denoted by b. The a and b parameters jointly control the shape and location of the beta distribution curves.

Graphically speaking, the beta distribution refers to a large family of plotting curves that can differ substantially from one another depending upon the a and b parameter values. As you shall see, the a and b parameters can be used to represent the prior probability distribution that you feel is most appropriate for representing P(_i).

Suppose you test your simple binary survey before going live. Select five people that you think are representative of the target sampling population and ask them to fill it out the survey online; then observe the following results:

One participant responds "yes" (a "success" event).
Four participants respond "no" ("failure" events).

The code in Listing 5 invokes the BetaDistribution with the appropriate parameter values of a=1 and b=4 to represent the results of this survey. Once you instantiate the beta distribution constructor with the appropriate a and b parameter values, you can then use other methods in this class (depicted in the following) to compute standard probability distribution functions:

Listing 5. Using other methods to compute probability distribution functions



<?php

$a = 1;  // num successes
$b = 4;  // num failures

$beta = new BetaDistribution($a, $b)

echo $beta->getMean() ."<br />"; 
echo $beta->getStandardDeviation() ."<br />";
echo $beta->PDF(.2) ."<br />"; 
echo $beta->CDF(.50) ."<br />"; 
echo $beta->inverseCDF(0.95); 

?>

Which produces the output displayed in this table:

Table 4. Output of beta probability distribution methods

BetaDistribution(1, 4)
Methods	Output
getMean()	0.2
getStandardDeviation()	0.16329931618555
PDF(.2)	2.048
CDF(.50)	0.9375
inverseCDF(0.95)	0.52712919549841

If the test has no glitches, you can go into your main experiment with BetaDistribution(1, 4) being used to represent your prior distribution P(). Note that the mean value ( = p = k/n = a / a + b = .20) reported in the table is what you expect it to be from common-sense considerations (such as the expected value of equal to the observed proportion of cases to date k/n).

To visualize your prior probability distribution, you can use the following code below to obtain the x and y coordinates to plot. The probability density function PDF() returns a "probability" value associated with a particular value -- for instance, P[p = .20]. Given a contiguous range of p values, the PDF() method give you a corresponding range of probability values f(p) that you can use to graph the shape of the probability distribution for fixed a, b parameters and for a range of possible values:

Listing 6. Obtaining x and y coordinates to plot



<?php

require_once '../BetaDistribution.php';

$a = 1;  // num successes
$b = 4;  // num failures

$beta = new BetaDistribution($a, $b); 

$i = 0; // counter 
for($p = 0.01; $p <= 0.99; $p = $p + 0.04 ) {
  $pdf_vals[$i]   = $beta->PDF($p);  // y coordinates
  $parameters[$i] = $p;              // x coordinates 
  $i++;
}

$mean  = sprintf("%0.3f", $beta->getMean());
$stdev = sprintf("%0.3f", $beta->getStandardDeviation());

?>

In the following graph, p = $parameters[$i], and f(p) = $pdf_vals[$i].

Figure 4. Prior distribution is not well defined; too few observations

The exact values of f(p) are of less concern than the overall shape and center of gravity for the prior distribution. What this graph shows is that your prior distribution is still not very well defined because it does not peak around a particular parameter estimate. This is as it should be when you only have a few observations to work with.

Beta distribution source code

The BetaDistribution class depends upon methods supplied by the parent ProbabilityDistribution class and functions supplied by the SpecialMath.php library of functions. The SpecialMath.php library does most of the heavy lifting since it implements the critical logBeta and incompleteBeta functions, along with some important gamma functions. All of these functions are ports of functions and classes that originally appeared as part of the open source JSci package (see Resources).

View Implement Bayesian inference using PHP: Part 2 Discussion

Page: 1 2 3 4 5 6 7 8 9 10 Next Page: Updating through conjugate priors

First published by IBM developerWorks