Beta distribution sampling model
A random variable θ is said to have the standard beta distribution with parameters a and b if its probability density function is:
f(θ) = θ^(a - 1) * (1 - θ)^(b - 1) / B(a, b)
Rather than explain the formula by resorting to more mathematics, I will discuss PHP code that you can use to compute f(θ) for various values of a and b. Toward this end I created a class called BetaDistribution (in BetaDistribution.php) and added it to a probability distributions package I developed for a previous article (see Resources). This class supplies methods that accept the a, b, and θ parameters.
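To make the density formula concrete, here is a minimal sketch that evaluates it directly. It assumes a and b are positive integers, in which case B(a, b) = (a-1)! (b-1)! / (a+b-1)!. The function names are hypothetical helpers for illustration and are not the methods of the BetaDistribution class.

```php
<?php
// Hypothetical helpers for illustration; not the article's BetaDistribution API.

/** n! as a float, for small non-negative integers. */
function factorialFloat(int $n): float
{
    $result = 1.0;
    for ($i = 2; $i <= $n; $i++) {
        $result *= $i;
    }
    return $result;
}

/** B(a, b) = (a-1)! (b-1)! / (a+b-1)!, valid only for positive integer a and b. */
function betaFunctionInt(int $a, int $b): float
{
    return factorialFloat($a - 1) * factorialFloat($b - 1) / factorialFloat($a + $b - 1);
}

/** Standard beta density at theta (0 <= theta <= 1) for positive integer a and b. */
function betaDensityInt(float $theta, int $a, int $b): float
{
    return pow($theta, $a - 1) * pow(1.0 - $theta, $b - 1) / betaFunctionInt($a, $b);
}

// Example: a = 2, b = 3 gives B(2, 3) = 1/12, so the density at 0.5 is 12 * 0.5 * 0.25 = 1.5.
printf("f(0.5) = %.4f\n", betaDensityInt(0.5, 2, 3));
```

Non-integer a and b need the gamma function rather than factorials, which is why a general implementation usually works with log-gamma values instead.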
The class constructor is first called with the a and b parameters as follows:
Listing 4. Instantiating the BetaDistribution class
In this example, the number of success events (previously k) is denoted by a. The number of failure events is equal to n - k and is denoted by b. The a and b parameters jointly control the shape and location of the beta distribution curves.
Graphically speaking, the beta distribution refers to a large family of curves that can differ substantially from one another depending upon the a and b parameter values. As you shall see, the a and b parameters can be used to represent the prior probability distribution that you feel is most appropriate for representing P(θ_i).
Suppose you test your simple binary survey before going live. You select five people that you think are representative of the target sampling population, ask them to fill out the survey online, and observe the following results:
- One participant responds "yes" (a "success" event).
- Four participants respond "no" ("failure" events).
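The tallies above map onto the beta shape parameters as a = k (successes) and b = n - k (failures). A minimal sketch of that bookkeeping follows. The function name is a hypothetical helper, and the guard reflects the fact that the beta distribution requires both parameters to be greater than zero.

```php
<?php
/**
 * Convert survey tallies into beta shape parameters: a = k, b = n - k.
 * Hypothetical helper name, used here for illustration only.
 */
function surveyToBetaParameters(int $successes, int $total): array
{
    // Both shape parameters must be positive, so under this direct mapping
    // the sample needs at least one success and at least one failure.
    if ($successes < 1 || $successes >= $total) {
        throw new InvalidArgumentException('Need at least one success and one failure.');
    }
    return ['a' => $successes, 'b' => $total - $successes];
}

// One "yes" out of five respondents.
$params = surveyToBetaParameters(1, 5);
printf("a = %d, b = %d\n", $params['a'], $params['b']); // prints "a = 1, b = 4"
```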
The code in Listing 5 invokes the BetaDistribution constructor with the appropriate parameter values of a=1 and b=4 to represent the results of this survey. Once you instantiate the BetaDistribution class with the appropriate a and b parameter values, you can use other methods in this class (depicted in the following) to compute standard probability distribution functions:
This produces the output displayed in this table:
If the test has no glitches, you can go into your main experiment with BetaDistribution(1, 4) representing your prior distribution P(θ). Note that the mean value (p = k/n = a / (a + b) = .20) reported in the table is what you would expect from common-sense considerations (namely, that the expected value of θ equals the proportion of successes observed to date, k/n).
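The mean quoted above follows from the closed-form moments of the beta distribution: the mean is a / (a + b) and the variance is ab / ((a + b)^2 (a + b + 1)). The sketch below computes them with a hypothetical stand-in class. The name BetaSummarySketch and its method names are assumptions, and the actual BetaDistribution class may expose different ones.

```php
<?php
// Hypothetical stand-in; the article's BetaDistribution class may use different method names.
final class BetaSummarySketch
{
    private float $a;
    private float $b;

    public function __construct(float $a, float $b)
    {
        if ($a <= 0.0 || $b <= 0.0) {
            throw new InvalidArgumentException('a and b must be positive.');
        }
        $this->a = $a;
        $this->b = $b;
    }

    /** Mean: a / (a + b). */
    public function mean(): float
    {
        return $this->a / ($this->a + $this->b);
    }

    /** Variance: ab / ((a + b)^2 (a + b + 1)). */
    public function variance(): float
    {
        $sum = $this->a + $this->b;
        return ($this->a * $this->b) / ($sum * $sum * ($sum + 1.0));
    }

    /** Standard deviation: square root of the variance. */
    public function standardDeviation(): float
    {
        return sqrt($this->variance());
    }
}

$prior = new BetaSummarySketch(1, 4);
printf("mean = %.4f\n", $prior->mean());                  // mean = 0.2000
printf("variance = %.4f\n", $prior->variance());          // variance = 0.0267
printf("sd = %.4f\n", $prior->standardDeviation());       // sd = 0.1633
```

The standard deviation of roughly 0.16 is large relative to the mean of 0.20, which is one way to quantify how little five observations pin down the parameter.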
To visualize your prior probability distribution, you can use the following code to obtain the x and y coordinates to plot. The probability density function method PDF() returns a "probability" value associated with a particular parameter value, for instance P[p = .20]. Given a contiguous range of p values, the method gives you a corresponding range of probability values f(p) that you can use to graph the shape of the probability distribution for fixed a, b parameters and for a range of possible parameter values:
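A minimal sketch of how such plotting coordinates might be generated is shown here. It does not call the BetaDistribution class. Instead it uses the closed form of the Beta(1, 4) density, f(p) = 4(1 - p)^3, which follows from B(1, 4) = 1/4. The function name densityCurve and the array name $densities are assumptions made for illustration.

```php
<?php
/**
 * Evaluate a density on an evenly spaced grid over [0, 1].
 * Returns [$parameters, $densities]; the names are illustrative.
 */
function densityCurve(callable $pdf, int $steps): array
{
    $parameters = [];
    $densities = [];
    for ($i = 0; $i <= $steps; $i++) {
        $p = $i / (float) $steps;   // x coordinate
        $parameters[] = $p;
        $densities[] = $pdf($p);    // y coordinate
    }
    return [$parameters, $densities];
}

// Beta(1, 4) density: B(1, 4) = 1/4, so f(p) = 4 * (1 - p)^3.
$betaOneFour = function (float $p): float {
    return 4.0 * pow(1.0 - $p, 3);
};

[$parameters, $densities] = densityCurve($betaOneFour, 10);
foreach ($parameters as $i => $p) {
    printf("%.1f\t%.4f\n", $p, $densities[$i]);
}
```

Note that the y values are densities, not probabilities, so they can exceed 1 (the curve starts at 4.0 for p = 0 and falls to 0 at p = 1).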
In the following graph, p =
$parameters[$i], and f(p) =
The exact values of f(p) are of less concern than the overall shape and center of gravity for the prior distribution. What this graph shows is that your prior distribution is still not very well defined because it does not peak around a particular parameter estimate. This is as it should be when you only have a few observations to work with.
Beta distribution source code
The BetaDistribution class depends upon methods supplied by the parent ProbabilityDistribution class and functions supplied by the SpecialMath.php library. The SpecialMath.php library does most of the heavy lifting since it implements the critical logBeta and incompleteBeta functions, along with some important gamma functions. All of these functions are ports of functions and classes that originally appeared as part of the open source JSci package (see Resources).
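To give a feel for what a log-beta routine does, the sketch below uses the identity log B(a, b) = log Γ(a) + log Γ(b) - log Γ(a + b), with log Γ computed by a Lanczos approximation. This is not the SpecialMath.php or JSci code. The function names logGammaSketch and logBetaSketch are hypothetical, and the Lanczos method is a swapped-in technique chosen for brevity.

```php
<?php
// Illustrative only: not the SpecialMath.php port of JSci. Function names are hypothetical.

/** Natural log of the gamma function for x > 0 (Lanczos approximation, g = 7, 9 coefficients). */
function logGammaSketch(float $x): float
{
    static $coef = [
        0.99999999999980993,
        676.5203681218851,
        -1259.1392167224028,
        771.32342877765313,
        -176.61502916214059,
        12.507343278686905,
        -0.13857109526572012,
        9.9843695780195716e-6,
        1.5056327351493116e-7,
    ];
    if ($x <= 0.0) {
        throw new InvalidArgumentException('x must be positive.');
    }
    if ($x < 0.5) {
        // Reflection formula: Gamma(x) * Gamma(1 - x) = pi / sin(pi * x).
        return log(M_PI / sin(M_PI * $x)) - logGammaSketch(1.0 - $x);
    }
    $x -= 1.0;
    $sum = $coef[0];
    for ($i = 1; $i < 9; $i++) {
        $sum += $coef[$i] / ($x + $i);
    }
    $t = $x + 7.5;
    return 0.5 * log(2.0 * M_PI) + ($x + 0.5) * log($t) - $t + log($sum);
}

/** log B(a, b) = logGamma(a) + logGamma(b) - logGamma(a + b). */
function logBetaSketch(float $a, float $b): float
{
    return logGammaSketch($a) + logGammaSketch($b) - logGammaSketch($a + $b);
}

// B(1, 4) = 1/4, so exponentiating the log-beta value should give 0.25.
printf("B(1, 4) = %.4f\n", exp(logBetaSketch(1.0, 4.0)));
```

Working in log space keeps the intermediate values small: the gamma function itself overflows a double quickly as a and b grow, while its logarithm does not.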