Implement Bayesian Inference Using PHP: Part 2 Updating through conjugate priors

By Paul Meagher - 2004-05-19

Updating through conjugate priors

Suppose that you go live with your simple binary survey and collect the following responses.

Four participants respond with a 1-coded answer (success events).
Sixteen participants respond with a 0-coded answer (failure events).

People in the live survey responded with the same success proportions (k/n = 4/20 = 1/5 = .20) as in the pilot survey (k/n = 1/5 = .20).

Look at the graph of the results expressed as a beta distribution with a=4 and b=16.

Figure 5. More data means that estimates rest more firmly within a range of values

As you can see, the graph is starting to peak sharply around the parameter estimate .20, and the standard deviation of the parameter estimate is decreasing. The beta distribution reflects the fact that you have more data to learn from, so your estimates can be placed more firmly within a range of values. You can also compute confidence intervals for your parameter estimate, known in Bayesian statistics as credible intervals (but I'll leave that as an exercise).
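As a starting point for that exercise, here is a minimal sketch of how the spread of a beta distribution and an approximate credible interval might be computed. The function names (`betaMean`, `betaStdDev`, `betaCredibleInterval`) are hypothetical and are not part of the BetaDistribution class used in this article. The interval is approximated on a grid of theta values rather than computed with an inverse beta function, and the sketch assumes a and b are both at least 1.

```php
<?php
// Hypothetical helpers for illustration; not the article's BetaDistribution API.

// Closed-form mean of Beta(a, b).
function betaMean(float $a, float $b): float
{
    return $a / ($a + $b);
}

// Closed-form standard deviation of Beta(a, b).
function betaStdDev(float $a, float $b): float
{
    $n = $a + $b;
    return sqrt(($a * $b) / ($n * $n * ($n + 1)));
}

// Approximate equal-tailed credible interval using a grid over (0, 1).
// Assumes a >= 1 and b >= 1 so the density stays finite at the edges.
function betaCredibleInterval(float $a, float $b, float $mass = 0.95, int $steps = 10000): array
{
    $weights = [];
    $total = 0.0;
    for ($i = 0; $i < $steps; $i++) {
        $theta = ($i + 0.5) / $steps; // midpoint of grid cell i
        $w = pow($theta, $a - 1) * pow(1 - $theta, $b - 1); // unnormalized density
        $weights[] = $w;
        $total += $w;
    }

    $tail = (1 - $mass) / 2;
    $lower = null;
    $upper = null;
    $cumulative = 0.0;
    foreach ($weights as $i => $w) {
        $cumulative += $w / $total;
        $theta = ($i + 0.5) / $steps;
        if ($lower === null && $cumulative >= $tail) {
            $lower = $theta; // first grid point past the lower tail
        }
        if ($upper === null && $cumulative >= 1 - $tail) {
            $upper = $theta; // first grid point past the upper tail
            break;
        }
    }
    return [$lower, $upper];
}

printf("pilot survey sd (a=1, b=4):  %.4f\n", betaStdDev(1, 4));
printf("live survey sd  (a=4, b=16): %.4f\n", betaStdDev(4, 16));
[$low, $high] = betaCredibleInterval(4, 16);
printf("approx. 95%% credible interval for a=4, b=16: [%.3f, %.3f]\n", $low, $high);
```

Comparing the two standard deviations shows the narrowing described above: the live survey has the same proportion as the pilot but a tighter distribution.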

I will conclude this discussion by demonstrating how easy it is to update the Bayes parameter estimate with new information by using the concept of a conjugate prior. In particular, I will look at how to combine the parameter estimate obtained from the test survey with the parameter estimate obtained from the live survey. Don't throw out the test survey data if it is representative of the population you want to draw inferences about and the test conditions remain the same.

Essentially, a conjugate prior allows you to represent the Bayes parameter estimation formula in incredibly simple terms using the beta parameters a and b:

a_posterior = a_live + a_test
b_posterior = b_live + b_test

a_posterior = 4 + 1
b_posterior = 16 + 4

Using the conjugate prior updating rule to combine the test and live survey parameter estimates, you pass a=5 and b=20 into the BetaDistribution class and plot the resulting probability distribution.
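The updating rule itself can be sketched in a few lines. The `combineBeta` function below is a hypothetical helper written for illustration, not a method of the BetaDistribution class; it simply adds the a and b parameters as described above.

```php
<?php
// Hypothetical helper illustrating the article's updating rule:
// posterior parameters are the sums of the test and live parameters.
function combineBeta(array $prior, array $data): array
{
    return [
        'a' => $prior['a'] + $data['a'],
        'b' => $prior['b'] + $data['b'],
    ];
}

$test = ['a' => 1, 'b' => 4];   // test (pilot) survey: 1 success, 4 failures
$live = ['a' => 4, 'b' => 16];  // live survey: 4 successes, 16 failures

$posterior = combineBeta($test, $live);
printf("a_posterior = %d, b_posterior = %d\n", $posterior['a'], $posterior['b']);
printf("posterior mean = %.2f\n", $posterior['a'] / ($posterior['a'] + $posterior['b']));
```

The resulting a and b values are the ones you would then hand to whatever plotting or distribution code you use.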

Figure 6. The posterior estimate of theta

This probability distribution represents the posterior estimate of θ. In accordance with Bayes' theorem, you computed the posterior distribution for your parameter estimate P(θ | R) by combining parameter estimate information derived from your likelihood term with the parameter estimate information derived from your prior term.

You can summarize the test survey results through the parameters a_test=1 and b_test=4 in a beta prior distribution (P(θ) = Beta[1, 4]). You can summarize the live survey results through the parameters a_live=4 and b_live=16 in a beta likelihood distribution (that is, P(D | θ) = Beta[4, 16]).

Combining these conjugate beta distributions (Beta[1, 4] + Beta[4, 16]) amounts to adding together the a and b parameters from both beta distributions. Similarly simple conjugate prior updating rules are available for Gaussian data (Normal-Wishart family of distributions) and multinomial data (Dirichlet family of distributions).
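To give a feel for the multinomial case, here is a rough sketch of the analogous Dirichlet rule, following the same add-the-parameters convention used for the beta case. The three-option survey, its counts, and the `combineDirichlet` function are all hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: the Dirichlet analogue of adding beta parameters.
// Each category's posterior parameter is its prior parameter plus its count.
function combineDirichlet(array $prior, array $counts): array
{
    $posterior = [];
    foreach ($prior as $category => $alpha) {
        // Categories with no new observations keep their prior parameter.
        $posterior[$category] = $alpha + ($counts[$category] ?? 0);
    }
    return $posterior;
}

// Made-up numbers for a three-option survey question.
$prior  = ['yes' => 1, 'no' => 3,  'unsure' => 1];
$counts = ['yes' => 4, 'no' => 12, 'unsure' => 4];

$posterior = combineDirichlet($prior, $counts);
$total = array_sum($posterior);
foreach ($posterior as $category => $alpha) {
    printf("%s: alpha = %d, mean = %.2f\n", $category, $alpha, $alpha / $total);
}
```

With only two categories this reduces to the beta rule shown earlier.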

The concept of conjugate priors is attractive from the point of view of implementing Bayes networks and imagining how you might propagate information from parent nodes to child nodes. If several parent nodes use the beta a and b parameters to represent information about some aspect of the world, then you may be able to propagate this information to downstream child nodes by simply summing parent node beta weights.

Another attractive feature of the conjugate prior updating rule is that it is recursive and, in the limit, can be used to tell you how to update your posterior probabilities on the basis of a single new observation (another exercise for you to think about).
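One possible way to picture the single-observation case is sketched below. The `updateBeta` function is a hypothetical helper, and the order of the responses is an assumption (the article only gives the totals); because the rule is additive, the order does not affect the final parameters.

```php
<?php
// Hypothetical helper: update beta parameters with one 0/1 coded response.
// A 1-coded answer increments a; a 0-coded answer increments b.
function updateBeta(array $params, int $observation): array
{
    if ($observation === 1) {
        $params['a'] += 1;
    } else {
        $params['b'] += 1;
    }
    return $params;
}

// Start from the test survey parameters.
$params = ['a' => 1, 'b' => 4];

// Live survey responses: 4 successes and 16 failures (ordering assumed).
$responses = array_merge(array_fill(0, 4, 1), array_fill(0, 16, 0));

foreach ($responses as $response) {
    $params = updateBeta($params, $response); // posterior becomes the next prior
}

printf("a = %d, b = %d\n", $params['a'], $params['b']);
```

After each response the current posterior serves as the prior for the next one, which is the recursive behavior described above.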

The use of conjugate priors is not, however, without its critics, who argue that the mindless use of conjugate priors abdicates a Bayesian's responsibility to use all information at his disposal to represent prior knowledge about a parameter. Just because the likelihood distribution can be represented using a beta sampling model does not mean that you also need to represent your prior knowledge with a beta distribution. Personally, I would discount these criticisms in the case of a simple binary survey because the beta sampling model appears to be an appropriate representation to use to depict the prior estimate of what value of p you might observe.

I conclude this section by comparing maximum likelihood estimators of p with Bayesian estimators of p. Both estimation techniques converge on p in the long run (they share similar asymptotic behavior), although a Bayesian estimate is pulled toward the prior when the sample is small. Maximum likelihood estimators are generally simpler to compute and are often preferred by statisticians when doing parameter estimation.

Bayesian estimators and maximum likelihood estimators differ in their small-sample behavior. You should study convergence rates and bias measures to get a practical sense of how they might differ. Bayesian methods allow more flexibility in terms of how you might incorporate external information (through the prior probability distribution) into the parameter estimation process.
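A small sketch can make the small-sample difference concrete. It takes the Bayesian estimator to be the posterior mean under this article's add-the-parameters convention; the function names and the true proportion of 0.5 are assumptions chosen for illustration.

```php
<?php
// Hypothetical helpers comparing the two estimators of p.

// Maximum likelihood estimate: the observed proportion (requires n > 0).
function mleEstimate(int $k, int $n): float
{
    return $k / $n;
}

// Posterior mean after adding k successes and n - k failures
// to prior parameters a and b (the article's updating convention).
function posteriorMean(int $k, int $n, float $a, float $b): float
{
    return ($a + $k) / ($a + $b + $n);
}

// Bias of the posterior mean when the true proportion is theta.
// Since E[k] = n * theta, the expected estimate is (a + n*theta) / (a + b + n).
function posteriorMeanBias(float $theta, int $n, float $a, float $b): float
{
    return ($a + $n * $theta) / ($a + $b + $n) - $theta;
}

// Live survey data with the test survey as prior.
printf("MLE: %.3f, posterior mean: %.3f\n", mleEstimate(4, 20), posteriorMean(4, 20, 1, 4));

// Assumed true proportion of 0.5 with the Beta[1, 4] prior, at several sample sizes.
foreach ([5, 20, 500] as $n) {
    printf("n = %3d, bias of posterior mean = %+.4f\n", $n, posteriorMeanBias(0.5, $n, 1, 4));
}
```

When the prior agrees with the data, as it does for the survey in this article, the two estimates coincide; when it does not, the gap shrinks as n grows.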

First published by IBM developerWorks