Implement Bayesian inference using PHP: Part 2
By Paul Meagher - 2004-05-19

Bayes estimators

A Bayes estimator combines information from a prior parameter estimate P(thetai) and a likelihood parameter estimate P(R | thetai) to arrive at a posterior parameter estimate P(thetai | R). In the Bayes parameter estimation formula below, R stands for "results", theta stands for "parameter", and thetai is the i-th candidate value of that parameter:

P(thetai | R) = P(R | thetai) P(thetai) / P(R)

In the specific case of a simple binary survey, the sample results can be expressed as the number of success events k divided by the total number of events n:

R = k/n

The Bayes parameter estimation formula for poll data looks like this:

P(thetai | k/n) = P(k/n | thetai) * P(thetai) / P(k/n)

Recall that the denominator term P(k/n) does not depend on thetai; it only rescales the result so that the posterior probabilities sum to 1. You can therefore ignore it for the purposes of understanding how to compute the posterior distribution:

P(thetai | k/n) ~ P(k/n | thetai) * P(thetai)
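
To see why dropping P(k/n) is harmless, here is a minimal PHP sketch (not from the original article) that applies the formula over a small grid of candidate thetai values. The function name grid_posterior() and the three likelihood numbers are made up for illustration, and the code assumes PHP 7 or later. On a discrete grid, P(k/n) is simply the sum of the likelihood-times-prior products:

```php
<?php
// Hypothetical helper, for illustration only: Bayes' rule on a discrete grid.
// $likelihood[i] holds P(R | thetai); $prior[i] holds P(thetai).
function grid_posterior(array $likelihood, array $prior): array
{
    if (count($likelihood) !== count($prior)) {
        throw new InvalidArgumentException('likelihood and prior must have the same length');
    }

    // Unnormalized posterior: P(R | thetai) * P(thetai)
    $unnormalized = [];
    foreach ($likelihood as $i => $value) {
        $unnormalized[$i] = $value * $prior[$i];
    }

    // On a grid, the normalizing term P(R) is the sum of those products.
    $evidence = array_sum($unnormalized);
    if ($evidence <= 0) {
        throw new RuntimeException('every candidate value has zero posterior weight');
    }

    // Dividing by P(R) makes the posterior probabilities sum to 1.
    $posterior = [];
    foreach ($unnormalized as $i => $value) {
        $posterior[$i] = $value / $evidence;
    }
    return $posterior;
}

// Illustrative inputs (assumed, not survey data): three candidate values.
$thetas     = [0.25, 0.50, 0.75];
$prior      = [1 / 3, 1 / 3, 1 / 3];  // flat prior over the grid
$likelihood = [0.10, 0.30, 0.20];     // made-up P(R | thetai) values
$posterior  = grid_posterior($likelihood, $prior);
```

Dividing by the sum rescales every entry by the same constant, so the candidate values of thetai keep the same ranking before and after normalization.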

In the last few sections, I have shown you how the likelihood term P(k/n | thetai) in the above formula can be computed using maximum likelihood techniques. In particular, the binomial formula gives the probability of the observed result k/n under each candidate value of thetai (where p is replaced by the generic term denoting a parameter, theta):

P(k/n | thetai) = nCk * thetai^k * (1 - thetai)^(n - k)
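
As a rough illustration of the binomial formula, the following PHP sketch evaluates the likelihood for a range of candidate values. The function names binomial_coefficient() and binomial_likelihood(), and the sample values k = 3 and n = 10, are my own assumptions for this example and are not from the article's code. It assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: nCk computed iteratively to avoid large factorials.
function binomial_coefficient(int $n, int $k): float
{
    if ($k < 0 || $k > $n) {
        return 0.0;
    }
    $k = min($k, $n - $k);  // use the symmetry nCk = nC(n-k)
    $c = 1.0;
    for ($i = 1; $i <= $k; $i++) {
        // Multiply before dividing so each intermediate value is a whole number.
        $c = $c * ($n - $k + $i) / $i;
    }
    return $c;
}

// Hypothetical helper: P(k/n | theta) = nCk * theta^k * (1 - theta)^(n - k)
function binomial_likelihood(int $k, int $n, float $theta): float
{
    if ($theta < 0.0 || $theta > 1.0) {
        throw new InvalidArgumentException('theta must lie between 0 and 1');
    }
    return binomial_coefficient($n, $k) * pow($theta, $k) * pow(1.0 - $theta, $n - $k);
}

// Assumed sample result: k = 3 successes out of n = 10 responses.
$k = 3;
$n = 10;
$best_theta = null;
$best_value = -1.0;
foreach ([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9] as $theta) {
    $value = binomial_likelihood($k, $n, $theta);
    printf("theta = %.1f  likelihood = %.6f\n", $theta, $value);
    if ($value > $best_value) {
        $best_value = $value;
        $best_theta = $theta;
    }
}
```

On this grid, the likelihood is largest at the candidate value equal to k/n, which is the maximum likelihood estimate.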

Now that you know how to compute the likelihood term in Bayes' equation, how can you compute the prior term P(thetai)?

The key to computing P(thetai) is to first recognize that thetai represents the probability of a success event (like a 1-coded response) and, as such, can only take on values in the 0 to 1 range. Each value of thetai in this range has its own probability of occurrence associated with it. Because thetai can assume an infinite number of values between 0 and 1, you need to represent it with a continuous probability distribution (like the normal distribution) as opposed to a discrete probability distribution (like the binomial distribution).

In the case of a simple binary survey, the beta distribution is the appropriate continuous distribution to use to represent P(thetai) because:

  1. The domain of your probability distribution function is between 0 and 1, and
  2. The outcomes of your survey arise from a Bernoulli process.

A Bernoulli process:

consists of a series of independent, dichotomous trials where the possible events occurring on each trial are labeled "success" and "failure", p is the probability of success on a given trial, and p remains unchanged from trial to trial. -- Winkler and Hayes, Statistics: Probability, Inference and Decision, p 204.

The process that generates the observed response distribution for a particular binary question in the survey can be legitimately viewed as a Bernoulli process as Winkler and Hayes define it. A process that can be modeled as a Bernoulli process gives rise to a beta distribution for the parameter p (estimated using k/n). I'm ready now to discuss the beta distribution and the critical role it plays in computing the posterior parameter estimate P(thetai | R).
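
To make the Bernoulli process definition concrete, here is a small PHP simulation sketch. It is an illustration under my own assumptions rather than code from the article: the function name simulate_bernoulli_trials(), the success probability 0.3, the sample size 100, and the seed 42 are all arbitrary choices. It assumes PHP 7 or later:

```php
<?php
// Hypothetical helper: simulate $n independent trials, each a success (1)
// with the same probability $p and a failure (0) otherwise.
function simulate_bernoulli_trials(int $n, float $p, int $seed): array
{
    mt_srand($seed);  // fixed seed so the run is repeatable
    $outcomes = [];
    for ($i = 0; $i < $n; $i++) {
        // Uniform draw in [0, 1); $p is the same on every trial.
        $u = mt_rand() / (mt_getrandmax() + 1);
        $outcomes[] = ($u < $p) ? 1 : 0;
    }
    return $outcomes;
}

// Assumed settings: 100 simulated respondents, true p = 0.3.
$responses = simulate_bernoulli_trials(100, 0.3, 42);
$k = array_sum($responses);  // number of 1-coded responses
$n = count($responses);      // total number of responses
printf("k = %d, n = %d, k/n = %.2f\n", $k, $n, $k / $n);
```

The ratio k/n from a run like this is the sample estimate of p; it will generally differ from the true p by sampling error.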

First published by IBM developerWorks

Copyright 2004-2017 All rights reserved.
Article copyright and all rights retained by the author.