Implement Bayesian Inference Using PHP, Part 1 Joint probability

Implement Bayesian inference using PHP, Part 1

By Paul Meagher - 2004-04-21 Page: 1 2 3 4 5 6 7 8 9 10 11

Joint probability

The most basic method for computing a conditional probability using a probability format is:

P(A | B) = P(A & B) / P(B)

This probability format is identical to the frequency format, except for the probability operator P( ) surrounding the numerator and denominator terms. The P(A & B) term denotes the joint probability of A and B occurring together. To understand how the joint probability P(A & B) can be computed from cross-tabulated data, consider the following hypothetical data (taken from pp. 147-48 of Grimstead and Snell's online texbook):

	-Smokes	+Smokes	Totals
-Cancer	40	10	50
+Cancer	7	3	10
Totals	47	13	60

To convert this table of frequencies to a table of probabilities, you divide each cell frequency by the total frequency (60). Note that dividing by the total frequency also ensures that Cancer x Smokes cell probabilies sum to 1 and permits you to refer to the silver area of the table below as the joint probability distribution of Cancer and Smoking.

	-Smokes	+Smokes	Totals
-Cancer	40 / 60	10 / 60	50 / 60
+Cancer	7 / 60	3 / 60	10 / 60
Totals	47 / 60	13 / 60	60 / 60

To compute the probability of cancer given that a person smokes P(+Cancer | +Smokes), you can simply substitute the values from this table into the above formula as follows:

P(+Cancer | +Smokes) = ( 3 / 60 ) / ( 13 / 60) = 0.05 / .217 = 0.23

Note that you could have derived this value from the table of frequencies as well:

P(+Cancer | +Smokes) = 3 / 13 = 0.23

How do you interpret this result? Using the recommended approach of communicating risk in terms of frequencies, you might say that of the next 100 smokers you enounter, you can expect 23 of them to experience cancer in their lifetime. What is the probability of getting cancer if you do not smoke?

P(+Cancer | -Smokes) = ( 7 / 60 ) / ( 47 / 60) = 0.117 / .783 = 0.15

So it appears that you are more likely to get cancer if you smoke than if you do not smoke, even though the tallies appearing in the table might not have initially given you that impression. It is interesting to speculate on what the true conditional probabilities might be for various types of cancer given various criteria for defining someone as a smoker.

A "cohort" research methodology would also require you to equate smokers and non-smokers on other variables like age, gender, and weight so that smoking, and not these other co-variates, can be isolated as the root cause of the different cancer rates.

To summarize, you can compute a conditional probability (+Cancer | +Smokes) from joint distribution data by dividing the relevant joint probability P(+Cancer & +Smokes) by the relevant marginal probability P(+Smokes). As you might imagine, it is often easier and more feasible to derive estimates of a conditional probability from summary tables like this, rather than expecting to apply more data-intensive enumeration methods.

View Implement Bayesian inference using PHP, Part 1 Discussion

Page: 1 2 3 4 5 6 7 8 9 10 11 Next Page: Deriving Bayes Theorem

First published by IBM developerWorks