Use PHP and probability distributions to build effective probability models
To help developers bring the benefits of probability modeling to Web application development, Paul Meagher introduces the basic concepts, techniques, and PHP-based tools that define the area of probability modeling and probability distributions. He demonstrates how to develop univariate probability models in PHP, discusses how to fit empirical data distributions to a theoretical probability distribution, and showcases an important tool for all of this: the Probability Distributions Library (PDL).
One open source project that has received considerable attention in the last year is SpamBayes, which remains one of the best examples of how probability theory can inform the design of applications that solve practical problems. The SpamBayes filtering engine uses machine learning and Bayesian inference techniques to compute the probability that a given piece of e-mail is spam.
This project is also interesting because most of us encounter software applications of probability theory only through math-oriented programs such as statistics packages. SpamBayes shows that fruitful hybrid technologies can result from cross-fertilizing traditional application domains with ideas and techniques from probability theory. To take advantage of such cross-fertilization, you do not need to learn the advanced aspects of probability theory; some of its most elementary aspects could be used today to inform the design of your next application.
In this article, I introduce you to some of the most basic concepts, techniques, and tools that define the area of probability modeling, focusing in particular on the role that probability distributions play in constructing univariate probability models. So that you can use these concepts in practice, I show you how to develop univariate probability models implemented entirely in PHP, a popular and easy-to-use scripting language. The concepts are universal enough that those who prefer other scripting languages will be able to understand and learn from the implementations as well.
In the first part of the article, I discuss three related concepts needed to construct univariate probability models (probability models based on a single random variable):
- What is a random variable?
- What is a frequency distribution?
- What is a probability distribution?
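As a rough preview of how these three concepts relate, the following minimal sketch tallies a frequency distribution from a handful of observations of a random variable and then converts the counts into relative frequencies, which serve as an empirical estimate of a probability distribution. The sample data and the function names `frequencyDistribution()` and `relativeFrequencies()` are invented for illustration; they are not part of the PDL.

```php
<?php
// Hypothetical sample: observed values of a discrete random variable X,
// for example the number of goals scored in ten matches (made-up data).
$observations = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0];

// Frequency distribution: how many times each value of X was observed.
function frequencyDistribution(array $values): array
{
    $counts = array_count_values($values);
    ksort($counts); // order the outcomes from smallest to largest
    return $counts;
}

// Relative frequencies: each count divided by the number of observations.
// These sum to 1 and estimate P(X = x) from the data.
function relativeFrequencies(array $counts): array
{
    $total = array_sum($counts);
    $probs = [];
    foreach ($counts as $value => $count) {
        $probs[$value] = $count / $total;
    }
    return $probs;
}

$freq  = frequencyDistribution($observations);
$probs = relativeFrequencies($freq);

foreach ($freq as $value => $count) {
    printf("X = %d  count = %d  relative frequency = %.2f\n",
        $value, $count, $probs[$value]);
}
```

A theoretical probability distribution plays the same role as `$probs`, except that its probabilities come from a formula rather than from counting observations.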
I then discuss the critical issue of how to fit an empirical data distribution to a theoretical probability distribution, demonstrating how you can use the ChiSquare goodness-of-fit test for this purpose.
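To give a flavor of the goodness-of-fit idea, here is a minimal sketch that computes the ChiSquare statistic, the sum of (observed - expected)^2 / expected over all categories, for made-up die-roll counts tested against a uniform distribution. The data and the function name `chiSquareStatistic()` are assumptions for illustration only. The critical value 11.070 is the standard table value for 5 degrees of freedom at the 0.05 significance level; a complete test would compute a p-value from the ChiSquare distribution instead of consulting a table.

```php
<?php
// ChiSquare statistic: sum of (O - E)^2 / E over matching categories.
// Assumes both arrays share the same keys and every expected count is > 0.
function chiSquareStatistic(array $observed, array $expected): float
{
    $chiSquare = 0.0;
    foreach ($observed as $i => $o) {
        $e = $expected[$i];
        $chiSquare += (($o - $e) ** 2) / $e;
    }
    return $chiSquare;
}

// Made-up data: counts of each face in 60 rolls of a six-sided die.
$observed = [8, 12, 9, 11, 10, 10];

// Theoretical model: a fair die, so each face is expected 60 / 6 = 10 times.
$expected = array_fill(0, 6, array_sum($observed) / 6);

$stat = chiSquareStatistic($observed, $expected);

// Degrees of freedom = categories - 1 = 5; table value at alpha = 0.05.
$criticalValue = 11.070;

printf("ChiSquare statistic: %.3f\n", $stat);
echo $stat > $criticalValue
    ? "Reject the hypothesis that the data fit the uniform model.\n"
    : "No evidence against the uniform model at the 0.05 level.\n";
```

A small statistic means the observed counts sit close to what the theoretical distribution predicts; a statistic above the critical value suggests that the chosen distribution is a poor fit for the data.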
After discussing concepts and techniques for constructing univariate probability models, I talk about an important software tool you will need to construct your probability models, a Probability Distributions Library (PDL). I demonstrate how to build a PDL in PHP and show how it can be used to model goal scoring in World Cup Soccer.
Finally, I discuss theory and future directions, and I flag some random variables that Web developers should consider adopting.