Graphing the likelihood distribution
You've computed the MLE of p by selecting the largest value in the likelihood array. When more precision is required, a numerical method such as Newton-Raphson, used to find the root of the derivative of the likelihood, is a better choice. Selecting the maximum of the likelihood array gives you only a point estimate: the single value of p under which the observed results are most likely. Other values of p are also possible, but they are less likely to produce the observed results.
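As a rough sketch of the Newton-Raphson alternative, the PHP below works on the log-likelihood of k successes in n trials, ln L(p) = k ln p + (n - k) ln(1 - p) + constant. Its derivative, k/p - (n - k)/(1 - p), is zero at the MLE, so Newton-Raphson is applied to that derivative. The function names are hypothetical, and the sketch assumes 0 < k < n so that the maximum lies strictly inside (0, 1).

```php
<?php
// Newton-Raphson sketch for the MLE of a binomial proportion p.
// Assumes 0 < k < n so that the maximum lies strictly inside (0, 1).

// d/dp of ln L(p) = k ln p + (n - k) ln(1 - p) + constant
function logLikelihoodGradient(float $p, int $k, int $n): float
{
    return $k / $p - ($n - $k) / (1 - $p);
}

// Second derivative of ln L(p); negative on (0, 1), so the stationary point is a maximum
function logLikelihoodCurvature(float $p, int $k, int $n): float
{
    return -$k / ($p * $p) - ($n - $k) / ((1 - $p) * (1 - $p));
}

// Iterate p <- p - g(p) / g'(p), where g is the gradient, to find the root of g
function newtonRaphsonMle(int $k, int $n, float $p0 = 0.5, float $tol = 1e-12, int $maxIter = 50): float
{
    $p = $p0;
    for ($i = 0; $i < $maxIter; $i++) {
        $step = logLikelihoodGradient($p, $k, $n) / logLikelihoodCurvature($p, $k, $n);
        // Keep the iterate inside (0, 1), where the log-likelihood is defined
        $p = min(max($p - $step, 1e-9), 1 - 1e-9);
        if (abs($step) < $tol) {
            break;
        }
    }
    return $p;
}

printf("%.6f\n", newtonRaphsonMle(1, 5)); // 0.200000
printf("%.6f\n", newtonRaphsonMle(1, 3)); // 0.333333
```

The second call shows where the extra precision matters. For 1 success in 3 trials, a likelihood array sampled every 0.01 can at best return 0.33, while the iteration converges to 1/3 within floating-point precision.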
To get a sense of how likely various values of p are, we should examine the relationship between different p values and their corresponding likelihood values (denoted as l(p) in the following graph).
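To see this relationship numerically before graphing it, the sketch below tabulates l(p) = C(n, k) p^k (1 - p)^(n - k) on a coarse grid of p values. The data (k = 1 success in n = 5 trials) are an assumption. They were chosen because they reproduce the maximum of .4096 at p = 0.20 quoted for the graph (5 × 0.2 × 0.8^4 = 0.4096). The helper names are hypothetical.

```php
<?php
// Tabulate the binomial likelihood l(p) on a coarse grid of p values.
// Hypothetical data: k = 1 success in n = 5 trials.

// C(n, k) via the multiplicative formula
function binomialCoefficient(int $n, int $k): float
{
    $c = 1.0;
    for ($i = 1; $i <= $k; $i++) {
        $c *= ($n - $k + $i) / $i;
    }
    return $c;
}

// l(p) = C(n, k) * p^k * (1 - p)^(n - k)
function binomialLikelihood(float $p, int $k, int $n): float
{
    return binomialCoefficient($n, $k) * $p ** $k * (1 - $p) ** ($n - $k);
}

$k = 1;
$n = 5;
$pValues = [];
$likelihoods = [];
for ($i = 0; $i <= 10; $i++) {
    $p = $i / 10;  // 0.0, 0.1, ..., 1.0
    $pValues[] = $p;
    $likelihoods[] = binomialLikelihood($p, $k, $n);
}

foreach ($pValues as $idx => $p) {
    printf("p = %.1f   l(p) = %.4f\n", $p, $likelihoods[$idx]);
}
```

In the table, the row for p = 0.2 holds the largest value, 0.4096. Neighboring values such as p = 0.1 and p = 0.3 are plausible but less likely.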
JPGraph is a popular PHP-based package for creating professional graphs that can be displayed on the Web or in other media. The following code creates a graph of the likelihood distribution for p. One interesting feature of this code is that it demonstrates how to use a cubic spline function to interpolate values and draw smooth curves.
Listing 3. Creating the likelihood distribution graph
This code produces the following graph:
The y-axis represents the likelihood of p (denoted l(p)), computed using the binomial formula. The subtitle of the graph tells you that the likelihood reaches its maximum of .4096 (technically, the point where the derivative is 0) when p = 0.20, which equals the observed proportion of sample successes.
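The smooth curve in the graph comes from cubic spline interpolation through a coarse set of (p, l(p)) points. Listing 3 relies on JPGraph for this step. As a library-free illustration of the same idea, the sketch below implements a natural cubic spline. This is a swapped-in textbook algorithm, not JPGraph's implementation. The natural boundary condition (zero second derivative at both end knots), the function names, and the data (the binomial likelihood for 1 success in 5 trials) are all assumptions.

```php
<?php
// Natural cubic spline sketch (not JPGraph's implementation).
// Natural boundary conditions: second derivative is 0 at both end knots.

// Returns per-interval coefficients so that, on [x[j], x[j+1]],
// S_j(t) = a[j] + b[j]*dx + c[j]*dx^2 + d[j]*dx^3 with dx = t - x[j].
function naturalCubicSpline(array $x, array $y): array
{
    $n = count($x) - 1;  // number of intervals (needs at least two knots)
    $h = [];
    for ($i = 0; $i < $n; $i++) {
        $h[$i] = $x[$i + 1] - $x[$i];
    }

    // Forward sweep of the tridiagonal system for c (half of S'' at each knot)
    $l = [1.0];
    $mu = [0.0];
    $z = [0.0];
    for ($i = 1; $i < $n; $i++) {
        $alpha = 3 / $h[$i] * ($y[$i + 1] - $y[$i]) - 3 / $h[$i - 1] * ($y[$i] - $y[$i - 1]);
        $l[$i] = 2 * ($x[$i + 1] - $x[$i - 1]) - $h[$i - 1] * $mu[$i - 1];
        $mu[$i] = $h[$i] / $l[$i];
        $z[$i] = ($alpha - $h[$i - 1] * $z[$i - 1]) / $l[$i];
    }

    // Back substitution; c[n] = 0 enforces the natural condition at the right end
    $b = array_fill(0, $n, 0.0);
    $c = array_fill(0, $n + 1, 0.0);
    $d = array_fill(0, $n, 0.0);
    for ($j = $n - 1; $j >= 0; $j--) {
        $c[$j] = $z[$j] - $mu[$j] * $c[$j + 1];
        $b[$j] = ($y[$j + 1] - $y[$j]) / $h[$j] - $h[$j] * ($c[$j + 1] + 2 * $c[$j]) / 3;
        $d[$j] = ($c[$j + 1] - $c[$j]) / (3 * $h[$j]);
    }
    return ['x' => $x, 'a' => $y, 'b' => $b, 'c' => $c, 'd' => $d];
}

// Evaluate the spline at t (values outside the knots use the end pieces)
function evaluateSpline(array $spline, float $t): float
{
    $x = $spline['x'];
    $n = count($x) - 1;
    $j = 0;
    while ($j < $n - 1 && $t > $x[$j + 1]) {
        $j++;
    }
    $dx = $t - $x[$j];
    return $spline['a'][$j] + $spline['b'][$j] * $dx + $spline['c'][$j] * $dx ** 2 + $spline['d'][$j] * $dx ** 3;
}

// Knots: l(p) = 5 p (1 - p)^4, the binomial likelihood for the assumed 1 success in 5 trials
$knotsP = [];
$knotsL = [];
for ($i = 0; $i <= 10; $i++) {
    $p = $i / 10;
    $knotsP[] = $p;
    $knotsL[] = 5 * $p * (1 - $p) ** 4;
}
$spline = naturalCubicSpline($knotsP, $knotsL);

// 101 interpolated points that a plotting library could draw as a smooth curve
$smoothP = [];
$smoothL = [];
for ($i = 0; $i <= 100; $i++) {
    $smoothP[] = $i / 100;
    $smoothL[] = evaluateSpline($spline, $i / 100);
}
```

The spline passes exactly through every knot and bends smoothly between them, which is why 11 computed likelihood values are enough to draw a curve that looks continuous.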
Why do all this work to estimate p when common sense gives the same result? Because the MLE procedure agrees with common sense here, you can be more confident using it to estimate parameters that are not so easy to determine through common sense. In those cases you can proceed by:
- Finding a way to express the likelihood of the results as a function of the parameters.
- Computing the likelihood of the results for each candidate value of the parameters.
- Selecting the parameter value that yields the maximum likelihood value.
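These three steps can be sketched generically in PHP. The likelihood is passed in as a closure (step 1), evaluated at each point of a grid (step 2), and the best grid point is kept (step 3). The function name `gridSearchMle`, the 1,001-point grid, and the data (1 success in 5 trials) are illustrative assumptions. The sketch maximizes the log-likelihood rather than the likelihood itself. Because the logarithm is increasing, both have the same maximizer, and the log scale avoids numeric underflow when n is large.

```php
<?php
// Generic grid-search sketch of the three-step MLE recipe.

// Steps 2 and 3: evaluate the (log-)likelihood at each grid point and keep the best one
function gridSearchMle(callable $logLikelihood, float $lo, float $hi, int $steps): float
{
    $best = $lo;
    $bestValue = -INF;
    for ($i = 0; $i <= $steps; $i++) {
        $theta = $lo + ($hi - $lo) * $i / $steps;
        $value = $logLikelihood($theta);
        if ($value > $bestValue) {
            $bestValue = $value;
            $best = $theta;
        }
    }
    return $best;
}

// Step 1: express the log-likelihood of the observed results as a function of p.
// Hypothetical data: k = 1 success in n = 5 trials (the constant ln C(n, k) is dropped).
$k = 1;
$n = 5;
$logLikelihood = function (float $p) use ($k, $n): float {
    if ($p <= 0.0 || $p >= 1.0) {
        return -INF;  // for 0 < k < n the likelihood is 0 at the endpoints
    }
    return $k * log($p) + ($n - $k) * log(1 - $p);
};

$pHat = gridSearchMle($logLikelihood, 0.0, 1.0, 1000);
printf("MLE of p: %.3f\n", $pHat); // MLE of p: 0.200
```

Applying the same search to another model only requires a different closure. With several parameters, however, the grid becomes multidimensional and its cost grows quickly, which is where derivative-based methods such as Newton-Raphson earn their keep.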
The maximum likelihood principle is pervasive in statistical reasoning, alongside other principles such as the Bayesian principle of maximizing the posterior probability. Other notable principles include the least-squares criterion, maximum entropy, minimum description length, variational inference, and various energy-minimization principles.
High-level statistical reasoning is more about wielding these principles effectively to estimate parameters, test hypotheses, and so on, than it is about algebraic cleverness in deriving new formulas. A bit of algebraic cleverness, however, can come in handy.