What have you learned?
In this article, I demonstrated how the SimpleLinearRegression class could be used to develop a data-exploration tool for small- to medium-sized datasets. Along the way, I also developed native probability functions for the SimpleLinearRegression class to use and extended the class with HTML output methods and graph-generating code based upon the JpGraph library.
From a learning point of view, simple linear regression modeling is worth further study because it is arguably the gateway to understanding more advanced forms of statistical modeling. Before you plunge into more advanced techniques, like multiple regression or MANOVA, you will benefit from a solid understanding of simple linear regression.
Even though simple linear regression uses only one variable to account for, or predict, the variance in another variable, looking for simple linear relations between all your study variables is often the first step in exploratory data analysis. Just because your data is multivariate does not mean you have to examine it only with multivariate tools. Indeed, starting with basic tools like simple linear regression is a good way to begin probing data for patterns.
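One way to screen every pair of study variables is to compute the Pearson correlation coefficient for each pair and then look more closely at the pairs with the largest absolute values. The following is only a minimal sketch of that idea: the functions pearsonR() and pairwiseCorrelations() are hypothetical helpers, not part of the SimpleLinearRegression class, and the typed signatures assume PHP 7 or later.

```php
<?php
// Hypothetical helpers for illustration; not part of SimpleLinearRegression.

/** Pearson correlation coefficient r for two equal-length numeric arrays. */
function pearsonR(array $x, array $y): float
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = $sxx = $syy = 0.0;
    foreach ($x as $i => $xi) {
        $dx = $xi - $meanX;
        $dy = $y[$i] - $meanY;
        $sxy += $dx * $dy;   // sum of cross-products of deviations
        $sxx += $dx * $dx;   // sum of squared deviations in x
        $syy += $dy * $dy;   // sum of squared deviations in y
    }
    return $sxy / sqrt($sxx * $syy);
}

/**
 * Correlate every pair of named variables.
 * $data maps a variable name to its array of observations.
 * Returns a map of the form ["a vs b" => r, ...].
 */
function pairwiseCorrelations(array $data): array
{
    $names = array_keys($data);
    $result = [];
    for ($i = 0; $i < count($names); $i++) {
        for ($j = $i + 1; $j < count($names); $j++) {
            $key = $names[$i] . ' vs ' . $names[$j];
            $result[$key] = pearsonR($data[$names[$i]], $data[$names[$j]]);
        }
    }
    return $result;
}
```

Calling pairwiseCorrelations() with an array such as ['distance' => [...], 'damage' => [...]] returns one coefficient per pair. The sketch does no input validation, so a variable with zero variance would cause a division by zero.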
This series has studied two applications of simple linear regression analysis. In this article, I looked at an example of a strong linear relationship between "Distance from a Fire Station" and "Fire Damage". In the first article, I looked at a weaker, but nevertheless significant, linear relationship between a measure called "Social Concentration" and a measure called the "Exhaustion Index". (As an exercise, it might be interesting to re-examine the messier data from the first study with the data-exploration tool discussed in this article. One thing you will note is that the y-intercept is a negative number, meaning that when "Social Concentration" is 0, the predicted "Exhaustion Index" is -29.50. Does this make sense? When modeling a phenomenon, you should ask whether your equation should include the optional y-intercept and, if so, what role the y-intercept would play in your linear equation.)
Further studies into simple linear regression might include research into such topics as:
- When to omit the intercept term from your equation and alternative computational formulas you can use if you decide to do so
- When and how to use power, logarithmic, and other transformations to linearize the data so that simple linear regression can be used to model the data
- Other visualizations that can be used to assess the adequacy of your modeling assumptions and to gain deeper insight into the patterns in your data
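As a starting point for the first two topics, here is a minimal sketch under stated assumptions. When the intercept is omitted, the least-squares slope reduces to sum(x*y) / sum(x*x). A power relationship y = c * x^p becomes linear after taking logarithms, because ln(y) = ln(c) + p * ln(x). The functions slopeThroughOrigin(), fitLine(), and fitPowerModel() are hypothetical helpers, not methods of the SimpleLinearRegression class, and the syntax assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helpers for illustration; not part of SimpleLinearRegression.

/**
 * Slope of the least-squares line forced through the origin (y = b * x).
 * With no intercept term, b = sum(x*y) / sum(x*x).
 */
function slopeThroughOrigin(array $x, array $y): float
{
    $sumXY = 0.0;
    $sumXX = 0.0;
    foreach ($x as $i => $xi) {
        $sumXY += $xi * $y[$i];
        $sumXX += $xi * $xi;
    }
    return $sumXY / $sumXX;
}

/**
 * Ordinary least-squares fit of y = a + b * x.
 * Returns [intercept, slope].
 */
function fitLine(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;
    $sxy = 0.0;
    $sxx = 0.0;
    foreach ($x as $i => $xi) {
        $sxy += ($xi - $meanX) * ($y[$i] - $meanY);
        $sxx += ($xi - $meanX) ** 2;
    }
    $slope = $sxy / $sxx;
    return [$meanY - $slope * $meanX, $slope];
}

/**
 * Fit the power model y = c * x^p by regressing ln(y) on ln(x).
 * Requires strictly positive x and y values. Returns [c, p].
 */
function fitPowerModel(array $x, array $y): array
{
    [$lnC, $p] = fitLine(array_map('log', $x), array_map('log', $y));
    return [exp($lnC), $p];
}
```

For example, slopeThroughOrigin([1, 2, 3, 4], [2, 4, 6, 8]) gives a slope of 2, and fitPowerModel([1, 2, 3, 4], [3, 12, 27, 48]) recovers c close to 3 and p close to 2. Note that fitting on the log scale minimizes squared error in ln(y), not in y, so the result can differ from a direct nonlinear fit when the data are noisy.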
These are some of the more advanced topics awaiting the student of simple linear regression. The Resources section contains pointers to several advanced texts that you can consult for more information on regression analysis.
Standard PHP installations provide many of the resources necessary to develop non-trivial mathematics-based applications. I hope that this series of articles inspires other developers to implement math routines in PHP, whether for pleasure or for the technical and learning challenges involved.