P Value (Significance) Multiple Linear Regression

Jun 6, 2013 at 12:30 PM
Hi Y'all!

Thanks heaps for this awesome library. Everything is working, so this isn't a complaint; it's just me not knowing how to find something. Actually, two things now that I mention it.

I'm trying to implement a stepwise function on top of it to find the best possible fit, and I want to rely on statistical significance to be sure I have found what I am looking for and then generate the best R-squared value (preferably adjusted R^2, but we can't all win!).

Where do I find it in the fit result / params for a multivariate data set?

To give you an idea, I have attached a photo of the bits I am trying to get from an SPSS output... Yes, this is just random data too!
Jun 17, 2013 at 7:45 PM
Let me make sure I understand what you are doing: (a) you have a multivariate sample (columns of continuous or indicator variables); (b) you want to predict one of the columns via linear regression on some subset of the other variables; (c) you want to determine which subset gives the best fit on a per-parameter basis. Is that all correct? Assuming it is, here is how to proceed.

(a) Put the data into an instance of the MultivariateSample class.

(b) Create a given subset using the Columns method and do the regression using the LinearRegression method. For example, if you want to predict column 0 using columns 1, 3, 4, and 5, do: reducedSample = sample.Columns(0, 1, 3, 4, 5); result = reducedSample.LinearRegression(0). Then result.Parameter(0) will give you the intercept, and result.Parameter(1) through result.Parameter(4) will give you the slopes of the four fit variables.
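To illustrate what that subset regression computes (this is a Python/NumPy sketch with made-up random data, not the Meta.Numerics API itself): select the predictor columns, prepend an intercept column, and solve the least-squares problem. The intercept and slopes correspond to result.Parameter(0) and result.Parameter(1) through result.Parameter(4).

```python
import numpy as np

# Hypothetical data: 50 rows, 6 columns; predict column 0 from
# columns 1, 3, 4, and 5 (analogous to sample.Columns(0, 1, 3, 4, 5)
# followed by reducedSample.LinearRegression(0)).
rng = np.random.default_rng(1)
data = rng.normal(size=(50, 6))

y = data[:, 0]                             # dependent variable (column 0)
X = data[:, [1, 3, 4, 5]]                  # chosen predictor columns
A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column

# Ordinary least squares: beta[0] is the intercept, beta[1:] the slopes,
# mirroring result.Parameter(0) and result.Parameter(1)..Parameter(4).
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta.shape)  # one intercept plus four slopes
```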

(c) This is the hard part. result.GoodnessOfFit will contain the result of an F-test for fit significance. The easiest answer is to take the fit with the largest F value (largest value of result.GoodnessOfFit.Statistic), since the F statistic already adjusts for degrees of freedom. It sounds, however, like you prefer to use R^2 and adjusted R^2 rather than F. By looking up the definitions of both R^2 and F in terms of explained and unexplained sums of squares, and doing some algebra, you can get formulas for R^2 and adjusted R^2 from F. For example, to compute R^2, first compute f = (p-1)/(n-p) F, where n is the sample size (sample.Count) and p is the number of fit parameters including the intercept (result.Dimension); then R^2 = f / (1 + f). To compute adjusted R^2, use the formula 1 - (1-R^2)(n-1)/(n-p), with p again counting all fit parameters including the intercept. I typically just use F, and I just quickly ran through this algebra, so I don't guarantee these results, but the approach is right.
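The algebra above can be checked numerically. The sketch below (Python/NumPy, with made-up data; variable names are my own, not the library's) computes the F statistic from the explained and residual sums of squares, recovers R^2 via f = (p-1)/(n-p) F and R^2 = f/(1+f), and confirms it matches R^2 computed directly.

```python
import numpy as np

# Simulated sample: y depends linearly on two predictors plus noise.
rng = np.random.default_rng(2)
n = 40
x = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 1] + rng.normal(size=n)

A = np.column_stack([np.ones(n), x])
p = A.shape[1]                      # number of parameters, incl. intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta

sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
sse = np.sum((y - fitted) ** 2)     # residual (unexplained) sum of squares
ssr = sst - sse                     # explained sum of squares

r2_direct = ssr / sst
F = (ssr / (p - 1)) / (sse / (n - p))   # F statistic for fit significance

# Recover R^2 from F: f = (p-1)/(n-p) F equals SSR/SSE,
# so f/(1+f) = SSR/SST = R^2.
f = (p - 1) / (n - p) * F
r2_from_f = f / (1 + f)

# Adjusted R^2, with p counting all fit parameters including the intercept.
adj_r2 = 1 - (1 - r2_direct) * (n - 1) / (n - p)
print(np.isclose(r2_direct, r2_from_f), r2_direct > adj_r2)
```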