
Hi Y'all!
Thanks heaps for this awesome library. Everything is working, so this isn't a complaint, just me not knowing how to find something. Actually, two things, now that I mention it.
I'm trying to implement a stepwise function on top of the library to find the best possible fit. I want to rely on the statistical significance to be sure I have found what I am looking for, and then generate the best R squared value (preferably adjusted R^2, but we can't all win!).
Where do I find the significance and the R^2 value in the fit result / parameters for a multivariate data set?
To give you an idea, I have attached a photo of the bits I am trying to get from an SPSS output... Yes, this is just random data too!


Coordinator
Jun 17, 2013 at 6:45 PM

Let me make sure I understand what you are doing: (a) you have a multivariate sample (columns of continuous or indicator variables); (b) you want to predict one of the columns via linear regression on some subset of the other variables; (c) you want to determine which subset gives the best fit on a per-parameter basis. Is that all correct? Assuming it is, here is how to proceed.
(a) Put the data into an instance of the MultivariateSample class.
(b) Select the subset you want using the Columns method and do the regression using the LinearRegression method. For example, if you want to predict column 0 using columns 1, 3, 4, and 5, do: reducedSample = sample.Columns(0, 1, 3, 4, 5); result = reducedSample.LinearRegression(0).
result.Parameter(0) will then give you the intercept and result.Parameter(1...4) will give you the slopes of the four fit variables.
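To make steps (b) and (c) concrete, here is a library-independent sketch in plain Python. It is not the library's API: `fit_subset` and `best_subset_by_f` are hypothetical helper names. It fits an ordinary least-squares regression of one column on a chosen subset of the other columns (intercept first, then slopes, in the same order as result.Parameter), computes the overall regression F statistic, and tries every subset to keep the one with the largest F. It assumes more rows than parameters (n > p) and no perfectly collinear predictors; exhaustive search is only practical for a handful of candidate columns.

```python
from itertools import combinations


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_subset(rows, target, predictors):
    """Hypothetical helper: least-squares fit of column `target` on `predictors`.

    Returns (params, F, n, p): params[0] is the intercept, params[1:] are
    the slopes, and p = len(params) counts the intercept."""
    y = [row[target] for row in rows]
    X = [[1.0] + [row[j] for j in predictors] for row in rows]
    n, p = len(rows), len(predictors) + 1
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    params = solve(xtx, xty)
    y_mean = sum(y) / n
    fitted = [sum(b * x for b, x in zip(params, X[k])) for k in range(n)]
    sse = sum((y[k] - fitted[k]) ** 2 for k in range(n))  # unexplained sum of squares
    ssr = sum((f - y_mean) ** 2 for f in fitted)           # explained sum of squares
    F = (ssr / (p - 1)) / (sse / (n - p))                  # (p - 1, n - p) degrees of freedom
    return params, F, n, p


def best_subset_by_f(rows, target, candidates):
    """Hypothetical helper: try every non-empty subset of `candidates`, keep the largest F."""
    best = None
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            params, F, _, _ = fit_subset(rows, target, subset)
            if best is None or F > best[1]:
                best = (subset, F, params)
    return best
```

For example, with made-up rows whose columns are [y, x1, unused, x2], `fit_subset(rows, 0, [1, 3])` plays the role of sample.Columns(0, 1, 3) followed by LinearRegression(0), and `best_subset_by_f(rows, 0, [1, 3])` compares the fits using x1 alone, x2 alone, and both.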
(c) This is the hard part. result.GoodnessOfFit will contain the result of an F test for fit significance. The easiest answer is to take the fit with the largest F value (the largest value of result.GoodnessOfFit.Statistic), since the F statistic already adjusts for degrees of freedom. It sounds, however, like you prefer to use R^2 and adjusted R^2 rather than F. By looking up the definitions of both R^2 and F in terms of the explained and unexplained sums of squares, and doing some algebra, you can get formulas for R^2 and adjusted R^2 from F. For example, to compute R^2, first compute f = (p - 1)/(n - p) * F, where n is the sample size (sample.Count) and p is the number of parameters including the intercept (result.Dimension); then R^2 = f / (1 + f). To compute adjusted R^2, you can use the formula 1 - (1 - R^2)(n - 1)/(n - p). I typically just use F, and I only quickly ran through this algebra, so I don't guarantee these results, but the approach is right.
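The conversion from F to R^2 can be checked numerically with a small Python sketch. The function names are hypothetical and the sums of squares are made-up numbers chosen so the arithmetic is easy to follow by hand. Here p counts the intercept, so F has (p - 1, n - p) degrees of freedom and adjusted R^2 uses n - p in the denominator; the commonly quoted form 1 - (1 - R^2)(n - 1)/(n - k - 1) is the same thing written with k = p - 1 slope variables.

```python
def r_squared_from_f(F, n, p):
    """R^2 from the regression F statistic.

    n = sample size, p = number of fitted parameters including the intercept."""
    f = (p - 1) / (n - p) * F  # f = explained SS / unexplained SS
    return f / (1 + f)


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2, with p counting the intercept (k = p - 1 slopes)."""
    return 1 - (1 - r2) * (n - 1) / (n - p)


# Made-up sums of squares: explained 80, unexplained 20, n = 25, p = 3.
ssr, sse, n, p = 80.0, 20.0, 25, 3
F = (ssr / (p - 1)) / (sse / (n - p))  # about 40 / (20 / 22) = 44
r2 = r_squared_from_f(F, n, p)         # about 80 / (80 + 20) = 0.8
adj = adjusted_r_squared(r2, n, p)     # about 1 - 0.2 * 24 / 22
```

Recovering R^2 = 80 / (80 + 20) directly from F is a quick sanity check that the algebra in the reply holds.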

