Each observation in a multivariate sample consists of more than one measured quantity. For example, you might record the income, sex, and educational attainment for a sample of working adults. Data such as these can be analyzed using the MultivariateSample class.

It is easy to add observations to a multivariate sample:

```
using System;
using System.Collections.Generic;
using Meta.Numerics.Statistics;

// There are 3 variables per observation: income, sex, and educational attainment.
MultivariateSample people = new MultivariateSample(3);

// Sex is arbitrarily encoded as male = 0, female = 1;
// educational attainment is encoded as years of schooling.
people.Add(85000.0, 1, 18);

// Any length-3 IList<double> will also do.
people.Add(new List<double>(new double[] { 56000.0, 0, 16 }));
```

### Subsamples

If you want to obtain information about or perform an operation on an individual column, you can use the Column method to obtain a Sample representing that data. The following code gets the median income:

```
double medianIncome = people.Column(0).Median;
```

If you want to obtain information about or perform an operation on a pair of columns, you can use the TwoColumns method to obtain a BivariateSample representing that data. The following code does a linear regression of income vs. years of education:

```
// Column 2 (years of education) is the x variable; column 0 (income) is the y variable.
FitResult result = people.TwoColumns(2, 0).LinearRegression();
// Parameter 1 of the bivariate fit is the slope.
Console.WriteLine("income gain per year of education = {0}", result.Parameter(1).ConfidenceInterval(0.95));
```

If you want to obtain a multivariate subset of more than two columns, use the Columns method.

### Multivariate Regression

If your data set has more than two variables, you might want to know about the relationship between two variables *when holding the others constant*. This is called controlling for the other variables. For example, you might want to know not only the correlation between income and sex, but also the correlation between income and sex when controlling for educational attainment. You can do this by performing a multi-linear regression to predict income from sex and educational attainment. Since income is variable number 0, you write:

```
// Predict variable 0 (income) from all of the other variables.
FitResult regression = people.LinearRegression(0);
// Parameter 1 is the slope for sex (female = 1); parameter 2 is the slope for years of education.
Console.WriteLine("income gain for females = {0}", regression.Parameter(1).ConfidenceInterval(0.95));
Console.WriteLine("income gain per year of education = {0}", regression.Parameter(2).ConfidenceInterval(0.95));
```

Each parameter with the index of an input variable is the slope for that input variable. The remaining parameter, with the same index as the output variable, is the intercept.

Note that the slope associated with a variable in a multivariate linear regression will not, in general, be the same as the slope associated with that same variable in the bivariate linear regression against the same output variable. That is because the multivariate regression attempts to hold the other variables constant, while the bivariate regression simply ignores them.

To do a multivariate regression that includes only a subset of the columns, just use the Columns method to first obtain a MultivariateSample consisting of only the relevant columns. For example:

```
MultivariateSample d = new MultivariateSample(5);
d.Add(1.0, 2.0, 3.0, 4.0, 5.0);
d.Add(2.0, 3.0, 5.0, 4.0, 6.0);
d.Add(4.0, 3.0, 5.0, 6.0, 7.0);
d.Add(4.0, 5.0, 7.0, 6.0, 8.0);
d.Add(5.0, 7.0, 6.0, 8.0, 9.0);

// Column 3 is listed first, so it becomes column 0 of d3 and is the output of the regression.
MultivariateSample d3 = d.Columns(3, 0, 1, 2);
FitResult r3 = d3.LinearRegression(0);
Console.WriteLine($"c3 = ({r3.Parameter(0)}) + ({r3.Parameter(1)}) c0 + ({r3.Parameter(2)}) c1 + ({r3.Parameter(3)}) c2");

// Likewise, column 4 becomes column 0 of d4.
MultivariateSample d4 = d.Columns(4, 0, 1);
FitResult r4 = d4.LinearRegression(0);
Console.WriteLine($"c4 = ({r4.Parameter(0)}) + ({r4.Parameter(1)}) c0 + ({r4.Parameter(2)}) c1");
```

The first linear regression predicts column 3 based on columns 0, 1, and 2. The second linear regression predicts column 4 based on just columns 0 and 1.
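The parameter layout and the difference between multivariate and bivariate slopes can be checked without the library. The sketch below is a hypothetical, self-contained helper (`LeastSquaresSketch.Fit` is our own name, not part of the library's API, and this is not necessarily how the library computes its fits). It does an ordinary least-squares fit by solving the normal equations with Gaussian elimination and partial pivoting, assuming the design matrix has full column rank. It returns coefficients in the layout described above: entry j is the slope for column j, and the entry at the output index is the intercept. The toy data are generated exactly as y = 1 + 2 x1 + 3 x2 with x1 and x2 correlated, so the multivariate fit should recover slopes 2 and 3, while the bivariate fit of y on x1 alone gives a slope of 3.8, because it absorbs part of the effect of x2.

```csharp
using System;
using System.Linq;

// Toy data generated exactly as y = 1 + 2 x1 + 3 x2; columns are [y, x1, x2].
// x1 and x2 are deliberately correlated.
double[][] data =
{
    new[] { 1.0, 0.0, 0.0 },
    new[] { 6.0, 1.0, 1.0 },
    new[] { 8.0, 2.0, 1.0 },
    new[] { 16.0, 3.0, 3.0 },
    new[] { 15.0, 4.0, 2.0 },
};

// Multivariate fit: predict column 0 from columns 1 and 2.
double[] multi = LeastSquaresSketch.Fit(data, 0);
// Bivariate fit: predict column 0 from column 1 only, ignoring column 2.
double[] bi = LeastSquaresSketch.Fit(data.Select(row => new[] { row[0], row[1] }).ToArray(), 0);

Console.WriteLine($"multivariate: intercept {multi[0]:F3}, slope(x1) {multi[1]:F3}, slope(x2) {multi[2]:F3}");
Console.WriteLine($"bivariate:    intercept {bi[0]:F3}, slope(x1) {bi[1]:F3}");

// Hypothetical helper for illustration; not part of the library described on this page.
static class LeastSquaresSketch
{
    // Ordinary least squares for data[i][output] against all other columns, solved via the
    // normal equations (X^T X) b = X^T y. Result layout: b[j] (j != output) is the slope for
    // column j and b[output] is the intercept. Assumes X has full column rank.
    public static double[] Fit(double[][] data, int output)
    {
        int n = data.Length, m = data[0].Length;

        // Augmented matrix [X^T X | X^T y]; each design row is the observation with the
        // output entry replaced by the constant 1 (the intercept term).
        double[,] a = new double[m, m + 1];
        for (int i = 0; i < n; i++)
        {
            double[] x = data[i].Select((v, j) => j == output ? 1.0 : v).ToArray();
            double y = data[i][output];
            for (int r = 0; r < m; r++)
            {
                for (int c = 0; c < m; c++) a[r, c] += x[r] * x[c];
                a[r, m] += x[r] * y;
            }
        }

        // Forward elimination with partial pivoting.
        for (int p = 0; p < m; p++)
        {
            int best = p;
            for (int r = p + 1; r < m; r++)
                if (Math.Abs(a[r, p]) > Math.Abs(a[best, p])) best = r;
            for (int c = 0; c <= m; c++)
                (a[p, c], a[best, c]) = (a[best, c], a[p, c]);
            for (int r = p + 1; r < m; r++)
            {
                double f = a[r, p] / a[p, p];
                for (int c = p; c <= m; c++) a[r, c] -= f * a[p, c];
            }
        }

        // Back substitution.
        double[] b = new double[m];
        for (int r = m - 1; r >= 0; r--)
        {
            double s = a[r, m];
            for (int c = r + 1; c < m; c++) s -= a[r, c] * b[c];
            b[r] = s / a[r, r];
        }
        return b;
    }
}
```

Solving the normal equations keeps the sketch short, but it is less numerically stable than QR-based least squares, so it is only meant for small, well-conditioned examples like this one.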

Last edited Jul 21 at 12:02 AM by ichbin, version 4

Hello sir,

I am not sure if I can post this here.

About linear regression: if I need to run a multiple linear regression, how can I use this library?

I have one dependent variable, Y, and a few independent variables, Xs.