
Statistical tests are used to determine whether some observed pattern is likely to be "real" or "just chance". For example: given 100 ill patients, you give 50 a new treatment and 50 a placebo. A month later, 40 of the patients who got the treatment have recovered, but only 32 of those who got the placebo are well. Are you convinced the treatment helps?

Well, if only 2 of the untreated group had recovered you would be pretty sure the treatment helped: the difference between a 4% recovery rate and an 80% recovery rate is rather stark. On the other hand, if 38 in the untreated group had recovered you probably wouldn't be convinced: it's pretty easy to imagine a couple of people more or fewer recovering just by chance. What about 8 more people? How unlikely is that?

Statisticians answer questions like these with hypothesis testing. They begin by assuming that there is no pattern -- in this case, that the recovery rate is not affected by the treatment. This is called the "null hypothesis." They then define a metric that measures the strength of the observed pattern. This is called the "test statistic." In the absence of a real pattern, the test statistic will usually be small, but it won't always be exactly zero: random fluctuations mean that patterns sometimes appear by chance even when no real, consistent pattern underlies them. Taking such random fluctuations into account, statisticians then compute the distribution of the test statistic under the null hypothesis. If you know this distribution, you can tell just how unlikely it is that a pattern as strong as the one observed would arise by chance: it's just the area under the PDF to the right of the observed value of the test statistic.
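In symbols, that area can be written as follows. The notation is introduced here only for illustration: $T$ is the test statistic, $t_{\mathrm{obs}}$ its observed value, $H_0$ the null hypothesis, and $f_0$ the PDF of $T$ under $H_0$.

```latex
P(T \ge t_{\mathrm{obs}} \mid H_0) = \int_{t_{\mathrm{obs}}}^{\infty} f_0(t)\, dt
```

The smaller this probability, the harder it is to attribute the observed pattern to chance alone.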

Statisticians have devised many different statistical tests for different kinds of patterns in different kinds of data. As befits an object-oriented numerical library, Meta.Numerics surfaces the method that carries out each particular test on the class that stores the kind of data to which it applies. We hope that this helps you keep track of which tests might be appropriate for your data. Here is a table of all the statistical tests offered, describing each test and specifying the class and method:

| Test Statistic | Tests for... | Class.Method |
| --- | --- | --- |
| Pearson chi squared | Categorical differences | ContingencyTable.PearsonChiSquaredTest |
| Fisher exact test | Binary differences | BinaryContingencyTable.FisherExactTest |
| Student t | Differences in mean | Sample.StudentTTest |
| Mann-Whitney U | Differences in median | Sample.MannWhitneyTest |
| Fisher F | Differences in variance | Sample.FisherFTest |
| Kolmogorov-Smirnov D | Deviation from distribution | Sample.KolmogorovSmirnovTest |
| Kuiper V | Deviation from distribution | Sample.KuiperTest |
| Pearson r | Linear correlation | MultivariateSample.PearsonTest |
| Spearman rho | Correlation | MultivariateSample.SpearmanTest |
| Kendall tau | Correlation | MultivariateSample.KendallTest |
| Chi squared | Goodness of fit | FitResult.GoodnessOfFit (via DataSet fits) |

You can read more about these tests in the API reference documentation and the tutorial documentation on each data container.

All test methods return a TestResult object. The TestResult object contains the test statistic value in the Statistic property, and the distribution of the test statistic under the null hypothesis in the Distribution property. You can obtain the integrated probability of the test statistic falling below or above its observed value using the LeftProbability and RightProbability properties, respectively.

Here is how to use the Meta.Numerics library to solve the example problem we sketched above. This is a binary contingency experiment, so we use the BinaryContingencyTable class to store the data.

```
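// Requires: using Meta.Numerics.Statistics;
// Rows are groups (0 = treatment, 1 = placebo); columns are outcomes (0 = recovered, 1 = not recovered).
BinaryContingencyTable table = new BinaryContingencyTable();
table[0,0] = 40;
table[0,1] = 10;
table[1,0] = 32;
table[1,1] = 18;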
```

We can then use a Pearson chi squared test to determine the significance of the difference between the treatment and control groups. (We could also use a Fisher exact test.)

```
TestResult test = table.PearsonChiSquaredTest();
Console.WriteLine("chi^2 = {0}", test.Statistic);
Console.WriteLine("P(lower) = {0}, P(higher) = {1}", test.LeftProbability, test.RightProbability);
```
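For a 2x2 table the Pearson statistic has a simple closed form, so the library's answer can be cross-checked by hand. What follows is only a minimal sketch in plain C#, independent of Meta.Numerics. The class `ChiSquaredCheck` and its method names are hypothetical, introduced here for illustration. The tail probability uses a standard polynomial approximation to the complementary error function (Abramowitz and Stegun 7.1.26), which is an assumption of this sketch and not necessarily what the library does internally.

```csharp
using System;

// Hypothetical helper, not part of Meta.Numerics: a hand check of the 2x2 Pearson test.
public static class ChiSquaredCheck
{
    // Pearson chi squared statistic for the 2x2 table [[a, b], [c, d]]:
    // n (ad - bc)^2 divided by the product of the two row totals and two column totals.
    public static double Statistic(double a, double b, double c, double d)
    {
        double n = a + b + c + d;
        double cross = a * d - b * c;
        return n * cross * cross / ((a + b) * (c + d) * (a + c) * (b + d));
    }

    // Complementary error function for x >= 0, via the Abramowitz-Stegun
    // approximation 7.1.26 (absolute error roughly 1.5e-7).
    public static double Erfc(double x)
    {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
            + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.Exp(-x * x);
    }

    // Right-tail probability of a chi squared variable with 1 degree of freedom,
    // which is the appropriate null distribution for a 2x2 table.
    public static double RightProbability(double chiSquared)
    {
        return Erfc(Math.Sqrt(chiSquared / 2.0));
    }

    public static void Main()
    {
        // Same counts as the table above: treatment 40/10, placebo 32/18.
        double chi2 = Statistic(40, 10, 32, 18);
        Console.WriteLine("chi^2 = {0:F2}", chi2);
        Console.WriteLine("P(higher) = {0:F3}", RightProbability(chi2));
    }
}
```

Run on the counts above, this sketch gives a chi squared of about 3.17 and a right-tail probability of about 0.075.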

In our case it turns out that, under the null hypothesis, we would get a chi squared this high only 7.5% of the time: colloquially, we say that we are 92.5% confident that the treatment's effect is real. Is that confident enough? Many medical journals set a threshold of 95% confidence before they will publish a result. You might think that threshold is too high -- wouldn't you want access to a treatment that you were more than 90% certain would help? On the other hand, even with that threshold, about 5% of the treatments that really do nothing will still pass the test by chance -- isn't that too high a failure rate for medical science? Statisticians can't tell you what your confidence threshold should be, but with hypothesis testing they can help you quantify your knowledge and your ignorance.
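The Fisher exact test mentioned earlier as an alternative does not rely on the chi squared approximation: it holds the row and column totals fixed and sums exact hypergeometric probabilities. The sketch below illustrates the one-sided version of that idea in plain C#. The class `FisherExactSketch` and its methods are hypothetical names, not part of Meta.Numerics, and the library's own FisherExactTest may use different conventions (for example, a two-sided probability), so treat this as an illustration of the idea and not as a description of the library.

```csharp
using System;

// Hypothetical helper, not part of Meta.Numerics: a one-sided Fisher exact test.
public static class FisherExactSketch
{
    // Natural log of the binomial coefficient C(n, k), accumulated term by term
    // so that tables with a few hundred entries do not overflow.
    static double LogChoose(int n, int k)
    {
        double sum = 0.0;
        for (int i = 1; i <= k; i++)
        {
            sum += Math.Log(n - k + i) - Math.Log(i);
        }
        return sum;
    }

    // One-sided p-value for the table [[a, b], [c, d]]: the probability, with all
    // row and column totals held fixed, of a top-left count of a or more.
    public static double RightTail(int a, int b, int c, int d)
    {
        int row1 = a + b;
        int row2 = c + d;
        int col1 = a + c;
        int n = row1 + row2;
        int max = Math.Min(row1, col1);
        double logDenominator = LogChoose(n, col1);
        double p = 0.0;
        for (int x = a; x <= max; x++)
        {
            // Hypergeometric probability of exactly x in the top-left cell.
            p += Math.Exp(LogChoose(row1, x) + LogChoose(row2, col1 - x) - logDenominator);
        }
        return p;
    }

    public static void Main()
    {
        // Same counts as the example: treatment 40/10, placebo 32/18.
        Console.WriteLine("one-sided P = {0}", RightTail(40, 10, 32, 18));
    }
}
```

Because the sum runs over exact probabilities, this approach remains valid for small counts, where the chi squared approximation becomes unreliable.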

Last edited Apr 29, 2010 at 6:52 AM by ichbin, version 1