Distributions

Aug 27, 2013 at 10:41 PM
Hey,

A simple question: I am trying to do a two-sample KS test to check whether a sample is drawn from a given distribution, but I am not sure how to convert my first sample into a distribution.

E.g.,
public TestResult KolmogorovSmirnovTest(Distribution distribution)

I can .Add numbers to my sample, but I am not sure what to pass as the Distribution.
Coordinator
Aug 28, 2013 at 6:33 AM
To do a two-sample KS test, use the overload that takes a Sample instead of a distribution, e.g. TestResult result = sample1.KolmogorovSmirnovTest(sample2). The overload that takes a distribution is for doing a one-sample KS test of a sample against a known distribution.
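For anyone landing here later, the difference between the two overloads can be sketched roughly as below. Only Sample, its Add method, and the two KolmogorovSmirnovTest overloads come from this thread. The namespaces, the NormalDistribution class with a (mean, standard deviation) constructor, and the Statistic and RightProbability properties on TestResult are assumptions about the library version of the time, so check them against the release you are using. The data values are made up purely for illustration.

```csharp
using System;
using Meta.Numerics.Statistics;                // assumed namespace for Sample and TestResult
using Meta.Numerics.Statistics.Distributions;  // assumed namespace for Distribution types

public static class KsOverloadsSketch
{
    public static void Main()
    {
        // Two illustrative samples (made-up numbers).
        Sample sample1 = new Sample();
        foreach (double x in new double[] { 0.12, -0.85, 1.31, 0.47, -0.26 })
        {
            sample1.Add(x);
        }

        Sample sample2 = new Sample();
        foreach (double x in new double[] { 0.93, 1.74, 0.38, 2.05, 1.16, 0.61 })
        {
            sample2.Add(x);
        }

        // Two-sample test: are the two samples drawn from the same distribution?
        TestResult twoSample = sample1.KolmogorovSmirnovTest(sample2);

        // One-sample test: is sample1 drawn from a fully specified distribution?
        // NormalDistribution(0.0, 1.0) is an assumed class and constructor.
        Distribution reference = new NormalDistribution(0.0, 1.0);
        TestResult oneSample = sample1.KolmogorovSmirnovTest(reference);

        // Statistic and RightProbability are assumed property names on TestResult.
        Console.WriteLine("two-sample: D = {0}, P = {1}", twoSample.Statistic, twoSample.RightProbability);
        Console.WriteLine("one-sample: D = {0}, P = {1}", oneSample.Statistic, oneSample.RightProbability);
    }
}
```

So there is no need to convert a sample into a distribution: the first sample stays a Sample in both cases, and only the argument changes.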
Aug 28, 2013 at 2:36 PM
Excellent!
It works perfectly, with one exception.

It seems to be using the asymptotic distribution instead of an exact p-value.

The code seems to suggest that it is easy to implement the exact p-value, but I am not sure where to start. Any hints?

Thanks
Coordinator
Aug 29, 2013 at 7:09 AM
You are correct. I'm sorry if the release notes misled you here: we have implemented the exact null distributions for the one-sample KS and Kuiper tests, but not (yet) for the two-sample versions of those tests.

While they share the same large-N asymptotic distribution, the finite-N distributions in the one-sample and two-sample cases are actually quite different beasts. In fact, the one-sample distribution is continuous and the two-sample distribution is discrete (because in the two-sample case, with sample sizes n and m, D will always be an integer multiple of 1/(nm)). So the bad news is that you can't simply re-use or modify the existing code for the finite-N one-sample distribution to get a finite-N two-sample distribution. The good news is that the work required to code up the finite-N two-sample distribution appears to be considerably less than what was required for the one-sample distribution, so you can be reasonably confident that the exact two-sample distribution will get into the next release.

Until then, it looks like R is the only common statistics framework with an exact two-sample KS null distribution. (SPSS doesn't appear to have it.) You could also take a look at the original reference and consider coding it up yourself: Kim & Jennrich, "Tables of Exact Sampling Distribution of the Two-Sample Kolmogorov-Smirnov Criterion", in Selected Tables in Mathematical Statistics Vol. I (1973) pp. 80-129.
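If you do want to code it up yourself, here is a minimal sketch of one standard way to get the exact two-sample p-value. It is a lattice-path counting recurrence, not necessarily the method used in the reference above, and it is not part of the library. It assumes there are no ties within or between the samples. The class and method names (ExactTwoSampleKS, ScaledStatistic, RightProbability) are hypothetical. The idea: every ordering of the pooled data is a path from (0,0) to (n,m), all orderings are equally likely under the null hypothesis, and D < k/(nm) exactly when the path keeps |i*m - j*n| < k at every step.

```csharp
using System;

public static class ExactTwoSampleKS
{
    // Returns k = max |i*m - j*n| over the pooled ordering, so that D = k / (n*m).
    // Assumes no tied values within or between the two samples.
    public static long ScaledStatistic(double[] a, double[] b)
    {
        double[] x = (double[])a.Clone();
        double[] y = (double[])b.Clone();
        Array.Sort(x);
        Array.Sort(y);
        int n = x.Length, m = y.Length;
        int i = 0, j = 0;
        long k = 0;
        while (i < n || j < m)
        {
            // Step past the next smallest observation in the pooled data.
            if (j == m || (i < n && x[i] < y[j])) i++; else j++;
            long diff = Math.Abs((long)i * m - (long)j * n);
            if (diff > k) k = diff;
        }
        return k;
    }

    // Exact P(D >= k/(n*m)) under the null hypothesis.
    public static double RightProbability(int n, int m, long k)
    {
        // While processing row i, u[j] is the fraction of all paths to (i, j)
        // that have stayed strictly inside the band |i*m - j*n| < k.
        double[] u = new double[m + 1];
        for (int i = 0; i <= n; i++)
        {
            for (int j = 0; j <= m; j++)
            {
                if (i == 0 && j == 0) { u[0] = 1.0; continue; }
                if (Math.Abs((long)i * m - (long)j * n) >= k) { u[j] = 0.0; continue; }
                // u[j] still holds the value for (i-1, j); u[j-1] already holds (i, j-1).
                double fromPreviousRow = (i > 0) ? u[j] * i / (i + j) : 0.0;
                double fromPreviousColumn = (j > 0) ? u[j - 1] * j / (i + j) : 0.0;
                u[j] = fromPreviousRow + fromPreviousColumn;
            }
        }
        return 1.0 - u[m];
    }

    public static void Main()
    {
        double[] a = { 1.0, 2.0 };
        double[] b = { 3.0, 4.0 };
        long k = ScaledStatistic(a, b);
        double d = (double)k / ((long)a.Length * b.Length);
        double p = RightProbability(a.Length, b.Length, k);
        // Completely separated samples of size 2 and 2: D = 1, and 2 of the
        // 6 possible orderings give D = 1, so p is approximately 1/3.
        Console.WriteLine("D = {0}, exact P(D >= observed) = {1}", d, p);
    }
}
```

Working with path fractions rather than raw path counts keeps the numbers between 0 and 1, so the binomial coefficients never have to be formed explicitly. The cost is O(nm) time and O(m) memory, which is fine for the small samples where the asymptotic distribution is least trustworthy.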