Interpreting Goodness of Fit Results

Mar 25, 2014 at 10:08 PM
Hello,

I'm extremely rusty at statistics and I was hoping someone could help guide me. I want to see if some data of mine follows a normal distribution. I'm using Meta.Numerics in the following way. I create a normal distribution with a mean=0 and variance=1. I build a Sample object that is full of values I created from http://www.random.org/gaussian-distributions/. Then I call FitToSample.
        // Note: this instance is not used by the static FitToSample call below.
        NormalDistribution dist = new NormalDistribution(mean, variance);
        FitResult fr = NormalDistribution.FitToSample(s);
In the FitToSample->GoodnessOfFit TestResult object, I get the following.

LeftProbability = 0.093259929142123124
RightProbability = 0.90674007085787689
Statistic = 0.039259588669177781

I saw in the API documentation, the Kolmogorov-Smirnov test is used.

As to my questions,

1) What is the Statistic? The documentation says it's the value of the test statistic. Can someone please say that in another way?
2) What are Left and Right Probabilities in layman's terms? I know it has something to do with the bell curve of getting values smaller/larger than something. I'm foggy on what though.

Any help is appreciated.

mj
Coordinator
Mar 26, 2014 at 9:01 AM
Edited Mar 26, 2014 at 9:05 AM
The way a statistical test works is that you start from some assumption (which gets the fancy name "null hypothesis"; in this case it's that your data are drawn from a normal distribution). Then you compute some value from your data (in this case it's a Kolmogorov-Smirnov or KS statistic) and you determine just how unlikely you were to get such a value, given your assumption. If your computed value turns out to be extremely unlikely under your assumption, that is evidence that your assumption was wrong. If your computed value turns out to be not too unlikely, your assumption has withstood the statistical test. Okay, that's statistical testing 101.

Question 1: Since TestResult is returned by all statistical test methods, its documentation isn't going to tell you anything specific about the KS test. If you want to read more about the KS test, it has a Wikipedia article (http://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test). Basically, the KS statistic measures how far the distribution of your data is from the assumed population distribution, in this case the fit normal distribution.
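To make "how far the distribution of your data is from the assumed distribution" concrete, here is a minimal sketch in plain C# that computes a KS statistic by hand. It is an illustration only, not Meta.Numerics's implementation: the class and method names (KsSketch, NormalCdf, Statistic) are hypothetical, the data values are made up, and the normal CDF uses a standard polynomial approximation of erf rather than anything from the library.

```csharp
using System;
using System.Linq;

public static class KsSketch
{
    // Normal CDF built on the Abramowitz-Stegun 7.1.26 approximation of erf
    // (absolute error around 1.5e-7, which is fine for an illustration).
    public static double NormalCdf(double x, double mu = 0.0, double sigma = 1.0)
    {
        double z = (x - mu) / (sigma * Math.Sqrt(2.0));
        return 0.5 * (1.0 + Erf(z));
    }

    static double Erf(double z)
    {
        double a = Math.Abs(z);
        double t = 1.0 / (1.0 + 0.3275911 * a);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                    + t * (-1.453152027 + t * 1.061405429))));
        double y = 1.0 - poly * Math.Exp(-a * a);
        return z >= 0 ? y : -y;
    }

    // D = largest vertical gap between the empirical CDF (a staircase that
    // rises by 1/n at each data point) and the hypothesized CDF.
    public static double Statistic(double[] data, Func<double, double> cdf)
    {
        double[] x = data.OrderBy(v => v).ToArray();
        int n = x.Length;
        double d = 0.0;
        for (int i = 0; i < n; i++)
        {
            double f = cdf(x[i]);
            // Check the gap just after the step (i + 1) / n and just before it i / n.
            d = Math.Max(d, Math.Max((i + 1.0) / n - f, f - (double)i / n));
        }
        return d;
    }

    public static void Main()
    {
        double[] data = { -1.2, -0.4, 0.1, 0.3, 0.9, 1.7 };  // made-up values
        Console.WriteLine(Statistic(data, v => NormalCdf(v)));
    }
}
```

A statistic near 0 means the staircase hugs the hypothesized curve; a statistic near 1 means the two barely overlap.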

Question 2: The LeftProbability is the chance (under the null hypothesis) of getting a value smaller than the test statistic; the RightProbability is the chance of getting a larger value. If you got a very small RightProbability, it would mean that the deviation of your measured distribution from the fit normal distribution was so large as to make it very unlikely (specifically, with a probability equal to RightProbability) that you would see such a large deviation if your data really did come from the fit normal distribution.

In your case, RightProbability is not particularly small. In fact, your data fall slightly unusually close to the fit distribution: only about 9% of the time (your LeftProbability) would the KS statistic be as small as it was in your case. So there is nothing here that would make anyone suspect that your data are non-normal.
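The reading described above can be written down as a small decision rule. This is only a sketch: the class and method names (KsReading, Interpret) are hypothetical, and the 0.05 threshold is an assumed, conventional significance level, not something Meta.Numerics chooses for you.

```csharp
using System;

public static class KsReading
{
    // alpha is an assumed significance level; pick your own.
    public static string Interpret(double leftProbability, double rightProbability, double alpha = 0.05)
    {
        if (rightProbability < alpha)
            return "statistic unusually large: evidence against the null hypothesis";
        if (leftProbability < alpha)
            return "statistic unusually small: fit is suspiciously close";
        return "statistic unremarkable: no evidence against the null hypothesis";
    }

    public static void Main()
    {
        // LeftProbability and RightProbability as reported in the question above.
        Console.WriteLine(Interpret(0.093259929142123124, 0.90674007085787689));
        // prints "statistic unremarkable: no evidence against the null hypothesis"
    }
}
```

Note that the two probabilities sum to 1, so either one determines the other; RightProbability is the one usually compared against a threshold.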
Marked as answer by jaskiewiczm on 3/26/2014 at 7:17 AM
Nov 3, 2014 at 6:34 PM
I am having implementation/understanding difficulty on this topic: verifying that a sample follows a specific distribution.

I expect a sample with a uniform distribution to return a low statistic when compared to a student distribution.

When j = 10.0; tr.Statistic = 1.
When j = i; tr.Statistic = 0.968.
When j = Random(); tr.Statistic = 1. (random integer)
When j = Random(); tr.Statistic = 0.509. (random double) <-- this is the only one that makes sense to me.
        static void Main(string[] args)
        {
            Sample s = new Sample();
            double j = 10.0;

            for (int i = 0; i <= 100; i++)
            {
                s.Add(j);
            }

            int nu = (s.Count < 1) ? 1 : s.Count - 1;
            TestResult tr = s.KolmogorovSmirnovTest(new StudentDistribution(nu));
            Console.WriteLine(tr.Statistic);

            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
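One possible reading of the results listed above, offered as a sketch rather than as documented library behavior. The symbols are assumptions introduced here: F_n is the empirical CDF of the sample, F is the CDF of the reference distribution (continuous, and for a Student distribution symmetric about 0), and D_n is the KS statistic.

```latex
D_n = \sup_x \left| F_n(x) - F(x) \right|

% Constant sample, every value equal to c:
F_n(x) = \begin{cases} 0, & x < c \\ 1, & x \ge c \end{cases}
\quad\Longrightarrow\quad
D_n = \max\bigl( F(c),\; 1 - F(c) \bigr)

% Sample drawn from [0, 1), with smallest value x_{(1)} \ge 0:
F_n(x) = 0 \ \text{for } x < x_{(1)}, \qquad F(x_{(1)}) \ge F(0) = \tfrac{1}{2}
\quad\Longrightarrow\quad
D_n \ge \tfrac{1}{2}
```

With c = 10, far out in the upper tail of a Student distribution centered at 0, F(c) is very close to 1, so a statistic of 1 is what this predicts. For random doubles in [0, 1) the statistic cannot drop below 0.5, which is consistent with the 0.509 reported. In both cases a large statistic signals a poor match between sample and distribution, so a sample that does not follow the reference distribution should produce a high statistic, not a low one.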
Nov 3, 2014 at 7:35 PM
Edited Nov 3, 2014 at 8:17 PM
OK, I don't know what I was doing, but I realized that the sample mean != 0 while the StudentDistribution mean == 0. Using the general NormalDistribution(mean, sigma) instead gave the expected results:
        static void Main(string[] args)
        {
            Sample SampleDataSet = new Sample();
            Random r = new Random();

            for (int i = 0; i < 100; i++)
            {
                SampleDataSet.Add((double)r.NextDouble());
            }

            double mu = SampleDataSet.Mean;
            double s = SampleDataSet.StandardDeviation;

            s = (s != 0.0) ? s : 0.0001; // in case StdDev == 0

            TestResult tr = SampleDataSet.KolmogorovSmirnovTest(new NormalDistribution(mu, s));

            Console.WriteLine(tr.Statistic);
            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
EDIT: Well, maybe not. Some more testing with 100 Excel-generated normal observations did not result in a normal-distribution statistic > 0.05 and a Student-distribution statistic > 0.35.