InverseLeftProbability of BetaDistribution

Jun 2, 2013 at 6:37 PM
Edited Jun 2, 2013 at 6:39 PM
alpha =     2696.7564477925157;
beta =  204.76649145658143;
p = 0.04;
BetaDistribution bd = new BetaDistribution(alpha, beta);
bd.InverseLeftProbability(p); // throws an exception.
I used this temporary fix:
//inside of RefineInverseBeta

 double dx = -y / (yp - y / 2.0 * (a1 / x - b1 / (1 - x)));// NaN for this case.

#region FIX
if (dx != dx)
{
     dx = 1 / (-1 / (Math.Exp(AdvancedMath.LogBeta(alpha, beta, x) - Math.Log(Math.Pow(x, a1)) - Math.Log(Math.Pow(1.0 - x, b1))) - Math.Exp(Math.Log(P) + Math.Log(bigB) - Math.Log(Math.Pow(x, a1)) - Math.Log(Math.Pow(1.0 - x, b1)))) + (1 / 2.0 * (a1 / x - b1 / (1 - x))));
}
#endregion
        public static double LogBeta(double a, double b, double x)
        {
            if (a < 0.0) throw new ArgumentOutOfRangeException("a");
            if (b < 0.0) throw new ArgumentOutOfRangeException("b");
            if ((x < 0.0) || (x > 1.0)) throw new ArgumentOutOfRangeException("x");
            if (x == 0.0) return (0.0);
            double xtp = (a + 1.0) / (a + b + 2.0);
            if (x > xtp)
            {
                return (Math.Log(Beta(a, b) - Beta(b, a, 1.0 - x)));
            }
            // evaluate via continued fraction via Steed's method
            double aa = 1.0;            // a_1
            double bb = 1.0;            // b_1
            double D = 1.0;             // D_1 = 1 / b_1
            double Df = aa / bb;        // Df_1 = f_1 - f_0 = a_1 / b_1
            double f = 0.0 + Df;        // f_1 = f_0 + Df_1 = b_0 + Df_1
            for (int k = 1; k < Global.SeriesMax; k++)
            {
                double f_old = f;
                int m = k / 2;
                if ((k % 2) == 0)
                {
                    aa = x * m * (b - m) / ((a + k - 1) * (a + k));
                }
                else
                {
                    aa = -x * (a + m) * (a + b + m) / ((a + k - 1) * (a + k));
                }
                D = 1.0 / (bb + aa * D);
                Df = (bb * D - 1.0) * Df;
                f += Df;
                if (f == f_old)
                {
                    return (Math.Log(Math.Pow(x, a)) + Math.Log(Math.Pow(1.0 - x, b)) - Math.Log(a) + Math.Log(f));
                }
            }
            throw new NonconvergenceException();
        }
Coordinator
Jun 19, 2013 at 10:51 PM
Thanks for bringing this to our attention and sorry to have taken so long to get back to you. The problem you have uncovered is actually more subtle than I had initially supposed, and we are still trying to figure out just what the limitations of our current implementation of BetaDistribution.InverseLeftProbability are. We have also noticed a possibly related problem that returned values can be off even when the algorithm does converge. The problem seems to be worse for larger values of alpha and beta, and may actually be at least in part fundamental: when alpha or beta is large, P(x) changes so fast with x that the nearest representable value x_0 to the x actually sought gives a noticeably different value P(x_0) than the input P. I'm sorry not to have a clear answer for you, but I wanted to let you know that we appreciate your error report and are working on it.
Oct 30, 2013 at 11:23 PM
hi, is there any update on this?
Coordinator
Nov 1, 2013 at 8:23 AM
Yes! The short answer is that we have fixes in hand for our problems with large beta parameters and you will see them in the next release.

The long answer is that there are a few different things going on here, and the overlap between them made it difficult to understand what was going on at first. One thing is that B(a,b) for these values is very small, about 10^(-301), which is not far from the smallest normally representable double, which is about 10^(-308). Our algorithm for B(a,b) in this region converged to a wrong value because intermediate values are sub-normal numbers, i.e. actually smaller than 10^(-308) and thus only representable by using zeros in the mantissa and thus loosing digits of accuracy. Another thing that happened is that B(a,b,x) was (correctly) calculated to be zero and 0.04 * B(a,b) was (incorrectly) calculated to be zero (because B(a,b) was an incorrect sub-normal number), making both y and dy zero and thus dx = 0.0/0.0 = NaN, as you observed. Even though this was incorrect in this case, it would have happed correctly for a and b slightly larger because B(a,b) really would be indistinguishable from zero. One way of dealing with this is to move to log-space, as you did in your solution. Another way is to move to a calculation of I_x(a,b) which does not require calculating B(a,b) first. This second method is preferable since the subtraction in log-space can loose digits due to cancelation, and is our method going forward.

Thank you very much, by the way, for bringing this up. We have not only improved our beta algorithms, but become much more careful in our handling of extreme values in many of our algorithms.
Nov 4, 2013 at 12:32 PM
Edited Nov 4, 2013 at 2:48 PM
Thanks for replying.
I'm fairly new to this and Meta.Numerics is a lifesaver.
Is there any way I can get early access to the new code? My app is already in production and running into out of range exception?
I'm calculating inverse beta for millions of numbers and could provide a hand in testing.

Here are the two cases I ran into.

Errors for x being under 0.
var qt = 0.9999902;
var a = 0.0000434313636267175;
var b = 18474.36071078790000;
var bd = new BetaDistribution(a, b);
var binv = bd.InverseLeftProbability(qt);


Errors out for x beign over 1.
var qt = 0.9998063099306;
var a = 0.00034509911609819255;
var b = 6.8453983996634218;
var bd = new BetaDistribution(a, b);
var binv = bd.InverseLeftProbability(qt);

Specified argument was out of the range of valid values.
Parameter name: x

at Meta.Numerics.Functions.AdvancedMath.Beta(Double a, Double b, Double x)
at Meta.Numerics.Statistics.Distributions.BetaDistribution.RefineInverseBeta(Double x, Double P)
at Meta.Numerics.Statistics.Distributions.BetaDistribution.InverseLeftProbability(Double P)


Thanks.
Coordinator
Nov 6, 2013 at 10:34 PM
Thanks for these examples! They actually illustrate yet another issue we need to address: that the simple root finder we use in our inversion routine can over-run the allowed range of the result. I remember worrying about this possibility when I wrote the routine, but when it never happened in my own tests, I concluded it wasn't a problem. Obviously, that was the wrong conclusion. Updated code should hit the repository within a week or so. I will give you a heads-up when it does. I assume you are able to sync and compile?
Coordinator
Nov 25, 2013 at 7:46 PM
A fix for these issues is part of the latest check-in. There is quite a bit of other stuff going on in that check-in, so I hope it isn't too difficult to disentangle the bits you need (or that you can use the whole new build -- but because this is a major version bump there are breaking changes so that might not be possible). Let me know if you run in to any problems. Thanks again for the repro!