User:Dicklyon/Gamma median

The median of a gamma distribution does not have a closed-form formula, but has some known simple special cases and asymptotic behavior.

On Talk:Gamma distribution we have been talking about approximations for the median of a gamma distribution (well, more accurately, I've been mostly talking to myself there). Here I will discuss some approximations I have come up with for the low- $k$ regime. Per WP:NOR, this is not ready for article space.

The gamma distribution pdf is ${\frac {1}{\Gamma (k)\theta ^{k}}}x^{k-1}e^{-{\frac {x}{\theta }}}$ , but we'll use $θ = 1$ because both the mean and median scale with the scale parameter. So we'll work with $p(x)={\frac {1}{\Gamma (k)}}x^{k-1}e^{-x}$ .

The median of this pdf is the heavy black curve in the figures (computed by iteratively solving for 0.5 == gammainc(median, k) in Matlab using Newton–Raphson iteration).

Approximations for high $k$ edit

The Laurent series described in the gamma distribution article, originally by Choi and then extended and corrected by Berg and Pedersen, is increasingly accurate with more terms for high enough $k$ , but is not very good around $k = 1$ , and diverges below that.

Approximations for low $k$ edit

A graph illustrating how some approximations to the median of a gamma distribution, with shape parameter

k < 1

, might be calculated. The correct median value for this shape is about 0.22747.

To find the median, we just need to solve for $ν$ in $\int _{0}^{\nu }p(x)dx=0.5$ – but that's hard to solve. That integral is called the lower incomplete gamma function, and it can be computed as a sum of easy-to-compute terms by expressing the exponential as a power series.

0.5=\int _{0}^{\nu }p(x)dx=\int _{0}^{\nu }{\frac {x^{k-1}}{\Gamma (k)}}e^{-x}dx

0.5=\int _{0}^{\nu }{\frac {x^{k-1}}{\Gamma (k)}}\sum _{n=0}^{\infty }{\frac {(-x)^{n}}{n!}}dx

0.5={\frac {1}{\Gamma (k)}}\sum _{n=0}^{\infty }{\frac {-1^{n}}{n!}}\int _{0}^{\nu }x^{n+k-1}dx

But that expression can't be solved for $ν$ , except numerically.

What we can do, however, is take just the first ( $n = 0$ ) term, the one that treats $e x \approx 1$ , which is actually pretty accurate when $k$ is low enough. The modified pdf is the upper (dashed) curve in the figure, ${\frac {x^{k-1}}{\Gamma (k)}}$ . This is not a probability distribution, as it doesn't integrate to 1 (in fact its complete integral is infinite for all $k$ ). Nevertheless, it is an OK approximation to the pdf in the region below the median, when the median is much less than 1, as the figure suggests.

So, let's integrate, keeping just one term, to find a first approximation to the median, $ν 0$ :

0.5={\frac {1}{\Gamma (k)}}\int _{0}^{\nu _{0}}x^{k-1}dx

0.5\Gamma (k)={\frac {x^{k}}{k}}\vert _{x=\nu _{0}}

\nu _{0}=(0.5k\Gamma (k))^{1/k}=(0.5\Gamma (k+1))^{1/k}

Some approximations to the median of the gamma distribution (red) and their errors (blue). Curve c is what we here call

\nu _{0}

, d is

\nu _{1}

, and e is

\nu _{2}

. The curves

a

and

b

are simple expansions about 0 of

c

, per Wolfram; and

a

was derived by a roundabout but rigorous process by Berg and Peterson (2006).^[1]

When $ν$ is very small, $\nu _{0}$ is an excellent approximation. But as the figure shows, the vertically-hashed area between curves is excess area counted in the integral. If we can estimate that area, and increase the estimate of $\nu$ to include that much more area to the right (the horizontally-hashed area), then we can get a better estimate. So let's take the next (negative) term in the series to estimate the extra area.

Call the extra area that we counted $\Delta A$ :

\Delta A=-{\frac {1}{\Gamma (k)}}\sum _{n=1}^{\infty }{\frac {-1^{n}}{n!}}\int _{0}^{\nu _{0}}x^{n+k-1}dx

And consider just the first ( $n = 1$ ) term:

\Delta A_{1}={\frac {1}{\Gamma (k)}}\int _{0}^{\nu _{0}}x^{k}dx

\Delta A_{1}={\frac {\nu _{0}^{k+1}}{\Gamma (k)(k+1)}}

And take the area to the right to be a rectangle of width $\Delta x_{1}$ and height $p(\nu _{0})$ .

\Delta A_{1}=\Delta x_{1}\left({\frac {\nu _{0}^{k-1}}{\Gamma (k)}}e^{-\nu _{0}}\right)

\Delta x_{1}={\frac {\frac {x^{k+1}}{\Gamma (k)(k+1)}}{{\frac {\nu _{0}^{k-1}}{\Gamma (k)}}e^{-\nu _{0}}}}={\frac {e^{\nu _{0}}\nu _{0}^{2}}{(k+1)}}

So our next approximation is:

\nu _{1}=\nu _{0}+\Delta x_{1}=\nu _{0}+{\frac {\nu _{0}^{2}e^{\nu _{0}}}{(k+1)}}=\nu _{0}\left(1+{\frac {e^{\nu _{0}}\nu _{0}}{(k+1)}}\right)

Or we could write it out without the intermediate $ν 0$ as

\nu _{1}=(0.5\Gamma (k+1))^{1/k}\left(1+{\frac {e^{(0.5\Gamma (k+1))^{1/k}}(0.5\Gamma (k+1))^{1/k}}{(k+1)}}\right)

The error in estimating Δ $A$ and the error in getting Δ $x$ from Δ $A$ are in opposite directions, so they partially cancel. To get a next better estimate, we can improve the estimate of Δ $A$ , or the conversion to Δ $x$ , or both. But to get an improvement, we again want opposite signs of the errors. With two more terms for Δ $A$ , and a trapezoidal estimate for the area to the right, we get a good result:

\Delta A_{2}={\frac {1}{\Gamma (k)}}\left({\frac {\nu _{0}^{k+1}}{(k+1)}}-{\frac {\nu _{0}^{k+2}}{2(k+2)}}+{\frac {\nu _{0}^{k+3}}{6(k+3)}}\right)

And for the righthand area that should equal this, we can take the height about in the middle instead of at the left edge; we can get that by evaluating the pdf half way between $ν 0$ and $ν 1$ , or we can extrapolate from the height at $ν 0$ using the derivative there, by half of the previously computed Δ $x$ , which is what we decided to do.

\Delta A_{2}=\Delta x_{2}\left(p(\nu _{0})+p'(\nu _{0}){\frac {\Delta x_{1}}{2}}\right)=\Delta x_{2}p(\nu _{0})\left(1+{\frac {p'(\nu _{0})}{p(\nu _{0})}}{\frac {\Delta x_{1}}{2}}\right)

where the logarithmic derivative is:

{\frac {p'(\nu _{0})}{p(\nu _{0})}}={\frac {d\log(p(x))}{dx}}|_{x=\nu _{0}}=-\left(1+{\frac {1-k}{\nu _{0}}}\right)

Therefore:

\Delta x_{2}={\frac {\Delta A_{2}}{p(\nu _{0})\left(1+{\frac {p'(\nu _{0})}{p(\nu _{0})}}{\frac {\Delta x_{1}}{2}}\right)}}

\Delta x_{2}={\frac {{\frac {1}{\Gamma (k)}}\left({\frac {\nu _{0}^{k+1}}{(k+1)}}-{\frac {\nu _{0}^{k+2}}{2(k+2)}}+{\frac {\nu _{0}^{k+3}}{6(k+3)}}\right)}{\left({\frac {\nu _{0}^{k-1}}{\Gamma (k)}}e^{-\nu _{0}}\right)\left(1-\left(1+{\frac {1-k}{\nu _{0}}}\right){\frac {\nu _{0}^{2}}{2(k+1)e^{-\nu _{0}}}}\right)}}

\Delta x_{2}={\frac {e^{\nu _{0}}\left({\frac {\nu _{0}^{2}}{(k+1)}}-{\frac {\nu _{0}^{3}}{2(k+2)}}+{\frac {\nu _{0}^{4}}{6(k+3)}}\right)}{1-\left(1+{\frac {1-k}{\nu _{0}}}\right){\frac {e^{\nu _{0}}\nu _{0}^{2}}{2(k+1)}}}}

Then we have the next approximation to the median:

\nu _{2}=\nu _{0}+\Delta x_{2}=\nu _{0}+{\frac {e^{\nu _{0}}\left({\frac {\nu _{0}^{2}}{(k+1)}}-{\frac {\nu _{0}^{3}}{2(k+2)}}+{\frac {\nu _{0}^{4}}{6(k+3)}}\right)}{1-\left(1+{\frac {1-k}{\nu _{0}}}\right){\frac {e^{\nu _{0}}\nu _{0}^{2}}{2(k+1)}}}}=\nu _{0}\left(1+{\frac {e^{\nu _{0}}\left({\frac {\nu _{0}}{(k+1)}}-{\frac {\nu _{0}^{2}}{2(k+2)}}+{\frac {\nu _{0}^{3}}{6(k+3)}}\right)}{1-\left(1+{\frac {1-k}{\nu _{0}}}\right){\frac {e^{\nu _{0}}\nu _{0}^{2}}{2(k+1)}}}}\right)

It's complicated, but it's a ridiculously good approximation at low values of $k$ , and it points toward how a series of better approximations might be derived, using more terms from the exponential and higher-order derivatives.

These are essentially just simple approximations to Newton–Raphson steps that one could do numerically, for very quick convergence when the starting point is not too bad. But it's nice that they can produce "closed form" approximations, not just numerical recipes.

Approximations for medium-to-high $k$ edit

Approximations that work well at high

k

and reasonably well somewhat below

k = 1

Banneheka & Ekanayake (2009) provided the rational function $(k-1+0.2)/(k+0.2)$ as an approximation for the median at high enough $k$ .^[2] Generalizing the 0.2 to be a function $c (k)$ leads to a variety of improved approximations that work down to lower values of the shape factor, and potentially down to 0 if a better form is found.

Using $c = 8/45 = 0.1777777$ leads to the right behavior for very large $k$ , matching the $8/(9\cdot45 k)$ term in the Laurent series, so we arranged this series of approximations to use that first coefficient. The others are least-squares fitted to an ideal $c$ computed as a function of $k$ using the numerical median values. To get a good fit, the lower limit of $k$ used is adjusted for each $N$ .

Unlike the original Laurent series, in this one adding more terms makes it work to lower values of $k$ .

Approximations across all $k$ edit

Approximations that work well across a wide range of

k

. With three fitted coefficients, the worst-case absolute and relative errors are both below 0.01 everywhere, and the the high-

k

asymptote approaches zero absolute error. More digits are needed with more terms, but even with high precision the results do not improve rapidly with more terms.

To get a functional form with low relative error across all $k$ , it seems we need to combine the above strategies. E.g. use $2 -1/ k$ to get the low- $k$ shape, and a Laurent series to make it fit everywhere.

Since $2 -1/ k$ approaches $1 - log(2)/ k$ at high $k$ (according to Wolfram), multiplying by $k + log(2) - 1/3$ makes an approximation that approaches $k - 1/3$ at high $k$ , which is a good place to start. Beyond that, we find coefficients empirically, by least-squares fit. The error curves shown correspond to the rounded coefficients stated on the figure.

By using negative powers of $k + 0.5$ instead of $k$ , we allow convergence of the series down to 0. The 0.5 offset seems a bit better than other values tried, but has not been optimized.

Another series of approximations built on powers of

2 -1/ k

and those powers times

k

. These are actually quite excellent in the middle. By constraining the sum of the

d

coefficients, we get asymptotically zero relative error at high

k

, but not zero absolute error. At order 3, the relative error is less than 0.01 everywhere (or at least down to the lowest values of

k

considered.

From looking at the small- $k$ approximations, it seems that maybe a polynomial in $(0.5 Γ(k + 1)) -1/ k$ might work. But that term is proportional to $k$ at the high end, so powers of it wouldn't be useful. And being tempted to get rid of the gamma function, which is itself not really an easy closed-form expression, we can try polynomials in $2 -1/ k$ instead. However, since the latter flattens out at 1, and we want to end up with a result close to $k$ , we included terms multiplied by $k$ , yielding very good least-squares fits with not too many coefficients.

Trying to be smarter about this edit

Approximations to the median of a gamma distribution, using interpolators between the low-

k

and high-

k

coeffcients of 1 and k to multiply

2^{-1/k}

by. With this approach the absolute (blue) and relative (magenta) errors both go to zero at low and high

k

, and with the particular interpolators – sigmoids of log

k

(dashed) and Gompertz functions of log

k

(solid) – the absolute and relative errors are everywhere less than 0.00228 for sigmoid and 0.00184 for Gompertz function.

The $2^{-1/k}$ is useful. It needs a factor approaching $(1+k\pi ^{2}/12)e^{-\gamma }$ at the low end, and approaching $k+\log(2)-1/3$ at the high end, as pointed out in earlier sections. If we make an appropriate interpolator between those coefficients of 1 and $k$ , e.g. using a logistic sigmoid of $\log(k)$ , we will be able to approach the known asymptotes at both ends, such that both absolute and relative errors will tend toward zero for low and high $k$ .

The coefficient of 1 changes from 0.5615 to 0.3598, while the coefficient of $k$ changes from 0.4618 to 1.0. Finding a pair of interpolators jointly by searching over four parameters (width and centers of two sigmoids) is easy by brute-force search. We minimized the max of absolute and relative errors to arrive at these (sigmoidal in log $k$ ) interpolators, which yield absolute and relative errors everywhere less than 0.00228:

\alpha _{0}={\frac {1}{1+(k/0.2131)^{1.2900}}}

\alpha _{1}={\frac {1}{1+(k/0.00802)^{3.5251}}}

The formula for the smarter everywhere-good median approximation is then:

\nu \approx 2^{-1/k}\left(\alpha _{0}e^{-\gamma }+(1-\alpha _{0})\left(\log(2)-{\frac {1}{3}}\right)+k\left(\alpha _{1}{\frac {e^{-\gamma }\pi ^{2}}{12}}+(1-\alpha _{1})\right)\right)

Optimized interpolation coefficients:

\alpha _{0}

for the constant factor (blue) and

\alpha _{1}

for the proportional-to-k factor (red), for logistic sigmoid (dashed) and Gompertz function (solid)

Adding the single-interpolator result (dash-dot), and removing the Gompertz curves, with the range of

k

extended at the low end to see the behavior.

The logistic sigmoid leads to a rather odd-looking best fit, as the plot of interpolators shows (red dashed curve, the sigmoid for the $k$ coefficient, in particular). I actually ran this optimization down to $k=2^{-10}$ to make sure the relative error is not bouncing back up, and it does eventually like the low- $k$ coefficients down there.

Gompertz function as better interpolator edit

The Gompertz function is a good asymmetric function that ends up working just a bit better as an interpolator for max relative error (less than 0.00184), but considerably better for max absolute error (less than 0.000612) at the same time, and gives a more sensible-looking pair of interpolating curves (solid curves in the two figures).

\alpha _{0}=1-\exp \left(-(k/0.1500)^{-1.0023}\right)

\alpha _{1}=1-\exp \left(-(k/0.0500)^{-2.5080}\right)

These interpolators work with the same formula for the median approximation given above.

As 4-parameter fits go, this is the best.

Single-interpolator version edit

I tried one interpolator for both terms, but that was a lot worse. The graph of interpolators suggests that a single interpolator for the constant factor, and just fixing the $k$ factor to the high- $k$ value might be about as good. Indeed, that worked OK (and the same trick with Gompertz function did not). The best interpolator is not visually different from the (blue dashed) one plotted:

\alpha ={\frac {1}{1+(k/0.2136)^{1.2789}}}

Then we have this simpler expression, with absolute and relative errors less then 0.00271 with just two fitted parameters:

\nu \approx 2^{-1/k}\left(\alpha e^{-\gamma }+(1-\alpha )\left(\log(2)-{\frac {1}{3}}\right)+k\right)

Having the wrong coefficient of $k$ doesn't hurt very much for very small $k$ – the relative error just doesn't go to zero as quickly as it might.

Rewrite it this way:

\nu \approx 2^{-1/k}\left({\frac {1}{1+(k/0.2136)^{1.2789}}}\left(e^{-\gamma }-\log(2)+{\frac {1}{3}}\right)+\log(2)-{\frac {1}{3}}+k\right)

\nu \approx 2^{-1/k}\left({\frac {0.2016}{1+(k/0.2136)^{1.2789}}}+0.3598+k\right)

Better single interpolator function: Gudermannian edit

Compared to the previous single logistic sigmoid interpolator (dash-dot curves), the arctan or Gudermannian interpolator (solid curves) gives a better fit (less than half as much max relative error, below 0.00132).

Alternatively, we can numerically find the ideal single interpolator, see what it looks like, and seek a better functional form than a sigmoid to approximate it. The difference from a logistic sigmoid is small, but enough that we can see which direction to move, among the various sigmoid functions illustrated in Wikipedia, and it looks like the Gudermannian function would be a step in the right direction.

For the Gudermannian of log $k$ we use the arctan( $k$ ) formula, but with optimized offset and scaling, this way:

\alpha =1-{\frac {2}{\pi }}\arctan \left((k/0.21239)^{1.0554}\right)

This actually yields less than half as much max relative error, below 0.00132, and max absolute error below 0.00122. Using the same base formula from above, the new best two-parameter approximation is:

\nu \approx 2^{-1/k}\left({\frac {2}{\pi }}\arctan \left(\left({\frac {k}{0.21239}}\right)^{1.0554}\right)\left(e^{-\gamma }-\log(2)+{\frac {1}{3}}\right)+e^{-\gamma }+k\right)

\nu \approx 2^{-1/k}\left(0.56146+k-0.12837\arctan \left((4.7083k)^{1.0554}\right)\right)

I was not able to reduce the maximum error further by using two Gudermannian interpolators.

References edit

^ Berg, Christian and Pedersen, Henrik L. (March 2006). "The Chen–Rubin conjecture in a continuous setting" (PDF). Methods and Applications of Analysis. 13 (1): 63–88. Retrieved 1 April 2020.{{cite journal}}: CS1 maint: multiple names: authors list (link)
^ Banneheka BMSG, Ekanayake GEMUPD (2009) "A new point estimator for the median of gamma distribution". Viyodaya J Science, 14:95–103

[berg-1] Berg, Christian and Pedersen, Henrik L. (March 2006). "The Chen–Rubin conjecture in a continuous setting" (PDF). Methods and Applications of Analysis. 13 (1): 63–88. Retrieved 1 April 2020.{{cite journal}}: CS1 maint: multiple names: authors list (link)

[Banneheka2009-2] Banneheka BMSG, Ekanayake GEMUPD (2009) "A new point estimator for the median of gamma distribution". Viyodaya J Science, 14:95–103

[1]

[2]