# Binomial proportion confidence interval

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments (Bernoulli trials). In other words, a binomial proportion confidence interval is an interval estimate of a success probability p when only the number of experiments n and the number of successes ${\displaystyle n_{S}}$ are known.

There are several formulas for a binomial confidence interval, but all of them rely on the assumption of a binomial distribution. In general, a binomial distribution applies when an experiment is repeated a fixed number of times, each trial of the experiment has two possible outcomes (success and failure), the probability of success is the same for each trial, and the trials are statistically independent. Because the binomial distribution is a discrete probability distribution (i.e., not continuous) and difficult to calculate for large numbers of trials, a variety of approximations are used to calculate this confidence interval, all with their own tradeoffs in accuracy and computational intensity.

A simple example of a binomial distribution is the set of various possible outcomes, and their probabilities, for the number of heads observed when a coin is flipped ten times. The observed binomial proportion is the fraction of the flips that turn out to be heads. Given this observed proportion, the confidence interval for the true probability of the coin landing on heads is a range of possible proportions, which may or may not contain the true proportion. A 95% confidence interval for the proportion, for instance, will contain the true proportion 95% of the times that the procedure for constructing the confidence interval is employed.[1]

## Normal approximation interval

A commonly used formula for a binomial confidence interval relies on approximating the distribution of error about a binomially-distributed observation, ${\displaystyle {\hat {p}}}$ , with a normal distribution.[2] This approximation is based on the central limit theorem and is unreliable when the sample size is small or the success probability is close to 0 or 1.[3]

Using the normal approximation, the success probability p is estimated as

${\displaystyle {\hat {p}}\pm z{\sqrt {\frac {{\hat {p}}\left(1-{\hat {p}}\right)}{n}}},}$

or the equivalent

${\displaystyle {\frac {n_{S}}{n}}\pm {\frac {z}{n}}{\sqrt {\frac {n_{S}n_{F}}{n}}},}$

where ${\displaystyle {\hat {p}}=n_{S}/n}$  is the proportion of successes in a Bernoulli trial process, measured with ${\displaystyle n}$  trials yielding ${\displaystyle n_{S}}$  successes and ${\displaystyle n_{F}=n-n_{S}}$  failures, and ${\displaystyle z}$  is the ${\displaystyle 1-{\tfrac {\alpha }{2}}}$  quantile of a standard normal distribution (i.e., the probit) corresponding to the target error rate ${\displaystyle \alpha }$ . For a 95% confidence level, the error ${\displaystyle \alpha =1-0.95=0.05}$ , so ${\displaystyle 1-{\tfrac {\alpha }{2}}=0.975}$  and ${\displaystyle z=1.96}$ .
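As an illustration, the formula above can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function name `wald_interval` and its parameters are illustrative choices, and the quantile z is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, trials: int, confidence: float = 0.95):
    """Normal approximation interval p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # the 1 - alpha/2 quantile
    p_hat = successes / trials
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / trials)
    return p_hat - half_width, p_hat + half_width


# 30 successes in 100 trials at the 95% level.
print(wald_interval(30, 100))
# With no successes the estimated standard error is zero and the interval
# collapses to a single point, one symptom of the unreliability noted above.
print(wald_interval(0, 10))
```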

An important theoretical derivation of this confidence interval involves the inversion of a hypothesis test. Under this formulation, the confidence interval represents those values of the population parameter that would have large p-values if they were tested as a hypothesized population proportion. The collection of values, ${\displaystyle \theta }$ , for which the normal approximation is valid can be represented as

${\displaystyle \left\{\theta \,\,{\bigg |}\,\,y\leq {\frac {{\hat {p}}-\theta }{\sqrt {{\frac {1}{n}}{\hat {p}}\left(1-{\hat {p}}\right)}}}\leq z\right\},}$

where ${\displaystyle y}$  is the ${\displaystyle {\tfrac {\alpha }{2}}}$  quantile of a standard normal distribution (so that ${\displaystyle y=-z}$ ) and ${\displaystyle z}$  is the ${\displaystyle 1-{\tfrac {\alpha }{2}}}$  quantile, as before. Since the test in the middle of the inequality is a Wald test, the normal approximation interval is sometimes called the Wald interval, but it was first described by Pierre-Simon Laplace in 1812.[4]

## Wilson score interval

The Wilson score interval is an improvement over the normal approximation interval in that the actual coverage probability is closer to the nominal value. It was developed by Edwin Bidwell Wilson (1927).[5]

Wilson gave the confidence limits as the solutions of the two equations (one for each sign)

${\displaystyle p={\hat {p}}\pm z{\sqrt {\frac {p\left(1-p\right)}{n}}}\qquad \Longrightarrow \qquad \left(p-{\hat {p}}\right)^{2}=z^{2}\cdot {\frac {p\left(1-p\right)}{n}}}$

after squaring them into a single quadratic equation in p. Hence the success probability p is estimated as

${\displaystyle {\frac {{\hat {p}}+{\frac {z^{2}}{2n}}}{1+{\frac {z^{2}}{n}}}}\pm {\frac {z}{1+{\frac {z^{2}}{n}}}}{\sqrt {{\frac {{\hat {p}}(1-{\hat {p}})}{n}}+{\frac {z^{2}}{4n^{2}}}}}}$

or the equivalent

${\displaystyle {\frac {n_{S}+{\frac {z^{2}}{2}}}{n+z^{2}}}\pm {\frac {z}{n+z^{2}}}{\sqrt {{\frac {n_{S}n_{F}}{n}}+{\frac {z^{2}}{4}}}}.}$

This interval has good properties even for a small number of trials and/or an extreme probability.
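The closed-form expression above translates directly into code. The following Python sketch is illustrative only: the function name `wilson_interval` is an assumption, and the final clamp to [0, 1] is added solely to absorb floating-point rounding.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, trials: int, confidence: float = 0.95):
    """Wilson score interval computed from the closed-form expression."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p_hat = successes / trials
    denom = 1.0 + z * z / trials
    centre = (p_hat + z * z / (2.0 * trials)) / denom
    half_width = (z / denom) * sqrt(
        p_hat * (1.0 - p_hat) / trials + z * z / (4.0 * trials * trials)
    )
    # Clamp only to guard against rounding; the exact bounds lie in [0, 1].
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


print(wilson_interval(30, 100))
# Unlike the normal approximation interval, zero successes still give
# an interval of positive width.
print(wilson_interval(0, 10))
```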

Intuitively, the center value of this interval is the weighted average of ${\displaystyle {\hat {p}}}$  and ${\displaystyle {\tfrac {1}{2}}}$ , with ${\displaystyle {\hat {p}}}$  receiving greater weight as the sample size increases. Formally, the center value corresponds to using a pseudocount of ${\displaystyle z^{2}/2}$ , where ${\displaystyle z}$  is the number of standard deviations spanned by each half of the confidence interval: add this number to both the count of successes and the count of failures to yield the estimate of the ratio. For the common interval of two standard deviations in each direction (approximately 95% coverage, which itself corresponds to approximately 1.96 standard deviations), this yields the estimate ${\displaystyle (n_{S}+2)/(n+4)}$ , which is known as the "plus four rule".

In most cases Wilson's equations can also be solved numerically using the fixed-point iteration

${\displaystyle p_{k+1}={\hat {p}}\pm z\cdot {\sqrt {\frac {p_{k}\cdot \left(1-p_{k}\right)}{n}}}}$

with ${\displaystyle p_{0}={\hat {p}}}$ , where the iteration index ${\displaystyle k}$  is distinct from the number of trials ${\displaystyle n}$ .
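A short Python sketch of this iteration follows. The function name, the fixed number of steps and the clamp to [0, 1] are assumptions made for the illustration; the example then checks that each limit satisfies Wilson's quadratic equation.

```python
from math import sqrt


def wilson_bound_by_iteration(p_hat, n, z, sign, steps=50):
    """Iterate p <- p_hat + sign * z * sqrt(p * (1 - p) / n), starting at p_hat.

    sign = +1 targets the upper limit, sign = -1 the lower limit.
    """
    p = p_hat
    for _ in range(steps):
        p = p_hat + sign * z * sqrt(p * (1.0 - p) / n)
        p = min(1.0, max(0.0, p))  # assumption: keep the iterate inside [0, 1]
    return p


z = 1.96
upper = wilson_bound_by_iteration(0.3, 100, z, +1)
lower = wilson_bound_by_iteration(0.3, 100, z, -1)
# Residual of (p - p_hat)^2 = z^2 * p * (1 - p) / n at each computed limit.
for p in (lower, upper):
    print(p, (p - 0.3) ** 2 - z * z * p * (1.0 - p) / 100)
```

One case where the iteration does not help is an observed proportion of exactly 0 or 1: the square root is then zero at the starting point, so the iterate never moves and the closed-form expression is needed instead.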

The Wilson interval can be derived from Pearson's chi-squared test with two categories. The resulting interval,

${\displaystyle \left\{\theta \,\,{\bigg |}\,\,y\leq {\frac {{\hat {p}}-\theta }{\sqrt {{\frac {1}{n}}\theta (1-\theta )}}}\leq z\right\},}$

can then be solved for ${\displaystyle \theta }$  to produce the Wilson score interval. The test in the middle of the inequality is a score test.

### Wilson score interval with continuity correction

The Wilson interval may be modified by employing a continuity correction, in order to align the minimum coverage probability, rather than the average probability, with the nominal value.

Just as the Wilson interval mirrors Pearson's chi-squared test, the Wilson interval with continuity correction mirrors the equivalent Yates' chi-squared test.

The following formulae for the lower and upper bounds of the Wilson score interval with continuity correction ${\displaystyle (w^{-},w^{+})}$  are derived from Newcombe (1998).[6]

${\displaystyle {\begin{aligned}w^{-}&=\max \left\{0,{\frac {2n{\hat {p}}+z^{2}-\left[z{\sqrt {z^{2}-{\frac {1}{n}}+4n{\hat {p}}(1-{\hat {p}})+(4{\hat {p}}-2)}}+1\right]}{2(n+z^{2})}}\right\}\\w^{+}&=\min \left\{1,{\frac {2n{\hat {p}}+z^{2}+\left[z{\sqrt {z^{2}-{\frac {1}{n}}+4n{\hat {p}}(1-{\hat {p}})-(4{\hat {p}}-2)}}+1\right]}{2(n+z^{2})}}\right\}\end{aligned}}}$

However, if ${\displaystyle {\hat {p}}=0}$ , ${\displaystyle w^{-}}$  must be taken as 0; if ${\displaystyle {\hat {p}}=1}$ , ${\displaystyle w^{+}}$  is then 1.
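The continuity-corrected bounds can be sketched in Python as follows. The function name `wilson_cc_interval` is illustrative, and the boundary cases are handled by testing the observed counts, which is an interpretation of the rule stated above.

```python
from math import sqrt
from statistics import NormalDist


def wilson_cc_interval(successes: int, trials: int, confidence: float = 0.95):
    """Wilson score interval with continuity correction (w-, w+)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n = trials
    p_hat = successes / n
    denom = 2.0 * (n + z * z)
    base = 2.0 * n * p_hat + z * z
    # Terms shared by both square roots.
    common = z * z - 1.0 / n + 4.0 * n * p_hat * (1.0 - p_hat)
    shift = 4.0 * p_hat - 2.0
    if successes == 0:
        lower = 0.0  # boundary case: no successes observed
    else:
        lower = max(0.0, (base - (z * sqrt(common + shift) + 1.0)) / denom)
    if successes == n:
        upper = 1.0  # boundary case: only successes observed
    else:
        upper = min(1.0, (base + (z * sqrt(common - shift) + 1.0)) / denom)
    return lower, upper


print(wilson_cc_interval(30, 100))
```

For the same data, the corrected interval is slightly wider than the uncorrected Wilson interval, which is the price of raising the minimum coverage.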

## Jeffreys interval

The Jeffreys interval has a Bayesian derivation, but it has good frequentist properties. In particular, it has coverage properties that are similar to those of the Wilson interval, but it is one of the few intervals with the advantage of being equal-tailed (e.g., for a 95% confidence interval, the probabilities of the interval lying above or below the true value are both close to 2.5%). In contrast, the Wilson interval has a systematic bias such that it is centred too close to p = 0.5.[7]

The Jeffreys interval is the Bayesian credible interval obtained when using the non-informative Jeffreys prior for the binomial proportion p. The Jeffreys prior for this problem is a Beta distribution with parameters (1/2, 1/2). After observing x successes in n trials, the posterior distribution for p is a Beta distribution with parameters (x + 1/2, n – x + 1/2).

When x ≠ 0 and x ≠ n, the Jeffreys interval is taken to be the 100(1 – α)% equal-tailed posterior probability interval, i.e., the α / 2 and 1 – α / 2 quantiles of a Beta distribution with parameters (x + 1/2, n – x + 1/2). These quantiles need to be computed numerically, although this is reasonably simple with modern statistical software.

In order to avoid the coverage probability tending to zero when p → 0 or 1, when x = 0 the upper limit is calculated as before but the lower limit is set to 0, and when x = n the lower limit is calculated as before but the upper limit is set to 1.[3]
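In practice the beta quantiles would normally come from a library routine such as SciPy's `scipy.stats.beta.ppf`. The sketch below instead uses only the Python standard library, to show one way the numerical computation can be done. It relies on the substitution p = sin²(u), which turns the Beta(x + 1/2, n – x + 1/2) density into a kernel proportional to sin(u)^(2x) · cos(u)^(2(n – x)); the kernel is integrated with Simpson's rule and the quantiles are found by bisection. The function names, the number of panels and the number of bisection steps are all illustrative assumptions.

```python
from math import asin, cos, pi, sin, sqrt


def _simpson(f, a, b, panels=2000):
    """Composite Simpson's rule on [a, b]; panels must be even."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for i in range(1, panels):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def jeffreys_interval(x: int, n: int, confidence: float = 0.95):
    """Equal-tailed interval from the Beta(x + 1/2, n - x + 1/2) posterior."""
    alpha = 1.0 - confidence

    def kernel(u):
        # Beta density after substituting p = sin(u)**2 (up to a constant).
        return sin(u) ** (2 * x) * cos(u) ** (2 * (n - x))

    norm = _simpson(kernel, 0.0, pi / 2.0)

    def cdf(p):
        return _simpson(kernel, 0.0, asin(sqrt(p))) / norm

    def quantile(q):
        lo, hi = 0.0, 1.0
        for _ in range(50):  # bisection on the increasing CDF
            mid = (lo + hi) / 2.0
            if cdf(mid) < q:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Boundary conventions: lower limit 0 when x = 0, upper limit 1 when x = n.
    lower = 0.0 if x == 0 else quantile(alpha / 2.0)
    upper = 1.0 if x == n else quantile(1.0 - alpha / 2.0)
    return lower, upper


print(jeffreys_interval(30, 100))
```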

## Clopper–Pearson interval

The Clopper–Pearson interval is an early and very common method for calculating binomial confidence intervals.[8] This is often called an 'exact' method, because it is based on the cumulative probabilities of the binomial distribution (i.e., exactly the correct distribution rather than an approximation). However, in cases where we know the population size, the intervals may not be the smallest possible, because they include impossible proportions: for instance, for a population of size 10, an interval of [0.35, 0.65] would be too large as the true proportion cannot lie between 0.35 and 0.4, or between 0.6 and 0.65.

The Clopper–Pearson interval can be written as

${\displaystyle S_{\leq }\cap S_{\geq }}$

or equivalently,

${\displaystyle (\inf S_{\geq }\,,\,\sup S_{\leq })}$

with

${\displaystyle S_{\leq }:=\left\{\theta \,\,{\Big |}\,\,P\left[\operatorname {Bin} \left(n;\theta \right)\leq x\right]>{\frac {\alpha }{2}}\right\}{\text{ and }}S_{\geq }:=\left\{\theta \,\,{\Big |}\,\,P\left[\operatorname {Bin} \left(n;\theta \right)\geq x\right]>{\frac {\alpha }{2}}\right\},}$

where 0 ≤ x ≤ n is the number of successes observed in the sample and Bin(n; θ) is a binomial random variable with n trials and probability of success θ.

Equivalently we can say that the Clopper–Pearson interval is ${\displaystyle (x/n-\varepsilon _{1},\ x/n+\varepsilon _{2})}$  with confidence level ${\displaystyle 1-\alpha }$  if ${\displaystyle \varepsilon _{i}}$  is the infimum of those such that the following tests of hypothesis succeed with significance ${\displaystyle \alpha /2}$ :

1. H0: ${\displaystyle \theta =x/n-\varepsilon _{1}}$  with HA: ${\displaystyle \theta >x/n-\varepsilon _{1}}$

2. H0: ${\displaystyle \theta =x/n+\varepsilon _{2}}$  with HA: ${\displaystyle \theta <x/n+\varepsilon _{2}}$ .

Because of a relationship between the binomial distribution and the beta distribution, the Clopper–Pearson interval is sometimes presented in an alternate format that uses quantiles from the beta distribution.

${\displaystyle B\left({\frac {\alpha }{2}};x,n-x+1\right)<\theta <B\left(1-{\frac {\alpha }{2}};x+1,n-x\right)}$

where x is the number of successes, n is the number of trials, and B(p; v,w) is the pth quantile from a beta distribution with shape parameters v and w.

When ${\displaystyle x}$  is either ${\displaystyle 0}$  or ${\displaystyle n}$ , closed-form expressions for the interval bounds are available: when ${\displaystyle x=0}$  the interval is ${\displaystyle \left(0,1-\left({\frac {\alpha }{2}}\right)^{1/n}\right)}$  and when ${\displaystyle x=n}$  it is ${\displaystyle \left(\left({\frac {\alpha }{2}}\right)^{1/n},1\right)}$ .[9]

The beta distribution is, in turn, related to the F-distribution so a third formulation of the Clopper–Pearson interval can be written using F quantiles:

${\displaystyle \left(1+{\frac {n-x+1}{x\,F\!\left[{\frac {\alpha }{2}};2x,2(n-x+1)\right]}}\right)^{-1}<\theta <\left(1+{\frac {n-x}{(x+1)\,\,F\!\left[1-{\frac {\alpha }{2}};2(x+1),2(n-x)\right]}}\right)^{-1}}$

where x is the number of successes, n is the number of trials, and F(c; d1, d2) is the c quantile from an F-distribution with d1 and d2 degrees of freedom.[10]
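The tail-probability definition can also be solved directly, without beta or F quantiles. The following Python sketch does this by bisection on the binomial cumulative probabilities; the helper names and the number of bisection steps are illustrative assumptions, and the direct summation is only practical for moderate n.

```python
from math import comb


def binom_cdf(k: int, n: int, theta: float) -> float:
    """P[Bin(n; theta) <= k] by direct summation."""
    return sum(
        comb(n, i) * theta**i * (1.0 - theta) ** (n - i) for i in range(k + 1)
    )


def _bisect(increasing_fn, target, iterations=60):
    """Find theta in [0, 1] with increasing_fn(theta) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if increasing_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(x: int, n: int, confidence: float = 0.95):
    alpha = 1.0 - confidence
    # Lower limit: P[Bin(n; theta) >= x] = alpha/2, increasing in theta.
    lower = 0.0 if x == 0 else _bisect(
        lambda t: 1.0 - binom_cdf(x - 1, n, t), alpha / 2.0
    )
    # Upper limit: P[Bin(n; theta) <= x] = alpha/2, i.e.
    # P[Bin(n; theta) > x] = 1 - alpha/2, which is increasing in theta.
    upper = 1.0 if x == n else _bisect(
        lambda t: 1.0 - binom_cdf(x, n, t), 1.0 - alpha / 2.0
    )
    return lower, upper


print(clopper_pearson(30, 100))
# With x = 0 the upper limit should agree with the closed form 1 - (alpha/2)^(1/n).
print(clopper_pearson(0, 10), 1.0 - 0.025 ** (1.0 / 10.0))
```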

Because it is based directly on the binomial distribution rather than on any approximation to it, the Clopper–Pearson interval never has less than the nominal coverage for any population proportion, but this also means that it is usually conservative. For example, the true coverage rate of a 95% Clopper–Pearson interval may be well above 95%, depending on n and θ. Thus the interval may be wider than it needs to be to achieve 95% confidence. In contrast, other intervals may have coverage below their nominal level: the normal approximation (or "standard") interval, Wilson interval,[5] Agresti–Coull interval,[10] etc., with a nominal coverage of 95% may in fact cover less than 95%.[3]

The definition of the Clopper–Pearson interval can also be modified to obtain exact confidence intervals for different distributions. For instance, it can also be applied to the case where the samples are drawn without replacement from a population of a known size, instead of repeated draws of a binomial distribution. In this case, the underlying distribution would be the hypergeometric distribution.

## Agresti–Coull interval

The Agresti–Coull interval is another approximate binomial confidence interval.[10]

Given ${\displaystyle X}$  successes in ${\displaystyle n}$  trials, define

${\displaystyle {\tilde {n}}=n+z^{2}}$

and

${\displaystyle {\tilde {p}}={\frac {1}{\tilde {n}}}\left(X+{\frac {z^{2}}{2}}\right)}$

Then, a confidence interval for ${\displaystyle p}$  is given by

${\displaystyle {\tilde {p}}\pm z{\sqrt {{\frac {\tilde {p}}{\tilde {n}}}\left(1-{\tilde {p}}\right)}}}$

where ${\displaystyle z}$  is the ${\displaystyle 1-{\frac {\alpha }{2}}}$  quantile of a standard normal distribution, as before. For example, for a 95% confidence interval, let ${\displaystyle \alpha =0.05}$ , so ${\displaystyle z=1.96}$  and ${\displaystyle z^{2}=3.84}$ . If we use 2 instead of 1.96 for ${\displaystyle z}$ , this is the "add 2 successes and 2 failures" interval in Agresti and Coull's 1998 paper "Approximate is Better than 'Exact' for Interval Estimation of Binomial Proportions." [10]

This interval can be summarised as employing the centre-point adjustment, ${\displaystyle {\tilde {p}}}$ , of the Wilson score interval, and then applying the Normal approximation to this point.[2][3]

${\displaystyle {\tilde {p}}={\frac {{\hat {p}}+{\frac {z^{2}}{2n}}}{1+{\frac {z^{2}}{n}}}}}$
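The definitions above can be sketched in Python as follows; the function name `agresti_coull_interval` is an illustrative choice, and the bounds are returned without any clipping.

```python
from math import sqrt
from statistics import NormalDist


def agresti_coull_interval(successes: int, trials: int, confidence: float = 0.95):
    """Normal approximation applied at the adjusted centre p_tilde."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    n_tilde = trials + z * z
    p_tilde = (successes + z * z / 2.0) / n_tilde
    half_width = z * sqrt(p_tilde * (1.0 - p_tilde) / n_tilde)
    return p_tilde - half_width, p_tilde + half_width


print(agresti_coull_interval(30, 100))
# The unclipped lower bound can fall below 0 for extreme data.
print(agresti_coull_interval(0, 10))
```

Because the normal approximation is symmetric about the adjusted centre, the raw bounds can extend outside [0, 1], as the second call shows; whether to truncate them is left to the user in this sketch.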

## Arcsine transformation

Let X be the number of successes in n trials and let p = X/n. The variance of p is

${\displaystyle \operatorname {var} (p)={\frac {p(1-p)}{n}}.}$

Using the arcsine transform, the variance of ${\displaystyle \arcsin({\sqrt {p}})}$  is[11]

${\displaystyle \operatorname {var} \left(\arcsin({\sqrt {p}})\right)\approx {\frac {\operatorname {var} (p)}{4p(1-p)}}={\frac {p(1-p)}{4np(1-p)}}={\frac {1}{4n}}.}$

So, the confidence interval itself has the following form:

${\displaystyle \sin ^{2}\left(\arcsin({\sqrt {p}})-{\frac {z}{2{\sqrt {n}}}}\right)<\theta <\sin ^{2}\left(\arcsin({\sqrt {p}})+{\frac {z}{2{\sqrt {n}}}}\right)}$

where ${\displaystyle z}$  is the ${\displaystyle \scriptstyle 1\,-\,{\frac {\alpha }{2}}}$  quantile of a standard normal distribution.

This method may be used to estimate the variance of p but its use is problematic when p is close to 0 or 1.
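A minimal Python sketch of the arcsine interval is given below. The function name is illustrative, and clamping the transformed limits to [0, π/2] before back-transforming is an assumption added here so that the squared sine stays monotone near the boundaries.

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist


def arcsine_interval(successes: int, trials: int, confidence: float = 0.95):
    """Interval built on the arcsine scale, where the variance is about 1/(4n)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    centre = asin(sqrt(successes / trials))
    half_width = z / (2.0 * sqrt(trials))
    # Assumption: clamp to [0, pi/2] so that sin(.)**2 maps back monotonically.
    low_angle = max(0.0, centre - half_width)
    high_angle = min(pi / 2.0, centre + half_width)
    return sin(low_angle) ** 2, sin(high_angle) ** 2


print(arcsine_interval(30, 100))
```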

## ${\displaystyle t_{a}}$ transform

Let p be the proportion of successes. For 0 ≤ a ≤ 2,

${\displaystyle t_{a}=\log \left({\frac {p^{a}}{(1-p)^{2-a}}}\right)=a\log(p)-(2-a)\log(1-p)}$

This family is a generalisation of the logit transform which is a special case with a = 1 and can be used to transform a proportional data distribution to an approximately normal distribution. The parameter a has to be estimated for the data set.

## Special cases

The rule of three is used to provide a simple way of stating an approximate 95% confidence interval for p, in the special case that no successes (${\displaystyle {\hat {p}}=0}$ ) have been observed.[12] The interval is (0,3/n).

By symmetry, when only successes (${\displaystyle {\hat {p}}=1}$ ) have been observed, the corresponding interval is (1 − 3/n,1).
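One way to see where the factor 3 comes from, sketched here under the assumption that the interval is read as a one-sided 95% bound: if the true success probability is p, the probability of observing no successes in n trials is ${\displaystyle (1-p)^{n}}$ . Setting this equal to 0.05 and solving for p gives

${\displaystyle (1-p)^{n}=0.05\quad \Longrightarrow \quad p=1-0.05^{1/n}\approx {\frac {-\ln 0.05}{n}}\approx {\frac {2.996}{n}}\approx {\frac {3}{n}},}$

where the first approximation uses ${\displaystyle \ln(1-p)\approx -p}$  for small p.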

## Comparison of different intervals

There are several research papers that compare these and other confidence intervals for the binomial proportion.[2][6][13][14] Both Agresti and Coull (1998)[10] and Ross (2003)[15] point out that exact methods such as the Clopper–Pearson interval may not work as well as certain approximations.

Of the approximations listed above, Wilson score interval methods (with or without continuity correction) have been shown to be the most accurate and the most robust.[2][3][6]

Many of these intervals can be calculated in R using packages such as "binom".

## References

1. Sullivan, Lisa (2017-10-27). "Confidence Intervals". Boston University School of Public Health.
2. Wallis, Sean A. (2013). "Binomial confidence intervals and contingency tests: mathematical fundamentals and the evaluation of alternative methods" (PDF). Journal of Quantitative Linguistics. 20 (3): 178–208. doi:10.1080/09296174.2013.799918.
3. Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001). "Interval Estimation for a Binomial Proportion". Statistical Science. 16 (2): 101–133. CiteSeerX 10.1.1.50.3025. doi:10.1214/ss/1009213286. MR 1861069. Zbl 1059.62533.
4. Laplace, Pierre Simon (1812). Théorie analytique des probabilités (in French). p. 283.
5. Wilson, E. B. (1927). "Probable inference, the law of succession, and statistical inference". Journal of the American Statistical Association. 22 (158): 209–212. doi:10.1080/01621459.1927.10502953. JSTOR 2276774.
6. Newcombe, R. G. (1998). "Two-sided confidence intervals for the single proportion: comparison of seven methods". Statistics in Medicine. 17 (8): 857–872. doi:10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E. PMID 9595616.
7. Cai, T. T. (2005). "One-sided confidence intervals in discrete distributions". Journal of Statistical Planning and Inference. 131: 63–88. doi:10.1016/j.jspi.2004.01.005.
8. Clopper, C.; Pearson, E. S. (1934). "The use of confidence or fiducial limits illustrated in the case of the binomial". Biometrika. 26 (4): 404–413. doi:10.1093/biomet/26.4.404.
9. Thulin, Måns (2014-01-01). "The cost of using exact confidence intervals for a binomial proportion". Electronic Journal of Statistics. 8 (1): 817–840. arXiv:1303.1288. doi:10.1214/14-EJS909. ISSN 1935-7524.
10. Agresti, Alan; Coull, Brent A. (1998). "Approximate is better than 'exact' for interval estimation of binomial proportions". The American Statistician. 52 (2): 119–126. doi:10.2307/2685469. JSTOR 2685469. MR 1628435.
11. Shao, J. (1998). Mathematical statistics. Springer. New York, New York, USA.
12. Simon, Steve (2010). "Confidence interval with zero events". The Children's Mercy Hospital, Kansas City, Mo. (Website: Ask Professor Mean at Stats topics or Medical Research; archived October 15, 2011, at the Wayback Machine.)
13. Reiczigel, J. (2003). "Confidence intervals for the binomial parameter: some new considerations" (PDF). Statistics in Medicine. 22 (4): 611–621. doi:10.1002/sim.1320.
14. Sauro, J.; Lewis, J. R. (2005). "Comparison of Wald, Adj-Wald, Exact and Wilson intervals Calculator". Archived 2012-06-18 at the Wayback Machine. Proceedings of the Human Factors and Ergonomics Society, 49th Annual Meeting (HFES 2005), Orlando, FL, pp. 2100–2104.
15. Ross, T. D. (2003). "Accurate confidence intervals for binomial proportion and Poisson rate estimation". Computers in Biology and Medicine. 33 (6): 509–531. doi:10.1016/S0010-4825(03)00019-2.