Cramér–Rao bound

In estimation theory and statistics, the Cramér–Rao bound (CRB), Cramér–Rao lower bound (CRLB), Cramér–Rao inequality, Fréchet–Darmois–Cramér–Rao inequality, or information inequality expresses a lower bound on the variance of unbiased estimators of a deterministic (fixed, though unknown) parameter. The bound is named in honor of Harald Cramér,[1] Calyampudi Radhakrishna Rao,[2][3] Maurice Fréchet[4] and Georges Darmois,[5] all of whom independently derived this limit to statistical precision in the 1940s.[6][7]

In its simplest form, the bound states that the variance of any unbiased estimator is at least as high as the inverse of the Fisher information. An unbiased estimator which achieves this lower bound is said to be (fully) efficient. Such an estimator achieves the lowest possible mean squared error among all unbiased estimators, and is therefore the minimum variance unbiased (MVU) estimator. However, in some cases, no unbiased estimator exists which achieves the bound. This may occur either if for any unbiased estimator, there exists another with a strictly smaller variance, or if an MVU estimator exists, but its variance is strictly greater than the inverse of the Fisher information.

The Cramér–Rao bound can also be used to bound the variance of biased estimators of given bias. In some cases, a biased estimator can have both a variance and a mean squared error that are below the unbiased Cramér–Rao lower bound; see estimator bias.


The Cramér–Rao bound is stated in this section for several increasingly general cases, beginning with the case in which the parameter is a scalar and its estimator is unbiased. All versions of the bound require certain regularity conditions, which hold for most well-behaved distributions. These conditions are listed later in this section.

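Scalar unbiased case

Suppose $\theta$ is an unknown deterministic parameter which is to be estimated from $n$ independent observations (measurements) of $x$, each distributed according to some probability density function $f(x;\theta)$. The variance of any unbiased estimator $\hat{\theta}$ of $\theta$ is then bounded by the reciprocal of the Fisher information $I(\theta)$:

$$\operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$

where the Fisher information $I(\theta)$ is defined by

$$I(\theta) = n \operatorname{E}_{\theta}\left[\left(\frac{\partial \ell(x;\theta)}{\partial\theta}\right)^{2}\right]$$

and $\ell(x;\theta) = \log f(x;\theta)$ is the natural logarithm of the likelihood function for a single observation $x$ and $\operatorname{E}_{\theta}$ denotes the expected value (over $x$).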

The efficiency of an unbiased estimator $\hat{\theta}$ measures how close this estimator's variance comes to this lower bound; estimator efficiency is defined as

$$e(\hat{\theta}) = \frac{I(\theta)^{-1}}{\operatorname{var}(\hat{\theta})}$$

or the minimum possible variance for an unbiased estimator divided by its actual variance. The Cramér–Rao lower bound thus gives

$$e(\hat{\theta}) \leq 1.$$
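
The definitions above can be checked on a small model. The following is a minimal sketch, assuming a Bernoulli($p$) model (a probability mass function, so the expectation is a sum rather than an integral) and the sample mean as the estimator; the function names are illustrative and do not come from the article.

```python
from fractions import Fraction

def bernoulli_fisher_information(p, n):
    """Fisher information about p in n independent Bernoulli(p) observations."""
    # Score of one observation: d/dp log f(x; p), with f(1; p) = p, f(0; p) = 1 - p.
    score = {1: 1 / p, 0: -1 / (1 - p)}
    prob = {1: p, 0: 1 - p}
    # Expected squared score of one observation, multiplied by n.
    return n * sum(prob[x] * score[x] ** 2 for x in (0, 1))

def efficiency(fisher_information, estimator_variance):
    """e = (1 / I) / var, which is at most 1 for an unbiased estimator."""
    return (1 / fisher_information) / estimator_variance

p, n = Fraction(3, 10), 20
info = bernoulli_fisher_information(p, n)
crb = 1 / info
var_sample_mean = p * (1 - p) / n   # exact variance of the sample mean
print(info, crb, efficiency(info, var_sample_mean))
```

Exact rational arithmetic is used so that the comparison between the bound and the variance of the sample mean is not blurred by rounding; in this assumed model the two coincide, so the sample mean is efficient.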


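General scalar case

A more general form of the bound can be obtained by considering a biased estimator $T(X)$, whose expectation is not $\theta$ but a function of this parameter, say, $\psi(\theta)$. Hence $\operatorname{E}\{T(X)\} - \theta = \psi(\theta) - \theta$ is not generally equal to 0. In this case, the bound is given by

$$\operatorname{var}(T) \geq \frac{[\psi'(\theta)]^{2}}{I(\theta)}$$

where $\psi'(\theta)$ is the derivative of $\psi(\theta)$ (by $\theta$), and $I(\theta)$ is the Fisher information defined above.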
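
As a worked illustration (the exponential model is an assumption chosen here, not part of the statement above), take $n$ independent observations with density $f(x;\theta) = \theta e^{-\theta x}$ and let $T$ be the sample mean $\bar{x}$, whose expectation is $\psi(\theta) = 1/\theta$:

```latex
\begin{aligned}
\ell(x;\theta) &= \log\theta - \theta x, \qquad
\frac{\partial \ell}{\partial\theta} = \frac{1}{\theta} - x, \\
I(\theta) &= n \operatorname{E}\left[\left(\frac{1}{\theta} - x\right)^{2}\right]
           = n \operatorname{var}(x) = \frac{n}{\theta^{2}}, \\
\psi'(\theta) &= -\frac{1}{\theta^{2}}, \\
\frac{[\psi'(\theta)]^{2}}{I(\theta)} &= \frac{1/\theta^{4}}{n/\theta^{2}}
           = \frac{1}{n\theta^{2}} = \operatorname{var}(\bar{x}).
\end{aligned}
```

In this case the sample mean attains the general scalar bound with equality.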

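Bound on the variance of biased estimators

Apart from being a bound on estimators of functions of the parameter, this approach can be used to derive a bound on the variance of biased estimators with a given bias, as follows. Consider an estimator $\hat{\theta}$ with bias $b(\theta) = \operatorname{E}\{\hat{\theta}\} - \theta$, and let $\psi(\theta) = b(\theta) + \theta$. By the result above, any estimator whose expectation is $\psi(\theta)$ has variance greater than or equal to $[\psi'(\theta)]^{2}/I(\theta)$. Thus, any estimator $\hat{\theta}$ whose bias is given by a function $b(\theta)$ satisfies

$$\operatorname{var}(\hat{\theta}) \geq \frac{[1 + b'(\theta)]^{2}}{I(\theta)}.$$

The unbiased version of the bound is a special case of this result, with $b(\theta) = 0$.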

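It is trivial to have a small variance: an "estimator" that is constant has a variance of zero. But from the above equation we find that the mean squared error of a biased estimator is bounded by

$$\operatorname{E}\left((\hat{\theta} - \theta)^{2}\right) \geq \frac{[1 + b'(\theta)]^{2}}{I(\theta)} + b(\theta)^{2},$$

using the standard decomposition of the MSE into variance plus squared bias. Note, however, that if $1 + b'(\theta) < 1$ this bound might be less than the unbiased Cramér–Rao bound $1/I(\theta)$. For instance, in the example of estimating variance below, $1 + b'(\theta) = \frac{n}{n+2}$.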
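
The two bounds can be evaluated side by side. The sketch below is illustrative only: the function names are hypothetical, and the numbers assume the normal-variance setting with known mean, where $\theta = \sigma^{2}$, $I(\theta) = n/(2\theta^{2})$, and the estimator divides the sum of squares by $n + 2$, so that $b(\theta) = -2\theta/(n+2)$.

```python
from fractions import Fraction

def biased_variance_bound(fisher_information, bias_derivative):
    """Lower bound [1 + b'(theta)]^2 / I(theta) on the variance."""
    return (1 + bias_derivative) ** 2 / fisher_information

def biased_mse_bound(fisher_information, bias, bias_derivative):
    """Variance bound plus squared bias (standard MSE decomposition)."""
    return biased_variance_bound(fisher_information, bias_derivative) + bias ** 2

# Assumed values: n observations, true variance theta = 3.
n, theta = 10, Fraction(3)
info = Fraction(n) / (2 * theta ** 2)      # I(theta) = n / (2 theta^2)
bias = -2 * theta / (n + 2)                # b(theta) = -2 theta / (n + 2)
bias_derivative = Fraction(-2, n + 2)      # b'(theta)

unbiased_bound = 1 / info
mse_bound = biased_mse_bound(info, bias, bias_derivative)
print(unbiased_bound, mse_bound)
```

With these assumed values the MSE bound for the biased estimator lies below the unbiased bound $1/I(\theta)$, which is the situation described in the preceding paragraph.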

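Multivariate case

Extending the Cramér–Rao bound to multiple parameters, define a parameter column vector

$$\boldsymbol{\theta} = \left[\theta_{1}, \theta_{2}, \dots, \theta_{d}\right]^{T} \in \mathbb{R}^{d}$$

with probability density function $f(x;\boldsymbol{\theta})$ which satisfies the two regularity conditions below.

The Fisher information matrix is a $d \times d$ matrix with element $I_{m,k}$ defined as

$$I_{m,k} = \operatorname{E}\left[\frac{\partial}{\partial\theta_{m}} \log f(x;\boldsymbol{\theta}) \, \frac{\partial}{\partial\theta_{k}} \log f(x;\boldsymbol{\theta})\right].$$

Let $\boldsymbol{T}(X)$ be an estimator of any vector function of parameters, $\boldsymbol{T}(X) = (T_{1}(X), \ldots, T_{d}(X))^{T}$, and denote its expectation vector $\operatorname{E}[\boldsymbol{T}(X)]$ by $\boldsymbol{\psi}(\boldsymbol{\theta})$. The Cramér–Rao bound then states that the covariance matrix of $\boldsymbol{T}(X)$ satisfies

$$\operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq \frac{\partial \boldsymbol{\psi}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \left[I(\boldsymbol{\theta})\right]^{-1} \left(\frac{\partial \boldsymbol{\psi}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}}\right)^{T}$$

where

  • The matrix inequality $A \geq B$ is understood to mean that the matrix $A - B$ is positive semidefinite, and
  • $\partial \boldsymbol{\psi}(\boldsymbol{\theta}) / \partial \boldsymbol{\theta}$ is the Jacobian matrix whose $(i, j)$ element is given by $\partial \psi_{i}(\boldsymbol{\theta}) / \partial \theta_{j}$.

If $\boldsymbol{T}(X)$ is an unbiased estimator of $\boldsymbol{\theta}$ (i.e., $\boldsymbol{\psi}(\boldsymbol{\theta}) = \boldsymbol{\theta}$), then the Cramér–Rao bound reduces to

$$\operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right) \geq I(\boldsymbol{\theta})^{-1}.$$

If it is inconvenient to compute the inverse of the Fisher information matrix, then one can simply take the reciprocal of the corresponding diagonal element to find a (possibly loose) lower bound.[8]

$$\operatorname{var}_{\boldsymbol{\theta}}\left(T_{m}(X)\right) = \left[\operatorname{cov}_{\boldsymbol{\theta}}\left(\boldsymbol{T}(X)\right)\right]_{mm} \geq \left[I(\boldsymbol{\theta})^{-1}\right]_{mm} \geq \left(\left[I(\boldsymbol{\theta})\right]_{mm}\right)^{-1}.$$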
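
The gap between the full bound and the diagonal shortcut can be seen on a small matrix. This is a minimal sketch using a made-up $2 \times 2$ Fisher information matrix (the entries are hypothetical, chosen only so that the parameters are coupled); it compares the diagonal of the inverse with the reciprocals of the diagonal.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical Fisher information matrix for two coupled parameters.
fisher = [[Fraction(4), Fraction(2)],
          [Fraction(2), Fraction(3)]]

crb = inverse_2x2(fisher)
tight = [crb[m][m] for m in range(2)]        # [I^{-1}]_{mm}
loose = [1 / fisher[m][m] for m in range(2)]  # 1 / [I]_{mm}
print(tight, loose)
```

The reciprocal-of-diagonal values are valid lower bounds on each variance but are smaller than the corresponding entries of the inverse whenever the off-diagonal coupling is nonzero.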


Regularity conditions

The bound relies on two weak regularity conditions on the probability density function, $f(x;\theta)$, and the estimator $T(X)$:

  • The Fisher information is always defined; equivalently, for all $x$ such that $f(x;\theta) > 0$,
    $$\frac{\partial}{\partial\theta} \log f(x;\theta)$$
    exists, and is finite.
  • The operations of integration with respect to $x$ and differentiation with respect to $\theta$ can be interchanged in the expectation of $T$; that is,
    $$\frac{\partial}{\partial\theta} \left[\int T(x) f(x;\theta) \, dx\right] = \int T(x) \left[\frac{\partial}{\partial\theta} f(x;\theta)\right] dx$$
    whenever the right-hand side is finite.
    This condition can often be confirmed by using the fact that integration and differentiation can be swapped when either of the following cases hold:
  1. The function $f(x;\theta)$ has bounded support in $x$, and the bounds do not depend on $\theta$;
  2. The function $f(x;\theta)$ has infinite support, is continuously differentiable, and the integral converges uniformly for all $\theta$.
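
To see why the support must not depend on the parameter, consider (as an illustrative assumption, not an example from the text above) the uniform density $f(x;\theta) = 1/\theta$ on $[0, \theta]$ with $T(x) = 1$. The two sides of the interchange condition then disagree:

```latex
\begin{aligned}
\frac{\partial}{\partial\theta} \int_{0}^{\theta} \frac{1}{\theta} \, dx
  &= \frac{\partial}{\partial\theta} (1) = 0, \\
\int_{0}^{\theta} \frac{\partial}{\partial\theta} \left(\frac{1}{\theta}\right) dx
  &= \int_{0}^{\theta} -\frac{1}{\theta^{2}} \, dx = -\frac{1}{\theta} \neq 0.
\end{aligned}
```

The second regularity condition therefore fails for this family, and the bound need not hold for it.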

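Simplified form of the Fisher information

Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of $f(x;\theta)$ as well, i.e.,

$$\frac{\partial^{2}}{\partial\theta^{2}} \left[\int T(x) f(x;\theta) \, dx\right] = \int T(x) \left[\frac{\partial^{2}}{\partial\theta^{2}} f(x;\theta)\right] dx.$$

In this case, it can be shown that the Fisher information equals

$$I(\theta) = -\operatorname{E}\left[\frac{\partial^{2}}{\partial\theta^{2}} \log f(X;\theta)\right],$$

where $f(x;\theta)$ is the density of the observed data $X$.

The Cramér–Rao bound can then be written as

$$\operatorname{var}(\hat{\theta}) \geq \frac{1}{I(\theta)} = \frac{1}{-\operatorname{E}\left[\frac{\partial^{2}}{\partial\theta^{2}} \log f(X;\theta)\right]}.$$

In some cases, this formula gives a more convenient technique for evaluating the bound.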
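
The equality of the two forms of the Fisher information can be verified exactly on a discrete model. The sketch below assumes a single Binomial($m$, $p$) observation (a probability mass function, so expectations are finite sums); the helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, m, p):
    """P(X = x) for X ~ Binomial(m, p)."""
    return comb(m, x) * p ** x * (1 - p) ** (m - x)

def info_from_squared_score(m, p):
    """E[(d/dp log f)^2] for one Binomial(m, p) observation."""
    return sum(binomial_pmf(x, m, p) * (x / p - (m - x) / (1 - p)) ** 2
               for x in range(m + 1))

def info_from_second_derivative(m, p):
    """-E[d^2/dp^2 log f] for the same observation."""
    return -sum(binomial_pmf(x, m, p) * (-x / p ** 2 - (m - x) / (1 - p) ** 2)
                for x in range(m + 1))

m, p = 5, Fraction(1, 4)
a = info_from_squared_score(m, p)
b = info_from_second_derivative(m, p)
print(a, b, 1 / b)   # both forms of I(p), then the resulting bound
```

Both routes give $m / (p(1-p))$ for this assumed model; the second-derivative form avoids squaring the score, which is often the simpler computation by hand.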

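Single-parameter proof

The following is a proof of the general scalar case of the Cramér–Rao bound described above. Assume that $T = t(X)$ is an estimator with expectation $\psi(\theta)$ (based on the observations $X$), i.e. that $\operatorname{E}(T) = \psi(\theta)$. The goal is to prove that, for all $\theta$,

$$\operatorname{var}(t(X)) \geq \frac{[\psi'(\theta)]^{2}}{I(\theta)}.$$

Let $X$ be a random variable with probability density function $f(x;\theta)$. Here $T = t(X)$ is a statistic, which is used as an estimator for $\psi(\theta)$. Define $V$ as the score:

$$V = \frac{\partial}{\partial\theta} \ln f(X;\theta) = \frac{1}{f(X;\theta)} \frac{\partial}{\partial\theta} f(X;\theta)$$

where the chain rule is used in the final equality above. Then the expectation of $V$, written $\operatorname{E}(V)$, is zero. This is because:

$$\operatorname{E}(V) = \int f(x;\theta) \left[\frac{1}{f(x;\theta)} \frac{\partial}{\partial\theta} f(x;\theta)\right] dx = \frac{\partial}{\partial\theta} \int f(x;\theta) \, dx = 0$$

where the integral and partial derivative have been interchanged (justified by the second regularity condition).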

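If we consider the covariance $\operatorname{cov}(V, T)$ of $V$ and $T$, we have $\operatorname{cov}(V, T) = \operatorname{E}(VT)$, because $\operatorname{E}(V) = 0$. Expanding this expression we have

$$\operatorname{cov}(V, T) = \operatorname{E}\left(T \cdot \left[\frac{1}{f(X;\theta)} \frac{\partial}{\partial\theta} f(X;\theta)\right]\right) = \int t(x) \left[\frac{\partial}{\partial\theta} f(x;\theta)\right] dx = \frac{\partial}{\partial\theta} \left[\int t(x) f(x;\theta) \, dx\right] = \psi'(\theta)$$

again because the integration and differentiation operations commute (second condition).

The Cauchy–Schwarz inequality shows that

$$\sqrt{\operatorname{var}(T) \operatorname{var}(V)} \geq \left|\operatorname{cov}(V, T)\right| = \left|\psi'(\theta)\right|.$$

Since $\operatorname{E}(V) = 0$, the variance of the score is $\operatorname{var}(V) = \operatorname{E}(V^{2}) = I(\theta)$, and therefore

$$\operatorname{var}(T) \geq \frac{[\psi'(\theta)]^{2}}{\operatorname{var}(V)} = \frac{[\psi'(\theta)]^{2}}{I(\theta)},$$

which proves the proposition.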
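
Each step of the proof can be checked exactly on a small discrete model. The following sketch assumes $X \sim$ Binomial($m$, $p$) and a hypothetical statistic $t(X) = (X/m)^{2}$, whose expectation is $\psi(p) = p^{2} + p(1-p)/m$; sums over the support stand in for the integrals.

```python
from fractions import Fraction
from math import comb

def expect(g, m, p):
    """E[g(X)] for X ~ Binomial(m, p), by exact summation."""
    return sum(comb(m, x) * p ** x * (1 - p) ** (m - x) * g(x)
               for x in range(m + 1))

m, p = 4, Fraction(1, 3)

def score(x):
    """V = d/dp log f(x; p)."""
    return x / p - (m - x) / (1 - p)

def t(x):
    """Hypothetical statistic T = t(X) = (X / m)^2."""
    return Fraction(x, m) ** 2

mean_v = expect(score, m, p)                        # E(V), expected to vanish
cov_vt = expect(lambda x: score(x) * t(x), m, p)    # E(VT) = cov(V, T)
info = expect(lambda x: score(x) ** 2, m, p)        # var(V) = I(p)
mean_t = expect(t, m, p)
var_t = expect(lambda x: t(x) ** 2, m, p) - mean_t ** 2

# psi(p) = p^2 + p(1 - p)/m, hence psi'(p) = 2p + (1 - 2p)/m.
psi_prime = 2 * p + (1 - 2 * p) / m
print(mean_v, cov_vt, psi_prime, var_t, psi_prime ** 2 / info)
```

The covariance equals $\psi'(p)$ as the proof asserts, and the variance of this particular statistic exceeds the bound, so the inequality is strict here.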


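Multivariate normal distribution

For the case of a d-variate normal distribution

$$\boldsymbol{x} \sim \mathcal{N}_{d}\left(\boldsymbol{\mu}(\boldsymbol{\theta}), \boldsymbol{C}(\boldsymbol{\theta})\right)$$

the Fisher information matrix has elements[9]

$$I_{m,k} = \frac{\partial \boldsymbol{\mu}^{T}}{\partial\theta_{m}} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{\mu}}{\partial\theta_{k}} + \frac{1}{2} \operatorname{tr}\left(\boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial\theta_{m}} \boldsymbol{C}^{-1} \frac{\partial \boldsymbol{C}}{\partial\theta_{k}}\right)$$

where "tr" is the trace.

For example, let $w[n]$ be a sample of $N$ independent observations with unknown mean $\theta$ and known variance $\sigma^{2}$:

$$w[n] \sim \mathcal{N}_{N}\left(\theta \boldsymbol{1}, \sigma^{2} \boldsymbol{I}_{N}\right),$$

where $\boldsymbol{1}$ is the all-ones vector and $\boldsymbol{I}_{N}$ is the $N \times N$ identity matrix.

Then the Fisher information is a scalar given by

$$I(\theta) = \left(\frac{\partial \boldsymbol{\mu}(\theta)}{\partial\theta}\right)^{T} \boldsymbol{C}^{-1} \left(\frac{\partial \boldsymbol{\mu}(\theta)}{\partial\theta}\right) = \sum_{i=1}^{N} \frac{1}{\sigma^{2}} = \frac{N}{\sigma^{2}},$$

and so the Cramér–Rao bound is

$$\operatorname{var}(\hat{\theta}) \geq \frac{\sigma^{2}}{N}.$$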
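
The trace term of the general Gaussian formula can be exercised in the same way. As a sketch under the assumption that the mean is known (so the derivative of $\boldsymbol{\mu}$ vanishes) and the unknown parameter is the variance, $\theta = \sigma^{2}$ with $\boldsymbol{C}(\theta) = \theta \boldsymbol{I}_{N}$:

```latex
\begin{aligned}
\frac{\partial \boldsymbol{C}}{\partial\theta} &= \boldsymbol{I}_{N}, \qquad
\boldsymbol{C}^{-1} = \frac{1}{\theta} \boldsymbol{I}_{N}, \\
I(\theta) &= \frac{1}{2} \operatorname{tr}\left(\frac{1}{\theta} \boldsymbol{I}_{N} \cdot \boldsymbol{I}_{N} \cdot \frac{1}{\theta} \boldsymbol{I}_{N} \cdot \boldsymbol{I}_{N}\right)
           = \frac{N}{2\theta^{2}}, \\
\operatorname{var}(\hat{\theta}) &\geq \frac{2\theta^{2}}{N} = \frac{2(\sigma^{2})^{2}}{N}.
\end{aligned}
```

This agrees with the value obtained by direct differentiation of the log-likelihood for a normal variance with known mean.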


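Normal variance with known mean

Suppose X is a normally distributed random variable with known mean $\mu$ and unknown variance $\sigma^{2}$, and let $X_{1}, \ldots, X_{n}$ be $n$ independent observations of it. Consider the following statistic:

$$T = \frac{\sum_{i=1}^{n} (X_{i} - \mu)^{2}}{n}.$$

Then T is unbiased for $\sigma^{2}$, as $\operatorname{E}(T) = \sigma^{2}$. What is the variance of T?

$$\operatorname{var}(T) = \frac{\operatorname{var}\left\{(X - \mu)^{2}\right\}}{n} = \frac{1}{n} \left[\operatorname{E}\left\{(X - \mu)^{4}\right\} - \left(\operatorname{E}\left\{(X - \mu)^{2}\right\}\right)^{2}\right]$$

(the second equality follows directly from the definition of variance). The first term is the fourth moment about the mean and has value $3(\sigma^{2})^{2}$; the second is the square of the variance, or $(\sigma^{2})^{2}$. Thus

$$\operatorname{var}(T) = \frac{2(\sigma^{2})^{2}}{n}.$$

Now, what is the Fisher information in the sample? Recall that the score V is defined as

$$V = \frac{\partial}{\partial\sigma^{2}} \log L(\sigma^{2}, X)$$

where $L$ is the likelihood function. Thus in this case,

$$V = \frac{\partial}{\partial\sigma^{2}} \log\left[\frac{1}{\sqrt{2\pi\sigma^{2}}} e^{-(X - \mu)^{2} / 2\sigma^{2}}\right] = \frac{(X - \mu)^{2}}{2(\sigma^{2})^{2}} - \frac{1}{2\sigma^{2}}$$

where the second equality is from elementary calculus. Thus, the information in a single observation is just minus the expectation of the derivative of V, or

$$I_{1} = -\operatorname{E}\left(\frac{\partial V}{\partial\sigma^{2}}\right) = -\operatorname{E}\left(-\frac{(X - \mu)^{2}}{(\sigma^{2})^{3}} + \frac{1}{2(\sigma^{2})^{2}}\right) = \frac{\sigma^{2}}{(\sigma^{2})^{3}} - \frac{1}{2(\sigma^{2})^{2}} = \frac{1}{2(\sigma^{2})^{2}}.$$

Thus the information in a sample of $n$ independent observations is just $n$ times this, or $I = \frac{n}{2(\sigma^{2})^{2}}$.

The Cramér–Rao bound states that

$$\operatorname{var}(T) \geq \frac{1}{I} = \frac{2(\sigma^{2})^{2}}{n}.$$

In this case, the inequality is saturated (equality is achieved), showing that the estimator is efficient.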

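However, we can achieve a lower mean squared error using a biased estimator. The estimator

$$T = \frac{\sum_{i=1}^{n} (X_{i} - \mu)^{2}}{n + 2}$$

obviously has a smaller variance, which is in fact

$$\operatorname{var}(T) = \frac{2n(\sigma^{2})^{2}}{(n + 2)^{2}}.$$

Its bias is

$$\operatorname{E}(T) - \sigma^{2} = \left(\frac{n}{n + 2} - 1\right) \sigma^{2} = -\frac{2\sigma^{2}}{n + 2}$$

so its mean squared error is

$$\operatorname{MSE}(T) = \left(\frac{2n}{(n + 2)^{2}} + \frac{4}{(n + 2)^{2}}\right) (\sigma^{2})^{2} = \frac{2(\sigma^{2})^{2}}{n + 2}$$

which is clearly less than the Cramér–Rao bound found above.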
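
The choice of divisor can be explored numerically. The sketch below is illustrative: it assumes normal data with known mean, uses the variance $2n\sigma^{4}/c^{2}$ and bias $(n/c - 1)\sigma^{2}$ of the estimator that divides the sum of squares by a hypothetical divisor $c$, and searches over integer divisors for the smallest mean squared error.

```python
from fractions import Fraction

def mse_of_divisor(c, n, sigma2=Fraction(1)):
    """MSE of sum((X_i - mu)^2) / c as an estimator of sigma2."""
    variance = 2 * n * sigma2 ** 2 / Fraction(c) ** 2   # 2 n sigma^4 / c^2
    bias = (Fraction(n, c) - 1) * sigma2                # E[T] - sigma^2
    return variance + bias ** 2

n = 10
crb = Fraction(2, n)   # unbiased bound 2 sigma^4 / n with sigma2 = 1

# Integer divisor with the smallest MSE over an arbitrary search range.
best_c = min(range(1, 4 * n), key=lambda c: mse_of_divisor(c, n))
print(best_c, mse_of_divisor(n, n), mse_of_divisor(n + 2, n), crb)
```

For $n = 10$ the search selects the divisor $n + 2$, whose mean squared error is below the unbiased Cramér–Rao bound, while the divisor $n$ meets the bound exactly.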

When the mean is not known, the minimum mean squared error estimate of the variance of a sample from a Gaussian distribution is achieved by dividing by n + 1, rather than n − 1 or n + 2.

References and notes

  1. Cramér, Harald (1946). Mathematical Methods of Statistics. Princeton, NJ: Princeton Univ. Press. ISBN 0-691-08004-6. OCLC 185436716.
  2. Rao, Calyampudi Radhakrishna (1945). "Information and the accuracy attainable in the estimation of statistical parameters". Bulletin of the Calcutta Mathematical Society. 37: 81–89. MR 0015748.
  3. Rao, Calyampudi Radhakrishna (1994). S. Das Gupta (ed.). Selected Papers of C. R. Rao. New York: Wiley. ISBN 978-0-470-22091-7. OCLC 174244259.
  4. Fréchet, Maurice (1943). "Sur l'extension de certaines évaluations statistiques au cas de petits échantillons". Rev. Inst. Int. Statist. 11: 182–205.
  5. Darmois, Georges (1945). "Sur les limites de la dispersion de certaines estimations". Rev. Inst. Int. Statist. 13: 9–15.
  6. Gart, John J. (1958). "An extension of the Cramér–Rao inequality". Ann. Math. Stat. 29: 367–380.
  7. Malécot, Gustave (1999). "Statistical methods and the subjective basis of scientific knowledge" [translated from Année X 1947 by Daniel Gianola]. Genet. Sel. Evol. 31: 269–298.
  8. For the Bayesian case, see eqn. (11) of Bobrovsky; Mayer-Wolf; Zakai (1987). "Some classes of global Cramer–Rao bounds". Ann. Stat. 15 (4): 1421–38.
  9. Kay, S. M. (1993). Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall. p. 47. ISBN 0-13-042268-1.

External links

  • FandPLimitTool: GUI-based software for calculating the Fisher information and Cramér–Rao lower bound, with application to single-molecule microscopy.