In Bayesian probability, the Jeffreys prior, named after Sir Harold Jeffreys, is a non-informative (objective) prior distribution for a parameter space; it is proportional to the square root of the determinant of the Fisher information matrix:

$$p(\vec\theta) \propto \sqrt{\det I(\vec\theta)}.$$

It has the key feature that its functional dependence on the likelihood is invariant under reparameterization of the parameter vector (the functional form of the prior density function itself is not invariant under reparameterization, of course: only the measure that is identically zero has that property; see below). This makes it of special interest for use with scale parameters.[1]

Reparameterization

One-parameter case

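For an alternative parameterization $\varphi$ we can derive

$$p(\varphi) \propto \sqrt{I(\varphi)}$$

from

$$p(\theta) \propto \sqrt{I(\theta)}$$

using the change of variables theorem for transformations and the definition of Fisher information:

$$
\begin{aligned}
p(\varphi) &= p(\theta)\left|\frac{d\theta}{d\varphi}\right|
\propto \sqrt{I(\theta)\left(\frac{d\theta}{d\varphi}\right)^2}
= \sqrt{\operatorname{E}\!\left[\left(\frac{d\ln L}{d\theta}\right)^2\right]\left(\frac{d\theta}{d\varphi}\right)^2} \\
&= \sqrt{\operatorname{E}\!\left[\left(\frac{d\ln L}{d\theta}\,\frac{d\theta}{d\varphi}\right)^2\right]}
= \sqrt{\operatorname{E}\!\left[\left(\frac{d\ln L}{d\varphi}\right)^2\right]}
= \sqrt{I(\varphi)}.
\end{aligned}
$$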

That is, the functional form of the prior can be derived from that of the likelihood using the same procedure for both parametrizations.

Note, however, that the form of the prior is different for the two parametrizations. For example, if   (as in the case of the normal distribution, see below), and  , then  , which is obviously different from  .
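The one-parameter invariance can be checked numerically. The following is a minimal sketch, not part of the original derivation: it assumes a single Bernoulli trial with success probability `gamma` and, as an illustrative choice of alternative parameterization, the log-odds `psi = log(gamma / (1 - gamma))`. All function names are hypothetical. It computes the square root of the Fisher information directly in each parameterization and compares the result with the change-of-variables formula.

```python
import math

def score_squared_expectation(prob_heads, log_lik, param, h=1e-6):
    """E[(d/dparam log L)^2] for one Bernoulli trial, using central differences."""
    total = 0.0
    for outcome, weight in ((1, prob_heads), (0, 1.0 - prob_heads)):
        score = (log_lik(outcome, param + h) - log_lik(outcome, param - h)) / (2 * h)
        total += weight * score ** 2
    return total

def log_lik_gamma(outcome, gamma):
    """Log-likelihood in the probability parameterization."""
    return math.log(gamma) if outcome == 1 else math.log(1.0 - gamma)

def gamma_from_psi(psi):
    """Inverse of the (assumed) log-odds reparameterization."""
    return 1.0 / (1.0 + math.exp(-psi))

def log_lik_psi(outcome, psi):
    """Same likelihood, expressed through the log-odds parameter."""
    return log_lik_gamma(outcome, gamma_from_psi(psi))

gamma = 0.3
psi = math.log(gamma / (1.0 - gamma))

# Unnormalized Jeffreys densities computed separately in each parameterization.
prior_gamma = math.sqrt(score_squared_expectation(gamma, log_lik_gamma, gamma))
prior_psi_direct = math.sqrt(score_squared_expectation(gamma, log_lik_psi, psi))

# Change of variables: p(psi) = p(gamma) * |d gamma / d psi|.
jacobian = gamma * (1.0 - gamma)
prior_psi_transformed = prior_gamma * abs(jacobian)

print(prior_psi_direct, prior_psi_transformed)
```

The two printed values should agree up to finite-difference error, while `prior_gamma` itself is a different number, which reflects the point that the density changes form even though its construction from the likelihood does not.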

Multiple-parameter case

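For an alternative parameterization $\vec\varphi$ we can derive

$$p(\vec\varphi) \propto \sqrt{\det I(\vec\varphi)}$$

from

$$p(\vec\theta) \propto \sqrt{\det I(\vec\theta)}$$

using the change of variables theorem for transformations, the definition of Fisher information, and that the product of determinants is the determinant of the matrix product:

$$
\begin{aligned}
p(\vec\varphi) &= p(\vec\theta)\left|\det\frac{\partial\theta_i}{\partial\varphi_j}\right|
\propto \sqrt{\det I(\vec\theta)\,\left(\det\frac{\partial\theta_i}{\partial\varphi_j}\right)^2} \\
&= \sqrt{\det\frac{\partial\theta_k}{\partial\varphi_i}\;\det\operatorname{E}\!\left[\frac{\partial\ln L}{\partial\theta_k}\frac{\partial\ln L}{\partial\theta_l}\right]\det\frac{\partial\theta_l}{\partial\varphi_j}} \\
&= \sqrt{\det\operatorname{E}\!\left[\sum_{k,l}\frac{\partial\theta_k}{\partial\varphi_i}\frac{\partial\ln L}{\partial\theta_k}\frac{\partial\ln L}{\partial\theta_l}\frac{\partial\theta_l}{\partial\varphi_j}\right]}
= \sqrt{\det\operatorname{E}\!\left[\frac{\partial\ln L}{\partial\varphi_i}\frac{\partial\ln L}{\partial\varphi_j}\right]}
= \sqrt{\det I(\vec\varphi)}.
\end{aligned}
$$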

Attributes

From a practical and mathematical standpoint, a valid reason to use this non-informative prior instead of others, like the ones obtained through a limit in conjugate families of distributions, is that its construction from the likelihood does not depend on the set of parameter variables chosen to describe the parameter space. It is not the only prior with this property, however. As is clear from the derivation above, instead of the log-likelihood $\ln L$ we could use any other smooth function $f(L)$ of the likelihood, and the resulting prior would still have the same kind of invariance property.

Sometimes the Jeffreys prior cannot be normalized, and is thus an improper prior. For example, the Jeffreys prior for the distribution mean is uniform over the entire real line in the case of a Gaussian distribution of known variance.

Use of the Jeffreys prior violates the strong version of the likelihood principle, which is accepted by many, but by no means all, statisticians. When using the Jeffreys prior, inferences about $\vec\theta$ depend not just on the probability of the observed data as a function of $\vec\theta$, but also on the universe of all possible experimental outcomes, as determined by the experimental design, because the Fisher information is computed from an expectation over the chosen universe. Accordingly, the Jeffreys prior, and hence the inferences made using it, may be different for two experiments involving the same $\vec\theta$ parameter even when the likelihood functions for the two experiments are the same, which is a violation of the strong likelihood principle.
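A standard illustration of this point, which is not taken from the text above, compares two designs for coin tossing: fixing the number of trials in advance (binomial) and tossing until a fixed number of successes is reached (negative binomial). Observing 3 successes in 12 trials gives a likelihood proportional to $\gamma^3(1-\gamma)^9$ under both designs. The sketch below, with hypothetical function names and a truncated sum for the second design, computes the Fisher information under each design and shows that the ratio of the resulting Jeffreys densities changes with $\gamma$, so the two priors are not proportional.

```python
import math

def fisher_binomial(gamma, n):
    """E[score^2] when the number of trials n is fixed in advance."""
    total = 0.0
    for s in range(n + 1):
        pmf = math.comb(n, s) * gamma ** s * (1 - gamma) ** (n - s)
        score = s / gamma - (n - s) / (1 - gamma)
        total += pmf * score ** 2
    return total

def fisher_negative_binomial(gamma, r, max_failures=5000):
    """E[score^2] when tossing continues until r successes (sum truncated)."""
    total = 0.0
    for f in range(max_failures + 1):
        pmf = math.comb(f + r - 1, f) * gamma ** r * (1 - gamma) ** f
        score = r / gamma - f / (1 - gamma)
        total += pmf * score ** 2
    return total

def prior_ratio(gamma, n=12, r=3):
    """Ratio of the two unnormalized Jeffreys densities at gamma."""
    return math.sqrt(fisher_binomial(gamma, n) / fisher_negative_binomial(gamma, r))

for gamma in (0.2, 0.5, 0.8):
    print(gamma, prior_ratio(gamma))
```

If the two priors had the same shape, the printed ratio would be constant in `gamma`; here it varies, even though the observed likelihood is the same under both designs.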

Minimum description length

In the minimum description length approach to statistics, the goal is to describe data as compactly as possible, where the length of a description is measured in bits of the code used. For a parametric family of distributions, one compares a code with the best code based on one of the distributions in the parameterized family. The main result is that in exponential families, asymptotically for large sample size, the code based on the distribution that is a mixture of the elements in the exponential family with the Jeffreys prior is optimal. This result holds if one restricts the parameter set to a compact subset in the interior of the full parameter space [citation needed]. If the full parameter space is used, a modified version of the result should be used.

Examples

The Jeffreys prior for a parameter (or a set of parameters) depends upon the statistical model.

Gaussian distribution with mean parameter

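For the Gaussian distribution of the real value $x$

$$f(x \mid \mu) = \frac{e^{-(x-\mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$$

with $\sigma$ fixed, the Jeffreys prior for the mean $\mu$ is

$$
\begin{aligned}
p(\mu) &\propto \sqrt{I(\mu)}
= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\mu}\log f(x \mid \mu)\right)^2\right]}
= \sqrt{\operatorname{E}\!\left[\left(\frac{x-\mu}{\sigma^2}\right)^2\right]} \\
&= \sqrt{\int_{-\infty}^{+\infty} f(x \mid \mu)\left(\frac{x-\mu}{\sigma^2}\right)^2 dx}
= \sqrt{\frac{\sigma^2}{\sigma^4}} \propto 1.
\end{aligned}
$$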

That is, the Jeffreys prior for $\mu$ does not depend upon $\mu$; it is the unnormalized uniform distribution on the real line: the distribution that is 1 (or some other fixed constant) for all points. This is an improper prior and is, up to the choice of constant, the unique translation-invariant distribution on the reals (the Haar measure with respect to addition of reals). This corresponds to the mean being a measure of location, with translation invariance corresponding to no information about location.
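The claim that the prior for the mean is flat can be checked by approximating the Fisher information integral numerically. The sketch below is illustrative only: the function names, the midpoint-rule integration, the integration range of 12 standard deviations, and the value `sigma = 2.0` are all assumptions chosen for the example.

```python
import math

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def fisher_information_mean(mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule approximation of E[((x - mu) / sigma^2)^2]."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        score = (x - mu) / sigma ** 2  # d/dmu of log f(x | mu)
        total += normal_pdf(x, mu, sigma) * score ** 2 * dx
    return total

# The unnormalized prior sqrt(I(mu)) at several values of the mean.
for mu in (-5.0, 0.0, 3.0):
    print(mu, math.sqrt(fisher_information_mean(mu, sigma=2.0)))
```

Each printed value should be close to $1/\sigma$ regardless of `mu`, consistent with a prior that is constant in the mean.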

Gaussian distribution with standard deviation parameter

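For the Gaussian distribution of the real value $x$

$$f(x \mid \sigma) = \frac{e^{-(x-\mu)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$$

with $\mu$ fixed, the Jeffreys prior for the standard deviation $\sigma > 0$ is

$$
\begin{aligned}
p(\sigma) &\propto \sqrt{I(\sigma)}
= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\sigma}\log f(x \mid \sigma)\right)^2\right]}
= \sqrt{\operatorname{E}\!\left[\left(\frac{(x-\mu)^2 - \sigma^2}{\sigma^3}\right)^2\right]} \\
&= \sqrt{\int_{-\infty}^{+\infty} f(x \mid \sigma)\left(\frac{(x-\mu)^2 - \sigma^2}{\sigma^3}\right)^2 dx}
= \sqrt{\frac{2}{\sigma^2}} \propto \frac{1}{\sigma}.
\end{aligned}
$$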

Equivalently, the Jeffreys prior for $\log\sigma$ is the unnormalized uniform distribution on the real line, and thus this distribution is also known as the logarithmic prior. Similarly, the Jeffreys prior for $\log\sigma^2 = 2\log\sigma$ is also uniform. It is the unique (up to a multiple) prior on the positive reals that is scale-invariant (the Haar measure with respect to multiplication of positive reals), corresponding to the standard deviation being a measure of scale and scale invariance corresponding to no information about scale. As with the uniform distribution on the reals, it is an improper prior.
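Scale invariance of a density proportional to $1/\sigma$ means that the unnormalized mass assigned to an interval $[a, 2a]$ is the same for every $a > 0$, namely $\ln 2$. A minimal sketch of that check follows; the function name `prior_mass`, the midpoint rule, and the particular values of `a` are illustrative assumptions.

```python
import math

def prior_mass(a, b, steps=100000):
    """Midpoint-rule integral of the unnormalized density 1/sigma over [a, b]."""
    d = (b - a) / steps
    return sum(d / (a + (i + 0.5) * d) for i in range(steps))

# Mass of [a, 2a] at very different scales; each should be close to log(2).
masses = [prior_mass(a, 2 * a) for a in (0.01, 1.0, 250.0)]
print(masses, math.log(2))
```

Rescaling the interval leaves the mass unchanged, which is the sense in which the prior carries no information about scale.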

Poisson distribution with rate parameter

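For the Poisson distribution of the non-negative integer $n$,

$$f(n \mid \lambda) = e^{-\lambda}\frac{\lambda^n}{n!},$$

the Jeffreys prior for the rate parameter $\lambda \ge 0$ is

$$
\begin{aligned}
p(\lambda) &\propto \sqrt{I(\lambda)}
= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\lambda}\log f(n \mid \lambda)\right)^2\right]}
= \sqrt{\operatorname{E}\!\left[\left(\frac{n-\lambda}{\lambda}\right)^2\right]} \\
&= \sqrt{\sum_{n=0}^{+\infty} f(n \mid \lambda)\left(\frac{n-\lambda}{\lambda}\right)^2}
= \sqrt{\frac{1}{\lambda}}.
\end{aligned}
$$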

Equivalently, the Jeffreys prior for $\sqrt{\lambda}$ is the unnormalized uniform distribution on the non-negative real line.
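The Poisson case can be checked by summing the squared score over the outcomes. The following sketch is illustrative only: it truncates the infinite sum at a hypothetical `n_max`, and the function names are assumptions. It computes $\sqrt{I(\lambda)}$ and then applies the change of variables to $\sqrt{\lambda}$, whose Jacobian is $2\sqrt{\lambda}$.

```python
import math

def poisson_pmf(n, lam):
    """Poisson probability, computed in log space for numerical stability."""
    return math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))

def fisher_information_rate(lam, n_max=500):
    """Truncated sum of f(n | lam) * ((n - lam) / lam)^2 over n = 0..n_max."""
    return sum(poisson_pmf(n, lam) * ((n - lam) / lam) ** 2 for n in range(n_max + 1))

for lam in (0.5, 2.0, 10.0):
    prior_lam = math.sqrt(fisher_information_rate(lam))
    # Density in phi = sqrt(lam): multiply by |d lam / d phi| = 2 * sqrt(lam).
    prior_phi = prior_lam * 2 * math.sqrt(lam)
    print(lam, prior_lam, prior_phi)
```

The second printed column should track $1/\sqrt{\lambda}$, and the third should be the same constant for every rate, matching a uniform prior in $\sqrt{\lambda}$.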

Bernoulli trial

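For a coin that is "heads" with probability $\gamma \in [0,1]$ and is "tails" with probability $1-\gamma$, for a given $(H,T) \in \{(0,1),(1,0)\}$ the probability is $\gamma^H(1-\gamma)^T$. The Jeffreys prior for the parameter $\gamma$ is

$$
\begin{aligned}
p(\gamma) &\propto \sqrt{I(\gamma)}
= \sqrt{\operatorname{E}\!\left[\left(\frac{d}{d\gamma}\log f(x \mid \gamma)\right)^2\right]}
= \sqrt{\operatorname{E}\!\left[\left(\frac{H}{\gamma} - \frac{T}{1-\gamma}\right)^2\right]} \\
&= \sqrt{\gamma\left(\frac{1}{\gamma} - \frac{0}{1-\gamma}\right)^2 + (1-\gamma)\left(\frac{0}{\gamma} - \frac{1}{1-\gamma}\right)^2}
= \frac{1}{\sqrt{\gamma(1-\gamma)}}.
\end{aligned}
$$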

This is the arcsine distribution and is a beta distribution with $\alpha = \beta = 1/2$. Furthermore, if $\gamma = \sin^2(\theta)$, the Jeffreys prior for $\theta$ is uniform in the interval $[0, \pi/2]$. Equivalently, $\theta$ is uniform on the whole circle $[0, 2\pi]$.
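The link between the arcsine distribution and a uniform angle can be illustrated by simulation. The sketch below assumes the substitution $\gamma = \sin^2\theta$ with $\theta$ drawn uniformly from $[0, \pi/2]$ and compares the empirical distribution of $\gamma$ with the arcsine (Beta(1/2, 1/2)) cumulative distribution function $\frac{2}{\pi}\arcsin\sqrt{\gamma}$. The seed, sample size, and function names are arbitrary illustrative choices.

```python
import math
import random

def arcsine_cdf(x):
    """CDF of the Beta(1/2, 1/2) (arcsine) distribution on [0, 1]."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [math.sin(rng.uniform(0.0, math.pi / 2)) ** 2 for _ in range(200_000)]

def empirical_cdf(x):
    """Fraction of simulated gamma values that are at most x."""
    return sum(s <= x for s in samples) / len(samples)

for x in (0.1, 0.5, 0.9):
    print(x, empirical_cdf(x), arcsine_cdf(x))
```

The empirical and exact columns should agree to within sampling error, which for this sample size is on the order of a few thousandths.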

N-sided die with biased probabilities

Similarly, for a throw of an $N$-sided die with outcome probabilities $\vec\gamma = (\gamma_1, \ldots, \gamma_N)$, each non-negative and satisfying $\sum_{i=1}^N \gamma_i = 1$, the Jeffreys prior for $\vec\gamma$ is the Dirichlet distribution with all (alpha) parameters set to one half. This amounts to using a pseudocount of one half for each possible outcome.
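The pseudocount reading follows from Dirichlet-multinomial conjugacy: with observed counts $n_i$, the posterior is Dirichlet with parameters $n_i + 1/2$, so the posterior mean of each outcome probability is $(n_i + 1/2) / (n + N/2)$, where $n$ is the total count. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def jeffreys_posterior_mean(counts):
    """Posterior mean of each outcome probability under a Dirichlet(1/2, ..., 1/2) prior."""
    total = sum(counts) + 0.5 * len(counts)
    return [(c + 0.5) / total for c in counts]

# Illustrative counts for a three-sided die; the second face was never observed.
estimates = jeffreys_posterior_mean([3, 0, 7])
print(estimates)
```

An outcome that was never observed still receives a small positive estimate, because the half-count keeps its posterior mean away from zero.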

Alternatively, if we write $\gamma_i = \varphi_i^2$ for each $i$, then the Jeffreys prior for $\vec\varphi$ is uniform on the $(N-1)$-dimensional unit sphere (i.e., it is uniform on the surface of an $N$-dimensional unit ball).
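The sphere picture can be checked by simulation. The sketch below makes several illustrative assumptions: it draws uniform points on the unit sphere by normalizing standard Gaussian vectors, squares the coordinates to obtain outcome probabilities, and compares the sample mean and variance of one coordinate with the corresponding moments of a Dirichlet distribution whose parameters are all one half. The seed, sample size, `N = 3`, and all names are arbitrary choices.

```python
import math
import random

def sample_on_sphere(n_dim, rng):
    """Uniform point on the unit sphere, obtained by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n_dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]

rng = random.Random(1)  # fixed seed so the run is reproducible
N = 3
# Squared coordinates are non-negative and sum to one, like die probabilities.
draws = [[c * c for c in sample_on_sphere(N, rng)] for _ in range(100_000)]

first = [g[0] for g in draws]
mean = sum(first) / len(first)
var = sum((g - mean) ** 2 for g in first) / len(first)

# Moments of one coordinate of a Dirichlet(1/2, ..., 1/2) distribution.
alpha, alpha0 = 0.5, 0.5 * N
dirichlet_mean = alpha / alpha0
dirichlet_var = alpha * (alpha0 - alpha) / (alpha0 ** 2 * (alpha0 + 1))

print(mean, dirichlet_mean)
print(var, dirichlet_var)
```

Matching the first two moments does not prove that the distributions coincide, but it is a quick consistency check on the stated equivalence.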

References

  1. Jaynes, E. T. (1968). "Prior Probabilities". IEEE Transactions on Systems Science and Cybernetics. SSC-4: 227.

Further reading

  • Jeffreys, H. (1946). "An Invariant Form for the Prior Probability in Estimation Problems". Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. 186 (1007): 453–461. doi:10.1098/rspa.1946.0056. JSTOR 97883.
  • Jeffreys, H. (1939). Theory of Probability. Oxford University Press.