
Independent and identically distributed random variables

In probability theory and statistics, a sequence or collection of random variables is independent and identically distributed (i.i.d., iid, or IID) if each random variable has the same probability distribution as the others and all are mutually independent.[1] Identically distributed, on its own, is often abbreviated ID. Because both abbreviations are discussed here, this article uses the visually cleaner IID for uniformity rather than the more prevalent i.i.d.

The abbreviation IID is particularly common in statistics, where observations in a sample are often assumed to be effectively IID for the purposes of statistical inference. The assumption (or requirement) that observations be IID tends to simplify the underlying mathematics of many statistical methods (see mathematical statistics and statistical theory). In practical applications of statistical modeling, however, the assumption may or may not be realistic. To test how realistic it is for a given data set, one can compute the autocorrelation, draw lag plots, or perform a turning point test.[2] The weaker assumption of exchangeable random variables is often sufficient and more easily met.
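Two of the checks mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the sample size and the fixed seed are assumptions made for the example, and the turning point statistic uses the standard mean 2(n − 2)/3 and variance (16n − 29)/90 that hold for a continuous IID sequence.

```python
import math
import random


def lag_autocorrelation(xs, lag=1):
    """Sample autocorrelation of xs at the given lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def turning_point_z(xs):
    """z-score of the number of turning points under the IID null hypothesis."""
    n = len(xs)
    turning_points = sum(
        1
        for i in range(1, n - 1)
        if (xs[i - 1] < xs[i] > xs[i + 1]) or (xs[i - 1] > xs[i] < xs[i + 1])
    )
    expected = 2 * (n - 2) / 3
    variance = (16 * n - 29) / 90
    return (turning_points - expected) / math.sqrt(variance)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
iid = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # IID by construction

# A random walk built from the same draws: each value depends on the previous one.
walk = []
total = 0.0
for step in iid:
    total += step
    walk.append(total)

print(lag_autocorrelation(iid), turning_point_z(iid))    # both should be close to 0
print(lag_autocorrelation(walk), turning_point_z(walk))  # autocorrelation near 1, strongly negative z
```

A z-score far from zero, or a lag-1 autocorrelation far from zero, is evidence against the IID assumption; values near zero do not prove it.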

The assumption is important in the classical form of the central limit theorem, which states that the probability distribution of the suitably normalized sum (or average) of IID variables with finite variance approaches a normal distribution.
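The statement can be illustrated with a small simulation. The sketch below assumes IID exponential draws with rate 1 (mean 1, variance 1), chosen only because they are clearly non-normal; the sample sizes and the seed are arbitrary. It standardizes each sample mean and counts how often the result falls within ±1.96, which would happen about 95% of the time for a standard normal variable.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed for reproducibility


def standardized_mean(n):
    """Standardized mean of n IID Exponential(1) draws (mean 1, variance 1)."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total / n - 1.0) * math.sqrt(n)


z_values = [standardized_mean(100) for _ in range(2000)]

# Fraction of standardized means inside the central 95% band of a standard normal.
coverage = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)

print(statistics.mean(z_values), statistics.stdev(z_values), coverage)
```

The mean and standard deviation of the standardized values should be close to 0 and 1, and the coverage close to 0.95, even though the individual draws are strongly skewed.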

Often the IID assumption arises in the context of sequences of random variables. Then "independent and identically distributed" in part implies that an element in the sequence is independent of the random variables that came before it. In this way, an IID sequence is different from a Markov sequence, where the probability distribution for the nth random variable is a function of the previous random variable in the sequence (for a first order Markov sequence). An IID sequence does not imply the probabilities for all elements of the sample space or event space must be the same.[3] For example, repeated throws of loaded dice will produce a sequence that is IID, despite the outcomes being biased.
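The contrast between an IID sequence and a Markov sequence can be made concrete with a simulation. In the sketch below the loaded-die weights and the "sticky" Markov rule (repeat the previous face with probability 0.8) are hypothetical choices made for illustration. For the IID sequence the chance of rolling a six does not depend on the previous roll, although the die is biased; for the Markov sequence it does.

```python
import random

rng = random.Random(1)  # fixed seed for reproducibility
FACES = [1, 2, 3, 4, 5, 6]
LOADED = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]  # hypothetical bias toward six


def iid_rolls(n):
    """n independent rolls of the same loaded die."""
    return rng.choices(FACES, weights=LOADED, k=n)


def sticky_rolls(n, stay=0.8):
    """First-order Markov sequence: repeat the previous face with probability `stay`."""
    seq = [rng.choice(FACES)]
    for _ in range(n - 1):
        if rng.random() < stay:
            seq.append(seq[-1])
        else:
            seq.append(rng.choice(FACES))
    return seq


def prob_six_after(seq, previous):
    """Empirical P(next roll is 6 | current roll equals `previous`)."""
    followers = [b for a, b in zip(seq, seq[1:]) if a == previous]
    return followers.count(6) / len(followers)


iid = iid_rolls(100_000)
markov = sticky_rolls(100_000)

print(prob_six_after(iid, 6), prob_six_after(iid, 1))        # both near 0.5
print(prob_six_after(markov, 6), prob_six_after(markov, 1))  # very different from each other
```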



Definition for two random variables

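Let the random variables $X$ and $Y$ be defined to assume values in $I \subseteq \mathbb{R}$. Let $F_X(x) = \operatorname{P}(X \leq x)$ and $F_Y(y) = \operatorname{P}(Y \leq y)$ be the cumulative distribution functions of $X$ and $Y$, respectively, and denote their joint cumulative distribution function by $F_{X,Y}(x, y) = \operatorname{P}(X \leq x \land Y \leq y)$.

We say two random variables $X$ and $Y$ are identically distributed if and only if $F_X(x) = F_Y(x)$ for all $x \in I$.

We say two random variables $X$ and $Y$ are independent if and only if $F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y)$ for all $x, y \in I$. See Independence (probability theory) § Two random variables.

We say two random variables $X$ and $Y$ are IID if they are independent and identically distributed, i.e. if and only if

$$
\begin{aligned}
& F_X(x) = F_Y(x) && \forall x \in I \\
& F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y) && \forall x, y \in I
\end{aligned}
$$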

Definition for more than two random variables

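The definition extends naturally to more than two random variables. We say that $n$ random variables $X_1, \ldots, X_n$ are IID if they are independent (see Independence (probability theory) § More than two random variables) and identically distributed, i.e. if and only if

$$
\begin{aligned}
& F_{X_1}(x) = F_{X_k}(x) && \forall k \in \{1, \ldots, n\} \text{ and } \forall x \in I \\
& F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdot \ldots \cdot F_{X_n}(x_n) && \forall x_1, \ldots, x_n \in I
\end{aligned}
$$

where $F_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \operatorname{P}(X_1 \leq x_1 \land \ldots \land X_n \leq x_n)$ denotes the joint cumulative distribution function of $X_1, \ldots, X_n$.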
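For discrete random variables with finitely many values, both conditions can be checked exactly. The following sketch assumes the joint distribution is supplied as a dictionary mapping outcome tuples to probabilities; the function name `is_iid` and the unfair-coin example are hypothetical. It compares the marginal cumulative distribution functions with each other and the joint cumulative distribution function with the product of the marginals at every point of the value grid, using exact fractions to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product


def is_iid(joint):
    """Exact IID check for a joint pmf given as {(x_1, ..., x_n): probability}."""
    n = len(next(iter(joint)))
    values = sorted({v for outcome in joint for v in outcome})
    top = values[-1]

    def joint_cdf(point):
        # F(x_1, ..., x_n) = P(X_1 <= x_1 and ... and X_n <= x_n)
        return sum(
            p for outcome, p in joint.items()
            if all(a <= x for a, x in zip(outcome, point))
        )

    def marginal_cdf(i, x):
        # Put every other coordinate at the largest value so only X_i is constrained.
        point = [top] * n
        point[i] = x
        return joint_cdf(point)

    identically_distributed = all(
        marginal_cdf(i, x) == marginal_cdf(0, x)
        for i in range(n) for x in values
    )

    independent = True
    for point in product(values, repeat=n):
        expected = 1
        for i, x in enumerate(point):
            expected *= marginal_cdf(i, x)
        if joint_cdf(point) != expected:
            independent = False

    return identically_distributed and independent


p = {0: Fraction(2, 3), 1: Fraction(1, 3)}  # hypothetical unfair coin
three_flips = {o: p[o[0]] * p[o[1]] * p[o[2]] for o in product(p, repeat=3)}
three_copies = {(0, 0, 0): Fraction(2, 3), (1, 1, 1): Fraction(1, 3)}

print(is_iid(three_flips))   # independent flips of the same coin
print(is_iid(three_copies))  # identically distributed, but fully dependent
```

The second example shows why both conditions are needed: three copies of one flip have identical marginals, yet the joint distribution does not factorize.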


Uses in modeling

The following are examples or applications of IID random variables:

  • A sequence of outcomes of spins of a fair or unfair roulette wheel is IID. One implication of this is that if the roulette ball lands on "red", for example, 20 times in a row, the next spin is no more or less likely to be "black" than on any other spin (see the Gambler's fallacy).
  • A sequence of fair or loaded dice rolls is IID.
  • A sequence of fair or unfair coin flips is IID.
  • In signal processing and image processing the notion of transformation to IID implies two specifications, the "ID" (ID = identically distributed) part and the "I" (I = independent) part:
    • (ID) the signal level must be balanced on the time axis;
    • (I) the signal spectrum must be flattened, i.e. transformed by filtering (such as deconvolution) to a white signal (one where all frequencies are equally present).
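The roulette example in the list above can be checked by simulation. The sketch below assumes a European single-zero wheel (18 red, 18 black and 1 green pocket) and conditions on a streak of 5 reds rather than 20, because streaks of 20 are too rare to observe in a short run; both choices, and the seed, are assumptions made for the illustration.

```python
import random

rng = random.Random(3)  # fixed seed for reproducibility
POCKETS = ["red"] * 18 + ["black"] * 18 + ["green"]  # assumed European wheel

spins = rng.choices(POCKETS, k=500_000)


def black_rate_after_red_streak(spins, streak):
    """Fraction of spins landing on black among spins preceded by at least `streak` reds."""
    hits = total = 0
    run = 0  # number of consecutive reds immediately before the current spin
    for colour in spins:
        if run >= streak:
            total += 1
            hits += colour == "black"
        run = run + 1 if colour == "red" else 0
    return hits / total


overall = spins.count("black") / len(spins)
after_streak = black_rate_after_red_streak(spins, 5)
print(overall, after_streak)  # both should be close to 18/37
```

Because the spins are IID, the frequency of black after a red streak should match the unconditional frequency up to sampling noise.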

Uses in inference

  • One of the simplest statistical tests, the z-test, is used to test hypotheses about means of random variables. When using the z-test, one assumes (requires) that all observations are IID in order to satisfy the conditions of the central limit theorem.
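A one-sample z-test can be sketched as follows. The sketch assumes the population standard deviation is known and the observations are IID; the function name `z_test`, the simulated data and the hypothesized mean are illustrative assumptions, not part of any particular library.

```python
import math
import random


def z_test(sample, mu0, sigma):
    """Two-sided one-sample z-test, assuming IID observations and known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - mu0) / (sigma / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z, written with the complementary error function.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


rng = random.Random(5)  # fixed seed for reproducibility
sample = [rng.gauss(0.5, 1.0) for _ in range(400)]  # true mean 0.5, standard deviation 1

z_stat, p_val = z_test(sample, mu0=0.0, sigma=1.0)
print(z_stat, p_val)  # a large z and a small p-value: the hypothesis "mean = 0" is rejected
```

If the observations were correlated rather than IID, the standard error `sigma / sqrt(n)` used here would be wrong and the p-value could be misleading.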


Many results that were first proven under the assumption that the random variables are IID have been shown to be true even under a weaker distributional assumption.

Exchangeable random variables

The most general notion that shares the main properties of IID variables is that of exchangeable random variables, introduced by Bruno de Finetti.[citation needed] Exchangeability means that while the variables may not be independent, future ones behave like past ones. Formally, any finite sequence of values is as likely as any permutation of those values: the joint probability distribution is invariant under the symmetric group.

This provides a useful generalization – for example, sampling without replacement is not independent, but is exchangeable – and is widely used in Bayesian statistics.
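The sampling-without-replacement example can be verified by exact enumeration. The sketch below assumes a hypothetical urn with two red balls and one blue ball, from which two balls are drawn in order without replacement; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

urn = ["red", "red", "blue"]  # hypothetical urn

# All ordered draws of two distinct balls are equally likely.
draws = list(permutations(range(len(urn)), 2))
joint = {}
for i, j in draws:
    key = (urn[i], urn[j])
    joint[key] = joint.get(key, 0) + Fraction(1, len(draws))


def is_exchangeable(joint):
    """True if swapping the two draws leaves every probability unchanged."""
    return all(joint.get((b, a), 0) == p for (a, b), p in joint.items())


def is_independent(joint):
    """True if the joint pmf equals the product of its two marginals."""
    colours = {c for pair in joint for c in pair}
    first = {c: sum(p for (a, _), p in joint.items() if a == c) for c in colours}
    second = {c: sum(p for (_, b), p in joint.items() if b == c) for c in colours}
    return all(
        joint.get((a, b), 0) == first[a] * second[b]
        for a in colours for b in colours
    )


print(is_exchangeable(joint))  # True
print(is_independent(joint))   # False
```

Here both draws are red with probability 1/3, whereas independence would require 2/3 × 2/3 = 4/9, so the draws are exchangeable without being independent.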

Lévy process

In stochastic calculus, IID variables are thought of as the increments of a discrete-time Lévy process: each variable gives how much the process changes from one time to the next. For example, a sequence of Bernoulli trials is interpreted as the Bernoulli process. One may generalize this to include continuous-time Lévy processes, and many Lévy processes can be seen as limits of IID variables. For instance, the Wiener process is the limit of the suitably centered and rescaled Bernoulli process.
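The limiting behaviour can be illustrated numerically. The sketch below makes the usual scaling assumptions explicit: each fair Bernoulli trial is turned into a step of +1 or −1, and the walk after `steps` trials is divided by the square root of `steps`. Under this scaling the value at time t should have mean 0 and variance close to t, as for a Wiener process. The number of steps, the number of paths and the seed are arbitrary.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility


def scaled_walk_value(steps, t=1.0):
    """Value at time t of a +/-1 random walk rescaled by 1/sqrt(steps)."""
    k = int(steps * t)  # number of IID increments used up to time t
    position = sum(1 if rng.random() < 0.5 else -1 for _ in range(k))
    return position / math.sqrt(steps)


at_time_1 = [scaled_walk_value(400, t=1.0) for _ in range(3000)]
at_time_quarter = [scaled_walk_value(400, t=0.25) for _ in range(3000)]

print(statistics.mean(at_time_1), statistics.pvariance(at_time_1))              # near 0 and 1
print(statistics.mean(at_time_quarter), statistics.pvariance(at_time_quarter))  # near 0 and 0.25
```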

White noise

A sequence of IID random variables with zero mean and finite variance is a simple example of white noise.

References


  1. Clauset, Aaron. "A brief primer on probability distributions" (PDF). Santa Fe Institute.
  2. Le Boudec, Jean-Yves (2010). Performance Evaluation Of Computer And Communication Systems (PDF). EPFL Press. pp. 46–47. ISBN 978-2-940222-40-7.
  3. Cover, Thomas (2006). Elements Of Information Theory. Wiley-Interscience. pp. 57–58. ISBN 978-0-471-24195-9.