Bernoulli distribution

In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is the discrete probability distribution of a random variable which takes the value 1 with probability $p$ and the value 0 with probability $q=1-p.$ Less formally, it is the probability distribution of any single experiment that asks a yes–no question: the outcome is boolean-valued, a single bit whose value is success/yes/true/one with probability $p$ and failure/no/false/zero with probability $q.$ It can be used to represent a (possibly biased) coin toss where 1 and 0 represent "heads" and "tails" respectively (or vice versa), and $p$ is the probability of the coin landing on heads (or tails, respectively). In particular, unfair coins have $p\neq 1/2.$

• Parameters: $0\leq p\leq 1,$ $q=1-p$
• Support: $k\in \{0,1\}$
• Probability mass function: ${\begin{cases}q=1-p&{\text{if }}k=0\\p&{\text{if }}k=1\end{cases}}$
• Cumulative distribution function: ${\begin{cases}0&{\text{if }}k<0\\1-p&{\text{if }}0\leq k<1\\1&{\text{if }}k\geq 1\end{cases}}$
• Mean: $p$
• Median: ${\begin{cases}0&{\text{if }}p<1/2\\{}[0,1]&{\text{if }}p=1/2\\1&{\text{if }}p>1/2\end{cases}}$
• Mode: ${\begin{cases}0&{\text{if }}p<1/2\\0,1&{\text{if }}p=1/2\\1&{\text{if }}p>1/2\end{cases}}$
• Variance: $p(1-p)=pq$
• Skewness: ${\frac {q-p}{\sqrt {pq}}}$
• Excess kurtosis: ${\frac {1-6pq}{pq}}$
• Entropy: $-q\ln q-p\ln p$
• Moment-generating function: $q+pe^{t}$
• Characteristic function: $q+pe^{it}$
• Probability-generating function: $q+pz$
• Fisher information: ${\frac {1}{pq}}$

The Bernoulli distribution is a special case of the binomial distribution where a single trial is conducted (so $n$ would be 1 for such a binomial distribution). It is also a special case of the two-point distribution, for which the possible outcomes need not be 0 and 1.

Properties

If $X$  is a random variable with this distribution, then:

$\Pr(X=1)=p=1-\Pr(X=0)=1-q.$

The probability mass function $f$  of this distribution, over possible outcomes k, is

$f(k;p)={\begin{cases}p&{\text{if }}k=1,\\q=1-p&{\text{if }}k=0.\end{cases}}$ 

This can also be expressed as

$f(k;p)=p^{k}(1-p)^{1-k}\quad {\text{for }}k\in \{0,1\}$

or as

$f(k;p)=pk+(1-p)(1-k)\quad {\text{for }}k\in \{0,1\}.$
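The three forms of the probability mass function can be checked against each other directly. The following is a minimal sketch; the function names are illustrative and the value of $p$ is arbitrary. Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction


def bernoulli_pmf(k, p):
    """Case-based form: p if k == 1, 1 - p if k == 0."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    raise ValueError("k must be 0 or 1")


def pmf_power_form(k, p):
    """f(k; p) = p**k * (1 - p)**(1 - k)."""
    return p**k * (1 - p) ** (1 - k)


def pmf_linear_form(k, p):
    """f(k; p) = p*k + (1 - p)*(1 - k)."""
    return p * k + (1 - p) * (1 - k)


p = Fraction(3, 10)  # arbitrary illustrative value
for k in (0, 1):
    # All three expressions agree on the support {0, 1}.
    assert bernoulli_pmf(k, p) == pmf_power_form(k, p) == pmf_linear_form(k, p)
```

The power and linear forms agree only on the support $\{0,1\}$; outside it they are different functions of $k$, which is why the case-based version rejects other inputs.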

The Bernoulli distribution is a special case of the binomial distribution with $n=1.$ 

The excess kurtosis goes to infinity as $p$ approaches 0 or 1, but for $p=1/2$ the two-point distributions, including the Bernoulli distribution, have a lower excess kurtosis than any other probability distribution, namely −2.
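The behaviour of the kurtosis can be illustrated with the excess kurtosis formula ${\frac {1-6pq}{pq}}$. The sketch below evaluates it at a few arbitrarily chosen values of $p$; the function name is illustrative.

```python
from fractions import Fraction


def excess_kurtosis(p):
    """Excess kurtosis (1 - 6pq) / (pq) of Bernoulli(p), for 0 < p < 1."""
    q = 1 - p
    return (1 - 6 * p * q) / (p * q)


# Values near 0 or 1 give a large excess kurtosis; p = 1/2 gives the minimum.
for p in (Fraction(1, 1000), Fraction(1, 10), Fraction(1, 2),
          Fraction(9, 10), Fraction(999, 1000)):
    print(p, float(excess_kurtosis(p)))

assert excess_kurtosis(Fraction(1, 2)) == -2
```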

The Bernoulli distributions for $0\leq p\leq 1$  form an exponential family.
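One common way to see the exponential-family structure is to write the probability mass function as $\exp(k\eta -A(\eta ))$ with natural parameter $\eta =\ln {\frac {p}{1-p}}$ (the log-odds) and log-partition function $A(\eta )=\ln(1+e^{\eta })$. The sketch below checks this numerically. It assumes $0<p<1$, since the log-odds are undefined at the endpoints, and the function name is illustrative.

```python
import math


def pmf_exponential_form(k, p):
    """Bernoulli PMF written as exp(k * eta - A(eta)), assuming 0 < p < 1."""
    eta = math.log(p / (1 - p))                # natural parameter (log-odds)
    log_partition = math.log1p(math.exp(eta))  # A(eta) = ln(1 + e^eta) = -ln(1 - p)
    return math.exp(k * eta - log_partition)


p = 0.3  # arbitrary illustrative value
assert math.isclose(pmf_exponential_form(1, p), p)
assert math.isclose(pmf_exponential_form(0, p), 1 - p)
```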

The maximum likelihood estimator of $p$  based on a random sample is the sample mean.
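As a rough numerical illustration of the maximum likelihood statement, the sketch below compares the sample mean with the maximizer of the log-likelihood over a grid of candidate values. The data are made up for the example, and the grid search stands in for the usual calculus argument.

```python
import math


def log_likelihood(p, sample):
    """Log-likelihood of i.i.d. Bernoulli(p) observations, for 0 < p < 1."""
    successes = sum(sample)
    failures = len(sample) - successes
    return successes * math.log(p) + failures * math.log(1 - p)


sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]  # made-up data: 6 successes in 10 trials
p_hat = sum(sample) / len(sample)        # sample mean

# Brute-force search over a grid of candidate values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: log_likelihood(p, sample))

assert abs(best - p_hat) < 1e-9
```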

Mean

The expected value of a Bernoulli random variable $X$  is

$\operatorname {E} \left(X\right)=p$

This is because, for a Bernoulli distributed random variable $X$ with $\Pr(X=1)=p$ and $\Pr(X=0)=q,$ we find

$\operatorname {E} [X]=\Pr(X=1)\cdot 1+\Pr(X=0)\cdot 0=p\cdot 1+q\cdot 0=p.$ 

Variance

The variance of a Bernoulli distributed $X$  is

$\operatorname {Var} [X]=pq=p(1-p)$

We first find

$\operatorname {E} [X^{2}]=\Pr(X=1)\cdot 1^{2}+\Pr(X=0)\cdot 0^{2}=p\cdot 1^{2}+q\cdot 0^{2}=p$

From this follows

$\operatorname {Var} [X]=\operatorname {E} [X^{2}]-\operatorname {E} [X]^{2}=p-p^{2}=p(1-p)=pq$ 
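The mean and variance computations can be reproduced by summing over the two outcomes. This is a small sketch with an arbitrary value of $p$; the helper name `raw_moment` is illustrative.

```python
from fractions import Fraction


def raw_moment(p, order):
    """E[X**order] for X ~ Bernoulli(p), summing over the outcomes 1 and 0."""
    q = 1 - p
    return p * 1**order + q * 0**order


p = Fraction(1, 4)  # arbitrary illustrative value
mean = raw_moment(p, 1)
variance = raw_moment(p, 2) - mean**2

assert mean == p
assert variance == p * (1 - p)
```

Because $1^{n}=1$ and $0^{n}=0$ for every $n\geq 1$, all raw moments of order at least 1 equal $p$, which is what makes the variance collapse to $p-p^{2}$.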

Skewness

The skewness is ${\frac {q-p}{\sqrt {pq}}}={\frac {1-2p}{\sqrt {pq}}}$ . When we take the standardized Bernoulli distributed random variable ${\frac {X-\operatorname {E} [X]}{\sqrt {\operatorname {Var} [X]}}}$  we find that this random variable attains ${\frac {q}{\sqrt {pq}}}$  with probability $p$  and attains $-{\frac {p}{\sqrt {pq}}}$  with probability $q$ . Thus we get

${\begin{aligned}\gamma _{1}&=\operatorname {E} \left[\left({\frac {X-\operatorname {E} [X]}{\sqrt {\operatorname {Var} [X]}}}\right)^{3}\right]\\&=p\cdot \left({\frac {q}{\sqrt {pq}}}\right)^{3}+q\cdot \left(-{\frac {p}{\sqrt {pq}}}\right)^{3}\\&={\frac {1}{{\sqrt {pq}}^{3}}}\left(pq^{3}-qp^{3}\right)\\&={\frac {pq}{{\sqrt {pq}}^{3}}}(q-p)\\&={\frac {q-p}{\sqrt {pq}}}\end{aligned}}$
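The skewness derivation can be checked numerically by computing the third moment of the standardized variable directly and comparing it with the closed form. The function names below are illustrative, and the sketch assumes $0<p<1$ so that the standard deviation is nonzero.

```python
import math


def skewness_closed_form(p):
    """(q - p) / sqrt(pq), assuming 0 < p < 1."""
    q = 1 - p
    return (q - p) / math.sqrt(p * q)


def skewness_from_definition(p):
    """Third moment of the standardized variable (X - E[X]) / sqrt(Var[X])."""
    q = 1 - p
    sd = math.sqrt(p * q)
    z_success = (1 - p) / sd  # standardized value when X = 1, probability p
    z_failure = (0 - p) / sd  # standardized value when X = 0, probability q
    return p * z_success**3 + q * z_failure**3


for p in (0.1, 0.3, 0.5, 0.8):
    assert math.isclose(skewness_closed_form(p),
                        skewness_from_definition(p), abs_tol=1e-12)
```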

Higher moments and cumulants

The central moment of order $k$  is given by

$\mu _{k}=(1-p)(-p)^{k}+p(1-p)^{k}.$

The first six central moments are

${\begin{aligned}\mu _{1}&=0,\\\mu _{2}&=p(1-p),\\\mu _{3}&=p(1-p)(1-2p),\\\mu _{4}&=p(1-p)(1-3p(1-p)),\\\mu _{5}&=p(1-p)(1-2p)(1-2p(1-p)),\\\mu _{6}&=p(1-p)(1-5p(1-p)(1-p(1-p))).\end{aligned}}$

The higher central moments can be expressed more compactly in terms of $\mu _{2}$ and $\mu _{3}$:

${\begin{aligned}\mu _{4}&=\mu _{2}(1-3\mu _{2}),\\\mu _{5}&=\mu _{3}(1-2\mu _{2}),\\\mu _{6}&=\mu _{2}(1-5\mu _{2}(1-\mu _{2})).\end{aligned}}$

The first six cumulants are

${\begin{aligned}\kappa _{1}&=p,\\\kappa _{2}&=\mu _{2},\\\kappa _{3}&=\mu _{3},\\\kappa _{4}&=\mu _{2}(1-6\mu _{2}),\\\kappa _{5}&=\mu _{3}(1-12\mu _{2}),\\\kappa _{6}&=\mu _{2}(1-30\mu _{2}(1-4\mu _{2})).\end{aligned}}$
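The compact expressions for the central moments and for the cumulants of orders 2 through 6 can be verified with exact arithmetic. The sketch below uses the general formula for $\mu _{k}$ together with the standard relations between cumulants and central moments (for example $\kappa _{4}=\mu _{4}-3\mu _{2}^{2}$). Those relations are general identities rather than something specific to this distribution, and the value of $p$ is arbitrary.

```python
from fractions import Fraction


def central_moment(p, k):
    """mu_k = (1 - p) * (-p)**k + p * (1 - p)**k."""
    return (1 - p) * (-p) ** k + p * (1 - p) ** k


p = Fraction(3, 10)  # arbitrary illustrative value
mu = {k: central_moment(p, k) for k in range(1, 7)}

# Compact forms of the higher central moments.
assert mu[1] == 0
assert mu[4] == mu[2] * (1 - 3 * mu[2])
assert mu[5] == mu[3] * (1 - 2 * mu[2])
assert mu[6] == mu[2] * (1 - 5 * mu[2] * (1 - mu[2]))

# Cumulants from central moments (standard general relations).
kappa4 = mu[4] - 3 * mu[2] ** 2
kappa5 = mu[5] - 10 * mu[3] * mu[2]
kappa6 = mu[6] - 15 * mu[4] * mu[2] - 10 * mu[3] ** 2 + 30 * mu[2] ** 3

assert kappa4 == mu[2] * (1 - 6 * mu[2])
assert kappa5 == mu[3] * (1 - 12 * mu[2])
assert kappa6 == mu[2] * (1 - 30 * mu[2] * (1 - 4 * mu[2]))
```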

Related distributions

• If $X_{1},\dots ,X_{n}$  are independent, identically distributed (i.i.d.) random variables, all Bernoulli trials with success probability p, then their sum is distributed according to a binomial distribution with parameters n and p:
$\sum _{k=1}^{n}X_{k}\sim \operatorname {B} (n,p)$  (binomial distribution).
The Bernoulli distribution is simply $\operatorname {B} (1,p)$ , also written as ${\textstyle \mathrm {Bernoulli} (p).}$
• The categorical distribution is the generalization of the Bernoulli distribution for variables with any constant number of discrete values.
• The beta distribution is the conjugate prior of the Bernoulli distribution.
• The geometric distribution models the number of independent and identical Bernoulli trials needed to get one success.
• If ${\textstyle Y\sim \mathrm {Bernoulli} \left({\frac {1}{2}}\right)}$ , then ${\textstyle 2Y-1}$  has a Rademacher distribution.
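Two of the relationships above lend themselves to a direct check: the sum of $n$ independent Bernoulli trials is binomial, and $2Y-1$ maps the outcomes 0 and 1 to −1 and 1. The following is a minimal sketch with illustrative function names and arbitrary values of $n$ and $p$. It builds the distribution of the sum by repeated convolution and compares it with the binomial probability mass function.

```python
from fractions import Fraction
from math import comb


def bernoulli_sum_pmf(n, p):
    """PMF of X_1 + ... + X_n for i.i.d. Bernoulli(p), by repeated convolution."""
    pmf = [Fraction(1)]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        new = [Fraction(0)] * (len(pmf) + 1)
        for total, prob in enumerate(pmf):
            new[total] += prob * (1 - p)  # next trial is 0
            new[total + 1] += prob * p    # next trial is 1
        pmf = new
    return pmf


def binomial_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 5, Fraction(1, 3)  # arbitrary illustrative values
assert bernoulli_sum_pmf(n, p) == binomial_pmf(n, p)
assert bernoulli_sum_pmf(1, p) == [1 - p, p]  # B(1, p) is Bernoulli(p)

# 2Y - 1 maps the Bernoulli outcomes {0, 1} to the Rademacher outcomes {-1, 1}.
rademacher_values = sorted(2 * y - 1 for y in (0, 1))
assert rademacher_values == [-1, 1]
```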