# Law of total expectation

The proposition in probability theory known as the law of total expectation,[1] the law of iterated expectations[2] (LIE), the tower rule,[3] Adam's law, and the smoothing theorem,[4] among other names, states that if ${\displaystyle X}$ is a random variable whose expected value ${\displaystyle \operatorname {E} (X)}$ is defined, and ${\displaystyle Y}$ is any random variable on the same probability space, then

${\displaystyle \operatorname {E} (X)=\operatorname {E} (\operatorname {E} (X\mid Y)),}$

i.e., the expected value of the conditional expected value of ${\displaystyle X}$ given ${\displaystyle Y}$ is the same as the expected value of ${\displaystyle X}$.

One special case states that if ${\displaystyle {\left\{A_{i}\right\}}_{i}}$ is a finite or countable partition of the sample space, then

${\displaystyle \operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.}$

Note: the conditional expected value E( X | Z ) is a random variable whose value depends on the value of Z; the conditional expected value of X given the event Z = z is a function of z. If we write E( X | Z = z ) = g(z), then the random variable E( X | Z ) is g(Z). Similar comments apply to the conditional covariance.
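For a discrete pair (X, Z), this relationship can be checked directly. The following sketch uses an arbitrary toy joint pmf (the numbers are an illustration, not from the article): it computes g(z) = E( X | Z = z ) and verifies that E( g(Z) ) = E( X ).

```python
# A small joint pmf p(x, z); the numbers are an arbitrary illustration
# and only need to sum to 1.
pmf = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.1,
    (2, 0): 0.1, (2, 1): 0.2,
}

# Marginal distribution of Z.
p_z = {}
for (x, z), p in pmf.items():
    p_z[z] = p_z.get(z, 0.0) + p

def g(z):
    """g(z) = E(X | Z = z) = sum_x x * p(x, z) / P(Z = z)."""
    return sum(x * p for (x, zz), p in pmf.items() if zz == z) / p_z[z]

# E(E(X | Z)) = sum_z g(z) * P(Z = z) ...
lhs = sum(g(z) * pz for z, pz in p_z.items())
# ... should equal E(X) computed directly from the joint pmf.
rhs = sum(x * p for (x, z), p in pmf.items())

assert abs(lhs - rhs) < 1e-12
```

Here g is an ordinary function of z, while g(Z) is the random variable E( X | Z ); the outer sum over z is exactly the expectation of that random variable.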

## Example

Suppose that only two factories supply light bulbs to the market. Factory ${\displaystyle X}$ 's bulbs work for an average of 5000 hours, whereas factory ${\displaystyle Y}$ 's bulbs work for an average of 4000 hours. It is known that factory ${\displaystyle X}$  supplies 60% of the total bulbs available. What is the expected length of time that a purchased bulb will work for?

Applying the law of total expectation, we have:

${\displaystyle {\begin{aligned}\operatorname {E} (L)&=\operatorname {E} (L\mid X)\operatorname {P} (X)+\operatorname {E} (L\mid Y)\operatorname {P} (Y)\\[3pt]&=5000(0.6)+4000(0.4)\\[2pt]&=4600\end{aligned}}}$

where

• ${\displaystyle \operatorname {E} (L)}$  is the expected life of the bulb;
• ${\displaystyle \operatorname {P} (X)={6 \over 10}}$  is the probability that the purchased bulb was manufactured by factory ${\displaystyle X}$ ;
• ${\displaystyle \operatorname {P} (Y)={4 \over 10}}$  is the probability that the purchased bulb was manufactured by factory ${\displaystyle Y}$ ;
• ${\displaystyle \operatorname {E} (L\mid X)=5000}$  is the expected lifetime of a bulb manufactured by ${\displaystyle X}$ ;
• ${\displaystyle \operatorname {E} (L\mid Y)=4000}$  is the expected lifetime of a bulb manufactured by ${\displaystyle Y}$ .

Thus each purchased light bulb has an expected lifetime of 4600 hours.
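The computation above is a weighted average over the two-event partition {bulb from X, bulb from Y}; a minimal numeric check:

```python
# P(factory) and E(L | factory), taken from the example above.
p = {"X": 0.6, "Y": 0.4}
mean_life = {"X": 5000, "Y": 4000}

# Law of total expectation over the two-event partition:
# E(L) = E(L | X) P(X) + E(L | Y) P(Y).
expected_life = sum(mean_life[f] * p[f] for f in p)

assert abs(expected_life - 4600) < 1e-6
```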

## Proof in the finite and countable cases

Let the random variables ${\displaystyle X}$  and ${\displaystyle Y}$ , defined on the same probability space, assume a finite or countably infinite set of finite values. Assume that ${\displaystyle \operatorname {E} [X]}$  is defined, i.e. ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty }$ . If ${\displaystyle \{A_{i}\}}$  is a partition of the probability space ${\displaystyle \Omega }$ , then

${\displaystyle \operatorname {E} (X)=\sum _{i}{\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})}.}$

Proof.

${\displaystyle {\begin{aligned}\operatorname {E} \left(\operatorname {E} (X\mid Y)\right)&=\operatorname {E} {\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y){\Bigg ]}\\[6pt]&=\sum _{y}{\Bigg [}\sum _{x}x\cdot \operatorname {P} (X=x\mid Y=y){\Bigg ]}\cdot \operatorname {P} (Y=y)\\[6pt]&=\sum _{y}\sum _{x}x\cdot \operatorname {P} (X=x,Y=y).\end{aligned}}}$

If the series is finite, then we can switch the summations around, and the previous expression will become

${\displaystyle {\begin{aligned}\sum _{x}\sum _{y}x\cdot \operatorname {P} (X=x,Y=y)&=\sum _{x}x\sum _{y}\operatorname {P} (X=x,Y=y)\\[6pt]&=\sum _{x}x\cdot \operatorname {P} (X=x)\\[6pt]&=\operatorname {E} (X).\end{aligned}}}$

If, on the other hand, the series is infinite, then its convergence cannot be conditional, due to the assumption that ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty .}$  The series converges absolutely if both ${\displaystyle \operatorname {E} [X_{+}]}$  and ${\displaystyle \operatorname {E} [X_{-}]}$  are finite, and diverges to infinity when either of them is infinite. In either case, the above summations may be exchanged without affecting the sum.

## Proof in the general case

Let ${\displaystyle (\Omega ,{\mathcal {F}},\operatorname {P} )}$  be a probability space on which two sub σ-algebras ${\displaystyle {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}\subseteq {\mathcal {F}}}$  are defined. For a random variable ${\displaystyle X}$  on such a space, the smoothing law states that if ${\displaystyle \operatorname {E} [X]}$  is defined, i.e. ${\displaystyle \min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty }$ , then

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]=\operatorname {E} [X\mid {\mathcal {G}}_{1}]\quad {\text{(a.s.)}}.}$

Proof. Since a conditional expectation is a Radon–Nikodym derivative, verifying the following two properties establishes the smoothing law:

• ${\displaystyle \operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]{\mbox{ is }}{\mathcal {G}}_{1}}$ -measurable
• ${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]d\operatorname {P} =\int _{G_{1}}Xd\operatorname {P} ,}$  for all ${\displaystyle G_{1}\in {\mathcal {G}}_{1}.}$

The first of these properties holds by definition of the conditional expectation. To prove the second one,

${\displaystyle {\begin{aligned}\min \left(\int _{G_{1}}X_{+}\,d\operatorname {P} ,\int _{G_{1}}X_{-}\,d\operatorname {P} \right)&\leq \min \left(\int _{\Omega }X_{+}\,d\operatorname {P} ,\int _{\Omega }X_{-}\,d\operatorname {P} \right)\\[4pt]&=\min(\operatorname {E} [X_{+}],\operatorname {E} [X_{-}])<\infty ,\end{aligned}}}$

so the integral ${\displaystyle \textstyle \int _{G_{1}}X\,d\operatorname {P} }$  is defined (i.e., it is not of the indeterminate form ${\displaystyle \infty -\infty }$ ).

The second property thus holds since ${\displaystyle G_{1}\in {\mathcal {G}}_{1}\subseteq {\mathcal {G}}_{2}}$  implies

${\displaystyle \int _{G_{1}}\operatorname {E} [\operatorname {E} [X\mid {\mathcal {G}}_{2}]\mid {\mathcal {G}}_{1}]d\operatorname {P} =\int _{G_{1}}\operatorname {E} [X\mid {\mathcal {G}}_{2}]d\operatorname {P} =\int _{G_{1}}Xd\operatorname {P} .}$

Corollary. In the special case when ${\displaystyle {\mathcal {G}}_{1}=\{\emptyset ,\Omega \}}$  and ${\displaystyle {\mathcal {G}}_{2}=\sigma (Y)}$ , the smoothing law reduces to

${\displaystyle \operatorname {E} [\operatorname {E} [X\mid Y]]=\operatorname {E} [X].}$
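The smoothing law can be illustrated on an empirical distribution, where it holds exactly. The sketch below uses an assumed toy model (not from the article) with ${\displaystyle {\mathcal {G}}_{1}=\sigma (Y)}$ and ${\displaystyle {\mathcal {G}}_{2}=\sigma (Y,Z)}$: averaging the finer conditional means E[X | Y, Z] over Z within each slice {Y = y} reproduces E[X | Y = y].

```python
import random

random.seed(0)

# Toy model: Y, Z are independent coin flips and X = Y + 2*Z + noise.
samples = [
    (y + 2 * z + random.gauss(0, 1), y, z)
    for y, z in ((random.randint(0, 1), random.randint(0, 1)) for _ in range(10_000))
]

for y0 in (0, 1):
    slice_y = [(x, z) for x, y, z in samples if y == y0]
    # E[X | G1] on the slice {Y = y0}: the plain average of X there.
    direct = sum(x for x, _ in slice_y) / len(slice_y)
    # E[E[X | G2] | G1]: average the conditional means E[X | Y = y0, Z = z0],
    # weighted by the empirical conditional probability of each z0.
    smoothed = 0.0
    for z0 in (0, 1):
        vals = [x for x, z in slice_y if z == z0]
        smoothed += (len(vals) / len(slice_y)) * (sum(vals) / len(vals))
    assert abs(smoothed - direct) < 1e-9
```

Because both sides are computed from the same empirical measure, they agree to floating-point precision rather than merely up to Monte Carlo error.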

## Proof of partition formula

${\displaystyle {\begin{aligned}\sum \limits _{i}\operatorname {E} (X\mid A_{i})\operatorname {P} (A_{i})&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )\operatorname {P} (d\omega \mid A_{i})\cdot \operatorname {P} (A_{i})\\&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )\operatorname {P} (d\omega \cap A_{i})\\&=\sum \limits _{i}\int \limits _{\Omega }X(\omega )I_{A_{i}}(\omega )\operatorname {P} (d\omega )\\&=\sum \limits _{i}\operatorname {E} (XI_{A_{i}}),\end{aligned}}}$

where ${\displaystyle I_{A_{i}}}$  is the indicator function of the set ${\displaystyle A_{i}}$ .

If the partition ${\displaystyle {\{A_{i}\}}_{i=0}^{n}}$  is finite, then, by linearity, the previous expression becomes

${\displaystyle \operatorname {E} \left(\sum \limits _{i=0}^{n}XI_{A_{i}}\right)=\operatorname {E} (X),}$

and we are done.

If, however, the partition ${\displaystyle {\{A_{i}\}}_{i=0}^{\infty }}$  is infinite, then we use the dominated convergence theorem to show that

${\displaystyle \operatorname {E} \left(\sum \limits _{i=0}^{n}XI_{A_{i}}\right)\to \operatorname {E} (X)\quad {\text{as }}n\to \infty .}$

Indeed, for every ${\displaystyle n\geq 0}$ ,

${\displaystyle \left|\sum _{i=0}^{n}XI_{A_{i}}\right|\leq |X|I_{\mathop {\bigcup } \limits _{i=0}^{n}A_{i}}\leq |X|.}$

Since every element of the set ${\displaystyle \Omega }$  falls into exactly one cell ${\displaystyle A_{i}}$  of the partition, it is straightforward to verify that the sequence ${\displaystyle {\left\{\sum _{i=0}^{n}XI_{A_{i}}\right\}}_{n=0}^{\infty }}$  converges pointwise to ${\displaystyle X}$ . By the initial assumption, ${\displaystyle \operatorname {E} |X|<\infty }$ . Applying the dominated convergence theorem yields the desired result.
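For a finite partition, the identity ${\displaystyle \operatorname {E} (X)=\sum _{i}\operatorname {E} (XI_{A_{i}})}$  underlying this proof also holds exactly for sample averages; a short sketch, with the partition cells chosen arbitrarily for illustration:

```python
import random

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10_000)]
n = len(xs)

# Partition the real line (hence the sample space) into three cells.
cells = [lambda x: x < -1, lambda x: -1 <= x < 1, lambda x: x >= 1]

# Sum of E(X * I_{A_i}) over the partition, each estimated by a sample mean.
total = sum(sum(x for x in xs if ind(x)) / n for ind in cells)

# This matches E(X) because every sample point lies in exactly one cell.
assert abs(total - sum(xs) / n) < 1e-9
```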

## References

1. ^ Weiss, Neil A. (2005). A Course in Probability. Boston: Addison–Wesley. pp. 380–383. ISBN 0-321-18954-X.
2. ^ "Law of Iterated Expectation | Brilliant Math & Science Wiki". brilliant.org. Retrieved 2018-03-28.
3. ^ Rhee, Chang-han (Sep 20, 2011). "Probability and Statistics" (PDF).
4. ^ Wolpert, Robert (November 18, 2010). "Conditional Expectation" (PDF).