# Regular conditional probability

In probability theory, regular conditional probability is a concept that formalizes the notion of conditioning on the outcome of a random variable. The resulting conditional probability distribution is a parametrized family of probability measures called a Markov kernel.

## Motivation

Consider two random variables X and Y, where X represents the roll of a die. The conditional probability of Y being in a Borel set ${\displaystyle A\subseteq \mathbb {R} }$  is given by

${\displaystyle P(Y\in A|X=x)={\frac {P(Y\in A,X=x)}{P(X=x)}}.}$

Conditional probability forms a two-variable function ${\displaystyle \nu :\mathbb {R} \times {\mathcal {F}}\to \mathbb {R} }$

${\displaystyle \nu (x,A)=P(A|X=x)}$

Note that when x is not a possible outcome of X, the function is undefined: the roll of a die coming up 27 is a probability-zero event. The function ${\displaystyle \nu }$  is therefore defined only almost everywhere in x.
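The elementary ratio above can be checked on a small discrete model. The following sketch uses a hypothetical setup not taken from the text (X a fair die, Y = X plus an independent fair coin flip); it computes ${\displaystyle \nu (x,A)}$ exactly with rationals and, as discussed, refuses to evaluate off the support of X:

```python
from fractions import Fraction
from itertools import product

# Hypothetical illustration: X is a fair die and Y = X + B, where B is an
# independent fair coin flip in {0, 1}.  The joint pmf lives on finitely
# many points, so P(Y in A | X = x) is the elementary ratio from the text.
joint = {}  # (x, y) -> P(X = x, Y = y)
for x, b in product(range(1, 7), (0, 1)):
    joint[(x, x + b)] = joint.get((x, x + b), Fraction(0)) + Fraction(1, 12)

def nu(x, A):
    """nu(x, A) = P(Y in A, X = x) / P(X = x); undefined off the support of X."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    if p_x == 0:
        raise ValueError("x is not a possible outcome of X; nu(x, .) is undefined")
    return sum(p for (xx, y), p in joint.items() if xx == x and y in A) / p_x

print(nu(3, {3, 4}))  # -> 1: given X = 3, Y is 3 or 4
print(nu(3, {4}))     # -> 1/2
```

Evaluating `nu(27, {4})` raises an error, mirroring the remark that the function is undefined outside the possible outcomes of X.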

Now consider two continuous random variables, X and Y, with density ${\displaystyle f_{X,Y}(x,y)}$ . The conditional probability of Y being in A is given by

${\displaystyle P(Y\in A|X=x)={\frac {\int _{A}f_{X,Y}(x,y)\mathrm {d} y}{\int _{\mathbb {R} }f_{X,Y}(x,y)\mathrm {d} y}}.}$

As before, conditional probability is a two-variable function, undefined outside the support of the distribution of X.

Note that this is not the same as conditioning on the event ${\displaystyle B=\{X=x\}}$ , but is rather a limit: see Conditional probability#Conditioning on an event of probability zero.
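As a numerical sanity check of the ratio-of-integrals formula, here is a minimal sketch with a made-up joint density ${\displaystyle f_{X,Y}(x,y)=x+y}$ on the unit square (which integrates to 1); the midpoint rule stands in for a proper quadrature:

```python
# Hypothetical joint density f(x, y) = x + y on the unit square; it
# integrates to 1, so it is a valid density for illustrating the formula.
def f(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def integrate(g, a, b, n=10_000):
    """Simple midpoint rule, standing in for a proper quadrature routine."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

def conditional_prob(A, x, support=(0.0, 1.0)):
    """P(Y in A | X = x) = int_A f(x, y) dy / int_R f(x, y) dy."""
    num = integrate(lambda y: f(x, y), *A)
    den = integrate(lambda y: f(x, y), *support)
    return num / den

# For this density, P(Y <= 1/2 | X = 1/2) = (0.5*0.5 + 0.125) / (0.5 + 0.5).
p = conditional_prob((0.0, 0.5), 0.5)
print(round(p, 4))  # -> 0.375
```

For a point x outside the support (here, outside [0, 1]) the denominator is zero and the ratio is undefined, matching the remark above.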

## Relation to conditional expectation

In probability theory, the theory of conditional expectation is developed before that of regular conditional distributions.[1][2]

For discrete and continuous random variables, the conditional expectation is given by

${\displaystyle {\begin{aligned}\mathbb {E} [X|Y=y]&=\sum _{x}xP(X=x|Y=y)\\\mathbb {E} [X|Y=y]&=\int xf_{X|Y}(x,y)\,\mathrm {d} x\end{aligned}}}$

where ${\displaystyle f_{X|Y}(x,y)}$  is the conditional density of X given Y.
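The discrete formula can be verified on a toy model. In this sketch (a hypothetical setup, not from the text) X is a fair die and Y indicates whether the roll is even, so ${\displaystyle \mathbb {E} [X|Y=y]}$ reduces to a finite weighted sum:

```python
from fractions import Fraction

# Hypothetical discrete check of E[X | Y = y] = sum_x x P(X = x | Y = y):
# X is a fair die roll and Y = 1 if the roll is even, 0 otherwise.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cond_expectation(y):
    """E[X | Y = y], computed from the elementary conditional pmf."""
    matching = {x: p for x, p in pmf.items() if (x % 2 == 0) == bool(y)}
    total = sum(matching.values())            # P(Y = y)
    return sum(x * p for x, p in matching.items()) / total

print(cond_expectation(1))  # E[X | roll even] = (2 + 4 + 6) / 3 = 4
print(cond_expectation(0))  # E[X | roll odd]  = (1 + 3 + 5) / 3 = 3
```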

It is natural to ask whether measure theoretical conditional expectation can also be expressed as

${\displaystyle \mathbb {E} [X|Y](\omega )=\int x\nu (\omega ,\mathrm {d} x)}$

where ${\displaystyle \nu :\Omega \times {\mathcal {B}}({\overline {\mathbb {R} }})\to [0,1]}$  is a family of measures parametrized by outcome ${\displaystyle \omega }$ .

Such a Markov kernel can be defined using conditional expectation:

${\displaystyle \nu (\omega ,A)=\mathbb {E} [1_{X\in A}|Y](\omega ).}$

It can be shown that if ${\displaystyle X:\Omega \to \mathbb {R} }$ , then for almost all ${\displaystyle \omega }$ , ${\displaystyle \nu (\omega ,\cdot )}$  is a probability measure. There are, however, counterexamples when the random variable X takes values in a more general space: a space E can be constructed for which ${\displaystyle \nu (\omega ,\cdot )}$  fails to be a probability measure almost everywhere.

## Definition

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  be a probability space, and let ${\displaystyle T:\Omega \rightarrow E}$  be a random variable, defined as a Borel-measurable function from ${\displaystyle \Omega }$  to its state space ${\displaystyle (E,{\mathcal {E}})}$ . One should think of ${\displaystyle T}$  as a way to "disintegrate" the sample space ${\displaystyle \Omega }$  into the fibers ${\displaystyle \{T^{-1}(x)\}_{x\in E}}$ . Using the disintegration theorem from measure theory, this allows us to "disintegrate" the measure ${\displaystyle P}$  into a collection of measures, one for each ${\displaystyle x\in E}$ . Formally, a regular conditional probability is defined as a function ${\displaystyle \nu :E\times {\mathcal {F}}\rightarrow [0,1],}$  called a "transition probability", where:

• For every ${\displaystyle x\in E}$ , ${\displaystyle \nu (x,\cdot )}$  is a probability measure on ${\displaystyle {\mathcal {F}}}$ . Thus we provide one measure for each ${\displaystyle x\in E}$ .
• For all ${\displaystyle A\in {\mathcal {F}}}$ , ${\displaystyle \nu (\cdot ,A)}$  (a mapping ${\displaystyle E\to [0,1]}$ ) is ${\displaystyle {\mathcal {E}}}$ -measurable, and
• For all ${\displaystyle A\in {\mathcal {F}}}$  and all ${\displaystyle B\in {\mathcal {E}}}$ ,[3]
${\displaystyle P{\big (}A\cap T^{-1}(B){\big )}=\int _{B}\nu (x,A)\,P{\big (}T^{-1}(dx){\big )}.}$

where ${\displaystyle P\circ T^{-1}=T_{*}P}$  is the pushforward measure of ${\displaystyle P}$  under ${\displaystyle T}$ , i.e. the distribution of the random element ${\displaystyle T}$ , and ${\displaystyle x}$  ranges over ${\displaystyle \mathrm {supp} \,T}$ , the topological support of ${\displaystyle T_{*}P}$ . Specifically, if we take ${\displaystyle B=E}$ , then ${\displaystyle A\cap T^{-1}(E)=A}$ , and so

${\displaystyle P(A)=\int _{E}\nu (x,A)\,P{\big (}T^{-1}(dx){\big )}}$ ,

where ${\displaystyle \nu (x,A)}$  can be written in the more familiar notation ${\displaystyle P(A\ |\ T=x)}$  (this is "defined" to be the conditional probability of ${\displaystyle A}$  given ${\displaystyle T=x}$ , which can be undefined in elementary constructions of conditional probability). As can be seen from the integral above, the value of ${\displaystyle \nu }$  for points x outside the support of T is meaningless; its significance as a conditional probability is strictly limited to the support of T.
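The defining identity ${\displaystyle P{\big (}A\cap T^{-1}(B){\big )}=\int _{B}\nu (x,A)\,P{\big (}T^{-1}(dx){\big )}}$ can be verified exactly on a finite sample space. The setup below is a hypothetical illustration (not from the text): ${\displaystyle \Omega }$ is a die roll paired with a coin flip under the uniform measure, and T reads off the die:

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite check of P(A ∩ T^{-1}(B)) = ∫_B nu(x, A) dP∘T^{-1}:
# Omega = {1,...,6} x {0,1} (die roll, coin flip), uniform P, T = die roll.
omega = list(product(range(1, 7), (0, 1)))
P = {w: Fraction(1, 12) for w in omega}

def T(w):
    return w[0]

A = {w for w in omega if w[0] + w[1] >= 6}   # an event in F
B = {4, 5, 6}                                 # a measurable set in E

def nu(x, A):
    """nu(x, A) = P(A | T = x), computed as an elementary ratio on the fiber."""
    fiber = [w for w in omega if T(w) == x]
    return sum(P[w] for w in fiber if w in A) / sum(P[w] for w in fiber)

lhs = sum(P[w] for w in A if T(w) in B)
# Pushforward measure P∘T^{-1}: the distribution of T on E = {1,...,6}.
pushforward = {x: sum(P[w] for w in omega if T(w) == x) for x in range(1, 7)}
rhs = sum(nu(x, A) * pushforward[x] for x in B)
print(lhs == rhs)  # -> True: the defining identity holds exactly
```

Because the space is finite, exact rational arithmetic confirms both sides equal 1/4 here; in the general case the identity is a statement about integrals against the pushforward measure.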

The measurable space ${\displaystyle (\Omega ,{\mathcal {F}})}$  is said to have the regular conditional probability property if for all probability measures ${\displaystyle P}$  on ${\displaystyle (\Omega ,{\mathcal {F}}),}$  all random variables on ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  admit a regular conditional probability. A Radon space, in particular, has this property.

## Alternate definition

Consider a Radon space ${\displaystyle \Omega }$  (that is, a probability measure defined on a Radon space endowed with its Borel sigma-algebra) and a real-valued random variable T. As discussed above, in this case there exists a regular conditional probability with respect to T. Moreover, we can alternatively define the regular conditional probability for an event A given a particular value t of the random variable T in the following manner:

${\displaystyle P(A|T=t)=\lim _{U\supset \{T=t\}}{\frac {P(A\cap U)}{P(U)}},}$

where the limit is taken over the net of open neighborhoods U of t as they become smaller with respect to set inclusion. This limit is defined if and only if the probability space is Radon, and only on the support of T, as discussed above. This is the restriction of the transition probability to the support of T. To describe this limiting process rigorously:

For every ${\displaystyle \epsilon >0,}$  there exists an open neighborhood U of the event {T=t}, such that for every open V with ${\displaystyle \{T=t\}\subset V\subset U,}$

${\displaystyle \left|{\frac {P(A\cap V)}{P(V)}}-L\right|<\epsilon ,}$

where ${\displaystyle L=P(A|T=t)}$  is the limit.

## Example

To continue with our motivating example above, we consider a real-valued random variable X and write

${\displaystyle P(A|X=x_{0})=\nu (x_{0},A)=\lim _{\epsilon \rightarrow 0+}{\frac {P{\big (}A\cap \{x_{0}-\epsilon <X<x_{0}+\epsilon \}{\big )}}{P{\big (}\{x_{0}-\epsilon <X<x_{0}+\epsilon \}{\big )}}}}$

(where ${\displaystyle x_{0}=2/3}$  in the example given). This limit, if it exists, is a regular conditional probability for X, restricted to ${\displaystyle \mathrm {supp} \,X.}$
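The shrinking-interval limit can be evaluated directly in a simple case. This sketch assumes, hypothetically, that X is uniform on [0, 1] and ${\displaystyle A=\{X\leq 1/2\}}$ ; both probabilities in the ratio then have closed forms:

```python
# Hypothetical numerical check of the shrinking-interval limit for X uniform
# on [0, 1] and A = {X <= 1/2}; the ratio should tend to 1 for x0 < 1/2.
def p_interval(a, b):
    """P(a < X < b) for X uniform on [0, 1]."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

def p_A_and_interval(a, b):
    """P(X <= 1/2 and a < X < b) for X uniform on [0, 1]."""
    lo, hi = max(a, 0.0), min(b, 0.5)
    return max(hi - lo, 0.0)

x0 = 1.0 / 3.0
for eps in (0.1, 0.01, 0.001):
    ratio = p_A_and_interval(x0 - eps, x0 + eps) / p_interval(x0 - eps, x0 + eps)
    print(eps, ratio)  # ratio is 1.0 once the interval sits inside [0, 1/2]

# Outside the support the denominator vanishes and the ratio is undefined:
print(p_interval(1.5 - 0.1, 1.5 + 0.1))  # -> 0.0
```

The final line illustrates the point made below: for ${\displaystyle x_{0}}$ outside the support, every sufficiently small interval around it has probability zero, so the defining ratio has no meaning.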

In any case, it is easy to see that this limit fails to exist for ${\displaystyle x_{0}}$  outside the support of X: since the support of a random variable is defined as the set of all points in its state space whose every neighborhood has positive probability, for every point ${\displaystyle x_{0}}$  outside the support of X there is (by definition) an ${\displaystyle \epsilon >0}$  such that ${\displaystyle P(x_{0}-\epsilon <X<x_{0}+\epsilon )=0}$ , so the ratio above is undefined.

Thus if X is distributed uniformly on ${\displaystyle [0,1],}$  it is truly meaningless to condition a probability on "${\displaystyle X=3/2}$ ".