# Total variation distance of probability measures


In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance or variational distance.

## Definition

The total variation distance between two probability measures P and Q on a sigma-algebra ${\displaystyle {\mathcal {F}}}$ of subsets of the sample space ${\displaystyle \Omega }$ is defined as[1]

${\displaystyle \delta (P,Q)=\sup _{A\in {\mathcal {F}}}\left|P(A)-Q(A)\right|.}$

Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
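
For a small finite sample space, the supremum in the definition can be evaluated directly by enumerating every event. The following is a minimal sketch under that assumption, with every subset treated as measurable. The function name `tv_sup_over_events` and the example distributions `p` and `q` are illustrative and not part of the cited sources.

```python
from itertools import chain, combinations

def tv_sup_over_events(p, q):
    """Brute-force sup over all events A of |P(A) - Q(A)|.

    p and q are dicts mapping each outcome to its probability and are
    assumed to share the same keys. The cost grows as 2**len(p), so this
    is only meant for very small sample spaces.
    """
    outcomes = list(p)
    # Enumerate every subset of the sample space, including the empty set.
    events = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    return max(
        abs(sum(p[w] for w in event) - sum(q[w] for w in event))
        for event in events
    )

# Hypothetical example distributions on a three-point sample space.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_sup_over_events(p, q))
```

In this example the largest gap is about 0.3, attained for instance at the event {a}, where P assigns 0.5 and Q assigns 0.2.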

## Properties

### Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

${\displaystyle \delta (P,Q)\leq {\sqrt {{\frac {1}{2}}D_{\mathrm {KL} }(P\parallel Q)}}.}$
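
Pinsker's inequality can be checked numerically on a finite sample space. The sketch below assumes that the Kullback–Leibler divergence is measured in nats (natural logarithm), which is the convention under which the constant 1/2 applies, and that Q is positive wherever P is. It computes the total variation distance with the half-L1 formula for countable spaces given in this section. The helper names and the example vectors are illustrative assumptions.

```python
import math

def total_variation(p, q):
    """Total variation distance on a finite space via the half-L1 formula."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) in nats.

    Terms with p_i == 0 contribute zero; q_i is assumed positive
    wherever p_i is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions on a three-point sample space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv = total_variation(p, q)
bound = math.sqrt(0.5 * kl_divergence(p, q))
print(tv, bound, tv <= bound)
```

Here the distance is about 0.3 and the Pinsker bound is about 0.37, so the inequality holds with some slack.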

When the sample space ${\displaystyle \Omega }$ is countable, the total variation distance is related to the L1 norm by the identity:[2]

${\displaystyle \delta (P,Q)={\frac {1}{2}}\|P-Q\|_{1}={\frac {1}{2}}\sum _{\omega \in \Omega }|P(\omega )-Q(\omega )|.}$
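
One way to see why the identity holds is that the supremum in the definition is attained at the event where P exceeds Q pointwise. The following sketch, assuming a finite sample space represented as dicts with identical keys, compares the half-L1 sum with the probability gap on that event. The function names are illustrative.

```python
def tv_half_l1(p, q):
    """Half the L1 distance between two probability mass functions."""
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)

def tv_on_maximizing_event(p, q):
    """P(A) - Q(A) for the event A = {w : P(w) > Q(w)}."""
    event = [w for w in p if p[w] > q[w]]
    return sum(p[w] for w in event) - sum(q[w] for w in event)

# Hypothetical example distributions on a three-point sample space.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

print(tv_half_l1(p, q), tv_on_maximizing_event(p, q))
```

Both quantities agree up to floating-point rounding. The mass that P places in excess of Q on the event A equals the mass that Q places in excess of P on its complement, because both measures sum to one, which is where the factor 1/2 comes from.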

### Connection to transportation theory

The total variation distance (or half the L1 norm of the difference P − Q) arises as the optimal transportation cost when the cost function is ${\displaystyle c(x,y)={\mathbf {1} }_{x\neq y}}$, that is,

${\displaystyle {\frac {1}{2}}\|P-Q\|_{1}=\delta (P,Q)=\inf _{\pi }\operatorname {E} _{\pi }[{\mathbf {1} }_{x\neq y}],}$

where the expectation is taken with respect to the probability measure ${\displaystyle \pi }$ on the space where ${\displaystyle (x,y)}$ lives, and the infimum is taken over all such ${\displaystyle \pi }$ with marginals ${\displaystyle P}$ and ${\displaystyle Q}$, respectively.[3]
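
On a finite sample space, a coupling that attains the infimum can be written down explicitly. The sketch below uses the standard maximal-coupling construction: it places the shared mass min(P, Q) on the diagonal and spreads the remaining mass as a product of the normalized excesses. The function name `maximal_coupling` and the dict representation are assumptions made for illustration, and the construction is not taken from the cited reference.

```python
def maximal_coupling(p, q):
    """Return a joint pmf {(x, y): mass} with marginals p and q.

    p and q are dicts over the same finite set of outcomes. The returned
    coupling satisfies P(x != y) = total variation distance of p and q.
    """
    outcomes = list(p)
    overlap = {w: min(p[w], q[w]) for w in outcomes}
    tv = 1.0 - sum(overlap.values())

    # Shared mass stays on the diagonal, where x == y.
    pi = {(w, w): overlap[w] for w in outcomes if overlap[w] > 0}

    # Excess mass of p is paired with excess mass of q. The two excesses
    # are never positive at the same outcome, so these pairs have x != y.
    if tv > 0:
        for x in outcomes:
            for y in outcomes:
                mass = (p[x] - overlap[x]) * (q[y] - overlap[y]) / tv
                if mass > 0:
                    pi[(x, y)] = pi.get((x, y), 0.0) + mass
    return pi

# Hypothetical example distributions on a three-point sample space.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

pi = maximal_coupling(p, q)
mismatch = sum(mass for (x, y), mass in pi.items() if x != y)
print(mismatch)
```

For these distributions the only off-diagonal pair is (a, c), carrying mass of about 0.3, which matches the total variation distance. No coupling can do better, since any coupling must place x and y at different points with probability at least the distance.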

## References

1. ^ Chatterjee, Sourav. "Distances between probability measures" (PDF). UC Berkeley. Archived from the original (PDF) on July 8, 2008. Retrieved 21 June 2013.
2. ^ Levin, David A.; Peres, Yuval; Wilmer, Elizabeth L. (2017). Markov Chains and Mixing Times (2nd rev. ed.). AMS. Proposition 4.2, p. 48.
3. ^ Villani, Cédric (2009). Optimal Transport, Old and New. Grundlehren der mathematischen Wissenschaften. 338. Springer-Verlag Berlin Heidelberg. p. 10. doi:10.1007/978-3-540-71050-9. ISBN 978-3-540-71049-3.