A function $f : \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ satisfies the bounded differences property if substituting the value of the $i$th coordinate $x_i$ changes the value of $f$ by at most $c_i$. More formally, if there are constants $c_1, c_2, \dots, c_n$ such that for all $i \in [n]$, and all $x_1 \in \mathcal{X}_1,\, x_2 \in \mathcal{X}_2,\, \ldots,\, x_n \in \mathcal{X}_n$,

$$\sup_{x_i' \in \mathcal{X}_i} \left| f(x_1, \dots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n) - f(x_1, \dots, x_{i-1}, x_i', x_{i+1}, \ldots, x_n) \right| \leq c_i.$$
McDiarmid's Inequality[2] — Let $f : \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Consider independent random variables $X_1, X_2, \dots, X_n$ where $X_i \in \mathcal{X}_i$ for all $i$. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right),$$

$$\mathrm{P}\left(f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)] \leq -\varepsilon\right) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right),$$

and as an immediate consequence,

$$\mathrm{P}\left(|f(X_1, X_2, \ldots, X_n) - \mathbb{E}[f(X_1, X_2, \ldots, X_n)]| \geq \varepsilon\right) \leq 2\exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).$$
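As a quick numerical illustration (not part of the standard statement), consider the special case where $f$ is the sample mean of $n$ independent Bernoulli(1/2) variables, so each $c_i = 1/n$ and McDiarmid's inequality reduces to Hoeffding's inequality. The following Python sketch, with constants of my own choosing, compares the empirical deviation probability with the two-sided bound:

```python
# Minimal sanity check: f = sample mean of n Bernoulli(1/2) variables, so
# changing any single coordinate moves f by at most c_i = 1/n.
import numpy as np

rng = np.random.default_rng(0)
n, trials, eps = 100, 200_000, 0.1
c = np.full(n, 1.0 / n)                    # bounded-differences constants

X = rng.integers(0, 2, size=(trials, n))   # independent Bernoulli(1/2) samples
f = X.mean(axis=1)                         # f(X_1, ..., X_n) for each trial
deviation = np.abs(f - 0.5)                # |f - E[f]|, since E[f] = 1/2

empirical = (deviation >= eps).mean()
bound = 2 * np.exp(-2 * eps**2 / np.sum(c**2))
print(f"empirical P(|f - E[f]| >= {eps}): {empirical:.4f}")
print(f"two-sided McDiarmid bound:       {bound:.4f}")
# The empirical frequency stays below the bound (up to sampling noise).
```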
Unbalanced distributions
A stronger bound may be given when the arguments to the function are sampled from unbalanced distributions, such that resampling a single argument rarely causes a large change to the function value.
McDiarmid's Inequality (unbalanced)[3][4] — Let $f : \mathcal{X}^n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Consider independent random variables $X_1, X_2, \ldots, X_n \in \mathcal{X}$ drawn from a distribution where there is a particular value $\chi_0 \in \mathcal{X}$ which occurs with probability $1 - p$. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(|f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)]| \geq \varepsilon\right) \leq 2\exp\left(\frac{-\varepsilon^2}{2p(2-p)\sum_{i=1}^n c_i^2 + \frac{2}{3}\varepsilon \max_i c_i}\right).$$
This may be used to characterize, for example, the value of a function on sparse random graphs and hypergraphs, since in a sparse random graph it is much more likely for any particular edge to be missing than to be present.
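For a rough sense of the improvement, the following sketch evaluates both the standard and the unbalanced two-sided bounds when each coordinate takes a non-default value only with small probability $p$, as an edge of a sparse random graph does. The values of $n$, $p$, and $\varepsilon$ are purely illustrative:

```python
# Compare the standard two-sided McDiarmid bound with the unbalanced variant
# when p << 1 (each coordinate rarely deviates from the default value).
import numpy as np

n, p, eps = 1000, 0.01, 20.0
c = np.ones(n)                    # e.g. adding/removing one edge shifts f by <= 1
S, cmax = np.sum(c**2), np.max(c)

standard   = 2 * np.exp(-2 * eps**2 / S)
unbalanced = 2 * np.exp(-eps**2 / (2 * p * (2 - p) * S + (2.0 / 3.0) * eps * cmax))
print(f"standard bound:   {standard:.3e}")    # ~9.0e-01, nearly vacuous
print(f"unbalanced bound: {unbalanced:.3e}")  # ~1.1e-03, far stronger
```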
Differences bounded with high probability
McDiarmid's inequality may be extended to the case where the function being analyzed does not strictly satisfy the bounded differences property, but large differences remain very rare.
McDiarmid's Inequality (Differences bounded with high probability)[5] — Let $f : \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ be a function, let $\mathcal{Y} \subseteq \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n$ be a subset of its domain, and let $c_1, c_2, \dots, c_n \geq 0$ be constants such that for all pairs $(x_1, \ldots, x_n) \in \mathcal{Y}$ and $(x_1', \ldots, x_n') \in \mathcal{Y}$,

$$\left| f(x_1, \ldots, x_n) - f(x_1', \ldots, x_n') \right| \leq \sum_{i : x_i \neq x_i'} c_i.$$

Consider independent random variables $X_1, X_2, \dots, X_n$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $p = 1 - \mathrm{P}((X_1, \ldots, X_n) \in \mathcal{Y})$ and let $m = \mathbb{E}[f(X_1, \ldots, X_n) \mid (X_1, \ldots, X_n) \in \mathcal{Y}]$. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(f(X_1, \ldots, X_n) - m \geq \varepsilon\right) \leq p + \exp\left(-\frac{2\max\left(0,\, \varepsilon - p\sum_{i=1}^n c_i\right)^2}{\sum_{i=1}^n c_i^2}\right),$$

and as an immediate consequence,

$$\mathrm{P}\left(|f(X_1, \ldots, X_n) - m| \geq \varepsilon\right) \leq 2p + 2\exp\left(-\frac{2\max\left(0,\, \varepsilon - p\sum_{i=1}^n c_i\right)^2}{\sum_{i=1}^n c_i^2}\right).$$
There exist stronger refinements to this analysis in some distribution-dependent scenarios,[6] such as those that arise in learning theory.
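To make the shape of the bound above concrete, here is a minimal evaluation under assumed values (the choice of $n$, $c_i$, $p$, and $\varepsilon$ is mine): the failure probability $p$ enters additively, and additionally erodes $\varepsilon$ by $p\sum_i c_i$ inside the exponent.

```python
# Evaluate the one-sided high-probability-differences bound for assumed values.
import numpy as np

n, eps = 100, 3.0
c = np.ones(n)          # difference bounds, valid only on the "good" set Y
p = 1e-4                # P((X_1, ..., X_n) not in Y)

slack = max(0.0, eps - p * np.sum(c))   # eps shrunk by the penalty p * sum(c_i)
bound = p + np.exp(-2 * slack**2 / np.sum(c**2))
print(f"P(f - m >= {eps}) <= {bound:.4f}")
# With p = 0 this recovers the usual McDiarmid exponent exactly.
```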
Sub-Gaussian and sub-exponential norms
Let the $k$th centered conditional version of a function $f$ be

$$f_k(X)(x) := f(x_1, \ldots, x_{k-1}, X_k, x_{k+1}, \ldots, x_n) - \mathbb{E}_{X_k'} f(x_1, \ldots, x_{k-1}, X_k', x_{k+1}, \ldots, x_n),$$

so that $f_k(X)$ is a random variable depending on the random values of $x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n$.
McDiarmid's Inequality (Sub-Gaussian norm)[7][8] — Let $f : \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ be a function. Consider independent random variables $X = (X_1, X_2, \dots, X_n)$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $f_k(X)$ refer to the $k$th centered conditional version of $f$. Let $\|\cdot\|_{\psi_2}$ denote the sub-Gaussian norm of a random variable. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(\frac{-\varepsilon^2}{32e\left\|\sum_{k \in [n]} \|f_k(X)\|_{\psi_2}^2\right\|_\infty}\right).$$
McDiarmid's Inequality (Sub-exponential norm)[8] — Let $f : \mathcal{X}_1 \times \mathcal{X}_2 \times \cdots \times \mathcal{X}_n \rightarrow \mathbb{R}$ be a function. Consider independent random variables $X = (X_1, X_2, \dots, X_n)$ where $X_i \in \mathcal{X}_i$ for all $i$. Let $f_k(X)$ refer to the $k$th centered conditional version of $f$. Let $\|\cdot\|_{\psi_1}$ denote the sub-exponential norm of a random variable. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(\frac{-\varepsilon^2}{4e^2\left\|\sum_{k \in [n]} \|f_k(X)\|_{\psi_1}^2\right\|_\infty + 2\varepsilon e \max_{k \in [n]} \left\|\|f_k(X)\|_{\psi_1}\right\|_\infty}\right).$$
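Both norm-based bounds are straightforward to evaluate once the norms of the centered conditional versions are known. A toy plug-in sketch, where the uniform norm bound $s$ and the other constants are my own assumptions:

```python
# Plug assumed norm values into the sub-Gaussian and sub-exponential bounds.
# Suppose each ||f_k(X)||_{psi_2} (resp. ||f_k(X)||_{psi_1}) is at most s
# uniformly over the other coordinates, so the inner sums are at most n * s**2.
import math

n, s, eps = 100, 0.1, 3.0

sub_gaussian = math.exp(-eps**2 / (32 * math.e * n * s**2))
sub_exponential = math.exp(
    -eps**2 / (4 * math.e**2 * n * s**2 + 2 * eps * math.e * s)
)
print(f"sub-Gaussian bound:    {sub_gaussian:.4f}")
print(f"sub-exponential bound: {sub_exponential:.4f}")
```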
Bennett and Bernstein forms
Refinements to McDiarmid's inequality in the style of Bennett's inequality and Bernstein inequalities are made possible by defining a variance term for each function argument. Let

$$\begin{aligned} B &:= \max_{k \in [n]} \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \left| f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right|, \\ V_k &:= \sup_{x_1, \dots, x_{k-1}, x_{k+1}, \dots, x_n} \mathbb{E}_{X_k}\left( f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) - \mathbb{E}_{X_k} f(x_1, \dots, x_{k-1}, X_k, x_{k+1}, \dots, x_n) \right)^2, \\ \tilde{\sigma}^2 &:= \sum_{k=1}^n V_k. \end{aligned}$$
McDiarmid's Inequality (Bennett form)[4] — Let $f : \mathcal{X}^n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Consider independent random variables $X_1, X_2, \dots, X_n$ where $X_i \in \mathcal{X}$ for all $i$. Let $B$ and $\tilde{\sigma}^2$ be defined as at the beginning of this section. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{\varepsilon}{2B}\log\left(1 + \frac{B\varepsilon}{\tilde{\sigma}^2}\right)\right).$$
McDiarmid's Inequality (Bernstein form)[4] — Let $f : \mathcal{X}^n \rightarrow \mathbb{R}$ satisfy the bounded differences property with bounds $c_1, c_2, \dots, c_n$. Let $B$ and $\tilde{\sigma}^2$ be defined as at the beginning of this section. Then, for any $\varepsilon > 0$,

$$\mathrm{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) \leq \exp\left(-\frac{\varepsilon^2}{2\left(\tilde{\sigma}^2 + \frac{B\varepsilon}{3}\right)}\right).$$
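When the per-coordinate variances $V_k$ are much smaller than the worst-case deviations, these forms can be substantially tighter than the basic inequality. A sketch under assumed values (all constants below are illustrative):

```python
# Compare the basic McDiarmid bound with the Bennett and Bernstein forms
# when the variance proxy (tilde sigma)^2 is small relative to sum(c_i^2).
import numpy as np

n, eps = 1000, 5.0
c = np.ones(n)               # bounded-differences constants
B = 1.0                      # max centered deviation, as defined above
sigma2 = 0.01 * n            # assumed (tilde sigma)^2 = sum of V_k

basic     = np.exp(-2 * eps**2 / np.sum(c**2))
bennett   = np.exp(-(eps / (2 * B)) * np.log1p(B * eps / sigma2))
bernstein = np.exp(-eps**2 / (2 * (sigma2 + B * eps / 3)))
print(f"basic:     {basic:.3f}")     # ~0.951
print(f"bennett:   {bennett:.3f}")   # ~0.363
print(f"bernstein: {bernstein:.3f}") # ~0.342
```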
Proof
The following proof of McDiarmid's inequality[2] constructs the Doob martingale tracking the conditional expected value of the function as more and more of its arguments are sampled and conditioned on, and then applies a martingale concentration inequality (Azuma's inequality). An alternative argument avoiding the use of martingales also exists, taking advantage of the independence of the function arguments to provide a Chernoff-bound-like argument.[4]
For better readability, we will introduce a notational shorthand: $z_{i \rightharpoondown j}$ will denote $z_i, \dots, z_j$ for any $z \in \mathcal{X}^n$ and integers $1 \leq i \leq j \leq n$, so that, for example,

$$f(X_{1 \rightharpoondown (i-1)}, y, x_{(i+1) \rightharpoondown n}) := f(X_1, \ldots, X_{i-1}, y, x_{i+1}, \ldots, x_n).$$
Pick any $x_1', x_2', \ldots, x_n'$. Then, for any $x_1, x_2, \ldots, x_n$, by the triangle inequality,

$$\begin{aligned} &|f(x_{1 \rightharpoondown n}) - f(x'_{1 \rightharpoondown n})| \\ \leq{} &|f(x_{1 \rightharpoondown n}) - f(x'_{1 \rightharpoondown (n-1)}, x_n)| + c_n \\ \leq{} &|f(x_{1 \rightharpoondown n}) - f(x'_{1 \rightharpoondown (n-2)}, x_{(n-1) \rightharpoondown n})| + c_{n-1} + c_n \\ \leq{} &\ldots \\ \leq{} &\sum_{i=1}^n c_i, \end{aligned}$$

and thus $f$ is bounded.
Since $f$ is bounded, define the Doob martingale $\{Z_i\}$ (each $Z_i$ being a random variable depending on the random values of $X_1, \ldots, X_i$) as

$$Z_i := \mathbb{E}[f(X_{1 \rightharpoondown n}) \mid X_{1 \rightharpoondown i}]$$

for all $i \geq 1$ and $Z_0 := \mathbb{E}[f(X_{1 \rightharpoondown n})]$, so that $Z_n = f(X_{1 \rightharpoondown n})$.
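To see this martingale concretely, the following Monte Carlo sketch (illustrative only, not part of the proof) estimates $Z_i$ for a single realization of $X$ by averaging $f$ over resampled tails; the sequence starts at $Z_0 = \mathbb{E}[f(X_{1 \rightharpoondown n})]$ and ends at $Z_n = f(X_{1 \rightharpoondown n})$:

```python
# Monte Carlo estimate of the Doob martingale Z_i = E[f(X) | X_1, ..., X_i]
# for f = sum of n Bernoulli(1/2) coordinates (any bounded-differences f works).
import numpy as np

rng = np.random.default_rng(1)
n, inner = 10, 100_000

def f(xs):
    return xs.sum(axis=-1)

X = rng.integers(0, 2, size=n)        # one realization of X_1, ..., X_n
for i in range(n + 1):
    # Condition on the first i coordinates; resample the remaining n - i.
    tails = rng.integers(0, 2, size=(inner, n - i))
    heads = np.broadcast_to(X[:i], (inner, i))
    Z_i = np.mean(f(np.hstack([heads, tails])))
    print(f"Z_{i} ≈ {Z_i:.3f}")
# Z_0 ≈ E[f] = n/2 = 5, while Z_n equals f(X) exactly (no tail left to sample).
```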
Now define the random variables for each $i$:

$$\begin{aligned} U_i &:= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}], \\ L_i &:= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}, X_i = x] - \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}]. \end{aligned}$$
Since $X_i, \ldots, X_n$ are independent of each other, conditioning on $X_i = x$ does not affect the probabilities of the other variables, so these are equal to the expressions

$$\begin{aligned} U_i &= \sup_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) - f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}], \\ L_i &= \inf_{x \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, x, X_{(i+1) \rightharpoondown n}) - f(X_{1 \rightharpoondown (i-1)}, X_{i \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}]. \end{aligned}$$
Note that $L_i \leq Z_i - Z_{i-1} \leq U_i$. In addition,

$$\begin{aligned} U_i - L_i &= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, u, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}] - \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, \ell, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}] \\ &= \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[f(X_{1 \rightharpoondown (i-1)}, u, X_{(i+1) \rightharpoondown n}) - f(X_{1 \rightharpoondown (i-1)}, \ell, X_{(i+1) \rightharpoondown n}) \mid X_{1 \rightharpoondown (i-1)}] \\ &\leq \sup_{u \in \mathcal{X}_i,\, \ell \in \mathcal{X}_i} \mathbb{E}[c_i \mid X_{1 \rightharpoondown (i-1)}] \\ &\leq c_i. \end{aligned}$$
Then, applying the general form of Azuma's inequality to $\{Z_i\}$, we have

$$\mathrm{P}\left(f(X_1, \ldots, X_n) - \mathbb{E}[f(X_1, \ldots, X_n)] \geq \varepsilon\right) = \mathrm{P}(Z_n - Z_0 \geq \varepsilon) \leq \exp\left(-\frac{2\varepsilon^2}{\sum_{i=1}^n c_i^2}\right).$$
The one-sided bound in the other direction is obtained by applying Azuma's inequality to $\{-Z_i\}$, and the two-sided bound follows from a union bound. $\square$