User:InfoTheorist/Shannon's theorem

In information theory, a result is known as a converse if it provides an upper bound on the achievable rate for a given channel. Thus a converse, combined with a matching achievability result (a lower bound), determines the capacity of a channel. In general, two types of converse results have been studied. The first type, known as a weak converse, provides an upper bound on the achievable rate for a given probability of error ε. This implies that if a transmitter sends at a rate higher than the bound provided by the weak converse, the probability of error will be greater than ε. A strong converse, on the other hand, provides a much stronger result: if a transmitter sends at a rate higher than the given bound, the probability of error will not only be greater than ε, but will converge to one as codes with larger blocklengths are used.

In Shannon's noisy-channel coding theorem, Fano's inequality is used to obtain a weak converse. However, while Fano's inequality provides a non-vanishing lower bound on the probability of error when the transmitter's rate is above the channel capacity, it is not sufficient to prove that the probability of error converges to one as the blocklength goes to infinity.

Problem Formulation

Let $\mathcal{X}$ and $\mathcal{Y}$ be sets. A channel, with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, is a sequence of conditional probability distributions $\{p_{Y^n \mid X^n}\}_{n=1}^{\infty}$ where, for every $x^n \in \mathcal{X}^n$,

$$p_{Y^n \mid X^n}(y^n \mid x^n) \ge 0 \quad \text{for all } y^n \in \mathcal{Y}^n, \qquad \sum_{y^n \in \mathcal{Y}^n} p_{Y^n \mid X^n}(y^n \mid x^n) = 1.$$

The channel is said to be discrete if both $\mathcal{X}$ and $\mathcal{Y}$ are finite sets. A channel is called memoryless if for every positive integer $n$ we have

$$p_{Y^n \mid X^n}(y^n \mid x^n) = \prod_{i=1}^{n} p_{Y_i \mid X_i}(y_i \mid x_i).$$

We say a channel is stationary if every stationary input results in a stationary output. A memoryless channel is stationary if the conditional distributions $p_{Y_i \mid X_i}$ are equal for all $i$.

Therefore a stationary memoryless channel can be simply represented as the triple

$$\big(\mathcal{X},\; p_{Y \mid X}(y \mid x),\; \mathcal{Y}\big).$$

A $(2^{nR}, n)$ code consists of a message set

$$\mathcal{M} = \{1, 2, \ldots, 2^{nR}\},$$

an encoder

$$f : \mathcal{M} \to \mathcal{X}^n,$$

and a decoder

$$g : \mathcal{Y}^n \to \mathcal{M}.$$

The average probability of error of the code is given by

$$P_e^{(n)} = \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \Pr\big[\, g(Y^n) \ne m \mid X^n = f(m) \,\big].$$

The value of n is known as the blocklength of the code.
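
To make these definitions concrete, the following Python sketch computes the average probability of error of a small example code: a blocklength-3 repetition code used over a binary symmetric channel with crossover probability 0.1. The channel, the code, and all variable names are illustrative assumptions rather than part of the formulation above.

```python
from itertools import product

# Toy example (assumed): a (2^{nR}, n) = (2, 3) repetition code over a
# binary symmetric channel (BSC) with crossover probability 0.1.
p = 0.1                                  # assumed crossover probability
n = 3                                    # blocklength
messages = [0, 1]                        # message set, relabelled {0, 1}
encoder = {0: (0, 0, 0), 1: (1, 1, 1)}   # encoder f: M -> X^n

def decoder(y):                          # majority-vote decoder g: Y^n -> M
    return int(sum(y) >= 2)

def channel_prob(y, x):
    """p(y^n | x^n) for a memoryless BSC: a product of per-letter terms."""
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= p if yi != xi else 1 - p
    return prob

# Average probability of error: (1/|M|) * sum_m Pr[ g(Y^n) != m | X^n = f(m) ].
P_e = 0.0
for m in messages:
    x = encoder[m]
    P_e += sum(channel_prob(y, x)
               for y in product([0, 1], repeat=n) if decoder(y) != m)
P_e /= len(messages)

print(P_e)  # 3 * p^2 * (1 - p) + p^3 = 0.028
```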

A rate R (which is a nonnegative number) is said to be achievable if there exists a sequence of $(2^{nR}, n)$ codes with $P_e^{(n)}$ going to zero as $n$ goes to infinity. The noisy-channel coding theorem states that every rate below the capacity $C$ of the channel is achievable, and that no rate above $C$ is achievable, where

$$C = \max_{p_X} I(X; Y).$$
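
The maximization over input distributions can be carried out numerically. The sketch below is a minimal Blahut–Arimoto iteration, given purely as an illustration under the assumption that every output symbol is reachable; the binary symmetric channel and all parameter values are assumptions. For a binary symmetric channel with crossover probability 0.1 it returns roughly $1 - H_2(0.1) \approx 0.531$ bits (where $H_2$ denotes the binary entropy function) together with the uniform input distribution.

```python
import numpy as np

def capacity_blahut_arimoto(W, iters=500):
    """Estimate C = max_{p_X} I(X;Y) in bits for a discrete memoryless channel
    with transition matrix W[x, y] = p(y | x), via the Blahut-Arimoto iteration.
    Assumes every output symbol is reachable from some input."""
    num_x, _ = W.shape
    p = np.full(num_x, 1.0 / num_x)                     # start from the uniform input
    for _ in range(iters):
        joint = p[:, None] * W                          # p(x) p(y|x)
        q = joint / joint.sum(axis=0, keepdims=True)    # posterior q(x|y)
        # update p(x) proportional to exp( sum_y p(y|x) log q(x|y) )
        log_r = (W * np.log(np.where(W > 0, q, 1.0))).sum(axis=1)
        p = np.exp(log_r - log_r.max())
        p /= p.sum()
    p_y = p @ W                                         # induced output distribution
    ratio = np.where(W > 0, W / p_y[None, :], 1.0)
    return (p[:, None] * W * np.log2(ratio)).sum(), p   # I(X;Y) in bits, argmax p_X

# Illustrative channel: binary symmetric channel with crossover probability 0.1.
W_bsc = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
C, p_star = capacity_blahut_arimoto(W_bsc)
print(C, p_star)  # ~0.531 bits, ~[0.5, 0.5]
```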

Wolfowitz's theorem states that for any discrete memoryless channel with capacity $C$ and any $(2^{nR}, n)$ code with rate $R > C$,

$$P_e^{(n)} \ \ge\ 1 - \frac{4A}{n (R - C)^2} - 2^{-n(R - C)/2}$$

for some positive constant $A$ that depends on the channel but not on $n$ or on the number of codewords $M = 2^{nR}$. The proof that follows is based on Gallager's book.[1]
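
To see how the bound behaves, the short sketch below evaluates its right-hand side for assumed values of $A$, $C$ and $R$, chosen purely for illustration; it shows the lower bound on $P_e^{(n)}$ approaching one as the blocklength grows.

```python
# Evaluate the lower bound 1 - 4A/(n (R - C)^2) - 2^{-n(R - C)/2}
# for illustrative values of A, C and R (assumptions, not properties
# of any particular channel).
A, C, R = 2.0, 0.5, 0.6

def wolfowitz_lower_bound(n):
    return 1.0 - 4 * A / (n * (R - C) ** 2) - 2.0 ** (-n * (R - C) / 2)

for n in [10_000, 100_000, 1_000_000]:
    print(n, wolfowitz_lower_bound(n))
# The printed values increase toward 1, i.e. P_e^(n) -> 1 whenever R > C.
```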

Proof

For the proof we first require a lemma. This lemma is essentially a special case of the method of Lagrange multipliers for a concave function defined on the standard simplex $\Delta = \big\{ (p_1, \ldots, p_K) \in \mathbb{R}^K : p_k \ge 0,\ \sum_{k=1}^{K} p_k = 1 \big\}$. It is then followed by a corollary which simply applies the lemma to the mutual information.

Lemma

Let $f : \Delta \to \mathbb{R}$ be a concave function. Suppose $f$ has continuous partial derivatives on its domain. Then $p^* \in \Delta$ maximizes $f$ over $\Delta$ if and only if there exists some real $\lambda$ such that for every $k$ with $p^*_k > 0$,

$$\frac{\partial f}{\partial p_k}(p^*) = \lambda,$$

and for every $k$ with $p^*_k = 0$,

$$\frac{\partial f}{\partial p_k}(p^*) \le \lambda.$$
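
As a numerical illustration of these conditions, the sketch below uses the entropy function on the simplex as an assumed example of a concave $f$; at its maximizer, the uniform distribution, all partial derivatives take a common value $\lambda$.

```python
import numpy as np

# Assumed example of a concave function on the simplex: the entropy
# f(p) = -sum_k p_k log2 p_k, maximized by the uniform distribution.
K = 4
p_star = np.full(K, 1.0 / K)

# Partial derivatives at p_star: df/dp_k = -log2(p_k) - log2(e).
grad = -np.log2(p_star) - np.log2(np.e)
print(grad)  # all components are equal, so lambda = log2(K) - log2(e)

# Any other point of the simplex gives a smaller value of f, consistent
# with the lemma's characterization of the maximizer.
def f(p):
    return -(p * np.log2(p)).sum(axis=-1)

rng = np.random.default_rng(0)
q = rng.dirichlet(np.ones(K), size=5)   # a few random points of the simplex
print(f(p_star), f(q).max())            # f(p_star) = log2(K) = 2 is the largest
```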

Proof of Lemma

Suppose $p^*$ satisfies the above conditions. We'll show that $f$ achieves its maximum at $p^*$. Let $p$ be any element of $\Delta$. By the concavity of $f$, for any $\theta \in (0, 1]$ we have

$$f\big(p^* + \theta (p - p^*)\big) \ \ge\ (1 - \theta) f(p^*) + \theta f(p),$$

thus

$$\frac{f\big(p^* + \theta (p - p^*)\big) - f(p^*)}{\theta} \ \ge\ f(p) - f(p^*).$$

Allowing $\theta \to 0^{+}$ and making use of the continuity of the partial derivatives results in

$$f(p) - f(p^*) \ \le\ \sum_{k=1}^{K} \frac{\partial f}{\partial p_k}(p^*)\,(p_k - p^*_k) \ \le\ \lambda \sum_{k=1}^{K} p_k - \lambda \sum_{k=1}^{K} p^*_k \ =\ 0,$$

where the last inequality holds because $\frac{\partial f}{\partial p_k}(p^*) \le \lambda$ for every $k$, with equality whenever $p^*_k > 0$. Hence $f(p) \le f(p^*)$.

For the other direction suppose $p^*$ maximizes $f$. Then for every $p \in \Delta$ and every $\theta \in (0, 1]$,

$$f\big(p^* + \theta (p - p^*)\big) \ \le\ f(p^*).$$

This implies

$$\frac{f\big(p^* + \theta (p - p^*)\big) - f(p^*)}{\theta} \ \le\ 0,$$

and by the continuity of the partial derivatives,

$$\sum_{k=1}^{K} \frac{\partial f}{\partial p_k}(p^*)\,(p_k - p^*_k) \ \le\ 0. \qquad (1)$$

Since $p^* \in \Delta$, at least one of its components, say $p^*_j$, is strictly positive. Now let $k$ be an arbitrary element of $\{1, \ldots, K\}$. Furthermore, for every $i$, let $e_i$ denote the element of $\mathbb{R}^K$ that consists of all zeros but a single one in the $i$th position. Define

$$p = p^* + p^*_j (e_k - e_j),$$

which is again an element of $\Delta$.

Then inequality (1) simplifies to

$$\frac{\partial f}{\partial p_k}(p^*) \ \le\ \frac{\partial f}{\partial p_j}(p^*).$$

In addition, if $p^*_k > 0$ and we define

$$p = p^* + p^*_k (e_j - e_k),$$

then (1) results in

$$\frac{\partial f}{\partial p_j}(p^*) \ \le\ \frac{\partial f}{\partial p_k}(p^*).$$

Thus if we define $\lambda$ as

$$\lambda = \frac{\partial f}{\partial p_j}(p^*),$$

the result follows.

Corollary

For any discrete memoryless channel, the input distribution $p_X$ achieves capacity if and only if there exists a real number $\lambda$ such that $I(x) = \lambda$ for every $x$ with $p_X(x) > 0$, and $I(x) \le \lambda$ for every $x$ with $p_X(x) = 0$, where

$$I(x) = \sum_{y \in \mathcal{Y}} p_{Y \mid X}(y \mid x) \log_2 \frac{p_{Y \mid X}(y \mid x)}{p_Y(y)}$$

and $p_Y$ is the output distribution induced by $p_X$. Furthermore, $\lambda$ is the capacity of the channel.
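
For a concrete check of the corollary, the sketch below uses an assumed binary symmetric channel with crossover probability 0.1, whose capacity-achieving input distribution is uniform; both values of $I(x)$ coincide with $\lambda = C \approx 0.531$ bits.

```python
import numpy as np

# Assumed example: binary symmetric channel with crossover probability 0.1
# and its capacity-achieving (uniform) input distribution.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])      # W[x, y] = p(y | x)
p_x = np.array([0.5, 0.5])      # capacity-achieving input for the BSC
p_y = p_x @ W                   # induced output distribution

# I(x) = sum_y p(y|x) log2( p(y|x) / p(y) )
I_x = (W * np.log2(W / p_y[None, :])).sum(axis=1)
print(I_x)                                              # both entries equal lambda
print(1 - (-0.1 * np.log2(0.1) - 0.9 * np.log2(0.9)))   # C = 1 - H_2(0.1), the same value
```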

Proof of Corollary

The proof of the corollary is straightforward and follows directly from the lemma. To see this, note that for any $p_X$,

$$I(X; Y) = \sum_{x \in \mathcal{X}} p_X(x) \sum_{y \in \mathcal{Y}} p_{Y \mid X}(y \mid x) \log_2 \frac{p_{Y \mid X}(y \mid x)}{p_Y(y)}.$$

Since

$$p_Y(y) = \sum_{x \in \mathcal{X}} p_X(x)\, p_{Y \mid X}(y \mid x),$$

this implies

$$\frac{\partial I(X; Y)}{\partial p_X(x)} = \sum_{y \in \mathcal{Y}} p_{Y \mid X}(y \mid x) \log_2 \frac{p_{Y \mid X}(y \mid x)}{p_Y(y)} - \log_2 e \ =\ I(x) - \log_2 e.$$

Now using the lemma (mutual information is a concave function of $p_X$, and the constant $\log_2 e$ can be absorbed into $\lambda$), the claim follows. Moreover, multiplying $I(x) = \lambda$ by $p_X(x)$ and summing over $x$ shows that $\lambda = I(X; Y) = C$.

Proof of the Strong Converse

For any two random variables $X$ and $Y$ define the information density as

$$i(x; y) = \log_2 \frac{p_{X,Y}(x, y)}{p_X(x)\, p_Y(y)} = \log_2 \frac{p_{Y \mid X}(y \mid x)}{p_Y(y)}.$$

Note that

$$\mathbb{E}\big[ i(X; Y) \big] = \sum_{x, y} p_{X,Y}(x, y)\, i(x; y) = I(X; Y)$$

and

$$\mathbb{E}\big[ i(X; Y) \mid X = x \big] = \sum_{y \in \mathcal{Y}} p_{Y \mid X}(y \mid x) \log_2 \frac{p_{Y \mid X}(y \mid x)}{p_Y(y)} = I(x).$$
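
These two identities can be checked numerically; the sketch below does so for an assumed small channel and input distribution (all numbers are illustrative).

```python
import numpy as np

# Assumed channel and input distribution, used only to illustrate that the
# information density averages to the mutual information.
W = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.3, 0.6]])          # W[x, y] = p(y | x)
p_x = np.array([0.4, 0.6])
p_xy = p_x[:, None] * W                  # joint distribution
p_y = p_xy.sum(axis=0)                   # induced output distribution

i_xy = np.log2(p_xy / (p_x[:, None] * p_y[None, :]))   # information density i(x; y)

I = (p_xy * i_xy).sum()                  # E[ i(X;Y) ] = I(X;Y)
I_cond = (W * i_xy).sum(axis=1)          # E[ i(X;Y) | X = x ] = I(x)
print(I, (p_x * I_cond).sum())           # the two quantities agree
```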

Let $p^*_Y$ be the capacity-achieving output distribution, that is, the output distribution induced by a capacity-achieving input distribution. For any positive integer $n$, define

$$p^*_{Y^n}(y^n) = \prod_{k=1}^{n} p^*_Y(y_k).$$

For any $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$, define

$$i(x^n; y^n) = \log_2 \frac{p_{Y^n \mid X^n}(y^n \mid x^n)}{p^*_{Y^n}(y^n)} = \sum_{k=1}^{n} i(x_k; y_k),$$

where the per-letter information density is now taken with respect to $p^*_Y$,

$$i(x; y) = \log_2 \frac{p_{Y \mid X}(y \mid x)}{p^*_Y(y)}.$$

Consider a $(2^{nR}, n)$ code with codewords $x^n(1), \ldots, x^n(M)$, where $M = 2^{nR}$, and decoding regions $D_1, \ldots, D_M$, where $D_m = \{ y^n \in \mathcal{Y}^n : g(y^n) = m \}$. Then the probability that a codeword is decoded correctly is given by

$$1 - P_e^{(n)} = \frac{1}{M} \sum_{m=1}^{M} \sum_{y^n \in D_m} p_{Y^n \mid X^n}\big( y^n \mid x^n(m) \big).$$

Fix positive $\varepsilon$. For every $m$, define the set

$$B_m = \big\{ y^n \in \mathcal{Y}^n : i\big(x^n(m); y^n\big) \le n (C + \varepsilon) \big\}.$$

Then

$$1 - P_e^{(n)} = \frac{1}{M} \sum_{m=1}^{M} \Bigg[ \sum_{y^n \in D_m \cap B_m^c} p_{Y^n \mid X^n}\big( y^n \mid x^n(m) \big) + \sum_{y^n \in D_m \cap B_m} p_{Y^n \mid X^n}\big( y^n \mid x^n(m) \big) \Bigg].$$

Based on the definition of $B_m$, every $y^n \in B_m$ satisfies $p_{Y^n \mid X^n}(y^n \mid x^n(m)) \le 2^{n(C + \varepsilon)}\, p^*_{Y^n}(y^n)$, so the second sum can be upper bounded as

$$\sum_{y^n \in D_m \cap B_m} p_{Y^n \mid X^n}\big( y^n \mid x^n(m) \big) \ \le\ 2^{n(C + \varepsilon)} \sum_{y^n \in D_m} p^*_{Y^n}(y^n).$$

By the corollary, $\mathbb{E}\big[ i\big(x^n(m); Y^n\big) \big] = \sum_{k=1}^{n} I\big(x_k(m)\big) \le nC$, so using Chebyshev's inequality we can find an upper bound on the first sum:

$$\sum_{y^n \in D_m \cap B_m^c} p_{Y^n \mid X^n}\big( y^n \mid x^n(m) \big) \ \le\ \Pr\big[ i\big(x^n(m); Y^n\big) > n (C + \varepsilon) \big] \ \le\ \frac{\operatorname{Var}\big[ i\big(x^n(m); Y^n\big) \big]}{n^2 \varepsilon^2},$$

where, since the channel is memoryless,

$$\operatorname{Var}\big[ i\big(x^n(m); Y^n\big) \big] = \sum_{k=1}^{n} \operatorname{Var}\big[ i\big(x_k(m); Y_k\big) \big].$$

If we define

$$A = \max_{x \in \mathcal{X}} \operatorname{Var}\big[ i(x; Y) \mid X = x \big]$$

(note that $A$ only depends on the channel and is independent of $n$ and $M$) then

$$\sum_{y^n \in D_m \cap B_m^c} p_{Y^n \mid X^n}\big( y^n \mid x^n(m) \big) \ \le\ \frac{n A}{n^2 \varepsilon^2} = \frac{A}{n \varepsilon^2}.$$

Therefore, since the decoding regions $D_m$ are disjoint and hence $\sum_{m=1}^{M} \sum_{y^n \in D_m} p^*_{Y^n}(y^n) \le 1$,

$$1 - P_e^{(n)} \ \le\ \frac{A}{n \varepsilon^2} + \frac{2^{n(C + \varepsilon)}}{M}.$$

Since $M = 2^{nR}$, we get

$$1 - P_e^{(n)} \ \le\ \frac{A}{n \varepsilon^2} + 2^{-n(R - C - \varepsilon)}.$$

Should $R > C$, setting $\varepsilon = (R - C)/2$ results in

$$P_e^{(n)} \ \ge\ 1 - \frac{4A}{n (R - C)^2} - 2^{-n(R - C)/2},$$

which is precisely Wolfowitz's bound and shows that $P_e^{(n)} \to 1$ as $n \to \infty$.
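
For a specific channel the constant $A$ can be computed directly. The sketch below does so for an assumed binary symmetric channel with crossover probability 0.1, for which the capacity-achieving output distribution is uniform.

```python
import numpy as np

# Assumed channel: binary symmetric, crossover probability 0.1. Compute the
# constant A = max_x Var[ i(x;Y) | X = x ] used in the bound above.
W = np.array([[0.9, 0.1],
              [0.1, 0.9]])          # W[x, y] = p(y | x)
p_y_star = np.array([0.5, 0.5])     # capacity-achieving output distribution

i_xy = np.log2(W / p_y_star[None, :])          # per-letter information density
mean = (W * i_xy).sum(axis=1)                  # E[ i(x;Y) | X = x ], equals C here
var = (W * (i_xy - mean[:, None]) ** 2).sum(axis=1)
print(mean, var.max())                         # mean ~ [0.531, 0.531], A ~ 0.904
```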

Notes

References

  • Gallager, Robert G. (1968). Information Theory and Reliable Communication. Wiley. pp. 87–89, 173–175.

InfoTheorist (talk) 00:24, 5 November 2014 (UTC)