# Information content


In information theory, the information content, self-information, or surprisal of a random variable or signal is the amount of information gained when it is sampled. Formally, information content is a random variable defined for any event in probability theory, regardless of whether a random variable is being measured.

Information content is expressed in a unit of information, as explained below. The expected value of self-information is information theoretic entropy, the average amount of information an observer would expect to gain about a system when sampling the random variable.

## Definition

Given a random variable $X$  with probability mass function $p_{X}{\left(x\right)}$ , the self-information of measuring $X$  as outcome $x$  is defined as $\operatorname {I} _{X}(x):=-\log {\left[p_{X}{\left(x\right)}\right]}=\log {\left({\frac {1}{p_{X}{\left(x\right)}}}\right)}.$ 

More broadly, given an event $E$  with probability $P$ , the information content is defined analogously:

$\operatorname {I} (E):=-\log {\left[\Pr {\left(E\right)}\right]}=-\log {\left(P\right)}.$

In general, the choice of logarithmic base does not matter for most information-theoretic properties; however, different units of information correspond to the common choices of base.

If the logarithmic base is 2, the unit is the shannon, though "bit" is also commonly used. If the base is Euler's number e ≈ 2.71828 (the natural logarithm), the unit is the nat, short for "natural". If the base is 10, the units are called hartleys or decimal digits.
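As a minimal sketch, the same probability can be expressed in each of these units using Python's standard `math` module (the function name `information_content` is illustrative, not a standard API):

```python
import math

def information_content(p, base=2):
    """Self-information -log_base(p) of an event with probability p."""
    return -math.log(p) / math.log(base)

# The same event, p = 1/4, measured in the three common units
p = 0.25
bits = information_content(p, base=2)        # shannons ("bits"), ≈ 2.0
nats = information_content(p, base=math.e)   # nats, ≈ 1.386
harts = information_content(p, base=10)      # hartleys, ≈ 0.602

print(bits, nats, harts)
```

The three results differ only by the constant factor relating the logarithm bases, which is why the choice of base is a matter of convention.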

The Shannon entropy of the random variable $X$  above is defined as

$${\begin{aligned}\mathrm {H} (X)&=\sum _{x}{-p_{X}{\left(x\right)}\log {p_{X}{\left(x\right)}}}\\&=\sum _{x}{p_{X}{\left(x\right)}\operatorname {I} _{X}(x)}\\&\ {\overset {\mathrm {def} }{=}}\ \operatorname {E} {\left[\operatorname {I} _{X}(X)\right]},\end{aligned}}$$

by definition equal to the expected information content of measurement of $X$ .
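The definition above can be checked numerically: entropy is the probability-weighted average of the self-information of each outcome. A short sketch (the helper names are illustrative):

```python
import math

def self_information(p):
    # Information content in shannons (base-2 logarithm)
    return -math.log2(p)

def entropy(pmf):
    # H(X) = sum over x of p(x) * I_X(x): the expected self-information
    return sum(p * self_information(p) for p in pmf if p > 0)

# A fair coin has entropy 1 shannon; a biased coin has less,
# because its more probable outcome is less surprising.
print(entropy([0.5, 0.5]))   # ≈ 1.0
print(entropy([0.9, 0.1]))   # < 1.0
```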

## Properties

### Antitonicity for probability

For a given probability space, measurement of rarer events will yield more information content than more common values. Thus, self-information is antitonic in probability for events under observation.

• Intuitively, more information is gained from observing an unexpected event—it is "surprising".
• For example, if there is a one-in-a-million chance of Alice winning the lottery, her friend Bob will gain significantly more information from learning that she won than that she lost on a given day. (See also: Lottery mathematics.)
• This establishes an implicit relationship between the self-information of a random variable and its variance.
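The lottery example above can be made concrete. As a sketch, comparing the self-information of a one-in-a-million win with that of a loss:

```python
import math

# One-in-a-million chance of winning, near-certain chance of losing
p_win = 1e-6
p_lose = 1 - p_win

i_win = -math.log2(p_win)    # ≈ 19.93 shannons: very surprising
i_lose = -math.log2(p_lose)  # ≈ 1.4e-6 shannons: almost no information

print(i_win, i_lose)
```

Learning that Alice won conveys roughly twenty shannons of information, while learning that she lost conveys almost none, illustrating the antitonicity of self-information in probability.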

### Additivity of independent events

The information content of two independent events is the sum of each event's information content. This property is known as additivity in mathematics, and sigma additivity in particular in measure and probability theory. Consider two independent random variables ${\textstyle X,\,Y}$  with probability mass functions $p_{X}(x)$  and $p_{Y}(y)$  respectively. The joint probability mass function is

$p_{X,Y}\!\left(x,y\right)=\Pr(X=x,\,Y=y)=p_{X}\!(x)\,p_{Y}\!(y)$

because ${\textstyle X}$  and ${\textstyle Y}$  are independent. The information content of the outcome $(X,Y)=(x,y)$  is

$${\begin{aligned}\operatorname {I} _{X,Y}(x,y)&=-\log _{2}\left[p_{X,Y}(x,y)\right]=-\log _{2}\left[p_{X}\!(x)p_{Y}\!(y)\right]\\&=-\log _{2}\left[p_{X}{(x)}\right]-\log _{2}\left[p_{Y}{(y)}\right]\\&=\operatorname {I} _{X}(x)+\operatorname {I} _{Y}(y)\end{aligned}}$$

See § Two independent, identically distributed dice below for an example.
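Additivity can be verified directly for two independent fair dice, a minimal sketch of the dice example:

```python
import math

def info(p):
    # Self-information in shannons (base-2)
    return -math.log2(p)

# Two independent fair six-sided dice
p_x = 1 / 6
p_y = 1 / 6
p_joint = p_x * p_y          # independence: joint pmf is the product

# Additivity: I(x, y) = I(x) + I(y)
assert math.isclose(info(p_joint), info(p_x) + info(p_y))

print(info(p_x))      # ≈ 2.585 shannons per die
print(info(p_joint))  # ≈ 5.170 shannons for the pair
```

Because the logarithm turns the product of probabilities into a sum, the information from the pair of rolls is exactly twice that from a single roll.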

The corresponding property for likelihoods is that the log-likelihood of independent events is the sum of the log-likelihoods of each event. Interpreting log-likelihood as "support" or negative surprisal (the degree to which an event supports a given model: a model is supported by an event to the extent that the event is unsurprising, given the model), this states that independent events add support: the information that the two events together provide for statistical inference is the sum of their independent information.