The Theil index $T_T$ is the same as redundancy in information theory, which is the maximum possible entropy of the data minus the observed entropy. It is a special case of the generalized entropy index. It can be viewed as a measure of redundancy, lack of diversity, isolation, segregation, inequality, non-randomness, and compressibility. It was proposed by the econometrician Henri Theil at Erasmus University Rotterdam.
For a population of $N$ "agents" each with characteristic $x$, the situation may be represented by the list $x_i$ ($i = 1, \ldots, N$), where $x_i$ is the characteristic of agent $i$. For example, if the characteristic is income, then $x_i$ is the income of agent $i$.
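The Theil T index is defined as
$$T_T = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\mu} \ln\left(\frac{x_i}{\mu}\right)$$
and the Theil L index is defined as
$$T_L = \frac{1}{N} \sum_{i=1}^{N} \ln\left(\frac{\mu}{x_i}\right)$$
where $\mu$ is the mean income:
$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$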
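The two definitions translate directly into code. The following is a minimal sketch in Python, assuming strictly positive incomes (the logarithms are undefined otherwise). The function names `theil_t` and `theil_l` and the sample incomes are illustrative and not taken from any particular library.

```python
import math

def theil_t(incomes):
    """Theil T: mean of (x_i/mu) * ln(x_i/mu). Assumes all incomes > 0."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n

def theil_l(incomes):
    """Theil L (mean log deviation): mean of ln(mu/x_i). Assumes all incomes > 0."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum(math.log(mu / x) for x in incomes) / n

equal = [30_000.0] * 4
unequal = [10_000.0, 20_000.0, 30_000.0, 60_000.0]
print(theil_t(equal), theil_l(equal))      # both 0.0 for identical incomes
print(theil_t(unequal), theil_l(unequal))  # both positive
```

Both functions work on income ratios $x_i/\mu$, so rescaling every income by the same factor (for example, changing currency) leaves the result unchanged.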
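Equivalently, if the situation is characterized by a discrete distribution function $f_k$ ($k = 0, \ldots, W$), where $f_k$ is the fraction of the population with income $k$ and $W = N\mu$ is the total income, then $\sum_{k=0}^{W} f_k = 1$ and the Theil index is:
$$T_T = \sum_{k=0}^{W} f_k \, \frac{k}{\mu} \ln\left(\frac{k}{\mu}\right)$$
where $\mu$ is again the mean income:
$$\mu = \sum_{k=0}^{W} k f_k$$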
Note that in this case income k is an integer and k=1 represents the smallest increment of income possible (e.g., cents).
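When incomes are integers, the distribution form gives the same number as the per-agent form. The small sketch below builds $f_k$ from counts and compares the two. The helper name `theil_t_from_distribution` and the five-person population are invented for illustration, and the term for $k = 0$ is taken as zero, following the usual convention $0 \ln 0 = 0$.

```python
import math
from collections import Counter

def theil_t_from_distribution(f):
    """Theil T from a dict {income k: population fraction f_k}."""
    mu = sum(k * fk for k, fk in f.items())  # mean income
    # k = 0 contributes nothing (0 * ln 0 is taken as 0)
    return sum(fk * (k / mu) * math.log(k / mu) for k, fk in f.items() if k > 0)

incomes = [1, 1, 2, 4, 4]  # integer incomes, e.g. in cents
n = len(incomes)
f = {k: count / n for k, count in Counter(incomes).items()}

mu = sum(incomes) / n
per_agent = sum((x / mu) * math.log(x / mu) for x in incomes) / n
print(theil_t_from_distribution(f), per_agent)  # the two values agree up to rounding
```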
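If the situation is characterized by a continuous distribution function $f(k)$ (supported from 0 to infinity), where $f(k)\,dk$ is the fraction of the population with income $k$ to $k + dk$, then the Theil index is:
$$T_T = \int_0^\infty f(k) \, \frac{k}{\mu} \ln\left(\frac{k}{\mu}\right) dk$$
where the mean is:
$$\mu = \int_0^\infty k f(k) \, dk$$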
Theil indices for some common continuous probability distributions are given in the table below:
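|Income distribution function||PDF(x) (x ≥ 0)||Theil coefficient (nats)|
|Dirac delta function||$\delta(x - x_0),\ x_0 > 0$||$0$|
|Uniform distribution||$\frac{1}{b-a}$ for $0 < a \le x \le b$, $0$ otherwise||$\ln\left(\frac{2a}{(b+a)\sqrt{e}}\right) + \frac{b^2}{b^2-a^2} \ln\left(\frac{b}{a}\right)$|
|Exponential distribution||$\lambda e^{-\lambda x},\ x > 0$||$1 - \gamma$|
|Log-normal distribution||$\frac{1}{x \sigma \sqrt{2\pi}} e^{-(\ln x - m)^2 / (2\sigma^2)}$||$\frac{\sigma^2}{2}$|
|Pareto distribution (α>1)||$\frac{\alpha k^\alpha}{x^{\alpha+1}}$ for $x \ge k$, $0$ otherwise||$\ln\left(1 - \frac{1}{\alpha}\right) + \frac{1}{\alpha - 1}$|
|Chi-squared distribution||$\frac{x^{k/2-1} e^{-x/2}}{2^{k/2} \Gamma(k/2)}$||$\ln\left(\frac{2}{k}\right) + \psi^{(0)}\left(1 + \frac{k}{2}\right)$|
|Gamma distribution||$\frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}$||$\psi^{(0)}(1 + k) - \ln k$|
|Weibull distribution||$\frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}$||$\frac{1}{k} \psi^{(0)}\left(1 + \frac{1}{k}\right) - \ln \Gamma\left(1 + \frac{1}{k}\right)$|
Here $\gamma$ is the Euler–Mascheroni constant, $\psi^{(0)}$ is the digamma function, $\Gamma$ is the gamma function, and $m$ is the mean of $\ln x$ for the log-normal distribution. In the distribution rows, $k$ is a distribution parameter rather than an income level.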
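The continuous formula can be checked numerically. For an exponential income distribution the closed-form Theil T value is $1 - \gamma \approx 0.4228$ nats, where $\gamma$ is the Euler–Mascheroni constant. The sketch below approximates both integrals with a midpoint rule. The rate, the truncation point, and the step count are arbitrary choices made for illustration.

```python
import math

def theil_t_continuous(pdf, upper, steps=200_000):
    """Approximate Theil T for a density on [0, upper] with the midpoint rule."""
    h = upper / steps
    mids = [(i + 0.5) * h for i in range(steps)]
    mu = sum(k * pdf(k) for k in mids) * h  # mean income
    return sum(pdf(k) * (k / mu) * math.log(k / mu) for k in mids) * h

rate = 2.0  # arbitrary; the Theil index of an exponential does not depend on it

def exponential_pdf(k):
    return rate * math.exp(-rate * k)

EULER_GAMMA = 0.5772156649015329
approx = theil_t_continuous(exponential_pdf, upper=25.0)
print(approx, 1 - EULER_GAMMA)  # the two numbers should be close
```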
If everyone has the same income, then $T_T$ equals 0. If one person has all the income, then $T_T$ gives the result $\ln N$, which is maximum inequality. Dividing $T_T$ by $\ln N$ normalizes the index to range from 0 to 1, but then the independence axiom is violated: $T[x \cup x] \ne T[x]$, that is, replicating the population changes the value, so the normalized index does not qualify as a measure of inequality.
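These limiting cases, and the effect of normalizing by $\ln N$, can be illustrated with a short sketch. It assumes the convention $0 \ln 0 = 0$ so that zero incomes are allowed. The function name and the sample populations are illustrative.

```python
import math

def theil_t(incomes):
    """Theil T with the convention 0 * ln(0) = 0 for zero incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes if x > 0) / n

one_has_all = [0.0, 0.0, 0.0, 10.0]
doubled = one_has_all + one_has_all  # the same population replicated once

print(theil_t([5.0] * 4))                   # 0.0: complete equality
print(theil_t(one_has_all), math.log(4))    # both equal ln 4
print(theil_t(doubled))                     # still ln 4: T is unchanged by replication
print(theil_t(one_has_all) / math.log(4))   # normalized value: 1.0
print(theil_t(doubled) / math.log(8))       # normalized value: about 0.667, not 1.0
```

The unnormalized index is the same for the original and the replicated population, while the normalized value drops from 1 to about 2/3 although the distribution of income is unchanged.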
The Theil index measures the entropic "distance" of the population from the egalitarian state in which everyone has the same income. The numerical result is expressed as negative entropy, so a higher number indicates more order, further away from complete equality. Formulating the index as negative entropy rather than entropy makes it a measure of inequality rather than equality.
Derivation from entropy
The Theil index is derived from Shannon's measure of information entropy $S$, where entropy is a measure of randomness in a given set of information. In information theory, physics, and the Theil index, the general form of entropy is
$$S = k \sum_{i=1}^{N} p_i \log_a\left(\frac{1}{p_i}\right) = -k \sum_{i=1}^{N} p_i \log_a(p_i)$$
where $p_i$ is the probability of finding member $i$ in a random sample of the population. In physics, $k$ is Boltzmann's constant. In information theory, when information is given in binary digits, $k = 1$ and the log base is 2. In physics and in the computation of the Theil index, the natural logarithm is chosen as the logarithmic base. When $p_i$ is chosen to be income per person $x_i$, it needs to be normalized by dividing by the total population income, $N\mu$. This gives the observed entropy $S_{\text{Theil}}$ of a population:
$$S_{\text{Theil}} = \sum_{i=1}^{N} \frac{x_i}{N\mu} \ln\left(\frac{N\mu}{x_i}\right)$$
The Theil index is $T_T = S_{\max} - S_{\text{Theil}}$, where $S_{\max}$ is the theoretical maximum entropy that is reached when all incomes are equal, i.e. $x_i = \mu$ for all $i$. Substituting this into $S_{\text{Theil}}$ gives $S_{\max} = \ln N$, a constant determined solely by the population size. So the Theil index gives a value in terms of an entropy that measures how far $S_{\text{Theil}}$ is from the "ideal" $S_{\max}$. The index is a "negative entropy" in the sense that it gets smaller as the disorder gets larger, hence it is a measure of order rather than disorder.
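The step from the entropy difference to the usual formula can be written out explicitly. The following is a short worked derivation, assuming that the probabilities are the income shares $p_i = x_i/(N\mu)$ with $\mu$ the mean income, and that the maximum entropy is $\ln N$:

```latex
\begin{aligned}
T_T &= \ln N - \sum_{i=1}^{N} \frac{x_i}{N\mu} \ln\frac{N\mu}{x_i} \\
    &= \sum_{i=1}^{N} \frac{x_i}{N\mu} \left( \ln N - \ln\frac{N\mu}{x_i} \right)
       \qquad \text{since } \sum_{i=1}^{N} \frac{x_i}{N\mu} = 1 \\
    &= \frac{1}{N} \sum_{i=1}^{N} \frac{x_i}{\mu} \ln\frac{x_i}{\mu}
\end{aligned}
```

The last line is the per-agent form of the Theil T index.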
When $x$ is in units of population/species, $S_{\text{Theil}}$ is a measure of biodiversity and is called the Shannon index. If the Theil index is used with $x$ = population/species, it is a measure of inequality of population among a set of species, or "bio-isolation" as opposed to "wealth isolation".
The Theil index measures what is called redundancy in information theory: the left-over "information space" that was not used to convey information, which reduces the effectiveness of the price signal. The Theil index is a measure of the redundancy of income (or other measure of wealth) in some individuals. Redundancy in some individuals implies scarcity in others. A high Theil index indicates that the total income is not distributed evenly among individuals, in the same way that an uncompressed text file does not assign a similar number of byte locations to each of the available unique byte characters.
|Notation||Information theory||Theil index $T_T$|
|$N$||number of unique characters||number of individuals|
|$i$||a particular character||a particular individual|
|$x_i$||count of $i$th character||income of $i$th individual|
|$N\mu$||total characters in document||total income in population|
|$T_T$||unused information space||unused potential in price mechanism|
| ||data compression||progressive tax|
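The correspondence in the table can be checked on a short string. This is a sketch under the assumption that character counts play the role of incomes and the number of distinct characters plays the role of the population size. The function names and the sample strings are illustrative.

```python
import math
from collections import Counter

def redundancy_nats(text):
    """Maximum entropy ln(N) minus observed entropy of the character frequencies."""
    counts = Counter(text)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.log(len(counts)) - entropy

def theil_t(values):
    """Theil T of strictly positive values."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values) / n

text = "aaaaaabbc"
counts = list(Counter(text).values())  # character counts act as the incomes x_i
print(redundancy_nats(text), theil_t(counts))  # equal up to rounding
print(redundancy_nats("abcabcabc"))            # about 0: every character equally frequent
```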
According to the World Bank,
"The best-known entropy measures are Theil’s T (GE(1)) and Theil’s L (GE(0)), both of which allow one to decompose inequality into the part that is due to inequality within areas (e.g. urban, rural) and the part that is due to differences between areas (e.g. the rural-urban income gap). Typically at least three-quarters of inequality in a country is due to within-group inequality, and the remaining quarter to between-group differences."
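If the population is divided into $m$ subgroups and
- $s_i$ is the income share of group $i$,
- $N$ is the total population and $N_i$ is the population of group $i$,
- $T_i$ is the Theil index for that subgroup,
- $\mu_i$ is the average income in group $i$, and
- $\mu$ is the average income of the population,
then Theil's T index is
$$T_T = \sum_{i=1}^{m} s_i T_i + \sum_{i=1}^{m} s_i \ln\frac{\mu_i}{\mu}, \qquad s_i = \frac{N_i}{N} \frac{\mu_i}{\mu}$$
The first sum is the within-group component and the second is the between-group component.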
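The decomposition can be verified numerically. The sketch below assumes the standard form, in which Theil's T equals the income-share-weighted sum of the subgroup indices plus the sum of $s_i \ln(\mu_i/\mu)$, where $\mu_i$ is the mean income of group $i$. The function names and the three sample groups are invented for illustration.

```python
import math

def theil_t(incomes):
    """Theil T of strictly positive incomes."""
    n = len(incomes)
    mu = sum(incomes) / n
    return sum((x / mu) * math.log(x / mu) for x in incomes) / n

def decompose_theil_t(groups):
    """Return (within, between) components of Theil T for a list of income lists."""
    everyone = [x for g in groups for x in g]
    n = len(everyone)
    mu = sum(everyone) / n
    within = between = 0.0
    for g in groups:
        mu_g = sum(g) / len(g)
        share = (len(g) / n) * (mu_g / mu)  # income share s_i of the group
        within += share * theil_t(g)
        between += share * math.log(mu_g / mu)
    return within, between

groups = [[10.0, 20.0, 30.0], [40.0, 80.0], [5.0, 5.0, 5.0, 5.0]]
within, between = decompose_theil_t(groups)
total = theil_t([x for g in groups for x in g])
print(within, between, within + between, total)  # last two agree up to rounding
```

The third group has identical incomes, so it contributes nothing to the within-group component but still affects the between-group component through its mean.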
For example, inequality within the United States is the average inequality within each state, weighted by state income, plus the inequality between states.
- Note: The image that accompanied this passage showed contributions to the Theil index for the U.S. by each area, not the Theil index in each area of the United States. The Theil index is never negative, although individual contributions to the Theil index may be negative or positive.
The decomposition of the Theil index identifies the share of inequality attributable to the between-region component. This makes it a helpful tool for the positive analysis of regional inequality, as it indicates the relative importance of the spatial dimension of inequality.
Theil's T versus Theil's L
Both Theil's T and Theil's L are decomposable. They differ in the part of the outcome distribution to which each is most sensitive. Indexes of inequality in the generalized entropy (GE) family are more sensitive to differences in income shares among the poor or among the rich depending on a parameter that defines the GE index. The smaller the parameter value for GE, the more sensitive it is to differences at the bottom of the distribution.
- GE(0) = Theil's L and is more sensitive to differences at the lower end of the distribution. It is also referred to as the mean log deviation measure.
- GE(1) = Theil's T and is more sensitive to differences at the top of the distribution.
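Both indices can be computed from one function for the generalized entropy family. The following is a minimal sketch, assuming the standard definition $GE(\alpha) = \frac{1}{N \alpha (\alpha - 1)} \sum_i \left[ (x_i/\mu)^\alpha - 1 \right]$ for $\alpha \notin \{0, 1\}$, with the limits $\alpha \to 0$ and $\alpha \to 1$ giving Theil's L and Theil's T. The function name and the sample incomes are illustrative.

```python
import math

def generalized_entropy(incomes, alpha):
    """GE(alpha) for strictly positive incomes; alpha = 0 and 1 use the limit forms."""
    n = len(incomes)
    mu = sum(incomes) / n
    ratios = [x / mu for x in incomes]
    if alpha == 0:   # Theil's L (mean log deviation)
        return -sum(math.log(r) for r in ratios) / n
    if alpha == 1:   # Theil's T
        return sum(r * math.log(r) for r in ratios) / n
    return sum(r ** alpha - 1 for r in ratios) / (n * alpha * (alpha - 1))

incomes = [10.0, 20.0, 30.0, 60.0]
for alpha in (0, 0.001, 0.999, 1, 2):
    print(alpha, generalized_entropy(incomes, alpha))
```

Values of the parameter close to 0 or 1 give results close to the corresponding limit forms, which is a quick way to check that the special cases are implemented consistently.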
Decomposability is a property of the Theil index that the more popular Gini coefficient does not offer. The Gini coefficient is more intuitive to many people since it is based on the Lorenz curve. However, it is not easily decomposable like the Theil index.
- Introduction to the Theil index from the University of Texas
- "Segregation Measures". www.urban.org. Urban Institute. Retrieved 5 February 2018.
- Parker, Lauren (20 July 2015). "Racial and Ethnic Segregation: In the News and On PolicyMap". PolicyMap. Retrieved 5 February 2018.
- http://www.poorcity.richcity.org (Redundancy, Entropy and Inequality Measures)
- "6. Inequality Measures". Poverty Manual (pdf). World Bank. 8 August 2005. p. 95. Retrieved 4 February 2018.
- Novotny, J. (2007). "On the measurement of regional inequality: Does spatial dimension of income inequality matter?" (PDF). Annals of Regional Science. 41 (3): 563–580.
- "Inequality Measures". www.urban.org. Urban Institute. Retrieved 5 February 2018.
- Rajan K. Sampath. Equity Measures for Irrigation Performance Evaluation. Water International, 13(1), 1988.
- A. Serebrenik, M. van den Brand. Theil index for aggregation of software metrics values. 26th IEEE International Conference on Software Maintenance. IEEE Computer Society.
- Free Online Calculator computes the Gini Coefficient, plots the Lorenz curve, and computes many other measures of concentration for any dataset
- Free Calculator: Online and downloadable scripts (Python and Lua) for Atkinson, Gini, and Hoover inequalities
- Users of the R data analysis software can install the "ineq" package which allows for computation of a variety of inequality indices including Gini, Atkinson, Theil.
- A MATLAB Inequality Package, including code for computing Gini, Atkinson, Theil indexes and for plotting the Lorenz Curve. Many examples are available.