# Horvitz–Thompson estimator

In statistics, the Horvitz–Thompson estimator, named after Daniel G. Horvitz and Donovan J. Thompson, is a method for estimating the total and mean of a pseudo-population in a stratified sample. Inverse probability weighting is applied to account for different proportions of observations within strata in a target population. The Horvitz–Thompson estimator is frequently applied in survey analyses and can be used to account for missing data.

## The method

Formally, let $Y_{i},i=1,2,\ldots ,n$ be an independent sample from $n$ of $N\geq n$ distinct strata with a common mean $\mu$. Suppose further that $\pi _{i}$ is the inclusion probability that a randomly sampled individual in a superpopulation belongs to the $i$th stratum. The Horvitz–Thompson estimate of the total is given by:

${\hat {Y}}_{HT}=\sum _{i=1}^{n}\pi _{i}^{-1}Y_{i},$

and the estimate of the mean is given by:

${\hat {\mu }}_{HT}=N^{-1}{\hat {Y}}_{HT}=N^{-1}\sum _{i=1}^{n}\pi _{i}^{-1}Y_{i}.$
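The two formulas above translate directly into code. The following is a minimal Python sketch, not taken from any library: the function name `horvitz_thompson` and its argument names are hypothetical, and it assumes the caller already knows every $\pi_i$ and the divisor $N$.

```python
from typing import Sequence


def horvitz_thompson(y: Sequence[float], pi: Sequence[float], N: int) -> tuple[float, float]:
    """Return the (total, mean) Horvitz-Thompson estimates.

    y:  observed values Y_1, ..., Y_n
    pi: inclusion probabilities pi_1, ..., pi_n, each in (0, 1]
    N:  the divisor N from the mean formula
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0 < p <= 1 for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Each observation is weighted by 1/pi_i before summing.
    total = sum(yi / p for yi, p in zip(y, pi))
    # The mean estimate is the total estimate divided by N.
    return total, total / N


total, mean = horvitz_thompson([10, 4, 7], [0.5, 0.25, 0.5], N=10)
print(total, mean)  # 50.0 5.0
```

With $\pi=(0.5,0.25,0.5)$ the weights are $2, 4, 2$, so the total is $20+16+14=50$ and the mean is $50/10=5$.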

In a Bayesian probabilistic framework, $\pi _{i}$ is considered the proportion of individuals in a target population belonging to the $i$th stratum. Hence, $\pi _{i}^{-1}Y_{i}$ can be thought of as an estimate of the complete sample of persons within the $i$th stratum. The Horvitz–Thompson estimator can also be expressed as the limit of a weighted bootstrap resampling estimate of the mean. It can also be viewed as a special case of multiple imputation approaches.
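One way to illustrate the bootstrap connection is the following sketch, which rests on our own assumptions. It resamples the observed $Y_i$ with replacement, choosing each with probability proportional to $\pi_i^{-1}$. As the number of draws grows, the resample mean converges to $\sum_i \pi_i^{-1}Y_i \big/ \sum_i \pi_i^{-1}$ (the ratio, or Hájek, form). This equals the Horvitz–Thompson mean exactly when $\sum_i \pi_i^{-1}=N$. The function names are hypothetical.

```python
import random
from typing import Sequence


def weighted_bootstrap_mean(y: Sequence[float], pi: Sequence[float],
                            draws: int, seed: int = 0) -> float:
    """Mean of one large resample drawn with probability proportional to 1/pi_i."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    weights = [1.0 / p for p in pi]
    resample = rng.choices(y, weights=weights, k=draws)
    return sum(resample) / draws


def hajek_mean(y: Sequence[float], pi: Sequence[float]) -> float:
    """Large-sample limit of the weighted bootstrap mean: sum(y/pi) / sum(1/pi)."""
    return sum(yi / p for yi, p in zip(y, pi)) / sum(1.0 / p for p in pi)


y, pi = [10, 4, 7], [0.5, 0.25, 0.5]
# Here sum(1/pi) = 8, so with N = 8 the HT mean 50/8 coincides with the limit.
print(hajek_mean(y, pi))  # 6.25
print(weighted_bootstrap_mean(y, pi, draws=200_000))  # close to 6.25
```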

For post-stratified study designs, estimation of $\pi$ and $\mu$ is done in distinct steps. In such cases, computing the variance of ${\hat {\mu }}_{HT}$ is not straightforward. Resampling techniques such as the bootstrap or the jackknife can be applied to obtain consistent estimates of the variance of the Horvitz–Thompson estimator. The "survey" package for R conducts analyses for post-stratified data using the Horvitz–Thompson estimator.
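The following is a simplified sketch of the resampling idea. It covers only a single, unstratified sample and ignores post-stratification and finite-population corrections. It uses the delete-one (JK1) jackknife: replicate $j$ drops unit $j$ and inflates the remaining weights by $n/(n-1)$. For the linear Horvitz–Thompson mean, this reproduces the with-replacement variance estimator $\frac{n}{(n-1)N^{2}}\sum_i (z_i-\bar z)^2$ with $z_i=\pi_i^{-1}Y_i$. The function name is hypothetical.

```python
from typing import Sequence


def jackknife_variance_ht_mean(y: Sequence[float], pi: Sequence[float], N: int) -> float:
    """Delete-one (JK1) jackknife variance estimate of the Horvitz-Thompson mean."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two sampled units")
    z = [yi / p for yi, p in zip(y, pi)]  # weighted values Y_i / pi_i
    full = sum(z) / N                     # full-sample HT mean
    # Replicate j omits unit j and rescales the other weights by n/(n-1).
    replicates = [(n / (n - 1)) * (sum(z) - z[j]) / N for j in range(n)]
    return (n - 1) / n * sum((r - full) ** 2 for r in replicates)


print(jackknife_variance_ht_mean([10, 4, 7], [0.5, 0.25, 0.5], N=10))  # about 0.28
```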

## Proof of Horvitz–Thompson unbiased estimation of the mean

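Let the population consist of $N$ units with values $X_{1},\ldots ,X_{N}$. A sample of $n$ units is drawn without replacement, with sampled indices $I_{1},\ldots ,I_{n}$. Write $D_{n}=\{I_{1},\ldots ,I_{n}\}$ for the random set of sampled indices, $D_{n}^{(1)},\ldots ,D_{n}^{(B)}$ for the $B$ possible samples, and $\pi _{i}=P(i\in D_{n})$ for the inclusion probability of unit $i$, assumed positive for every unit. The Horvitz–Thompson estimator can be shown to be unbiased by evaluating its expectation, $\mathbf {E} {\bar {X}}_{n}^{HT}$, as follows:

$\begin{aligned}\mathbf {E} {\bar {X}}_{n}^{HT}&=\mathbf {E} {\frac {1}{N}}\sum _{i=1}^{n}{\frac {X_{I_{i}}}{\pi _{I_{i}}}}\\
&=\mathbf {E} {\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}1_{i\in D_{n}}\\
&=\sum _{b=1}^{B}P(D_{n}^{(b)})\left[{\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}1_{i\in D_{n}^{(b)}}\right]\\
&={\frac {1}{N}}\sum _{i=1}^{N}{\frac {X_{i}}{\pi _{i}}}\sum _{b=1}^{B}1_{i\in D_{n}^{(b)}}P(D_{n}^{(b)})\\
&={\frac {1}{N}}\sum _{i=1}^{N}\left({\frac {X_{i}}{\pi _{i}}}\right)\pi _{i}\\
&={\frac {1}{N}}\sum _{i=1}^{N}X_{i}\end{aligned}$

The fifth line uses $\sum _{b=1}^{B}1_{i\in D_{n}^{(b)}}P(D_{n}^{(b)})=P(i\in D_{n})=\pi _{i}$.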
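The derivation can be checked numerically by enumerating every possible sample of a tiny population. The sketch below uses exact fractions. The population values and sample probabilities are made up for illustration, and the design is a hypothetical fixed-size ($n=2$) design over all $B=6$ samples of $N=4$ units. It computes each $\pi_i$ as in the fifth line of the derivation and then forms the design expectation from the third line.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical population of N = 4 units and an arbitrary design over all
# samples of size n = 2 (B = 6 possible samples D_n^(1), ..., D_n^(6)).
X = [Fraction(v) for v in (3, 8, 1, 6)]
N, n = len(X), 2
samples = list(combinations(range(N), n))
probs = [Fraction(w, 21) for w in (1, 2, 3, 4, 5, 6)]  # P(D_n^(b)), sums to 1

# Inclusion probabilities: pi_i = sum_b 1{i in D_n^(b)} P(D_n^(b)).
pi = [sum(p for s, p in zip(samples, probs) if i in s) for i in range(N)]


def ht_mean(sample):
    """Horvitz-Thompson mean (1/N) * sum over i in the sample of X_i / pi_i."""
    return sum(X[i] / pi[i] for i in sample) / N


# Design expectation: sum_b P(D_n^(b)) times the HT mean of sample b.
expected = sum(p * ht_mean(s) for s, p in zip(samples, probs))
population_mean = sum(X) / N
print(expected, population_mean)  # 9/2 9/2
```

The check holds for any choice of `probs` that sums to one and gives every unit a positive inclusion probability, which is exactly the condition the proof needs.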