User:Bassis/Bootstrapping populations

sample.

Method


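Given a sample $\boldsymbol x=\{x_1,\ldots,x_m\}$ of a random variable X and a sampling mechanism $(g_{\boldsymbol\theta},Z)$ for X, we have $\boldsymbol x=\{g_{\boldsymbol\theta}(z_1),\ldots,g_{\boldsymbol\theta}(z_m)\}$, with $\boldsymbol\theta=(\theta_1,\ldots,\theta_k)$. Focusing on well-behaved statistics

$$\begin{aligned}
s_1&=h_1(x_1,\ldots,x_m),\\
&\;\;\vdots\\
s_k&=h_k(x_1,\ldots,x_m),
\end{aligned}$$

for their parameters, the master equations read

$$\begin{aligned}
s_1&=h_1(g_{\boldsymbol\theta}(z_1),\ldots,g_{\boldsymbol\theta}(z_m))=\rho_1(\boldsymbol\theta;z_1,\ldots,z_m)\\
&\;\;\vdots\\
s_k&=h_k(g_{\boldsymbol\theta}(z_1),\ldots,g_{\boldsymbol\theta}(z_m))=\rho_k(\boldsymbol\theta;z_1,\ldots,z_m).
\end{aligned}\qquad (1)$$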

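For each sample seed $\{z_1,\ldots,z_m\}$ you obtain a vector of parameters $\boldsymbol\theta$ from the solution of the above system with the $s_i$ fixed to the observed values. Having computed a large set of compatible vectors, say N, you obtain the empirical marginal distribution of $\Theta_j$ by:

$$\widehat F_{\Theta_j}(\theta)=\frac{1}{N}\sum_{i=1}^N I_{(-\infty,\theta]}(\breve\theta_{j,i})\qquad (2)$$

denoting by $\breve\theta_{j,i}$ the j-th component of the i-th solution of (1) and by $I_{(-\infty,\theta]}(\breve\theta_{j,i})$ the indicator function of $\breve\theta_{j,i}$ in the interval $(-\infty,\theta]$. Some indeterminacies remain when X is discrete, which we will consider shortly. The whole procedure may be summed up in the form of the following Algorithm, where the index $\boldsymbol\Theta$ of $\boldsymbol s_{\boldsymbol\Theta}$ denotes the parameter vector to which the statistics vector refers.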
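As an illustration of equation (2), the following Python sketch evaluates the empirical distribution of one parameter component on a population of solutions. The function name `empirical_cdf` and the five-element population are hypothetical, introduced only for this example.

```python
from bisect import bisect_right

def empirical_cdf(population, theta):
    """Fraction of population values falling in (-inf, theta], as in (2)."""
    ordered = sorted(population)
    # bisect_right counts how many ordered values are <= theta
    return bisect_right(ordered, theta) / len(ordered)

# Hypothetical population of N = 5 solutions for a single component.
population = [0.8, 1.1, 0.5, 1.4, 0.9]
print(empirical_cdf(population, 1.0))  # three of the five values are <= 1.0
```

Sorting on every call keeps the sketch short; for many evaluations on the same population one would sort once and reuse the ordered list.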

Algorithm

Generating parameter populations through a bootstrap
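Given a sample $\{x_1,\ldots,x_m\}$ from a random variable with parameter vector $\boldsymbol\theta$ unknown,
  1. identify a vector of well-behaved statistics $\boldsymbol S$ for $\boldsymbol\Theta$;
  2. compute a specification $\boldsymbol s_{\boldsymbol\Theta}$ of $\boldsymbol S$ from the sample;
  3. repeat for a satisfactory number N of iterations:
    • draw a sample seed $\breve{\boldsymbol z}_i$ of size m from the seed random variable;
    • get $\breve{\boldsymbol\theta}_i=\mathrm{Inv}(\boldsymbol s,\breve{\boldsymbol z}_i)$ as a solution of (1) in θ with $\boldsymbol s=\boldsymbol s_{\boldsymbol\Theta}$ and $\breve{\boldsymbol z}_i=\{\breve z_1,\ldots,\breve z_m\}$;
    • add $\breve{\boldsymbol\theta}_i$ to the $\boldsymbol\Theta$ population.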
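The steps above can be sketched as a generic loop. This is only a minimal Python sketch under stated assumptions: the names `bootstrap_population`, `draw_uniform_seed`, `statistic` and `invert` are hypothetical, the seed is assumed uniform on (0, 1], and the caller is assumed to supply a function that solves the master equations (1) for the model at hand.

```python
import random

def draw_uniform_seed(rng, m):
    """Assumed seed variable: m independent uniform values in (0, 1]."""
    return [1.0 - rng.random() for _ in range(m)]

def bootstrap_population(sample, statistic, invert, n_iter,
                         rng=None, draw_seed=draw_uniform_seed):
    """Collect n_iter parameter values compatible with the observed statistic."""
    rng = rng or random.Random()
    m = len(sample)
    s = statistic(sample)             # step 2: observed value of the statistic
    population = []
    for _ in range(n_iter):           # step 3
        z = draw_seed(rng, m)         # draw a sample seed of size m
        theta = invert(s, z)          # solve the master equations for theta
        population.append(theta)      # add the solution to the population
    return population

# Hypothetical usage: scale model x = theta * z with the maximum as statistic,
# so that s = theta * max(z) and the solution is theta = s / max(z).
sample = [2.1, 0.7, 3.4, 1.9, 2.8]
population = bootstrap_population(sample, max, lambda s, z: s / max(z),
                                  n_iter=1000, rng=random.Random(1))
print(len(population), min(population) >= max(sample))
```

Passing the statistic and the inversion as functions keeps the loop independent of the model, which mirrors the way the algorithm is stated.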


 
Figure (left): Cumulative distribution function of the parameter Λ of an exponential random variable for the observed value of the statistic $s_\Lambda$.

Figure (right): Cumulative distribution function of the parameter A of a continuous uniform random variable for the observed value of the statistic $s_A$.

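You may easily see from the table of sufficient statistics that we obtain the curve in the picture on the left by computing the empirical distribution (2) on the population obtained through the above algorithm when: i) X is an exponential random variable, ii) $s_\Lambda=\sum_{j=1}^m x_j$, and the solution of (1) for the seed $\boldsymbol u_i$ is

$$\mathrm{Inv}(s_\Lambda,\boldsymbol u_i)=\frac{\sum_{j=1}^m(-\log u_{ij})}{s_\Lambda},$$

and the curve in the picture on the right when: i) X is a uniform random variable in $[0,a]$, ii) $s_A=\max_{j=1,\ldots,m}x_j$, and

$$\mathrm{Inv}(s_A,\boldsymbol u_i)=\frac{s_A}{\max_{j=1,\ldots,m}\{u_{ij}\}}.$$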
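The two cases above can be tried out with a short Python sketch. It assumes the usual sampling mechanisms $x=-\log(u)/\lambda$ for the exponential variable and $x=a\,u$ for the uniform one, with uniform seeds; the data, the true parameter values (2.0 and 10.0) and all function names are hypothetical.

```python
import math
import random

def inv_exponential(s_lambda, u):
    """Solution of s = sum(-log u_j) / lambda in lambda."""
    return sum(-math.log(u_j) for u_j in u) / s_lambda

def inv_uniform(s_a, u):
    """Solution of s = a * max(u_j) in a."""
    return s_a / max(u)

def draw_seed(rng, m):
    """m uniform values in (0, 1], so that log(u) is always defined."""
    return [1.0 - rng.random() for _ in range(m)]

rng = random.Random(0)
m, n_iter = 20, 5000

# Hypothetical exponential sample with rate 2.0; statistic: sum of the data.
s_lambda = sum(rng.expovariate(2.0) for _ in range(m))
lambdas = sorted(inv_exponential(s_lambda, draw_seed(rng, m)) for _ in range(n_iter))

# Hypothetical uniform sample on [0, 10]; statistic: maximum of the data.
s_a = max(rng.uniform(0.0, 10.0) for _ in range(m))
a_values = sorted(inv_uniform(s_a, draw_seed(rng, m)) for _ in range(n_iter))

# Medians of the two parameter populations.
print(lambdas[n_iter // 2], a_values[n_iter // 2])
```

Every value in the uniform population is at least $s_A$, since the largest seed never exceeds 1; this matches the fact that the parameter A cannot be smaller than the largest observation.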

Remark


Note that the accuracy with which the distribution law of the parameters of populations compatible with a sample is obtained is not a function of the sample size. Instead, it is a function of the number of seeds we draw. In turn, this number is purely a matter of computational time and does not require any extension of the observed data. With other bootstrapping methods that focus on generating sample replicas (such as those proposed by Efron and Tibshirani 1993), the accuracy of the estimated distributions depends on the sample size.

Example


For $\boldsymbol x$ expected to represent a Pareto distribution, whose specification requires values for the parameters $a$ and k [1], the cumulative distribution function reads:

$$F_X(x)=1-\left(\frac{k}{x}\right)^a\quad\text{for } x\geq k.$$

Figure: Joint empirical cumulative distribution function of the parameters $(A,K)$ of a Pareto random variable, based on 5,000 replicas.

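A sampling mechanism $(g_{(a,k)},U)$ has a $[0,1]$ uniform seed U and an explaining function $g_{(a,k)}$ described by:

$$x=g_{(a,k)}(u)=(1-u)^{-\frac{1}{a}}k.$$

A relevant statistic $\boldsymbol s_{\boldsymbol\Theta}$ is constituted by the pair of joint sufficient statistics for $A$ and K, respectively $s_1=\sum_{i=1}^m\log x_i$ and $s_2=\min_i\{x_i\}$. The master equations read

$$s_1=\sum_{i=1}^m-\frac{1}{a}\log(1-u_i)+m\log k$$
$$s_2=(1-u_{\min})^{-\frac{1}{a}}k$$

with $u_{\min}=\min_i\{u_i\}$.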

The figure on the right reports the three-dimensional plot of the empirical cumulative distribution function (2) of $(A,K)$.
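A population for the Pareto case can be generated with the Python sketch below. It assumes the statistics $s_1=\sum_i\log x_i$ and $s_2=\min_i x_i$ and the explaining function $x=k(1-u)^{-1/a}$; eliminating k between the two master equations then gives $a=\bigl(m\log(1-u_{\min})-\sum_i\log(1-u_i)\bigr)/(s_1-m\log s_2)$ and $k=s_2(1-u_{\min})^{1/a}$. The data, the true values $a=2$ and $k=1.5$, and the function names are hypothetical.

```python
import math
import random

def inv_pareto(s1, s2, u):
    """Solve the two Pareto master equations in (a, k) for the seed u."""
    m = len(u)
    u_min = min(u)
    sum_logs = sum(math.log(1.0 - u_i) for u_i in u)
    a = (m * math.log(1.0 - u_min) - sum_logs) / (s1 - m * math.log(s2))
    k = s2 * (1.0 - u_min) ** (1.0 / a)
    return a, k

def joint_ecdf(population, a0, k0):
    """Fraction of pairs with a <= a0 and k <= k0 (joint analogue of (2))."""
    return sum(1 for a, k in population if a <= a0 and k <= k0) / len(population)

rng = random.Random(0)
m, n_iter = 30, 5000

# Hypothetical Pareto sample obtained through the explaining function.
true_a, true_k = 2.0, 1.5
sample = [true_k * (1.0 - rng.random()) ** (-1.0 / true_a) for _ in range(m)]
s1 = sum(math.log(x) for x in sample)
s2 = min(sample)

# rng.random() lies in [0, 1), so log(1 - u) is always defined.
population = [inv_pareto(s1, s2, [rng.random() for _ in range(m)])
              for _ in range(n_iter)]
print(joint_ecdf(population, true_a, true_k))
```

Evaluating `joint_ecdf` on a grid of $(a,k)$ values would give the kind of surface the figure describes; every k in the population is at most $s_2$, because K cannot exceed the smallest observation.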

Notes

  1. We denote here with symbols a and k the Pareto parameters elsewhere indicated through k and $x_{\min}$.

References

  • Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. New York: Chapman and Hall.
  • Apolloni, B. (2006). Algorithmic Inference in Machine Learning. International Series on Advanced Intelligence. Vol. 5 (2nd ed.). Adelaide: Magill. Advanced Knowledge International.
  • Apolloni, B., Bassis, S., Gaito, S. and Malchiodi, D. (2007). "Appreciation of medical treatments by learning underlying functions with good confidence". Current Pharmaceutical Design. 13 (15): 1545–1570.
