Talk:James–Stein estimator

Is the assumption of equal variances fundamental to this? The article should say, one way or the other. —The preceding unsigned comment was added by 65.217.188.20 (talk).

Thanks for the comment. The assumption of equal variances is not required. I will add some information about this shortly. --Zvika 19:29, 27 September 2006 (UTC)
Looking forward to this addition. Also, what can be done if the variances are not known? After all, if θ is not known then probably σ² is not either. (Can you use some version of the sample variances, for instance?) Thanks! Eclecticos (talk) 05:26, 5 October 2008 (UTC)

Thanks for the great article on the James-Stein estimator. I think you may also want to mention the connection to Empirical Bayes methods (e.g., as discussed by Efron and Morris in their paper "Stein's Estimation Rule and Its Competitors--An Empirical Bayes Approach"). Personally, I found the Empirical Bayes explanation provided some very useful intuition for the "magic" of this estimator. — Preceding unsigned comment added by 131.239.52.20 (talk) 17:54, 18 April 2007 (UTC)

Thanks for the compliment! Your suggestion sounds like a good idea. User:Billjefferys recently suggested a similar addition to the article Stein's example, but neither of us has gotten around to working on it yet. --Zvika 07:55, 19 April 2007 (UTC)
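A sketch of the Empirical Bayes reading mentioned above, in the article's notation plus an assumed prior variance τ² (not in the article): the posterior mean is a linear shrinkage of y, and estimating the unknown shrinkage factor from the data recovers the James-Stein formula.

With the prior \theta_i \sim N(0, \tau^2) and likelihood y_i \mid \theta_i \sim N(\theta_i, \sigma^2),
  E[\theta_i \mid y_i] = \left(1 - \frac{\sigma^2}{\sigma^2 + \tau^2}\right) y_i .
Marginally y_i \sim N(0, \sigma^2 + \tau^2), so \|y\|^2 / (\sigma^2 + \tau^2) \sim \chi^2_m and, for m > 2,
  E\left[\frac{(m-2)\sigma^2}{\|y\|^2}\right] = \frac{\sigma^2}{\sigma^2 + \tau^2} .
Substituting this data-based estimate of the shrinkage factor into the posterior mean gives
  \hat{\theta}_i = \left(1 - \frac{(m-2)\sigma^2}{\|y\|^2}\right) y_i ,
which is the James-Stein estimator.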

dimensionality of y

A confusing point about this article: y is described as "observations" of an m-dimensional vector θ, suggesting that it should be an m by n matrix, where n is the number of observations. However, this doesn't conform to the use of y in the formula for the James-Stein estimator, where y appears to be a single m-dimensional vector. (Is there some mean involved? Is ‖y‖² computed over all mn scalars?) Furthermore, can we still apply some version of the James-Stein technique in the case where we have more observations of some components of θ than of others, i.e., there is not a single n? Thanks for any clarification in the article. Eclecticos (talk) 05:19, 5 October 2008 (UTC)

The setting in the article describes a case where there is one observation per parameter. I have added a clarifying comment to this effect. In the situation you describe, in which several independent observations are given per parameter, the mean of these observations is a sufficient statistic for estimating θ, so that this setting can be reduced to the one in the article. --Zvika (talk) 05:48, 5 October 2008 (UTC)
The wording is still unclear, especially the sentence: "Suppose θ is an unknown parameter vector of length m, and let y be a vector of observations of θ (also of length m)". How can a vector of m-dimensional observations have length m? --StefanVanDerWalt (talk) 11:07, 1 February 2010 (UTC)
Indeed, it does not make sense. I'll give it a shot. 84.238.115.164 (talk) 19:49, 17 February 2010 (UTC)
Me too. What do you think of my edit? Yak90 (talk) 08:05, 24 September 2017 (UTC)
Is the formula using σ²/n_i applicable for different sample sizes in the groups? In Morris (1983), "Parametric Empirical Bayes Inference: Theory and Applications", it is claimed that a more general version of Stein's estimator (which is also derived there) is needed if the variances V_i are unequal, where V_i denotes σ_i²/n_i. So, as I understand it, Stein's formula is only applicable for equal n_i as well.
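For what it's worth, a minimal Python sketch of the reduction Zvika describes above, with made-up values: average the n observations of each component and apply the James-Stein formula with variance σ²/n. It assumes equal n for every component, so it does not address the unequal-n_i case raised in the previous comment.

import numpy as np

rng = np.random.default_rng(0)
m, n, sigma2 = 10, 5, 1.0                               # made-up: 10 parameters, 5 observations each
theta = rng.normal(size=m)                              # hypothetical true parameter vector
obs = rng.normal(theta, np.sqrt(sigma2), size=(n, m))   # n independent observations of each component

ybar = obs.mean(axis=0)                                 # sufficient statistic: per-component means, variance sigma2/n
shrink = 1 - (m - 2) * (sigma2 / n) / np.sum(ybar**2)   # James-Stein factor for the reduced problem
theta_js = shrink * ybar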

Bias

The estimator is always biased, right? I think this is worth mentioning directly in the article. Lavaka (talk) 02:09, 22 March 2011 (UTC)

Risk functions

The graph of the MSE functions needs a bit more precision: we are in the case where ν = 0, probably m = 10 and σ = 1, aren't we? (I thought that, in this case, for θ = 0, the MSE should be equal to 2; maybe the red curve represents the positive-part JS?) —Preceding unsigned comment added by 82.244.59.11 (talk) 15:40, 10 May 2011 (UTC)
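A quick Monte Carlo check of that value, as a sketch only and assuming the guessed settings ν = 0, m = 10, σ = 1: at θ = 0 the risk of the plain James-Stein estimator is 2σ², while the positive-part version comes out lower, which would be consistent with the red curve showing the positive-part JS.

import numpy as np

rng = np.random.default_rng(0)
m, sigma2, trials = 10, 1.0, 200_000
theta = np.zeros(m)                                     # the point theta = 0 discussed above
y = rng.normal(theta, np.sqrt(sigma2), size=(trials, m))

shrink = 1 - (m - 2) * sigma2 / np.sum(y**2, axis=1)
js = shrink[:, None] * y                                # plain James-Stein
js_plus = np.clip(shrink, 0.0, None)[:, None] * y       # positive-part James-Stein

print(np.mean(np.sum((js - theta)**2, axis=1)))         # approximately 2, i.e. 2*sigma^2
print(np.mean(np.sum((js_plus - theta)**2, axis=1)))    # noticeably smaller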

Extensions

In the case of unknown variance, multiple observations are necessary, right? Thus it would make sense to swap bullet points 1 and 2 and reference what would then be the first point from the second. Also, the "usual estimator of the variance" is a bit dubious to me. Shouldn't it be something like:  ?

Always or on average?

Currently the lead says

the James–Stein estimator dominates the "ordinary" least squares approach, i.e., it has lower mean squared error on average.

But two sections later the article says

the James–Stein estimator always achieves lower mean squared error (MSE) than the maximum likelihood estimator. By definition, this makes the least squares estimator inadmissible when m ≥ 3.

(Bolding is mine.) This appears contradictory, and I suspect the passage in the lead should be changed from "on average" to "always". Since I don't know for sure, I won't change it myself. Loraof (talk) 23:55, 14 October 2017 (UTC)

It sounds like the first one duplicates the "mean": "lower mean squared error on average" averages twice. On average the squared error is lower; the mean squared error is lower; there is nothing more to average over. If you have a specific sample, the squared error of the James-Stein estimator can be worse. --mfb (talk) 03:09, 15 October 2017 (UTC)
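To illustrate the distinction with a made-up example (m = 5, θ = (2, 2, 2, 2, 2), σ = 1, chosen arbitrarily): averaged over many samples the James-Stein squared error is smaller, yet on a nontrivial fraction of individual samples it is larger than the MLE's.

import numpy as np

rng = np.random.default_rng(1)
m, trials = 5, 100_000
theta = np.full(m, 2.0)                                 # arbitrary illustrative parameter vector
y = rng.normal(theta, 1.0, size=(trials, m))            # one observation vector per trial, sigma^2 = 1

shrink = 1 - (m - 2) / np.sum(y**2, axis=1)             # James-Stein factor with sigma^2 = 1
js = shrink[:, None] * y

se_mle = np.sum((y - theta)**2, axis=1)                 # squared error of the MLE (y itself)
se_js = np.sum((js - theta)**2, axis=1)                 # squared error of James-Stein

print(se_js.mean(), se_mle.mean())                      # mean squared error: JS is lower
print(np.mean(se_js > se_mle))                          # fraction of samples where JS does worse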

Concrete examples?

This article would be greatly improved if some concrete examples were given in the lead and text so that laymen might have some idea of what the subject deals with in the real world. μηδείς (talk) 23:17, 8 November 2017 (UTC)

There is an example starting at "A quirky example". I'm not sure if there are real world implications. --mfb (talk) 07:31, 9 November 2017 (UTC)
I agree with Medeis. And by "concrete", I don't mean more hand-waving, I mean an actual variance-covariance matrix and set of observations, so that I can check the claim for myself. Maproom (talk) 09:42, 22 June 2018 (UTC)
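For what it's worth, a minimal worked number with an arbitrary made-up observation vector in the article's basic setting (known σ² = 1, identity covariance). This only shows the arithmetic of one estimate; checking the MSE claim itself requires averaging over many such draws, as in the sketch under "Always or on average?" above.

import numpy as np

sigma2 = 1.0
y = np.array([1.5, -0.3, 0.8, 2.1, -1.2])               # one made-up observation vector, m = 5
m = y.size

shrink = 1 - (m - 2) * sigma2 / np.dot(y, y)            # 1 - 3/8.83, about 0.66
theta_js = shrink * y                                   # every component pulled toward zero
print(shrink, theta_js)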

Using a single observation??

Is this correct?

"We are interested in obtaining an estimate   of  , based on a single observation,  , of  ."

How can you get an estimate from a single observation? Presumably the means along each dimension of y are uncorrelated... Danski14(talk) 19:53, 2 March 2018 (UTC)

Apparently it is right. They use the prior to get the estimate. Never mind. Danski14(talk) 20:13, 4 April 2018 (UTC)
You make a single observation (in all dimensions), and then you either use this observation as your estimate or you do something else with it. --mfb (talk) 05:02, 5 April 2018 (UTC)

Using James-Stein with a linear regression

I am wondering how to use James-Stein in an ordinary least squares regression.

First, if β are the coefficient estimates for an OLS regression (I skipped the hat), is the following the formula for shrinking it towards zero:

\beta_{JS} = \left(1 - \frac{(p-2)\sigma^2}{\|\beta\|^2}\right)\beta

where σ² is the true variance (I might substitute the sample variance here), and p is the number of parameters in β. (I'm a bit fuzzy on whether β₀, the constant in the regression, is in β.)

I guessed this formula from "The risk of James-Stein and Lasso Shrinkage", but I don't know if it's right.

Second, what would the formula be for the confidence intervals of the shrunken β estimates?

dfrankow (talk) 20:46, 12 May 2020 (UTC)
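Not an answer, but a Python sketch of how the formula as written above could be applied, with made-up data and the sample variance plugged in for σ². Whether the intercept belongs in β, whether the denominator should instead involve β'X'Xβ (since Var of the OLS estimate is σ²(X'X)⁻¹ rather than σ²I), and how to form honest confidence intervals afterwards are exactly the open questions in this thread.

import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.normal(size=(n, p))                             # made-up design matrix (no intercept column)
beta_true = np.array([0.5, 0.0, 1.0, 0.0, -0.5, 0.2])
y = X @ beta_true + rng.normal(size=n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS coefficient estimates
sigma2_hat = np.sum((y - X @ beta_ols)**2) / (n - p)    # sample variance in place of the true sigma^2

shrink = 1 - (p - 2) * sigma2_hat / np.dot(beta_ols, beta_ols)
beta_js = shrink * beta_ols                             # OLS coefficients shrunk toward zero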