Talk:James–Stein estimator

Is the assumption of equal variances fundamental to this? The article should say, one way or the other. —The preceding unsigned comment was added by 65.217.188.20 (talk).

Thanks for the comment. The assumption of equal variances is not required. I will add some information about this shortly. --Zvika 19:29, 27 September 2006 (UTC)
Looking forward to this addition. Also, what can be done if the variances are not known? After all, if θ is not known then probably σ² is not either. (Can you use some version of the sample variances, for instance?) Thanks! Eclecticos (talk) 05:26, 5 October 2008 (UTC)

Thanks for the great article on the James-Stein estimator. I think you may also want to mention the connection to Empirical Bayes methods (e.g., as discussed by Efron and Morris in their paper "Stein's Estimation Rule and Its Competitors--An Empirical Bayes Approach"). Personally, I found the Empirical Bayes explanation provided some very useful intuition for the "magic" of this estimator. — Preceding unsigned comment added by 131.239.52.20 (talk) 17:54, 18 April 2007 (UTC)

Thanks for the compliment! Your suggestion sounds like a good idea. User:Billjefferys recently suggested a similar addition to the article Stein's example, but neither of us has gotten around to working on it yet. --Zvika 07:55, 19 April 2007 (UTC)
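A sketch of the Empirical Bayes reading mentioned above, in the article's notation plus an assumed prior variance τ² (not in the article): the posterior mean is a linear shrinkage of y, and estimating the unknown shrinkage factor from the data recovers the James-Stein formula.

With the prior \theta_i \sim N(0, \tau^2) and likelihood y_i \mid \theta_i \sim N(\theta_i, \sigma^2),
  E[\theta_i \mid y_i] = \left(1 - \frac{\sigma^2}{\sigma^2 + \tau^2}\right) y_i .
Marginally y_i \sim N(0, \sigma^2 + \tau^2), so \|y\|^2 / (\sigma^2 + \tau^2) \sim \chi^2_m and, for m > 2,
  E\left[\frac{(m-2)\sigma^2}{\|y\|^2}\right] = \frac{\sigma^2}{\sigma^2 + \tau^2} .
Substituting this data-based estimate of the shrinkage factor into the posterior mean gives
  \hat{\theta}_i = \left(1 - \frac{(m-2)\sigma^2}{\|y\|^2}\right) y_i ,
which is the James-Stein estimator.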

dimensionality of y

A confusing point about this article: y is described as "observations" of an m-dimensional vector θ, suggesting that it should be an m by n matrix, where n is the number of observations. However, this doesn't conform to the use of y in the formula for the James-Stein estimator, where y appears to be a single m-dimensional vector. (Is there some mean involved? Is ‖y‖² computed over all mn scalars?) Furthermore, can we still apply some version of the James-Stein technique in the case where we have more observations of some components of θ than of others, i.e., there is not a single n? Thanks for any clarification in the article. Eclecticos (talk) 05:19, 5 October 2008 (UTC)

The setting in the article describes a case where there is one observation per parameter. I have added a clarifying comment to this effect. In the situation you describe, in which several independent observations are given per parameter, the mean of these observations is a sufficient statistic for estimating θ, so that this setting can be reduced to the one in the article. --Zvika (talk) 05:48, 5 October 2008 (UTC)
The wording is still unclear, especially the sentence: "Suppose θ is an unknown parameter vector of length m, and let y be a vector of observations of θ (also of length m)". How can a vector of m-dimensional observations have length m? --StefanVanDerWalt (talk) 11:07, 1 February 2010 (UTC)
Indeed, it does not make sense. I'll give it a shot. 84.238.115.164 (talk) 19:49, 17 February 2010 (UTC)
Me too. What do you think of my edit? Yak90 (talk) 08:05, 24 September 2017 (UTC)
Is the formula using σ²/n_i applicable for different sample sizes in the groups? In Morris (1983), "Parametric Empirical Bayes Inference: Theory and Applications", it is claimed that a more general version of Stein's estimator (which is also derived there) is needed if the variances V_i are unequal, where V_i denotes σ_i²/n_i. So, as I understand it, Stein's formula is only applicable for equal n_i as well.
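For what it's worth, a minimal Python sketch of the reduction Zvika describes above, with made-up values: average the n observations of each component and apply the James-Stein formula with variance σ²/n. It assumes equal n for every component, so it does not address the unequal-n_i case raised in the previous comment.

import numpy as np

rng = np.random.default_rng(0)
m, n, sigma2 = 10, 5, 1.0                               # made-up: 10 parameters, 5 observations each
theta = rng.normal(size=m)                              # hypothetical true parameter vector
obs = rng.normal(theta, np.sqrt(sigma2), size=(n, m))   # n independent observations of each component

ybar = obs.mean(axis=0)                                 # sufficient statistic: per-component means, variance sigma2/n
shrink = 1 - (m - 2) * (sigma2 / n) / np.sum(ybar**2)   # James-Stein factor for the reduced problem
theta_js = shrink * ybar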

Bias

The estimator is always biased, right? I think this is worth mentioning directly in the article. Lavaka (talk) 02:09, 22 March 2011 (UTC)

Risk functions

The graph of the MSE functions needs a bit more precision: we are in the case where ν = 0, probably m = 10 and σ = 1, aren't we? (I thought that, in this case, for θ = 0, the MSE should be equal to 2; maybe the red curve represents the positive-part JS?) —Preceding unsigned comment added by 82.244.59.11 (talk) 15:40, 10 May 2011 (UTC)
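A quick Monte Carlo check of that value, as a sketch only and assuming the guessed settings ν = 0, m = 10, σ = 1: at θ = 0 the risk of the plain James-Stein estimator is 2σ², while the positive-part version comes out lower, which would be consistent with the red curve showing the positive-part JS.

import numpy as np

rng = np.random.default_rng(0)
m, sigma2, trials = 10, 1.0, 200_000
theta = np.zeros(m)                                     # the point theta = 0 discussed above
y = rng.normal(theta, np.sqrt(sigma2), size=(trials, m))

shrink = 1 - (m - 2) * sigma2 / np.sum(y**2, axis=1)
js = shrink[:, None] * y                                # plain James-Stein
js_plus = np.clip(shrink, 0.0, None)[:, None] * y       # positive-part James-Stein

print(np.mean(np.sum((js - theta)**2, axis=1)))         # approximately 2, i.e. 2*sigma^2
print(np.mean(np.sum((js_plus - theta)**2, axis=1)))    # noticeably smaller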

Extensions

In the case of unknown variance, multiple observations are necessary, right? Thus it would make sense to swap bullet points 1 and 2 and reference what would then be the first point from the second. Also, the "usual estimator of the variance" is a bit dubious to me. Shouldn't it be something like:  ?

Always or on average?

Currently the lead says

the James–Stein estimator dominates the "ordinary" least squares approach, i.e., it has lower mean squared error on average.

But two sections later the article says

the James–Stein estimator always achieves lower mean squared error (MSE) than the maximum likelihood estimator. By definition, this makes the least squares estimator inadmissible when m ≥ 3.

(Bolding is mine.) This appears contradictory, and I suspect the passage in the lead should be changed from "on average" to "always". Since I don't know for sure, I won't change it myself. Loraof (talk) 23:55, 14 October 2017 (UTC)

It sounds like the first one duplicates the "mean": "lower mean squared error on average" averages twice. On average the squared error is lower; the mean squared error is lower; there is nothing more to average over. If you have a specific sample, the squared error of the James-Stein estimator can be worse. --mfb (talk) 03:09, 15 October 2017 (UTC)
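To illustrate the distinction with a made-up example (m = 5, θ = (2, 2, 2, 2, 2), σ = 1, chosen arbitrarily): averaged over many samples the James-Stein squared error is smaller, yet on a nontrivial fraction of individual samples it is larger than the MLE's.

import numpy as np

rng = np.random.default_rng(1)
m, trials = 5, 100_000
theta = np.full(m, 2.0)                                 # arbitrary illustrative parameter vector
y = rng.normal(theta, 1.0, size=(trials, m))            # one observation vector per trial, sigma^2 = 1

shrink = 1 - (m - 2) / np.sum(y**2, axis=1)             # James-Stein factor with sigma^2 = 1
js = shrink[:, None] * y

se_mle = np.sum((y - theta)**2, axis=1)                 # squared error of the MLE (y itself)
se_js = np.sum((js - theta)**2, axis=1)                 # squared error of James-Stein

print(se_js.mean(), se_mle.mean())                      # mean squared error: JS is lower
print(np.mean(se_js > se_mle))                          # fraction of samples where JS does worse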

Concrete examples?

This article would be greatly improved if some concrete examples were given in the lead and text so that laymen might have some idea of what the subject deals with in the real world. μηδείς (talk) 23:17, 8 November 2017 (UTC)

There is an example starting at "A quirky example". I'm not sure if there are real world implications. --mfb (talk) 07:31, 9 November 2017 (UTC)
I agree with Medeis. And by "concrete", I don't mean more hand-waving, I mean an actual variance-covariance matrix and set of observations, so that I can check the claim for myself. Maproom (talk) 09:42, 22 June 2018 (UTC)
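For what it's worth, a minimal worked number with an arbitrary made-up observation vector in the article's basic setting (known σ² = 1, identity covariance). This only shows the arithmetic of one estimate; checking the MSE claim itself requires averaging over many such draws, as in the sketch under "Always or on average?" above.

import numpy as np

sigma2 = 1.0
y = np.array([1.5, -0.3, 0.8, 2.1, -1.2])               # one made-up observation vector, m = 5
m = y.size

shrink = 1 - (m - 2) * sigma2 / np.dot(y, y)            # 1 - 3/8.83, about 0.66
theta_js = shrink * y                                   # every component pulled toward zero
print(shrink, theta_js)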

Using a single observation??

Is this correct?

"We are interested in obtaining an estimate   of  , based on a single observation,  , of  ."

How can you get an estimate from a single observation? Presumably the means along each dimension of y are uncorrelated... Danski14(talk) 19:53, 2 March 2018 (UTC)

Apparently it is right. They use the prior to get the estimate. Never mind. Danski14(talk) 20:13, 4 April 2018 (UTC)
You make a single observation (in all dimensions), and then you either use this observation as your estimate or you do something else with it. --mfb (talk) 05:02, 5 April 2018 (UTC)

Using James-Stein with a linear regression

I am wondering how to use James-Stein in an ordinary least squares regression.

First, if β are the coefficient estimates for an OLS regression (I skipped the hat), is the following the formula for shrinking it towards zero:

\beta_{JS} = \left(1 - \frac{(p-2)\sigma^2}{\|\beta\|^2}\right)\beta

where σ² is the true variance (I might substitute the sample variance here), and p is the number of parameters in β. (I'm a bit fuzzy on whether β₀, the constant in the regression, is in β.)

I guessed this formula from "The risk of James-Stein and Lasso Shrinkage", but I don't know if it's right.

Second, what would the formula be for the confidence intervals of the shrunken β estimates?

dfrankow (talk) 20:46, 12 May 2020 (UTC)
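Not an answer, but a Python sketch of how the formula as written above could be applied, with made-up data and the sample variance plugged in for σ². Whether the intercept belongs in β, whether the denominator should instead involve β'X'Xβ (since Var of the OLS estimate is σ²(X'X)⁻¹ rather than σ²I), and how to form honest confidence intervals afterwards are exactly the open questions in this thread.

import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 6
X = rng.normal(size=(n, p))                             # made-up design matrix (no intercept column)
beta_true = np.array([0.5, 0.0, 1.0, 0.0, -0.5, 0.2])
y = X @ beta_true + rng.normal(size=n)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)        # OLS coefficient estimates
sigma2_hat = np.sum((y - X @ beta_ols)**2) / (n - p)    # sample variance in place of the true sigma^2

shrink = 1 - (p - 2) * sigma2_hat / np.dot(beta_ols, beta_ols)
beta_js = shrink * beta_ols                             # OLS coefficients shrunk toward zero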