Talk:Prediction interval

Which percentile to use

I'm not sure about the line "where Ta is the 100(1 − (p/2))th percentile of Student's t-distribution..." for a 100p% prediction interval. For example, for a 90% prediction interval that would be the 55th percentile, which doesn't sound right - or am I missing something?

Perhaps it should instead read "where Ta is the 100(1 − (α/2))th percentile of Student's t-distribution..." for a 100(1-α)% interval, also replacing p by 1-α in the line above (i.e. α is the error rate in the prediction, whereas p was the success rate). For a 90% prediction interval (α=0.1) that would mean using the 95th percentile, which sounds more reasonable.

For possible support for this formulation see http://www.amstat.org/publications/jse/secure/v8n3/preston.cfm which defines α in the same way and uses the 100(α/2)th and 100(1-(α/2))th percentiles of a general distribution. Also http://www.math.umd.edu/~jjm/tpredictionintervals.pdf, which uses the 100(α/2)th percentile of the t-distribution - I assume that the choice of 100(α/2)th or 100(1-(α/2))th percentile depends on how your t-distribution tables are written.

Alternatively, the definition of p as a success rate in the article could be retained by referring to the 100((1+p)/2)th percentile of the t-distribution, in which case the error rate α would not need to be introduced.

Richard J Price 10:54, 22 March 2007 (UTC)

In agreement with the table on the Student's t page: if T_a is the 100((1+p)/2)th percentile, then P(T<T_a) = (1+p)/2 = (1+1-alpha)/2 = 1-alpha/2, where T is Student's t-distributed, which gives the correct error for a two-sided interval (see confidence interval). Gummif (talk) 00:49, 3 August 2013 (UTC)

Michael Hardy kindly edited the article at 20:01, 22 March 2007 to address my first comment above. I've just noticed that much later on this change was reversed, I think by Rnjma99 at 12:13, 20 November 2015, and it has remained that way ever since. I don't understand why; can anyone explain? - Preceding unsigned comment added by RichardJPrice (talk • contribs)

OK, I see now that it was corrected in a different way on 14 March 2017 by an anonymous contributor, who changed the definition of p from the success rate of the prediction to its failure rate (what I called α above). Richard J Price (talk) 10:07, 25 October 2018 (UTC)
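The percentile arithmetic discussed in this thread can be checked with a small sketch. The function names are hypothetical, and p is taken to be the success rate (coverage) of a two-sided interval, as in the first comment; exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction


def level_as_first_written(p):
    """Percentile level 100(1 - p/2), the formula questioned above."""
    return 100 * (1 - Fraction(p) / 2)


def level_two_sided(p):
    """Percentile level 100((1 + p)/2), i.e. 100(1 - alpha/2) with alpha = 1 - p."""
    return 100 * (1 + Fraction(p)) / 2


p = Fraction(9, 10)   # 90% prediction interval
alpha = 1 - p         # error rate of the prediction

print(float(level_as_first_written(p)))             # 55.0
print(float(level_two_sided(p)))                    # 95.0
print(level_two_sided(p) == 100 * (1 - alpha / 2))  # True
```

With p as a success rate, only the second form puts equal error probability in each tail; the first form would be correct only if p were redefined as the failure rate.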

Unclarity

Could we please get another example, with a population variable such as apple width or orange peel thickness, instead of a bunch of abstract equations? Thanks in advance. 75.35.79.113 21:41, 19 April 2007 (UTC)

I’ve elaborated and given some simpler and clearer examples, notably the simple non-parametric estimation – hope it’s clearer now!
- Nils von Barth (nbarth) (talk) 17:21, 19 April 2009 (UTC)

Bayesian Statistics

Why exactly is this stated --- "In Bayesian statistics, one can compute (Bayesian) prediction intervals from the posterior probability of the random variable, as a credible interval. In theoretical work, credible intervals are not often calculated for the prediction of future events, but for inference of parameters – i.e., credible intervals of a parameter, not for the outcomes of the variable itself"? --- It's quite common in practice to create a posterior predictive distribution which gives you an interval for the actual outcome of the variable itself. 97.125.169.175 (talk) 19:23, 30 April 2011 (UTC)
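To illustrate the distinction raised in this comment, here is a minimal sketch of a posterior predictive interval next to a credible interval for the parameter. It assumes a simple conjugate model that is not taken from the article: normal observations with known standard deviation sigma and a normal prior on the mean. The function names, prior values and data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def normal_posterior(data, sigma, prior_mean, prior_sd):
    """Posterior mean and variance of mu (conjugate normal update)."""
    n = len(data)
    # Precisions add; the posterior mean is precision-weighted.
    precision = 1 / prior_sd**2 + n / sigma**2
    mean = (prior_mean / prior_sd**2 + sum(data) / sigma**2) / precision
    return mean, 1 / precision


def central_interval(mean, variance, p):
    """Central 100p% interval of Normal(mean, variance)."""
    dist = NormalDist(mean, sqrt(variance))
    return dist.inv_cdf((1 - p) / 2), dist.inv_cdf((1 + p) / 2)


data = [4.8, 5.1, 5.3, 4.9]   # made-up observations
sigma = 0.5                   # observation s.d., assumed known
post_mean, post_var = normal_posterior(data, sigma, prior_mean=0.0, prior_sd=10.0)

# Credible interval for the parameter mu.
print(central_interval(post_mean, post_var, 0.95))
# Prediction interval for the next outcome: adds the observation noise sigma^2.
print(central_interval(post_mean, post_var + sigma**2, 0.95))
```

Under these assumptions the interval for the outcome is always wider than the interval for the parameter, because it carries the observation noise in addition to the parameter uncertainty.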

unclear on scope

The article could possibly be clarified by relating a prediction interval to a tolerance interval. The intro currently uses language that a prediction interval is not normally appropriate for, although terminology in this area can be a bit inconsistent:

an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed

If not read carefully, this could imply that a prediction interval is an interval bounding n% of all future samples from a process, which would be equivalent to n% population coverage, which is not typically what a prediction interval gives you (except on average). However I'm not entirely sure where to go in clarifying this article. I've started by expanding tolerance interval instead. --130.207.127.232 (talk) 14:14, 26 August 2011 (UTC)
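The "except on average" point can be illustrated with a small simulation. This is a sketch under stated assumptions, not something taken from the article: the population is Uniform(0, 1), so the fraction of the population inside an interval equals its length, and the interval is simply [min, max] of a sample of size n. The function name is hypothetical.

```python
import random


def realised_coverages(n, trials, seed=0):
    """Population coverage of [min, max] for repeated samples of size n.

    Assumes a Uniform(0, 1) population, so the fraction of the population
    inside an interval is simply its length.
    """
    rng = random.Random(seed)
    coverages = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        coverages.append(max(sample) - min(sample))
    return coverages


n = 19
cov = realised_coverages(n, trials=20000)
print(min(cov), max(cov))    # individual intervals cover quite different fractions
print(sum(cov) / len(cov))   # close to (n - 1) / (n + 1) = 0.9
```

Any single interval may contain noticeably more or less than 90% of the population; only the average over repeated samples matches the nominal level. A tolerance interval, by contrast, is built to bound the population coverage itself with a stated confidence.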

Standard score

The source of confusion is clearly explained by Melcombe on the project page. A prediction interval [L,U] is an interval such that for a future observation X it holds that P(L<X<U) has a given value γ. For the standard score Z = (X − μ)/σ of X it therefore gives:

P((L − μ)/σ < Z < (U − μ)/σ) = γ

By determining the quantile z such that

P(−z < Z < z) = γ

it follows:

L = μ − zσ, U = μ + zσ

Notice Z is a standard score, z is not. Actually I don't think the use of the term standard score is much of a help. Nijdam (talk) 08:16, 13 May 2012 (UTC)

I still think it's necessary to mention standard score in the article. Let's continue the issue at that project page: Wikipedia_talk:WikiProject_Statistics#Standard_score. Mikael Häggström (talk) 18:57, 13 May 2012 (UTC)
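The relation between the quantile z and the interval bounds discussed in this thread can be sketched as follows, assuming a normal future observation with known mean and standard deviation. The function and parameter names (normal_prediction_interval, coverage) are hypothetical.

```python
from statistics import NormalDist


def normal_prediction_interval(mu, sigma, coverage):
    """Interval [L, U] with P(L < X < U) = coverage for X ~ Normal(mu, sigma^2).

    mu and sigma are assumed known; z is the standard normal quantile
    with P(-z < Z < z) = coverage. Note that z is a quantile, not a
    standard score of any observation.
    """
    z = NormalDist().inv_cdf((1 + coverage) / 2)
    return mu - z * sigma, mu + z * sigma


lower, upper = normal_prediction_interval(mu=100.0, sigma=15.0, coverage=0.95)
x = NormalDist(100.0, 15.0)
print(lower, upper)
print(x.cdf(upper) - x.cdf(lower))   # approximately 0.95
```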

If known mean and known variance then it is not a prediction interval but a tolerance interval

Maybe this is just about semantics, but if you agree then we should remove the example for "Known mean, known variance" and just link this case to Tolerance interval. What do you think?

There should be a link to [[tolerance interval]], but the material should stay, as the structure of the intervals needs to be compared across the cases where the parameters need to be estimated or not. However, overall the univariate normal example is somewhat long and written at a textbook level, so perhaps this part can be reduced. Melcombe (talk) 07:03, 18 July 2012 (UTC)

On known mean, unknown variance

In this case, for a normal population we have:

 

and, therefore

 

is chi-squared distributed with n degrees of freedom;

 

is t-distributed with n degrees of freedom, i.e. the statistic

 

is scaled Student-t distributed. J. Angelova - Preceding unsigned comment added by 46.10.58.124 (talk) 20:34, 12 October 2012 (UTC)
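One plausible reading of the argument in this comment, written out in assumed notation (X_1, ..., X_n observed, X_{n+1} the future observation, all independent normal with known mean μ and unknown variance σ²), is the following. The symbols are an assumption and may differ from those the commenter used.

```latex
\begin{align}
  \frac{X_i - \mu}{\sigma} &\sim N(0, 1), \qquad i = 1, \dots, n + 1, \\
  \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 &\sim \chi^2_n, \\
  \frac{X_{n+1} - \mu}{\sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \mu)^2}} &\sim t_n .
\end{align}
```

The last line follows because the numerator divided by σ is standard normal and independent of the chi-squared quantity, which is divided by its n degrees of freedom under the square root.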

Your point being? Fgnievinski (talk) 20:31, 6 February 2013 (UTC)

>>> The text of the page reads   . It's missing the  , correct? Hoggenbit99 (talk) 05:01, 8 November 2019 (UTC)

Non-parametric

I don't follow!

When forecasting a growth curve (x_1, x_2, ..., x_n), P(x_i < x_{i+1}) > P(x_i > x_{i+1}).

In fact, P(x_i < x_{i+1}) = 1 − e, where e is of the order of magnitude of the error on the data.

Please explain or cite references. - Preceding unsigned comment added by AlainD (talk • contribs) 18:09, 25 January 2014 (UTC)

I added a reference to conformal prediction, of which I think this is a special case. Biggerj1 (talk) 10:06, 1 October 2023 (UTC)
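For context, the simple non-parametric interval rests on exchangeability: the next observation is assumed equally likely to land in any of the n + 1 gaps between the ordered sample values. The sketch below (hypothetical function name, made-up data) computes such an interval and its nominal coverage. For a growth curve, where each value tends to exceed the previous one, this assumption does not hold and the coverage formula should not be expected to apply.

```python
from fractions import Fraction


def order_statistic_interval(sample, k=1):
    """Interval from the k-th smallest to the k-th largest observation.

    Assumes the sample and the next observation are exchangeable draws from
    a continuous distribution; the coverage is then (n + 1 - 2k) / (n + 1).
    """
    n = len(sample)
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n // 2")
    ordered = sorted(sample)
    coverage = Fraction(n + 1 - 2 * k, n + 1)
    return ordered[k - 1], ordered[n - k], coverage


sample = list(range(1, 20))                  # 19 made-up observations
print(order_statistic_interval(sample))      # min, max, coverage 9/10
print(order_statistic_interval(sample, 2))   # 2nd smallest, 2nd largest, coverage 4/5
```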

Regression

When looking in my textbook, I see that the best estimate for   has an expectation of  , and standard deviation  .

This implies that the error on the forecast estimate is minimum for   and widens as   increases. It also implies that the confidence interval for the best estimator of   is always wider than the confidence interval for  .

Is it the same concept? If so, is there a reason not to include the complete formula? AlainD (talk) 21:00, 25 January 2014 (UTC)
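As a hedged illustration of the behaviour described above, the following sketch uses the standard textbook expressions for simple least-squares regression, which may differ from the notation in the commenter's textbook. It returns the standard error of the estimated mean response and of a new observation at a point x0; the function name and data are made up.

```python
from math import sqrt


def regression_standard_errors(xs, ys, x0):
    """Standard errors at x0 for simple least-squares regression y = a + b x.

    Returns (fitted value, s.e. of the estimated mean response,
    s.e. of a new observation). Textbook formulas under the usual
    assumption of independent normal errors with constant variance.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    a = y_bar - b * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    s = sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    return a + b * x0, s * sqrt(leverage), s * sqrt(1 + leverage)


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
for x0 in (3, 5, 8):   # 3 is the mean of xs
    print(x0, regression_standard_errors(xs, ys, x0))
```

Both standard errors are smallest at the mean of the x values and grow as x0 moves away from it, and the one for a new observation is always the larger of the two. An interval would then be the fitted value plus or minus a Student's t quantile with n − 2 degrees of freedom times the relevant standard error.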