Talk:68–95–99.7 rule

This is the talk page for discussing improvements to the 68–95–99.7 rule article.
This is not a forum for general discussion of the article's subject.

Statistics Low‑importance

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-importance on the importance scale.

Mathematics Low‑priority

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Low-priority on the project's priority scale.

Merging or keeping as own article

Shouldn't this just redirect to normal distribution? Richard001 06:30, 11 April 2007 (UTC)

No. This page gives a practical derived result of the normal distribution. It's also much more directly useful to most people than the math-heavy normal distribution article (which is linked to). --Nantonos (talk) 13:16, 29 June 2008 (UTC)

However, the only useful information is the percentages, which you can work out anyway from the Normal Distribution article. It's pretty misleading to wrap it up in an 'empirical rule', since it's not *empirical* by that definition. Darktachyon (talk) 09:24, 22 October 2010 (UTC)

It seems like the mention of rejecting the normality of the data based on a "6σ" event occurring more than once in 1.5 million years is subject to the Gambler's fallacy. The probability doesn't determine how often it happens over time, but rather how likely the event is to occur for each specific instance, which in the case of the example, is a day. The expected time between events may be 1.5 million years, but this says nothing about the time between individual events. You can't outright reject a coin's fairness if it lands on heads ten times in a row, so likewise you can't really reject the normality of the data based on only a few samples, regardless of the magnitude of the probabilities involved. Of course this doesn't rule out that the initial prediction is incorrect, but it requires more samples than 2 or 3 to make a statistically significant conclusion. --Styrofoamboots (talk) 05:06, 6 May 2009 (UTC)
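
For reference, the figures under discussion can be reproduced with a short sketch. It assumes exactly normal, independent daily observations (the very assumption being debated), and the 100-year window is an arbitrary illustrative choice that does not come from the article.

```python
import math

# Two-sided probability that a single normal observation lies more than
# 6 standard deviations from the mean: P(|Z| > 6) = erfc(6 / sqrt(2)).
p_daily = math.erfc(6 / math.sqrt(2))

# Mean waiting time between such events if one observation is made per day.
mean_wait_years = 1 / p_daily / 365.25


def prob_at_least_one(p, n_days):
    """P(at least one event in n_days independent trials), computed stably."""
    return -math.expm1(n_days * math.log1p(-p))


# Illustrative window (an assumption, not from the article): 100 years of daily data.
p_century = prob_at_least_one(p_daily, round(100 * 365.25))

print(f"P(6-sigma event on a given day) = {p_daily:.3e}")
print(f"mean waiting time               = {mean_wait_years:,.0f} years")
print(f"P(at least one in 100 years)    = {p_century:.3e}")
```

The mean waiting time is a long-run average. The gap before any single event is geometrically distributed, so individual gaps can be far shorter or longer than the mean.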

"A description and illustration using Java Applets by Balasubramanian Narasimhan"


There appears to be an error in the example used in "A description and illustration using Java Applets by Balasubramanian Narasimhan".

Excerpt:
An Example
Let us apply the Empirical Rule to Example 1.17 from Moore and McCabe.
The distribution of heights of American women aged 18 to 24 is approximately normally distributed with mean 65.5 inches and standard deviation 2.5 inches. From the above rule, it follows that
68% of these American women have heights between 65.5 - 2.5 and 65.5 + 2.5 inches, or between 63 and 68 inches,
95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 63 and 68 inches.

It should read:
95% of these American women have heights between 65.5 - 2(2.5) and 65.5 + 2(2.5) inches, or between 60.5 and 70.5 inches.
--Hallbg (talk) 17:15, 21 July 2009 (UTC)

The demo linked to was reported earlier as not working, but it worked OK in Firefox 3.5.5 today for me. --Kay Dekker (talk) 21:14, 14 November 2009 (UTC)
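
The corrected interval in the height example above can be checked with a few lines of arithmetic. This is only a sketch: the mean and standard deviation are the ones quoted in the excerpt, and the function name is made up for illustration.

```python
MEAN_IN = 65.5  # mean height in inches, as quoted in the excerpt
SD_IN = 2.5     # standard deviation in inches, as quoted in the excerpt


def sigma_interval(mean, sd, k):
    """Return the interval (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


for k, share in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    low, high = sigma_interval(MEAN_IN, SD_IN, k)
    print(f"{share}: between {low} and {high} inches")
```

For k = 2 this gives 60.5 to 70.5 inches, which matches the correction proposed above.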

Why 68, 95, and 99.7

Where do the figures come from? What is the reasoning behind using these values? — Preceding unsigned comment added by 92.27.94.107 (talk) 10:32, 8 March 2012 (UTC)

Yeah, why not 69.2, or 96? Is this rule used to determine a std deviation, or does a std dev always follow this rule? Mang (talk) 04:04, 2 December 2012 (UTC)
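
As a sketch of an answer to the two questions above: the three figures are not chosen. They are what the normal distribution yields at exactly 1, 2 and 3 standard deviations from the mean. In the notation below (standard, but not taken from this thread), X is normally distributed with mean μ and standard deviation σ, Φ is the standard normal cumulative distribution function and erf is the error function.

```latex
\Pr(\mu - n\sigma \le X \le \mu + n\sigma)
  = \Phi(n) - \Phi(-n)
  = \operatorname{erf}\!\left(\frac{n}{\sqrt{2}}\right),
\qquad
\begin{aligned}
n = 1 &: \approx 0.6827,\\
n = 2 &: \approx 0.9545,\\
n = 3 &: \approx 0.9973.
\end{aligned}
```

The standard deviation is defined independently of the rule. The percentages hold only when the data are (approximately) normal, so other distributions will generally give different values.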

three-sigma rule is a more elegant name

From my stat courses I rarely heard professors saying "68–95–99.7 rule". Don't you think "three-sigma rule" is a more elegant title for this page? 128.97.77.169 (talk) 02:25, 1 August 2014 (UTC)

I agree the title is unintuitive, as this is the page I have to look up when I want to know the value of the error function corresponding to "n" sigma (we list 1 to 7 sigma in half-sigma steps). I often need this for "back-of-the-envelope" estimates when I get into arguments, and it is hard to remember what exactly I need to google to get to this page directly (essentially, the page title is the information you are actually trying to look up, so for 1 to 3 sigma if you remembered the page title, you wouldn't need to look it up in the first place).

Perhaps it would be more "wikilike" to choose a boring but straightforward title like "table of error-function values" and then do a huge table in steps of .1 or so, giving not just erf(x) but also the more useful information we have here, i.e. erf(x*2^-.5) and (1-erf(x*2^-.5))^-1.

As for "three sigma rule", idk, this sounds as if it was a rule dealing with a 3-sigma case, while "68-95-99.7" is actually a list of cases of n sigma, with a modest n=1..3. The page title actually helped me remember "68-95-99.7" by now, but as 4 or 5 sigma also occur in everyday considerations, I keep having to look it up anyway. Right, I could just try to remember more values, but then I'd just keep coming back to check half-sigma steps etc.

--dab (𒁳) 10:15, 5 January 2015 (UTC)
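
The two quantities mentioned above, erf(x*2^-.5) and its "1 in N" reciprocal, are easy to regenerate. This is a minimal sketch using only the Python standard library. The half-sigma grid from 1 to 7 follows the comment above, while the function names and column layout are illustrative choices.

```python
import math


def within(n_sigma):
    """Probability that a normal variable lies within n_sigma of its mean."""
    return math.erf(n_sigma / math.sqrt(2))


def one_in(n_sigma):
    """Reciprocal of the two-sided tail probability ("1 in N" odds)."""
    # erfc avoids the cancellation in 1 - erf(x) for large x.
    return 1 / math.erfc(n_sigma / math.sqrt(2))


print(f"{'n sigma':>7}  {'within':>14}  {'outside: 1 in':>16}")
for half_steps in range(2, 15):  # n = 1.0, 1.5, ..., 7.0
    n = half_steps / 2
    print(f"{n:>7.1f}  {within(n):>14.10f}  {one_in(n):>16,.1f}")
```

Using erfc rather than 1 - erf matters at the high end of the table, where the tail probability is far smaller than the rounding error of erf itself.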

I just noted that "three-sigma rule" may have a more subtle meaning besides simply "let's use 99.7% as 'near-certain'" -- I found the expression in the title of a paywalled paper, which has the abstract:

For random variables with a unimodal Legesgue [sic] density, the 3σ rule is proved by elementary calculus. It emerges as a special case of the Vysochanskiĭ-Petunin inequality, which in turn is based on the Gauss inequality.

Whatever this is about, it goes beyond assumption of normal distribution. --dab (𒁳) 10:46, 5 January 2015 (UTC)

I think I see what is going on: it's a case of an actual theorem trickling down into popular usage as a heuristic, although I still have to figure out the specifics of the actual theorem (apparently you need a bunch of assumptions ("variables with a unimodal Legesgue density"), and it needs to be explained what these assumptions represent and how plausible they are for everyday non-normally distributed statistics). It seems this is a result of the early 1990s, so on one hand it cannot be extremely trivial, but on the other hand you should expect that there are pedestrian explanations in textbooks by now. --dab (𒁳) 10:59, 5 January 2015 (UTC)

German reference missing

There is no link to the German chapter that talks about the same thing: https://de.wikipedia.org/wiki/Standardabweichung#Streuintervalle I already tried to get it linked, but the script does not seem to provide a chapter-specific link. If that's true, there is no way to link it, as the main article talks about standard deviation, which is already linked. Does anybody have an idea how to solve this, or should it just be ignored? Moedn (talk) 18:33, 13 February 2016 (UTC)

Contrasting to non-normal data

Hi, new here, so apologies in advance for bad form. From the intro: The "three sigma rule of thumb" is related to a result also known as the three-sigma rule, which states that even for non-normally distributed variables, at least 98% of cases should fall within properly-calculated three-sigma intervals. [2] This seems to contradict the Chebyshev's inequality page, which gives the bound as 88.9%? Should this be modified to "most distributions", with a reference to http://www.qualitydigest.com/inside/twitter-ed/are-you-sure-we-don-t-need-normally-distributed-data.html ? Doc-aj (talk) 16:28, 11 March 2016 (UTC)
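
To make the comparison above concrete, the sketch below evaluates two distribution-free lower bounds at three sigma next to the normal value. The bounds are the textbook forms of Chebyshev's inequality (any distribution with finite variance) and the Vysochanskij–Petunin inequality (unimodal distributions, mentioned earlier on this page). The function names are illustrative.

```python
import math


def chebyshev_lower_bound(k):
    """Chebyshev: at least 1 - 1/k^2 of the mass lies within k sigma."""
    return 1 - 1 / k**2


def vysochanskij_petunin_lower_bound(k):
    """Vysochanskij-Petunin (unimodal case): at least 1 - 4/(9 k^2)."""
    if k <= math.sqrt(8 / 3):
        raise ValueError("this form of the bound requires k > sqrt(8/3)")
    return 1 - 4 / (9 * k**2)


def normal_within(k):
    """Exact share of a normal distribution within k sigma of the mean."""
    return math.erf(k / math.sqrt(2))


k = 3
print(f"Chebyshev (any distribution):   >= {chebyshev_lower_bound(k):.4f}")
print(f"Vysochanskij-Petunin (unimodal): >= {vysochanskij_petunin_lower_bound(k):.4f}")
print(f"Normal distribution:               {normal_within(k):.4f}")
```

At three sigma these come to about 88.9%, 95.1% and 99.7% respectively, so neither general bound reproduces the 98% figure quoted from the intro.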

A compliment

Hi, also new here, just wanted to compliment the wikigeniuses who gave the Table of Numerical Values the "Approx. freq. for daily event" section. Should anyone ever try to argue that it doesn't belong on Wikipedia (math = original research?), here's my vote, as a user, that this was helpful, informative, and exactly the sort of information that belongs in Cyclopedies.

Oh, and I see that there was discussion almost a decade ago complaining about the title. I like the title. Gsnerd (talk) 01:05, 15 June 2016 (UTC)

Black Swan Paragraph

Is the paragraph about "The Black Swan" necessary? I'm not sure Taleb counts as a credible enough statistician to be mentioned here; even if he were, he might have a better place on the page for the normal distribution.

But on top of that, the un-cited critique of his ideas is, as far as I can tell, flat-out wrong. The gambler's fallacy doesn't say anything about high-probability events not undermining confidence in a theory. Second, with a sufficiently high prior against something, a single event can completely undermine a hypothesis. If every person on Earth hovered off the ground for twenty minutes tomorrow, you wouldn't say, "Well that's extremely unlikely according to the current laws of physics, but it only happened once. We'll revise our theories if it happens a few more times". You would start asking questions immediately.

I added a "citation needed" tag but I think the whole paragraph should be deleted. — Preceding unsigned comment added by Sam Jaques (talk • contribs) 22:19, 11 November 2016 (UTC)

Pr

What is the Pr in the equation? It would be helpful if someone would describe that. 198.213.89.136 (talk) 20:24, 17 July 2017 (UTC)

Done. Jamplevia (talk) 17:40, 8 January 2022 (UTC)

"histogram-based illustration"


I'm sorry, I don't see how this is an improvement. The values discussed are those of a normal distribution, so why would you show a sample of an "approximately normal distributed set", when all that does is give you random deviations forcing you to insert "approximately" in every statement? Just say these are the expectation values if the distribution is normal. --dab (𒁳) 05:32, 30 August 2018 (UTC)

Thanks

Very useful article when you need these. One thing, though. I would have made the "outside the range" column for each end. That is, half the probability and twice the number before it happens, as it seems that more often one needs to know that. But then again, it isn't so hard to multiply/divide by 2. Gah4 (talk) 11:09, 11 July 2021 (UTC)

Usage of sigma

I cannot find any article which covers usage of sigma probability. For example, five-sigma is required to prove the existence of a new particle such as the Higgs boson, whereas two-sigma is the standard in radiocarbon dating. Can anyone point me to an article covering this aspect? If not, that is a major gap in Wikipedia's coverage. Dudley Miles (talk) 11:02, 19 November 2022 (UTC)

Table of Numerical Values

This seems to be original research. It is, imho, poorly done, but useful. The assumptions underlying it should be declared. The term "expected" seems to have a technical meaning, rather than the common (English) one. Average is a better understood term. The 15 decimal digit precision is both absurd and wrong. There is no Normal population of which any random variable is Normal to 15 digit accuracy. There is no clarification that the "outside the range" is for BOTH tails (i.e. two-tailed), while many situations will be concerned with the % below X-sigma of the mean or above X-sigma of the mean, BUT NOT BOTH. I also believe the point should be made that if an outlier is "expected" to occur once every "time period", it will certainly (given a random sampling) occur more often in some periods and less often in others, and only average that frequency. 174.130.71.156 (talk) 15:27, 29 January 2023 (UTC)

It is probably copied from an unacknowledged source rather than being original research, but not properly explained. The more I look at it the less I understand it. It is also way beyond the scope of the article, which is about sigma 1 to 3. I would delete. Dudley Miles (talk) 17:51, 29 January 2023 (UTC)

Expected, as in expectation value, is a common term in statistics. It may or may not correspond to some users' meaning. I suspect that the table is within WP:CALC. As not so many calculators have erf(), it is nice to have a quick reference, though I agree that 15 digits is unneeded. It is pretty obvious that outside means two-tailed, as the first one is over 60%. That is also the common meaning of outside. I suspect I would rather have one-sided, though. Given the popularity of six-sigma, we should go at least that far. I might not mind intermediate values, though.

Seems that I added this a year ago, but forgot to sign. It should be January 2023. In any case, it would be nice to have one sided, or one tailed values. Gah4 (talk) 20:54, 22 January 2024 (UTC)
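
For anyone who wants the one-tailed figures in the meantime, here is a minimal sketch of the conversion. It assumes an exactly normal variable. The function names are illustrative, and six decimal-free significant figures are printed only to keep the output readable.

```python
import math


def two_tailed(n_sigma):
    """P(|X - mean| > n_sigma * sd): both tails, as in the article's table."""
    return math.erfc(n_sigma / math.sqrt(2))


def one_tailed(n_sigma):
    """P(X - mean > n_sigma * sd): a single tail, half the two-tailed value."""
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


print(f"{'n sigma':>7}  {'two-tailed':>12}  {'one-tailed':>12}  {'one-tailed: 1 in':>18}")
for n in range(1, 7):
    print(f"{n:>7}  {two_tailed(n):>12.4e}  {one_tailed(n):>12.4e}  {1 / one_tailed(n):>18,.1f}")
```

By symmetry of the normal distribution, each one-tailed probability is exactly half the two-tailed one, so the "1 in N" figure doubles.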

asymptotic limit

For values larger than the 8 sigma end of the table, there should be a simple asymptotic limit. I might find the reference, but if someone else finds it first, they can add it. Gah4 (talk) 20:56, 22 January 2024 (UTC)
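
One standard candidate, offered as a sketch rather than as the reference meant above, is the leading term of the large-argument expansion of the complementary error function. For the two-sided normal tail it gives P(|Z| > n) ≈ sqrt(2/π) · exp(−n²/2) / n. The code below compares it with the value from math.erfc. The function names are illustrative.

```python
import math


def tail_exact(n_sigma):
    """Two-sided normal tail probability P(|Z| > n_sigma)."""
    return math.erfc(n_sigma / math.sqrt(2))


def tail_asymptotic(n_sigma):
    """Leading-order large-n approximation: sqrt(2/pi) * exp(-n^2/2) / n."""
    return math.sqrt(2 / math.pi) * math.exp(-n_sigma**2 / 2) / n_sigma


for n in (4, 6, 8, 10):
    exact = tail_exact(n)
    approx = tail_asymptotic(n)
    print(f"{n:>2} sigma: exact {exact:.4e}, asymptotic {approx:.4e}, "
          f"ratio {approx / exact:.4f}")
```

The leading term slightly overestimates the tail, with a relative error of roughly 1/n², so it is already within about 2% at 8 sigma and improves from there.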
