Talk:Clustering illusion

Latest comment: 8 years ago by Cyberbot II in topic External links modified

Incorrect Assertion

"when, in fact, it has several characteristics maximally probable for a pseudorandom stream, such as an equal number of each result (P(O) = P(X)) and an equal number of adjacent results with the same outcome for both possible outcomes (P(O|X) = P(X|O) = P(X|X) = P(O|O))."

O = 10, X = 11

O|X = 5, X|X = 6, X|O = 5, O|O = 4

So neither of these statements is true. Anyway, if they were true, the example would be silly: the probability of a 21-character sequence containing perfectly equalized pseudorandom properties would be very low. Actually, given that it's 21 characters, the probability would be zero, since an odd-length sequence cannot contain equal numbers of O and X. — Preceding unsigned comment added by 118.148.3.205 (talk) 03:42, 3 August 2012 (UTC)
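Tallies like the ones above are easy to get wrong by hand. A minimal sketch for checking them, with hypothetical helper names: the 21-character sequence being tallied isn't reproduced on this page, so the example runs on the 19-character prime-position sequence quoted further down instead. Pairs are labelled in reading order, since the "O|X" notation doesn't say which symbol comes first.

```python
from collections import Counter


def outcome_counts(seq: str) -> Counter:
    """How many times each outcome (O or X) appears."""
    return Counter(seq)


def pair_counts(seq: str) -> Counter:
    """Adjacent pairs read left to right: 'XO' means an X immediately followed by an O."""
    return Counter(a + b for a, b in zip(seq, seq[1:]))


def can_have_equal_counts(length: int) -> bool:
    """Two outcomes can occur equally often only if the sequence length is even."""
    return length % 2 == 0


seq = "OXXOXOXOOOXOXOOOXOX"  # stand-in: the prime-position sequence quoted later on this page
print(outcome_counts(seq))
print(pair_counts(seq))
print(can_have_equal_counts(21))  # False: 21 symbols cannot split evenly
```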

Lcamtuf / PRNG prediction

I just removed the reference to Lcamtuf's research on network vulnerabilities caused by Strange Attractors in Pseudorandom Number Generators. Super interesting stuff, but the Clustering Illusion is a psychological phenomenon, and has nothing to do with predictability caused by flaws in PRNGs. —Preceding unsigned comment added by 94.212.58.194 (talk) 10:29, 5 May 2011 (UTC)

SAT Reference?

The claim that SAT answers are intentionally declustered should have an attribution.

Has this been proven?

In another example, Londoners during World War II developed elaborate theories on the impacts of Nazi V-2 rocket attacks on the city. Dividing up the city in certain ways seemed to produce clusters of bombings that were believed to be intentional. In fact there was no way the V-2 rockets could have been so precise, and any clustering was due solely to random variation.

Lazarus666 07:26, 4 Sep 2004 (UTC)

The example is from the referenced Gilovich book, page 19. Is there any specific part you have doubts about?

--Taak 23:14, 5 Sep 2004 (UTC)

The following quote is shamelessly borrowed, without permission, from V2ROCKET.COM; make of it what you will.

Several factors come into play for the "modest" number of V-2s Antwerp suffered each day, but the main reasons were the German bottleneck in their alcohol and liquid-oxygen supply and the enormous dispersion of the still imperfect weapon. Antwerp would probably have suffered more direct impacts if the Germans had equipped all of their units with the Leitstrahl remote guidance apparatus instead of just the single SS 500 Batterie.

Other information on the same site indicates that V-2 rockets equipped with the Leitstrahl remote guidance apparatus could strike a target within 250 meters, even at a range of 250 kilometers, whereas the less accurate version had a typical dispersion at the target of 4 to 11 km.

Lazarus666 18:47, 11 Sep 2004 (UTC)
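The claim quoted at the start of this thread, that any clustering of hits was due solely to random variation, can be illustrated with a small simulation. This is a sketch under stated assumptions: hits fall independently and uniformly over a grid of equal squares (an idealization of an inaccurate weapon), and the hit count, grid size, and seed are arbitrary illustrative values, not historical V-2 data. The helper names are hypothetical. Comparing how many squares received 0, 1, 2, ... hits with the Poisson prediction shows what pure randomness looks like: many squares stay empty while a few collect several hits.

```python
import math
import random
from collections import Counter


def scatter_hits(n_hits: int, grid_size: int, seed: int = 0) -> Counter:
    """Drop hits uniformly at random on a grid_size x grid_size grid; count hits per square."""
    rng = random.Random(seed)
    cells = Counter()
    for _ in range(n_hits):
        cells[(rng.randrange(grid_size), rng.randrange(grid_size))] += 1
    return cells


def hits_per_square_histogram(cells: Counter, grid_size: int) -> Counter:
    """Map k -> number of squares that received exactly k hits (including empty squares)."""
    histogram = Counter(cells.values())
    histogram[0] = grid_size * grid_size - len(cells)  # squares never hit are absent from `cells`
    return histogram


def poisson_expected_squares(n_hits: int, grid_size: int, k: int) -> float:
    """Expected number of squares with exactly k hits if hits fall independently and uniformly."""
    n_squares = grid_size * grid_size
    lam = n_hits / n_squares
    return n_squares * math.exp(-lam) * lam**k / math.factorial(k)


if __name__ == "__main__":
    n_hits, grid_size = 500, 20  # illustrative values, not historical data
    hist = hits_per_square_histogram(scatter_hits(n_hits, grid_size), grid_size)
    for k in range(max(hist) + 1):
        expected = poisson_expected_squares(n_hits, grid_size, k)
        print(f"{k} hits: {hist[k]:3d} squares observed, {expected:6.1f} expected")
```

If the observed column tracks the Poisson column, the apparent "clusters" are just what independent random hits produce.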

Erroneous example?

User:80.175.217.179 removed the following, calling it an erroneous example in the edit summary:

Believing that a date and time with an obvious pattern (e.g., 01:02:03 04/05/06) is rarer (i.e., "won't ever happen again") than one without an obvious pattern (e.g., 07:03:34 10/24/06).

I see why one may find this erroneous, as nothing is really clustering here, but the intro defines the illusion like this:

the clustering illusion refers to the natural human tendency to associate some meaning to certain types of patterns which must inevitably appear in any large enough data set

- and that seems applicable.--Niels Ø 07:45, 7 May 2006 (UTC)

I'm the one who added the example, and I agree with Niels Ø. While the name of the illusion is clustering, the description encompasses various kinds of "pattern illusions". So could User:80.175.217.179 please give a brief explanation of why the example is erroneous? Thanks. --Nick 15:06, 12 June 2006 (UTC)


Theory v. Practice

                   "Consider the sequence "XXOXOXOOOXOXOOOXOX"; is it random?"

YES, I CONSIDERED! As a practical gambler, I am obsessed with my own theory that "all sequences are random sequences if we define some limit". For that reason, I compared the given sequence with my database, derived from diligent note-taking and analysis of gambling outcomes. I vouch for the validity and correctness of all of my data (they are archived), though a few errors could have crept in. At present I have a RANDOM SEQUENCE, experienced in the casino, which matches the given one in sixteen (16) places. I suppose, though I cannot calculate it, that even this match has an extraordinarily low probability. (Moreover, on a later day I experienced a series of actions which matched the random binary result eleven or twelve times. Interesting!) When I am done with my present tasks, I'll return to my search, and if I find the matching sequence, I will post the date when it occurred and the exact data sequence which caused it. Till then, I hope that my fate allows me to reach the level of knowledge necessary to understand your teaching. Yours with thanks 144.139.11.122 02:51, 21 June 2007 (UTC)
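The probability asked about here can be estimated, but only under simplifying assumptions that are not in the comment: every recorded outcome is a fair 50/50 binary result (ignoring the zero pocket), the casino record is compared with the 18-symbol sequence position by position, and "matching in sixteen places" means at least 16 of the 18 positions agree. Under those assumptions a single comparison succeeds with a binomial-tail probability; scanning many stretches of a long record raises the chance of finding such a match somewhere. The second helper treats comparisons as independent, which overlapping stretches are not, so it is only a rough approximation. Both function names are hypothetical.

```python
from math import comb


def p_at_least_k_matches(length: int, k: int, p_match: float = 0.5) -> float:
    """Binomial tail: chance that at least k of `length` positions agree by luck."""
    return sum(comb(length, j) * p_match**j * (1 - p_match) ** (length - j)
               for j in range(k, length + 1))


def p_found_somewhere(p_single: float, n_comparisons: int) -> float:
    """Chance that at least one of n comparisons succeeds, treating them as independent."""
    return 1 - (1 - p_single) ** n_comparisons


p = p_at_least_k_matches(18, 16)  # 16+ of 18 places agree
print(f"one comparison:    {p:.6f}")
print(f"1000 comparisons:  {p_found_somewhere(p, 1000):.3f}")
```

A single lucky match is rare, but a large enough archive makes finding one somewhere fairly likely.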

Your suggestion that "all sequences are random sequences if we define some limit" is not quite accurate. Of course, it depends on what definition of randomness you choose. If we use the standard definition that "outcomes cannot be predicted in advance", then it is certainly not true, since many sequences are generated deterministically. If you are using the definition of information theory, that a sequence is random if it cannot be described more concisely than simply by reading it out, then by definition some sequences are not random since otherwise the definition would be meaningless. For example, "00000000000000000000000000000000000000000000000000" can more easily be described as "50 0s". Robin S 16:40, 18 September 2007 (UTC)
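The information-theoretic definition mentioned here (Kolmogorov complexity) cannot be computed exactly, but an off-the-shelf compressor gives a rough upper bound on description length. A sketch using Python's standard `zlib` module, with bytes from the operating system's random source standing in for a "random" sequence; the helper name is hypothetical.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Bytes after zlib (DEFLATE) compression at level 9: a crude upper bound on description length."""
    return len(zlib.compress(data, 9))


zeros = b"0" * 50       # the "50 0s" example, as ASCII text
noise = os.urandom(50)  # 50 bytes from the operating system's random source

print(f"50 zeros:        {compressed_size(zeros)} bytes compressed")
print(f"50 random bytes: {compressed_size(noise)} bytes compressed")
```

Typically the zeros shrink to a handful of bytes, while the random input comes out slightly larger than 50, because zlib's fixed header and checksum cannot be paid back by an incompressible input.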


I beg to differ! I am sorry for not being able to express my point(s) in axiomatic mathematical language, but my deliberate avoidance of formal mathematics led to my conclusion, which, like any philosophical thought, could be falsified. Regarding your fifty zeroes example: I am sure that somewhere in the UNIVERSE there once was, or there will be, a random sequence which corresponds to your sample. Moreover, with my very limited knowledge, I assume that the first combination of 37 roulette outcomes in 37 repeated trials with replacement shall be 00000000000000000000000000000000000, that is, 37 zeroes, which is one of all the available outcomes (37 to the power of 37, which is the number of possible events; Bewersdorff, Luck, Logic and White Lies, Ch. 14, p. 90). Regarding the "definition": it is exactly what it says, a definition, made by human beings to find their way around the UNIVERSE until they lose their way. Then we will find a new "definition"; I have seen quite a few such transformations in my life. Yours 121.210.9.3 21:04, 4 December 2007 (UTC)
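The arithmetic in this comment can be made explicit, assuming a single-zero wheel with 37 equally likely pockets and independent spins: 37 spins have 37^37 possible ordered outcomes, and the run of 37 zeros is exactly as likely as any other specific sequence. A minimal sketch with a hypothetical helper name, using exact fractions to avoid rounding:

```python
from fractions import Fraction

POCKETS = 37  # single-zero wheel: pockets 0..36
SPINS = 37


def p_specific_sequence(seq: list[int], pockets: int = POCKETS) -> Fraction:
    """Probability of one exact ordered sequence of independent, fair spins."""
    if any(not 0 <= x < pockets for x in seq):
        raise ValueError("pocket number out of range")
    return Fraction(1, pockets) ** len(seq)


all_zeros = [0] * SPINS
other = [(7 * i + 3) % POCKETS for i in range(SPINS)]  # an arbitrary different sequence

print(POCKETS**SPINS)  # number of possible ordered outcomes of 37 spins
print(p_specific_sequence(all_zeros) == p_specific_sequence(other))  # True
```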

Sequence of Prime Numbers isn't Random?

Another way of looking at Robin's argument (above) is to say, "Is there a way to 'compress' the data?" Obviously the statement "write fifty zeros" is more "compressed" than "00000000000000000000000000000000000000000000000000", just as "write the first 20 natural numbers" is more compressed than "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19". So one way of saying a sequence is random is to say "there is no more compressed way to describe this sequence."

But this is exactly why I believe the article is incorrect to say that "OXXOXOXOOOXOXOOOXOX" is non-random. Simply attaching meaning to the sequence doesn't mean it's not random. Saying "an X stands for a prime number" doesn't predict when the next X will occur; in fact, it is well known that the distribution of prime numbers is (in one sense) random.

One might say, "But you can 'compress' the sequence by saying, 'X stands for a prime number.'" But this ignores the fact that calculating the next prime number takes more information than simply stating what the next prime number is. The algorithm for creating fifty zeros or twenty natural numbers is much shorter than a prime-computing function.

So I believe the statement about "OXXOXOXOOOXOXOOOXOX" being non-random should be removed or rewritten. --KSnortum 21:57, 2 December 2007 (UTC)
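The rule under discussion can be written out as a short program; whether that program counts as a "more compressed" description is exactly what this thread debates. A sketch using trial division (chosen for brevity, not speed), with 'X' at each prime position and 'O' elsewhere; the function names are hypothetical. It reproduces both sequences quoted on this page.

```python
def is_prime(n: int) -> bool:
    """Trial division: fine for small n, chosen for brevity rather than speed."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def prime_pattern(start: int, stop: int) -> str:
    """'X' at each prime position, 'O' elsewhere, for start <= n <= stop."""
    return "".join("X" if is_prime(n) else "O" for n in range(start, stop + 1))


print(prime_pattern(1, 19))   # OXXOXOXOOOXOXOOOXOX
print(prime_pattern(2, 19))   # XXOXOXOOOXOXOOOXOX
print(prime_pattern(20, 20))  # O: 20 is composite, so the rule predicts O next
```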

This is truly tricky. If you write down 10 random dice throws, is it still random? It's what you actually got (which is now in the unalterable past); it is what is written on your paper; the probability that the first number is a six is either 0 or 1 (either/or). But if you then phone a friend and ask him about the first number, he'd say the probability is 1/6. So randomness is about what you know. If you know for a fact that OXXOXOXOOOXOXOOOXOX is defined by the prime sequence, it's not random; you'd even be able to generate the next letter with certainty. If you don't know (and haven't noticed the possibility), it is as random as your ten dice throws, and odds for the next letter being X are close to 50% (or might it even be a Z?!?). If you don't know but have hypothesised the connection to the prime sequence, it's something in between random and non-random. You might (quite subjectively, of course) estimate the probability of the next letter being X much below 50% - say if you had to make a bet on it.
And what does this mean in terms of improving our article? I'm afraid I don't really know.--Niels Ø (noe) 08:42, 3 December 2007 (UTC)
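The middle case described above can be made concrete with Bayes' rule, under a deliberately simplified model that is an assumption here, not part of the comment: the observer weighs only two hypotheses, "X marks the primes" (which fits all 19 observed letters with certainty) and "each letter is a fair coin flip" (probability 1/2 per letter). Position 20 is not prime, so the prime hypothesis predicts O, and any prior belief in it pulls the probability of X below 50%. The prior values and function names are purely illustrative.

```python
from fractions import Fraction


def posterior_prime_rule(prior: Fraction, n_observed: int) -> Fraction:
    """P(prime rule | data) when all n_observed letters fit the rule exactly."""
    like_rule = Fraction(1)                   # the rule predicts every observed letter
    like_coin = Fraction(1, 2) ** n_observed  # fair coin: 1/2 per letter
    evidence = prior * like_rule + (1 - prior) * like_coin
    return prior * like_rule / evidence


def p_next_is_x(prior: Fraction, n_observed: int, rule_says_x: bool) -> Fraction:
    """Mix the two hypotheses' predictions for the next letter, weighted by the posterior."""
    post = posterior_prime_rule(prior, n_observed)
    return post * (1 if rule_says_x else 0) + (1 - post) * Fraction(1, 2)


for prior in (Fraction(0), Fraction(1, 1_000_000), Fraction(1, 1000)):
    p = p_next_is_x(prior, n_observed=19, rule_says_x=False)  # 20 is composite
    print(f"prior {float(prior):g}: P(next is X) = {float(p):.4f}")
```

Even a one-in-a-thousand prior leaves almost no weight on the coin-flip hypothesis after 19 perfect fits, which is one way to read "something in between random and non-random".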
What I was trying to say about compression is said better here.
And I still think that just because you can calculate the next number in a sequence doesn't make it non-random. The digits in the decimal approximation of π are certainly random, yet they can be calculated. Taken to the extreme, you would never be able to produce random numbers, because you would always have to "know" how to calculate the next one! --KSnortum (talk) 04:35, 6 December 2007 (UTC)
You could use a physical noise signal to generate true random numbers. Numbers from a formula are pseudorandom numbers, at best. If you asked someone to give you 15 random digits and they said 314159265358979, wouldn't you say they'd been cheating?--Niels Ø (noe) (talk) 17:06, 6 December 2007 (UTC)

Patterns where none exist?

The intro says: "see patterns where actually none exist". I say: if one can see a pattern, then it definitely exists. SJ2571 (talk) 11:36, 30 January 2008 (UTC)

Clarification: I'm talking about seeing patterns literally. SJ2571 (talk) 11:37, 30 January 2008 (UTC)

Gravity's Rainbow has a plot? — Preceding unsigned comment added by 216.31.238.114 (talk) 22:52, 21 June 2012 (UTC)

This is actually a very good point. It's incoherent to define the illusion as seeing patterns where none exist. Having read the Gilovich book, I think the illusion is seeing the clusters that are naturally produced by a random process as signs that the process is non-random. I haven't time right now to look at how a variety of sources define it. MartinPoulter (talk) 13:36, 7 September 2013 (UTC)

Rewrite

I'm rewriting this article as it has lost sight of the original definition of the clustering illusion. "The natural human tendency to 'see patterns where actually none exist'" is an overgeneralization of its more specific meaning. Gilovich defines it as "the intuition that random events such as coin flips should alternate between heads and tails more than they do... Random distributions seem to us to have too many clusters or streaks of consecutive outcomes of the same type, and so we have difficulty accepting their true origins." Another source defines it as: "The observation that people frequently view random distributions, for example, sequences of coin-tosses, as seeming to have too many clusters or ‘streaks’ of consecutive outcomes of the same type" [1].

Much of the deleted content would be better placed in apophenia or pareidolia.


Since, according to a branch of mathematics known as Ramsey Theory, complete mathematical disorder in any physical system is an impossibility, it may be more correct to state, however, that the clustering illusion refers to the natural human tendency to associate some meaning to certain types of patterns which must inevitably appear in any large enough data set.

I'm not 100% sure what the above means, but the clustering illusion refers to seeing "clusters" or "streaks" in typical small samples of random data, not patterns that will eventually appear in a large amount of data.


Whether or not patterns exist in a data set can often be decided by means of statistical analysis, or even methods of computational cryptanalysis. The sequence "OXXOXOXOOOXOXOOOXOX" may appear random to most viewers, but if the position of the X's are associated with prime numbers, and the O's with composite numbers, the pattern is clearly non-random. Data compression algorithms are designed, in a sense, to "look for patterns" in data, and to create alternative representations from which it is possible to reconstruct the original data from a compressed form. Large datasets which contain "clusters" of a non-random nature can in general be expected to compress well, given the right encoding algorithm. On the other hand, if there is no real clustering, or pattern, in a particular data set, then one would expect it to compress poorly, if at all.

The clustering illusion is not about any kind of pattern, it's about streaks or clusters specifically, and in small sets of data, not large ones.

--Taak (talk) 00:23, 29 April 2008 (UTC)
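The point that streaks show up in small random samples can be quantified. A sketch, assuming independent fair flips, that computes exactly the chance that n flips contain a streak of at least r identical outcomes, by dynamic programming over the length of the current run; the function name is hypothetical.

```python
from fractions import Fraction


def p_streak_at_least(n: int, r: int) -> Fraction:
    """Exact probability that n fair, independent flips contain a run of >= r identical outcomes."""
    if n <= 0:
        return Fraction(0)
    if r <= 1:
        return Fraction(1)
    # ways[k] = number of sequences so far with no run >= r whose final run has length k (1 <= k < r)
    ways = [0] * r
    ways[1] = 2  # first flip: either outcome starts a run of length 1
    for _ in range(n - 1):
        new = [0] * r
        new[1] = sum(ways)  # next flip differs: run resets to length 1
        for k in range(1, r - 1):
            new[k + 1] = ways[k]  # next flip repeats: run grows, as long as it stays below r
        ways = new
    return 1 - Fraction(sum(ways), 2**n)


for r in range(3, 7):
    p = p_streak_at_least(20, r)
    print(f"20 flips, streak of {r}+: {float(p):.3f}")
```

Counting only sequences that avoid a long run, rather than listing all 2^n sequences, keeps the computation linear in n.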

Prediction

Is the clustering illusion more for analyzing existing data and searching for a pattern or predicting a possible outcome given a certain set of existing data? While similar, I don't think they are quite the same.

Take this example; I remember seeing it in countless math and reasoning tests:

If you flip a coin 99 times, and 99 times it lands on heads, then what is the probability of the coin landing on heads on the 100th flip? The answer is always 50/50.

However, the probability of 100 consecutive flips landing on heads is certainly not 50%. This is the phenomenon where people bet against the odds. The odds will always be 50/50, but it seems people will tend to perceive a pattern. Or they will think the odds are too great for it to happen 100 times in a row, and bet against the pattern continuing. If it's being used to predict the outcome, is it still the clustering illusion? Lime in the Coconut 16:57, 23 December 2009 (UTC)

That's the gambler's fallacy, which is already linked in the article and mentioned in passing. Do you think the mention needs to be changed in some way? - DavidWBrooks (talk) 17:03, 23 December 2009 (UTC)
Ah, I see. I'm just ignorant of the subject; I didn't know about the gambler's fallacy. I don't think you need to change the article. If I had just clicked on the links, I would have answered my own question ;) Lime in the Coconut 18:40, 29 December 2009 (UTC)
When in doubt, click! I think that's the Official Motto Of The Internet. - DavidWBrooks (talk) 19:55, 29 December 2009 (UTC)
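The two numbers contrasted in this question can be computed exactly, assuming a fair coin and independent flips. The conditional probability of heads on the 100th flip, given 99 heads, is the ratio of the two joint probabilities:

```python
from fractions import Fraction

half = Fraction(1, 2)
p_99_heads = half**99    # 99 heads in a row
p_100_heads = half**100  # 100 heads in a row

# P(100th is heads | first 99 were heads) = P(100 heads) / P(99 heads)
p_next_heads = p_100_heads / p_99_heads

print(p_next_heads)        # 1/2
print(float(p_100_heads))  # a tiny number: the whole streak is wildly unlikely in advance
```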

Pictures showing the opposite cases would be good

I think it would be good if the article had a picture in which people would often find clusters but statistical analysis says there are none, and a picture in which people would not say there are clusters but statistical analysis says there are. --TiagoTiago (talk) 03:05, 31 October 2011 (UTC)
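One way to generate candidate point patterns for such pictures, and to run the statistical check on them, is the Clark–Evans ratio R: the mean nearest-neighbour distance divided by its expected value under complete spatial randomness, 1 / (2·sqrt(density)). R is about 1 for random points, below 1 for clustered ones, and above 1 for overly even ones. This sketch ignores edge effects (which push R slightly up), and the generators, point counts, jitter, spread, and seeds are all illustrative choices with hypothetical names.

```python
import math
import random


def uniform_points(n: int, seed: int = 1) -> list[tuple[float, float]]:
    """Complete spatial randomness: n independent uniform points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def jittered_grid(side: int, jitter: float = 0.3, seed: int = 1) -> list[tuple[float, float]]:
    """Overly even points: one per grid cell, nudged by up to `jitter` of a cell width."""
    rng = random.Random(seed)
    step = 1.0 / side
    points = []
    for i in range(side):
        for j in range(side):
            points.append(((i + 0.5 + rng.uniform(-jitter, jitter)) * step,
                           (j + 0.5 + rng.uniform(-jitter, jitter)) * step))
    return points


def clustered_points(n_parents: int, per_parent: int, spread: float,
                     seed: int = 1) -> list[tuple[float, float]]:
    """Genuinely clustered points: offspring scattered normally around random parent locations."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_parents):
        px, py = rng.random(), rng.random()
        for _ in range(per_parent):
            # wrap around the edges so every point stays inside the unit square
            points.append(((px + rng.gauss(0, spread)) % 1.0, (py + rng.gauss(0, spread)) % 1.0))
    return points


def clark_evans_ratio(points: list[tuple[float, float]], area: float = 1.0) -> float:
    """Mean nearest-neighbour distance divided by its expectation under randomness."""
    nearest = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
               for i, p in enumerate(points)]
    expected = 0.5 / math.sqrt(len(points) / area)
    return (sum(nearest) / len(nearest)) / expected


for name, pts in [("uniform", uniform_points(400)),
                  ("jittered grid", jittered_grid(20)),
                  ("clustered", clustered_points(40, 10, 0.02))]:
    print(f"{name:14s} R = {clark_evans_ratio(pts):.2f}")
```

Plotting the uniform pattern tends to give the first kind of picture (visible "clumps" with R near 1); a candidate for the second kind could be tuned by raising `spread` until the clusters no longer look obvious while R stays clearly below 1.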

P(O|X)=P(X|O)=P(X|X)=P(O|O)?

Shouldn't the formula be P(O|X)+P(X|O)=P(X|X)+P(O|O)? According to my understanding, this is what Gilovich mentions. 92.72.246.123 (talk) 09:52, 6 July 2012 (UTC) Witek
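Read as a statement about counts, the "+" version says that alternations (an O followed by an X, or an X followed by an O) should make up about half of all adjacent pairs in a fair random sequence, with repetitions making up the other half. A minimal sketch, assuming independent 50/50 outcomes and using hypothetical helper names: it measures the alternation rate of one sequence (the prime-position sequence from elsewhere on this page, as sample input) and averages the rate over every possible sequence of a given length.

```python
from fractions import Fraction
from itertools import product


def alternation_rate(seq) -> Fraction:
    """Share of adjacent pairs whose two symbols differ."""
    pairs = list(zip(seq, seq[1:]))
    switches = sum(1 for a, b in pairs if a != b)
    return Fraction(switches, len(pairs))


def mean_alternation_rate(n: int) -> Fraction:
    """Average alternation rate over all 2**n equally likely O/X sequences of length n."""
    seqs = list(product("OX", repeat=n))
    return sum(alternation_rate(s) for s in seqs) / len(seqs)


print(alternation_rate("OXXOXOXOOOXOXOOOXOX"))  # 13/18
print(mean_alternation_rate(10))                # 1/2
```

A sequence that alternates much more often than half the time, like the sample above, is the kind people tend to judge "more random" than it should look.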

Merger proposal

I propose that Hot-hand fallacy be merged into Clustering illusion. I think that the content in the Hot-hand Fallacy article can easily be explained in the context of Clustering Illusion. The merger will be especially helpful for those who read the "List of cognitive biases" article, because it will spare them confusion. Spannerjam 11:04, 19 October 2013 (UTC)

Oppose. Clustering Illusion is the general tendency of humans to see patterns in any random data (which includes scientific data, etc.). Hot Hand fallacy seems to be the effect when applied to sports and games, and what is done based on the perception. Merging the two would skew the resulting article towards the latter. (Hohum @) 11:46, 19 October 2013 (UTC)
Oppose for the reasons stated above. - DavidWBrooks (talk) 15:27, 31 July 2014 (UTC)
Your comment doesn't add much to the discussion. Remember that Wikipedia is not a democracy.

Goose121 (talk) 22:48, 19 September 2015 (UTC)

Nor is it always speedy, judging from this molasses-like discussion. - DavidWBrooks (talk) 23:43, 19 September 2015 (UTC)
Support. The comment above states that it is the effect when applied to sports and games. That would mean that it is a subcategory. As for the merger overpowering the article, Hot Hand fallacy could be made into its own subsection of the article, since it is a subtopic. Goose121 (talk) 22:23, 19 September 2015 (UTC)
Oppose also; the subjects are related but very distinct. --McGeddon (talk) 15:57, 31 July 2014 (UTC)
Support. The comment above states that it is the effect when applied to sports and games. That would mean that it is a subcategory. As for the merger overpowering the article, Hot Hand fallacy could be made into its own subsection of the article, since it is a subtopic. — Preceding unsigned comment added by 148.177.1.217 (talk) 13:37, 14 October 2015 (UTC)
Oppose. The Hot Hand thing is a special case. Studies that tried to determine whether the clusters people experience in sports performance are real found no effect and concluded that the clusters were caused by the clustering illusion. Since an elementary mistake was recently found in those studies, their conclusion seems to be wrong. So the "hot hand fallacy" is possibly non-existent, but the clustering illusion is alive and well. I think this is a good reason against a merge. --Hob Gadling (talk) 15:00, 14 October 2015 (UTC)
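The comment doesn't say which mistake it means. One widely discussed candidate, from Miller and Sanjurjo (2015), is a finite-sample selection bias: take each finite sequence of fair coin flips, compute the proportion of heads among the flips that immediately follow a head, and average over sequences; the result is below 1/2, so a "no hot hand" baseline of exactly 1/2 is set too high. Assuming that is the mistake meant, this exhaustive enumeration (with a hypothetical function name) shows the bias directly:

```python
from fractions import Fraction
from itertools import product


def expected_streak_proportion(n: int) -> Fraction:
    """Average, over all 2**n fair-coin sequences where it is defined,
    of the proportion of heads among flips that immediately follow a head."""
    total = Fraction(0)
    counted = 0
    for seq in product("HT", repeat=n):
        after_h = [b for a, b in zip(seq, seq[1:]) if a == "H"]
        if after_h:  # skip sequences with no head before the last flip
            total += Fraction(after_h.count("H"), len(after_h))
            counted += 1
    return total / counted


print(expected_streak_proportion(4))  # 17/42, roughly 0.405 rather than 0.5
```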

Shouldn't it say that humans "overpredict" variability?

Don't people expect to see MORE variability than what is actually observed, i.e., things appear to be less random than we expect? SpxB fan (talk) 13:46, 14 October 2015 (UTC) SpxB Fan

External links modified

Hello fellow Wikipedians,

I have just added archive links to one external link on Clustering illusion. Please take a moment to review my edit. If necessary, add {{cbignore}} after the link to keep me from modifying it. Alternatively, you can add {{nobots|deny=InternetArchiveBot}} to keep me off the page altogether. I made the following changes:

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at {{Sourcecheck}}).

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 18 January 2022).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—cyberbot II Talk to my owner:Online 12:28, 28 February 2016 (UTC)