Wikipedia:Reference desk/Archives/Mathematics/2017 June 29

Mathematics desk


June 29

Formula for probability of x identical digits in set of y numbers?

I know someone who has five identical digits in their nine-digit social security number. Something like 234-33-1330. What formula would determine the probability of this happening for any digit in any order? That is, there just has to be five instances of one digit, not specifically 3, and not in that particular order? I know that the chance of getting five identical digits in a row is 1/10,000; but I don't know how to show the effect of the four extra slots making the chance greater than 1/10^4. I'm guessing it might be 9!/5!, but that's a guess. I don't know where to look to confirm this. Thanks. μηδείς (talk) 23:33, 29 June 2017 (UTC)

That won't do it. You need to count up the number of ways to pick a sequence of 9 digits with (I'm assuming exactly) 5 of them identical. Try counting up the number of different positions the identical digits could occupy within the whole string. Once you do that, multiply by the number of possibilities for what that digit is. Then, multiply by the number of choices for each of the remaining four digits. Finally, divide by the total number of possible strings of digits to get the actual probability (assuming all strings are possible and equally likely). On a side note, this approach won't work if you want the probability of a SSN with exactly 4 identical digits. I'll leave it to you to figure out why. --Deacon Vorbis (talk) 00:15, 30 June 2017 (UTC)
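The counting recipe described above can be sketched in a few lines of Python. This is only an illustration, assuming (as the reply does) that all 10^9 nine-digit strings are possible and equally likely; the variable names are made up for this sketch.

```python
from fractions import Fraction
from math import comb

positions = comb(9, 5)   # ways to choose which 5 of the 9 slots hold the repeated digit
repeated_digit = 10      # choices for which digit is the repeated one
others = 9 ** 4          # each of the other 4 slots holds any of the 9 remaining digits
total = 10 ** 9          # all nine-digit strings

favourable = positions * repeated_digit * others
probability = Fraction(favourable, total)
print(favourable, float(probability))
```

Using `Fraction` keeps the result exact, so it can be compared directly with counts obtained by other methods.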
It's worth noting that the number of times a specific digit appears in the SSN is distributed binomially, which makes the values easy to express.
The 5-repetition case is easier because two different digits can't both repeat 5 times. The problem is still straightforward for a lower number of repetitions; you just have to account for double-counting, or in other words, use the inclusion-exclusion principle. -- Meni Rosenfeld (talk) 00:24, 30 June 2017 (UTC)
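A minimal sketch of the inclusion-exclusion count mentioned above, assuming "r identical digits" means that at least one digit appears exactly r times. The function name and its parameters are hypothetical, introduced only for this illustration.

```python
from math import comb, factorial

def count_some_digit_exactly(n, r, base=10):
    """Count length-n strings over `base` digits in which at least one
    digit appears exactly r times, by inclusion-exclusion."""
    total = 0
    j = 1
    while j <= base and j * r <= n:
        rest = n - j * r
        # ways to give each of j chosen digits exactly r positions
        placements = factorial(n) // (factorial(r) ** j * factorial(rest))
        # the remaining positions must avoid all j chosen digits
        term = comb(base, j) * placements * (base - j) ** rest
        # odd-sized intersections are added, even-sized ones subtracted
        total += term if j % 2 == 1 else -term
        j += 1
    return total

print(count_some_digit_exactly(9, 5) / 10**9)  # only the j = 1 term contributes
print(count_some_digit_exactly(9, 4) / 10**9)  # the j = 2 term corrects double-counting
```

For r = 5 the loop stops after one term, because two digits cannot both occupy five of nine slots; for r = 4 a second term is needed to remove strings counted twice.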
Those two responses leave me entirely clueless. What does the number of positions have to do with anything? I just want the probability of an SSN of nine digits in which any 5 digits are identical. I don't have a degree in mathematics, and probably spent a month on probability in HS four decades ago. Leaving something for me to figure out is unhelpful, when I don't know where to begin in the first place. The most complicated math I have done since the 80's is compound interest, and dividing by fractions. Please either direct me to the proper equation, or an article or web page that contains it. μηδείς (talk) 01:29, 30 June 2017 (UTC)
As Meni mentioned, the relevant distribution is the binomial distribution. A 3, say, is a success, and a non-3 is a failure. The pmf (probability mass function), according to the table at the start of the linked article, is
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$
Here the number of trials is n = 9, and the exact number k of successes is 5. The success probability is p = 1/10. (We assume all digits are equally probable, which is likely not true in practice, but that's where I got the probability 1/10 from.) So the probability of getting exactly five 3s is $\binom{9}{5} (1/10)^5 (9/10)^4 \approx 0.000827$, if I've done the arithmetic right. The probability of five of anything is not, however, 10 times this (for all ten digits), since these events are not independent (if you have five threes, for example, you have zero probability of getting five of anything else). If I get any ideas tomorrow for the case of allowing for any digit to appear five times, I'll let you know. Loraof (talk) 02:35, 30 June 2017 (UTC)
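For readers who want to check the arithmetic, here is a short sketch of the binomial pmf for one specific digit. It assumes independent, uniformly distributed digits; `binomial_pmf` is a helper written for this example, not a library function.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# exactly five 3s among nine digits, each digit being a 3 with probability 1/10
p_five_threes = binomial_pmf(5, 9, 0.1)
print(p_five_threes)
```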
Three was used as an example, so the probability that any digit will be reproduced 5 times (out of five tries) is 1/10^4, since the first number doesn't matter, only that 4 other numbers match it. .00827 seems intuitively too large, as that is about 1 in 120. μηδείς (talk) 03:03, 30 June 2017 (UTC)
First, you missed a decimal point; it should be about 0.0008, not 0.008. But the desired probability is 10 times that, because the events are all disjoint. You get the same computation the way I was talking about: there are $\binom{9}{5} \cdot 10 \cdot 9^4$ ways to form strings of the desired type, and there are $10^9$ total strings, giving a probability of 0.00827..., or about 1 in 121. --Deacon Vorbis (talk) 03:44, 30 June 2017 (UTC)
What does the 9 over 5 in parentheses mean? I am unfamiliar with that notation. I am assuming it is not the fraction 9/5. μηδείς (talk) 14:01, 30 June 2017 (UTC)
See Binomial coefficient. --Wrongfilter (talk) 14:07, 30 June 2017 (UTC)
I think you are mistaken, Deacon Vorbis. The events are disjoint, but they are not independent. For instance, the chance that
You seem to have left that unfinished. But in any case, you're right that they're not independent. But independence doesn't matter. For example, if A is the event "5 threes", and B is the event "5 sevens", then we want $P(A \cup B)$, and since A and B are disjoint, this reduces to $P(A) + P(B)$. And likewise for all 10 possible digits. --Deacon Vorbis (talk) 16:44, 30 June 2017 (UTC)
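The step above can be written out for all ten digits. This is a sketch that writes $A_d$ (a label introduced here, not used in the thread) for the event that digit $d$ appears exactly five times in the nine-digit string. Two such events cannot both occur because 5 + 5 > 9, so every intersection term vanishes:

```latex
\begin{align*}
P(A \cup B) &= P(A) + P(B) - P(A \cap B) = P(A) + P(B), \\
P\Bigl(\bigcup_{d=0}^{9} A_d\Bigr) &= \sum_{d=0}^{9} P(A_d)
  = 10 \binom{9}{5} \Bigl(\tfrac{1}{10}\Bigr)^{5} \Bigl(\tfrac{9}{10}\Bigr)^{4}
  = \frac{8\,266\,860}{10^{9}} \approx 0.00827 .
\end{align*}
```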
The generalized problem is quite interesting, and I am not quite sure how to tackle it. Tigraan 16:07, 30 June 2017 (UTC)
Generalized version in mathspeak, if someone feels like posting it on a specialized forum

We throw $k$ balls at random (uniform distribution, independent draws) into $b$ urns. What is the probability that at least one urn has at least $m$ balls? (The SSN problem is k=9, b=10, m=5; the difficulty arises when k≥2m, because then the events are not disjoint.)
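One possible way to get an exact answer for small cases is to count the complementary event (every urn holds fewer than m balls) with a truncated exponential generating function. This technique is not proposed anywhere in the thread; it is offered here only as a sketch, and the function name `prob_some_urn_at_least` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def prob_some_urn_at_least(k, b, m):
    """Exact probability that, with k balls thrown independently and
    uniformly into b urns, at least one urn receives m or more balls."""
    # coefficients of sum_{j<m} x^j / j!, kept up to degree k
    base = [Fraction(1, factorial(j)) if j < m else Fraction(0)
            for j in range(k + 1)]
    poly = [Fraction(1)] + [Fraction(0)] * k  # the constant polynomial 1
    for _ in range(b):                        # multiply by `base` once per urn
        new = [Fraction(0)] * (k + 1)
        for i, a in enumerate(poly):
            if a == 0:
                continue
            for j in range(min(m, k + 1 - i)):
                new[i + j] += a * base[j]
        poly = new
    # k! * [x^k] gives the number of assignments with every urn below m
    all_below_m = poly[k] * factorial(k)
    return 1 - all_below_m / Fraction(b) ** k

print(float(prob_some_urn_at_least(9, 10, 5)))  # the SSN case with "at least 5"
```

Because the count is of assignments where no urn reaches m, the sketch does not rely on the events being disjoint, so it also covers the k≥2m case.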

A related but somewhat easier problem is the probability that an SSN has exactly d different digits. See Stirling numbers of the second kind. --RDBury (talk) 04:16, 1 July 2017 (UTC)
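A hedged sketch of that related problem: the number of nine-digit strings using exactly d different digits is (number of ordered choices of d digits out of 10) times S(9, d), where S is the Stirling number of the second kind. The helper names below are made up for this example, and uniform, independent digits are assumed.

```python
from math import perm

def stirling2(n, d):
    """Stirling number of the second kind via the recurrence
    S(n, d) = d * S(n-1, d) + S(n-1, d-1)."""
    table = [[0] * (d + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, d) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][d]

def prob_exactly_d_distinct(d, length=9, base=10):
    """Probability that a uniform random digit string of the given length
    uses exactly d different digits."""
    # split the positions into d non-empty groups, then assign distinct digits
    return perm(base, d) * stirling2(length, d) / base ** length

for d in range(1, 10):
    print(d, prob_exactly_d_distinct(d))
```

Summing the probabilities over d = 1, ..., 9 should give 1, which is a quick sanity check on the recurrence.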
In case anyone is still sceptical about the theoretical answer to the original question, I did an experiment. I generated 1 million strings of 9 digits (with uniform distribution of each digit) and observed 8304 with exactly 5 digits the same, and 8921 with 5 or more digits the same. For exactly 5 digits the same this is an observed probability of 0.8304% or 1 in 120. For 5 or more digits the same it is 0.8921% or 1 in 112. Gandalf61 (talk) 09:35, 1 July 2017 (UTC)
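An experiment of this kind might look like the following sketch. It is not the code used for the figures above; the function name, trial count and seed are assumptions chosen for illustration.

```python
import random
from collections import Counter

def simulate(trials=1_000_000, length=9, repeat=5, seed=0):
    """Estimate the chance that some digit appears exactly `repeat` times,
    and the chance that some digit appears at least `repeat` times."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    exactly = at_least = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(10) for _ in range(length))
        if repeat in counts.values():
            exactly += 1
        if max(counts.values()) >= repeat:
            at_least += 1
    return exactly / trials, at_least / trials

p_exact, p_at_least = simulate()
print(p_exact, p_at_least)
```

The estimates vary from run to run and with the seed, so they should only be expected to land near the theoretical values, not to match them digit for digit.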

This kind of question and its many variations are well-studied. To give an exact answer in a small case like this is an unpleasant exercise in enumeration and inclusion-exclusion; simulation (as in Gandalf61's approach) is the best way to produce good enough estimates. Most interesting statements are about asymptotic behavior (that is, large random examples). The right article to start with is probably balls into bins. --JBL (talk) 19:53, 1 July 2017 (UTC)