Bertrand's Box Paradox

I'm sorry, my comments didn't come out as I meant them to. I meant "If people keep trying to treat it as a puzzle like you are doing now." What I mean by "treating it like a puzzle" is "treating it as if the goal is to demonstrate the correct answer." That is what you are doing, and it is completely irrelevant in this encyclopedia article. We must rely on what our sources say the correct answer is, and Bertrand says it is 2/3. I also did not mean that you, Michaeldsp, "count cases." If I said that, I meant the generic "you" who is the reader.

The point of the Three Card Swindle is that people can be fooled about the answer, which makes it a puzzle. The point of Bertrand's Box Paradox is that people do not understand how to properly model a problem in the form of a sample space and a (prior) probability distribution for that sample space. Especially with conditional probability. And what's worse (and these are my own thoughts), it seems like the simpler the problem is, the harder people try to rationalize how the problem fits their mistaken model instead of examining that model for its appropriateness.

People who you think should know better will do this. Many "experts" refuse to see the ambiguity in the Boy or Girl Paradox, even when it is pointed out. Martin Gardner did see it, and retracted his answer six months after publishing it. In his May 2010 column, Keith Devlin said that if he had used the word "given" instead of "I tell you that," the ambiguity would be gone. Nobel physicists have been known to insist the answer to the Game Show Problem (which is mathematically equivalent to the Box Paradox) is 1/2. So please don't feel insulted when I say that you, too, are modeling incorrectly, or at least are wrong when you say Bertrand and I are doing that. As long as the events in your sample space are independent and span all possibilities, it does not matter how they are "mixed together." If handled properly, any such model will get the right answer. You could even incorporate the phase of the moon into your sample space, and still be correct.


In the Bertrand Box Paradox, our sample space and distribution can be either of these (the notation is "COIN from BOX"):

  • {any G from GG, G from GS, S from GS, any S from SS}, {1/3,1/6,1/6,1/3}
  • {G1 from GG, G2 from GG, G from GS, S from GS, S1 from SS, S2 from SS}, {1/6,1/6,1/6,1/6,1/6,1/6}

Next, a correct solution (there can be others) to any conditional probability problem, given a sample space {S1, S2, ..., SN}, an event of interest E=S1+S2+...+SM (for M<N), a distribution {P1, P2, ..., PN}, and a condition C, is:

Prob(E|C) = sum(I=1 to M, PI*Prob(C|SI)) / sum(J=1 to N, PJ*Prob(C|SJ))

This is a form of Bayes' Theorem. It can be used with either sample space to get the correct answer, regardless of how the SJ's "mix." This is what Bertrand did, and it is what I did. If you arrange it so all the events in the sample space are equally probable, and that Prob(C|SJ)=0 or 1 for all of them, then counting cases is equivalent. So please, don't say there was anything wrong with any of my math.
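The formula above can be checked against both sample spaces with a few lines of Python. This is only a minimal sketch, assuming exact fractions; the function name `posterior` and the dictionary layout are illustrative choices, not anything taken from Bertrand or the article. Here C is "the coin drawn is gold" and E is "the coin came from the GG box".

```python
from fractions import Fraction as F

def posterior(prior, likelihood, event):
    """Prob(E|C) = sum over E of PI*Prob(C|SI), divided by the same sum over all SJ."""
    num = sum(prior[s] * likelihood[s] for s in event)
    den = sum(prior[s] * likelihood[s] for s in prior)
    return num / den

# First sample space: four outcomes with unequal probabilities.
space1 = {"G from GG": F(1, 3), "G from GS": F(1, 6),
          "S from GS": F(1, 6), "S from SS": F(1, 3)}
like1 = {"G from GG": 1, "G from GS": 1, "S from GS": 0, "S from SS": 0}

# Second sample space: six equally probable coins.
coins = ["G1 from GG", "G2 from GG", "G from GS",
         "S from GS", "S1 from SS", "S2 from SS"]
space2 = {name: F(1, 6) for name in coins}
like2 = {name: 1 if name.startswith("G") else 0 for name in coins}

p1 = posterior(space1, like1, ["G from GG"])
p2 = posterior(space2, like2, ["G1 from GG", "G2 from GG"])
print(p1, p2)
```

Both calls give the same value, 2/3, which is the sense in which it does not matter how the events "mix."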

I do not like to use words like "prior" and "posterior" in the article because they will not be understood by most readers, and are necessary only if you take a full Bayesian approach. Which also won't be understood, and also is unnecessary. But as to what I meant, probabilities for events that might happen, like for "getting a gold coin [in a] case you do consider" is obviously a prior. The conditional probability for "the box was GG, given that you got a gold coin" is a posterior. But my use of tenses will be understood in general, whether or not the importance is grasped. "Prior" and "posterior" will not.

Finally, I have tried very hard to differentiate between the correct answer, which is the fraction 2/3, and the correct solution, which means the method you use to get that answer. Counting cases is an incorrect solution, but can get the correct answer under certain conditions; specifically, when the (prior) probability of observing the known result is either 0 or 1 for every element of your sample space. Bertrand's point is that people assume those conditions are met without considering how. They fail to consider how the (prior) probability for things that didn't happen needs to be accounted for. Bertrand emphasized that by using the GS box, where part of the (prior) probability gets "eliminated" (the word non-mathematicians will use) when you know you didn't get a silver coin. JeffJor (talk) 20:59, 3 August 2011 (UTC)

Second Reply

You said "to count coins as starting cases is incorrect." Did you mean "coins are an incorrect set of starting cases to use?" They are not. Any set of cases that (1) comprises all possibilities and (2) consists of independent events, is correct once you apply probabilities properly. My point here is that your argument about "how they mix" is invalid. It is the type of intuition-based argument that is the root of all controversies about these problems. Any random process that involves two separate decision points can use, as a sample space, the possible results of the first, or the combined possible results of both. Regardless of how they "mix."

Or did you mean "coins may be correct as cases to use, but it is incorrect to merely count them?" Technically true no matter what cases you use. But counting is equivalent to the correct solution if and only if the cases you use are the coins. Which is the other point. So either way, your statement "to count coins as starting cases is incorrect" is what is incorrect.

The article does indeed mention, in all approaches, that all of the cases are equally probable. It's the first bulleted point in each line of reasoning. It may not be a property of sample spaces in general, but it is true here. I didn't include the term for it for space reasons, since I would immediately factor them out anyway. I viewed it as a needless complication. And the unequal probabilities you mention, if the number of coins in each box is something other than 2, is irrelevant speculation. The number of coins in each box is two. This isn't a survey of probability theory, it is an article about a specific thought, and should restrict itself to that thought.

Regarding the MVS thread: please note that the problem as discussed starts out "You know that...", not "it is given that..." The wording can be critical in some people's minds. But Robert is a troll who seems to hold to the philosophy "I thought of it, therefore it is so." He obviously has no formal background in math, and refuses (because he didn't think of it first?) to accept the difference between a prior probability and a posterior probability. So his solution to the TCP is to construct the sample space {BB, BG, GB} with prior probabilities {1/3,1/3,1/3}, and work from there. He thinks age is a vital component of the problem, and must be used as the only possible means to discriminate the children (which isn't even necessary to do). A formal solution to the TCP can be made with a sample space that merely counts the boys in the family: {(B0, B1, B2),(1/4, 1/2, 1/4)}. If C represents the (for now) incompletely defined condition in the problem statement, the formal solution is:

P(B2|C) = [P(C|B2)*P(B2)]/[P(C|B0)*P(B0)+P(C|B1)*P(B1)+P(C|B2)*P(B2)] = P(C|B2)/[P(C|B0)+2*P(C|B1)+P(C|B2)]

Technically, a solution could start with a prior of any family, go to a first posterior where the family has two children, a second posterior where you know one gender in the family, and finally to the posterior where that gender is "boy." But the first two have no effect on anything (if the very tersely worded puzzle is to be solvable at all), so I treat the prior as a random family of two where you know one gender. Note that I don't define how you know this, so all possibilities for that are still open, despite what Robert thinks. In that case, P(C|B0)=0 and P(C|B2)=1 even with my incomplete definition of C. But until we can complete that definition, P(C|B1) is ambiguous. Still, the answer reduces quite nicely to P(B2|C)=1/[2*P(C|B1)+1]. I trust your "point of view" does not disagree with any of this so far, since it is just an application of accepted (and necessary) assumptions, plus some laws of probability?
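The reduction to P(B2|C)=1/[2*P(C|B1)+1] can be checked numerically. The following is a minimal sketch, assuming the priors {1/4, 1/2, 1/4} and the values P(C|B0)=0 and P(C|B2)=1 stated above; the function name `p_two_boys` and the two trial values of P(C|B1) are illustrative inputs only, not a claim about which interpretation is right.

```python
from fractions import Fraction as F

def p_two_boys(q):
    """P(B2|C) when P(C|B0)=0, P(C|B2)=1 and P(C|B1)=q (q is left open by the problem)."""
    prior = {"B0": F(1, 4), "B1": F(1, 2), "B2": F(1, 4)}
    like = {"B0": F(0), "B1": F(q), "B2": F(1)}
    den = sum(prior[k] * like[k] for k in prior)
    return prior["B2"] * like["B2"] / den

for q in (F(1), F(1, 2)):
    # Full Bayes quotient agrees with the closed form 1/(2*P(C|B1)+1).
    assert p_two_boys(q) == 1 / (2 * q + 1)
    print("P(C|B1) =", q, "-> P(B2|C) =", p_two_boys(q))
```

With P(C|B1)=1 the result is 1/3, and with P(C|B1)=1/2 it is 1/2, so the whole disagreement sits in that one conditional probability.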

So the answer boils down to P(C|B1). But there is another partition of the event space that exists: {ALOB,ALOG}. "ALOB" means "you know there is At Least One Boy," but "ALOG" means "you know something else which, in whole or part, implies 'you know there is At Least One Girl'." So P(ALOB|B2)+P(ALOG|B2)=1. This form for ALOG is necessary because we must represent the entire prior somehow, and this is the most general way.

Now, if you can find a reason why P(ALOB|B1)=1 and P(ALOG|B1)=0 is the only possible interpretation of the problem statement, I'd be glad to hear it. I've heard lots of hand-waving arguments as to why; you say you read Robert's. But none say anything effectively different than "the posterior can't be different than prior," which is absurd. But if we give any non-zero value to P(ALOG|B1) at all, and it seems to me we must, it can only be 1/2. You use the same reasoning that leads to P(BB)=P(BG)=P(GB)=P(GG)=1/4, which is called the Principle of Indifference. While it may be bad to assume in real life (where those four prior probabilities are not all equal, I'll add), it is required in puzzles like this.

Did you read any of the more educated presentations I mentioned? Martin Gardner merely said his TCP was ambiguous, and both P(ALOB|B1)=1 and P(ALOB|B1)=1/2 were possible interpretations. Keith Devlin said P(ALOB|B1)=1/2 for this wording, but if you change the wording to "given that ALOB," then P(ALOB|B1)=1; I disagree (it's a hand-waving argument at best), but he agrees with me about this problem. Or Marks and Smith, who take a full-blown Bayesian approach? They say of P(ALOB|B1)=1 that "this extreme assumption is never included in the presentation of the two-child problem, however, and is surely not what people have in mind when they present it." In fact, these are representative of every published paper that considers whether there is an ambiguity. They all say there is one. Some leave it ambiguous, some suggest alternate wordings to support P(ALOB|B1)=1, but any that attach an answer without rewording it say P(ALOB|B1)=1/2.

I'll present one final argument, which brings us back to Bertrand. This is actually a paraphrase of Jason Rosenhouse's translation of Bertrand's introduction to his Box Paradox:

You know that a certain family has two children. What is the probability the two children share a gender? It is obviously 1/2, since there are equal numbers of families with shared, and mixed, genders. But let us suppose you recall the (unspecified) gender of a child (also unspecified) in the family. Now it seems that the probability should be 1/3 since there is one family type with two of any one gender, but two that are mixed. This is clearly fallacious, since you would have to get the same answer regardless of what gender you recall. That would make the answer to the first question 1/3 as well, but we know it is 1/2. In actuality, since there are equal numbers of shared- and mixed-families, it is self evident that either type must be as likely as the other, regardless of what gender you recall.

Better math follows, as I have described, but this is the gist of his argument. Recalling one gender does not imply that it is the only gender you could recall; yet that is a requirement to get 1/3 as the answer, and it makes the answer to the symmetric problem about recalling a girl different. Similarly, when Monty Hall opens a door to show a goat, it does not mean that he always opens that particular door when he can; that is, when it isn't chosen by the contestant and has a goat. Yet that is the assumption people unknowingly make when they say the answer is 1/2. It is the exact same error in both problems. JeffJor (talk) 21:13, 5 August 2011 (UTC)
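The Monty Hall comparison can be sketched the same way. This is a hedged illustration, not taken from any cited source: it assumes the contestant picked door 1 and the host opened door 3, and the parameter `q` (a hypothetical name) is the probability that the host opens door 3 when the car is behind door 1 and he is free to open either remaining door.

```python
from fractions import Fraction as F

def p_stay_wins(q):
    """P(car behind door 1 | host opened door 3), with each door equally likely a priori.
    q = P(host opens door 3 | car behind door 1), the host's bias when he has a choice."""
    prior = F(1, 3)
    # P(host opens door 3 | car location): forced if the car is behind 2, impossible if behind 3.
    like = {1: F(q), 2: F(1), 3: F(0)}
    den = sum(prior * like[door] for door in like)
    return prior * like[1] / den

print(p_stay_wins(F(1, 2)))  # host picks at random between the two goat doors
print(p_stay_wins(F(1)))     # host always opens door 3 whenever he can
```

Only the second assumption, that the host always opens that particular door when he can, produces 1/2; the unbiased host gives 1/3 for staying and therefore 2/3 for switching.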

Response

I can only repeat that I did not talk about using cases (you may use whatever you want), I talked about counting cases, which is correct only if it is quite obvious that the cases you are starting with are equally probable. You write ‘But counting is equivalent to the correct solution if and only if the cases you use are the coins’ -- it is typical hand waving. Why? Have you checked it? Are your cases indeed probabilistic events, compound or elementary ones? You write ‘And the unequal probabilities you mention, if the number of coins in each box is something other than 2, is irrelevant speculation.’ It is only your opinion, while my point is that it allows better understanding of our specific case. Let me formulate this generally, as it often helps. Imagine there are several parameters in the problem, a,b,c,…. I have found the correct (let us assume it) general answer, and observed that it depends on b, c, …, but does not depend on parameter a (read, does not depend on number of gold coins in GG box). It is exactly the situation when we say that parameter a is irrelevant to our problem. Then you bring me a solution for specific values of parameters, b_0, and, say, a_0 = 2 * b_0 with the correct (i.e. consistent with general) answer. However, you use en route some characteristic of the system that clearly depends on parameter a and you state that some property associated with this very characteristic is self-evident. I become suspicious and check whether this property (all coins are equally probable) holds in general, and I see it does not. So, it is not that evident, and I have to check it. I calculate probabilities of events of interest, and suppose, I see you were right. OK, you have the solution. But after I have calculated necessary probabilities, I do not need to count cases anymore – I have probabilities at hand. That is what I meant by saying that checking your conditions is equivalent (in effort) to solving the problem. 
It would be much worse if I found discrepancies in your arguments. Consider your own examples of the sample space:

  • {any G from GG, G from GS, S from GS, any S from SS}, {1/3,1/6,1/6,1/3}
  • {G1 from GG, G2 from GG, G from GS, S from GS, S1 from SS, S2 from SS}, {1/6,1/6,1/6,1/6,1/6,1/6}

The former is exactly in accordance with what I suggest doing (calculating probabilities). Of course you cannot merely count cases one and two here, since they are not equally probable. The latter looks more promising, with equal probabilities (1/6) for all cases. But think about what cases G1 and G2 could mean. There are no coins G1 and G2 in the GG box; they are indistinguishable! Your 'cases' are not outcomes, and they are not compound events either. Indeed, how would you suggest distinguishing them? By engraving small numbers 1 and 2? Will I then be allowed to see them? If no, they do not exist for me. If yes, what will we do with the gold coin from the GS box, as it becomes different from those two? In fact, there is a coin that you have drawn from the box, and another one left there, but you cannot call them G1 and G2. It is bogus, something like counting the number of angels on the top of a needle. Unfortunately, this very bad reasoning goes back to the great H. Poincaré, who himself gave such an explanation (G1, G2) to Bertrand.

(Michaeldsp (talk) 19:23, 8 August 2011 (UTC))

I can only repeat that I did not say that counting cases is correct. I said that counting cases is equivalent to the correct solution under special circumstances which are not true in general, but are true if and only if, IN THIS SPECIFIC PROBLEM, the cases you use are the coins. All your objections are moot, TO THIS SPECIFIC PROBLEM. And yes, the cases in this specific problem are equally probable.
"Why is it equivalent?", you ask? Because the correct solution involves the quotient of two sums. Each term in these sums is the product P(C|X(i))*P(X(i)), where C is the condition and the set of X(i) is a partition of the sample space. If P(X(i))=1/N for every i - which is true when you partition on the coins - and P(C|X(i)) is always 0 or 1 - which is true when you partition on the coins - then you can multiply the numerator and denominator of this quotient each by N, and each sum turns into a count. That doesn't make it the correct solution, but it is equivalent to the correct solution TO THIS SPECIFIC PROBLEM.
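A small sketch may make the equivalence, and its limits, concrete. It assumes Python with exact fractions; the names `bayes` and `count_cases` and the short coin labels are illustrative only. Partitioning on the six coins meets both conditions (P(X(i))=1/N and P(C|X(i)) equal to 0 or 1), so the count matches the quotient; partitioning on the three boxes does not, because P(C|GS)=1/2.

```python
from fractions import Fraction as F

def bayes(prior, like, event):
    """Quotient of two sums of P(C|X(i))*P(X(i))."""
    num = sum(prior[x] * like[x] for x in event)
    den = sum(prior[x] * like[x] for x in prior)
    return num / den

def count_cases(like, event):
    """Naive count: cases in the event over all cases that could have produced C."""
    consistent = [x for x in like if like[x] > 0]
    return F(sum(1 for x in event if x in consistent), len(consistent))

# Partition on the coins: equal priors, likelihoods all 0 or 1.
coins = ["G1/GG", "G2/GG", "G/GS", "S/GS", "S1/SS", "S2/SS"]
coin_prior = {c: F(1, len(coins)) for c in coins}
coin_like = {c: F(int(c.startswith("G"))) for c in coins}
coin_event = ["G1/GG", "G2/GG"]

# Partition on the boxes: equal priors, but P(C|GS) = 1/2.
box_prior = {"GG": F(1, 3), "GS": F(1, 3), "SS": F(1, 3)}
box_like = {"GG": F(1), "GS": F(1, 2), "SS": F(0)}

print(bayes(coin_prior, coin_like, coin_event), count_cases(coin_like, coin_event))
print(bayes(box_prior, box_like, ["GG"]), count_cases(box_like, ["GG"]))
```

On the coins both methods agree on 2/3; on the boxes the quotient is still 2/3 while the count gives 1/2, which is the discrepancy the paradox isolates.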
And again, this is not a survey of general probability. I am not advocating that you count cases in general. The article only deals with three boxes, with two coins in each. Anything else is outside of its scope. Treating the (two) coins as indistinguishable is an irrelevant complication IN THIS SPECIFIC PROBLEM since it cannot change the answer, only make it more complicated. And since the point is not to get the right answer (a puzzle), but to demonstrate what is wrong with merely counting cases, a paradox that isolates a discrepancy when you do that is perfect. The point is to show that you may not properly account for all possibilities when you count the cases, not to assure that you do when you change the problem to a more general case, or to teach what the technique should be in the most general case.
Your "general solution" is irrelevant. You change a known system by parameterizing a known value so that the problem is beyond the scope of the article, and then you object to using the known value that puts it back. And finally, I'm sorry, but this is an encyclopedia article; you can't put things into it because you feel they are more correct and you want to make a broader point. You have to draw from published literature about the topic, and nothing you want to add has been so published. JeffJor (talk) 15:49, 9 August 2011 (UTC)

Reply

I wonder whether you really do not understand what "depends on" and "is independent of" mean, or whether you are making a show of it to have the last word. Again, if in the general case the solution does not depend on the number of coins in the GG box, then any method that is based on a concrete value of this number, even if it leads to the correct answer, is almost surely logically flawed. In the Bertrand box problem, taking G1 and G2 as cases and assigning some probabilities to them is meaningless, because G1 and G2 are not possible outcomes in our sample space; you cannot reproduce and/or describe them in a series of repeated experiments. (Michaeldsp (talk) 23:11, 15 August 2011 (UTC))

And I wonder if you are arguing so that you can make a point, regardless of how appropriate the point is. (1) There is no "general case," only the case of two coins per box. (2) Since you only see one of the two coins, it is irrelevant whether they are distinguishable or not. There is nothing to distinguish the coin you see from, and it changes nothing about the problem's presentation if you call them G1 and G2. The whole concept of indistinguishability is a red herring in this problem. Actually, it is worse than that, since it gets in the way of the point of the paradox (see point #4). (3) Say you go to your special case, which is not in any way implied by the problem: the boxes contain (Box G) any number of gold coins, (Box GS) equal numbers of gold and silver coins, and (Box S) any number of silver coins. Then it is a simple re-arrangement of the events I used to get P(gold|Box G)=P(silver|Box S)=1 and P(gold|Box GS)=P(silver|Box GS)=1/2, and you get the same answer TO A DIFFERENT PROBLEM. (4) But the point is not to get the answer. Let me say that again, since you keep ignoring it: THE POINT IS NOT TO GET THE ANSWER. It is to show that a solution technique which equates prior probabilities with posterior probabilities is flawed. Using indistinguishable coins, or different numbers of coins (and so my alternate solution), does not allow for the direct comparison that lets the reader see why this is true. JeffJor (talk) 21:08, 16 August 2011 (UTC)
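For readers following the exchange, the variant described in point (3) can be sketched as below. The coin counts in `variant` are made-up numbers chosen only for illustration, and the helper names are hypothetical; nothing here comes from the article or its sources. Each box is assumed to be chosen with probability 1/3 and a coin drawn uniformly from it.

```python
from fractions import Fraction as F

def p_box_given_gold(boxes, target):
    """P(target box | a gold coin was drawn), boxes equally likely a priori.
    boxes maps a box name to its (gold, silver) coin counts."""
    like = {name: F(g, g + s) for name, (g, s) in boxes.items()}  # P(gold | box)
    prior = F(1, len(boxes))
    den = sum(prior * like[name] for name in boxes)
    return prior * like[target] / den

def count_gold_coins(boxes, target):
    """Naive count: gold coins in the target box over all gold coins."""
    total_gold = sum(g for g, _ in boxes.values())
    return F(boxes[target][0], total_gold)

original = {"G": (2, 0), "GS": (1, 1), "S": (0, 2)}   # the problem as stated
variant = {"G": (10, 0), "GS": (3, 3), "S": (0, 7)}   # assumed counts, illustration only

for boxes in (original, variant):
    print(p_box_given_gold(boxes, "G"), count_gold_coins(boxes, "G"))
```

The quotient gives 2/3 for both sets of boxes, while the coin count agrees only for the original two-coins-per-box problem, where every coin is equally likely to be drawn.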

Say you claim that after careful analysis of Moon phases you correctly predicted the heavy rain that happened this morning. When I ask about predicting something for tomorrow, the day after, or a week ahead, you answer that these problems are totally irrelevant and completely out of scope of what you are doing. OK, I got some idea of your level of comprehension of general vs specific, relevant vs irrelevant, correct vs incorrect, etc. Good luck. (38.108.195.69 (talk) 17:07, 18 August 2011 (UTC))

I'm sorry, but that is apples and oranges. What you are suggesting above is equivalent to asking that moon-phase model to predict the likelihood of rain based on the phases of any other objects orbiting the earth. That is the "more general problem" analog to what you propose, not using the same model on repeated trials of the same experiment. But now I also have some idea of your level of comprehension. But before I sign off, there is another point I've made repeatedly that you have ignored. Whether or not you think a more general approach to the problem is important is COMPLETELY irrelevant to an encyclopedia article. It is not a place to express original thought, only to compile such thought from other places. Have you found a reference that takes this approach to Bertrand's Box Paradox? If not, it does not belong even if it is as correct as you want it to be. JeffJor (talk) 18:00, 18 August 2011 (UTC)