Welcome To My User Page

My name is Deen Gu. I make occasional contributions to the article on Northwestern University, where I majored in Economics with an interest in development economics. My hometown is Rockaway, NJ.

Random Problems That Have My Attention

Lady tasting tea

A friend recently said, "I can tell, only by feeling a banknote and without any additional information, which denomination it is." This reminded me of a similar situation from the early 20th century. As legend has it, Ronald Fisher (creator of modern statistical science) devised a simple experiment to test this kind of claim:

A Lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea was first added to the cup. (p.11)

There are two versions of this legend. One has eight cups each chosen randomly to have milk added first, or tea first. The other has the first four having milk first, the last four tea first, then the order of the cups randomized (so there are always four with milk first, four with tea first).

Devise a simple but objective experiment (for both cases) to test whether or not the lady is telling the truth.

First Case

The first case can be solved using hypothesis testing. Again, the first case is where each individual cup has a 0.5 probability of having milk added first – like flipping a coin.

The null hypothesis states that she’s lying (she is only guessing); the alternative hypothesis says she’s telling the truth. Since the cups are independent of one another, a binomial distribution can be used. The significance level of the test will be arbitrarily set at α = 0.05.

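Let Y be the number of cups she correctly identifies. Under the null hypothesis, Y is binomial with n = 8 and p = 0.5:

$$P(Y = 8) = \binom{8}{8}(0.5)^8 = \frac{1}{256} \approx 0.004$$
$$P(Y = 7) = \binom{8}{7}(0.5)^8 = \frac{8}{256} \approx 0.031$$
$$P(Y = 6) = \binom{8}{6}(0.5)^8 = \frac{28}{256} \approx 0.109$$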

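So the probability of her getting all 8 cups correct by guessing is around 0.004, which is way below our alpha level, so we would reject the null hypothesis and accept her claim. The same goes for seven cups correct: the probability of exactly seven is around 0.031, and the probability of seven or more is around 0.035, still below 0.05. Note that a test that passes her at seven or more correct would wrongly pass a pure guesser around 3.5% of the time.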

But getting six correct would happen around 11% of the time anyway (given the null hypothesis that she’s guessing), and is a bit above our alpha value, so we would fail to reject the null hypothesis if she correctly identified six cups.

So the answer to the first part is: she would have to get 7 or 8 cups correct to pass this hypothesis test.
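The first-case numbers can be checked with a few lines of Python. This is only a rough sketch: the helper names `binom_pmf` and `binom_tail` are my own, and it assumes each cup is an independent 50/50 guess, as described above. It prints both the chance of exactly k correct and the chance of k or more correct, since the latter is what a one-sided test compares against alpha.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_tail(k: int, n: int, p: float) -> float:
    """Probability of k or more successes in n independent trials."""
    return sum(binom_pmf(j, n, p) for j in range(k, n + 1))


if __name__ == "__main__":
    n, p, alpha = 8, 0.5, 0.05  # eight cups, pure guessing, 5% significance
    for k in (8, 7, 6):
        exact = binom_pmf(k, n, p)
        tail = binom_tail(k, n, p)
        print(f"k={k}: exactly={exact:.4f}  at least={tail:.4f}  reject={tail < alpha}")
```

Either way of counting leads to the same cutoff here: seven or eight correct rejects the null hypothesis, six does not.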

Second Case

Unfortunately the second part (where there are four cups with milk added first and four with tea added first, in randomized order) is a bit harder. The math for this answer is actually easier, though the reasoning is not. Assuming she knows there are four of each, she will always label exactly four cups as milk first; the only variable is which four. Thus the probability of her getting all eight correct by pure guessing is higher than in the first case.

An easy way to think about this is to use variables for milk added first or tea added first, i.e. {1,1,1,1,0,0,0,0}. Guessing one wrong means changing a 1 to a 0, which forces a 0 to become a 1 as well, so she cannot get exactly seven correct. There are four 1s to choose from and four 0s to choose from, out of 70 equally likely ways of picking four cups from eight. So the probability that she gets six correct is:

$$P(Y = 6) = \frac{\binom{4}{1}\binom{4}{1}}{\binom{8}{4}} = \frac{16}{70} \approx 0.229$$

While the probability of guessing all correct is:

$$P(Y = 8) = \frac{1}{\binom{8}{4}} = \frac{1}{70} \approx 0.014$$

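Since the probability of getting six right is above the alpha value, while the probability of getting all eight right is below it, we would have to conclude that in this specific experiment she must get every cup right; getting any wrong would mean we cannot reject the null hypothesis that she is lying. Given that the test associated with this famous experiment is Fisher's exact test, the above solution is probably close to how Fisher worked it out.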
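The second-case counting argument can be sketched in Python as well. The function name `prob_milk_first_hits` is made up for this example, and it assumes the lady always labels exactly four cups as milk first. It counts the ways to hit a given number of the true milk-first cups and divides by the number of ways to pick four cups out of eight.

```python
from math import comb


def prob_milk_first_hits(hits: int, cups: int = 8, milk_first: int = 4) -> float:
    """Chance that a pure guesser, who labels `milk_first` cups as milk first,
    picks exactly `hits` of the cups that really had milk first."""
    tea_first = cups - milk_first
    ways = comb(milk_first, hits) * comb(tea_first, milk_first - hits)
    return ways / comb(cups, milk_first)


if __name__ == "__main__":
    for hits in range(4, -1, -1):
        # The number of correct tea-first cups always equals the number of
        # milk-first hits, so the total number of correct cups is 2 * hits.
        print(f"{2 * hits} cups correct: {prob_milk_first_hits(hits):.4f}")
```

Because the totals come in steps of two, the possible scores are 8, 6, 4, 2 and 0 cups correct, which is why seven correct never appears.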

Banknote

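The banknote problem is similar to the first case. Given that your wallet has a random distribution of four different kinds of banknotes, each guess is correct with probability 0.25. With Y as the number of correct guesses out of four bills, the equations are as follows:

$$P(Y = 4) = \binom{4}{4}(0.25)^4 \approx 0.004$$
$$P(Y = 3) = \binom{4}{3}(0.25)^3(0.75) \approx 0.047$$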

The procedure would be to give your blindfolded friend one note at a time and ask him which denomination it is, for a total of four bills. The probability of getting exactly three correct by guessing (about 0.047) is very close to our alpha value of 0.05, thus we could give our friend the benefit of the doubt and accept three correct as reasonably satisfying his claim. But a flaw of this method is that we rarely have a random distribution of banknotes in our wallet. For instance, if we have recently withdrawn from an ATM, our friend might guess a $20 bill more frequently.
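To get a feel for how much the uneven-wallet flaw could matter, here is a small sketch. The 70% share of $20 bills is a made-up figure purely for illustration, `prob_at_least` is a helper name chosen here, and the sketch assumes the friend simply guesses the most common denomination every time.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability of k or more correct guesses in n independent tries,
    where each try is correct with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


if __name__ == "__main__":
    uniform = prob_at_least(3, 4, 0.25)    # four equally likely denominations
    atm_heavy = prob_at_least(3, 4, 0.70)  # hypothetical: 70% of bills are $20s
    print(f"uniform wallet:   {uniform:.4f}")
    print(f"ATM-heavy wallet: {atm_heavy:.4f}")
```

Under that assumed mix, a friend with no special ability would reach three or more correct most of the time, so the test only means something when the denominations are evenly mixed.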

Exactly one of each denomination

A more practical test (since we're using a small sample here) would be to have one of each of the four denominations. Again, you can think of this as having four numbers {1,2,3,4} to choose from. The probability of guessing all four correct is:

$$P(Y = 4) = \frac{1}{4!} = \frac{1}{24} \approx 0.042$$

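You cannot get exactly three correct, since getting one wrong means getting another wrong as well. Using combinations, we find that there are six ways of getting exactly two correct when choosing from four numbers (the two wrong notes must be swapped with each other):

$$P(Y = 2) = \frac{\binom{4}{2}}{4!} = \frac{6}{24} = 0.25$$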

The procedure would be the same as before, though your friend would know that each denomination appears only once. We are using combinations since 1) we cannot repeat a number and 2) the order in which he gets banknotes correct is not important. Given the results, we would only believe your friend if he gets all four correct.
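Since there are only 24 possible orderings, the one-of-each case can also be checked by brute force. The sketch below (with a function name, `match_distribution`, invented for this example) lists every way a guesser could assign the four denominations and tallies how many notes each assignment gets right, assuming every assignment is equally likely.

```python
from collections import Counter
from itertools import permutations


def match_distribution(n: int = 4) -> dict[int, float]:
    """Map number of correct guesses -> probability, when a guesser assigns
    n distinct denominations to n notes uniformly at random."""
    truth = tuple(range(n))
    counts = Counter(
        sum(g == t for g, t in zip(guess, truth))
        for guess in permutations(truth)
    )
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts, reverse=True)}


if __name__ == "__main__":
    for correct, prob in match_distribution().items():
        print(f"{correct} correct: {prob:.4f}")
```

The result has no entry for three correct, which matches the argument that one wrong note always forces a second wrong note.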

Guessing the US Banknote

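This third case might amuse international travelers who are bored at the airport. Given four banknotes originating from random countries, guess which ONE, and only one, is the US banknote. This can be tested using a simple binomial distribution, with Y as the number of correct trials out of four:

$$P(Y = 4) = \binom{4}{4}(0.25)^4 \approx 0.004$$
$$P(Y = 3) = \binom{4}{3}(0.25)^3(0.75) \approx 0.047$$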

The probability of getting the answer correct per trial is 0.25. The above equations show the probabilities of a pure guesser being correct in all four, or in exactly three, of four trials. Given the results, your friend has to be correct in at least three out of four trials. Note that because each trial is independent, the math is exactly the same as in the first banknote problem (where each bill guess is independent of the others).
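For anyone who would rather simulate than calculate, the US banknote game can be approximated with random draws. This is only a sketch under my own assumptions: the function name `simulate_guesser`, the number of runs, and the fixed seed are arbitrary choices, and the "friend" here is a pure guesser who picks one of the four notes at random on each trial.

```python
import random


def simulate_guesser(trials_per_run: int = 4, notes: int = 4,
                     runs: int = 100_000, seed: int = 0) -> dict[int, float]:
    """Estimate how often a blind guesser gets each number of trials right."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    counts = [0] * (trials_per_run + 1)
    for _ in range(runs):
        correct = 0
        for _ in range(trials_per_run):
            us_note = rng.randrange(notes)  # position of the US banknote
            guess = rng.randrange(notes)    # position the guesser points at
            correct += guess == us_note
        counts[correct] += 1
    return {k: c / runs for k, c in enumerate(counts)}


if __name__ == "__main__":
    for correct, share in simulate_guesser().items():
        print(f"{correct} of 4 correct: {share:.4f}")
```

With enough runs, the shares for three and four correct should land close to the binomial values for p = 0.25.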

Job Problem

Today, a friend posted a website that randomly chose a job for you to consider: http://wtfshouldidowithmylife.com/. I assume that the computer had a bin of 100 jobs to choose from. In five page reloads, the website recommended the same job twice: crop duster pilot. So I asked myself, what is the probability of that happening?

The difficulty of this problem is that the question is not about the chance of getting one specific job listing twice in five page reloads, but rather about whether any job shows up at least twice in five page reloads. Borrowing from the classic birthday problem: if P(A) is the probability of at least two out of five reloads having the same draw, it may be simpler to calculate P(A'), the probability that no two of the five reloads have the same draw. Then, because A and A' are mutually exclusive and are the only two possibilities, P(A) = 1 − P(A').

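Each draw is independent, and A' requires every reload to avoid all of the jobs drawn before it. Therefore, the equation becomes

$$P(A) = 1 - P(A') = 1 - \frac{99}{100}\cdot\frac{98}{100}\cdot\frac{97}{100}\cdot\frac{96}{100}$$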

Calculating everything, the answer comes to about 9.7%, which is a bit higher than expected.
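The same calculation is easy to repeat for other pool sizes, which is handy since the 100-job bin is only my guess. Here is a minimal sketch; `prob_repeat` is a name chosen for this example, and it assumes every reload picks uniformly and independently from the pool.

```python
def prob_repeat(draws: int, pool: int) -> float:
    """Probability that at least one item repeats in `draws` independent,
    uniform draws from `pool` equally likely items."""
    p_all_distinct = 1.0
    for i in range(draws):
        # The (i+1)-th draw must avoid the i distinct items already seen.
        p_all_distinct *= (pool - i) / pool
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for pool in (50, 100, 200):  # 100 is the assumed bin size; the others are for comparison
        print(f"pool={pool}: {prob_repeat(5, pool):.4f}")
```

A smaller bin of jobs would make the repeat even less surprising, and a larger one would make it rarer.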

I guess I don't really have to be a crop duster pilot =).