Talk:Law of large numbers/Archive 2

99 out of 100 apples

The article had the following example: "If you took a sample of 99 apples out of 100 apples, the average would be almost exactly the same as the average for all 100 apples." I removed it because in this example there is only one apple left to be counted, so the sample average will be close to the population average. It suggests that LLN works because 99% of the apples have been counted, which is not correct. A better example to illustrate LLN would be, say: "If you took a sample of 1,000,000 apples out of 100,000,000 apples, the average would be almost exactly the same as the average for all 100,000,000 apples." That is, even though 99% of the apples have not been counted, the sample average will still be very close to the population average. JS 00:43, 11 January 2007 (UTC)


I believe what you have described is the "Central Limit Theorem", which derives from the Law of Large Numbers but is not the same thing. Consequently I have reverted the change. --Blue Tie 13:18, 11 January 2007 (UTC)

CLT makes a statement about the (normal) distribution of means of samples. Here I am referring to LLN, which does not say anything about distributions but rather about convergence in probability of one particular estimator (the sample mean) to one particular number (the population mean).
The question is whether we can apply LLN when we have a large sample but have sampled only, say, 1% of a population. I believe we can, because of convergence in probability.
The probability of the sample mean being "significantly" different from the population mean becomes smaller as the sample grows larger. This probability "approaches" zero (convergence in probability) as the size of the sample becomes "large". This is based on an analysis of probabilities. In fact this paragraph would probably be a reasonable way to explain LLN to a layman.
The 99 out of 100 example, on the other hand, is not based on probabilities; it works merely because most (99%) of the population has been counted. The example is based on algebra. Thanks! JS 17:53, 11 January 2007 (UTC)
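A quick simulation makes the convergence-in-probability point concrete. This is a minimal sketch, not something from the discussion: the population of 200,000 "apple weights" drawn uniformly between 100 and 200 grams is entirely hypothetical, and `sample_mean_gap` is an illustrative helper name. It draws a simple random sample of only 1% of the population and reports how far the sample mean lands from the population mean.

```python
import random
import statistics

def sample_mean_gap(pop_size, sample_size, seed=0):
    """Return |sample mean - population mean| for one simple random sample
    drawn without replacement from a synthetic population."""
    rng = random.Random(seed)
    # Hypothetical population: apple weights in grams, uniform on [100, 200].
    population = [rng.uniform(100, 200) for _ in range(pop_size)]
    sample = rng.sample(population, sample_size)  # without replacement
    return abs(statistics.fmean(sample) - statistics.fmean(population))

# Weigh only 1% of a 200,000-apple population.
gap = sample_mean_gap(pop_size=200_000, sample_size=2_000)
print(f"|sample mean - population mean| with a 1% sample: {gap:.3f} g")
```

Under these assumptions the population SD is about 29 g, so the standard error of a 2,000-apple mean is roughly 29/sqrt(2000), about 0.65 g: the gap is typically well under a gram even though 99% of the apples are never weighed.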


But look at it this way: Doesn't the law of large numbers explicitly say that as N approaches infinity (for an infinite population) the probability that Xn = mu approaches exactly 1.00? Doesn't that translate into "As a sample of a finite population grows to encompass the full population, the probability that the sample mean equals the population mean converges to 1.00"? This is like saying that as you get closer and closer to 100 out of 100 apples, you will get closer and closer to the population mean. So the apples example is a specific and detailed example of the concept that is described later in the equations. It is what the Law of Large Numbers really says.
You are right that the sample mean is an estimate of the population mean -- an estimate with some degree of error to it. And the Law of Large Numbers says that is the case. What you are describing is an outgrowth of analysis under the Law of Large Numbers, which leads to the Central Limit Theorem and hypothesis testing using sample means. But that concept is better handled under those topics. This article is more limited; it is just the Law of Large Numbers... not an article about sampling probability, which is what you are addressing.

The example of 99 Apples was developed to help people who had no familiarity with Statistics to grasp the basic concept of the Law of Large Numbers. With that in mind, recognizing what the equations say, and realizing that the focus of the article is not sampling confidence probabilities, is it really a good idea to remove that example? I do not think so. --Blue Tie 15:21, 12 January 2007 (UTC)

Hello Blue, You wrote "Doesn't the law of large numbers explicitly say that as N approaches infinity (for an infinite population) the probability that Xn = mu approaches exactly 1.00?"
Yes, LLN does say that as N approaches infinity the probability approaches 1.
But LLN says much more than that. It says that even if you sample only 1% (or even less) of the population, the sample mean will still approach the population mean as long as the sample is "large".
Suppose you wish to know how 100 million voters are going to vote in the elections (a binomial distribution). You do not have to sample 99 million of them (as the example suggests), you only have to sample 10,000 to estimate the mean quite accurately.
The 99/100 apples example works because of algebra.
Population Mean = Mean of 99 apples * (99/100) + Mean of last apple * (1/100). Because the last apple's value is weighted by only 1/100, its impact is small and the mean of the 99 is "close" to the population mean. This is just algebra; there is no probability involved. The means are "close" because 99% have been counted.
LLN on the other hand says means will be close even if only 1% (or less) are counted, as long as the sample is "large".
The 99/100 apples example is misleading because the logic behind it is algebra, whereas the logic of LLN is probability theory.
The 99/100 example would also lead a reader to wrongly believe that most of the population (~99%) has to be sampled for LLN to work, whereas LLN will work for samples 1% or smaller as long as they are "large".
Regards, JS 17:53, 12 January 2007 (UTC)
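Both positions can be reconciled with the standard formula for the standard error of a mean under simple random sampling without replacement, which includes the finite population correction sqrt((N - n)/(N - 1)). The sketch below assumes that sampling scheme and a yes/no vote coded 0/1 with p = 0.5 (so sigma = 0.5); `se_of_mean` is a hypothetical helper. When n is a tiny fraction of N the correction is essentially 1 and precision depends on n alone; when n approaches N the correction, and with it the error, goes to zero for purely counting reasons.

```python
import math

def se_of_mean(sigma, n, N=None):
    """Standard error of the sample mean for a simple random sample of size n.

    If a finite population size N is given, the sample is assumed to be drawn
    WITHOUT replacement and the finite population correction
    sqrt((N - n) / (N - 1)) is applied.
    """
    se = sigma / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se

sigma = 0.5  # SD of a single 0/1 vote when p = 0.5
print(se_of_mean(sigma, 10_000, N=100_000_000))  # about 0.005, with 0.01% sampled
print(se_of_mean(sigma, 99, N=100))              # about 0.005, but only because n is nearly N
print(se_of_mean(sigma, 100, N=100))             # 0.0: the whole population is counted
```

The first two cases give nearly the same error for very different reasons: the first through the size of n, the second through the correction factor.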


I understand what you are saying. In essence, you are talking about the notion of events that may be repeated an unlimited number of times, each with success probability p, and that a finite number of repetitions is enough to estimate p. I agree that this is true. I also agree that this is a reasonable conclusion from analysis under the law of large numbers. But that is not exactly what the law of large numbers says when we examine the equations in the article. They do not directly refer to the rate of convergence and, in fact, I can find no source that discusses rate of convergence as an intrinsic part of the Law of Large Numbers; rather, all sources simply point out that as the numbers get larger, they get closer to the population mean. It is discussed as a limit when the sample size approaches infinity. As Bernoulli (who first described the law of large numbers) said: Even the stupidest man knows by some instinct of nature per se and by no previous instruction that the greater the number of confirming observations, the surer the conjecture.
But, of course, when we apply the law, we find that the convergence generally exists and that the error shrinks in proportion to one over the square root of the sample size.
So, I would again argue that the Apple example is not a bad one, but it may not be complete, and by itself it could be misleading. What is missing is that the rate of convergence is rapid (a square-root law), so that reasonable conclusions may be drawn from a sample that is much smaller than the whole population -- as long as the population is rather large. So I would change the example from 100 apples to 1000 apples and then point out that, while a sample of 999 would be the most accurate partial sample, a sample of only 40 or 50 will typically suffice under conditions where we accept the risk of certain errors. It is not that this sample size is expressly described by the Law of Large Numbers, but rather that it is a corollary in most cases.
Am I still wrong about that?--Blue Tie 04:07, 13 January 2007 (UTC)
1.There are rules of thumb about what number may be regarded as "large", so that one can use convergence results without worrying about the rate of convergence once the sample size exceeds that number. I am not a statistician, but I do remember an example of 30 (or was it 60?) being mentioned as a sample size large enough for CLT to be applied to a binomial distribution.
2.If you worry about the rate of convergence and wish to be strict, you can always use CLT rather than just LLN to estimate confidence intervals for the population mean.
3.I suppose it would help the lay reader if we were able to provide some rules of thumb for sample sizes.
4.The standard deviation of sample means decreases in inverse proportion to the square root of the sample size (CLT).
5.My problem with the 99/100 example is it seems to say "LLN works because almost all apples have been counted", whereas LLN really says "Leave 99% uncounted, or even 99.99% uncounted, just make sure the number you count is large and you will get close to the population mean."
6.I would say the example you suggest would be improved by saying that if there were 1,000,000 apples, then 99 would give a good estimate, 999 an even better estimate, 9,999 still better, 99,999 best yet and so on... The point is that "yes, larger samples improve accuracy". But LLN is an asymptotic result and in practice can be used for samples that exceed, say, 100, without worrying about convergence. For example, if you are trying to estimate a binomial distribution mean, and the population mean is 0.5, then a sample of 100 would give the standard deviation of the sample mean as sq-root(0.5 * 0.5)/sq-root(100) = 0.05. So the probability that the sample mean will lie outside 0.40 to 0.60 (interval length 4 sd) is only about 5%. I suppose whether you regard that interval as sufficiently convergent to 0.50 is a matter of taste. If you increase the sample size to 10,000, then the interval shrinks to 0.49 to 0.51.
Regards, JS 07:34, 13 January 2007 (UTC)
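For point 6, the arithmetic can be checked with a short sketch. It assumes the normal approximation to the binomial and uses the standard deviation of a sample proportion, sqrt(p(1 - p)/n); `binomial_mean_interval` is a hypothetical helper name, and k = 2 gives an interval of total length 4 sd.

```python
import math

def binomial_mean_interval(p, n, k=2.0):
    """Normal-approximation interval p +/- k*SD for the sample proportion,
    plus the approximate probability of landing outside it."""
    sd = math.sqrt(p * (1 - p) / n)         # SD of the sample mean (proportion)
    outside = math.erfc(k / math.sqrt(2))   # P(|Z| > k) for a standard normal Z
    return sd, (p - k * sd, p + k * sd), outside

for n in (100, 10_000):
    sd, (lo, hi), outside = binomial_mean_interval(0.5, n)
    print(f"n={n}: sd={sd:.4f}, interval=({lo:.3f}, {hi:.3f}), P(outside) ~ {outside:.3f}")
```

With p = 0.5 this gives a standard deviation of 0.05 for n = 100, an interval of 0.40 to 0.60, and roughly a 4.6% chance of falling outside it; for n = 10,000 the interval narrows to 0.49 to 0.51.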
Your points one at a time:
1. Yes there are rules of thumb, but the LLN does not give them.
2. It is not that I am worried about the rate of convergence. I am saying that the LLN does not specifically address that. It simply talks about sample size going to infinity.
3.Regarding rules of thumb for sample size, this article is not about sample size or statistical inferencing. I think you should address those issues in those other articles. This is just about the Law of Large Numbers.
4.Yes you are right about the SD decreasing according to a square-root law, and you are right to cite the CLT for that, but this is not the CLT, this is the LLN.
5.In the 99 out of 100 apples example, LLN works because MORE apples have been counted. Read the example. 3 is not as good as 9. 99 is better than 9. It is simply a restatement of the LLN: the larger the sample, the more correct or accurate the mean. Your problem with the example is invalid. You are conflating the concept of statistical confidence with the related but different concept of LLN.
6.Again, you are trying to address statistical inferencing and confidence. Those are not really part of LLN but things that come out of the Law of Large Numbers and the Central Limit Theorem.
I would ask that, if you feel that the LLN addresses the issues you raise, you find a source that says so. But in the meantime, look at the equations and see that the LLN simply states that as the sample size gets larger the mean approaches more surely the mean of the population or the theoretical mean. --Blue Tie 17:09, 13 January 2007 (UTC)
Hello Blue, you wrote "Yes there are rules of thumb, but the LLN does not give them." and "Regarding rules of thumb for sample size, this article is not about sample size or statistical inferencing. I think you should address those issues in these other articles. This is just about the Law of Large Numbers." Yes, strictly speaking LLN applies to infinitely large samples, but of course samples are never infinitely large. Does this mean LLN has no application in real life? No, all it means is that LLN can only be applied when the sample is regarded "large" enough, for which we need rules of thumb.
I repeat, when the example says 99 out of 100, it is suggesting to the reader that the means are close because most apples have been counted. If the example said 99 out of 10,000, or 999 out of 30,000 or 999 out of 20,000 or 100 out of 5,000 or 120 out of 60,000 or 1,354 out of 2,245,523 I would not have any problem. The problem is 99 out of 100 very strongly suggests that LLN works because most apples have been counted, whereas LLN does NOT require most of the population to be counted. It can work even if only 0.001% or less have been counted as long as the number counted is "large". Regards, JS 22:36, 13 January 2007 (UTC)
You wrote "Read the example. 3 is not as good as 9. 99 is better than 9. It is simply a restatement of the LLN." I agree that 99 more than 9 more than 3 is in the spirit of LLN. I propose we change the example from population size 100 to 100,000 apples. You should not have a problem with that as the 3 vs. 9 vs. 99 will be retained. Regards, JS 22:44, 13 January 2007 (UTC)
I would agree with that if you would agree that the example should go to 99,999 apples. But then, there would be no point to the change, would there? You see, the Law of Large Numbers does not explicitly say that there is some small number that is sufficient. It simply says that more is better and that infinite is best. You are trying to make it say something else -- something about a small sample size being enough. That certainly falls out of the analysis that can be conducted later, but that is NOT what the Law of Large Numbers says. But again, if you can find a valid source that says so, then use that source and I shall be satisfied.
Let me be clear. I want the example to say that the Law of Large Numbers works because MORE have been counted, and when all but 1 have been counted that is next to the very best thing, and when they are all counted that is best. I specifically disagree with this point that you keep making: "It can work even if only 0.001% or less have been counted as long as the number counted is 'large'". Although that statement is true -- sometimes -- it is not specifically what the LLN says.
I feel pretty strongly about this. And apparently so do you. Shall we seek outside comment?--Blue Tie 23:33, 14 January 2007 (UTC)
Hello Blue, I said "It can work even if only 0.001% or less have been counted as long as the number counted is 'large'". You replied "Although that statement is true -- sometimes -- it is not specifically what the LLN says." As this is math, there should be no ambiguity about what LLN says. When you say "sometimes" you imply that there exist examples where 0.001% is not sufficient even though the sample is "large". If you can find such an example where the sample size is "large" but LLN is not true you would have disproved LLN as it is currently stated.
Specifically look at the assumptions of LLN and you will find no assumption that says the sample size has to be larger than some fraction of the population. What you are saying is that for LLN to apply the sample size has to be (at least sometimes) larger than a certain fraction of the population. This would be a new assumption for LLN.
I can show, without resorting to probabilities, that for a large population of size M the sample mean approaches the population mean if all but one value is counted.
Algebraically,
Population Mean = Sample Mean * (M-1)/M + Last Value * (1/M) = Sample Mean + (Last Value - Sample Mean) * (1/M)
=> Sample Mean = Population Mean - (Last Value - Sample Mean) * (1/M)
=> as M approaches infinity, Sample Mean approaches Population Mean, provided (Last Value - Sample Mean) stays bounded.
Note this is just algebra, and not LLN.
I agree that an opinion by an outsider, especially a statistician, would be helpful.
JS 06:30, 15 January 2007 (UTC)
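The identity in JS's derivation can be checked exactly with rational arithmetic. In this sketch the helper `last_value_gap` and the sample weights are hypothetical; it computes the population mean, the mean of all but the last value, and their difference, and asserts Population Mean = Sample Mean + (Last Value - Sample Mean)/M. No randomness is involved: the gap shrinks like 1/M whenever the last value stays within a bounded distance of the others.

```python
from fractions import Fraction

def last_value_gap(values):
    """Return (population mean, mean of all but the last value, difference).

    Uses exact Fractions so the algebraic identity
        pop_mean == sample_mean + (last - sample_mean) / M
    can be checked with == rather than approximately.
    """
    M = len(values)
    pop_mean = Fraction(sum(values), M)
    sample_mean = Fraction(sum(values[:-1]), M - 1)
    last = values[-1]
    # Pure algebra: no probability or sampling model is used here.
    assert pop_mean == sample_mean + (last - sample_mean) / M
    return pop_mean, sample_mean, pop_mean - sample_mean

# 99 of 100 hypothetical apple weights (101..200 g), then 999 of 1000.
print(last_value_gap(list(range(101, 201))))
print(last_value_gap([150] * 999 + [200]))
```

In the second case the last apple differs from the others by 50 g, and the gap is 50/1000 = 1/20 g.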

The apple example is horribly wrong. It is NOT because something near the whole population has been counted that the LLN works. The apple example implies trials being nowhere near independent when the sample size approaches the whole population size. In the apple example, the sample average approaches the population average precisely because of the LACK OF INDEPENDENCE. But the LLN relies very heavily on the assumption of independence. Picture sampling apples WITH REPLACEMENT, so that trials are independent. Maybe only 100 apples are there, but the sample size may be 1 million. As the sample size approaches infinity, the sample average approaches the population average, and that has nothing at all to do with whether anything near the whole population is represented when the sample size is big enough. Michael Hardy 21:39, 17 January 2007 (UTC)
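Michael Hardy's with-replacement scenario can be sketched as follows. The 100 apple weights are hypothetical (uniform between 100 and 200 grams); `random.choices` draws independently with replacement, while `random.sample` draws without replacement. The loop shows the sample mean settling near the population mean as n grows far beyond 100; the last step shows that exhausting the population without replacement reproduces the mean exactly, for the counting reason rather than because of the LLN.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of only 100 apples (weights in grams).
apples = [rng.uniform(100, 200) for _ in range(100)]
pop_mean = statistics.fmean(apples)

# WITH replacement: draws are independent, so n may far exceed 100.
gaps = {}
for n in (10, 1_000, 200_000):
    draws = rng.choices(apples, k=n)
    gaps[n] = abs(statistics.fmean(draws) - pop_mean)
    print(f"with replacement, n={n:>7}: gap = {gaps[n]:.4f} g")

# WITHOUT replacement, all 100 drawn: the mean is exact, but only because
# the draws are dependent (each apple can be picked at most once).
exhaustive = rng.sample(apples, k=len(apples))
exact_gap = abs(statistics.fmean(exhaustive) - pop_mean)
print(f"without replacement, all 100: gap = {exact_gap:.4f} g")
```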

I think you are right... The lack of independence invalidates the example. I would like to address this in more detail and produce a better example, but I am busy right now. Maybe later. --Blue Tie 11:35, 18 January 2007 (UTC)

I agree completely that the apple example is misleading. It encourages a common misconception (that the sample needs to be a large fraction of the population). I would rather suggest an example that contradicts this misguided intuition: "If you take photographs of the faces of a random sample of about 30 male American college students, and project them upon each other, then you will see a pretty good picture of the average face of all male American college students, even though that population is much larger than the sample. Moreover, if you take a new sample of about 30 male American college students, and project their 30 portraits upon each other, then you will see almost the same face again. That is, the first sample of 30 persons has approximately the same average face as the second sample, even though these samples have no persons in common!" JulesEllis 01:34, 19 January 2007 (UTC)

Hello Jules, I think the photograph example may be hard to understand. Normally when we say sample average, we think of a scalar. But a photograph is a vector of facial feature points, and it is not easy to associate sample average with such a vector. I think the old apple example with the total number of apples increased to, say 100,000 (rather than just 100) could work. Regards, JS 01:39, 23 January 2007 (UTC)
I do not know about a common misperception that the sample needs to be a large fraction of the population, but the problem with the example of 99 Apples is not that some people would not understand it (that may also be a problem, but a different one). The problem is that the independence criterion might be violated. --Blue Tie 04:15, 19 January 2007 (UTC)

Question: doesn't the idea that something "will be almost exactly the same as" neglect the idea of probability in the convergence? For instance, I cannot guarantee that, after 100 tosses of a fair coin with H assigned the value 1 and T the value 0, your sample average "will be almost exactly" 0.5. It could literally be anything between 0 and 1. In other words, you have no way of knowing, after any finite number of trials, that you are in fact anywhere near the expectation value of the random variable. It could be that, today, you encounter the unfortunately weird scenario in which the first 1 million trials give you H. You *should not* conclude, after any one of those trials, that the expectation value *is actually* 1, or even near 1. Likewise, if you happen to get a sample mean of 0.5 even after 3.7 trillion trials, you cannot guarantee me that the coin is truly fair. What is true is that your sample mean is *probably* getting closer to the expectation value. The more trials you do, the more I'd be willing to bet on your sample average as being near the expectation value. But no matter how many trials you sample, your sample mean can, in a worst case scenario (such as a distribution for which the random variable can take on any real number), be arbitrarily far off from the expectation value. The LLN gives us comfort that, "most of the time," our intuition is sound to believe that, by some reasonably large number of trials, we're in the vicinity of the expectation value. But it seems that many people who are not familiar with the technicalities of "almost sure convergence" will be misled by the "apples" example to think that it is the same as "sure convergence," or that those who are not familiar with convergence in metric spaces in general will be misled to think that the first n points in a sequence tell us anything *for certain* about where the sequence is going.

First, I'm glad the apples example is gone; it had more to do with sampling than the LLN. Now for the above questions:
  • The LLN only guarantees convergence eventually. There is no question that there is a nonzero chance for any finite sample's average to be far off from the expected value.
  • As the sample size approaches infinity, the average almost surely converges to the expected value. "Almost surely" means with probability 1. You might want to look up the subject.
  • The LLN is not supposed to estimate parameters such as whether a coin's chance of flipping heads is really 50%. Other areas of Statistics can do that, with appropriate levels of confidence. Shivan Bird 17:19, 26 July 2007 (UTC)
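To make the first bullet concrete: for a fair coin (H = 1, T = 0) the probability that the average of n flips lies at least 0.1 from 0.5 can be computed exactly from the binomial distribution. In this sketch the helper name and the choice eps = 1/10 are illustrative assumptions. The probability is positive for every finite n but shrinks toward zero as n grows. That is the weak law's statement (convergence in probability); the almost-sure version in the second bullet is stronger and is not what this computes.

```python
from fractions import Fraction
from math import comb

def prob_far_from_half(n, eps=Fraction(1, 10)):
    """Exact P(|mean of n fair coin flips - 1/2| >= eps), with H = 1, T = 0.

    The number of heads is Binomial(n, 1/2), so we add up C(n, k) over every
    head count k whose average k/n is at least eps away from 1/2.
    """
    half = Fraction(1, 2)
    far = sum(comb(n, k) for k in range(n + 1) if abs(Fraction(k, n) - half) >= eps)
    return Fraction(far, 2 ** n)

for n in (10, 100, 1_000):
    print(f"n={n:>5}: P(|average - 0.5| >= 0.1) = {float(prob_far_from_half(n)):.3g}")
```

For n = 10 the exact value is 772/1024, about 0.754, because only a 5-5 split lands closer than 0.1 to 0.5.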

LLN follows from CLT, not the other way around

Currently the article says "One of the most important conclusions of the Law of Large Numbers is the Central Limit Theorem which, generally, describes how sample means tend to occur in a Normal Distribution around the mean of the population regardless of the shape of the population distribution, especially as sample sizes get larger." I think it would be more accurate to say that LLN arises out of CLT rather than the other way around. CLT gives the mean, the variance, and the distribution of sample means. As the variance of that distribution collapses when the sample size grows larger, it follows that the sample mean converges in probability to a number (the population mean). This is the LLN. So LLN is a result that can be obtained from the CLT rather than the other way around. Of course there have to be some assumptions for this to work, for example finite variance. I am correcting the article accordingly. JS 18:08, 12 January 2007 (UTC)

Since LLN was proposed prior to CLT and since LLN is more basic than CLT there is no way that LLN could have come out of CLT. It may well be that LLN does not produce CLT (I think a case can be made that it does) but logically it is impossible that the LLN comes from CLT and historically it did not happen that way. Any changes you made to suggest that CLT produced LLN should be removed. --Blue Tie 04:30, 13 January 2007 (UTC)

In fact, isn't LLN the First Fundamental Theorem of Probability and CLT the second? --Blue Tie 04:39, 13 January 2007 (UTC)
I am not sure about the nomenclature, what is given what name. But I think it is true that if you have proven CLT, then you have also proven LLN. While on the way to proving CLT, you may prove LLN.
CLT does contain within it LLN. CLT "exceeds" LLN in the sense that not only does it give the mean value for the distributions of sample means to be equal to the population mean, but it also gives the variance and the nature of the distribution (normal).
If you have evidence that LLN preceded CLT in time (which it probably did, as LLN is a weaker result), and you regard the history of development of these results as worth including, you are welcome to add that information. However the current statement "So LLN is a result that can be obtained from the CLT." is true and should be retained. Essentially it tells the reader that LLN is contained within CLT.
Regards, JS 06:47, 13 January 2007 (UTC)
I am not sure that is true. I have to think about it. The proof for both LLN and CLT can be similar but I do not think that CLT certainly contains LLN. CLT talks to the distribution around the mean of the sample. I have to think about whether it alone refers the mean to the mean of the population. Maybe it does. LLN talks to the mean of the sample vs the mean of the population but not to the distribution around the mean of either. I would agree however that the CLT certainly IMPLIES the LLN. But to settle the issue, can you find a source that says CLT contains LLN or that if you have proven CLT you have done more than imply LLN? --Blue Tie 17:29, 13 January 2007 (UTC)
The CLT does give the mean. It gives the entire normal distribution, which includes the mean and variance. For example, see the Wikipedia article on Central Limit Theorem. It says that the modified distribution Z approaches the standard normal distribution N(0,1). Z is obtained by subtracting n * mu (where n is the sample size and the Greek letter mu is the population mean) from the sum of the random variables and then dividing by sigma * sq-root(n). So CLT gives not only the nature of the distribution, but also the mean and variance. JS 17:45, 13 January 2007 (UTC)
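The standardization described here can be simulated. This is a minimal sketch under stated assumptions: the population is Exponential(1), chosen because it is strongly skewed and has mu = sigma = 1, and `standardized_sums` is a hypothetical helper. If the CLT approximation is good at n = 50, the simulated Z values should have mean near 0, standard deviation near 1, and about 95% of them within +/-1.96.

```python
import math
import random
import statistics

def standardized_sums(n, reps, seed=0):
    """Simulate reps values of Z = (S_n - n*mu) / (sigma*sqrt(n)), where S_n is
    the sum of n i.i.d. Exponential(1) draws (so mu = sigma = 1)."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    zs = []
    for _ in range(reps):
        s_n = sum(rng.expovariate(1.0) for _ in range(n))
        zs.append((s_n - n * mu) / (sigma * math.sqrt(n)))
    return zs

z = standardized_sums(n=50, reps=5_000)
z_mean, z_sd = statistics.fmean(z), statistics.stdev(z)
inside = sum(abs(v) < 1.96 for v in z) / len(z)
print(f"mean={z_mean:.3f}  sd={z_sd:.3f}  share with |Z|<1.96: {inside:.3f}")
```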
But the CLT does not say that as the sample size increases, the estimate of the mean improves. It is a theoretical construct looking at a variety of possible sample means and explaining that these fall into a normal distribution pattern. Sure, it gives the mean, but it does not say whether the mean is "right" because the sample size is larger. --Blue Tie 20:44, 14 January 2007 (UTC)
"CLT does not say that as the sample size increases, the estimate of the mean improves." That is not right. The variance of the normal distribution given by CLT is inversely proportional to the sample size, so its standard deviation decreases in proportion to one over the sq-root of the sample size. Hence the estimate of the mean becomes more accurate as the sample grows larger. In CLT, as the sample size approaches infinity, the variance approaches zero and the sample mean converges in probability to the population mean. This is the LLN. Regards, JS 21:16, 14 January 2007 (UTC)

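The step from a collapsing variance to convergence in probability does not actually need the normal shape that the CLT supplies. A standard route, assuming i.i.d. observations with mean mu and finite variance sigma^2, is Chebyshev's inequality applied to the sample mean; the variance of the mean falls like 1/n, so its standard deviation falls like 1/sqrt(n):

```latex
\operatorname{Var}(\bar X_n) = \frac{\sigma^2}{n},
\qquad
\Pr\bigl(|\bar X_n - \mu| \ge \varepsilon\bigr)
  \le \frac{\operatorname{Var}(\bar X_n)}{\varepsilon^2}
  = \frac{\sigma^2}{n\,\varepsilon^2}
  \;\longrightarrow\; 0 \quad (n \to \infty)
```

This holds for every fixed epsilon > 0, which is the weak law of large numbers.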

I think that this is an example of the problem. The current wording is: "The variance as given by CLT collapses as the sample size grows larger, it follows that the mean converges to a number (which CLT says is the population mean)." But CLT does not directly address sample size. (In fact, sample size can be almost any positive number and CLT still works -- if the number of samples is large enough.) It addresses the means of samples of ANY size following a normal distribution. It is the Law of Large Numbers that addresses sample size. Of course the LLN also addresses variance, but not really directly. In the same way, CLT addresses sample size, but not directly. I think that this distinction is why these two things are called the first and second fundamental theorems of probability: they work together. LLN describes sample size and CLT describes the distribution of possible sampling results around the true mean. These work together, and the results are stronger when you have both a large sample size and a large number of samples. But the normality of the distributions follows the number of samples more than it follows the size of the samples. In addition, CLT was preceded by LLN both historically and logically. CLT is, in a sense, a refinement of LLN in that it takes one very large sample and "looks" at what happens when it is divided into a number of smaller samples.
In any case, I think the current wording is not quite right.--Blue Tie 21:13, 14 January 2007 (UTC)

(Unindent)Hello Blue,

1) You wrote "CLT does not directly address sample size." That is not correct. CLT does consider the sample size; in fact the variance it provides for the normal distribution depends upon sample size (inversely proportional to it, so the standard deviation is inversely proportional to its square root). 2) Also see the article on CLT on Wikipedia. It starts by saying "any sum of many independent identically-distributed random variables". Note the use of the word "many". 3) Also go down to the proofs of CLT and you will see they are for n approaching infinity. 4) As a simple example, consider a binomial sample of heads = 0 and tails = 1 of size just 2. The probability distribution for the mean will be 0 with probability 1/4, 0.5 with probability 1/2, and 1 with probability 1/4. This certainly isn't a normal distribution. 5) So saying "sample size can be almost any positive number and CLT still works" is not right. Regards, JS 21:37, 14 January 2007 (UTC)
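The size-2 example generalizes: the mean of n fair 0/1 trials can only take the n + 1 values k/n. This sketch computes that exact sampling distribution with fractions (the helper name is hypothetical), reproducing the 1/4, 1/2, 1/4 split for n = 2 and showing that n = 3 gives four possible means.

```python
from fractions import Fraction
from math import comb

def mean_distribution(n, p=Fraction(1, 2)):
    """Exact sampling distribution of the mean of n i.i.d. 0/1 trials,
    where p = P(value is 1). Maps each possible mean k/n to its probability."""
    return {Fraction(k, n): comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

for n in (2, 3):
    dist = mean_distribution(n)
    print(n, {str(m): str(pr) for m, pr in dist.items()})
```

Because the support has only n + 1 points, a normal curve can at best approximate this distribution, and the approximation can only become good as n grows.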

Your points, one at a time
1. CLT uses sample size, but it does not make a comment about how the mean of the sample approaches the mean of the population as sample size increases. It does view the variance as changing with sample size -- but LLN did this first and it is somewhat irrelevant to the point that it does so.
2. The article on Wikipedia is written by just anyone. Perhaps it is a mistake to say "many", or perhaps they meant many "sets" or many "types". It is not the typical wording for CLT definitions, and unless you are referring to types of variables it does not make sense.
3. The proofs do use N going to infinity, but this is for a different effect. They are looking at dividing a continuum and it says that as N increases the distribution of the sample approaches Normality. But this says NOTHING about the original population. Indeed, the original population might be a distorted, perhaps even discontinuous function, and yet the sample means will be normally distributed.
4. Why do you believe that is not a normal distribution? It looks like one to me -- with just 3 points extracted. How do you figure it isn't?
5. Let's see.... you can see with your eyes that what I have said is right. take a look here: http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/ Try N=2 and # of Samples =1000 you will see a normal curve. That is with N=2. As I said, sample size does not matter, CLT still works. This is part of the miracle of the Central Limit Theorem. --Blue Tie 00:46, 15 January 2007 (UTC)

It's not altogether true that NOTHING is assumed in the CLT about the original population. Usually it's assumed to have finite variance, and sometimes weaker assumptions are involved. These assumptions may be weakened but not simply discarded, since the Cauchy distribution would then provide a counterexample to such a proposed modified version of the CLT. Also, the sample size matters, in the sense that with too small a sample, the distribution will fail to be a close approximation to a normal distribution. Michael Hardy 01:30, 15 January 2007 (UTC)
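The Cauchy counterexample can be seen in a small simulation. This sketch assumes standard Cauchy draws generated by the inverse-CDF transform tan(pi(u - 1/2)); `frac_abs_mean_above_1` is a hypothetical helper. Because the mean of n independent standard Cauchy variables is again standard Cauchy, the fraction of sample means with |mean| > 1 should stay near 0.5 however large n is, so averaging does not concentrate the mean when the moment assumptions fail.

```python
import math
import random

def frac_abs_mean_above_1(n, reps=2_000, seed=0):
    """Share of simulated sample means, each over n i.i.d. standard Cauchy
    draws, with |mean| > 1. Draws use the inverse CDF tan(pi * (u - 1/2))."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        mean = sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n
        count += abs(mean) > 1
    return count / reps

shares = {n: frac_abs_mean_above_1(n) for n in (1, 10, 200)}
print(shares)  # each share should stay near 0.5: averaging does not help here
```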

About assumptions, I agree. If I said that (I do not see it above), I was wrong. When you talk about the Central Limit Theorem, sample size does not matter, unless for some reason you are trying to discuss some practical application of the Central Limit Theorem where you are (for some odd reason) not sure if your sample will be normally distributed, or if you are looking at the practical implications of the range in a hypothesis test. However, with regard to the Central Limit Theorem describing how the distribution of sample means will be normal, how is the distribution of the means affected by the size of the samples that create those means? There may be some cross talk, but essentially the normal distribution will occur in the sample means whether you use sample sizes of 2 or 200.
If I take a sample of say --- 3 points and if someone else takes a sample of 30 points... both these samples are testable against some hypothesis by virtue of the central limit theorem. How did the sample size change that in either case? Certainly I can wish I had done the 30 observations -- I will get a tighter result. But, how did the lack of a large sample make the CLT void in that case? (I think this article makes the same case: http://en.wikipedia.org/wiki/Central_limit_theorem#Convergence_to_the_limit) --Blue Tie 01:57, 15 January 2007 (UTC)
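Whether sample size matters for the shape of the sampling distribution can be checked directly. In this sketch the population is assumed to be Exponential(1) (an illustrative choice; the helpers are hypothetical). For that population the skewness of the mean of n draws is 2/sqrt(n), about 1.41 for n = 2 and 0.14 for n = 200, whereas a normal distribution has skewness 0; the simulated values should land close to those figures.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean of the cubed standardized values (0 for a normal)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

def skew_of_means(n, reps=5_000, seed=1):
    """Skewness of reps simulated means of n i.i.d. Exponential(1) draws."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
    return skewness(means)

results = {n: skew_of_means(n) for n in (2, 200)}
print(results)  # theory for this population: 2/sqrt(n), i.e. about 1.41 and 0.14
```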


1) If you take a sample of 3 points, and the underlying distribution is binomial, then the possible values of sample means will number 4. This certainly is not a normal distribution, which has an infinite number of possible values. You cannot do hypothesis testing by approximating a distribution that has probability mass only at 4 points with a normal distribution.
2)I looked at the Rice simulation, but couldn't get it to work. What was the distribution of the population? Can you select binomial? If the distribution of the population is, say, normal, then samples of size N=2 will indeed have normal distributions.
3) I feel the discussion is digressing now. You wrote "CLT uses sample size, but it does not make a comment about how the mean of the sample approaches the mean of the population as sample size increases. It does view the variance as changing with sample size". There is a lack of mathematical precision in your statements. You need to define what these phrases mean: make a comment; approaches; view the variance.
4) The current sentence in the article that you find objectionable was (many posts back): "The variance as given by CLT collapses as the sample size grows larger, it follows that the mean converges to a number (which CLT says is the population mean)." To this you said "But CLT does not directly address sample size." Define "directly address". Also please state what part of the sentence in the current article is wrong, and precisely what the error is? Regards, JS 07:52, 15 January 2007 (UTC)


1) Yes, you can do testing on a small sample, but I understand what you mean. As sample size increases, you are more confident of a normal distribution; hence the small-sample adjustments that modify the normal distribution.
2) You can choose a variety of non-normal distributions such as skewed, uniform or custom distributions. You have to hit the button at the top left.
3) I agree generally and accept the blame. Sometimes I have written an answer and then somehow it gets lost, and I have to retype it and I get impatient. (That just happened.) I do not understand your last request though ("You need to define what these phrases mean: make a comment; approaches; view the variance."). But that might be a distraction anyway.
4) Actually, my main problem is that you are declaring LLN to be superseded or replaced by CLT. But maybe you are right. I don't think so, but maybe. I have been tossing it back and forth in my mind. And I can see the math.
I think that part of my problem with this is how I connect the two concepts. I think of LLN as saying "large enough samples can help you determine population mean and variance" while CLT says: "FURTHERMORE the sample is normally distributed". To me, the furthermore is important. I do not think you really think about "furthermore"; instead you see it this way: the CLT says that "a large enough sample can give you mean and variance of the population in a normally distributed response". All complete. You see CLT as including LLN and I see CLT as additive to LLN. Perhaps this is like a comparison of the set of counting integers with the set of rational numbers. Rational numbers may include the counting integers, while the counting integers do not include the rational numbers, but they do form the foundation for the rational numbers. I could go on, but this is long enough for now. --Blue Tie 12:47, 15 January 2007 (UTC)
Hello Blue, I do not care much how LLN and CLT are connected as long as they are accurately described. You wrote "I think of LLN as saying "large enough samples can help you determine population mean and variance" while CLT says: "FURTHERMORE the sample is normally distributed"." I agree with what you wrote except that LLN only provides mean, not the variance. I think what "precedes" what can be argued either way, and really I think it is unimportant.
If you do have only the CLT result, you can get to LLN. But if you have only LLN, you cannot get to CLT. But for the lay reader to know this may be unimportant, I would be satisfied if they were described as "related results". You can accordingly change the article if you believe the current version says that CLT "precedes" LLN. Regards, JS 22:27, 15 January 2007 (UTC)

Regarding the original question. It is fairly easy to obtain a weak LLN from a CLT. This is because convergence in distribution to a constant implies convergence in probability to the constant. However, the strong LLN does not follow immediately from the CLT, since there is no such implication for almost sure convergence. OliAtlason (talk) 21:38, 10 January 2008 (UTC)
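The step from the CLT to the weak law can be sketched as follows. The sketch assumes i.i.d. X_i with mean μ and finite, positive variance σ². The symbols Z_n, Y_n and F_n are notation introduced here, not taken from the discussion. This route needs a finite variance, whereas the weak law itself only needs a finite mean.

```latex
% Assumes i.i.d. X_1, X_2, \dots with mean \mu and variance 0 < \sigma^2 < \infty.
Z_n = \frac{\sqrt{n}\,(\bar X_n - \mu)}{\sigma} \xrightarrow{d} N(0,1)
\quad\text{(CLT)}, \qquad
Y_n = \bar X_n - \mu = \frac{\sigma}{\sqrt{n}}\, Z_n \xrightarrow{d} 0
\quad\text{(Slutsky, since } \sigma/\sqrt{n} \to 0\text{)}.

% F_n is the CDF of Y_n. The limit is a point mass at 0, whose CDF is continuous
% at \pm\varepsilon, so F_n(-\varepsilon) \to 0 and F_n(\varepsilon) \to 1:
P\bigl(|\bar X_n - \mu| > \varepsilon\bigr)
  \le F_n(-\varepsilon) + 1 - F_n(\varepsilon) \;\longrightarrow\; 0 + 1 - 1 = 0
  \quad\text{for every } \varepsilon > 0.
```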

LLN and likelihood

The article had the sentence "In particular, it permits precise measurement of the likelihood that an estimate is close to the "right" or true number." I removed it as I think this is wrong. "measurement of likelihood" requires the distribution, whereas LLN is only an asymptotic result. For distributions we need the CLT. LLN does not enable "precise measurement of likelihood". JS 22:54, 13 January 2007 (UTC)

An asymptotic result IS a distribution. The LLN provides for an estimate of both the population mean and the population standard deviation or variance. However, to define this in terms of likelihood may indeed require the Central Limit Theorem. So, rather than it permitting a precise measurement of the likelihood, it provides an estimate of the mean and an analysis of the "reasonable" range in which the mean may be found. --Blue Tie 20:39, 14 January 2007 (UTC)
One requires the distribution, and (estimate of) variance to estimate "likelihood". Do you have any reference showing LLN provides these? The reason I ask is that if LLN indeed provided the distribution and variance, then it appears to me that it would provide the results of CLT. JS 21:39, 14 January 2007 (UTC)
As I reflect on it, I do not think you can show likelihood without the CLT. For example, when those assumptions are not used, pollsters use an "error margin of XX %". But what that margin really means is not well defined. The CLT actually defines such things because it uses a distribution to provide a probability. However, the LLN certainly considers variance. Look at the assumptions: it assumes a population with a fixed mean and a fixed variance. Consider: as N increases toward infinity, the sample variance will asymptotically approach the expected variance. If you were to know, in advance, the population variance, you could use that information to provide a measure of closeness, if not exactly probability. I do not think anyone tries to do such analysis in depth because the research has gone in the direction of the CLT, but I believe it would be a natural progression if the CLT had not been discovered.
Getting more to your point, I believe that the two fundamental theorems are co-equal and describe different things. The CLT looks at the distribution of sample means -- regardless of sample size -- while the LLN looks at the size of a sample without regard to how sample means are distributed.
But without original research, let's look outside of Wikipedia. If the CLT answered the same question as the LLN then there would be no need for the LLN -- we would not have the LLN as the 1st Fundamental Theorem of Probability and the CLT as the 2nd. Instead, CLT would be the only one mentioned. It is not. That is because CLT does not describe the behavior of the mean with respect to the population as N approaches infinity. --Blue Tie 23:59, 14 January 2007 (UTC)
Hello Blue, There are multiple confusions in your last post.
1) You wrote "LLN certainly considers variance". Most statistical results require population variance to be finite and LLN is no different. However, I never said that LLN does not consider population variance; what I said was that LLN does not provide the variance of the sample mean. And sample variance is also different from the variance of the distribution of sample means.
2) CLT does not look at the sample mean distribution "regardless of sample size". Look at the proofs of CLT and you will see they require sample size to be "large" (infinite).
3) I do not know how the nomenclature 1st, 2nd etc. arose. Nor can you prove that CLT does not lead to LLN by saying "CLT would be the only one mentioned". These are mathematical issues, and there should be no reason for this kind of ambiguity. JS 07:08, 15 January 2007 (UTC)
You wrote "pollsters use an "error margin of XX %". But what that margin really means is not well defined." That is not correct. Pollsters are pretty good statisticians. They usually use 95% confidence intervals when they give error margins. If you delve into their reports you will find the details, though they may not be reported for the lay public. For example Gallup describing one of its polls says "For results based on this sample, one can say with 95% confidence that the maximum error attributable to sampling and other random effects is ±3 percentage points." [1] -- The preceding unsigned comment was added by Jayanta Sen (talk • contribs) 07:59, 15 January 2007 (UTC).
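The ±3 points in the Gallup quote is the kind of figure produced by the textbook normal-approximation (CLT-based) confidence interval for a proportion. Below is a minimal Python sketch. The sample size of 1,000 and the worst-case proportion of 0.5 are hypothetical inputs, not taken from the Gallup report. Real polls usually also adjust for weighting and design effects.

```python
from math import sqrt


def margin_of_error(p_hat, n, z=1.96):
    """Half-width of the normal-approximation confidence interval for a
    proportion estimated from a simple random sample of size n.
    z = 1.96 corresponds to 95% confidence under that approximation."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical poll: 1,000 respondents, worst case p_hat = 0.5.
print(round(margin_of_error(0.5, 1000), 3))  # 0.031
```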
1) LLN does not provide variance for the sample mean, but it is a normally contemplated statistic for such samples (separate from both LLN and CLT). I have lost track of why this might be important.
2) The proofs of both LLN and CLT contemplate infinity, but is that really required in either case? Doesn't LLN hold true even if a sample has only 2 observations? Doesn't that also hold for CLT?
3) I do not know either, but they exist and are standard. The history would be interesting. Referring to my previous comment and relating it to your current comment, would you say that rational numbers lead to counting integers or that counting integers lead to rational numbers?
4) I agree that I erred in that example. --Blue Tie 13:08, 15 January 2007 (UTC)
You wrote "Doesn't LLN hold true even if a sample has only 2 observations? Doesn't that also hold for CLT?" As far as I understand, both LLN and CLT require sample sizes to go to infinity. However these results are applied whenever the statistician believes the sample size is "large" enough.
Variance of the distribution of the sample means is important, because if statistical inferences are to be made it is required. My point was that as LLN doesn't provide the variance (or the distribution) it is not sufficient for estimating likelihood (testing statistical hypotheses). Regards, JS 22:33, 15 January 2007 (UTC)

Gambler's Fallacy etc.

The article had the sentences "However, in an infinite (or very large) set of observations, the value of any one individual observation cannot be predicted based upon past observations. Such predictions are known as the Gambler's Fallacy." These sentences seem to have no connection with LLN and I am removing them. The issues addressed by these sentences are sampling with or without replacement, independence of draws etc. Not central to LLN and confusing to the reader to have them here. JS 23:07, 13 January 2007 (UTC)

Big mistake. The gambler's fallacy is frequently mentioned in connection with misperceptions about the Law of Large Numbers. They should not be removed. --Blue Tie 16:32, 14 January 2007 (UTC)
At the end of a later paragraph which ends with the sentence "For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets." the removed sentences make more sense. I am accordingly repositioning. JS 16:37, 14 January 2007 (UTC)
I have explicitly explained the misperception you refer to. JS 17:04, 14 January 2007 (UTC)

Sample mean may never become exactly equal

The current articles contains this sentence:

"The strong law states that as the sample size grows larger, the probability that the sample mean and the population mean will be exactly equal approaches 1."

This is not true. Consider a population with values 0 and 1, where 1 has probability 1/sqrt(2). Then the population mean is also 1/sqrt(2), which is an irrational number. The sample means are always rational numbers however. So the probability that the sample mean is exactly equal to the population mean is 0, and it remains 0 if the sample size goes to infinity.
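Jules's argument can be checked with integer arithmetic. With k ones in a sample of size n, the sample mean is k/n, and k/n = 1/√2 holds exactly when 2k² = n². Below is a minimal Python sketch that searches for such pairs. The helper name is hypothetical.

```python
from math import isqrt


def sample_mean_hits_irrational_mean(max_n):
    """Return (k, n) pairs with k/n == 1/sqrt(2), checked exactly.

    k/n == 1/sqrt(2)  <=>  2*k*k == n*n, so only integer arithmetic is needed.
    """
    hits = []
    for n in range(1, max_n + 1):
        # 2*k*k == n*n forces n*n to be even and n*n // 2 to be a perfect square.
        if (n * n) % 2 == 0:
            k = isqrt(n * n // 2)
            if 2 * k * k == n * n:
                hits.append((k, n))
    return hits


print(sample_mean_hits_irrational_mean(100_000))  # []
```

The search always comes back empty because √2 is irrational. The sample means can still get arbitrarily close to 1/√2, and that closeness is the convergence the strong law is about.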

Furthermore, the strong law does not say that some probability approaches 1, it says that a certain probability is exactly 1, namely the probability that the sample mean converges to the population mean.

A better formulation would be

"The strong law states that almost every sample mean will approach the population mean arbitrarily closely as the sample size increases. Although one can theoretically conceive of samples for which this does not hold (for example, throwing infinitely many fives with a die), the strong law implies that these samples jointly have a probability of exactly 0, which means that they are practically impossible."

JulesEllis 23:18, 18 January 2007 (UTC)

The sample mean will approach 1/sqrt(2) as the sample size goes to infinity. There is no problem with this. Your restatement would not be right, because as long as the assumptions hold, the law will hold also... not "almost all". I will try to return this weekend to go further, but not now.--Blue Tie 04:12, 19 January 2007 (UTC)

I believe Jules is right about the current wording: "exactly equal" has a specific meaning in mathematics, different from "approaches" or "converges". A well-known fact is that an irrational number cannot be expressed as the ratio of two integers, and that is what Jules is referring to here. But Jules, shouldn't the better formulation say that these samples have a probability "approaching" or "converging to" 0, rather than "exactly" 0? JS 08:07, 19 January 2007 (UTC)
It seems to me what I stated in the above post would remove the difference between the weak law and the strong law. As this is "technical" I will remove myself from this discussion. JS 17:19, 19 January 2007 (UTC)

Blue Tie, of course there is no problem with the law; the problem is the formulation in the quoted sentence of the article. My restatement is exactly right, and not in contradiction with the law. The only problem is that you obviously do not know the meaning of the term almost all, which has an exact meaning in measure theory and probability theory (where it is usually rephrased as almost surely). This meaning is that the complement of the event has measure 0, or in the special case of a probability space, probability 0. This implies that the probability of the stated event is 1. Exactly 1. Not merely approaches 1. The complement having probability 0, however, does not mean that the complement is the empty set (= the impossible event). The situation is entirely analogous to the length of a mathematical point on a line. The point has length 0 (exactly 0) even though it exists. Anyhow, many other readers will probably not know the meaning of "almost" either, so on this point a rephrasing of my formulation is needed. The present formulation in the article is evidently an error though. Believe me, or read a good book like Billingsley, Probability and Measure, 1986.

JS, indeed, that would rather be the weak law. JulesEllis 23:29, 19 January 2007 (UTC)

It occurs to me that for non-technical readers I can just delete the word almost in my reformulation. That there are exceptions with probability 0 is explained in the sentence after it. JulesEllis 10:59, 20 January 2007 (UTC)

I just noticed that the sentence before it, about the weak law, is wrong too. The person who wrote this seems to confuse the weak and the strong law! I wonder why someone writes about something he obviously didn't understand. I will change it shortly. JulesEllis 10:19, 22 January 2007 (UTC)

Removed lottery example

I removed the text "For example, the odds that you will win the lottery are very low; however, the odds that someone will win the lottery are quite good, provided that a large enough number of people purchased lottery tickets." I would like to see some cites that this is really a "less technical way to refer". Actually LLN is inappropriate for this example, as LLN speaks of sample means, whereas this is about the probability of one success in a sample of very low probability events. Besides as it says "win the lottery", it violates the independence requirement of LLN. JS 19:58, 3 March 2007 (UTC)

You are missing the point. It is a description of how some people use the term and usually this is their informal way of viewing it. Though informal, it is not technically incorrect. With a large enough number of trials, events with low probability may happen... and this derives from the law of large numbers. Because, as with the roll of the dice, the law of large numbers says that each side will get, on average, very close to its fair share of events given enough rolls. Thus, with enough trials, something with a probability of .000000001 may still be likely to occur at least once or more often. So, though the lottery may pay off with a probability of .0000001, it will happen if enough people buy tickets. --Blue Tie 03:20, 4 March 2007 (UTC)
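For reference, the probability that at least one of n tickets wins can be computed directly as 1 - (1 - p)^n. Below is a minimal Python sketch. It assumes that each ticket wins independently with probability p, which, as JS's objection points out, real lotteries may not satisfy. The value p = 10^-7 echoes the figure in the comment, and the function name is hypothetical.

```python
from math import expm1, log1p


def prob_at_least_one_winner(p, n_tickets):
    """P(at least one of n_tickets wins), assuming each ticket wins
    independently with probability p.

    Equals 1 - (1 - p)**n_tickets; log1p/expm1 keep precision when p is tiny.
    """
    return -expm1(n_tickets * log1p(-p))


p = 1e-7  # per-ticket odds taken from the comment above
for n in (1, 10**6, 10**7, 5 * 10**7):
    print(f"{n:>10} tickets: {prob_at_least_one_winner(p, n):.4f}")
```

Under this assumption the probability passes 1/2 once n·p exceeds about ln 2 ≈ 0.69. The calculation itself uses only independence and the complement rule.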

Moving CLT

I have a problem with this paragraph:

The central limit theorem (CLT) gives the distribution of sums of identical random variables, regardless of the shape of the distribution of the random variables (as long as the distribution has finite variance), as long as the number of random variables added is large. CLT thus applies to the sample mean of a large sample as the mean is a sum. The variance as given by CLT collapses as the sample size grows larger, it follows that the mean converges to a number (which CLT says is the population mean). This is the LLN. So LLN is a result that can be obtained from the CLT.

CLT allows statisticians to evaluate the reliability of their results because they are able to make assumptions about a sample and extrapolate their results or conclusions to the population from which the sample was derived with a certain degree of confidence. See Statistical hypothesis testing.
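Whether the variance collapses can be checked exactly for a fair six-sided die, whose single-roll variance is 35/12. For the average of n independent rolls, the variance comes out as 35/(12n). Below is a minimal Python sketch using exact fractions. The helper names are hypothetical.

```python
from fractions import Fraction


def mean_of_dice_distribution(n):
    """Exact distribution of the average of n independent fair six-sided dice,
    returned as {average: probability}."""
    sums = {0: Fraction(1)}
    for _ in range(n):
        nxt = {}
        for s, prob in sums.items():
            for face in range(1, 7):
                nxt[s + face] = nxt.get(s + face, 0) + prob / 6
        sums = nxt
    return {Fraction(s, n): prob for s, prob in sums.items()}


def mean_and_variance(dist):
    """Mean and variance of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


for n in (1, 2, 5, 10):
    mu, var = mean_and_variance(mean_of_dice_distribution(n))
    # last column checks var == (35/12) / n exactly
    print(n, mu, var, var == Fraction(35, 12) / n)
```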


Here is part of my problem: The central limit theorem is about the distribution of the sample mean. The law of large numbers not only says that the probability of all rolls will be the expected mean, but that the probability of each outcome will be exactly 1/6. This is not quite the same thing as the CLT. Furthermore, this paragraph justifies this by saying that the variance collapses as the sample size grows, which is not exactly true.

Another part of my problem is that this is an article about the Law of large numbers and not about the central limit theorem. The law of large numbers does not depend upon the central limit theorem and no discussion of the CLT is required to understand the LLN. So, in this article it becomes confusing. It is further confusing in this paragraph because the wording is a bit obtuse.

For these reasons I have moved it here.

--Blue Tie 03:57, 4 March 2007 (UTC)

Article has become less or more useful?

There have been large-scale changes made in one day to this article. In my opinion they have made the article less helpful to a reader wanting to understand LLN. Specifically, the history of LLN and the quotes provided are confusing and unhelpful. If other editors agree then we should revert these changes. Also the reasons for the creation of the two sections "Probability" and "Statistics" and the distinction between the two are not apparent to me. To make it easier to judge, here is the version of the article prior to the changes: http://en.wikipedia.org/w/index.php?title=Law_of_large_numbers&oldid=112389587 and here is the version after the changes: http://en.wikipedia.org/w/index.php?title=Law_of_large_numbers&oldid=112521145 Regards, JS 12:23, 4 March 2007 (UTC) I am okay with removing the section about CLT as it may or may not aid the reader in understanding LLN, but the rest of the edits are problematic. JS 13:14, 4 March 2007 (UTC)

I think it is helpful to read what Bernoulli originally said as he discussed the theory. He expressed his views in a way that are helpful for non-mathematicians to understand. When someone can do that, it is useful. Einstein tried to do the same thing with his papers.
The difference between probability and statistics is this: The Law of Large Numbers is expressed in terms of probability. It may not be exactly clear to some readers how it applies to statistics.
I think the structure is not quite right. I thought so last night, but it was late. So I will try to fix it. But my main interest is to make the opening paragraphs more accessible to people who do not care about the mathematical proofs -- people who hear about the Law and want to know briefly what it is, or parents having to deal with kids' homework. --Blue Tie 13:47, 4 March 2007 (UTC)
Check it now. I still think it would be nice to quote Bernoulli more fully but this is probably sufficient. I hope it reads better --Blue Tie 16:50, 4 March 2007 (UTC)
Shivan Bird's edit [2] of July 24 improved the article tremendously in my view.
Generally I think there's been a tendency to clutter the article with superfluous explanations -- all in good intent, I'm sure, but the article is not the place for defining and discussing convergence in probability and almost sure convergence; that should be done elsewhere. Aastrup 11:58, 29 July 2007 (UTC)

Maybe a silly question

I'm not a mathematician but I can follow the basic idea in the article (I think!) It explains how the law describes certain behavior. My question, though, is why the behavior occurs in the first place. What is it that maintains the tendency for randomness to "even out" over large samples of events? In other words, why should I expect that the coin would come up roughly 500 times heads and roughly 500 times tails, rather than (e.g.) 499 heads and 1 tail (etc.)? Can anyone answer this? Thanks. 89.100.149.237 11:16, 4 April 2007 (UTC)

It's not a silly question. It gets to the heart of what "probability" is. One of the reasons that this is called the "Law" of large numbers is that it is considered somewhat axiomatic -- it "just is". Physically, perhaps the reasons are rooted in physics. We have a certain number of dimensions in which a coin flip operates. We have gravity. We have mass. But really, it is a thought experiment. There can be only four possible alternatives: 1) The coin turns up heads. 2) The coin turns up tails. 3) The coin lands on its edge. 4) The coin vanishes into thin air when tossed. OK, clearly the last is not reasonable. Coins actually CAN come to rest on their edges and sometimes do, but for this experiment we PRESUME that there are only two final outcomes -- heads or tails -- and even if the coin lands on its edge, it falls one way or the other. No other outcomes are allowed to be thought about. Assuming (this is a thought experiment) that the coin is "fair" -- that means that the coin is perfectly round and has no weight anomalies that favor landing on one side or the other -- any toss may result in either a head or a tail. Since there are only two possible outcomes and neither outcome is favored by the coin, it will "choose" to fall one way sometimes and the other way other times, in a fashion that is entirely random. On average, this randomness will not favor either side but will be a perfect split between the two choices -- 50%. --Blue Tie 12:23, 4 April 2007 (UTC)

Thanks! I guess what bothers me is the concept of probability in the first place. Since the coin (obviously) doesn't "know" which way to fall in one case--i.e the result is entirely random, or let us suppose so for the sake of argument--then there is no principle at work in the individual case other than randomness. What I don't understand is why randomness multiplied by n equals some kind of pattern. (I mean, I understand that it does--I just don't understand why.)89.100.149.237 17:18, 4 April 2007 (UTC)

You are right. It is guided entirely by randomness.
Let's see if we can understand where the confusion is. When the coin comes down, there is absolutely no way to know, in advance, which side will be up. It could be heads or it could be tails. But it can ONLY be one of those two choices.
First you must recognize that there are only two possible outcomes. Is that a problem?
Next you must recognize that out of these possible outcomes, there is only going to be one result each time - either heads or tails. Do you recognize that?
Third, and maybe this is the hardest one of all, you must recognize that the coin is "fair", that is that it does not "prefer" to fall one way or the other and it does not "remember" how it fell in the past. Each toss is its own toss and heads may turn up just as easily as tails each and every time. There is no magical force that will cause it to suddenly prefer to fall one way or the other.
Finally, since there are 2 possible outcomes but only 1 of these will actually occur, the chance of that one occurrence is the number of things that will actually occur divided by the number that might have possibly occurred. In this case, for example, the chance that the coin will land heads is 1/2 = 50%. Incidentally, the chance that the coin will land tails is also 1/2 = 50%. This means that the chance the coin will land either heads or tails is 1/2 + 1/2 = 1: 100% of the time it will be either heads or tails. (We ignored landing on its edge or disappearing into thin air.) Is that part clear?
Maybe what you are really asking is "Why does it not work perfectly? Why is it that I can toss a coin 10 times and sometimes it will be 7 heads and 3 tails. Other times it will be 4 heads and 6 tails. Why is it not exactly 5 heads and 5 tails?" I will answer that, but I have to use a smaller example of 4 tosses instead of 10. You will see why in just a moment.

Looking into the future, the chance that you will get a head or a tail is 50% on any throw. So what is the chance that you will get 2 heads and 2 tails out of 4 throws? First, figure out how many different ways the coin can come up in 4 tosses. This is called an "Outcome Table". Here it is:

Outcome Table - 4 Coin Tosses

First Second Third Fourth
H H H H
H H H T
H H T H
H H T T
H T H H
H T H T
H T T H
H T T T
T H H H
T H H T
T H T H
T H T T
T T H H
T T H T
T T T H
T T T T

One thing you notice immediately is how long it is. The number of final possible outcomes is the number of outcomes possible on a toss (2) raised to the number of trials or 2^4. That equals 16. If we had gone with 10 tosses our table would have been 2^10 =1024 rows long. Way too long. That is why I chose just 4 for the example.

If you look at the rows with exactly 2 heads and 2 tails (HHTT, HTHT, HTTH, THHT, THTH and TTHH), you can see there are 6 of them. That means that the chance of tossing the coin and getting exactly 50-50 heads and tails is only 6/16 = 37.5%, a bit more than 1 in 3. That means we are more likely to get some other combination of heads and tails. You would think that with a fair 50-50 coin you would get exactly 2 heads and 2 tails more often. But you won't. This is called the Binomial Probability Distribution, by the way.

If we had chosen to look at 10 coin tosses, then out of 1024 possible outcomes exactly 252 would have 5 heads and 5 tails. That would be 252/1024 = about .246 -- not quite one in four.

We can also see the probability of getting 4 heads in a row. (That is the one on the top row) It is 1 in 16. About 6%. If we looked at getting 10 heads in a row it would be 1 in 1024 -- less than 1 chance in a thousand.
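The outcome table and the counts above can be reproduced in a few lines. Below is a minimal Python sketch; the helper names are hypothetical. `itertools.product` lists the 16 rows in the same order as the table.

```python
from itertools import product
from math import comb


def outcome_table(n_tosses):
    """All 2**n_tosses head/tail sequences, e.g. ('H', 'T', 'H', 'H')."""
    return list(product("HT", repeat=n_tosses))


def prob_exact_heads(n_tosses, heads):
    """P(exactly `heads` heads in n fair tosses) = C(n, heads) / 2**n."""
    return comb(n_tosses, heads) / 2**n_tosses


table = outcome_table(4)
balanced = [row for row in table if row.count("H") == 2]
print(len(table), len(balanced))  # 16 6
print(prob_exact_heads(4, 2))     # 0.375
print(prob_exact_heads(10, 5))    # 0.24609375
print(prob_exact_heads(4, 4), prob_exact_heads(10, 10))
```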

I hope that helps a bit --Blue Tie 20:50, 4 April 2007 (UTC)

To 89.100.149.237: LLN is a mathematical result. In math you start with axioms and build up. So the axioms of probability, statistics, number theory, etc. lead to LLN. Math results can inform us about the real world; for example, LLN informs us about coin tosses (your example). I understand your concern about LLN, and that concern would be justified if LLN actually said the absolute number of heads would approach the 50% value. However LLN does not say that. It actually says the proportion approaches 50%, not that the absolute number approaches the 50% value (500 in your case).
Suppose the difference of the number of heads from the 50% value is d. Then the expected magnitude of d grows at the rate of square-root of N (where N is the number of tosses). Magnitude of d grows at square-root of N, rather than N, due to the randomness of the process. In comparison the number of tosses grows at the rate of N. This is the fact that lies at the heart of LLN. So the ratio of the difference from the 50% value (that is d) to the number of tosses (that is N) collapses to zero as N grows (even though d itself is growing). This leads the ratio to approach 50% in proportion. So even though the difference from the 50% (that is d) grows as N gets larger (rather than growing smaller as was your concern), it does not grow as fast as N, leading the proportion to approach 50%. JS 15:50, 28 April 2007 (UTC)
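The square-root-of-N argument can be checked exactly rather than by simulation. Below is a minimal Python sketch, assuming a fair coin and independent tosses. Here the expected absolute deviation E|d| stands in for the "magnitude of d", and the helper name is hypothetical.

```python
from math import comb, sqrt


def expected_abs_deviation(n):
    """Exact E|d| where d = heads - n/2 for n independent fair coin tosses.

    With k heads, |d| = |2k - n| / 2, and P(k heads) = C(n, k) / 2**n.
    """
    total = sum(comb(n, k) * abs(2 * k - n) for k in range(n + 1))
    return total / (2 * 2**n)  # exact integers, one final float division


for n in (100, 400, 1600):
    d = expected_abs_deviation(n)
    # d / sqrt(n) stays close to 0.4 while d / n keeps shrinking
    print(f"N={n:5d}  E|d|={d:6.2f}  E|d|/sqrt(N)={d / sqrt(n):.3f}  E|d|/N={d / n:.4f}")
```

Each time N is multiplied by 4, E|d| roughly doubles, as the √N rate predicts, while E|d|/N roughly halves.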

Correct Introduction Replaced by an Incorrect One

I do not understand why every time I come back to this important article it has deteriorated!!!

Whereas previously the introduction was "The law of large numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected large sample from a population is likely to be close to the average of the whole population.", it has now been replaced by the introduction "The law of large numbers (LLN) is a fundamental concept in probability that states: If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large." through this edit [3] and this one [4].

The current introduction is WRONG because it implies the random variable being described is a binomial (either occurs with probability p or doesn't occur with 1-p), whereas LLN applies to all random variables (subject to some regularity conditions).

So essentially the introduction of the article has been changed from something RIGHT to something WRONG!!!

The previous introduction was easy to understand and correct, the current is confusing and wrong. Also many other changes have made the article lose focus and ramble.

I don't understand why things are so difficult with this article. I had to fill pages over weeks just to get errors removed (for example the 99/100 apples example).

If you are not an expert on statistics AND are not sure of what you are doing, I request you, please stay away from this article.

Thanks, JS 21:38, 25 April 2007 (UTC)

First of all, wikipedia is the encyclopedia that ANYONE can edit. If you do not like that approach, you should find a different venue. It is entirely inappropriate to tell other editors to leave an article. You should read the agreement here. If you do not want your contributions to be mercilessly edited then do not edit here. The article WILL change over time. That is fundamental to wikipedia. Other editors WILL contribute. That is fundamental. If you cannot stand this, you are in the wrong place.
Second, there is no assumption of binomial probability. You are reading that into the statement.
Third, the way you have changed it may not be quite so correct. (Mind you, I am the one who wrote the sentence that you changed it to). But the law of large numbers is really a result of probability theory. It is then extrapolated to statistics. So, I think that this distinction is lost.
Fourth, there is a way to handle this better. Use wikipedia standards. This means that we should use verifiable, reliable sources and not Original Research.
I do agree that the version you changed back to is more readable but I am not sure it is the best.
So, in that context, lets see what other sources say:

"in repeated, independent trials with the same probability p of success in each trial, the chance that the percentage of successes differs from the probability p by more than a fixed positive amount, e > 0, converges to zero as the number of trials n goes to infinity, for every positive e." (http://stat-www.berkeley.edu/~stark/Java/Html/lln.htm). Technically correct, but it would be nice to find a definition more accessible to non-math majors. Notice also that the same imaginary binomial problem exists that you complain about.

"If the probability of a given outcome to an event is P and the event is repeated N times, then the larger N becomes, so the likelihood increases that the closer, in proportion, will be the occurrence of the given outcome to N*P." (http://www.probabilitytheory.info/topics/the_law_of_large_numbers.htm) Again, this has your imagined but nonexistent binomial problem. And again, it is not so clearly worded for the non-math major.

"the theorem in probability theory that the number of successes increases as the number of experiments increases and approximates the probability times the number of experiments for a large number of experiments" (Dictionary.com).

"A fundamental law in probability theory and statistics stating that if an event of probability p is observed repeatedly during independent repetitions, the proportion of the observed frequency of that event to the number of repetitions converges (see convergence) towards p as the number of repetitions becomes large."

"Theoretical and experimental probabilities are linked by the Law of Large Numbers. This law states that if an experiment is repeated numerous times, the relative frequency, or experimental probability, of an outcome will tend to be close to the theoretical probability of that outcome. Here the relative frequency is the quotient of the number of times an outcome occurs divided by the number of times the experiment was performed." (http://www.bookrags.com/research/probability-and-the-law-of-large-nu-mmat-03/) Probably one of the clearest definitions, but it is not 100% correct. For example, it misses the requirement of independence. And we cannot use it word for word: that would be a copyright violation.
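Why the missing independence requirement matters can be shown with an extreme, hypothetical dependent sequence in which every "repetition" just copies the first fair coin toss. Each toss is still fair on its own, but the relative frequency is stuck at 0 or 1 and never approaches 0.5. This is a sketch with an arbitrary seed:

```python
import random

def relative_frequency(outcomes):
    # Relative frequency = times the outcome occurred / times the experiment was performed.
    return sum(outcomes) / len(outcomes)

rng = random.Random(42)
n = 100_000

independent = [rng.randint(0, 1) for _ in range(n)]  # a fresh fair toss each time
first = rng.randint(0, 1)
dependent = [first] * n  # every "repeat" copies the first toss (fully dependent)

print(relative_frequency(independent))  # close to 0.5
print(relative_frequency(dependent))    # 0.0 or 1.0, whatever n is
```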

"in statistics, the theorem that, as the number of identically distributed, randomly generated variables increases, their sample mean (average) approaches their theoretical mean." (Encyclopedia Britannica). Almost readable, but it makes the LLN a statistical concept rather than a concept in probability theory that extends to statistics.
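The Britannica wording is not limited to coins or dice. Here is a sketch that assumes independent exponential draws with rate 0.5, whose theoretical mean is 1/0.5 = 2. The distribution, seed, and function name are arbitrary choices for illustration:

```python
import random
import statistics

def sample_mean(n, rate, seed=1):
    rng = random.Random(seed)
    # n independent, identically distributed exponential draws; theoretical mean is 1 / rate.
    return statistics.fmean(rng.expovariate(rate) for _ in range(n))

for n in (10, 1000, 100_000):
    print(n, sample_mean(n, 0.5))  # drifts toward 2.0 as n grows
```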

"If an experiment is repeated over and over, then the empirical probability approaches the actual probability." (http://www.andrews.edu/~calkins/math/webtexts/prod01.htm#LLN) A really excellent, short, concise statement, but the phrase "empirical probability approaches the actual probability" could be clearer.

So, if you do not like the current version, we should use some reliable source to proceed. --Blue Tie 12:06, 26 April 2007 (UTC)

I have asked Michael Hardy to give his opinion. Who first wrote some particular text is relatively unimportant, but since you mention it, I will point out that the introduction I had reverted to was written by Michael Hardy [5] and was revised by me [6]. I have no problem with you or anyone else editing Wiki articles, all I am requesting is please be very knowledgeable about the subject if you edit this article. JS 13:34, 26 April 2007 (UTC)

I'll probably take a look at this later today. The fact that anyone can edit a Wikipedia article does not mean that anyone should edit an article that they cannot improve. I agree that the introductory section should be comprehensible to a broad audience, but not at the cost of an incorrect statement of the LLN. Perhaps a special case could be stated first, but only with a caveat saying that it is a special case. I'm not sure I see a need for that in this case, however. Michael Hardy 16:07, 26 April 2007 (UTC)

...and now I've rewritten the introduction. "Concept" is the wrong word. The LLN is a proposition, not a concept. I've also made it clear in the introduction that there are various different versions of the LLN. Michael Hardy 19:47, 26 April 2007 (UTC)

Michael, thanks! The introduction now looks good, no longer confined to binomial rvs. JS 21:39, 26 April 2007 (UTC)
It is better. I would like someone in high school who is unfamiliar with statistics or probability to look at it and see if they understand it. --Blue Tie 22:13, 27 April 2007 (UTC)


It is now going a bit further away from better. It is now a bit too wordy and technical an introduction. I would like to see it start simply and become more complex later on. Is there no way to do this?
And the binomial bit is simply unnecessary complexity. It only makes matters worse --Blue Tie 06:38, 30 April 2007 (UTC)
The binomial bit is necessary as LLN talks about a sample mean approaching a population mean, and then suddenly we have a form of LLN that talks about frequency. It therefore needs to be clarified to the reader that for a binomial rv with values of 0 and 1, the sample mean is the same thing as frequency. JS 14:35, 30 April 2007 (UTC)
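The identity behind this point is easy to check. For data coded as 0 and 1, the sample mean is exactly the relative frequency of 1s. Here is a sketch on a hypothetical list of eight tosses (1 = heads), using exact fractions so the equality is not blurred by rounding:

```python
from fractions import Fraction

def sample_mean(values):
    return Fraction(sum(values), len(values))

def relative_frequency(values, outcome=1):
    return Fraction(sum(1 for v in values if v == outcome), len(values))

tosses = [1, 0, 0, 1, 1, 0, 1, 1]  # hypothetical 0/1 data: 1 = heads
assert sample_mean(tosses) == relative_frequency(tosses) == Fraction(5, 8)
```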

Tyranny of averages

I've created an article that is the converse of this/the law of averages, and welcome any comments/contributions. I've not yet x-ref'd it, in case people feel it ought to simply be merged and a redirect placed in its stead. --Belg4mit 20:05, 31 July 2007 (UTC)

I doubt that any of the serious contributors to the article on the law of large numbers believe that the average is the be-all and end-all of any distribution. Skewness exists in some distributions, e.g. the distribution of wealth; this is nothing new.
This Tyranny of averages has nothing to do with the LLN or whether or not it is true. Why do people continue to interpret this mathematical theorem beyond its premise? It is about the convergence of the sample mean of identically distributed random variables. Nothing more and nothing less. - Aastrup 21:59, 31 July 2007 (UTC)

References

Could somebody explain to me what basic contributions to the LLN, relevant to the topics discussed in this article, have been done by Vapnik and Chervonenkis, since their names appear right after those of Chebyshev, Markov, Borel, Cantelli and Kolmogorov? I mean the statements of the LLN as given here certainly do not seem to owe Vapnik and Chervonenkis anything...

Here is a page that mentions their work with regard to a Uniform Law of Large Numbers. --Blue Tie 05:29, 15 September 2007 (UTC)

Thanks. Their contribution has indeed little to do with the topics discussed in this page (it's about uniform laws of large numbers, which actually would seem to fit more naturally on a page dealing with concentration of measure). —Preceding unsigned comment added by 129.194.8.73 (talk) 07:58, 18 September 2007 (UTC)
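The classic example of a uniform law of large numbers is the Glivenko–Cantelli theorem. It says the empirical distribution function converges to the true one uniformly over all x, not just at each fixed x. Here is a sketch for Uniform(0, 1) draws, where the true CDF is F(x) = x, with an arbitrary seed and a hypothetical function name:

```python
import random

def sup_ecdf_gap_uniform(n, seed=0):
    """sup over x of |F_n(x) - x| for n Uniform(0, 1) draws (the Kolmogorov-Smirnov distance)."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    # The empirical CDF jumps from (i-1)/n to i/n at the i-th order statistic, so the
    # supremum is attained just before or at one of the sample points.
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(xs, start=1))

for n in (10, 1000, 100_000):
    print(n, sup_ecdf_gap_uniform(n))  # the worst-case gap shrinks as n grows
```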

Things that could improve this article

  • The first sentence is good, but needs something to follow it. However, what follows is not a discussion of probability but of an application of probability, namely statistics. It only appears to be a descriptive follow-on to the first sentence.
  • The graph, which was a very good addition and provides a good way to illustrate the concept, is somewhat unrefined in appearance and is way too big for wikipedia standards. Maybe to illustrate the point well it cannot be too small, but it is too big now.
    • Compared to other pictures on wikipedia, the graph is huge. While I agree that it could be more refined, I think that its size is good; it's an integral part of the intuitive explanation of the LLN. An illustration of that many rolls of a die has to be big to illustrate the point, especially for people not used to running mean plots. --- Aastrup 21:17, 16 September 2007 (UTC)
  • The focus on the toss of a coin is superfluous, and if the article were written with the focus on probability with statistics being used as examples of how the LLN is applied, then the statement "The law of large numbers works equally well for proportions" would be redundant as well. As it stands, it looks a bit odd anyway.
  • But within the example of the throw of a die, we are missing an opportunity to explain another aspect of the LLN: namely that each of the individual results (1,2,3,4,5,6) will ALSO approach 1/6 of the total throws. Another graph comes to mind.
  • The Bernoulli quote about stupid people should have quote marks around it. And his book was published posthumously ... that should be mentioned.
  • The Ars Conjectandi should be added as a link. I tried to find a version of the original but could not; an English translation is here.
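The data behind the running-mean graph, and behind the suggested second graph of per-face frequencies, can be sketched as follows. The seed, the number of rolls, and the function names are arbitrary assumptions, and plotting is left out:

```python
import random
from collections import Counter

def roll_die(n, seed=7):
    rng = random.Random(seed)
    return [rng.randint(1, 6) for _ in range(n)]

def running_means(rolls):
    # Mean of the first k rolls for k = 1..n (what a running-mean plot draws).
    total, means = 0, []
    for k, r in enumerate(rolls, start=1):
        total += r
        means.append(total / k)
    return means

rolls = roll_die(60_000)
means = running_means(rolls)
face_freqs = {face: count / len(rolls) for face, count in sorted(Counter(rolls).items())}
print(means[9], means[-1])  # after 10 rolls vs after 60,000 rolls (expected value 3.5)
print(face_freqs)           # each entry should be near 1/6, about 0.1667
```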

--Blue Tie 05:48, 15 September 2007 (UTC)

A couple of suggestions that would improve this article for the lay user: this article doesn't address two more common usages of "LLN". The first is the invocation of "LLN" when discussing the occurrence of improbable events. Perhaps this usage is already embodied by the current article, but not explicitly. Secondly, one will see "LLN" used when explaining the phenomenon whereby the rate of increase tends to slow as the numbers get larger. Since it appears my two examples of usage are not addressed in the current article, a lay person is left scratching their head. I'm not suggesting these particular usages are proper, but their discussion, if only to explain their incorrectness, would help the average user.---24.253.40.138