Wikipedia:Reference desk/Archives/Mathematics/2010 January 6

Mathematics desk
< January 5 << Dec | January | Feb >> January 7 >
Welcome to the Wikipedia Mathematics Reference Desk Archives
The page you are currently viewing is a transcluded archive page. While you can leave answers for any questions shown below, please ask new questions on one of the current reference desk pages.


January 6 edit

Predicting number of items in a set (statistics) edit

I remember reading about a method to solve this type of problem, but I don't remember what it's called.

There is a set of n integers. I pick 10 randomly, one at a time, and with replacement.

10 11 14 0 9 11 7 13 14 0

Using this list, how would I predict n? The actual numbers pulled are not important, but repetition is. Can someone explain how to analyze this data? (n=15 in this example)

The general problem is:

  1. Using the output list of k items, predict n. k can be greater than n.
  2. What is the probability that the predicted n is accurate (predicted n = actual n)?

--Yanwen (talk) 00:33, 6 January 2010 (UTC)[reply]

I believe (although did not check rigorously) that maximum likelihood estimator of n would be maximum of your set of numbers. I'd leave calculating probabilities to others (Igny (talk) 02:25, 6 January 2010 (UTC))[reply]

We have an article about this: German tank problem. The maximum likelihood estimator in this case is the maximum observed value, but generally that's not likely to be the right value; it's quite biased. The best unbiased estimator is (k + 1)m/k where m is the maximum observed value. If I recall correctly, there's another multiple of m that gives a biased estimator with a slightly smaller mean squared error than that of the bset unbiased estimator. Michael Hardy (talk) 03:35, 6 January 2010 (UTC)[reply]

PS: You haven't said explicitly that you take the set to be {1, 2, 3, ..., n}, but I've assumed that's what you meant. But If we drop that assumption, I think we can still deal with that problem, but I'd have to think that one through. Michael Hardy (talk) 03:37, 6 January 2010 (UTC)[reply]

You are correct. I did not specify that the list was {1, 2, 3, ..., n} because I did not mean for it to be. In fact, I am attempting to solve this problem where a server randomly outputs a message. And I need to estimate the number of distinct messages so I can make sure I find all of them without wasting resources by simply sending a billion requests. The simple numbering is to make the problem easier, and then generalize it. --Yanwen (talk) 04:26, 6 January 2010 (UTC)[reply]
If your observations are drawn from a population that is not numbered consecutively, then the analysis in mark and recapture may be useful. Gandalf61 (talk) 08:05, 6 January 2010 (UTC)[reply]
Mark and recapture seems to give underestimates. If you divide the list in two and pretend that 5 were on the first capture, and another 5 were on the second capture, you get.
10 11 14 0 9
11 7 13 14 0
  --Yanwen (talk) 15:57, 6 January 2010 (UTC)[reply]
The number of times each integer appears in your sample sequence is distributed multinomially with k trials and n equally probable outcomes (the notation is reversed from that in the article). A logical thing to do is to choose n for which the distribution most closely matches the observed counts.
One way to do this is choose some statistic of the counts and pick n for which the expected statistic is equal to the observed one. For example, you can take the sum of squares of counts. The expected value is   so you can estimate  . In your example, the count vector is (2,1,1,1,2,1,2), the sum of squares is 16 and so  .
This method is not rigorous enough to answer your second question. For this, you need to choose a prior distribution on n and update it using Bayesian inference. -- Meni Rosenfeld (talk) 09:58, 6 January 2010 (UTC)[reply]
I thought that in second question OP asked for   where n is fixed and the estimate   is random. But if one treats n as random, one needs the prior distribution indeed. (Igny (talk) 13:38, 6 January 2010 (UTC))[reply]
That's one way to interpret it. But since the sample is known and n is not, it is more useful to know the posterior distribution (in the subjective probability sense) of n. -- Meni Rosenfeld (talk) 13:59, 6 January 2010 (UTC)[reply]
That seems like an accurate method Meni Rosenfeld.

Just one thing, how did you derive the formula for expected value? --Yanwen (talk) 15:57, 6 January 2010 (UTC)[reply]

Each of the counts (call it  ) is distributed binomially with k trials of probability   (they are dependent, but it doesn't matter). We have  . Then  . -- Meni Rosenfeld (talk) 19:37, 6 January 2010 (UTC)[reply]

The article on Multiset may be useful for your problem. Bo Jacoby (talk) 23:01, 6 January 2010 (UTC).[reply]

Try a simpler problem first: "There is a set of n objects. I pick k=3 objects randomly, one at a time, and with replacement". The n3 outcomes are classified into 3 possible observations: "three of a kind", "a pair", and "singletons". For n=1 the odds for these three observations are as 1 : 0 : 0. For n=2 the odds are as 2 : 6 : 0. For n=3 the odds are as 3 : 18 : 6. For n=4 the odds are as 4 : 36 : 24. Generally the odds are as n : 3n(n−1) : n(n−1)(n−2). The probabilities are obtained by dividing odds by n3, giving n−2 : 3n−1(1−n−1) : (1−n−1)(1−2n−1). Each of these probabilities, considered as a function of n, is a likelihood function containing the information of the observation. The likelihood function n−2 can be normalized into a probability distribution function (see Basel problem) with infinite mean value and infinite standard deviation. The likelihood functions 3n−1(1−n−1) and (1−n−1)(1−2n−1) cannot be normalized. It now remains to generalize this approach to higher values of k. Bo Jacoby (talk) 16:56, 7 January 2010 (UTC).[reply]

The following seems relevant:

  • Inferences from Multinomial Data: Learning about a Bag of Marbles
  • Author(s): Peter Walley
  • Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 58, No. 1 (1996), pp. 3-57
  • Published by: Blackwell Publishing for the Royal Statistical Society
  • Stable URL: http://www.jstor.org/stable/2346164

Abstract
A new method is proposed for making inferences from multinomial data in cases where there is no prior information. A paradigm is the problem of predicting the colour of the next marble to be drawn from a bag whose contents are (initially) completely unknown. In such problems we may be unable to formulate a sample space because we do not know what outcomes are possible. This suggests an invariance principle: inferences based on observations should not depend on the sample space in which the observations and future events of interest are represented. Objective Bayesian methods do not satisfy this principle. This paper describes a statistical model, called the imprecise Dirichlet model, for drawing coherent inferences from multinomial data. Inferences are expressed in terms of posterior upper and lower probabilities. The probabilities are initially vacuous, reflecting prior ignorance, but they become more precise as the number of observations increases. This model does satisfy the invariance principle. Two sets of data are analysed in detail. In the first example one red marble is observed in six drawings from a bag. Inferences from the imprecise Dirichlet model are compared with objective Bayesian and frequentist inferences. The second example is an analysis of data from medical trials which compared two treatments for cardiorespiratory failure in newborn babies. There are two problems: to draw conclusions about which treatment is more effective and to decide when the randomized trials should be terminated. This example shows how the imprecise Dirichlet model can be used to analyse data in the form of a contingency table.

Michael Hardy (talk) 19:55, 7 January 2010 (UTC)[reply]

I think Bo Jacoby's statement of what the likelihood function is may be just as general as the one for the stated problem, except his sample size is 3 instead of 10.

Call whichever number you observe first "1"; call the next distinct number you observe "2", and so on. Let k1 be the number of "1"s observed; let k2 be the number of "2"s, etc. Let be the number of distinct numbers observed. Given the value of n, the probability of the observed outcome is then

 

As a function of n, this is the likelihood function:

 

There are at least two things that can be done with the likelihood function:

  • Find the value of n that maximizes it. That is the maximum likelihood estimate of n.
  • Multiply it by a prior probability distribution of n, and then normalize it, to get the posterior probability distribution of n.

Michael Hardy (talk) 23:40, 7 January 2010 (UTC)[reply]

Now I'm wondering if I should have written
 
(with "1" instead of in the denominator). More later..... Michael Hardy (talk) 03:29, 9 January 2010 (UTC)[reply]
Yes, the prior likelihood function is   for   because nothing is said about n except that it is a positive integer.
The posterior likelihood function can be written   where   is the number of different objects in the sample. The normlizing divisor is  . The posterior probability distribution of n is  .
Yanwen's case is  , N=7.46·10−7. The maximum likelihood estimate is n≈19, but the probability that n=19 is (only) 0.0165.
Bo Jacoby (talk) 08:24, 8 January 2010 (UTC).[reply]

"Prior likelihood"??? There's no such thing. Maybe you meant prior probability? "Likelihood function" is a standard term. You're using it incorrectly. Michael Hardy (talk) 17:40, 8 January 2010 (UTC)[reply]

... Maybe by "prior likelihood" you meant the likelihood function when the sample size is zero? If so, that's a confusing term, since it conflicts with the way "prior" is used when speaking of probabilities. Michael Hardy (talk) 18:00, 8 January 2010 (UTC)[reply]

Prior means before the observation. Posterior means after the observation. The different observations have probabilities depending on the parameter n. The parameter values n have likelihoods depending on the observation . Bo Jacoby (talk) 05:48, 9 January 2010 (UTC). [reply]
I don't see where that L(n) came from, shouldn't that just be a multinomial distribution? It certainly makes life much easier having a prior distribution!, I'd have thought one would just get the number of different ones seen just using the maximum likelihood estimator, that's why other methods were used in the tank problem to get a reasonable unbiased estimator. Dmcq (talk) 08:53, 8 January 2010 (UTC)[reply]
No, you don't just get the number of kinds observed as the MLE. If all of the ones observed are distinct, then L(n) just increases with n, so you could say the MLE is infinite, except that infinity is not in the parameter space, so there's no MLE then. Michael Hardy (talk) 17:51, 8 January 2010 (UTC)[reply]
... OK, concrete example. In five observations you observe four numbers. Then the likelihood is
 
This reaches its maximum value when n = 8, although at n = 9 it's very close to the maximum. Michael Hardy (talk) 17:56, 8 January 2010 (UTC)[reply]

Thank you for asking. The L(n) came from the example above for k=3: (n−2 , 3n−1(1−n−1) , (1−n−1)(1−2n−1)) ≈ ( ). Factors that do not depend on n are omitted as they do not change the meaning of the likelihood function. The generalization to other values of k and is  . The factor   counts how many ways you can pick different objects out of n objects. The divisor nk counts the total number of ways you can pick k objects, one at a time, and with replacement, from n objects. Several estimators can be computed from the likelihood function. Brute force computation gives the the median n≈50, and the 90% confidence interval 8≤n≤317, for Yanwen's case =8, k=10.

  L=.8&!*^&_10
  N=.+/L 1+i.200000
  +/(0.5*N)>+/\L 1+i.100
50
  +/(0.9*N)>+/\L 1+i.400
317

Bo Jacoby (talk) 11:41, 8 January 2010 (UTC).[reply]

Oh I see, yes thanks it is just the multinomial with irrelevant bits chopped out. It certainly doesn't seem to tie the total number down very accurately for that sample! Dmcq (talk) 15:47, 8 January 2010 (UTC)[reply]

That is true. When a computational result differ from our intuition, our intuition may be wrong, and we learn. We need a nice closed formula for  . As the binomial coefficient is a polynomial in n,  , the result is a linear combination of values of the Riemann Zeta function,  . But I am not sure that this represents a computational shortcut. Bo Jacoby (talk) 16:23, 8 January 2010 (UTC).[reply]

You might like to work out the case where you sample two and they are both the same. Dmcq (talk) 18:55, 8 January 2010 (UTC)[reply]

k=2, =1.  , N=∞. Maximum likelihood estimate n≈1. Median infinite. Confidence interval infinite. You may like to work the case k=5, =1? Bo Jacoby (talk) 22:03, 8 January 2010 (UTC).[reply]

It'll just be 1/n in the limit however big k is. You must admit it is a bit surprising that I pick two red marbles out of a bag and in effect the best guess is they're all different. Dmcq (talk) 22:37, 8 January 2010 (UTC)[reply]

No, the likelihood function   is not like 1/n. It is like n−4 for n=1 2 3.... It defines a finite MLE (=1), normalizing divisor N (=1.082), mean value (=1.111), and standard deviation (=0.535). The median (=1) is not well defined. Bo Jacoby (talk) 05:48, 9 January 2010 (UTC).[reply]

Sorry didn't read properly, I was thinking of l=k-1 not 1. Dmcq (talk) 11:37, 9 January 2010 (UTC)[reply]

It is a good idea to use m=k as the observation rather than itself. m is the number of duplicates in the observation. In Yanwen's example, m=3, because the 3 numbers 11, 14 and 0 are duplicates. I counted wrongly before and so all my results are wrong. =7, not 8. Stupid me. The maximum likelihood estimate of n is 12. The n-values from 10 through 14 have likelihoods higher than or equal to 90% of the maximum. The n-values from 8 through 22 have likelihoods higher than or equal to 50% of the maximum. The n-values from 7 through 45 have likelihoods higher than or equal to 10% of the maximum.

   (<./,>./)@#&n&.><"1]1 0.9 0.5 0.1<:/(%>./)(7&!*^&_10)n=.1+i.100
┌─────┬─────┬────┬────┐
│12 12│10 14│8 22│7 45│
└─────┴─────┴────┴────┘

Bo Jacoby (talk) 00:50, 13 January 2010 (UTC).[reply]

division by zero edit

I have some thing posted in my blog about the zero division and logarithm of zero i need to check it right or not please some one help me please visit here...

http://www.unfoldallmysteries.blogspot.com/ —Preceding unsigned comment added by Ajithrajnair (talkcontribs) 13:00, 6 January 2010 (UTC)[reply]

No, that is wrong.   is definitely not 0. If anything, it's  .
See also division by zero. -- Meni Rosenfeld (talk) 13:11, 6 January 2010 (UTC)[reply]
I see that you have also provided a "proof" that  . Your error is that   only holds if x is a (real, or complex, or likewise) number. It does not hold in more general settings. -- Meni Rosenfeld (talk) 13:15, 6 January 2010 (UTC)[reply]
Reading further, I see that you also tried to find  . The correct answer is   where   (see also Complex number).
Perhaps you should study a bit more math before trying to invent things of your own? -- Meni Rosenfeld (talk) 13:21, 6 January 2010 (UTC)[reply]

our calculators and teachers will not explain what log0 is

they says it is a mathematical error but it is our duty for all Indians we must find out the value of log0

it is because zero is discovered by our grand fathers

I am quoting the above comment from your blog. Although you certainly seem like a nice person, you must understand that the evaluation of "log0" is based on mere convention, and has little mathematical value. Furthermore, I think a more modest duty for all Indians would be to do research in areas such as modern algebra (Indians are excellent at this, as well as at number theory, but of course this is not to say that Indians are not excellent at every branch of mathematics); one could argue that the invention of 0 ultimately served to create this field. Do you not agree? If I were feeling a little more bored at present, I would also note that a "calculator" can do as much as a "human" (in fact, humans can do much more than a calculator or even a supercomputer; the research in which a mathematician engages subsumes ideas to which a calculator cannot be associated); if a calculator cannot "explain log0" neither can any human, whoever his grandfather(s) may be. In any case, log0 is mathematically not very interesting; one could argue for it endlessly, but in the end it serves little purpose. (And to answer your question, if  ,   by "definition". However, this is impossible, since the graph of the exponential function never intersects the x-axis. Perhaps a slighly more intellectual question would be to formally prove this, using theoretical ideas?). --PST 14:07, 6 January 2010 (UTC)[reply]

And let me also add that this elementary mathematics does not require a genius to crack, partly because it is not real mathematics, and partly because it does not subsume any mathematical intuition. In fact, even if someone could not "solve it", I would not dispel the idea that he/she is a mathematical genius; in fact the two circumstances are entirely independent. But perhaps I am being slightly extravagent in these remarks. --PST 14:20, 6 January 2010 (UTC)[reply]
I beg to differ. log0 is very interesting, and while a calculator cannot explain it, I certainly can: The function   can be extended uniquely to a continuous function  , where the domain is the extended real number line. The extended function is strictly increasing and satisfies   and  . It has an inverse function  , which satisfies  . A similar argument follows for the definition  . -- Meni Rosenfeld (talk) 14:36, 6 January 2010 (UTC)[reply]
OK. But what are the non-trivial consequences of this particular evaluation of "log0"? The fact that the exponential function can be uniquely extended to a larger domain is of course interesting, but I cannot see furthur than the fact that it is a special case of the (general) question of "extending continuous functions" (or maybe I am simply not thinking properly; if you could contradict me, and show me how this leads to something truly non-trivial, I would be curious). Of course, one could note that it (the exponential function) cannot be extended to a continuous function defined on the one-point compactification of the space of real numbers (since that would define an unbounded continuous function on a compact space). This, and that which you mentioned, are both quite interesting when compared. But I think we are losing the OP at this stage... --PST 15:17, 6 January 2010 (UTC)[reply]
This extension of log is indeed a special case of a much more general phenomenon - it is worth noting simply because log is such an important function, so how this general question applies to it specifically is of interest.
It probably doesn't have any far-reaching consequences, but it can certainly pop up occasionally and simplify things.
Hopefully the OP will get a feeling for how there's more to these things than some handwavy calculations. -- Meni Rosenfeld (talk) 15:41, 6 January 2010 (UTC)[reply]
A first nontrivial consequence is that this way one reduces the notion of limit, including all cases at infinity, to the notion of continuity, which is more elementary and handable (just compare the simplicity of the composition property of continuity with the weird proofs of rules of compositions of limits, that still one finds in some sadistic highschool textbooks).--pma 19:48, 6 January 2010 (UTC)[reply]

Quotients and products of topological spaces. edit

Let X and Y be topological spaces and let   be a quotient of Y with quotient map  . (1) Is   a quotient map? (2) If not, is it true if we further assume that   and  . (3) If not, is it true if we further assume that  .

I'm asking because this question often arises when trying to determine if a map with domain   is continuous. Case (3) above is the case I'm looking at at the moment. Aenar (talk) 17:50, 6 January 2010 (UTC)[reply]

Do you know the definition of a quotient map? Answering your first question should be a simple matter of applying it. --Tango (talk) 22:01, 6 January 2010 (UTC)[reply]
I can't remember clearly I had to return my topology books. There's a theorem about maps of the form  ; I think it's a quotient map if X is locally compact hausdorff. The restriction must be placed on X not Y. If your trying to prove continuity I suggest trying something else because quotient maps are quite special continuous maps (the topology on the range is the finest for which the function is continuous), so it's probably not a quotient map. Plus it's generally VERY difficult to prove a map is quotient. Hope that helps Money is tight (talk) 10:38, 7 January 2010 (UTC)[reply]
Thanks. I was speculating that I might need compactness or local compactness at some place. Do you have a reference for the result that   is a quotient map when X is locally compact Hausdorff? I found a reference for the result. Aenar (talk) 11:37, 7 January 2010 (UTC)[reply]

Limits edit

  will equal 0 if you apply the usual limit laws and consider the limit as t tends to -3 in the numerator.

Factorisation of the numerator and denominator gives us  , which gives us the correct limit  . However, I only know this because I looked at the graph.

It seems arbitrary that a simplification should produce a different result, if the expression is equal. How can I possibly tell whether a function has a limit at a particular point unless I do the "correct" manipulations? There must be some strict method I can apply to every function to ensure I know I am taking limits correctly. —Preceding unsigned comment added by 94.171.225.236 (talk) 19:09, 6 January 2010 (UTC)[reply]

You are incorrect about the first assumption. You have to put -3 into both the numerator AND denominator. You will get 0/0, which is nonsense. So, you know that you must reduce in some way to get a valid answer since 0/0 is not valid. -- kainaw 19:16, 6 January 2010 (UTC)[reply]
Also, L'Hôpital's rule is very useful for computing such limits. -- Meni Rosenfeld (talk) 19:44, 6 January 2010 (UTC)[reply]
Try writing the original expression as  , then plug in numbers very close to -3. You'll see the first factor of the numerator and denominator canceling and leaving you with  . This happens at every t value except for -3, where you get   as noted above. Thus, the unreduced fraction and the reduced fraction are identical as functions, except that -3 is in the domain of the second one, and not in the domain of the first. Thus, you just can't plug in -3 at all until you switch to the reduced expression. -GTBacchus(talk) 20:08, 6 January 2010 (UTC)[reply]

Intuitively, what is  ? It is the value   as t approaches −3 (while still maintaining a distance from this number). Thus, it is not necessarily the value of   at  . Furthermore, if one considers t sufficiently close to (but not equal to) −3, one can divide both numerator and denominator by  , since   when t is close to, but not equal to, −3. Therefore,  , as desired. Since   exists, but   does not, we say that   has a removable discontinuity at −3. Hope this helps. --PST 03:00, 7 January 2010 (UTC)[reply]

Extension fields edit

Hi. I'm preparing for an algebra exam, and I've been chewing on this problem, from a past exam. I'd appreciate a nudge in the right direction, if someone can give me a hint.

Let   be irreducible over  , let the degree of f be odd, and let   be a zero of f. Prove or find a counter-example:  .

My hunch is that the claim holds for, say,  , but not for  . I don't know how to show that, though. Thanks in advance. -GTBacchus(talk) 19:56, 6 January 2010 (UTC)[reply]

For α a root of x3-2, α4/2 = α and for α a root of x5-2, α6/2 = α so the hunch is not right. It might be helpful to think about it in terms of the degrees of the extensions. In Q2), α is a root of x2 - α2 so [Q(α) : Q2)] divides 2. On the other hand [Q(α) : Q] is odd by assumption, which tells you [Q2) : Q]. Rckrone (talk) 22:09, 6 January 2010 (UTC)[reply]
Thank you very much. -GTBacchus(talk) 22:12, 6 January 2010 (UTC)[reply]
(ec) Write f(x)=xg(x2)-h(x2) with g and h in Q[x], and deg(g)≥deg(h). Then we have to show that g(α2)≠0, which implies that α=h(α2)/g(α2), which is in the field Q2]. Certainly g(α2)=0 implies h(α2)=0, and it would follow that f(x) is not the minimal polynomial of α (there is a polynomial of less degree of the form x(g(x)-a(x)h(x)). I wrote that without thinking; my algebraic head is somewhere in the hayloft you know; I'd be glad of any comment.--pma 22:23, 6 January 2010 (UTC)[reply]

How to solve for B edit

I swear this is not a homework question as I am in my late 20's and not in school, but I was helping my niece with her homework last month and I never got around to ask her how she solved it. I don't want to look stupid and ask her but this was bugging me for some time:

(A+B) x (B+C) = D

I know the answer is staring at me in the face, but I have gone completely blank. How do you solve for B? --Reticuli88 (talk) 21:39, 6 January 2010 (UTC)[reply]

Expand the brackets on the L.H.S What do you get? Theresa Knott | token threats 21:43, 6 January 2010 (UTC)[reply]

(edit conflict.) (Assuming these are all real numbers.) Multiplying out the left hand side gives you the equation  . This is a quadratic equation -- the article explains how to solve such an equation. Aenar (talk) 21:49, 6 January 2010 (UTC)[reply]
Note that  , so that we obtain the equation  . Make the substituition   and we obtain the equation  . Now, by the quadratic formula (see quadratic equation for more details), we have:
 
Is this clear? --PST 02:40, 7 January 2010 (UTC)[reply]

how did you make it equal to zero? --Reticuli88 (talk) 14:57, 7 January 2010 (UTC)[reply]

Subtract D from both sides. -- Meni Rosenfeld (talk) 15:10, 7 January 2010 (UTC)[reply]
He subtracted D from both sides of the equation, which moves the D from the left-hand side of the equals sign to the right-hand side. asyndeton talk 15:10, 7 January 2010 (UTC)[reply]

Logarithms and order of operations edit

Is   considered   or  ? According to [1], logarithms are on the same level as square roots and exponents, so then with the above question, as one goes right-to-left with equivalent-level operations, is   thus correct? Thanks, SpencerT♦Nominate! 22:51, 6 January 2010 (UTC)[reply]

With log x (regardless of the base), log xa = log (xa). To raise the function to a power, loga x = (log x)a. Of course, you can use it however you like as long as you make it clear what you mean. -- kainaw 00:35, 7 January 2010 (UTC)[reply]
Caution: some people also write  . — Emil J. 11:47, 7 January 2010 (UTC)[reply]

Note that, in the context of the question,  . For instance, if we consider the natural logarithm,   but   (and of course, 5 is not equal to 1). Thus, in general, no:  . Hope this helps. --PST 02:31, 7 January 2010 (UTC)[reply]

Okay, thanks. This answers my question. SpencerT♦Nominate! 02:54, 7 January 2010 (UTC)[reply]