Wikipedia:Reference desk/Archives/Mathematics/2012 November 28

Mathematics desk


November 28

Generating correlated non-normal random variables

A question about generating correlated random variables with non-normal distributions.

I have a Monte-Carlo-type problem requiring that I generate a series of pairs (and maybe n-tuples) of correlated random variables with a correlation coefficient of rho. I know how to do this when they are normally distributed (by using a Cholesky decomposition of the covariance matrix). But what if they are not normally distributed? I’m particularly thinking about multivariate uniform distributions, t distributions, Laplace distributions and some fat-tailed distributions like the Cauchy or Levy.

I found a suggestion to generate two univariate series, X1 and X2, and then define a new series X3 as rho X1 + sqrt(1-rho^2) X2, so that X1 and X3 will be my correlated random variables. But doesn’t this method ONLY work for normally distributed random variables? Is there an equivalent sort of thing that I can do with other distributions, like uniform or t?

And what about fat-tailed distributions with no finite variance? Since they have no finite variance, I presume that the correlation coefficient rho, which is derived from variances and covariances, has no equivalent. How does correlation work in such cases? Thorstein90 (talk) 05:04, 28 November 2012 (UTC)[reply]

I think you're going to have to specify a bit better, and perhaps think a bit more about, what multivariate distribution you want. I doubt that the distribution of the individual variables, combined with a covariance matrix, is actually enough to determine the multivariate distribution. --Trovatore (talk) 03:16, 28 November 2012 (UTC)[reply]
There are a few that I would like to try. I think that I can use an R function for the multivariate t distribution, and with one degree of freedom this will define for me a multivariate cauchy distribution (if I understand correctly). But I'd also like to try drawing randomly from multivariate uniform, laplace, pareto and levy distributions and in the absence of any usable functions I was wondering if there is some way that I can make the univariate draws correlated. Thorstein90 (talk) 05:04, 28 November 2012 (UTC)[reply]
According to stable distribution,
a random variable is said to be stable (or to have a stable distribution) if it has the property that a linear combination of two independent copies of the variable has the same distribution, up to location and scale parameters. The stable distribution family is also sometimes referred to as the Lévy alpha-stable distribution.
So if you use a stable distribution with finite variance, you could generate iid variables X and Z, and generate Y = a + bX + cZ. Then cov(Y, X) = cov(bX, X) = b×var(X), and var(Y) = b²var(X) + c²var(Z). Then corr(X, Y) = cov(X, Y)/[stddev(X)stddev(Y)] = b×var(X)/sqrt{var(X)[b²var(X) + c²var(Z)]}. Set b and c equal to whatever you want to get the desired corr(X, Y). Duoduoduo (talk) 16:04, 28 November 2012 (UTC)
And the above gives you the two free parameters b and c, so you can pin down desired values of two things: var(Y) and corr(X, Y). Also the parameter a allows you to control the location of the Y distribution. Duoduoduo (talk) 16:09, 28 November 2012 (UTC)[reply]
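The two-parameter recipe above can be sketched in a few lines. This is only an illustration under stated assumptions: X and Z are taken to be standard normal (the finite-variance stable case), var(Z) is assumed equal to var(X), and the names `coefficients`, `correlated_pairs` and `sample_corr` are hypothetical helpers, not part of any package mentioned in the thread.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def coefficients(rho, var_y, var_x=1.0):
    """Solve for b and c in Y = a + b*X + c*Z, assuming var(Z) == var(X)."""
    b = rho * math.sqrt(var_y / var_x)
    c = math.sqrt(var_y * (1.0 - rho ** 2) / var_x)
    return b, c


def correlated_pairs(n, rho, var_y=1.0, a=0.0, seed=0):
    """Draw n pairs (X, Y) with corr(X, Y) = rho and var(Y) = var_y."""
    rng = random.Random(seed)
    b, c = coefficients(rho, var_y)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # X ~ N(0, 1)
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Z ~ N(0, 1), independent of X
    ys = [a + b * x + c * z for x, z in zip(xs, zs)]
    return xs, ys


xs, ys = correlated_pairs(20000, rho=0.6, var_y=4.0)
print(round(sample_corr(xs, ys), 3))
```

With var(X) = var(Z), the correlation reduces to b/sqrt(b² + c²), which is why b and c can be read off directly from the target correlation and the target var(Y).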
OR here: Draw N observations on X from, say, a uniform distribution. Now since the sequence in which these happened to be drawn does not alter the fact that they were drawn from a uniform distribution, you can shuffle these in any way you like into a data series Y that is also drawn from a uniform distribution. For example, using mod N arithmetic, you could generate Y_n = X_{n−k} for any lag value k. Then experimenting around with different values of k, you could find the one that gives you a corr(X, Y) that is closest to what you want. Whether or not this would be appropriate for the use to which you're going to put the simulated data would depend on what that use is.
In fact, suppose you do the above with data drawn from an infinite-variance distribution. Since you're only drawing a finite number of data points, your sample variances and covariances will be finite, so the above procedure still applies. Duoduoduo (talk) 16:43, 28 November 2012 (UTC)[reply]
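The lag search described above might look like the following sketch. The helper names (`best_lag`, `sample_corr`), the sample size, and the target value are assumptions made for illustration only.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_lag(xs, target):
    """Try every cyclic lag k and keep the one whose corr(X, Y) is closest to target."""
    n = len(xs)
    best_k, best_r = None, None
    for k in range(1, n):
        ys = [xs[(i - k) % n] for i in range(n)]  # Y_n = X_{n-k}, indices mod N
        r = sample_corr(xs, ys)
        if best_r is None or abs(r - target) < abs(best_r - target):
            best_k, best_r = k, r
    return best_k, best_r


rng = random.Random(1)
xs = [rng.random() for _ in range(200)]  # uniform draws on [0, 1)
k, r = best_lag(xs, target=0.1)
ys = [xs[(i - k) % len(xs)] for i in range(len(xs))]
print(k, round(r, 3))
```

One caveat worth keeping in mind: for independent draws the lagged sample correlations are typically of order 1/sqrt(N), so only targets close to zero are likely to be reachable this way.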
You could ask at the Computing Reference Desk whether there are any packages that would have a simple command to draw from, say, a multi-uniform distribution. Duoduoduo (talk) 16:50, 28 November 2012 (UTC)[reply]
As for situations with infinite variances when it can't be finessed with a finite sample covariance, our article Covariance#Definition says
The covariance between two jointly distributed real-valued random variables x and y with finite second moments is defined as.... [bolding added]
And I don't see any way to get around that boldfaced restriction by limiting ourselves to one particular definition that doesn't actually involve variances. So regarding your question "How does correlation work in such cases?", I think the answer is that it doesn't -- you can't generate or even conceptualize two infinite-variance random variables with a covariance of rho. Duoduoduo (talk) 17:33, 28 November 2012 (UTC)[reply]
However, when variances and covariances don't exist, there can still be measures of the variables' dispersions and co-dispersion. See elliptical distribution -- in that class of distributions, in cases for which the variance does not exist, the on-diagonal parameters of the Σ matrix are measures of dispersion of individual variables, and the off-diagonal parameters are measures of co-dispersion. Duoduoduo (talk) 17:45, 28 November 2012 (UTC)
A little more OR here: Take a look at Normally distributed and uncorrelated does not imply independent#A symmetric example. It points out that if X is normally distributed, then so is Y=XW where P(W=1) = 1/2 = P(W= –1) and where W is independent of X; but cov(X,Y) = 0. I think this preservation of the form of the distribution holds for any symmetric distribution, not just the normal. And I think that if you specify P(W=1) = p and P(W= –1) = 1–p, then as p varies from 0 to 1/2 to 1 the correlation between X and Y varies from –1 to 0 to 1. If this is right, then this is probably the answer to your question for all finite-variance symmetric distributions. Duoduoduo (talk) 18:20, 28 November 2012 (UTC)[reply]
X is centered on zero of course. Duoduoduo (talk) 18:40, 28 November 2012 (UTC)[reply]
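A quick way to check the sign-flip idea is to simulate it. For X centered on zero with finite variance and W independent of X, corr(X, XW) = E[W] = 2p − 1, so p = (1 + rho)/2 gives the target. The sketch below assumes a Laplace marginal, drawn as the difference of two unit exponentials; the function names are hypothetical.

```python
import math
import random


def sample_corr(xs, ys):
    """Pearson sample correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sign_flip_pairs(n, rho, seed=0):
    """Return (X, Y) with Y = X*W, P(W = 1) = p, P(W = -1) = 1 - p."""
    rng = random.Random(seed)
    p = (1.0 + rho) / 2.0  # chosen so that E[W] = 2p - 1 = rho
    # Difference of two independent Exp(1) draws is Laplace(0, 1): symmetric about zero.
    xs = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
    ys = [x if rng.random() < p else -x for x in xs]
    return xs, ys


xs, ys = sign_flip_pairs(50000, rho=-0.4)
print(round(sample_corr(xs, ys), 3))
```

Note that |Y| = |X| in every pair, so the dependence between X and Y is of a rather special kind; whether that matters depends on what the simulated data are used for.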
All very interesting suggestions, thank you. I will have a look into these. Am I correct in interpreting your 'stable distribution' suggestion with the X, Y and Z as being something that applies only for sufficiently large N, as in the case of the central limit theorem? I am also curious if you have any opinion on the usefulness of copulas for problems such as these. I don't properly understand copulas and am unsure whether they are necessary for my problem. Thorstein90 (talk) 23:50, 28 November 2012 (UTC)[reply]
I don't know much of anything about copulas, but from a glance at the article I don't see how they could help.
An analog to the central limit theorem would involve adding together a sufficiently large number of random variables. That's not involved here -- you can just add two random variables from a stable distribution and end up with the sum being of the same distribution. Now if by N you mean the number of simulated points, then even for X and Z, as well as for Y, if N is small then a plot of your simulated points won't look like it's from that distribution, even though you know it is. But for a large number of simulated points, a plot of the X data, the Z data, or the Y data will indeed give a good picture of the underlying distribution. Duoduoduo (talk) 17:34, 29 November 2012 (UTC)[reply]
This describes some interesting methods. You might want to ask on stats.stackexchange.com and if you don't get an answer there, then mathoverflow.net. I don't actually understand what copulas are, but I think they have more to do with the reverse problem: you have an observed joint distribution of several correlated variables, and you want to decompose it to the component variables. 67.119.3.105 (talk) 08:30, 30 November 2012 (UTC) Added: here is an SE thread with some other suggestions. 67.119.3.105 (talk) 08:37, 30 November 2012 (UTC)[reply]
The first of 67.119.3.105's links doesn't, as far as I can see, output variables that both have the same given type of distribution as each other. That objection also applies to some of the answers at his second link here, but I think the answer with the big number 4 beside it may be relevant -- not sure. Duoduoduo (talk) 18:47, 30 November 2012 (UTC)[reply]
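Since copulas came up, here is a minimal sketch of one standard construction, the Gaussian copula: draw correlated normals, push each through the normal CDF to get dependent uniforms, then apply the inverse CDF of whatever marginal is wanted (Laplace here). All function names are hypothetical, and the parameter rho is the correlation of the underlying normals, not of the final variables.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def laplace_inv_cdf(u):
    """Inverse CDF of Laplace(0, 1) for 0 < u < 1."""
    if u < 0.5:
        return math.log(2.0 * u)
    return -math.log(2.0 * (1.0 - u))


def gaussian_copula_uniforms(n, rho, seed=0):
    """Pairs of dependent Uniform(0, 1) variables from a Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        z3 = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(z1, z3) = rho
        # Clamp away from 0 and 1 so the inverse CDF never sees log(0).
        u1 = min(max(norm_cdf(z1), 1e-16), 1.0 - 1e-16)
        u2 = min(max(norm_cdf(z3), 1e-16), 1.0 - 1e-16)
        pairs.append((u1, u2))
    return pairs


pairs = gaussian_copula_uniforms(20000, rho=0.7)
laplace_pairs = [(laplace_inv_cdf(u1), laplace_inv_cdf(u2)) for u1, u2 in pairs]
print(laplace_pairs[0])
```

Both outputs have exactly the chosen marginal, but their Pearson correlation is generally not rho. For the uniform marginals it is (6/π)·arcsin(rho/2), so rho would have to be adjusted to hit a given target.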

Request for Help Deriving the Formula Describing the Probability that a Set of Event Outcomes Might Occur for Make-Up Homework

     Hello again, could somebody help me figure out how to derive the actual equation for probability so that I can get started on some homework that I'm doing to catch up in my Probability and Statistics class? I would like to directly input the event or events and sample space associated with each probability I need to calculate for my homework (equations with fillable variables, after all, tend to trump those with words in the field of mathematics), but my textbook, Elementary Statistics: Picturing the World, Fourth Edition by Ron Larson and Betsy Farber, only gives me the formula, as displayed below using my own conventions, in words:

P(E) = (number of outcomes in event E) / (total number of outcomes in the sample space)

Curiously, this same textbook also gives examples of events whose probability one could find using this equation in the notation used to denote sets in set theory, as one could see if he or she wrote down, say, the event of rolling an even number on a standard, six-sided die as follows:

{2, 4, 6}

Similarly, the sample space one would use in the context of an experiment he or she might have begun to investigate how many even numbers he or she could roll using the provided six-sided die would also, as defined by my book as a convention, simply contain every single value one could obtain from rolling this given die:

{1, 2, 3, 4, 5, 6}

I found it obvious that one could not directly input the outcome sets given for each event and sample space involved in each assigned probability problem directly into the probability formula given by my text. Therefore, I assumed that I must convert this formula, which unnecessarily requires one to input the number of elements which exist in each given outcome set, into one that accepts these originally-given outcome sets as inputs. The number of inputs in the outcome sets given by each assigned problem as the definition of each involved event and sample space is, of course, the cardinality of each of these sets. If I assign the outcome set whose cardinality gives   to the set A and the outcome set whose cardinality gives   to the set S, then I can formalize these definitions using mathematics prior to substituting the sets A and S for an event and the sample space that contains it. So:

 

Plugging these two equations into the original formula given by my textbook for finding the probability of an event results in the following formula, which now accepts sets as arguments just like I wanted:

 

     I now have the equation I was looking for in the first place. However, the sample space acts as a limit on this function's domain such that  , and what my book calls the "Range of Probabilities Rule" likewise limits the range of this function to between 0 and 1, inclusive. I would like to explicitly define these limits which exist for the formula for finding probability if possible. The article on functions provides a notation which would allow me to do so, but it unfortunately spans multiple lines as shown below:

 

So, here's my question: does a single-line, f(x) notation that expresses the same information as the above mapping exist? I tried to create one myself…:

 

…but neither this nor the same function with the outermost parentheses from each side of the equation replaced with the brackets of set notation seems right to me. Could somebody help me out? Thanks in advance, BCG999 (talk) 20:00, 28 November 2012 (UTC).[reply]
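For what it is worth, the idea of a probability formula that takes the outcome sets themselves (rather than their sizes) can be sketched as a small function. This assumes equally likely outcomes, as in the die example; the name `probability` and the error messages are illustrative choices, not anything from the textbook.

```python
from fractions import Fraction


def probability(event, sample_space):
    """Classical probability |event| / |sample_space| for equally likely outcomes."""
    event = frozenset(event)
    sample_space = frozenset(sample_space)
    if not sample_space:
        raise ValueError("sample space must be non-empty")
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    # len() plays the role of cardinality; Fraction keeps the result exact.
    return Fraction(len(event), len(sample_space))


S = {1, 2, 3, 4, 5, 6}   # every value a six-sided die can show
A = {2, 4, 6}            # the event "roll an even number"
print(probability(A, S))
```

The subset check enforces the domain restriction, and because 0 ≤ |event| ≤ |sample_space| the result always lies between 0 and 1, inclusive.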

Not sure that I fully understand your question, but here goes. I think your function definition
 
looks just fine. You could call the right hand side f(|A|, |S|) so
 
Then
f: N* × N → [0, 1] ∩ Q
(|A|, |S|) ↦ |A| / |S|
where Q is the set of rationals, N refers to the set {1, 2, ...} and N* × N refers to the set of pairs (a, b) in ({0} ∪ N) × N such that a ≤ b. Probably a better notation exists than N* × N. I don't see how you could collapse the last two lines into a single line though. Duoduoduo (talk) 18:22, 29 November 2012 (UTC)
     Hey, Duoduoduo; didn't you mean that I could call the left-hand side  ? However, I don't think that this notation could be the basis of what I want to write down as part of my homework because I want the function to take the sets   and   as its inputs, not their cardinalities   and  . This is because the function   takes these cardinalities internally all by itself. Therefore, shouldn't my basic function use the letter   to represent the input event's set of outcomes instead of the letter   while also taking the sample space that contains this input event as an input? If I were to make these changes, then the basic function–the one that you said looked fine–should read   unless the standard   notation used to denote functions needs modifications to take sets as arguments instead of variables. Also, I'm a bit confused about how you mapped the function's domain to its range because I'm not quite sure that I understand how you defined the set denoted by   and the product   of the sets   and  . I know that you could have explained that the set   of the elements   is equivalent to the set   of all natural numbers except, but you've lost me after that. As for needing to collapse this map-based function definition into one using the standard   notation, I need to do this because my Probability and Statistics class is actually a high school one, which means that my teacher wouldn't expect me to have ever even seen the former notation. Maybe we could express the limits that I need to place on the probability formula's domain and range as conditions on the function to the right of it and a comma? I really need to do this because I'm trying to understand where all of this probability stuff comes from and my book's original, word-based definition didn't really help me that much at all to do this so that I could make sure I have all of the formulas that I need for my homework assignment before I begin.
P.S.: Is it okay if I fixed how some of your math looks? I made it LaTeX, like mine. Also, your name prompted a little Pokémon-related non sequitur, by the way…
I'm sure you didn't realize it, but it's bad etiquette to change anyone's post. As for the Pokémon-related non sequitur, I'm from the older generation and, believe it or not, have no idea what Pokémon is. I'd be curious to know what the non sequitur is.
The notation P(E) conventionally means the probability (P for probability) of event E happening. So I think the notation P(A, S) or P(|A|, |S|) is not good. That's why I used f. P(A, S) would mean the probability of both A and S occurring, which is not what is intended.
You say "I'm not quite sure that I understand how you defined the set denoted by N* and the product N* × N of the sets N* and N.". The notation N* × N is not a product -- it means that we're talking about a two-dimensional space we're mapping from; one of the dimensions is N* for one of the arguments of the function, and the other dimension is N for the other argument.
I'm kind of reluctant to try my hand at helping out with this any more, for two reasons: your question really pushes the boundaries of what I understand about set notation, and we're not really supposed to be helping out with homework anyway. I'm impressed, though, by the sophistication of your high school homework! Duoduoduo (talk) 22:53, 29 November 2012 (UTC)[reply]
     I'm sorry I changed your math to look like mine, Duoduoduo. At the time I really only considered how much better this thread's math would all look if its styling was consistent. I also didn't realize that you had meant to change   to   so as to clarify the function's intentions. Please forgive me for my ignorance and naïveté, and thank you for explaining why you changed my function so that it would not take the probability   of both the event   and the sample space   because of how it took both inputs to be events, which, as you guessed, is exactly what I did not want it to do! What I really intended, of course, was for this formula   to take the probability   of an event   as an argument of a function that maps the sample space   to the interval  , which you explicitly extended to include all real numbers  . Unfortunately, the way you expressed the limit which I wanted to act on the domain of the derived probability function   initially confused me because I thought that you had somehow changed the limits which I had originally said that I wanted to place on the probability function  's domain and range. After all, the domain specified by the set   didn't seem equivalent to the domain specified by the set   because the sample space   is a one-dimensional set   having as its members every outcome   of all events   given by each problem in my homework assignment such that   is the number of events involved in each of these said problems and   is a set   of indices which I can use to access all events   defined by each problem in my homework assignment such that   et cetera; whereas the set   contains all integer points on the two-dimensional Euclidean space. 
Since each problem in my homework assignment may specify a different sample space   by defining different sets of outcomes   that make up each such problem's multiple, different, mutually-exclusive events  , what I really wanted to do in the first place was find a way to input the sample space   into my probability function   without making it an argument of this function. If all of this confuses you, could you maybe ask somebody else who might be able to continue helping me with my homework, which this reference desk does allow so long as the user in question who asks for help shows significant effort in having already attempted answering their own question?
Thanks, BCG999 (talk) 23:46, 30 November 2012 (UTC)[reply]
P.S.: Your username reminded me of the names of the Pokémon Doduo and Duosion.
I'm bothered by your notation P(A). As I understand it A is the set of all possible affirmative outcomes and S is the set of all possible outcomes. Since A is not an outcome itself, you can't talk about its probability. E is then the event in which an outcome is in the affirmative set A. So I think the notation would be P(E). Equivalently, if you let O be the outcome of a drawing, then P(E) is defined as P(O ∈ A). Duoduoduo (talk) 13:29, 1 December 2012 (UTC)
     No, Duoduoduo; you misunderstand:   is a general instance of the even more general event object   such that I can work with multiple events   if a problem requires me to do so. So, the function   can take the event   as an input because both of the events   and   are…well, events made up of all of the outcomes   given by the problem to be members of the event. In other words, it makes sense to me that I can define the events   as the set of their members, the outcomes  . For example, an event   comprises all of the even outcomes that one could possibly obtain by rolling a standard, six-sided die. I gave this event in my initial post as an example of how my textbook denotes events:
 
As you can see, I'm defining each instance   of this generic object   to be the set of all outcomes   making up this said event   and/or   of type  , if I may use some computer science terminology to clarify my math even though it won't actually come into play when I do each problem in my homework assignment. I initially defined this object   explicitly such that it would guarantee any input event   to be a member of the power set   of the sample space   by attempting to limit the domain of my probability function   (or  , if you want to make the function's input more abstract; it doesn't matter) to this power set   of the given sample space  . The only problem is that I don't know how to limit the domain of   to the sample space   while still allowing this same function   to use this sample space   to compute the probability   of the input event  . In essence, knowing how to do this would allow me to do what I basically have been wanting to do since the beginning of this discussion: write a function   that returns the probability   of any input object   of type 'event ( )' given that this input event object   is a subset of the sample space   such that the function's return value lies within the interval between the real numbers 0 and 1, inclusive. Note as well that the input event object   is a subset   because it is a member   of the power set   of the sample space   over which one must calculate the probability   of this input event object  . Now do you understand what I've been trying to ask this entire time, Duoduoduo?
Here's hoping we get this right,
BCG999 (talk) 18:03, 1 December 2012 (UTC)[reply]
Sorry, I can't understand your most recent post at all. Since this seems to be beyond me, I'm afraid I can't help any more. But I wish you success in getting it figured out. Duoduoduo (talk) 21:22, 1 December 2012 (UTC)[reply]
     Maybe it would help you or whoever else is interested in helping me with this problem if I told you that I'm trying to write another function which restricts my original function, which I now clarify below through a few slight modifications which change the domain from the power set   of the sample space   to the set   of all real numbers and remove the sample space   from the mapping…:
 
…, to the domain defined by the members of the sample space   as explained here by the article on function domains, like so:
 
The only problem is that, since this is a high-school Probability and Statistics class for which I'm doing my homework, my teacher wouldn't expect me to know how to define functions in terms of maps; in fact, I had never seen the map-based way of defining a function before I read about it in the article on functions. That's why I wanted to know if I could define a probability function restricted to the sample space S of relevance to each problem in my homework that takes an event   (or  , if you're only using a single event; each of my problems wants me to solve for multiple probabilities   in the context of a probability distribution) as its only argument while still allowing me to use this restricted domain to calculate its result   in standard, f(x) notation. So, what do I do?
BCG999 (talk) 20:19, 4 December 2012 (UTC)[reply]
     Wait a minute, I think I just figured this out myself! To verify my thinking, could I say that the following function:
 
…is actually equivalent to   when the event   is either a subset   of the sample space   or a member   of the power set   of the sample space  ?
Here's hoping I'm right,
     BCG999 (talk) 20:40, 9 December 2012 (UTC)[reply]
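The wish to fix the sample space once and then have a one-argument probability function can be illustrated with a closure. This is only a sketch of that idea, assuming equally likely outcomes; `make_probability` is a hypothetical name.

```python
from fractions import Fraction


def make_probability(sample_space):
    """Return a one-argument function P whose domain is the subsets of sample_space."""
    space = frozenset(sample_space)
    if not space:
        raise ValueError("sample space must be non-empty")

    def P(event):
        event = frozenset(event)
        if not event <= space:  # event must be a member of the power set of the space
            raise ValueError("event is outside the fixed sample space")
        return Fraction(len(event), len(space))

    return P


P = make_probability(range(1, 7))  # the six-sided die
print(P({2, 4, 6}))
```

Here the sample space is not an argument of P; it is baked in when P is created, which mirrors restricting a function to a domain determined by each problem.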

Calculating an invariant ellipse

Hi. Suppose we have a linear map on the plane, given by a 2x2 matrix, which has complex eigenvalues and determinant equal to 1. The invariant sets in the plane under this transformation - if I'm not mistaken - should be ellipses. Does anyone know the easiest way to find the equations of these ellipses, given the matrix of the transformation? Thanks in advance. -68.185.201.210 (talk) 23:19, 28 November 2012 (UTC)[reply]

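If the matrix is [a b; c d] (rows a b and c d) and the invariant ellipse family is ex² + fxy + gy² = R², for any (x, y) the LHS must be equal for (x, y) and (ax + by, cx + dy), thus the coefficients must be equal. By solving (and letting ad − bc = 1) we get
e = cf / (d − a)
g = −bf / (d − a)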
-- Meni Rosenfeld (talk) 08:51, 29 November 2012 (UTC)[reply]
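A small exact check of the invariance condition may help. The sketch assumes the quadratic form is scaled so that f = d − a, which gives e = c and g = −b without any division (this scaling is a choice made here for convenience). The function names and the example matrix are illustrative only.

```python
from fractions import Fraction


def invariant_form(a, b, c, d):
    """Coefficients (e, f, g) of e*x^2 + f*x*y + g*y^2, scaled so that f = d - a."""
    if a * d - b * c != 1:
        raise ValueError("determinant must equal 1")
    return c, d - a, -b


def q(form, x, y):
    """Evaluate the quadratic form at the point (x, y)."""
    e, f, g = form
    return e * x * x + f * x * y + g * y * y


def apply(matrix, x, y):
    """Apply the linear map with rows (a, b) and (c, d) to the point (x, y)."""
    a, b, c, d = matrix
    return a * x + b * y, c * x + d * y


M = (1, -1, 1, 0)            # det = 1, trace = 1, so the eigenvalues are non-real
form = invariant_form(*M)    # (1, -1, 1), i.e. x^2 - x*y + y^2
point = (Fraction(3, 2), Fraction(-5, 7))
print(q(form, *point), q(form, *apply(M, *point)))
```

Each level set of this form is mapped to itself by M, and for this example f² − 4eg = −3 < 0, so the level sets are ellipses.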
Thanks a million! -68.185.201.210 (talk) 15:14, 29 November 2012 (UTC)[reply]
You might be interested to look at Matrix representation of conic sections and Conic section#Matrix notation. Duoduoduo (talk) 18:41, 29 November 2012 (UTC)[reply]
Probably just brainlock on my part, but I don't understand Meni Rosenfeld's answer. If we write ax² + (b+c)xy + dy² = R² and ex² + fxy + gy² = R² and equate coefficients, we get e=a, f=b+c, and g=d. (Generally b and c are taken as equal.) Duoduoduo (talk) 18:52, 29 November 2012 (UTC)
I got the same answer as Meni Rosenfeld, but not using the equation you wrote down. I set (x′, y′) = (ax + by, cx + dy), and then equated coefficients in ex′² + fx′y′ + gy′² = ex² + fxy + gy², which I don't think is the same thing. This answer also agrees with the actual ellipses I'm looking at in my application, so I'm confident that it's right. -129.120.41.156 (talk) 23:23, 29 November 2012 (UTC) (O.P. at a different computer)
Thanks. I misunderstood what the matrix was. Duoduoduo (talk) 23:29, 29 November 2012 (UTC)[reply]
So, the other matrix you wrote down would be.... some kind of quadratic form associated to the transformation? Is there a nice bijection there, or do we get equivalence classes on one side? -68.185.201.210 (talk) 03:12, 30 November 2012 (UTC)[reply]
There's one more degree of freedom in (2D) linear transformations than quadratic forms; changing b and c while keeping b + c fixed is equivalent. But other than that, as far as I know the quadratic form represented by a rectangle of numbers has very little to do with the linear transformation represented by the same. -- Meni Rosenfeld (talk) 09:00, 30 November 2012 (UTC)
(1) By Meni Rosenfeld's solutions for e and g in terms of f, note that if sgn(b)=sgn(c) then you get a hyperbola, not an ellipse. (2) If b=0=c you get a rectangular hyperbola. Duoduoduo (talk) 15:35, 30 November 2012 (UTC)[reply]
See Conic section#Discriminant classification. Duoduoduo (talk) 15:38, 30 November 2012 (UTC)[reply]
Note, however, that if the eigenvalues of the matrix are nonreal (as the OP hints at with the term "complex"), then it must be the case that bc < 0 and the solutions are ellipses. -- Meni Rosenfeld (talk) 20:26, 1 December 2012 (UTC)
I interpreted complex eigenvalues as simply being in the complex plane but not necessarily on the real line. But I don't think that matters. I get the characteristic equation to be λ² − (a+d)λ + (ad − bc) = 0 hence λ² − (a+d)λ + 1 = 0, with eigenvalues half of (a+d) ± sqrt((a+d)² − 4), and I can't see how this requires bc < 0 for the eigenvalues to be non-real. (E.g. consider the case a = 1.2 , d=1 / 1.1 . Since ad-bc=1, bc=ad-1 which in my example is >0, yet the eigenvalues are complex). So we can have non-real eigenvalues, bc>0, and a hyperbola.
Note that even if bc<0, the conic could still be a hyperbola if the eigenvalues are real, since an ellipse requires f² − 4eg < 0; using your solutions for e and g gives an ellipse iff (d − a)² + 4bc < 0, which does not always hold when bc < 0 (but does hold if the eigenvalues are non-real). Duoduoduo (talk) 22:18, 1 December 2012 (UTC)
If bc ≥ 0 then (a+d)² − 4 = (a−d)² + 4bc ≥ 0 and the eigenvalues are real.
If the eigenvalues are nonreal, the discriminant must be negative, (a+d)² − 4 < 0. From this subtract (a−d)² ≥ 0 to get 4ad − 4 < 0 so ad < 1 and bc = ad − 1 < 0.
Put differently, for a given sum a + d, ad is largest when a = d, and then ad = (a+d)²/4; since (a+d)² < 4, ad < 1. -- Meni Rosenfeld (talk) 10:44, 2 December 2012 (UTC)
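The argument above is easy to check with exact arithmetic. The sketch below relies on the identity (a+d)² − 4 = (d−a)² + 4bc, which holds when ad − bc = 1; the function name `analyse` and the dictionary keys are hypothetical.

```python
from fractions import Fraction


def analyse(a, b, c, d):
    """Classify a 2x2 matrix with rows (a, b), (c, d) and determinant 1."""
    if a * d - b * c != 1:
        raise ValueError("determinant must equal 1")
    trace_disc = (a + d) ** 2 - 4          # discriminant of the characteristic equation
    conic_disc = (d - a) ** 2 + 4 * b * c  # f^2 - 4eg for the form c*x^2 + (d-a)*x*y - b*y^2
    return {
        "trace_disc": trace_disc,
        "conic_disc": conic_disc,
        "nonreal": trace_disc < 0,
        "bc": b * c,
    }


# The example from the thread: a = 1.2, d = 1/1.1, with bc = ad - 1 > 0.
a, d = Fraction(6, 5), Fraction(10, 11)
example = analyse(a, 1, a * d - 1, d)
print(example["nonreal"], example["bc"])

# A matrix with non-real eigenvalues: bc is negative, as the argument predicts.
rotation_like = analyse(1, -1, 1, 0)
print(rotation_like["nonreal"], rotation_like["bc"])
```

In the thread's example the trace a + d exceeds 2, so the eigenvalues are in fact real, which is consistent with bc > 0 there.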
Thanks. Bad mental math on my part. Maybe I should buy a calculator. Duoduoduo (talk) 12:41, 2 December 2012 (UTC)[reply]
Good idea, it makes life so much easier. In the meantime, Win+R calc is also handy. -- Meni Rosenfeld (talk) 12:59, 2 December 2012 (UTC)[reply]