Talk:Sample mean and covariance/Archive 1

Explanation needed

Perhaps someone more knowledgeable than I can add an explanation of what a weighted sample and its covariance actually mean, especially from the probability theory point of view (sample space). I know of 3 contexts: 1. weighted linear regression, 2. biased samples, e.g., Efromovich (2004), 3. weighted ensembles, as in Particle filters. Jmath666 06:24, 12 March 2007 (UTC)

Typo

Should the individual entries for the sample mean have a (1/N) in front of them? Or am I missing something here? --WillBecker 11:05, 2 May 2007 (UTC)

Thank you, I have fixed that now. You can be bold and make changes yourself. I do not own this (or any other) page, even though I made most of the edits here to date. Thanks again for your help. Jmath666 20:49, 2 May 2007 (UTC)
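For reference, the corrected formula under discussion is presumably the per-component sample mean with the 1/N factor in front (using the article's indexing, with i running over the N observations and j over the K variables):

$\bar{x}_j = \frac{1}{N}\sum_{i=1}^{N} x_{ij}, \qquad j = 1,\dots,K.$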

Notation

Can we change the notation from x bar to \mu_x? There just seems to be a ton of x's flying around, and it may be easier to follow with mu's in some places. daviddoria (talk) 18:44, 11 September 2008 (UTC)

Samples weighted with a matrix?

The "Weighted samples" section assumes each sample has a scalar weight. What if you have a weight matrix? —Ben FrantzDale (talk)

Nomenklatura

Is there any special name for the quantity $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and its square root s? Albmont (talk) 16:12, 14 November 2008 (UTC)

It could be called the sample variance, but that term is also used for the expression with n − 1 in the denominator. It is the maximum likelihood estimate of the population variance if the sample is taken to be from a normal distribution of unknown population variance. Michael Hardy (talk) 16:48, 14 November 2008 (UTC)
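For concreteness, the two expressions being contrasted above appear to be

$s_n^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 \qquad\text{and}\qquad s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2,$

where the first is the (biased) maximum likelihood estimate under a normal model and the second is the unbiased version usually called the sample variance.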

Weighted Covariance

Is the formula for the weighted covariance really correct? I'm wondering about the normalizing part... If the formula from the cited page is correct, wouldn't the denominator have the wrong sign? 141.76.62.164 (talk) 11:06, 26 November 2008 (UTC)

The denominator is positive, all fine there. Let me show what happens in the special case of equal weights, with the weights normalized so that $\sum_{i=1}^{N} w_i = 1$ and $w_i = 1/N$:

$\frac{\sum_{i=1}^{N} w_i (x_i-\bar{x})(x_i-\bar{x})^{\mathrm T}}{1-\sum_{i=1}^{N} w_i^2} = \frac{\frac{1}{N}\sum_{i=1}^{N} (x_i-\bar{x})(x_i-\bar{x})^{\mathrm T}}{1-\frac{1}{N}} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i-\bar{x})(x_i-\bar{x})^{\mathrm T},$

so the denominator $1-\sum_i w_i^2 = 1 - 1/N$ is positive and the usual unbiased estimate is recovered.

Jmath666 (talk) 23:31, 26 November 2008 (UTC)

Thanks! You're right. 141.76.62.164 (talk) 18:19, 28 November 2008 (UTC)
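For anyone wanting to check the formula above numerically, here is a minimal sketch (assuming NumPy, weights normalized to sum to 1, and the 1 − Σwᵢ² denominator discussed above; weighted_cov is just an illustrative name). With equal weights it reproduces the usual 1/(N − 1) estimate:

    import numpy as np

    def weighted_cov(x, w):
        """Weighted sample covariance with the 1 - sum(w_i^2) denominator.

        Assumes w is nonnegative and already normalized to sum to 1,
        and x has shape (N, K): N observations of a K-dimensional variable.
        """
        mean = np.average(x, axis=0, weights=w)   # weighted sample mean
        d = x - mean                              # centered observations
        num = (w[:, None] * d).T @ d              # sum_i w_i (x_i - mean)(x_i - mean)^T
        return num / (1.0 - np.sum(w ** 2))       # positive whenever more than one weight is nonzero

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 3))
    w = np.full(100, 1.0 / 100)                   # equal weights, as in the special case above
    print(np.allclose(weighted_cov(x, w), np.cov(x, rowvar=False)))  # True: the 1/(N-1) estimate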

Denominator Explanation is lacking

The page says:

"The reason the sample covariance matrix has   in the denominator rather than   is essentially that the population mean   is not known and is replaced by the sample mean  . If the population mean   is known, the analogous unbiased estimate
 
using the population mean, has   in the denominator."

This is essentially meaningless, since the direct implication is not spelled out. Why does the population mean cause the denominator to increase by 1? If the causal link is not present, it seems equivalent to saying something like "The denominator is N-1 because Iran is a country" or "because cheese is made from milk." While this seems ridiculous to someone familiar with math, those who are not gather nothing from that statement. --18.111.14.144 (talk) 23:14, 12 October 2011 (UTC)

Agree; I fixed this by noting that the sample mean is correlated with the sample it's being compared against and referring to Bessel's correction for more details. Eamon Nerbonne (talk) 12:17, 15 October 2011 (UTC)
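For readers who want the step behind Bessel's correction spelled out here as well, a standard short derivation (scalar case, i.i.d. observations with variance σ²) is

$\sum_{i=1}^{N}(x_i-\bar{x})^2 = \sum_{i=1}^{N}(x_i-\mu)^2 - N(\bar{x}-\mu)^2, \qquad \operatorname{E}\Big[\sum_{i=1}^{N}(x_i-\bar{x})^2\Big] = N\sigma^2 - \sigma^2 = (N-1)\sigma^2,$

so dividing by N − 1 gives an unbiased estimate, while the version that uses the known population mean is already unbiased with N in the denominator.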

Text structure / row vs. column vectors.

Usually, random vectors are presented as column vectors; e.g. see the Random Vector page. This page presents them as row vectors, which is potentially confusing. I think we should swap that.

Also, the page is complex; there are lots of variables and lots of subscripts, some of which are used before they are introduced, and many of which are introduced without clear context. For example, the xij variable is introduced before x, i, and j are introduced individually. I think a lot of this complexity can simply be dropped. E.g.:

The sample mean vector $\bar{\mathbf{x}}$ is a row vector whose jth element (j = 1, ..., K) is the average value of the N observations on the jth random variable. Thus the sample mean vector is the average of the row vectors of observations on the K variables:
==> change to ==>
The sample mean vector $\bar{\mathbf{x}}$ is the element-wise mean of $\mathbf{x}$'s observations. So the jth element of $\bar{\mathbf{x}}$ is the average of the jth elements of the observations of $\mathbf{x}$:

We really don't need to repeat the fact that there are N observations and K elements. Whether it's a row or column vector, what exactly the range of the indexes is, or how many variables there are is not central to the point; this text (and lots of other bits) is basically guiding prose meant to intuitively set up the subsequent formula. For readers who don't know the topic in detail, it's just confusing; and for those who do but just want to look up a detail, it's telling them something they already know and making the actual formula harder to find.

So I'd propose using the normal approach of column vectors, to introduce the variables a little more elaborately, and to rephrase text focusing on the intent/intuition behind the formula rather than a precise replacement for the formula. Finally, it'd be nice to add a variant formula using matrix/vector notation rather than elementwise sums; all those subscripts make it look more complicated than it is. Eamon Nerbonne (talk) 12:15, 15 October 2011 (UTC)
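As a concrete starting point for that matrix/vector variant (a sketch only; X here denotes the N × K data matrix whose ith row is the ith observation, 1_N the all-ones column vector, and the n − 1 convention is used):

$\bar{\mathbf{x}} = \frac{1}{N}\mathbf{X}^{\mathrm T}\mathbf{1}_N, \qquad \mathbf{Q} = \frac{1}{N-1}\,\mathbf{M}^{\mathrm T}\mathbf{M}, \quad\text{where}\quad \mathbf{M} = \mathbf{X} - \mathbf{1}_N\bar{\mathbf{x}}^{\mathrm T},$

which is the same element-wise formula, just written so that the centering and the sum over observations are visible at a glance.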

Sample covariance matrix rapid calculation

Is there a way to calculate the sample covariance matrix incrementally? Like the rapid calculation for the one-dimensional standard deviation, but for multidimensional data. Wat (talk) 20:34, 7 February 2012 (UTC)

Such topics (or very similar) are dealt with in Algorithms for calculating variance ... there are formulae for variances and covariances there (but not in matrix form). Melcombe (talk) 22:56, 7 February 2012 (UTC)
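One concrete way to do it (a sketch, not taken from that article: a multivariate extension of the Welford-style update, assuming NumPy and the n − 1 denominator; OnlineCov is just an illustrative name) keeps a running mean and a running sum of outer products:

    import numpy as np

    class OnlineCov:
        """Incrementally updated sample mean and covariance (n - 1 denominator)."""

        def __init__(self, k):
            self.n = 0
            self.mean = np.zeros(k)
            self.m2 = np.zeros((k, k))  # running sum of (x_i - mean)(x_i - mean)^T

        def update(self, x):
            x = np.asarray(x, dtype=float)
            self.n += 1
            delta = x - self.mean                      # deviation from the old mean
            self.mean += delta / self.n
            self.m2 += np.outer(delta, x - self.mean)  # old-mean deviation times new-mean deviation

        def cov(self):
            return self.m2 / (self.n - 1)              # needs n >= 2

    # quick check against the batch formula
    rng = np.random.default_rng(1)
    data = rng.normal(size=(500, 4))
    oc = OnlineCov(4)
    for row in data:
        oc.update(row)
    print(np.allclose(oc.cov(), np.cov(data, rowvar=False)))  # True

The update is the same as the scalar algorithm described in that article; the outer product simply replaces the squared deviation.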

This article missed some fundamental parts

It is too complicated to introduce the concept using matrices in the first part. What about the population size? The infinite-population (N → ∞) and finite-population (N) cases should be separated into two parts. — Preceding unsigned comment added by 113.254.237.242 (talk) 16:09, 27 April 2015 (UTC)