# Centering matrix

In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix which, when multiplied by a vector, subtracts the mean of the vector's components from every component.

## Definition

The centering matrix of size n is defined as the n-by-n matrix

${\displaystyle C_{n}=I_{n}-{\tfrac {1}{n}}\mathbb {O} }$

where ${\displaystyle I_{n}\,}$  is the identity matrix of size n and ${\displaystyle \mathbb {O} }$  is the n-by-n matrix whose entries are all 1. This can also be written as:

${\displaystyle C_{n}=I_{n}-{\tfrac {1}{n}}\mathbf {1} \mathbf {1} ^{\top }}$

where ${\displaystyle \mathbf {1} }$  is the column-vector of n ones and where ${\displaystyle \top }$  denotes matrix transpose.

For example:

${\displaystyle C_{1}={\begin{bmatrix}0\end{bmatrix}},}$

${\displaystyle C_{2}=\left[{\begin{array}{rr}1&0\\0&1\end{array}}\right]-{\frac {1}{2}}\left[{\begin{array}{rr}1&1\\1&1\end{array}}\right]=\left[{\begin{array}{rr}{\frac {1}{2}}&-{\frac {1}{2}}\\-{\frac {1}{2}}&{\frac {1}{2}}\end{array}}\right],}$

${\displaystyle C_{3}=\left[{\begin{array}{rrr}1&0&0\\0&1&0\\0&0&1\end{array}}\right]-{\frac {1}{3}}\left[{\begin{array}{rrr}1&1&1\\1&1&1\\1&1&1\end{array}}\right]=\left[{\begin{array}{rrr}{\frac {2}{3}}&-{\frac {1}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&{\frac {2}{3}}&-{\frac {1}{3}}\\-{\frac {1}{3}}&-{\frac {1}{3}}&{\frac {2}{3}}\end{array}}\right].}$
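As a minimal sketch, the definition can be turned into a few lines of Python. The function name `centering_matrix` is an illustrative choice, and exact `Fraction` entries are used only so that the output matches the matrices above without rounding.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) * (all-ones matrix) as a list of rows."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    off_diagonal = Fraction(-1, n)  # every entry of -(1/n) * (all-ones matrix)
    return [
        # Diagonal entries pick up the extra 1 from the identity matrix.
        [1 + off_diagonal if i == j else off_diagonal for j in range(n)]
        for i in range(n)
    ]


C3 = centering_matrix(3)
for row in C3:
    print([str(entry) for entry in row])
# ['2/3', '-1/3', '-1/3']
# ['-1/3', '2/3', '-1/3']
# ['-1/3', '-1/3', '2/3']
```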

## Properties

Given a column vector ${\displaystyle \mathbf {v} \,}$  of size n, the centering property of ${\displaystyle C_{n}\,}$  can be expressed as

${\displaystyle C_{n}\,\mathbf {v} =\mathbf {v} -({\tfrac {1}{n}}\mathbf {1} ^{\top }\mathbf {v} )\mathbf {1} }$

where ${\displaystyle {\tfrac {1}{n}}\mathbf {1} ^{\top }\mathbf {v} }$  is the mean of the components of ${\displaystyle \mathbf {v} \,}$ .
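The centering property can be checked numerically on a small vector. The sketch below uses an arbitrary example vector and hypothetical helper names (`centering_matrix`, `mat_vec`); it compares multiplication by the centering matrix with direct subtraction of the mean.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) * (all-ones matrix), with exact rational entries."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_vec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


v = [Fraction(2), Fraction(4), Fraction(9)]   # arbitrary example vector
mean = sum(v) / len(v)                        # (1/n) * 1^T v

centered_by_matrix = mat_vec(centering_matrix(len(v)), v)
centered_directly = [x - mean for x in v]

print(centered_by_matrix == centered_directly)  # True
```

Both routes give the vector (−3, −1, 4), whose components sum to zero.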

${\displaystyle C_{n}\,}$  is symmetric positive semi-definite.

${\displaystyle C_{n}\,}$  is idempotent, so that ${\displaystyle C_{n}^{k}=C_{n}}$ , for ${\displaystyle k=1,2,\ldots }$ . Once the mean has been removed, the remaining mean is zero, so removing it again has no effect.
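Idempotence can also be verified directly from the definition. Expanding the square and using ${\displaystyle \mathbf {1} ^{\top }\mathbf {1} =n}$  gives

${\displaystyle C_{n}^{2}=\left(I_{n}-{\tfrac {1}{n}}\mathbf {1} \mathbf {1} ^{\top }\right)^{2}=I_{n}-{\tfrac {2}{n}}\mathbf {1} \mathbf {1} ^{\top }+{\tfrac {1}{n^{2}}}\mathbf {1} (\mathbf {1} ^{\top }\mathbf {1} )\mathbf {1} ^{\top }=I_{n}-{\tfrac {2}{n}}\mathbf {1} \mathbf {1} ^{\top }+{\tfrac {1}{n}}\mathbf {1} \mathbf {1} ^{\top }=C_{n}.}$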

${\displaystyle C_{n}\,}$  is singular. The transformation ${\displaystyle C_{n}\,\mathbf {v} }$  discards the mean of ${\displaystyle \mathbf {v} \,}$ , so its effect cannot be reversed.

${\displaystyle C_{n}\,}$  has the eigenvalue 1 of multiplicity n − 1 and eigenvalue 0 of multiplicity 1.

${\displaystyle C_{n}\,}$  has a nullspace of dimension 1, spanned by the vector ${\displaystyle \mathbf {1} }$ .

${\displaystyle C_{n}\,}$  is an orthogonal projection matrix. That is, ${\displaystyle C_{n}\mathbf {v} }$  is the projection of ${\displaystyle \mathbf {v} \,}$  onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace spanned by ${\displaystyle \mathbf {1} }$ . (This is the subspace of all n-vectors whose components sum to zero.)
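The eigenvalue, nullspace and projection statements follow from two short computations based on the definition:

${\displaystyle C_{n}\mathbf {1} =\mathbf {1} -{\tfrac {1}{n}}\mathbf {1} (\mathbf {1} ^{\top }\mathbf {1} )=\mathbf {1} -\mathbf {1} =\mathbf {0} ,}$

${\displaystyle \mathbf {1} ^{\top }\mathbf {v} =0\;\Longrightarrow \;C_{n}\mathbf {v} =\mathbf {v} -{\tfrac {1}{n}}\mathbf {1} (\mathbf {1} ^{\top }\mathbf {v} )=\mathbf {v} .}$

The first line shows that ${\displaystyle \mathbf {1} }$  is an eigenvector with eigenvalue 0. The second shows that every vector whose components sum to zero is an eigenvector with eigenvalue 1, and these vectors form a subspace of dimension n − 1. Since all eigenvalues are 0 or 1 and the matrix is symmetric, positive semi-definiteness follows as well.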

## Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector, it forms an analytical tool that conveniently and succinctly expresses mean removal. It can be used not only to remove the mean of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix ${\displaystyle X\,}$ , the multiplication ${\displaystyle C_{m}\,X}$  removes the means from each of the n columns, while ${\displaystyle X\,C_{n}}$  removes the means from each of the m rows.
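The two products can be illustrated on a small matrix. This is a sketch only: the 2-by-3 matrix `X` is made-up example data, and the helpers `centering_matrix` and `mat_mul` are hypothetical names. As noted above, explicit multiplication is not the efficient way to center data; the code just makes the algebra concrete.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) * (all-ones matrix), with exact rational entries."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Made-up example data: m = 2 rows, n = 3 columns.
X = [[1, 2, 6],
     [3, 8, 4]]
m, n = len(X), len(X[0])

column_centered = mat_mul(centering_matrix(m), X)  # C_m X: each column sums to 0
row_centered = mat_mul(X, centering_matrix(n))     # X C_n: each row sums to 0

print([[int(e) for e in row] for row in column_centered])  # [[-1, -3, 1], [1, 3, -1]]
print([[int(e) for e in row] for row in row_centered])     # [[-2, -1, 3], [-2, 3, -1]]
```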

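In particular, the centering matrix provides a succinct way to express the scatter matrix, ${\displaystyle S=(X-\mu \mathbf {1} ^{\top })(X-\mu \mathbf {1} ^{\top })^{\top }}$ , of a data sample of n observations stored as the columns of the m-by-n matrix ${\displaystyle X\,}$ , where ${\displaystyle \mu ={\tfrac {1}{n}}X\mathbf {1} }$  is the sample mean. Because ${\displaystyle X\,C_{n}=X-\mu \mathbf {1} ^{\top }}$ , the scatter matrix can be written more compactly as

${\displaystyle S=X\,C_{n}(X\,C_{n})^{\top }=X\,C_{n}\,C_{n}\,X^{\top }=X\,C_{n}\,X^{\top },}$

where the last two equalities use the symmetry and idempotence of ${\displaystyle C_{n}}$ .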
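A small numerical check of this identity is sketched below. It assumes the observations are the columns of `X` (made-up example data) and uses hypothetical helper names; it computes the scatter matrix once by subtracting the sample mean explicitly and once through the compact form with the centering matrix.

```python
from fractions import Fraction


def centering_matrix(n):
    """C_n = I_n - (1/n) * (all-ones matrix), with exact rational entries."""
    off = Fraction(-1, n)
    return [[1 + off if i == j else off for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


# Made-up example data: 2 variables (rows), n = 3 observations (columns).
X = [[1, 2, 6],
     [3, 8, 4]]
n = len(X[0])

# Direct route: subtract the sample mean mu = (1/n) X 1 from every column.
mu = [Fraction(sum(row), n) for row in X]
deviations = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]
S_direct = mat_mul(deviations, transpose(deviations))

# Compact route: S = X C_n X^T.
S_compact = mat_mul(mat_mul(X, centering_matrix(n)), transpose(X))

print(S_direct == S_compact)  # True
```

For this example both routes give the 2-by-2 matrix with 14 on the diagonal and −2 off the diagonal.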

${\displaystyle C_{n}}$  is the covariance matrix of the multinomial distribution with n categories, in the special case where the number of trials is ${\displaystyle k=n}$  and the category probabilities are ${\displaystyle p_{1}=p_{2}=\cdots =p_{n}={\frac {1}{n}}}$ .
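This can be checked against the standard multinomial moments. Writing ${\displaystyle k}$  for the number of trials and ${\displaystyle X_{i}}$  for the count in category i (notation introduced here for illustration), the variances and covariances are

${\displaystyle \operatorname {Var} (X_{i})=k\,p_{i}(1-p_{i}),\qquad \operatorname {Cov} (X_{i},X_{j})=-k\,p_{i}\,p_{j}\quad (i\neq j).}$

Substituting ${\displaystyle k=n}$  and ${\displaystyle p_{i}={\tfrac {1}{n}}}$  gives

${\displaystyle \operatorname {Var} (X_{i})=1-{\tfrac {1}{n}},\qquad \operatorname {Cov} (X_{i},X_{j})=-{\tfrac {1}{n}},}$

which are exactly the diagonal and off-diagonal entries of ${\displaystyle C_{n}}$ .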

## References

1. John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0-412-99521-2, page 59.