Matrix completion is the task of filling in the missing entries of a partially observed matrix. Applications of matrix completion include the Netflix problem: given a ratings matrix in which entry $(i,j)$ represents the rating of movie $j$ by customer $i$ if customer $i$ has watched movie $j$ and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next. Without any restriction on the number of degrees of freedom in the completed matrix, this problem is underdetermined since the hidden entries could be assigned arbitrary values. Thus matrix completion often seeks to find the lowest-rank matrix or, if the rank of the completed matrix is known, a matrix of rank $r$ that matches the known entries. In the case of the Netflix problem the ratings matrix is expected to be low-rank, since user preferences can often be described by a few factors, such as the movie genre and time of release. Other applications include computer vision, where missing pixels in images need to be reconstructed, detecting the global positioning of sensors in a network from partial distance information, and multiclass learning.

The matrix completion problem is in general NP-hard, but under additional assumptions there are efficient algorithms that achieve exact reconstruction with high probability.

Problem Statement

One of the variants of the matrix completion problem is to find the lowest-rank matrix $X$ which matches the matrix $M$, which we wish to recover, on all entries in the set $E$ of observed entries. The mathematical formulation of this problem is as follows:

$$\begin{aligned} \min_{X} \quad & \operatorname{rank}(X) \\ \text{subject to} \quad & X_{ij} = M_{ij} \qquad \forall\, (i,j) \in E \end{aligned}$$

Candès and Recht[1] proved that with assumptions on the sampling of the observed entries and sufficiently many sampled entries, this problem has a unique solution with high probability.

An equivalent formulation, given that the matrix $M$ to be recovered is known to be of rank $r$, is to solve for $X$ such that $X_{ij} = M_{ij} \;\; \forall (i,j) \in E$ subject to $\operatorname{rank}(X) = r$.
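
As a small worked instance of this formulation (an illustration, not an example from the references), suppose $r = 1$ and

$$M = \begin{pmatrix} 1 & 2 \\ 2 & \ast \end{pmatrix}, \qquad E = \{(1,1), (1,2), (2,1)\}.$$

Any rank-1 completion must have proportional rows, so the second row is twice the first and the hidden entry is forced to equal $4$; without the rank constraint it could take any value.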

Assumptions

A number of assumptions on the sampling of the observed entries and the number of sampled entries are frequently made to simplify the analysis and to ensure the problem is not underdetermined.

Uniform sampling of observed entries

To make the analysis tractable, it is often assumed that the set $E$ of observed entries, of fixed cardinality $|E|$, is sampled uniformly at random from the collection of all subsets of entries of cardinality $|E|$. To further simplify the analysis, it is instead assumed that $E$ is constructed by Bernoulli sampling, i.e. that each entry is observed independently with probability $p$. If $p$ is set to $\frac{N}{mn}$, where $N$ is the desired expected cardinality of $E$ and $m$, $n$ are the dimensions of the matrix (let $m \le n$ without loss of generality), then $|E|$ is within $O(\sqrt{N \log N})$ of $N$ with high probability, so Bernoulli sampling is a good approximation of uniform sampling[1]. Another simplification is to assume that entries are sampled independently and with replacement[2].
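
A short Python sketch of Bernoulli sampling (illustrative only; the dimensions, seed and target $N$ are arbitrary choices, not values from the references):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 500, 600        # matrix dimensions, m <= n
    N = 50_000             # desired expected cardinality of E
    p = N / (m * n)        # Bernoulli observation probability

    mask = rng.random((m, n)) < p    # True where an entry is observed
    observed = mask.sum()
    # |E| concentrates around N: the deviation is typically a small
    # multiple of sqrt(N log N).
    print(observed, N, abs(observed - N) / np.sqrt(N * np.log(N)))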

Lower bound on number of observed entries

Suppose the $m$ by $n$ matrix $M$ (with $m \le n$) we are trying to recover has rank $r$. There is an information-theoretic lower bound on how many entries must be observed before $M$ can be uniquely reconstructed. Firstly, the number of degrees of freedom of a matrix of rank $r$ is $r(m + n - r)$. This can be shown by looking at the singular value decomposition of the matrix and counting the degrees of freedom. Then at least $r(m + n - r)$ entries must be observed for matrix completion to have a unique solution.
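
One way to carry out this count (a standard argument, sketched here for completeness) is to write a rank-$r$ matrix as a product of two factors,

$$X = AB^{\top}, \qquad A \in \mathbb{R}^{m \times r}, \; B \in \mathbb{R}^{n \times r},$$

which has $mr + nr$ parameters. The factorization is unchanged under $A \mapsto AG$, $B \mapsto BG^{-\top}$ for any invertible $G \in \mathbb{R}^{r \times r}$, so $r^2$ of these parameters are redundant, leaving $mr + nr - r^2 = r(m + n - r)$ degrees of freedom.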

Secondly, there must be at least one observed entry per row and column of $M$. The singular value decomposition of $M$ is given by $U \Sigma V^{*}$. If row $i$ is unobserved, the $i$-th row of $U$ can be changed to arbitrary values and still yield a matrix matching $M$ over the set of observed entries. Similarly, if column $j$ is unobserved, the $j$-th row of $V$ can be arbitrary. If we assume Bernoulli sampling of the set of observed entries, the coupon collector effect implies that on the order of $n \log n$ entries must be observed to ensure that there is an observation from each row and column with high probability[3].

Combining the necessary conditions and assuming that $r \le \frac{m}{2}$ (a valid assumption for many practical applications), the lower bound on the number of observed entries required to prevent the problem of matrix completion from being underdetermined is on the order of $nr \log n$.
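
A quick simulation of the coupon-collector effect (illustrative only; the size $n$, the constant in front of $n \log n$ and the number of trials are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 300
    N = int(2 * n * np.log(n))     # ~ 2 n log n observations in expectation
    p = N / (n * n)

    hits = 0
    trials = 200
    for _ in range(trials):
        mask = rng.random((n, n)) < p
        # every row and every column must contain at least one observation
        if mask.any(axis=1).all() and mask.any(axis=0).all():
            hits += 1
    print(hits / trials)   # close to 1; shrinking the constant 2 makes it drop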

Incoherence

The concept of incoherence arose in compressed sensing. It is introduced in the context of matrix completion to ensure the singular vectors of $M$ are not too "sparse", in the sense that all coordinates of each singular vector should be of comparable magnitude instead of just a few coordinates having significantly larger magnitudes[4]. The standard basis vectors are then undesirable as singular vectors, while the vector $\left( \frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}} \right)$ in $\mathbb{R}^n$ is desirable. As an example of what can go wrong when the singular vectors are too "sparse", consider the $n$ by $n$ matrix $e_1 e_n^{*}$, whose singular value decomposition is $1 \cdot e_1 e_n^{*}$ with singular vectors $e_1$ and $e_n$. Almost all of its entries must be sampled before it can be reconstructed, since the matrix carries information only in its single nonzero entry at position $(1, n)$.

Candès and Recht[1] define the coherence of a matrix $U$ whose column space is an $r$-dimensional subspace of $\mathbb{R}^n$ as $\mu(U) = \frac{n}{r} \max_{1 \le i \le n} \lVert P_U e_i \rVert^2$, where $P_U$ is the orthogonal projection onto the column space of $U$. Incoherence then asserts that, given the singular value decomposition $U \Sigma V^{*}$ of the $m$ by $n$ matrix $M$,

  1. $\mu(U), \mu(V) \le \mu_0$
  2. The entries of $U V^{*}$ have magnitudes upper bounded by $\mu_1 \sqrt{\frac{r}{mn}}$

for some $\mu_0, \mu_1$.
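
The coherence $\mu(U)$ is easy to compute numerically. The sketch below (an illustration, not code from the references) contrasts a maximally coherent column space, spanned by a standard basis vector, with the maximally incoherent all-ones direction:

    import numpy as np

    def coherence(U):
        # mu(U) = (n/r) * max_i ||P_U e_i||^2 for U with orthonormal
        # columns spanning an r-dimensional subspace of R^n; the squared
        # projection of e_i is the squared norm of the i-th row of U.
        n, r = U.shape
        return (n / r) * (U ** 2).sum(axis=1).max()

    n = 100
    e1 = np.zeros((n, 1)); e1[0, 0] = 1.0    # standard basis direction
    ones = np.ones((n, 1)) / np.sqrt(n)      # spread-out direction

    print(coherence(e1))    # n = 100: maximally coherent (bad for completion)
    print(coherence(ones))  # 1: maximally incoherent (good for completion)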

Approaches

Convex Relaxation

The rank minimization problem is NP-hard. One approach, proposed by Candès and Recht[1], is to form a convex relaxation of the problem and minimize the nuclear norm $\lVert X \rVert_{*}$ (the sum of the singular values of $X$) instead of $\operatorname{rank}(X)$ (the number of nonzero singular values of $X$). This is analogous to minimizing the $\ell_1$-norm rather than the $\ell_0$-norm for vectors. The convex relaxation can be solved using semidefinite programming (SDP) by noticing that the optimization problem is equivalent to

$$\begin{aligned} \min_{X, W_1, W_2} \quad & \frac{1}{2} \left( \operatorname{trace}(W_1) + \operatorname{trace}(W_2) \right) \\ \text{subject to} \quad & X_{ij} = M_{ij} \qquad \forall\, (i,j) \in E \\ & \begin{bmatrix} W_1 & X \\ X^{*} & W_2 \end{bmatrix} \succeq 0 \end{aligned}$$
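
As a concrete illustration, the relaxation can be posed in a few lines with the cvxpy modeling library, which supports the nuclear norm as an atom (the problem size, sampling rate and seed below are arbitrary demonstration choices, not values from the references):

    import numpy as np
    import cvxpy as cp

    rng = np.random.default_rng(2)
    m, n, r = 30, 30, 2
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r truth
    mask = (rng.random((m, n)) < 0.5).astype(float)                # 0/1 mask of E

    X = cp.Variable((m, n))
    # minimize the nuclear norm subject to agreeing with M on E
    problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                         [cp.multiply(mask, X) == mask * M])
    problem.solve()
    print(np.linalg.norm(X.value - M) / np.linalg.norm(M))  # small relative error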

The complexity of using SDP to solve the convex relaxation is $O(\max(m, n)^4)$. State-of-the-art solvers like SDPT3 can only handle matrices of size up to 100 by 100[5]. An alternative first-order method that approximately solves the convex relaxation is the Singular Value Thresholding algorithm introduced by Cai, Candès and Shen[5].
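
A minimal sketch of singular value thresholding: each iteration soft-thresholds the singular values of an auxiliary matrix and then takes a step on the observed residual. The threshold $\tau = 5\sqrt{mn}$ and step size $\delta \approx 1.2/p$ are ballpark choices in the spirit of the SVT paper; everything else (sizes, seed, tolerances) is an arbitrary illustration.

    import numpy as np

    def svt(M_obs, mask, tau, delta, max_iters=1000, tol=1e-4):
        # Singular Value Thresholding (sketch). M_obs holds the observed
        # values (zero off the mask); mask is a 0/1 array marking E.
        Y = np.zeros_like(M_obs)
        for _ in range(max_iters):
            U, s, Vt = np.linalg.svd(Y, full_matrices=False)
            X = (U * np.maximum(s - tau, 0.0)) @ Vt   # shrink singular values
            residual = mask * (M_obs - X)
            if np.linalg.norm(residual) <= tol * np.linalg.norm(M_obs):
                break
            Y += delta * residual                     # step on observed entries
        return X

    rng = np.random.default_rng(3)
    m, n, r = 100, 120, 2
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    mask = (rng.random((m, n)) < 0.4).astype(float)
    X_hat = svt(mask * M, mask, tau=5 * np.sqrt(m * n), delta=1.2 / 0.4)
    print(np.linalg.norm(X_hat - M) / np.linalg.norm(M))  # small relative error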

Candès and Recht show, using the study of random variables on Banach spaces, that if the number of observed entries is on the order of $n^{5/4} r \log n$ (assume without loss of generality $m \le n$), the rank minimization problem has a unique solution which also happens to be the solution of its convex relaxation with probability $1 - \frac{c}{n^3}$ for some constant $c$. If the rank of $M$ is small ($r \le n^{1/5}$), the size of the set of observations reduces to the order of $n^{6/5} r \log n$. These results are near optimal, since the minimum number of entries that must be observed for the matrix completion problem to not be underdetermined is on the order of $nr \log n$.

This result has been improved by Candès and Tao[3]. They achieve bounds that differ from the optimal bounds only by polylogarithmic factors by strengthening the assumptions. Instead of the incoherence property, they assume the strong incoherence property with parameter $\mu$. This property states that:

  1. $\left| \langle e_i, P_U e_{i'} \rangle - \frac{r}{m} \mathbf{1}_{i = i'} \right| \le \mu \frac{\sqrt{r}}{m}$ for $1 \le i, i' \le m$ and $\left| \langle e_j, P_V e_{j'} \rangle - \frac{r}{n} \mathbf{1}_{j = j'} \right| \le \mu \frac{\sqrt{r}}{n}$ for $1 \le j, j' \le n$
  2. The entries of $U V^{*}$ are bounded in magnitude by $\mu \frac{\sqrt{r}}{\sqrt{mn}}$

Intuitively, strong incoherence of a matrix $M$ asserts that the orthogonal projections of standard basis vectors onto the column and row spaces of $M$ have magnitudes close to those one would expect if the singular vectors were distributed uniformly at random[4].

Candès and Tao find that when $r$ is $O(1)$ and the number of observed entries is on the order of $\mu^4 n (\log n)^2$, the rank minimization problem has a unique solution which also happens to be the solution of its convex relaxation with probability $1 - \frac{c}{n^3}$ for some constant $c$. For arbitrary $r$, the number of observed entries sufficient for this assertion to hold true is on the order of $\mu^2 n r (\log n)^6$.

Gradient Descent

Keshavan, Montanari and Oh[6] consider a variant of matrix completion where the rank of the $m$ by $n$ matrix $M$, which is to be recovered, is known to be $r$. They assume Bernoulli sampling of entries, constant aspect ratio $\frac{m}{n}$, bounded magnitude of entries of $M$ (let the upper bound be $M_{\max}$), and constant condition number $\frac{\sigma_1}{\sigma_r}$ (where $\sigma_1$ and $\sigma_r$ are the largest and smallest singular values of $M$ respectively). Further, they assume the two incoherence conditions are satisfied with $\mu_0$ and $\mu_1 \sqrt{r}$, where $\mu_0$ and $\mu_1$ are constants. They then propose the following algorithm:

  1. Let $M^E$ be the matrix that matches $M$ on the set $E$ of observed entries and is 0 elsewhere. Trim $M^E$ by removing all observations from columns with degree larger than $\frac{2|E|}{n}$ by setting the entries in those columns to 0. Similarly remove all observations from rows with degree larger than $\frac{2|E|}{m}$.
  2. Project the trimmed $M^E$ onto its first $r$ principal components. Call the resulting matrix $\operatorname{Tr}(M^E)$. (A code sketch of Steps 1 and 2 appears after this list.)
  3. Solve $\min_{X, Y} \min_{S \in \mathbb{R}^{r \times r}} \frac{1}{2} \sum_{(i,j) \in E} \left( M_{ij} - (X S Y^{\top})_{ij} \right)^2 + \rho G(X, Y)$, where $G(X, Y)$ is some regularization function, by gradient descent with line search. Initialize $(X, Y)$ at $(X_0, Y_0)$, where $\operatorname{Tr}(M^E) = X_0 S_0 Y_0^{\top}$. Set $G(X, Y)$ as some function forcing $(X, Y)$ to remain incoherent throughout gradient descent if $X_0$ and $Y_0$ are incoherent. Return the matrix $X S Y^{\top}$.
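
A minimal sketch of Steps 1 and 2 (trimming followed by a rank-$r$ spectral projection), assuming a 0/1 observation mask. The degree thresholds follow the $\frac{2|E|}{n}$ and $\frac{2|E|}{m}$ rule above, and the rescaling by $\frac{mn}{|E|}$ compensates for the fraction of unobserved entries so that the result estimates $M$ itself; sizes and seed are arbitrary illustration choices.

    import numpy as np

    def trim_and_project(M_E, mask, r):
        # Step 1: zero out over-observed columns and rows.
        m, n = M_E.shape
        num_obs = mask.sum()
        trimmed = M_E.copy()
        trimmed[:, mask.sum(axis=0) > 2 * num_obs / n] = 0.0
        trimmed[mask.sum(axis=1) > 2 * num_obs / m, :] = 0.0
        # Step 2: keep the top-r part of the SVD, rescaled by mn/|E|.
        U, s, Vt = np.linalg.svd(trimmed, full_matrices=False)
        return (m * n / num_obs) * (U[:, :r] * s[:r]) @ Vt[:r]

    rng = np.random.default_rng(4)
    m, n, r = 200, 240, 2
    M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
    mask = (rng.random((m, n)) < 0.1).astype(float)
    M_hat = trim_and_project(mask * M, mask, r)
    print(np.linalg.norm(M_hat - M) / np.linalg.norm(M))  # already a fair estimate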

Steps 1 and 2 of the algorithm yield a matrix $\operatorname{Tr}(M^E)$ very close to the true matrix $M$ (as measured by the root mean square error (RMSE)) with high probability. In particular, with probability $1 - \frac{1}{n^3}$, $\frac{1}{\sqrt{mn}} \lVert M - \operatorname{Tr}(M^E) \rVert_F \le C M_{\max} \sqrt{\frac{nr}{|E|}}$ for some constant $C$, where $\lVert \cdot \rVert_F$ denotes the Frobenius norm. Note that the full suite of assumptions is not needed for this result to hold. The incoherence condition, for example, only comes into play in exact reconstruction. Finally, although trimming may seem counterintuitive as it involves throwing out information, it ensures that projecting $M^E$ onto its first $r$ principal components gives more information about the underlying matrix $M$ than about the observed entries.

In Step 3, the space of candidate matrices $(X, Y)$ can be reduced by noticing that the inner minimization problem has the same solution for $(X, Y)$ as for $(XQ, YR)$, where $Q$ and $R$ are orthogonal $r$ by $r$ matrices. Then gradient descent can be performed over the cross product of two Grassmann manifolds. If $r = O(1)$ and the observed entry set is on the order of $n \log n$, the matrix returned by Step 3 is exactly $M$. Then the algorithm is order optimal, since we know that for the matrix completion problem to not be underdetermined the number of entries must be on the order of $n \log n$.

Problem Variations

Entries not sampled uniformly at random

The assumption that the set of observed entries of the matrix $M$ is sampled uniformly at random is too strict for many real-world applications. In the Netflix problem, for example, customers do not watch movies at random; the set of observed entries is itself skewed according to customer preferences. Singer and Cucuringu[7] use rigidity theory to devise an algorithm that determines whether a unique low-rank approximation to a partially observed matrix exists.

Noisy observation of entries

In another setup of the matrix completion problem, the observations of the true matrix $M$ are not only partial but also noisy over the uniformly sampled set $E$. In particular, the observed matrix $Y$ has entries $Y_{ij} = M_{ij} + Z_{ij}$ for $(i,j) \in E$, where the $Z_{ij}$ are noise terms. Bayesian estimation is one technique that can be applied to this class of problems[8]. It assumes the noise $Z_{ij}$ is independently and identically distributed with mean 0 and is sub-Gaussian with known parameter $\sigma^2$, i.e. $\mathbb{E}\left[ e^{t Z_{ij}} \right] \le e^{\sigma^2 t^2 / 2}$ for all $t \in \mathbb{R}$. For example, Gaussian noise with mean 0 and variance at most $\sigma^2$ satisfies these conditions.

To perform Bayesian estimation, a prior is put on the singular values of $M$ (the matrix we wish to recover) in the following sense: parameters $\gamma_1, \ldots, \gamma_r$ are introduced, where $M = U V^{\top}$ with $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$, and it is assumed that the $k$-th column of $U$ and the $k$-th row of $V^{\top}$ follow a multivariate normal distribution with mean 0 and covariance matrix $\gamma_k I$ ($I$ is the identity matrix). A prior is placed on the $\gamma_k$'s and on the rank $r$ of $M$. If the priors on the $\gamma_k$'s are chosen to be conjugate (e.g. the inverse-gamma distribution), Gibbs sampling or variational Bayesian methods can then be applied to calculate and iteratively update the posterior distribution.
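
A small sketch of drawing from this generative model (purely illustrative: the inverse-gamma hyperparameters, sizes and sampling rate are arbitrary, and the Gibbs or variational updates themselves are omitted):

    import numpy as np

    rng = np.random.default_rng(5)
    m, n, r = 50, 60, 3
    sigma = 0.1                     # sub-Gaussian noise parameter

    # Conjugate inverse-gamma draws for the component scales gamma_k.
    gamma = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=r)

    U = rng.standard_normal((m, r)) * np.sqrt(gamma)             # col k ~ N(0, gamma_k I)
    Vt = rng.standard_normal((r, n)) * np.sqrt(gamma)[:, None]   # row k ~ N(0, gamma_k I)
    M = U @ Vt                      # low-rank matrix drawn from the prior

    mask = rng.random((m, n)) < 0.3    # uniformly sampled observed set E
    Y = np.where(mask, M + sigma * rng.standard_normal((m, n)), np.nan)
    # Y holds the noisy partial observations that a Gibbs sampler or a
    # variational method would condition on when updating the posterior.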

See Also

References

  1. ^ a b c d E. J. Candès and B. Recht, "Exact matrix completion via convex optimization", Found. of Comput. Math., 2008
  2. ^ B. Recht, "A simpler approach to matrix completion", Journal of Machine Learning Research, Vol. 12, pp. 3413–3430, 2011.
  3. ^ a b E. J. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion", arXiv:0903.1476, 2009.
  4. ^ a b T. Tao, "The power of convex relaxation: Near-optimal matrix completion", What's New (blog), 2009. http://terrytao.wordpress.com/2009/03/10/the-power-of-convex-relaxation-near-optimal-matrix-completion/
  5. ^ a b J.-F. Cai, E. J. Candès and Z. Shen, "A singular value thresholding algorithm for matrix completion", arXiv:0810.3286, 2008.
  6. ^ R. H. Keshavan, A. Montanari and S. Oh, "Matrix completion from a few entries", arXiv:0901.3150, 2009.
  7. ^ A. Singer, M. Cucuringu, "Uniqueness of low-rank matrix completion by rigidity theory", arXiv:0902.3846, 2009
  8. ^ P. Alquier, V. Cottet, N. Chopin and J. Rousseau, "Bayesian matrix completion: prior specification", arXiv:1406.1440, 2014.