Open main menu

Wikipedia β

Sparse approximation

A sparse approximation is a sparse vector that approximately solves a system of equations. Techniques for finding sparse approximations have found wide use in applications such as image processing, audio processing, biology, and document analysis[citation needed].


Sparse decompositionEdit

Noiseless observationsEdit

Consider a linear system of equations  , where   is an underdetermined   matrix   and  .  , called the dictionary or the design matrix, is given. The problem is to estimate the signal  , subject to the constraint that it is sparse. The underlying motivation for sparse decomposition problems is that even though the signal is in high-dimensional ( ) space, it can actually be obtained in some lower-dimensional subspace ( ) due to it being sparse ( ).

Sparsity implies that only a few ( ) components of   are non-zero and the rest are zero. This implies that   can be decomposed as a linear combination of only a few   vectors in  , called atoms. The column-span of   is over-complete  . Such vectors are sometimes called the basis of  , even though being over-complete means they are not a basis. In addition, unlike other dimensionality reducing decomposition techniques such as Principal Component Analysis, the basis vectors are not required to be orthogonal.

The sparse decomposition problem is represented as,


where   is a pseudo-norm,  , which counts the number of non-zero components of  . This problem is NP-Hard with a reduction to NP-complete subset selection problems in combinatorial optimization. A convex relaxation of the problem can instead be obtained by taking the   norm instead of the   norm, where  . The   norm induces sparsity under certain conditions involving the mutual coherence of  .[1] The   problem is called basis pursuit.

Noisy observationsEdit

Often the observations   are noisy. By imposing an   norm on the data-fitting term and relaxing the equality constraint, the sparse decomposition problem is given by,


where   is a slack variable (similar to a Lagrange multiplier) and   is the sparsity-inducing   term. The slack variable balances the trade-off between fitting the data perfectly, and employing a sparse solution. This problem is called basis pursuit denoising.


There are several variations to the basic sparse approximation problem.

Structured sparsityEdit

In the original version of the problem, any atoms in the dictionary can be picked. In the structured (block) sparsity model, instead of picking atoms individually, groups of atoms are to be picked. These groups can be overlapping and of varying size. The objective is to represent   such that it is sparse in the number of groups selected. Such groups appear naturally in many problems. For example, in object classification problems the atoms can represent images, and groups can represent category of objects.

Collaborative sparse codingEdit

The original version of the problem is defined for only a single point   and its noisy observation. Often, a single point can have more than one sparse representation with similar data fitting errors. In the collaborative sparse coding model, more than one observation of the same point is available. Hence, the data fitting error is defined as the sum of the   norm for all points.


There are several algorithms that have been developed for solving sparse approximation problem.

Matching pursuitEdit

Matching pursuit is a greedy iterative algorithm for approximatively solving the original   pseudo-norm problem. Matching pursuit works by finding a basis vector in   that maximizes the correlation with the residual (initialized to  ), and then recomputing the residual and coefficients by projecting the residual on all atoms in the dictionary using existing coefficients. Matching pursuit suffers from the drawback that an atom can be picked multiple times which is addressed in orthogonal matching pursuit.

Orthogonal matching pursuitEdit

Orthogonal Matching Pursuit is similar to Matching Pursuit, except that an atom once picked, cannot be picked again. The algorithm maintains an active set of atoms already picked, and adds a new atom at each iteration. The residual is projected on to a linear combination of all atoms in the active set, so that an orthogonal updated residual is obtained. Both Matching Pursuit and Orthogonal Matching Pursuit use the   norm.


The LASSO method solves the   norm version of the problem. In LASSO, instead of projecting the residual on some atom as in Matching Pursuit, the residual is moved by a small step in the direction of the atom iteratively.

Projected Gradient DescentEdit

Projected Gradient Descent methods operate in a similar fashion with the Gradient Descent: the current gradient provides the information to point to new search directions. Since we are looking for a sparse solution, the putative solutions are projected onto the sparse scaffold of   vectors.[2] [3] Because the projection can often be viewed as a thresholding operator, the described algorithm is also known as Iterative Thresholding algorithm. [4] The specific form of the thresholding operator is closely related to the chosen penalty function. For the   norm, the corresponding thresholding operator is known as hard thresholding. For the   norm, the corresponding thresholding operator is known as soft thresholding.

Other methodsEdit

There are several other methods for solving sparse decomposition problems[5]

  • Homotopy method
  • Coordinate descent
  • First order/proximal methods
  • Dantzig selector[6]
  • Sparse Karhunen-Loeve Transform (SKLT)[7]


Sparse approximation has been used in image processing, biology, document analysis, and audio analysis for representation, compression, and estimation.

Audio AnalysisEdit

In the audio domain, sparse approximation has been applied to the analysis of speech, environmental sounds, and music. For classification of everyday sound samples, Adiloglu et al.[8] decomposed sounds in terms of a dictionary of Gammatone functions. Applying matching pursuit, they yielded a point pattern of time-frequency components. They then defined a dissimilarity of two sounds via a one-to-one correspondence between the most prominent atoms of two sounds. Scholler and Purwins [9] have used sparse approximation for the classification of drum sounds based on atom counts resulting from a sparse approximation with a learned sound dictionary including the optimisation of the atom length.

Computer VisionEdit

Sparse coding combined with spatial pyramid matching in approach[10] showed robust recognition performance on Caltech 101 database. In evaluations with the Bag-of-Words model,[11][12] sparse coding was found empirically to outperform other coding approaches on the object category recognition tasks.

See alsoEdit


  1. ^ Donoho, D.L. (2006). "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution" (PDF). Communications on pure and applied mathematics. Wiley Online Library. 56 (6): 797–829. doi:10.1002/cpa.20132. 
  2. ^ Deanna Needell and Joel Tropp. "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples". arXiv:0803.2392 . 
  3. ^ Anastasios Kyrillidis & Volkan Cevher. "Recipes for hard thresholding methods". 
  4. ^ Yu Wang, Jinshan Zeng, Zhimin Peng, Xiangyu Chang and Zongben Xu. "On Linear Convergence of Adaptively Iterative Thresholding Algorithms for Compressed Sensing". arXiv:1408.6890 . 
  5. ^ Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro. "Sparse Coding and Dictionary Learning for Image Analysis" (PDF). 
  6. ^ Candes, Emmanuel; Tao, Terence (2007). "The Dantzig selector: Statistical estimation when p is much larger than n". Annals of Statistics. 35 (6): 2313–2351. arXiv:math/0506081 . doi:10.1214/009053606000001523. MR 2382644. 
  7. ^ Yilmaz, Onur; Akansu, Ali N. (2015). "Quantization of Eigen Subspace for Sparse Representation" (PDF). IEEE Transactions on Signal Processing. 63 (14): 3616–3625. 
  8. ^ Adiloglu, Kamil; Annies, Robert; Wahlen, Elio; Purwins, Hendrik; Obermayer, Klaus (2012). "A Graphical Representation and Dissimilarity Measure for Basic Everyday Sound Events". IEEE Journal of Selected Topics in Signal Processing. 20 (5): 1542–1552. doi:10.1109/TASL.2012.2184752. 
  9. ^ Scholler, Simon; Purwins, Hendrik (2011). "Sparse Approximations for Drum Sound Classification". IEEE Journal of Selected Topics in Signal Processing. 5 (5): 933–940. doi:10.1109/JSTSP.2011.2161264. 
  10. ^ Jianchao Yang; Kai Yu; Yihong Gong; Huang, T. (2009). "Linear spatial pyramid matching using sparse coding for image classification". 2009 IEEE Conference on Computer Vision and Pattern Recognition. p. 1794. doi:10.1109/CVPR.2009.5206757. ISBN 978-1-4244-3992-8. 
  11. ^ Koniusz, Piotr; Yan, Fei; Mikolajczyk, Krystian (2013-05-01). "Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection". Computer Vision and Image Understanding. 117 (5): 479–492. doi:10.1016/j.cviu.2012.10.010. ISSN 1077-3142. 
  12. ^ Koniusz, Piotr; Yan, Fei; Gosselin, Philippe Henri; Mikolajczyk, Krystian (2017-02-24). "Higher-order occurrence pooling for bags-of-words: Visual concept detection". IEEE Transactions on Pattern Analysis and Machine Intelligence. 39 (2): 313–326. doi:10.1109/TPAMI.2016.2545667. ISSN 0162-8828.