Stress majorization

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of $n$ $m$ -dimensional data items, a configuration $X$ of $n$ points in $r$ $(\ll m)$ -dimensional space is sought that minimizes the so-called stress function $\sigma (X)$ . Usually $r$ is $2$ or $3$ , i.e. the $(n\times r)$ matrix $X$ lists points in $2-$ or $3-$ dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function $\sigma$ is a cost or loss function that measures the squared differences between ideal ( $m$ -dimensional) distances and actual distances in r-dimensional space. It is defined as:

\sigma (X)=\sum _{i<j\leq n}w_{ij}(d_{ij}(X)-\delta _{ij})^{2}

where $w_{ij}\geq 0$ is a weight for the measurement between a pair of points $(i,j)$ , $d_{ij}(X)$ is the euclidean distance between $i$ and $j$ and $\delta _{ij}$ is the ideal distance between the points (their separation) in the $m$ -dimensional data space. Note that $w_{ij}$ can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

A configuration $X$ which minimizes $\sigma (X)$ gives a plot in which points that are close together correspond to points that are also close together in the original $m$ -dimensional data space.

There are many ways that $\sigma (X)$ could be minimized. For example, Kruskal^[1] recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.^[2] De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds $\sigma$ from above and touches the surface of $\sigma$ at a point $Z$ , called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").

The SMACOF algorithm

The stress function $\sigma$ can be expanded as follows:

\sigma (X)=\sum _{i<j\leq n}w_{ij}(d_{ij}(X)-\delta _{ij})^{2}=\sum _{i<j}w_{ij}\delta _{ij}^{2}+\sum _{i<j}w_{ij}d_{ij}^{2}(X)-2\sum _{i<j}w_{ij}\delta _{ij}d_{ij}(X)

Note that the first term is a constant $C$ and the second term is quadratic in $X$ (i.e. for the Hessian matrix $V$ the second term is equivalent to tr $X'VX$ ) and therefore relatively easily solved. The third term is bounded by:

\sum _{i<j}w_{ij}\delta _{ij}d_{ij}(X)=\,\operatorname {tr} \,X'B(X)X\geq \,\operatorname {tr} \,X'B(Z)Z

where $B(Z)$ has:

b_{ij}=-{\frac {w_{ij}\delta _{ij}}{d_{ij}(Z)}}

for

d_{ij}(Z)\neq 0,i\neq j

and $b_{ij}=0$ for $d_{ij}(Z)=0,i\neq j$

and $b_{ii}=-\sum _{j=1,j\neq i}^{n}b_{ij}$ .

Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg^[3] (pp. 152–153).

Thus, we have a simple quadratic function $\tau (X,Z)$ that majorizes stress:

\sigma (X)=C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(X)X

\leq C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(Z)Z=\tau (X,Z)

The iterative minimization procedure is then:

at the $k^{th}$ step we set $Z\leftarrow X^{k-1}$
$X^{k}\leftarrow \min _{X}\tau (X,Z)$
stop if $\sigma (X^{k-1})-\sigma (X^{k})<\epsilon$ otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw^[2]).

Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.^[4]^[5] That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the $\delta _{ij}$ are usually set to the graph-theoretic distances between nodes $i$ and $j$ and the weights $w_{ij}$ are taken to be $\delta _{ij}^{-\alpha }$ . Here, $\alpha$ is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for $\alpha =2$ .^[6]

References

^ Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1): 1–27, doi:10.1007/BF02289565.
^ ^a ^b de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G.; et al. (eds.), Recent developments in statistics, pp. 133–145.
^ Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag.
^ Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computation Stat., 16 (3): 435–450, CiteSeerX 10.1.1.28.9372, doi:10.1007/s001800100077.
^ Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, vol. 3383, Springer-Verlag, pp. 239–250.
^ Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229, doi:10.1145/264645.264657.

[1] Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1): 1–27, doi:10.1007/BF02289565.

[de_Leeuw-2] Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G.; et al. (eds.), Recent developments in statistics, pp. 133–145.

[borg-3] Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag.

[4] Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computation Stat., 16 (3): 435–450, CiteSeerX 10.1.1.28.9372, doi:10.1007/s001800100077.

[5] Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, vol. 3383, Springer-Verlag, pp. 239–250.

[6] Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229, doi:10.1145/264645.264657.

[1]

[2]

[3]

[4]

[5]

[6]