Article · Wikipedia archive · Last revised Jun 17, 2026

Stress majorization

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of -dimensional data items, a configuration of points in -dimensional space is sought that minimizes the so-called stress function . Usually is or , i.e. the matrix lists points in or dimensional Euclidean space so that the result may be visualised. The function is a cost or loss function that measures the squared differences between ideal distances and actual distances in r-dimensional space. It is defined as:

Last revised
Jun 17, 2026
Read time
≈ 5 min
Length
1,170 w
Citations
7
Source

Stress majorization is an optimization strategy used in multidimensional scaling (MDS) where, for a set of n {\displaystyle n} m {\displaystyle m} -dimensional data items, a configuration X {\displaystyle X} of n {\displaystyle n} points in r {\displaystyle r} ( m ) {\displaystyle (\ll m)} -dimensional space is sought that minimizes the so-called stress function σ ( X ) {\displaystyle \sigma (X)} . Usually r {\displaystyle r} is 2 {\displaystyle 2} or 3 {\displaystyle 3} , i.e. the ( n × r ) {\displaystyle (n\times r)} matrix X {\displaystyle X} lists points in 2 {\displaystyle 2-} or 3 {\displaystyle 3-} dimensional Euclidean space so that the result may be visualised (i.e. an MDS plot). The function σ {\displaystyle \sigma } is a cost or loss function that measures the squared differences between ideal ( m {\displaystyle m} -dimensional) distances and actual distances in r-dimensional space. It is defined as:

σ ( X ) = i < j n w i j ( d i j ( X ) δ i j ) 2 {\displaystyle \sigma (X)=\sum _{i<j\leq n}w_{ij}(d_{ij}(X)-\delta _{ij})^{2}}

where w i j 0 {\displaystyle w_{ij}\geq 0} is a weight for the measurement between a pair of points ( i , j ) {\displaystyle (i,j)} , d i j ( X ) {\displaystyle d_{ij}(X)} is the euclidean distance between i {\displaystyle i} and j {\displaystyle j} and δ i j {\displaystyle \delta _{ij}} is the ideal distance between the points (their separation) in the m {\displaystyle m} -dimensional data space. Note that w i j {\displaystyle w_{ij}} can be used to specify a degree of confidence in the similarity between points (e.g. 0 can be specified if there is no information for a particular pair).

A configuration X {\displaystyle X} which minimizes σ ( X ) {\displaystyle \sigma (X)} gives a plot in which points that are close together correspond to points that are also close together in the original m {\displaystyle m} -dimensional data space.

There are many ways that σ ( X ) {\displaystyle \sigma (X)} could be minimized. For example, Kruskal1 recommended an iterative steepest descent approach. However, a significantly better (in terms of guarantees on, and rate of, convergence) method for minimizing stress was introduced by Jan de Leeuw.2 De Leeuw's iterative majorization method at each step minimizes a simple convex function which both bounds σ {\displaystyle \sigma } from above and touches the surface of σ {\displaystyle \sigma } at a point Z {\displaystyle Z} , called the supporting point. In convex analysis such a function is called a majorizing function. This iterative majorization process is also referred to as the SMACOF algorithm ("Scaling by MAjorizing a COmplicated Function").

The SMACOF algorithm

The stress function σ {\displaystyle \sigma } can be expanded as follows:

σ ( X ) = i < j n w i j ( d i j ( X ) δ i j ) 2 = i < j w i j δ i j 2 + i < j w i j d i j 2 ( X ) 2 i < j w i j δ i j d i j ( X ) {\displaystyle \sigma (X)=\sum _{i<j\leq n}w_{ij}(d_{ij}(X)-\delta _{ij})^{2}=\sum _{i<j}w_{ij}\delta _{ij}^{2}+\sum _{i<j}w_{ij}d_{ij}^{2}(X)-2\sum _{i<j}w_{ij}\delta _{ij}d_{ij}(X)}

Note that the first term is a constant C {\displaystyle C} and the second term is quadratic in X {\displaystyle X} (i.e. for the Hessian matrix V {\displaystyle V} the second term is equivalent to tr X V X {\displaystyle X'VX} ) and therefore relatively easily solved. The third term is bounded by:

i < j w i j δ i j d i j ( X ) = tr X B ( X ) X tr X B ( Z ) Z {\displaystyle \sum _{i<j}w_{ij}\delta _{ij}d_{ij}(X)=\,\operatorname {tr} \,X'B(X)X\geq \,\operatorname {tr} \,X'B(Z)Z}

where B ( Z ) {\displaystyle B(Z)} has:

b i j = w i j δ i j d i j ( Z ) {\displaystyle b_{ij}=-{\frac {w_{ij}\delta _{ij}}{d_{ij}(Z)}}} for d i j ( Z ) 0 , i j {\displaystyle d_{ij}(Z)\neq 0,i\neq j}

and b i j = 0 {\displaystyle b_{ij}=0} for d i j ( Z ) = 0 , i j {\displaystyle d_{ij}(Z)=0,i\neq j}

and b i i = j = 1 , j i n b i j {\displaystyle b_{ii}=-\sum _{j=1,j\neq i}^{n}b_{ij}} .

Proof of this inequality is by the Cauchy-Schwarz inequality, see Borg3 (pp. 152–153).

Thus, we have a simple quadratic function τ ( X , Z ) {\displaystyle \tau (X,Z)} that majorizes stress:

σ ( X ) = C + tr X V X 2 tr X B ( X ) X {\displaystyle \sigma (X)=C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(X)X}
C + tr X V X 2 tr X B ( Z ) Z = τ ( X , Z ) {\displaystyle \leq C+\,\operatorname {tr} \,X'VX-2\,\operatorname {tr} \,X'B(Z)Z=\tau (X,Z)}


The iterative minimization procedure is then:

  • at the k t h {\displaystyle k^{th}} step we set Z X k 1 {\displaystyle Z\leftarrow X^{k-1}}
  • X k min X τ ( X , Z ) {\displaystyle X^{k}\leftarrow \min _{X}\tau (X,Z)}
  • stop if σ ( X k 1 ) σ ( X k ) < ϵ {\displaystyle \sigma (X^{k-1})-\sigma (X^{k})<\epsilon } otherwise repeat.

This algorithm has been shown to decrease stress monotonically (see de Leeuw2).

Use in graph drawing

Stress majorization and algorithms similar to SMACOF also have application in the field of graph drawing.45 That is, one can find a reasonably aesthetically appealing layout for a network or graph by minimizing a stress function over the positions of the nodes in the graph. In this case, the δ i j {\displaystyle \delta _{ij}} are usually set to the graph-theoretic distances between nodes i {\displaystyle i} and j {\displaystyle j} and the weights w i j {\displaystyle w_{ij}} are taken to be δ i j α {\displaystyle \delta _{ij}^{-\alpha }} . Here, α {\displaystyle \alpha } is chosen as a trade-off between preserving long- or short-range ideal distances. Good results have been shown for α = 2 {\displaystyle \alpha =2} .6

References

References

  1. Kruskal, J. B. (1964), "Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis", Psychometrika, 29 (1): 1–27, doi:10.1007/BF02289565.
  2. de Leeuw, J. (1977), "Applications of convex analysis to multidimensional scaling", in Barra, J. R.; Brodeau, F.; Romie, G.; et al. (eds.), Recent developments in statistics, pp. 133–145.
  3. Borg, I.; Groenen, P. (1997), Modern Multidimensional Scaling: theory and applications, New York: Springer-Verlag.
  4. Michailidis, G.; de Leeuw, J. (2001), "Data visualization through graph drawing", Computation Stat., 16 (3): 435–450, CiteSeerX 10.1.1.28.9372, doi:10.1007/s001800100077.
  5. Gansner, E.; Koren, Y.; North, S. (2004), "Graph Drawing by Stress Majorization", Proceedings of 12th Int. Symp. Graph Drawing (GD'04), Lecture Notes in Computer Science, vol. 3383, Springer-Verlag, pp. 239–250.
  6. Cohen, J. (1997), "Drawing graphs to convey proximity: an incremental arrangement method", ACM Transactions on Computer-Human Interaction, 4 (3): 197–229, doi:10.1145/264645.264657.