Part III: Decompositions | Chapter 2

Singular Value Decomposition

The most important matrix factorization in applied mathematics

Historical Context

The SVD was independently discovered by Eugenio Beltrami (1873) and Camille Jordan (1874) for bilinear forms, and later rediscovered by James Joseph Sylvester (1889). Erhard Schmidt extended it to infinite-dimensional operators in 1907. The computational SVD algorithms were developed by Gene Golub and William Kahan (1965), making the decomposition practical for large-scale computation.

The Eckart-Young theorem (1936) established the SVD's role in optimal low-rank approximation, which underpins modern data science: principal component analysis, latent semantic analysis, recommender systems, and image compression all rely on truncated SVDs.

3.1 The SVD Theorem

Theorem (Singular Value Decomposition)

Every matrix $A \in \mathbb{R}^{m \times n}$ can be factored as $A = U\Sigma V^T$, where $U \in \mathbb{R}^{m \times m}$ is orthogonal, $V \in \mathbb{R}^{n \times n}$ is orthogonal, and $\Sigma \in \mathbb{R}^{m \times n}$ has non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$ only on the diagonal. The $\sigma_i$ are the singular values.

The SVD exists for every matrix (including rectangular and rank-deficient ones), unlike eigendecomposition. The singular values are the square roots of the eigenvalues of $A^TA$ (or equivalently $AA^T$): $\sigma_i = \sqrt{\lambda_i(A^TA)}$.

The outer product form expresses $A$ as a sum of rank-1 matrices: $A = \sum_{i=1}^r \sigma_i u_i v_i^T$, where $r = \text{rank}(A)$.
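Both facts are easy to check numerically. The sketch below, using NumPy on an arbitrary illustrative matrix, reconstructs $A$ from its rank-1 outer products and confirms that the singular values are the square roots of the eigenvalues of $A^TA$:

```python
import numpy as np

# Illustrative matrix; any real matrix admits an SVD.
A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)  # thin SVD

# Reconstruct A as a sum of rank-1 outer products sigma_i * u_i v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# Singular values are the square roots of the eigenvalues of A^T A.
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s, np.sqrt(eigvals))
```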

3.2 Geometric Interpretation

The SVD reveals that every linear transformation is a rotation → scaling → rotation. The matrix $V^T$ rotates the input space, $\Sigma$ scales along the coordinate axes by the singular values, and $U$ rotates the output space. Under $A$, the unit sphere maps to an ellipsoid whose semi-axes have lengths $\sigma_1, \sigma_2, \ldots$ and directions $u_1, u_2, \ldots$
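The circle-to-ellipse picture can be verified directly: sampling the unit circle and mapping it through an example $2 \times 2$ matrix (chosen here purely for illustration), the longest and shortest radii of the image match $\sigma_1$ and $\sigma_2$:

```python
import numpy as np

# Under A, the unit circle maps to an ellipse with semi-axes
# sigma_1, sigma_2 along the directions u_1, u_2.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0, 2 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle
image = A @ circle

radii = np.linalg.norm(image, axis=0)
assert np.isclose(radii.max(), s[0], atol=1e-3)  # longest semi-axis = sigma_1
assert np.isclose(radii.min(), s[1], atol=1e-3)  # shortest semi-axis = sigma_2

# The maximum stretch is attained at the right singular vector v_1.
assert np.isclose(np.linalg.norm(A @ Vt[0]), s[0])
```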

3.3 Computing the SVD

The SVD can be computed from the eigendecompositions of $A^TA$ and $AA^T$: the columns of $V$ are eigenvectors of $A^TA$, the columns of $U$ are eigenvectors of $AA^T$, and $\sigma_i = \sqrt{\lambda_i}$. In practice, the Golub-Kahan bidiagonalization followed by the implicit QR algorithm is the standard approach, achieving $O(mn^2)$ complexity for an $m \times n$ matrix with $m \geq n$.
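The eigendecomposition route can be sketched as below. Note this is illustrative only: forming $A^TA$ squares the condition number, which is exactly why library routines use bidiagonalization instead.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # arbitrary full-rank example

# Right singular vectors and sigma^2 from the eigendecomposition of A^T A.
lam, V = np.linalg.eigh(A.T @ A)       # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]          # reorder to descending
sigma = np.sqrt(lam[order])
V = V[:, order]

# Left singular vectors via u_i = A v_i / sigma_i.
U = (A @ V) / sigma

assert np.allclose(U @ np.diag(sigma) @ V.T, A)
# Agrees with the library SVD's singular values.
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
```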

3.4 Low-Rank Approximation

Theorem (Eckart-Young-Mirsky)

The best rank-$k$ approximation to $A$ in both the Frobenius and operator norms is the truncated SVD $A_k = \sum_{i=1}^k \sigma_i u_i v_i^T$:

$$\min_{\text{rank}(B) \leq k} \|A - B\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}, \qquad \min_{\text{rank}(B) \leq k} \|A - B\|_2 = \sigma_{k+1}$$
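Both error formulas can be checked on a random example (the matrix below is illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # truncated SVD

# Frobenius error equals sqrt(sigma_{k+1}^2 + ... + sigma_r^2) ...
assert np.isclose(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))
# ... and the operator-norm error equals sigma_{k+1}.
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```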

3.5 Applications

  • Pseudoinverse: $A^+ = V\Sigma^+U^T$ where $\Sigma^+$ inverts the nonzero singular values
  • PCA: The top-$k$ right singular vectors capture the $k$ directions of maximum variance
  • Image compression: Truncated SVD stores $k(m + n + 1)$ values instead of $mn$
  • Condition number: $\kappa(A) = \sigma_1/\sigma_r$ measures sensitivity to perturbation
  • Recommender systems: Matrix factorization via SVD powers collaborative filtering
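The image-compression storage count is worth a quick sanity check. With hypothetical image dimensions chosen for illustration, the rank-$k$ factors need $k(m + n + 1)$ values (the $k$ left vectors, $k$ right vectors, and $k$ singular values):

```python
# Storage count for rank-k SVD compression of an m x n image.
m, n, k = 512, 768, 20          # illustrative dimensions
full = m * n                    # dense storage
compressed = k * (m + n + 1)    # truncated-SVD storage
print(full, compressed, full / compressed)  # ratio ~ 15x here
```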

Computational Laboratory

This simulation computes the SVD, verifies the factorization, demonstrates low-rank approximation with the Eckart-Young theorem, constructs the pseudoinverse, and visualizes the geometric action of SVD as circle-to-ellipse mapping.

Singular Value Decomposition: Theory and Applications

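The companion script itself is not reproduced here; a minimal sketch of the same experiments, assuming NumPy, might look like:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((6, 4))  # illustrative matrix

# 1. Compute and verify the factorization A = U Sigma V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(s) @ Vt, A)
assert np.allclose(U.T @ U, np.eye(4))   # orthonormal columns
assert np.allclose(Vt @ Vt.T, np.eye(4))

# 2. Low-rank approximation and the Eckart-Young error.
for k in range(1, 4):
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    err = np.linalg.norm(A - A_k, 'fro')
    predicted = np.sqrt(np.sum(s[k:] ** 2))
    print(f"rank {k}: error {err:.4f}  (predicted {predicted:.4f})")

# 3. Pseudoinverse A+ = V Sigma+ U^T and a least-squares solve.
A_pinv = Vt.T @ np.diag(1 / s) @ U.T
assert np.allclose(A_pinv, np.linalg.pinv(A))
b = rng.standard_normal(6)
x = A_pinv @ b                  # least-squares solution of Ax = b

# 4. Geometry: no unit vector is stretched by more than sigma_1.
v = rng.standard_normal(4)
v /= np.linalg.norm(v)
assert np.linalg.norm(A @ v) <= s[0] + 1e-12
```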

Practice Problems

Problem 1: Compute the full SVD of $A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$.

Solution:

1. Compute $A^TA = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}$. Eigenvalues: $\lambda_1 = 2, \lambda_2 = 1, \lambda_3 = 0$.

2. Singular values: $\sigma_1 = \sqrt{2}, \sigma_2 = 1$. The rank is 2.

3. Right singular vectors (eigenvectors of $A^TA$): $v_1 = \frac{1}{\sqrt{2}}(1, 0, 1)^T$, $v_2 = (0, 1, 0)^T$, $v_3 = \frac{1}{\sqrt{2}}(-1, 0, 1)^T$.

4. Left singular vectors: $u_i = \frac{1}{\sigma_i}Av_i$. $u_1 = \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}(2, 0)^T = (1, 0)^T$. $u_2 = (0, 1)^T$.

5. $U = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\Sigma = \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$, $V^T = \begin{pmatrix} 1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \end{pmatrix}$.

6. Verify: $U\Sigma V^T = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} = A$.
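The hand computation can be confirmed numerically:

```python
import numpy as np

# Numerical check of Problem 1.
A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
s = np.linalg.svd(A, compute_uv=False)
assert np.allclose(s, [np.sqrt(2), 1.0])   # sigma_1 = sqrt(2), sigma_2 = 1

r2 = 1 / np.sqrt(2)
U = np.eye(2)
Sigma = np.array([[np.sqrt(2), 0, 0], [0, 1, 0]])
Vt = np.array([[r2, 0, r2], [0, 1, 0], [-r2, 0, r2]])
assert np.allclose(U @ Sigma @ Vt, A)      # the factorization reproduces A
```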

Problem 2: Given a matrix with singular values $\sigma = (10, 5, 2, 0.5, 0.1)$, find the best rank-2 approximation error in the Frobenius norm.

Solution:

1. By the Eckart-Young theorem, the best rank-$k$ approximation error is $\|A - A_k\|_F = \sqrt{\sigma_{k+1}^2 + \cdots + \sigma_r^2}$.

2. For $k = 2$: $\|A - A_2\|_F = \sqrt{\sigma_3^2 + \sigma_4^2 + \sigma_5^2} = \sqrt{4 + 0.25 + 0.01} = \sqrt{4.26} \approx 2.064$.

3. The total energy is $\|A\|_F^2 = 100 + 25 + 4 + 0.25 + 0.01 = 129.26$.

4. The fraction captured by rank-2: $(100 + 25)/129.26 = 96.7\%$.

5. The relative error is $\sqrt{4.26/129.26} \approx 18.2\%$ in the Frobenius norm.

6. In the operator (2-)norm: $\|A - A_2\|_2 = \sigma_3 = 2$, since the truncation reproduces the top two singular directions exactly and the largest discarded singular value governs the error.
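The same arithmetic in a few lines:

```python
import numpy as np

# Problem 2 done numerically from the singular values alone.
s = np.array([10.0, 5.0, 2.0, 0.5, 0.1])
k = 2

frob_err = np.sqrt(np.sum(s[k:] ** 2))
assert np.isclose(frob_err, np.sqrt(4.26))        # ~ 2.064

energy = np.sum(s ** 2)                           # ||A||_F^2 = 129.26
captured = np.sum(s[:k] ** 2) / energy
assert np.isclose(captured, 125 / 129.26)         # ~ 96.7% of the energy
assert np.isclose(s[k], 2.0)                      # operator-norm error sigma_3
```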

Problem 3: Compute the pseudoinverse of $A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}$ via SVD and solve $Ax = (3, 2)^T$ in the least-norm sense.

Solution:

1. From Problem 1: $A = U\Sigma V^T$ with $\sigma_1 = \sqrt{2}, \sigma_2 = 1$.

2. The pseudoinverse is $A^+ = V\Sigma^+ U^T$ where $\Sigma^+ = \begin{pmatrix} 1/\sqrt{2} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$.

3. $A^+ = V\Sigma^+ U^T = \begin{pmatrix} 1/2 & 0 \\ 0 & 1 \\ 1/2 & 0 \end{pmatrix}$.

4. The minimum-norm solution: $x^* = A^+ b = \begin{pmatrix} 1/2 & 0 \\ 0 & 1 \\ 1/2 & 0 \end{pmatrix}\begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 3/2 \\ 2 \\ 3/2 \end{pmatrix}$.

5. Verify: $Ax^* = \begin{pmatrix} 3/2 + 3/2 \\ 2 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \end{pmatrix}$. Correct.

6. This is the shortest solution: $\|x^*\| = \sqrt{9/4 + 4 + 9/4} = \sqrt{17/2} \approx 2.92$, which is minimal among all solutions $x = (3/2 - t, 2, 3/2 + t)^T$.
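A quick numerical confirmation of the pseudoinverse and the minimum-norm solution:

```python
import numpy as np

# Problem 3, checked with NumPy's pinv.
A = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
A_pinv = np.linalg.pinv(A)
assert np.allclose(A_pinv, [[0.5, 0], [0, 1], [0.5, 0]])

b = np.array([3.0, 2.0])
x = A_pinv @ b
assert np.allclose(x, [1.5, 2.0, 1.5])            # minimum-norm solution
assert np.allclose(A @ x, b)                      # solves Ax = b exactly
assert np.isclose(np.linalg.norm(x), np.sqrt(8.5))  # ||x|| = sqrt(17/2)
```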

Problem 4: Explain the connection between SVD and PCA. Given centered data matrix $X$ ($n \times p$), how do the right singular vectors relate to principal components?

Solution:

1. The sample covariance matrix is $C = \frac{1}{n-1}X^TX$. PCA seeks its eigenvectors.

2. Let $X = U\Sigma V^T$ be the SVD. Then $X^TX = V\Sigma^2 V^T$.

3. So $C = \frac{1}{n-1}V\Sigma^2 V^T$. The eigenvectors of $C$ are the columns of $V$ (right singular vectors of $X$).

4. The eigenvalues of $C$ are $\sigma_i^2/(n-1)$, representing the variance along each principal direction.

5. The $k$-th principal component scores are $z_k = Xv_k = \sigma_k u_k$, the projection of data onto the $k$-th right singular vector.

6. The fraction of variance explained by the top $k$ components is $\sum_{i=1}^k \sigma_i^2 / \sum_{i=1}^r \sigma_i^2$. SVD computes PCA without ever forming $X^TX$, which is numerically superior.
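The equivalence can be demonstrated on synthetic data (the matrix below is illustrative; any centered data matrix works):

```python
import numpy as np

# PCA via SVD vs. via the covariance eigendecomposition.
rng = np.random.default_rng(7)
X = rng.standard_normal((100, 3)) @ np.diag([3.0, 1.0, 0.3])
X -= X.mean(axis=0)                 # center the data
n = X.shape[0]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
var_svd = s ** 2 / (n - 1)          # variance along each principal direction

C = X.T @ X / (n - 1)               # sample covariance (never needed in practice)
lam, W = np.linalg.eigh(C)
assert np.allclose(np.sort(var_svd), np.sort(lam))

# Principal component scores: X v_k = sigma_k u_k.
assert np.allclose(X @ Vt[0], s[0] * U[:, 0])
```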

Problem 5: A matrix has singular values $\sigma_1 = 100$ and $\sigma_2 = 0.01$. Find its condition number and explain the implications for solving $Ax = b$.

Solution:

1. The condition number is $\kappa(A) = \sigma_{\max}/\sigma_{\min} = 100/0.01 = 10{,}000$.

2. This means a relative perturbation $\|\delta b\|/\|b\| = \epsilon$ can produce a relative error in the solution up to $\|\delta x\|/\|x\| \leq \kappa \cdot \epsilon = 10{,}000\,\epsilon$.

3. In floating-point arithmetic with machine epsilon $\epsilon_{\text{mach}} \approx 10^{-16}$ (double precision), we lose about $\log_{10}(\kappa) = 4$ digits of accuracy.

4. The solution then retains roughly $16 - 4 = 12$ reliable significant digits.

5. Geometrically, the matrix maps the unit circle to a very elongated ellipse (aspect ratio $10{,}000$:1), so small changes in $b$ along the short axis produce large changes in $x$.

6. Mitigation strategies: use higher precision arithmetic, regularization (e.g., Tikhonov: solve $(A^TA + \alpha I)x = A^Tb$), or truncated SVD (discard components with $\sigma_i$ below a threshold).
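The amplification effect can be seen directly by building a matrix with exactly these singular values (the orthogonal factors below are arbitrary illustrative choices):

```python
import numpy as np

# Construct A with sigma = (100, 0.01), so kappa(A) = 10^4.
U, _ = np.linalg.qr(np.array([[1.0, 1.0], [1.0, -1.0]]))
V, _ = np.linalg.qr(np.array([[2.0, 1.0], [1.0, 1.0]]))
A = U @ np.diag([100.0, 0.01]) @ V.T
assert np.isclose(np.linalg.cond(A), 1e4)

b = A @ np.array([1.0, 1.0])        # exact solution is (1, 1)
x = np.linalg.solve(A, b)

# Perturb b slightly along u_2, the short axis of the ellipse;
# the relative error in x can grow by up to kappa = 10^4.
db = 1e-6 * U[:, 1]
dx = np.linalg.solve(A, b + db) - x
print(np.linalg.norm(dx) / np.linalg.norm(x))
```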
