This post summarizes material from the book Matrix Computations by Gene Golub and Charles Van Loan, together with some other references.
Singular Value Decomposition (SVD)
Theorem 1 (SVD) If \(A\) is a real \(m \times n\) matrix, then there exist orthogonal matrices \[ U = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix}, \quad V = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix} \] such that \[ U^T A V = \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n},\quad p = \min(m, n), \] where \(\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0\) are the singular values of \(A\). The columns of \(U\) and \(V\) are the left and right singular vectors of \(A\), respectively.
Corollary 1 If \(U^T A V = \Sigma\) is the SVD of \(A\in\mathbb{R}^{m\times n}\) and \(m \geq n\), then for \(i=1:n\), \(Av_i = \sigma_i u_i\) and \(A^Tu_i = \sigma_i v_i\).
Corollary 2 If \(A\in\mathbb{R}^{m\times n}\), then \[ \|A\|_2 = \sigma_1, \quad \|A\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_p^2}. \]
Proof. This follows directly from the fact that orthogonal transformations preserve both the 2-norm and the Frobenius norm.
Corollary 3 (Subspaces) If \(A\) has \(r\) positive singular values, then \(\operatorname{rank}(A) = r\) and \[ \begin{aligned} \operatorname{range}(A) &= \operatorname{span}\{u_1, \ldots, u_r\}, \\ \operatorname{null}(A) &= \operatorname{span}\{v_{r+1}, \ldots, v_n\}, \\ \operatorname{range}(A^T) &= \operatorname{span}\{v_1, \ldots, v_r\}, \\ \operatorname{null}(A^T) &= \operatorname{span}\{u_{r+1}, \ldots, u_m\}. \end{aligned} \]
Corollary 4 (Symmetric Matrices) If \(A\) is symmetric with eigendecomposition \(A = Q\Lambda Q^T\), then its singular values are \(|\lambda_i|\); in particular, if \(A\) is also positive semidefinite, the eigendecomposition is itself an SVD, \(A = U \Sigma U^T\) with \(U = Q\). Trivially, \(\operatorname{range}(A) = \operatorname{range}(A^T)\) and \(\operatorname{null}(A) = \operatorname{null}(A^T)\).
The SVD of a symmetric matrix can be obtained from torch.linalg.eigh in PyTorch.
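A minimal sketch of this (using a random symmetric matrix as the example, not one from the book): take absolute values of the eigenvalues returned by torch.linalg.eigh and flip the signs of the corresponding right singular vectors.

```python
import torch

torch.manual_seed(0)
B = torch.randn(5, 5)
A = B + B.T                                   # symmetric, possibly indefinite

lam, Q = torch.linalg.eigh(A)                 # A = Q diag(lam) Q^T, lam ascending
sigma = lam.abs()                             # singular values are |eigenvalues|
order = torch.argsort(sigma, descending=True)
sigma, lam, Q = sigma[order], lam[order], Q[:, order]

U = Q                                         # left singular vectors
V = Q * torch.sign(lam)                       # flip columns with negative eigenvalues

print(torch.allclose(U @ torch.diag(sigma) @ V.T, A, atol=1e-5))  # True
```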
Corollary 5 (Connection to Eigenvalues) \[ A^TA = V \Sigma^T U^T U \Sigma V^T = V \operatorname{diag}(\sigma_1^2,\cdots,\sigma_n^2) V^T, \quad AA^T = U \Sigma V^T V \Sigma U^T = U \operatorname{diag}(\sigma_1^2,\cdots,\sigma_m^2) U^T \]
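A quick numerical check of this correspondence on a random example matrix:

```python
import torch

torch.manual_seed(0)
A = torch.randn(6, 4)

sigma = torch.linalg.svdvals(A)               # singular values, descending
lam = torch.linalg.eigvalsh(A.T @ A)          # eigenvalues of A^T A, ascending

# Eigenvalues of A^T A coincide with the squared singular values of A.
print(torch.allclose(lam, torch.sort(sigma**2).values, atol=1e-5))  # True
```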
Projections
\(P\in\mathbb{R}^{n\times n}\) is the orthogonal projection onto \(S\) if \(\operatorname{range}(P) = S\), \(P^2 = P\), and \(P^T = P\). It can be shown that the orthogonal projection onto a subspace is unique, and if the columns of \(V\) form an orthonormal basis of \(S\), then the unique orthogonal projection operator is \(V V^T\).
Suppose \(A\in\mathbb{R}^{m\times n}\) has SVD \(A = U\Sigma V^T\) and \(\operatorname{rank}(A) = r\). Then \(U\) and \(V\) can be partitioned as \[ U = \begin{bmatrix} U_1 & U_2 \end{bmatrix}, \quad V = \begin{bmatrix} V_1 & V_2 \end{bmatrix}, \] where \(U_1\in\mathbb{R}^{m\times r}\) and \(V_1\in\mathbb{R}^{n\times r}\).
Then
- \(V_1 V_1^T\) is the orthogonal projection onto \(\operatorname{range}(A^T)\).
- \(V_2 V_2^T\) is the orthogonal projection onto \(\operatorname{null}(A)\).
- \(U_1 U_1^T\) is the orthogonal projection onto \(\operatorname{range}(A)\).
- \(U_2 U_2^T\) is the orthogonal projection onto \(\operatorname{null}(A^T)\).
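A short sketch (a random rank-2 example, not from the book) that builds these projectors from the full SVD and sanity-checks two of them:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 2) @ torch.randn(2, 4)     # rank 2, shape 5 x 4

U, S, Vh = torch.linalg.svd(A)                # full SVD: U is 5x5, Vh is 4x4
r = int((S > 1e-10).sum())                    # numerical rank
U1, U2 = U[:, :r], U[:, r:]
V1, V2 = Vh[:r].T, Vh[r:].T

P_range = U1 @ U1.T                           # projector onto range(A)
P_null = V2 @ V2.T                            # projector onto null(A)

print(torch.allclose(P_range @ A, A, atol=1e-5))                   # columns of A are fixed
print(torch.allclose(A @ P_null, torch.zeros_like(A), atol=1e-5))  # null(A) is annihilated
```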
Least Squares
Full Rank Case
- Full Column Rank
If \(A\in\mathbb{R}^{m\times n}\) has full column rank, the least squares problem for \(Ax = b\) is to find \(x\) that minimizes \(\|Ax - b\|_2\). This is needed because \(b\) may not lie in the range of \(A\); the minimizer sends \(Ax\) to the projection of \(b\) onto the range of \(A\). This is an unconstrained optimization problem \[ \min_x \frac{1}{2} \|Ax - b\|_2^2 \]
The first-order optimality condition is \[ A^T(Ax - b) = 0 \]
Because \(A\) has full column rank, \(A^TA\) is invertible and the solution is \[ x_{LS} = (A^TA)^{-1}A^Tb \]
Alternatively, the LS solution can be derived directly from the SVD: \[ \begin{aligned} \|Ax - b\|_2^2 & = \|U^T(Ax - b)\|_2^2 = \|\Sigma V^Tx - U^Tb\|_2^2 \\ & = \sum_{i=1}^n (\sigma_i v_i^Tx - u_i^Tb)^2 + \sum_{i=n+1}^m (u_i^Tb)^2 \end{aligned} \]
The minimizer is \[ x_{LS} = \sum_{i=1}^n \frac{u_i^Tb}{\sigma_i}v_i \]
Note that in this case \(A = U_1\Sigma_1 V^T\) and \(x_{LS} = V\Sigma_1^{-1}U_1^Tb\).
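The following sketch (with random example data and variable names of my own choosing) compares the normal equation solution, the thin-SVD formula, and torch.linalg.lstsq:

```python
import torch

torch.manual_seed(0)
A = torch.randn(8, 3)                         # full column rank with probability 1
b = torch.randn(8, 1)

x_normal = torch.linalg.solve(A.T @ A, A.T @ b)       # solve (A^T A) x = A^T b
x_lstsq = torch.linalg.lstsq(A, b).solution           # library least squares routine

U1, S, Vh = torch.linalg.svd(A, full_matrices=False)  # thin SVD: A = U1 Sigma1 V^T
x_svd = Vh.T @ ((U1.T @ b) / S[:, None])              # sum_i (u_i^T b / sigma_i) v_i

print(torch.allclose(x_normal, x_lstsq, atol=1e-5))   # True
print(torch.allclose(x_svd, x_lstsq, atol=1e-5))      # True
```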
- Full Row Rank
If \(A\in\mathbb{R}^{m\times n}\) has full row rank, then \(b\in\operatorname{range}(A)\), so \(Ax = b\) always has a solution (infinitely many when \(m < n\)). The minimum norm solution solves \[ \min_x \|x\|_2 \quad \text{s.t.} \quad Ax = b \]
Introducing the Lagrangian \(\mathcal{L}(x,\lambda) = \frac{1}{2}\|x\|_2^2 + \lambda^T(Ax - b)\), the first-order optimality conditions \(x + A^T\lambda = 0\) and \(Ax = b\) give \(\lambda = -(AA^T)^{-1}b\) and \(x^\star = A^T(AA^T)^{-1}b\).
We can indeed check that this \(x^\star\) has the minimum norm among all solutions \(x\) satisfying \(Ax = b\). To start, we have \(A(x - x^\star) = 0\). Then, \[ \begin{aligned} (x-x^\star)^Tx^\star &= (x-x^\star)^TA^T(AA^T)^{-1}b \\ & = (A(x-x^\star))^T(AA^T)^{-1}b = 0 \end{aligned} \]
Therefore, \((x-x^\star) \perp x^\star\). So, \[ \|x\|^2 = \|x^\star\|^2 + \|x - x^\star\|^2 \geq \|x^\star\|^2 \] which means that \(x^\star\) has the minimum norm of any solution.
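A small sketch (random underdetermined example) checking that \(x^\star = A^T(AA^T)^{-1}b\) agrees with the pseudo-inverse solution:

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 6)                         # full row rank with probability 1
b = torch.randn(3, 1)

x_star = A.T @ torch.linalg.solve(A @ A.T, b)     # x* = A^T (A A^T)^{-1} b
x_pinv = torch.linalg.pinv(A) @ b                 # minimum norm solution via pseudo-inverse

print(torch.allclose(A @ x_star, b, atol=1e-5))   # x* solves Ax = b
print(torch.allclose(x_star, x_pinv, atol=1e-5))  # and matches pinv(A) b
```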
Rank Deficient Case
When \(A\) has rank \(r < n\), the least squares solution is not unique: if \(x\) is a minimizer and \(z\) is in the null space of \(A\), then \(x + z\) is also a minimizer. In this case, \[ x_{LS} = \sum_{i=1}^r \frac{u_i^Tb}{\sigma_i}v_i = V_1 \Sigma_1^{-1} U_1^T b \] minimizes \(\|Ax - b\|_2\) and has the smallest 2-norm among all minimizers. The proof follows directly from the full rank case: \[ \begin{aligned} \|Ax - b\|_2^2 & = \|U^T(Ax - b)\|_2^2 = \|\Sigma V^Tx - U^Tb\|_2^2 \\ & = \sum_{i=1}^r (\sigma_i v_i^Tx - u_i^Tb)^2 + \sum_{i=r+1}^m (u_i^Tb)^2 \end{aligned} \]
Note that the minimum norm least squares solution must first be a least squares solution: among all minimizers of \(\|Ax - b\|_2\), it is the one with the smallest 2-norm.
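A sketch (random rank-2 example) of the truncated-SVD formula checked against torch.linalg.pinv:

```python
import torch

torch.manual_seed(0)
A = torch.randn(6, 2) @ torch.randn(2, 4)     # rank 2, shape 6 x 4
b = torch.randn(6, 1)

U, S, Vh = torch.linalg.svd(A, full_matrices=False)
r = int((S > 1e-10).sum())                    # numerical rank
x_ls = Vh[:r].T @ ((U[:, :r].T @ b) / S[:r, None])   # V1 Sigma1^{-1} U1^T b

print(torch.allclose(x_ls, torch.linalg.pinv(A) @ b, atol=1e-5))  # True
```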
Pseudo-Inverse
In general, let \(A\in\mathbb{R}^{m\times n}\) and \(A = U\Sigma V^T\) be the SVD of \(A\). The pseudo-inverse of \(A\) is defined as \(A^+ = V\Sigma^+U^T\) with \[ \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_r, 0, \ldots, 0) \in \mathbb{R}^{m\times n}, \quad \Sigma^+ = \operatorname{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0) \in \mathbb{R}^{n\times m}, \] and \(x_{LS} = A^+b\) is the minimum norm least squares solution to \(Ax = b\).
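A sketch (random rank-deficient example) that assembles \(A^+ = V\Sigma^+U^T\) explicitly and compares it with torch.linalg.pinv:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 3) @ torch.randn(3, 4)     # rank 3, shape 5 x 4

U, S, Vh = torch.linalg.svd(A, full_matrices=False)
S_plus = torch.where(S > 1e-10, 1.0 / S, torch.zeros_like(S))  # invert nonzero singular values only
A_plus = Vh.T @ torch.diag(S_plus) @ U.T      # A^+ = V Sigma^+ U^T

print(torch.allclose(A_plus, torch.linalg.pinv(A), atol=1e-5))  # True
```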
- If \(\operatorname{rank}(A) = n\) (this implies \(m\geq n\)), then \(A^+ = (A^TA)^{-1}A^T\) is a left inverse of \(A\).
- If \(\operatorname{rank}(A) = m\) (this implies \(m\leq n\)), then \(A^+ = A^T(AA^T)^{-1}\) is a right inverse of \(A\).
- If \(\operatorname{rank}(A) = m = n\) (square matrix), then \(A^+ = A^{-1}\) is the inverse of \(A\).
Moreover, the pseudo-inverse can be used to find the projection operator:
- \(AA^+ = U_1 U_1^T\) is the orthogonal projection onto \(\operatorname{range}(A)\).
- \(A^+A = V_1 V_1^T\) is the orthogonal projection onto \(\operatorname{range}(A^T)\).
- \(I-AA^+ = U_2U_2^T\) is the orthogonal projection onto \(\operatorname{null}(A^T)\).
- \(I-A^+A = V_2V_2^T\) is the orthogonal projection onto \(\operatorname{null}(A)\).
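A final sketch (random rank-deficient example) confirming numerically that \(AA^+\) and \(A^+A\) act as the claimed orthogonal projectors:

```python
import torch

torch.manual_seed(0)
A = torch.randn(5, 2) @ torch.randn(2, 4)     # rank 2, shape 5 x 4
A_plus = torch.linalg.pinv(A)

P = A @ A_plus                                # should equal U1 U1^T
Q = A_plus @ A                                # should equal V1 V1^T

print(torch.allclose(P @ P, P, atol=1e-5), torch.allclose(P, P.T, atol=1e-5))  # idempotent, symmetric
print(torch.allclose(P @ A, A, atol=1e-5))    # P projects onto range(A)
print(torch.allclose(A @ Q, A, atol=1e-5))    # Q projects onto range(A^T)
```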