Eigendecomposition and SVD for Deep Learning

A review of matrix decomposition that points towards applications in deep learning.
linear algebra
Published December 31, 2018

I’ve been reading through Goodfellow’s Deep Learning Book, which starts with a review of the applied mathematics and machine learning basics necessary to understand deep learning. While working through the linear algebra chapter, I realized that I had become a bit rusty with eigenvalues and matrix decomposition, so I’ve decided to explain them here! It’s been a great exercise to review these topics myself, and I hope that you find them helpful as well.

Eigendecomposition

We are often concerned with breaking mathematical objects down into smaller pieces in order to gain a better understanding of their characteristics. A classic example of this is decomposing an integer into its prime factors. For example, \(60=2^2\times 3 \times 5\), which tells us that 60 is divisible by 4 but not by 8, something that is not necessarily obvious just by looking at the number 60. In the same way, we can decompose a matrix into a different representation that gives us some quick insight into its structure. A common form of matrix decomposition is called eigendecomposition (or eigenvalue decomposition). Before we get too far, let’s define exactly what eigenvectors and eigenvalues are.

An eigenvector of a square matrix \(\mathbf{A}\) is a nonzero vector \(\mathbf{v}\) such that multiplication by \(\mathbf{A}\) alters only the scale of \(\mathbf{v}\):

\[\mathbf{A}\mathbf{v}=\lambda \mathbf{v}\]

The scalar \(\lambda\) is known as the eigenvalue corresponding to the eigenvector.
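As a concrete example, consider

\[\mathbf{A}=\left[\begin{matrix} 2& 1\\ 1& 2 \end{matrix}\right], \qquad \mathbf{v}=\left[\begin{matrix} 1\\ 1 \end{matrix}\right]\]

Multiplying gives

\[\mathbf{A}\mathbf{v}=\left[\begin{matrix} 3\\ 3 \end{matrix}\right]=3\mathbf{v}\]

so \(\mathbf{v}\) is an eigenvector of \(\mathbf{A}\) with eigenvalue \(\lambda=3\). We will return to this matrix below.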

Finding eigenvalues

With a bit of algebraic manipulation, we can see that

\[ \begin{aligned} \mathbf{A}\mathbf{v}&=\lambda \mathbf{v}\\ \mathbf{A}\mathbf{v}-\lambda \mathbf{v}&=0\\ \left( \mathbf{A}-\lambda I \right)\mathbf{v}&=0 \end{aligned} \]

This form of the equation is useful to us because it can only have a nonzero solution \(\mathbf{v}\) if the matrix \(\mathbf{A}-\lambda I\) is non-invertible, and it is known that a matrix is non-invertible exactly when its determinant equals zero. Hence, if we solve the equation

\[\det\left(\mathbf{A}-\lambda I\right)=0\]

(this is called the characteristic equation) for \(\lambda\), then we will find all eigenvalues \(\lambda\) for which \(\left( \mathbf{A}-\lambda I \right)\mathbf{v}=0\) has a nonzero solution \(\mathbf{v}\). Solving this equation by hand can sometimes be tricky, occasionally requiring polynomial long division.
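Returning to the example matrix from above, the characteristic equation works out to

\[ \begin{aligned} \det\left(\mathbf{A}-\lambda I\right)&=\det\left[\begin{matrix} 2-\lambda& 1\\ 1& 2-\lambda \end{matrix}\right]\\ &=(2-\lambda)^2-1\\ &=\lambda^2-4\lambda+3\\ &=(\lambda-1)(\lambda-3) \end{aligned} \]

Setting this to zero gives \(\lambda=1\) and \(\lambda=3\), recovering the eigenvalue 3 that we verified earlier (the eigenvector \(\left[\begin{matrix} 1& -1 \end{matrix}\right]^\top\) corresponds to \(\lambda=1\)).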

Deriving the eigendecomposition matrix

We can decompose the matrix \(\mathbf{A}\) into its eigenvalues and eigenvectors with the following equation:

\[\mathbf{A}=\mathbf{V}\text{diag($\lambda$)}\mathbf{V}^{-1} \]

Now we will show how we can derive this equation. Suppose that we have a \(k\times k\) matrix \(\mathbf{A}\) with eigenvectors \(\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\) and corresponding eigenvalues \(\lambda_1, \lambda_2, \dots, \lambda_k\). We define the matrix \(\mathbf{V}\) by concatenating all of our eigenvectors into a matrix like so:

\[ \begin{aligned} \mathbf{V}&=\left[\begin{matrix} \mathbf{v}_1& \mathbf{v}_2& \dots& \mathbf{v}_k \end{matrix}\right]\\ &= \left[\begin{matrix} v_{1,1}& v_{2,1}& \dots & v_{k,1}\\ v_{1,2}& v_{2,2}& \dots& v_{k,2}\\ \vdots& \vdots& \ddots& \vdots\\ v_{1,k}& v_{2,k}& \dots& v_{k,k}\\ \end{matrix}\right] \end{aligned} \]

We now define the \(k\times k\) matrix \(\text{diag($\lambda$)}\) as

\[ \begin{aligned} \text{diag($\lambda$)}&= \left[\begin{matrix} \lambda_1&0& \dots & 0\\ 0& \lambda_2& \dots& 0\\ \vdots& \vdots& \ddots& \vdots\\ 0& \dots& \dots& \lambda_k\\ \end{matrix}\right] \end{aligned} \]

With all of these pieces, we can see how to decompose the matrix, provided the eigenvectors are linearly independent so that \(\mathbf{V}\) is invertible.

\[ \begin{aligned} \mathbf{A}\mathbf{V}&=\left[\begin{matrix} \mathbf{A}\mathbf{v}_1& \mathbf{A}\mathbf{v}_2& \dots& \mathbf{A}\mathbf{v}_k \end{matrix}\right]\\ \mathbf{A}\mathbf{V}&= \left[\begin{matrix} \lambda_1\mathbf{v}_1& \lambda_2\mathbf{v}_2& \dots& \lambda_k\mathbf{v}_k \end{matrix}\right]\\ \mathbf{A}\mathbf{V}&= \left[\begin{matrix} \lambda_1 v_{1,1}& \lambda_2 v_{2,1}& \dots & \lambda_k v_{k,1}\\ \lambda_1 v_{1,2}& \lambda_2 v_{2,2}& \dots& \lambda_k v_{k,2}\\ \vdots& \vdots& \ddots& \vdots\\ \lambda_1 v_{1,k}& \lambda_2 v_{2,k}& \dots& \lambda_k v_{k,k}\\ \end{matrix}\right]\\ \mathbf{A}\mathbf{V}&= \left[\begin{matrix} v_{1,1}& v_{2,1}& \dots & v_{k,1}\\ v_{1,2}& v_{2,2}& \dots& v_{k,2}\\ \vdots& \vdots& \ddots& \vdots\\ v_{1,k}& v_{2,k}& \dots& v_{k,k}\\ \end{matrix}\right] \left[\begin{matrix} \lambda_1&0& \dots & 0\\ 0& \lambda_2& \dots& 0\\ \vdots& \vdots& \ddots& \vdots\\ 0& \dots& \dots& \lambda_k\\ \end{matrix}\right]\\ \mathbf{A}\mathbf{V}&= \mathbf{V}\text{diag($\lambda$)}\\ \mathbf{A}&= \mathbf{V}\text{diag($\lambda$)}\mathbf{V}^{-1} \end{aligned} \]
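To make this concrete, here is a minimal NumPy sketch (using the \(2\times 2\) example matrix from earlier) that computes the eigendecomposition and reconstructs \(\mathbf{A}\) from \(\mathbf{V}\text{diag($\lambda$)}\mathbf{V}^{-1}\):

```python
import numpy as np

# The 2x2 example matrix from earlier
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the eigenvectors
eigenvalues, V = np.linalg.eig(A)
print(eigenvalues)  # [3. 1.] (order may vary)

# Rebuild A as V diag(lambda) V^{-1}
A_reconstructed = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(A, A_reconstructed))  # True
```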

This decomposition allows us to analyze certain properties of the matrix. For example, the matrix is singular if and only if any of the eigenvalues are zero. The benefits of this kind of decomposition, however, are limited. The glaring issue is that the eigendecomposition is only defined for square matrices (and even some square matrices cannot be diagonalized). For this reason, in practice, we usually resort to singular value decomposition (SVD) instead, which is defined for every real matrix and gives us similar information about its structure.
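As a quick sketch of what that looks like in NumPy, the SVD applies just as easily to a non-square matrix (the matrix here is only an illustrative example):

```python
import numpy as np

# SVD is defined for any real matrix, including non-square ones
B = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # a 2 x 3 matrix

# B = U @ diag(S) @ Vt, where S holds the singular values
U, S, Vt = np.linalg.svd(B, full_matrices=False)
print(S)  # the singular values, in decreasing order

B_reconstructed = U @ np.diag(S) @ Vt
print(np.allclose(B, B_reconstructed))  # True
```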

Resources

In the coming months, I hope to learn more about the applications of matrix decomposition in the context of deep learning. My understanding is that we often want to decompose the weight matrices associated with neural networks to analyze how a model is learning. This paper delineates the usefulness of SVD in practice. Additionally, Charles Martin writes on the analysis of weight matrix eigenvalues in this accessible article. Finally, if you are looking for a different perspective on matrix decomposition, I recommend hadrienj’s series of articles on linear algebra for Deep Learning, which follow along with Goodfellow’s book.