During my friend Charles’s visit to the Bay Area, we had an interesting conversation about SVD as a learning technique. For those of you who are not familiar with it, let me first describe what SVD is and how it can be used.

In linear algebra, the Singular Value Decomposition (SVD) is a factorization of a rectangular matrix *A* of the form:

*A = U * S * V ^{T}*

where:

- *U* is an orthogonal matrix. The columns of *U* are orthonormal eigenvectors of *AA*^{T} and are also called the left singular vectors of *A*. *U*^{T}*U* = *I*.
- *V* is an orthogonal matrix. The columns of *V* are orthonormal eigenvectors of *A*^{T}*A* and are also called the right singular vectors of *A*. *V*^{T}*V* = *I*.
- *S* is a diagonal matrix. The diagonal entries are called the singular values of *A*. The non-zero singular values are the square roots of the non-zero eigenvalues of *AA*^{T} (equivalently, of *A*^{T}*A*), in descending order.
- The diagonal matrix *S* is uniquely determined by *A*, though the matrices *U* and *V* are not.
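These properties can be checked numerically. A minimal sketch with NumPy, using the matrix from the worked example below (the post itself shows R output; `np.linalg.svd` is just one convenient way to compute the factorization):

```python
import numpy as np

A = np.array([[ 1.0,  2.0,  3.0],
              [-3.0, -2.0, -1.0]])

# full_matrices=False returns the "economy" factorization: U is 2x2, Vt is 2x3
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S = np.diag(s)

print(np.round(s, 6))                    # singular values, in descending order
print(np.allclose(U @ S @ Vt, A))        # True: A = U * S * V^T
print(np.allclose(U.T @ U, np.eye(2)))   # True: columns of U are orthonormal
print(np.allclose(Vt @ Vt.T, np.eye(2))) # True: rows of V^T are orthonormal
```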

Let’s look at an example to better understand the concept of SVD.

Define matrix *A* as:

[,1] [,2] [,3]

[1,] 1 2 3

[2,] -3 -2 -1

Correspondingly, *A ^{T}* is:

[,1] [,2]

[1,] 1 -3

[2,] 2 -2

[3,] 3 -1

In order to find *U*, compute *AA*^{T}:

[,1] [,2]

[1,] 14 -10

[2,] -10 14

The next step is to find the eigenvalues of *AA*^{T}. The eigenvalues of a matrix are obtained by solving the eigenvector equation, defined as:

*AA*^{T} * x = lambda * x

Writing this out component-wise yields two linear equations:

14 * x1 - 10 * x2 = lambda * x1

=> (14 - lambda) * x1 - 10 * x2 = 0

-10 * x1 + 14 * x2 = lambda * x2

=> -10 * x1 + (14 - lambda) * x2 = 0

Equating the determinant of the coefficient matrix to 0 gives the lambda values, or eigenvalues, as 24 and 4. Plugging these values into the two linear equations above yields:

x1 = -x2 (for lambda = 24) and

x1 = x2 (for lambda = 4)
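The hand derivation above can be cross-checked numerically; a small sketch with NumPy (`eigh` is suited to symmetric matrices such as *AA*^{T} and returns eigenvalues in ascending order):

```python
import numpy as np

AAt = np.array([[ 14.0, -10.0],
                [-10.0,  14.0]])

# eigh handles symmetric matrices; eigenvalues come back in ascending order
eigvals, eigvecs = np.linalg.eigh(AAt)
print(eigvals)   # [ 4. 24.]
# The eigenvector for 24 satisfies x1 = -x2; the one for 4 satisfies x1 = x2
```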

Now form a set of eigenvectors, one per eigenvalue, such that the resultant matrix is easy to work with. Note that the eigenvector for the largest eigenvalue is placed in the first column, the eigenvector for the second-largest eigenvalue in the second column, and so on.

[,1] [,2]

[1,] -1 -1

[2,] 1 -1

Now we need to convert the above matrix into an orthonormal matrix. This is done by applying the Gram-Schmidt orthonormalization process: each column vector after the first has its projections onto the previous columns subtracted, and every column is then normalized to unit length.

Normalizing the first column vector gives:

[,1]

[1,] -0.7071068

[2,] 0.7071068

Applying the Gram-Schmidt process to the second column vector gives:

[,2]

[1,] -1

[2,] -1

Normalizing the second column vector gives:

[,2]

[1,] -0.7071068

[2,] -0.7071068

The resultant matrix *U* is given as:

[,1] [,2]

[1,] -0.7071068 -0.7071068

[2,] 0.7071068 -0.7071068
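The normalize-and-orthogonalize steps above can be sketched as a small Gram-Schmidt routine (`gram_schmidt` is a hypothetical helper, not part of the original post):

```python
import numpy as np

def gram_schmidt(M):
    """Orthonormalize the columns of M via classical Gram-Schmidt."""
    Q = np.zeros_like(M, dtype=float)
    for j in range(M.shape[1]):
        v = M[:, j].astype(float)
        for i in range(j):
            v -= (Q[:, i] @ M[:, j]) * Q[:, i]  # subtract projections onto earlier columns
        Q[:, j] = v / np.linalg.norm(v)         # normalize to unit length
    return Q

E = np.array([[-1.0, -1.0],
              [ 1.0, -1.0]])   # eigenvectors as columns, largest eigenvalue first
U = gram_schmidt(E)
print(np.round(U, 7))
# [[-0.7071068 -0.7071068]
#  [ 0.7071068 -0.7071068]]
```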

Similarly, it’s possible to calculate *V* based on *A*^{T}*A*. The resultant matrix *V* is:

[,1] [,2] [,3]

[1,] -0.5773503 7.071068e-01 0.4082483

[2,] -0.5773503 6.234162e-19 -0.8164966

[3,] -0.5773503 -7.071068e-01 0.4082483

Correspondingly, *V ^{T}* is:

[,1] [,2] [,3]

[1,] -0.5773503 -5.773503e-01 -0.5773503

[2,] 0.7071068 6.234162e-19 -0.7071068

[3,] 0.4082483 -8.164966e-01 0.4082483

The diagonal matrix *S* is composed of the square roots of the non-zero eigenvalues on the main diagonal, in descending order. The eigenvalues of *AA*^{T} and *A*^{T}*A* are the same:

[,1] [,2] [,3]

[1,] 4.898979 0 0

[2,] 0.000000 2 0

Putting it all together, *U * S * V ^{T}* = *A*:

> U %*% S %*% t(V)

[,1] [,2] [,3]

[1,] 1 2 3

[2,] -3 -2 -1

**Visualization of SVD**

The matrix decomposition can also be visualized as three geometric transformations: a rotation, followed by a stretch, followed by another rotation. *V*^{T} describes an orthonormal basis in the domain, *U* describes an orthonormal basis in the co-domain, and *S* describes the stretch applied to vectors expressed in the *V*^{T} basis that produces vectors in the *U* basis.

The example below illustrates the transformation of a circle by a matrix, shown in two different ways: first as a single transformation by the matrix, and then stage by stage through the ordered matrix components.
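The staged view can be checked numerically as well; a sketch using a hypothetical 2x2 matrix *M* (any matrix works — the values below are chosen only for illustration):

```python
import numpy as np

M = np.array([[3.0, 1.0],
              [1.0, 2.0]])    # hypothetical 2x2 matrix, chosen for illustration

U, s, Vt = np.linalg.svd(M)

theta = np.linspace(0.0, 2.0 * np.pi, 100)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # points on the unit circle

staged = U @ np.diag(s) @ (Vt @ circle)  # rotate, stretch, rotate
direct = M @ circle                      # one-step transformation
print(np.allclose(staged, direct))       # True: the three stages compose to M
```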

**Applications of SVD**

**Dimensionality Reduction/Data Compression**

SVD is a method for identifying and ordering the dimensions along which the data points exhibit the most variation. Once these dimensions are identified, it’s possible to find the best approximation of the original data points using fewer dimensions.

In the SVD method explained above, *S* is a diagonal matrix with the singular values in descending order. These values indicate the variance of the linearly independent components along each dimension. Restricting the number of singular values retained in *S* has the effect of dimensionality reduction on the data set. This method is also called the reduced SVD method.

In the example below, the least significant singular value in *S* (the last value along the diagonal) is dropped, thereby reducing the size of matrix *S* as shown by the shaded portion. This results in a further reduction in the size of *V ^{T}*, again shown by the shaded portion. Overall, this reduces the rank of the product matrix while still providing a good approximation to matrix *A*. This type of compression is therefore lossy unless the dropped singular values are all 0s.
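A truncation along these lines can be sketched in NumPy; `k` below is the number of singular values retained, i.e. the rank of the approximation:

```python
import numpy as np

A = np.array([[ 1.0,  2.0,  3.0],
              [-3.0, -2.0, -1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                                        # keep only the largest singular value
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of A

print(np.round(A_k, 3))
# Frobenius-norm error equals the dropped singular value (2.0, up to rounding)
print(np.linalg.norm(A - A_k))
```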

**Noise Filtering/SVD Filtering**

In any experiment involving data collection, measurement errors manifest in the form of noise. The reduced SVD technique discards insignificant singular values, thereby not only reducing dimensionality but filtering noise as well! The details of the SVD reduction technique are given in the previous sections.

**Latent Semantic Indexing/Analysis**

SVD can be seen as a method for transforming correlated variables into a set of uncorrelated ones that better expose the various relationships among the original data items. With SVD, the representations of items that share substructure become more similar to each other, while items that were dissimilar become more dissimilar. In practical terms, applying SVD to a document corpus yields the various topics underlying those documents. In this respect, SVD acts as an unsupervised learning technique, along the lines of clustering.
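As a toy illustration of this idea (the corpus, terms, and counts below are entirely made up), the SVD of a small term-document matrix separates documents into latent topics:

```python
import numpy as np

# Rows = terms, columns = documents; a made-up corpus with two topics
#             d1  d2  d3  d4
X = np.array([[2., 3., 0., 0.],   # "matrix"
              [1., 2., 0., 0.],   # "eigenvalue"
              [0., 0., 3., 1.],   # "recipe"
              [0., 0., 2., 2.]])  # "flour"

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Each document's coordinates in the top-2 latent "topic" space
doc_coords = Vt[:2, :].T
print(np.round(doc_coords, 3))
# d1/d2 load on one latent dimension, d3/d4 on the other
```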
