SVD - Singular Value Decomposition
Trying a different approach on this one by using Jupyter notebooks to generate a more visual document.
- Quick Glance at the API
- Terminology used across blogs and videos that can be confusing
- SVD - Concepts and Examples
- Other decomposition approaches
Quick Glance at how the API works
# fastai re-exports pd, np, plt, scipy etc., but we import them explicitly
# so the notebook runs without fastai installed
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a1 = np.array([[1, 2, 3], [1, 4, 1], [2, 3, 5]])
print(a1)
from scipy.linalg import svd
U, s, Vh = svd(a1, full_matrices=True)
U, Vh
(U * s) @ Vh
[[1 2 3]
 [1 4 1]
 [2 3 5]]
array([[1., 2., 3.],
       [1., 4., 1.],
       [2., 3., 5.]])
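As an aside, (U * s) relies on NumPy broadcasting: it scales column i of U by s[i], which is the same as U @ np.diag(s) here. A quick sanity check, a small sketch reusing the variables above:
# Broadcasting U * s scales each column of U by the matching singular value
np.allclose(U * s, U @ np.diag(s)), np.allclose((U * s) @ Vh, a1)
# (True, True)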
np.sum(U[:, 2]**2)
1.0000000000000007
Terminology
These are terms that are frequently used when discussing SVD.
Matrix Factorisation
The process of decomposing a matrix into two (or more) matrices whose product reconstructs the original matrix.
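A minimal sketch of the idea with made-up numbers: $B$ and $C$ below factorise $A$ because their product reconstructs it.
# B @ C reconstructs A, so (B, C) is a factorisation of A
B = np.array([[1, 0],
              [2, 1]])
C = np.array([[3, 1],
              [0, 2]])
A = np.array([[3, 1],
              [6, 4]])
np.allclose(B @ C, A)  # True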
Orthogonal Matrix
An orthogonal matrix is one which, when multiplied with its transpose, returns an identity matrix
\[AA^t = A^tA = I\]
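A classic example is a 2D rotation matrix (a small sketch):
# R multiplied by its transpose gives the identity, so R is orthogonal
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
np.allclose(R @ R.T, np.eye(2))  # True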
Diagonal Matrix
A diagonal matrix is one which has non-zero values only on its diagonal; every off-diagonal entry is zero.
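For example, np.diag builds one from a vector of diagonal values:
# 3x3 matrix with 1, 2, 3 on the diagonal and zeros everywhere else
np.diag([1, 2, 3])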
Column Orthonormal
The columns of the matrix have Euclidean length 1, which means that if you compute the sum of squared values of any column it'll be 1.
a1 = np.array([[1, 2, 3], [1, 4, 1], [2, 3, 5]])
print(a1)
[[1 2 3]
 [1 4 1]
 [2 3 5]]
from scipy.linalg import svd
U, s, Vh = svd(a1, full_matrices=True)
U
array([[-0.466533,  0.184235, -0.865103],
       [-0.445706, -0.893781,  0.050018],
       [-0.763998,  0.408917,  0.499093]])
np.sum(U[:, 2]**2)
1.0000000000000007
Orthogonal
Two vectors are orthogonal when their dot product is 0. In SVD, any two distinct columns of $U$ (and likewise of $V$) are orthogonal to each other.
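A quick check using the $U$ computed above: the dot product of two distinct columns is (numerically) zero.
# Columns 0 and 1 of U are orthogonal, so this is ~0 up to float error
np.dot(U[:, 0], U[:, 1])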
Singular Value Decomposition
What is SVD
The goal of SVD is to discover latent features present in the data. It's one of the most widely used approaches for building recommendation systems.
It does not care about what the latent features are and does not expect the creator of the recommender system to define the categories/concepts that describe the product (movies, music, books, etc.). The system identifies these categories and assigns relevant weights depending upon the inputs provided by users. The inputs are usually ratings, likes, downvotes, etc.
How does it work
The basis of the approach here is the understanding that each product is made up of $n$ categories. To use the movie analogy here, let's say $Movie1$ is composed of $n$ categories at varying weights.
That is
\[Movie1 = (W_1 * Concept_1) + (W_2 * Concept_2) + .... + (W_n * Concept_n)\]
where our concepts could be comedy, thriller, action, horror, etc., meaning a movie might be 70% comedy, 20% action and 10% thriller.
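A toy sketch of this idea, with invented concept vectors and weights:
# Hypothetical one-hot vectors for three concepts (illustrative only)
comedy = np.array([1, 0, 0])
action = np.array([0, 1, 0])
thriller = np.array([0, 0, 1])
# Movie1 is a weighted combination of its concepts
movie1 = 0.7 * comedy + 0.2 * action + 0.1 * thriller
movie1  # array([0.7, 0.2, 0.1])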
Similarly, a user can be defined as someone who likes different categories/concepts to varying degrees.
That is
\[User1 = (UW_1 * Concept_1) + (UW_2 * Concept_2) + .... + (UW_n * Concept_n)\]
Visualising it as a problem
Let's assume that the input information is provided to us as the following DataFrame
ratings = np.array([[3, 5, 4, 1, 1],
                    [2, 2, 1, 4, 5],
                    [3, 3, 5, 4, 3]])
movie_labels = ['Star Wars', 'Departed', 'Saving Pvt Ryan', 'Home Alone', 'Love Actually']
users = ['User1', 'User2', 'User3']
df = pd.DataFrame(ratings, columns=movie_labels, index=users)
df
|       | Star Wars | Departed | Saving Pvt Ryan | Home Alone | Love Actually |
|-------|-----------|----------|-----------------|------------|---------------|
| User1 | 3 | 5 | 4 | 1 | 1 |
| User2 | 2 | 2 | 1 | 4 | 5 |
| User3 | 3 | 3 | 5 | 4 | 3 |
So the SVD based factorisation returns 3 values $U$, $E$ and $V$
Where
- $U$ is the left singular matrix
- $E$ is the diagonal matrix of singular values, with the values on the diagonal in decreasing order
- $V$ is the right singular matrix
Our ratings matrix can therefore be represented as the product
\[ratings = U \cdot E \cdot V\]
Computing the values of $U$, $E$ and $V$
from scipy.linalg import svd
U, E, V = svd(ratings, full_matrices=True)
pd.DataFrame(U, columns=['Concept1', 'Concept2', 'Concept3'], index=['User1', 'User2', 'User3'])
|       | Concept1 | Concept2 | Concept3 |
|-------|----------|----------|----------|
| User1 | -0.533328 | 0.673748 | -0.511493 |
| User2 | -0.508224 | -0.738581 | -0.442952 |
| User3 | -0.676217 | 0.023714 | 0.736321 |
Our $E$ matrix can be viewed as representing the strength of each concept.
pd.DataFrame(E, index=['Concept1', 'Concept2', 'Concept3'])
|          | 0 |
|----------|-----------|
| Concept1 | 12.012443 |
| Concept2 | 4.690892 |
| Concept3 | 1.922695 |
Our $V$ matrix can be viewed as representing the relationship between a movie and concept.
Since we have more movies than concepts, the algorithm identifies additional concepts that those movies could be attributed to, which is why we show only the first three rows of $V$ here.
pd.DataFrame(V[:3], columns=movie_labels, index=['Concept1', 'Concept2', 'Concept3'])
|          | Star Wars | Departed | Saving Pvt Ryan | Home Alone | Love Actually |
|----------|-----------|----------|-----------------|------------|---------------|
| Concept1 | -0.386689 | -0.475485 | -0.501365 | -0.438803 | -0.424818 |
| Concept2 | 0.131153 | 0.418411 | 0.442342 | -0.465950 | -0.628455 |
| Concept3 | -0.109960 | -0.642018 | 0.620318 | 0.344300 | -0.269044 |
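Because the singular values come back in decreasing order, we can drop the weakest concept to get a low-rank approximation of the ratings. This truncation is the usual trick in SVD-based recommenders. A minimal sketch using the matrices computed above:
# Keep only the two strongest concepts (rank-2 approximation)
k = 2
approx = U[:, :k] @ np.diag(E[:k]) @ V[:k, :]
np.round(approx, 1)  # close to, but not exactly, the original ratings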
Shape of these matrices
So $ratings$ is a $3x5$ matrix
- $U$ is a $3x3$ matrix
- $E$ is returned as a vector of length 3 (one singular value per concept)
- $V$ is a $5x5$ matrix
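We can confirm these shapes directly. Note that scipy returns $E$ as a flat vector:
# Shapes of the input and the three matrices returned by svd
ratings.shape, U.shape, E.shape, V.shape
((3, 5), (3, 3), (3,), (5, 5))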
Now we can visualise our $U$ matrix as mapping users to the identified concepts.
Checking if U, V are orthonormal
The columns are orthonormal when each column has unit Euclidean length, i.e. the sum of the squares of the values of any column equals 1 (and distinct columns are orthogonal).
np.sum(V[:, 0] ** 2)
0.9999999999999999
np.sum(U[:, 0] ** 2)
1.0000000000000002
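We can also verify all the columns at once: for an orthonormal matrix, multiplying by its transpose yields the identity.
# U @ U.T and V @ V.T should both be identity matrices
np.allclose(U @ U.T, np.eye(3)), np.allclose(V @ V.T, np.eye(5))
# (True, True)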
Checking that the decomposition reconstructs the original input
The $E$ matrix is a diagonal matrix that is presently returned as a flat vector of 3 singular values. The problem for us now is that to reconstruct the original matrix we need to be able to multiply the matrices $U, E, V$.
Their shapes however are $U (3x3)$, $E (3,)$, which if we convert to a diagonal matrix would be $(3x3)$, and $V (5x5)$.
So we reshape our $E$ to be a $(3x5)$ matrix, which we call $S$:
- Create a zero-valued matrix of the shape $(3x5)$
S = np.zeros((ratings.shape[0], ratings.shape[1]))
S
array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])
- Place the diagonal matrix built from $E$ into the top-left block of $S$
S[:ratings.shape[0], :ratings.shape[0]] = np.diag(E)
S
array([[12.012443,  0.      ,  0.      ,  0.      ,  0.      ],
       [ 0.      ,  4.690892,  0.      ,  0.      ,  0.      ],
       [ 0.      ,  0.      ,  1.922695,  0.      ,  0.      ]])
- Our new shapes look like this
U.shape, S.shape, V.shape
((3, 3), (3, 5), (5, 5))
- Now multiply to validate the factorization
U @ S @ V
array([[3., 5., 4., 1., 1.],
       [2., 2., 1., 4., 5.],
       [3., 3., 5., 4., 3.]])
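A stricter floating point check confirms the reconstruction matches the input:
# Reconstruction equals the original ratings up to floating point error
np.allclose(U @ S @ V, ratings)  # True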
Visualising the movies as concepts
Based on the first 2 concepts, our graph shows that Saving Pvt Ryan and Departed are similar, which would be a fair assessment given they're more action oriented, while Star Wars has a more sci-fi theme to it. Similarly, Home Alone and Love Actually have a more family theme to them.
# Each row of V holds one concept's weights across the movies
concept1 = V[0]
concept2 = V[1]
fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(concept1, concept2)
for idx, movie in enumerate(movie_labels):
    ax.annotate(movie, (concept1[idx], concept2[idx]))
Other matrix factorization methods that you may stumble upon.
While it's not essential to know how this happens, it's useful in visualising what is actually happening behind the scenes during the matrix factorisation process. A lot of the time, terms such as orthonormal can trip one up while trying to follow tutorials that are more mathematical in nature. Most of the actual computation is handled by scipy and sklearn, so it's safe to skip this section and look at the examples.
There are multiple ways in which matrices can be decomposed.
- LU Decomposition
In this, a matrix $A$ is decomposed into two matrices $L$ and $U$, where $L$ holds the lower triangular part and $U$ the upper triangular part. The limitation of LU decomposition is that it works on square ($n x n$) matrices only. The outcome we end up with is $L @ U = A$ (up to a row permutation, as shown below), while $U @ L \neq A$ since matrix multiplication is not commutative.
So for a matrix of the form
\[A = \begin{bmatrix} 1 & 5 & 12 \\ 1 & 2 & 3 \\ 2 & 4 & 3 \end{bmatrix}\]
where $L$ and $U$ define our decompositions
\[L = \begin{bmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix}\]
\[U = \begin{bmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{bmatrix}\]
And finally,
\[\begin{bmatrix} 1 & 0 & 0 \\ L_{21} & 1 & 0 \\ L_{31} & L_{32} & 1 \end{bmatrix} \times \begin{bmatrix} U_{11} & U_{12} & U_{13} \\ 0 & U_{22} & U_{23} \\ 0 & 0 & U_{33} \end{bmatrix} = \begin{bmatrix} 1 & 5 & 12 \\ 1 & 2 & 3 \\ 2 & 4 & 3 \end{bmatrix}\]
The transformations are achieved by using the row manipulation operations described here
LU example
from scipy.linalg import lu
A = np.array([[1, 5, 12],
              [1, 2, 3],
              [2, 4, 3]])
A
array([[ 1,  5, 12],
       [ 1,  2,  3],
       [ 2,  4,  3]])
Here $P$ is the permutation matrix which represents the row swaps, such that $A = PLU$
P, L, U = lu(A)
print(P)
print("")
print(L)
print("")
print(U)
P @ L @ U == A
[[0. 1. 0.]
 [0. 0. 1.]
 [1. 0. 0.]]

[[1.  0.  0. ]
 [0.5 1.  0. ]
 [0.5 0.  1. ]]

[[ 2.   4.   3. ]
 [ 0.   3.  10.5]
 [ 0.   0.   1.5]]
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
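The elementwise == comparison works here because the factorisation happens to be exact for this matrix; for general floating point results, np.allclose is the safer check:
# Safer floating point comparison of the reconstruction
np.allclose(P @ L @ U, A)  # True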