Exercises

SVD

  1. Let’s have a look at a simple movie ratings matrix of six users and four movies:

    \[\begin{align*} D=\begin{pmatrix} 5 & 5 & 1 & 1\\ 5 & 5 & 1 & 1\\ 5 & 5 & 1 & 1\\ 1 & 1 & 5 & 5\\ 1 & 1 & 5 & 5\\ 5 & 5 & 5 & 5 \end{pmatrix} \end{align*}\]
    • a. Which patterns of movie preferences can you detect in the matrix \(D\)?

    • b. Can you denote a rank-2 factorization of \(D\) which reflects the assignment of users to the patterns you found?

    • c. Compute a rank-2 truncated SVD of \(D\) (a NumPy sketch follows below). Do the movie patterns given by the SVD solution reflect the patterns you identified?
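
    A minimal NumPy sketch for part c (NumPy is our choice here, not prescribed by the exercise; any SVD routine works):

    ```python
    import numpy as np

    # Ratings matrix D from the exercise.
    D = np.array([
        [5, 5, 1, 1],
        [5, 5, 1, 1],
        [5, 5, 1, 1],
        [1, 1, 5, 5],
        [1, 1, 5, 5],
        [5, 5, 5, 5],
    ], dtype=float)

    # Full SVD; NumPy returns the singular values in descending order.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)

    # Rank-2 truncation: keep the two leading singular triplets.
    r = 2
    D_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    print(np.round(D_r, 2))        # rank-2 approximation of D
    print(np.round(Vt[:r, :], 2))  # movie patterns: leading right singular vectors
    ```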

  2. Consider the movie recommendation matrix from the lecture, whose missing values are imputed with the mean value of \(\mu=3\):

    \[\begin{align*} D=\begin{pmatrix} 5 & \mu & 1 & 1 \\ \mu & 1 & 5 & \mu \\ 2 & 1 & 5 & 3 \\ 4 & \mu & 4 & 2\\ 5 & 5 & \mu & 1 \\ \mu & 1 & 5 & 3 \end{pmatrix} \end{align*}\]

    In the lecture example, we used a rank of 2. Try ranks of 1, 3, and 4, and evaluate the obtained recommendations (see the sketch below). Which rank would you choose?
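    A sketch for these experiments (mean imputation is already applied below; how to judge the resulting recommendations is left to you):

    ```python
    import numpy as np

    mu = 3.0  # imputation value for the missing ratings
    D = np.array([
        [5, mu, 1, 1],
        [mu, 1, 5, mu],
        [2, 1, 5, 3],
        [4, mu, 4, 2],
        [5, 5, mu, 1],
        [mu, 1, 5, 3],
    ])

    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    for r in (1, 2, 3, 4):
        D_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
        err = np.linalg.norm(D - D_r) ** 2
        print(f"rank {r}: squared approximation error {err:.2f}")
        print(np.round(D_r, 1))  # predicted ratings at rank r
    ```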

  3. Show that the minimum approximation error (in squared Frobenius norm) of a rank-\(r\) matrix factorization of the data \(D\in\mathbb{R}^{n\times d}\) is equal to the sum of the squares of the \(\min\{n,d\}-r\) smallest singular values of \(D\):

    \[\sigma^2_{r+1}+\ldots+\sigma^2_{\min\{n,d\}}=\min_{X,Y}\lVert D -YX^\top\rVert^2 \quad \text{s.t. } X\in\mathbb{R}^{d\times r},Y\in\mathbb{R}^{n\times r}.\]
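
    Before attempting the proof, the identity can be sanity-checked numerically; a minimal sketch on random data, assuming the squared Frobenius norm as above:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    D = rng.normal(size=(6, 4))  # arbitrary data matrix, n=6, d=4

    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    r = 2
    # The truncated SVD attains the minimum of the factorization objective.
    D_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

    lhs = np.sum(s[r:] ** 2)                   # sum of squared discarded singular values
    rhs = np.linalg.norm(D - D_r, "fro") ** 2  # minimum squared approximation error
    print(np.isclose(lhs, rhs))                # True
    ```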

PCA

  1. Show that the constraint \(Z^\top Z=I\), where \(Z\in\mathbb{R}^{n\times r}\) and \(r<n\), which is imposed in the PCA objective, implies that \(Z\) has pairwise orthogonal columns, each with a Euclidean norm of one.
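
    As a possible starting point (a standard entrywise identity, not part of the exercise statement), write the constraint entrywise:

    \[(Z^\top Z)_{kl} = Z_{\cdot k}^\top Z_{\cdot l} = \begin{cases} 1 & \text{if } k=l,\\ 0 & \text{if } k\neq l, \end{cases}\]

    so the diagonal entries yield \(\lVert Z_{\cdot k}\rVert^2=1\) and the off-diagonal entries yield orthogonality.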

  2. Given the following data, we define a new feature \(\mathtt{F}_3 = \mathtt{F}_1 + 2\mathtt{F}_2\), that is, \(\bm\alpha=(1,2)^\top\):

    | \(\mathtt{F}_1\) | \(\mathtt{F}_2\) |
    |------------------|------------------|
    | 2                | -2               |
    | 0                | 3                |
    | 1                | -2               |
    | 1                | 1                |

    1. Compute the sample variance of the new feature by computing the new feature values and then computing the variance of these values.

    2. Compute the sample variance of the new feature by means of the formula derived in the lecture: \(\sigma_{\mathtt{F}_3}^2 = \frac1n\lVert (D-\mathbf{1}\bm\mu_{\mathtt{F}}^\top )\bm\alpha\rVert^2\)

    3. Plot the data points and the vector \(\alpha\) which defines the new feature \(\mathtt{F}_3\). Does \(\alpha\) indicate a direction of high or low sample variance in the data? How can you compute the variance in the direction of \(\alpha\)?

    4. Compute the variance and the direction of maximum variance in the data. (A numeric check of parts 1, 2, and 4 is sketched below.)
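
    A sketch checking parts 1, 2, and 4 numerically (using the \(1/n\) variance convention of the lecture formula):

    ```python
    import numpy as np

    # Data matrix: columns are the features F1 and F2.
    D = np.array([[2, -2],
                  [0,  3],
                  [1, -2],
                  [1,  1]], dtype=float)
    alpha = np.array([1.0, 2.0])  # F3 = F1 + 2*F2
    n = D.shape[0]

    # Part 1: compute the new feature values, then their variance.
    f3 = D @ alpha
    var_direct = np.mean((f3 - f3.mean()) ** 2)

    # Part 2: the lecture formula 1/n * ||(D - 1 mu^T) alpha||^2.
    C = D - D.mean(axis=0)  # centered data
    var_formula = np.linalg.norm(C @ alpha) ** 2 / n
    print(var_direct, var_formula)  # both 13.5

    # Part 4: the direction of maximum variance is the leading right
    # singular vector of C; the variance along it is sigma_1^2 / n.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    print(Vt[0], s[0] ** 2 / n)
    ```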

  3. Given a data matrix \(D\in\mathbb{R}^{n\times d}\), show that every right singular vector \(V_{\cdot k}\) of the centered data matrix \(C=D-\mathbf{1}\bm\mu_\mathtt{F}^\top\) indicates a new feature \(\mathtt{F}_{V_{\cdot k}}= V_{1k}\mathtt{F}_1 + \ldots + V_{dk}\mathtt{F}_d\) whose sample variance is given by the corresponding squared singular value divided by the number of samples, \(\sigma_k^2/n\).
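
    The statement can be checked on example data before proving it; a minimal sketch (the random data is an assumption for illustration only):

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    D = rng.normal(size=(50, 3))  # example data, n=50, d=3
    n = D.shape[0]

    C = D - D.mean(axis=0)  # centered data matrix C = D - 1 mu^T
    U, s, Vt = np.linalg.svd(C, full_matrices=False)

    for k in range(D.shape[1]):
        f_new = D @ Vt[k]  # new feature defined by the k-th right singular vector
        var_k = np.mean((f_new - f_new.mean()) ** 2)
        print(np.isclose(var_k, s[k] ** 2 / n))  # True for every k
    ```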

  4. Given an \(n\times d\) data matrix \(D\) gathering \(n\) observations of \(d\) features \(\mathtt{F}_1,\ldots,\mathtt{F}_d\), we define a new feature \(\displaystyle\mathtt{F}_{d+1}= \sum_{k=1}^d \alpha_k\mathtt{F}_k\).

    1. Show that the sample mean of the new feature can be computed by the matrix-vector product \(\mu_{\mathtt{F}_{d+1}}=\bm\mu_\mathtt{F}^\top \bm\alpha\), where \(\bm\mu_\mathtt{F}^\top = \begin{pmatrix} \mu_{\mathtt{F}_1} &\ldots & \mu_{\mathtt{F}_d}\end{pmatrix}\) gathers the sample means of the original features.

    2. Show that the sample variance of the new feature can be computed as \(\sigma_{\mathtt{F}_{d+1}}^2 = \frac1n\left\lVert \left(D -\mathbf{1}\bm\mu_\mathtt{F}^\top\right){\bm\alpha}\right\rVert^2\).
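
    Both identities can be verified numerically before proving them; a minimal sketch on random data (sizes chosen arbitrarily for illustration):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 20, 4
    D = rng.normal(size=(n, d))
    alpha = rng.normal(size=d)

    f_new = D @ alpha      # observations of the new feature F_{d+1}
    mu_F = D.mean(axis=0)  # sample means of the original features

    # Part 1: the sample mean of the new feature equals mu_F^T alpha.
    print(np.isclose(f_new.mean(), mu_F @ alpha))

    # Part 2: the sample variance (1/n convention) equals
    # 1/n * ||(D - 1 mu_F^T) alpha||^2.
    var_direct = np.mean((f_new - f_new.mean()) ** 2)
    var_formula = np.linalg.norm((D - mu_F) @ alpha) ** 2 / n
    print(np.isclose(var_direct, var_formula))
    ```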