02. Matrix Decompositions

Learning Objectives

  • Understand the concept of eigenvalues and eigenvectors and be able to calculate them
  • Explain the spectral theorem for symmetric matrices and properties of positive definite matrices
  • Understand the geometric meaning of singular value decomposition (SVD) and implement it
  • Explain the mathematical principles of low-rank approximation using SVD and PCA
  • Understand the purpose and computation methods of LU, QR, and Cholesky decompositions
  • Give concrete examples of how matrix decompositions are used in machine learning

1. Eigenvalues and Eigenvectors

1.1 Definition and Meaning

An eigenvector $\mathbf{v}$ of matrix $A$ is a non-zero vector that satisfies:

$$A\mathbf{v} = \lambda\mathbf{v}$$

where $\lambda$ is the eigenvalue.

Geometric interpretation: an eigenvector marks a direction that the linear transformation $A$ leaves unchanged; vectors along it are only scaled (stretched, shrunk, or flipped) by the factor $\lambda$.

import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues and eigenvectors of a 2x2 matrix
A = np.array([[3, 1],
              [0, 2]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"Eigenvalues: {eigenvalues}")
print(f"Eigenvectors:\n{eigenvectors}")

# Verify: Av = λv
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    lam = eigenvalues[i]
    Av = A @ v
    lam_v = lam * v
    print(f"\nEigenvector {i+1}:")
    print(f"Av = {Av}")
    print(f"ฮปv = {lam_v}")
    print(f"Equal? {np.allclose(Av, lam_v)}")

1.2 Characteristic Equation

Eigenvalues are solutions to the characteristic equation:

$$\det(A - \lambda I) = 0$$

For an $n \times n$ matrix this is a polynomial equation of degree $n$, so there are $n$ eigenvalues counted with multiplicity (possibly complex).

# Characteristic equation worked by hand (2x2 example)
A = np.array([[4, 2],
              [1, 3]])

# det(A - ฮปI) = 0
# (4-ฮป)(3-ฮป) - 2*1 = 0
# ฮปยฒ - 7ฮป + 10 = 0
# (ฮป-5)(ฮป-2) = 0

eigenvalues = np.linalg.eigvals(A)
print(f"Eigenvalues: {eigenvalues}")  # [5, 2]

# Verify the characteristic equation
for lam in eigenvalues:
    det = np.linalg.det(A - lam * np.eye(2))
    print(f"det(A - {lam}I) = {det:.10f}")  # ~0

1.3 Eigendecomposition

If an $n \times n$ matrix $A$ has $n$ linearly independent eigenvectors:

$$A = V\Lambda V^{-1}$$

where:

  • $V$: matrix whose columns are the eigenvectors
  • $\Lambda$: diagonal matrix with the eigenvalues on the diagonal

# Eigendecomposition
A = np.array([[3, 1],
              [0, 2]])

eigenvalues, V = np.linalg.eig(A)
Lambda = np.diag(eigenvalues)

print(f"V (eigenvectors):\n{V}")
print(f"ฮ› (eigenvalues):\n{Lambda}")

# Reconstruction: A = V Λ V^(-1)
A_reconstructed = V @ Lambda @ np.linalg.inv(V)
print(f"A reconstructed:\n{A_reconstructed}")
print(f"Original A:\n{A}")
print(f"Equal? {np.allclose(A, A_reconstructed)}")

1.4 Matrix Powers and Functions

Matrix powers become easier using eigendecomposition:

$$A^k = V\Lambda^k V^{-1}$$

Matrix functions can also be defined:

$$f(A) = Vf(\Lambda)V^{-1}$$

# Matrix powers
A = np.array([[2, 1],
              [1, 2]])

eigenvalues, V = np.linalg.eig(A)
Lambda = np.diag(eigenvalues)

# Compute A^10 in two ways
# Method 1: direct multiplication
A_10_direct = np.linalg.matrix_power(A, 10)

# Method 2: via the eigendecomposition
Lambda_10 = np.diag(eigenvalues**10)
A_10_eigen = V @ Lambda_10 @ np.linalg.inv(V)

print(f"A^10 (direct):\n{A_10_direct}")
print(f"A^10 (eigendecomposition):\n{A_10_eigen.real}")
print(f"Equal? {np.allclose(A_10_direct, A_10_eigen.real)}")

# Matrix exponential
from scipy.linalg import expm
exp_A_scipy = expm(A)
exp_Lambda = np.diag(np.exp(eigenvalues))
exp_A_eigen = V @ exp_Lambda @ np.linalg.inv(V)
print(f"exp(A) match? {np.allclose(exp_A_scipy, exp_A_eigen.real)}")

2. Spectral Theorem for Symmetric Matrices

2.1 Spectral Theorem

For a real symmetric matrix $A = A^T$:

  1. All eigenvalues are real.
  2. Eigenvectors corresponding to distinct eigenvalues are orthogonal.
  3. $A$ is orthogonally diagonalizable: $A = Q\Lambda Q^T$, where $Q$ is orthogonal.

# Eigendecomposition of a symmetric matrix
A_sym = np.array([[4, 2, 0],
                  [2, 3, 1],
                  [0, 1, 2]])

# Check that the matrix is symmetric
print(f"Is symmetric? {np.allclose(A_sym, A_sym.T)}")

eigenvalues, Q = np.linalg.eigh(A_sym)  # eigh is designed for symmetric matrices
Lambda = np.diag(eigenvalues)

print(f"Eigenvalues: {eigenvalues}")
print(f"Q (orthogonal matrix):\n{Q}")

# Check orthogonality
print(f"Q^T Q = I?\n{Q.T @ Q}")
print(f"Is orthogonal? {np.allclose(Q.T @ Q, np.eye(3))}")

# Reconstruction: A = Q Λ Q^T
A_reconstructed = Q @ Lambda @ Q.T
print(f"Reconstruction error: {np.linalg.norm(A_sym - A_reconstructed)}")

2.2 Positive Definite Matrix

A symmetric matrix $A$ is positive definite if:

$$\mathbf{x}^T A \mathbf{x} > 0 \quad \forall \mathbf{x} \neq \mathbf{0}$$

Equivalent conditions:

  • All eigenvalues are positive
  • All leading principal minors are positive
  • A Cholesky decomposition exists

Positive definite matrices are very important in machine learning (covariance matrices, Hessians, etc.).

# Positive definite matrix example
A_pd = np.array([[2, -1, 0],
                 [-1, 2, -1],
                 [0, -1, 2]])

eigenvalues = np.linalg.eigvalsh(A_pd)
print(f"Eigenvalues: {eigenvalues}")
print(f"All positive? {np.all(eigenvalues > 0)}")

# Check x^T A x > 0
x = np.random.randn(3)
quadratic_form = x.T @ A_pd @ x
print(f"x^T A x = {quadratic_form}")
print(f"Positive? {quadratic_form > 0}")

# Positive semidefinite example
A_psd = np.array([[1, 1],
                  [1, 1]])
eigenvalues_psd = np.linalg.eigvalsh(A_psd)
print(f"PSD eigenvalues: {eigenvalues_psd}")  # [0, 2]
print(f"All non-negative? {np.all(eigenvalues_psd >= 0)}")

2.3 Geometric Interpretation of Symmetric Matrices

Symmetric matrices are closely tied to ellipses and ellipsoids: for a symmetric positive definite $A$, the image of the unit circle under $A$ is an ellipse whose principal axes point along the eigenvectors and whose semi-axis lengths are the corresponding eigenvalues.

# Visualize the 2D ellipse
A = np.array([[3, 1],
              [1, 2]])

eigenvalues, eigenvectors = np.linalg.eigh(A)

# Transform the unit circle by A
theta = np.linspace(0, 2*np.pi, 100)
unit_circle = np.array([np.cos(theta), np.sin(theta)])
ellipse = A @ unit_circle

# Plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 6))

# Unit circle
ax1.plot(unit_circle[0], unit_circle[1], 'b-', label='Unit circle')
ax1.set_aspect('equal')
ax1.grid(True)
ax1.legend()
ax1.set_title('Before transformation')

# Ellipse
ax2.plot(ellipse[0], ellipse[1], 'r-', label='Ellipse')
# Draw the eigenvector directions, scaled by the eigenvalues (the semi-axis lengths)
for i in range(2):
    scale = eigenvalues[i]
    v = eigenvectors[:, i] * scale
    ax2.arrow(0, 0, v[0], v[1], head_width=0.2, head_length=0.2,
              fc=f'C{i}', ec=f'C{i}', label=f'λ{i+1} * v{i+1}')
ax2.set_aspect('equal')
ax2.grid(True)
ax2.legend()
ax2.set_title('After transformation by A')

plt.tight_layout()
plt.show()

3. Singular Value Decomposition (SVD)

3.1 SVD Definition

Every $m \times n$ matrix $A$ can be decomposed as:

$$A = U\Sigma V^T$$

where:

  • $U$: $m \times m$ orthogonal matrix (left singular vectors)
  • $\Sigma$: $m \times n$ diagonal matrix (singular values, $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$)
  • $V$: $n \times n$ orthogonal matrix (right singular vectors)

# Compute the SVD
A = np.array([[3, 1, 1],
              [-1, 3, 1]])

U, sigma, VT = np.linalg.svd(A, full_matrices=True)

print(f"A shape: {A.shape}")
print(f"U shape: {U.shape}")
print(f"sigma: {sigma}")  # 1D ๋ฐฐ์—ด
print(f"V^T shape: {VT.shape}")

# Rebuild the m x n Σ matrix
Sigma = np.zeros_like(A, dtype=float)
Sigma[:min(A.shape), :min(A.shape)] = np.diag(sigma)

print(f"Sigma shape: {Sigma.shape}")
print(f"Sigma:\n{Sigma}")

# Check the reconstruction
A_reconstructed = U @ Sigma @ VT
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed)}")

3.2 Geometric Meaning of SVD

SVD breaks down any linear transformation into three steps:

  1. $V^T$: rotation/reflection
  2. $\Sigma$: axis-aligned scaling
  3. $U$: rotation/reflection

  V^T          ฮฃ           U
 โ”€โ”€โ”€โ”€โ”€โ”€โ”€    โ”€โ”€โ”€โ”€โ”€โ”€โ”€    โ”€โ”€โ”€โ”€โ”€โ”€โ”€
 Rotate  โ†’  Scale   โ†’  Rotate
# Geometric interpretation of SVD
A = np.array([[3, 1],
              [1, 2]])

U, sigma, VT = np.linalg.svd(A)

# Unit circle
theta = np.linspace(0, 2*np.pi, 100)
circle = np.array([np.cos(theta), np.sin(theta)])

# Apply each step in sequence
step1 = VT @ circle  # first rotation/reflection
Sigma_2d = np.diag(sigma)
step2 = Sigma_2d @ step1  # axis-aligned scaling
step3 = U @ step2  # second rotation/reflection

# Final result, applied in one shot
final = A @ circle

# Plot
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

axes[0, 0].plot(circle[0], circle[1], 'b-')
axes[0, 0].set_title('1. Original circle')
axes[0, 0].set_aspect('equal')
axes[0, 0].grid(True)

axes[0, 1].plot(step1[0], step1[1], 'g-')
axes[0, 1].set_title('2. After V^T (rotation)')
axes[0, 1].set_aspect('equal')
axes[0, 1].grid(True)

axes[0, 2].plot(step2[0], step2[1], 'm-')
axes[0, 2].set_title('3. After ฮฃ (scaling)')
axes[0, 2].set_aspect('equal')
axes[0, 2].grid(True)

axes[1, 0].plot(step3[0], step3[1], 'r-')
axes[1, 0].set_title('4. After U (rotation)')
axes[1, 0].set_aspect('equal')
axes[1, 0].grid(True)

axes[1, 1].plot(final[0], final[1], 'k-', linewidth=2)
axes[1, 1].set_title('5. Final: A @ circle')
axes[1, 1].set_aspect('equal')
axes[1, 1].grid(True)

# Verification
axes[1, 2].plot(step3[0], step3[1], 'r-', alpha=0.5, label='SVD steps')
axes[1, 2].plot(final[0], final[1], 'k--', label='Direct')
axes[1, 2].set_title('Verification')
axes[1, 2].set_aspect('equal')
axes[1, 2].grid(True)
axes[1, 2].legend()

plt.tight_layout()
plt.show()

3.3 Relationship Between SVD and Eigendecomposition

SVD can be obtained from eigendecomposition of $A^TA$ and $AA^T$:

$$A^TA = V\Sigma^T\Sigma V^T = V\Sigma^2 V^T$$ $$AA^T = U\Sigma\Sigma^T U^T = U\Sigma^2 U^T$$

That is:

  • $V$ contains the eigenvectors of $A^TA$
  • $U$ contains the eigenvectors of $AA^T$
  • $\sigma_i^2$ are the eigenvalues of $A^TA$ (equivalently, of $AA^T$)

# Relationship between SVD and eigendecomposition
A = np.array([[3, 1, 1],
              [-1, 3, 1]])

# Method 1: direct SVD
U_svd, sigma_svd, VT_svd = np.linalg.svd(A)

# Method 2: eigendecomposition of A^T A
ATA = A.T @ A
eigenvalues_ATA, V_eigen = np.linalg.eigh(ATA)
# Sort the eigenvalues in descending order
idx = eigenvalues_ATA.argsort()[::-1]
eigenvalues_ATA = eigenvalues_ATA[idx]
V_eigen = V_eigen[:, idx]

sigma_from_eigen = np.sqrt(eigenvalues_ATA)

print(f"Singular values (SVD): {sigma_svd}")
print(f"sqrt(eigenvalues of A^T A): {sigma_from_eigen}")
print(f"V from SVD:\n{VT_svd.T}")
print(f"V from eigendecomposition:\n{V_eigen}")

3.4 Low-Rank Approximation

Eckart-Young-Mirsky theorem: The best rank $k$ approximation in the Frobenius norm sense is:

$$A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T = U_k \Sigma_k V_k^T$$

# Low-rank approximation example: image compression
from sklearn.datasets import load_sample_image
import matplotlib.pyplot as plt

# Load a sample image
china = load_sample_image("china.jpg")
# Convert to grayscale
china_gray = china.mean(axis=2)

# SVD
U, sigma, VT = np.linalg.svd(china_gray, full_matrices=False)

# Approximate at several ranks
ranks = [5, 20, 50, 100]
fig, axes = plt.subplots(2, 3, figsize=(15, 10))
axes = axes.ravel()

axes[0].imshow(china_gray, cmap='gray')
axes[0].set_title(f'Original (rank={np.linalg.matrix_rank(china_gray)})')
axes[0].axis('off')

for i, k in enumerate(ranks, 1):
    # Rank-k approximation
    A_k = U[:, :k] @ np.diag(sigma[:k]) @ VT[:k, :]

    axes[i].imshow(A_k, cmap='gray')

    # Compute the compression ratio
    original_size = china_gray.size
    compressed_size = k * (U.shape[0] + VT.shape[1] + 1)
    compression_ratio = original_size / compressed_size

    axes[i].set_title(f'Rank {k} (compression: {compression_ratio:.1f}x)')
    axes[i].axis('off')

# Singular value spectrum
axes[-1].plot(sigma, 'b-')
axes[-1].set_xlabel('Index')
axes[-1].set_ylabel('Singular value')
axes[-1].set_title('Singular value spectrum')
axes[-1].set_yscale('log')
axes[-1].grid(True)

plt.tight_layout()
plt.show()

4. PCA: From SVD to Principal Component Analysis

4.1 Covariance Matrix

For a data matrix $X$ ($n$ samples ร— $d$ features), the covariance matrix is:

$$C = \frac{1}{n-1}X^TX \quad \text{(assuming data is centered)}$$

The covariance matrix is symmetric and positive semidefinite.

# Covariance matrix
np.random.seed(42)
X = np.random.randn(100, 3)

# Center the data
X_centered = X - X.mean(axis=0)

# Covariance matrix, two ways
cov_manual = (X_centered.T @ X_centered) / (len(X) - 1)
cov_numpy = np.cov(X.T)

print(f"Manual covariance:\n{cov_manual}")
print(f"NumPy covariance:\n{cov_numpy}")
print(f"Equal? {np.allclose(cov_manual, cov_numpy)}")

# Check symmetry
print(f"Is symmetric? {np.allclose(cov_manual, cov_manual.T)}")

4.2 Mathematical Derivation of PCA

Goal: Find the direction (principal component) that maximizes data variance

The first principal component $\mathbf{w}_1$ solves:

$$\max_{\mathbf{w}} \mathbf{w}^T C \mathbf{w} \quad \text{subject to } \|\mathbf{w}\| = 1$$

Solution: the eigenvector corresponding to the largest eigenvalue of the covariance matrix $C$.

The $k$-th principal component is the direction orthogonal to the previous $k-1$ components that maximizes variance.

# Manual PCA implementation
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data

# 1. Center the data
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix
cov = np.cov(X_centered.T)

# 3. Eigendecomposition
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# 4. Sort in descending order
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

print(f"Eigenvalues (variances): {eigenvalues}")
print(f"Principal components (eigenvectors):\n{eigenvectors}")

# 5. Transform (project onto the first two principal components)
k = 2
X_pca = X_centered @ eigenvectors[:, :k]

print(f"Transformed data shape: {X_pca.shape}")

4.3 PCA Using SVD

SVD can be used without explicitly computing the covariance matrix:

$$X = U\Sigma V^T \quad \Rightarrow \quad X^TX = V\Sigma^2 V^T$$

Therefore, the columns of $V$ are the principal components, and $\sigma_i^2/(n-1)$ is the variance explained by the $i$-th component.

# PCA via SVD
X_centered = X - X.mean(axis=0)

# SVD
U, sigma, VT = np.linalg.svd(X_centered, full_matrices=False)

# Principal components = columns of V
principal_components = VT.T
print(f"Principal components (from SVD):\n{principal_components}")

# Explained variance
explained_variance = (sigma**2) / (len(X) - 1)
print(f"Explained variance: {explained_variance}")

# Explained variance ratio
explained_variance_ratio = explained_variance / explained_variance.sum()
print(f"Explained variance ratio: {explained_variance_ratio}")

# Transform
X_pca_svd = X_centered @ principal_components[:, :2]
print(f"Same as eigendecomposition? {np.allclose(np.abs(X_pca), np.abs(X_pca_svd))}")

4.4 PCA Visualization and Interpretation

from sklearn.decomposition import PCA as SklearnPCA

# scikit-learn PCA
pca = SklearnPCA(n_components=2)
X_sklearn = pca.fit_transform(X)

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Transformed data
for i, target_name in enumerate(iris.target_names):
    mask = iris.target == i
    axes[0].scatter(X_sklearn[mask, 0], X_sklearn[mask, 1],
                   label=target_name, alpha=0.7)
axes[0].set_xlabel(f'PC1 ({pca.explained_variance_ratio_[0]:.2%})')
axes[0].set_ylabel(f'PC2 ({pca.explained_variance_ratio_[1]:.2%})')
axes[0].set_title('PCA of Iris Dataset')
axes[0].legend()
axes[0].grid(True)

# Explained variance
axes[1].bar(range(1, len(pca.explained_variance_ratio_)+1),
           pca.explained_variance_ratio_)
axes[1].set_xlabel('Principal Component')
axes[1].set_ylabel('Explained Variance Ratio')
axes[1].set_title('Scree Plot')
axes[1].grid(True)

plt.tight_layout()
plt.show()

# Principal component loadings (correlation with the original features)
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
print("\nFeature loadings:")
for i, feature in enumerate(iris.feature_names):
    print(f"{feature:20s}: PC1={loadings[i,0]:6.3f}, PC2={loadings[i,1]:6.3f}")

4.5 Dimensionality Reduction and Reconstruction

PCA can compress data to lower dimensions and reconstruct it.

# Dimensionality reduction and reconstruction
pca_full = SklearnPCA(n_components=4)
X_transformed = pca_full.fit_transform(X)

# Reconstruct with different numbers of components
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for idx, n_components in enumerate([1, 2, 3, 4]):
    # Use only the first n components
    X_reduced = X_transformed[:, :n_components]

    # Reconstruction
    X_reconstructed = pca_full.inverse_transform(
        np.column_stack([X_reduced,
                        np.zeros((len(X), 4-n_components))])
    )

    # Reconstruction error
    reconstruction_error = np.mean((X - X_reconstructed)**2)
    explained_var = pca_full.explained_variance_ratio_[:n_components].sum()

    # Plot only the first two features
    axes[idx].scatter(X[:, 0], X[:, 1], alpha=0.3, label='Original')
    axes[idx].scatter(X_reconstructed[:, 0], X_reconstructed[:, 1],
                     alpha=0.3, label='Reconstructed')
    axes[idx].set_xlabel('Feature 1')
    axes[idx].set_ylabel('Feature 2')
    axes[idx].set_title(f'{n_components} PCs (Var: {explained_var:.2%}, MSE: {reconstruction_error:.3f})')
    axes[idx].legend()
    axes[idx].grid(True)

plt.tight_layout()
plt.show()

5. Other Matrix Decompositions

5.1 LU Decomposition

Decompose a square matrix $A$ into a lower triangular factor $L$ and an upper triangular factor $U$. In practice, partial pivoting introduces a permutation matrix $P$ (this is what SciPy returns):

$$A = PLU$$

Uses: solving linear systems $Ax = b$, computing determinants

from scipy.linalg import lu

# LU decomposition
A = np.array([[2, 1, 1],
              [4, -6, 0],
              [-2, 7, 2]])

P, L, U = lu(A)

print(f"P (permutation matrix):\n{P}")
print(f"L (lower triangular):\n{L}")
print(f"U (upper triangular):\n{U}")

# Reconstruction: A = P L U
A_reconstructed = P @ L @ U
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed)}")

# Solving a linear system with the LU factors
b = np.array([4, 2, 6])

# A x = b becomes P L U x = b, i.e. L U x = P^T b
# 1. Solve L y = P^T b (forward substitution)
# 2. Solve U x = y (back substitution)

from scipy.linalg import solve_triangular

y = solve_triangular(L, P.T @ b, lower=True)
x = solve_triangular(U, y, lower=False)

print(f"Solution: {x}")
print(f"Verification: Ax = {A @ x}")

5.2 QR Decomposition

Decompose matrix $A$ into orthogonal matrix $Q$ and upper triangular matrix $R$:

$$A = QR$$

Uses: Least squares problems, Gram-Schmidt orthogonalization

# QR decomposition
A = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])

Q, R = np.linalg.qr(A)

print(f"Q (orthogonal):\n{Q}")
print(f"R (upper triangular):\n{R}")

# Check orthogonality
print(f"Q^T Q:\n{Q.T @ Q}")

# Reconstruction
A_reconstructed = Q @ R
print(f"Reconstruction error: {np.linalg.norm(A - A_reconstructed)}")

# Solving a least squares problem: Ax ≈ b (overdetermined system)
A_over = np.array([[1, 1],
                   [1, 2],
                   [1, 3],
                   [1, 4]])
b = np.array([2, 3, 5, 6])

# np.linalg.qr returns the reduced QR by default: Q is 4x2, R is 2x2
Q, R = np.linalg.qr(A_over)

# Least squares solution: x = R^(-1) Q^T b
x = np.linalg.solve(R, Q.T @ b)
print(f"Least squares solution: {x}")

5.3 Cholesky Decomposition

Decompose positive definite symmetric matrix $A$ into lower triangular $L$:

$$A = LL^T$$

Uses: Gaussian sampling, numerically stable computations

# Cholesky decomposition
A_pd = np.array([[4, 2, 2],
                 [2, 5, 1],
                 [2, 1, 6]])

# Check positive definiteness
eigenvalues = np.linalg.eigvalsh(A_pd)
print(f"Eigenvalues: {eigenvalues}")
print(f"Is positive definite? {np.all(eigenvalues > 0)}")

# Cholesky decomposition
L = np.linalg.cholesky(A_pd)
print(f"L:\n{L}")

# Reconstruction
A_reconstructed = L @ L.T
print(f"Reconstruction error: {np.linalg.norm(A_pd - A_reconstructed)}")

# ์‘์šฉ: ๋‹ค๋ณ€๋Ÿ‰ ๊ฐ€์šฐ์‹œ์•ˆ ์ƒ˜ํ”Œ๋ง
# N(ฮผ, ฮฃ)์—์„œ ์ƒ˜ํ”Œ๋งํ•˜๋ ค๋ฉด: ฮผ + L @ z (z ~ N(0, I))
mu = np.array([1, 2, 3])
Sigma = A_pd
L = np.linalg.cholesky(Sigma)

# Sample from the standard normal distribution
n_samples = 1000
Z = np.random.randn(n_samples, 3)

# Transform the samples
samples = mu + (L @ Z.T).T

# Verify
print(f"Sample mean: {samples.mean(axis=0)}")
print(f"True mean: {mu}")
print(f"Sample covariance:\n{np.cov(samples.T)}")
print(f"True covariance:\n{Sigma}")

5.4 Comparison of Decomposition Methods

Decomposition      | Condition           | Computational Complexity | Main Uses
-------------------|---------------------|--------------------------|--------------------------------------------------
Eigendecomposition | Square matrix       | $O(n^3)$                 | Matrix powers, dynamical systems
SVD                | Any matrix          | $O(mn^2)$                | Low-rank approximation, PCA, recommender systems
LU                 | Square matrix       | $O(n^3)$                 | Systems of equations, determinants
QR                 | Any matrix          | $O(mn^2)$                | Least squares, orthogonalization
Cholesky           | Positive definite   | $O(n^3)$                 | Sampling, numerical stability

6. Applications of Matrix Decompositions in ML

6.1 Recommender Systems: Matrix Factorization

Approximate user-item rating matrix $R$ with low rank:

$$R \approx UV^T$$

# Simple recommender system via truncated SVD
from sklearn.decomposition import TruncatedSVD

# User-movie rating matrix (5 users x 6 movies)
R = np.array([
    [5, 3, 0, 1, 0, 0],
    [4, 0, 0, 1, 0, 0],
    [1, 1, 0, 5, 0, 0],
    [0, 0, 0, 4, 4, 5],
    [0, 0, 5, 0, 4, 4]
])

# Zeros denote missing ratings (TruncatedSVD below simply treats them as zeros)
mask = R > 0  # observed entries

# Truncated SVD (rank-2 approximation)
svd = TruncatedSVD(n_components=2)
U = svd.fit_transform(R)
VT = svd.components_

# Reconstruction
R_approx = U @ VT

print("Original ratings:")
print(R)
print("\nApproximated ratings:")
print(R_approx)

# Predict the missing entries
print("\nPredicted ratings for missing values:")
for i in range(R.shape[0]):
    for j in range(R.shape[1]):
        if R[i, j] == 0:
            print(f"User {i}, Movie {j}: {R_approx[i, j]:.2f}")

6.2 Image Compression

Image compression using SVD was shown in Section 3.4. For color images, SVD can be applied independently to each color channel; a sketch is shown below.
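
A minimal sketch of per-channel compression, reusing the china image loaded in Section 3.4 (the rank of 50 is an arbitrary choice):

# Apply a rank-k SVD approximation to each RGB channel separately
k = 50
channels = []
for c in range(3):
    U_c, s_c, VT_c = np.linalg.svd(china[:, :, c].astype(float), full_matrices=False)
    channels.append(U_c[:, :k] @ np.diag(s_c[:k]) @ VT_c[:k, :])

# Stack the channels back together and clip to a valid pixel range
china_k = np.clip(np.stack(channels, axis=2), 0, 255).astype(np.uint8)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.imshow(china); ax1.set_title('Original'); ax1.axis('off')
ax2.imshow(china_k); ax2.set_title(f'Rank {k} per channel'); ax2.axis('off')
plt.tight_layout()
plt.show()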

6.3 Spectral Clustering

Clustering using eigenvectors of graph Laplacian matrix:

from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Data that is not linearly separable
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)

# Spectral clustering
spectral = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                               random_state=42)
labels = spectral.fit_predict(X)

# Plot
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap='viridis')
plt.title('True labels')

plt.subplot(1, 2, 2)
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title('Spectral clustering')

plt.tight_layout()
plt.show()
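
The sklearn call above hides the mechanism: an affinity (similarity) graph is built, the graph Laplacian is formed, and its leading eigenvectors give a low-dimensional embedding that is then clustered with k-means. A rough sketch of those steps, using a fully connected RBF affinity instead of the nearest-neighbors graph used above (the gamma value is an assumption, and the resulting split may be less clean):

# Sketch of the spectral embedding behind spectral clustering
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

# Affinity matrix from an RBF kernel
gamma = 20.0
W = np.exp(-gamma * cdist(X, X, 'sqeuclidean'))
np.fill_diagonal(W, 0)

# Symmetric normalized graph Laplacian: L = I - D^(-1/2) W D^(-1/2)
D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
L_norm = np.eye(len(X)) - D_inv_sqrt @ W @ D_inv_sqrt

# The eigenvectors with the smallest eigenvalues form the spectral embedding
lap_eigenvalues, lap_eigenvectors = np.linalg.eigh(L_norm)
embedding = lap_eigenvectors[:, :2]

# Cluster the embedded points with k-means
labels_manual = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(embedding)
print(f"Cluster sizes: {np.bincount(labels_manual)}")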

6.4 Feature Extraction: Kernel PCA

Combining the kernel trick with PCA for nonlinear dimensionality reduction:

from sklearn.decomposition import KernelPCA

# Circular data
theta = np.linspace(0, 2*np.pi, 100)
r = 1 + 0.3 * np.random.randn(100)
X_circle = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

# Linear PCA
pca_linear = SklearnPCA(n_components=1)
X_pca_linear = pca_linear.fit_transform(X_circle)

# Kernel PCA (RBF kernel)
kpca = KernelPCA(n_components=1, kernel='rbf', gamma=10)
X_kpca = kpca.fit_transform(X_circle)

# Plot
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

axes[0].scatter(X_circle[:, 0], X_circle[:, 1], c=theta, cmap='viridis')
axes[0].set_title('Original data')
axes[0].set_aspect('equal')

axes[1].scatter(X_pca_linear, np.zeros_like(X_pca_linear),
               c=theta, cmap='viridis')
axes[1].set_title('Linear PCA')
axes[1].set_ylim(-0.5, 0.5)

axes[2].scatter(X_kpca, np.zeros_like(X_kpca), c=theta, cmap='viridis')
axes[2].set_title('Kernel PCA')
axes[2].set_ylim(-0.5, 0.5)

plt.tight_layout()
plt.show()

Practice Problems

Problem 1: Eigendecomposition

Calculate eigenvalues and eigenvectors of the following matrix by hand and verify with NumPy:

$$A = \begin{bmatrix} 5 & 2 \\ 2 & 2 \end{bmatrix}$$

Determine if this matrix is positive definite, and compute $A^{10}$ using eigendecomposition.

Problem 2: SVD and Low-Rank Approximation

Generate a $4 \times 3$ matrix and compute its SVD. Find rank 1 and rank 2 approximations, and calculate the Frobenius norm error for each.

Problem 3: PCA Implementation

For the Iris dataset:

  1. Perform PCA using the eigendecomposition of the covariance matrix.
  2. Perform PCA using SVD.
  3. Verify that both methods give identical results.
  4. What is the minimum number of components needed for 95% cumulative explained variance?

Problem 4: Cholesky Decomposition and Sampling

Generate 1000 samples from a 2D Gaussian distribution with covariance matrix $\Sigma = \begin{bmatrix} 4 & 2 \\ 2 & 3 \end{bmatrix}$ and mean $\mu = [0, 0]^T$. Use Cholesky decomposition and verify that the empirical covariance of samples is close to $\Sigma$.

Problem 5: Recommender System

Predict missing values in the following user-movie rating matrix using SVD:

$$R = \begin{bmatrix} 5 & ? & 4 & ? \\ ? & 3 & ? & 2 \\ 4 & ? & 5 & ? \\ ? & 2 & ? & 3 \end{bmatrix}$$

Use rank 2 approximation and verify that predicted ratings are in the 1-5 range.

References

Textbooks

  1. Strang, G. (2016). Introduction to Linear Algebra. Chapter 6: Eigenvalues and Eigenvectors.
  2. Trefethen, L. N., & Bau III, D. (1997). Numerical Linear Algebra. SIAM. (In-depth discussion of SVD and numerical stability.)
  3. Golub, G. H., & Van Loan, C. F. (2013). Matrix Computations (4th ed.). Johns Hopkins. (The bible of matrix decomposition algorithms.)

Papers

  1. Eckart, C., & Young, G. (1936). "The approximation of one matrix by another of lower rank." Psychometrika.
  2. Jolliffe, I. T. (2002). Principal Component Analysis (2nd ed.). Springer.

Online Resources

  1. 3Blue1Brown - Eigenvectors and eigenvalues: https://www.youtube.com/watch?v=PFDu9oVAE-g
  2. Stanford CS229 - Linear Algebra Review: http://cs229.stanford.edu/section/cs229-linalg.pdf
  3. scikit-learn PCA Guide: https://scikit-learn.org/stable/modules/decomposition.html

Practice Tools

  1. NumPy Linear Algebra: https://numpy.org/doc/stable/reference/routines.linalg.html
  2. SciPy Linear Algebra: https://docs.scipy.org/doc/scipy/reference/linalg.html

Next lesson: 03. Matrix Calculus, which covers the mathematical foundation of backpropagation.
