Part IV: Advanced Topics | Chapter 4

Applications in Data Science

How linear algebra powers modern machine learning and data analysis

Historical Context

Karl Pearson invented PCA in 1901 to find "lines and planes of closest fit." The Netflix Prize (2006-2009) demonstrated the power of matrix factorization for recommender systems. Google's PageRank (1998) showed that the dominant eigenvector of a web graph encodes page importance. Today, every major ML algorithm—from linear regression to transformers—is built on linear algebra primitives: matrix multiplication, SVD, eigendecomposition, and gradient computation through matrix calculus.

4.1 Principal Component Analysis

PCA finds orthogonal directions of maximum variance. Given a centered data matrix $X \in \mathbb{R}^{n \times p}$, the sample covariance is $C = \frac{1}{n-1}X^TX$. The principal components are eigenvectors of $C$, ordered by eigenvalue magnitude.

$$C = V\Lambda V^T, \quad \text{variance along PC}_k = \lambda_k$$

Dimensionality reduction to $k$ components retains a fraction $\sum_{i=1}^k \lambda_i / \sum_{i=1}^p \lambda_i$ of the total variance, and projecting onto the top $k$ eigenvectors yields the optimal rank-$k$ approximation of the centered data by the Eckart-Young theorem.
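The eigendecomposition route above can be sketched directly in NumPy. This is a minimal illustration on synthetic correlated 2D data (the data-generating matrix is an arbitrary choice for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2D data: variance concentrated along one direction
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
X = X - X.mean(axis=0)                      # center the data

C = X.T @ X / (X.shape[0] - 1)              # sample covariance
eigvals, eigvecs = np.linalg.eigh(C)        # eigh returns ascending order
order = np.argsort(eigvals)[::-1]           # reorder descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first principal component (rank-1 reduction)
Z = X @ eigvecs[:, :1]
explained = eigvals[0] / eigvals.sum()
print(f"variance explained by PC1: {explained:.3f}")
```

Because the synthetic data is strongly elongated along one axis, the first component captures the bulk of the variance.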

4.2 Linear Regression and Regularization

Linear regression $y = X\beta + \epsilon$ is solved by the normal equations $\hat\beta = (X^TX)^{-1}X^Ty$ (assuming $X$ has full column rank). Ridge regression adds an $L^2$ penalty: $\hat\beta_\alpha = (X^TX + \alpha I)^{-1}X^Ty$, while the LASSO uses an $L^1$ penalty: $\min_\beta \|y - X\beta\|_2^2 + \alpha\|\beta\|_1$, promoting sparsity.
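Both closed-form estimators can be computed with one linear solve each. A minimal sketch on synthetic data (the true coefficients and noise level are arbitrary choices for the example); note it uses `np.linalg.solve` rather than forming the inverse explicitly, which is both faster and better conditioned:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Ordinary least squares via the normal equations
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add alpha * I to X^T X, which shrinks the coefficients
alpha = 10.0
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

print("OLS:  ", beta_ols)
print("ridge:", beta_ridge)
```

The ridge solution has strictly smaller norm than the OLS solution; the penalty trades a little bias for reduced variance.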

4.3 Recommender Systems

The user-item rating matrix $R \approx UV^T$ is approximated via low-rank factorization. The SVD gives the optimal rank-$k$ approximation. In practice, alternating least squares (ALS) or stochastic gradient descent handle missing entries and scale to millions of users and items.
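The SVD route can be sketched on a small synthetic rating matrix. This toy example assumes a fully observed matrix generated from a true rank-2 structure (real systems use ALS or SGD precisely because most entries are missing):

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic ratings with true rank-2 structure: 20 users x 10 items
U_true = rng.normal(size=(20, 2))
V_true = rng.normal(size=(10, 2))
R = U_true @ V_true.T

# Truncated SVD gives the optimal rank-k approximation (Eckart-Young)
k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(R - R_k)
print(f"rank-{k} reconstruction error: {err:.2e}")
```

Since $R$ is exactly rank 2 by construction, the rank-2 truncation recovers it to machine precision.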

4.4 Graph Theory and Spectral Methods

The graph Laplacian $L = D - W$ (degree matrix minus adjacency) encodes graph connectivity. Its eigenvalues reveal structure: the number of zero eigenvalues equals the number of connected components, and the Fiedler vector (the eigenvector for the second-smallest eigenvalue) gives a spectral relaxation of the minimum-cut bisection problem; thresholding its entries yields an approximate graph bisection.
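The zero-eigenvalue count is easy to verify on a small hand-built graph. A minimal sketch, using an adjacency matrix chosen to have exactly two connected components:

```python
import numpy as np

# Adjacency of a graph with two components: {0, 1, 2} and {3, 4}
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))      # degree matrix
L = D - W                       # graph Laplacian

eigvals = np.linalg.eigvalsh(L)
n_components = int(np.sum(eigvals < 1e-10))
print("eigenvalues:", np.round(eigvals, 3))
print("connected components:", n_components)
```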

PageRank computes the dominant eigenvector of the stochastic transition matrix $G = \alpha H + (1-\alpha)\frac{1}{n}ee^T$, where $\alpha \approx 0.85$ is the damping factor.
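Because $G$ is stochastic, its dominant eigenvalue is 1 and power iteration converges to the PageRank vector. A minimal sketch on a hypothetical four-page web (the link matrix $H$ below is an invented toy example, written column-stochastic so that $Gr$ maps probability vectors to probability vectors):

```python
import numpy as np

# Toy column-stochastic link matrix: column j spreads page j's
# probability equally over the pages it links to
H = np.array([
    [0,   0,   1/2, 0],
    [1/3, 0,   0,   0],
    [1/3, 1/2, 0,   1],
    [1/3, 1/2, 1/2, 0],
])

n = H.shape[0]
alpha = 0.85
G = alpha * H + (1 - alpha) * np.ones((n, n)) / n   # Google matrix

# Power iteration: repeatedly apply G to a uniform start vector
r = np.ones(n) / n
for _ in range(100):
    r = G @ r

print("PageRank:", np.round(r, 4))
```

The damping term $(1-\alpha)\frac{1}{n}ee^T$ makes $G$ strictly positive, so by the Perron-Frobenius theorem the dominant eigenvector is unique and the iteration converges at rate roughly $\alpha$ per step.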

4.5 Neural Networks

A neural network layer computes $h = \sigma(Wx + b)$ where $W$ is a weight matrix. Backpropagation is the chain rule applied to compositions of matrix operations. The attention mechanism in transformers computes $\text{Attention}(Q,K,V) = \text{softmax}(QK^T/\sqrt{d_k})V$—a sequence of matrix multiplications. Understanding the linear algebra of weight matrices, their singular values, and their condition numbers is essential for training stability and generalization.
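The attention formula is short enough to implement directly. A minimal NumPy sketch of single-head scaled dot-product attention (shapes and random inputs are arbitrary choices for the example; real implementations batch over heads and sequences):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (n_q, n_k) similarities
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(3)
n, d_k, d_v = 4, 8, 8
Q, K = rng.normal(size=(n, d_k)), rng.normal(size=(n, d_k))
V = rng.normal(size=(n, d_v))

out, weights = attention(Q, K, V)
print("output shape:", out.shape)
print("attention rows sum to 1:", np.allclose(weights.sum(axis=1), 1.0))
```

Each output row is a convex combination of the rows of $V$, with weights given by the softmaxed query-key similarities; the $\sqrt{d_k}$ scaling keeps the scores from saturating the softmax as dimension grows.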

Computational Laboratory

This simulation demonstrates PCA on 2D data, spectral clustering of concentric circles, PageRank computation, and matrix factorization for recommender systems.

Linear Algebra in Data Science
