Applications in Data Science
How linear algebra powers modern machine learning and data analysis
Historical Context
Karl Pearson invented PCA in 1901 to find "lines and planes of closest fit." The Netflix Prize (2006-2009) demonstrated the power of matrix factorization for recommender systems. Google's PageRank (1998) showed that the dominant eigenvector of a web graph encodes page importance. Today, every major ML algorithm—from linear regression to transformers—is built on linear algebra primitives: matrix multiplication, SVD, eigendecomposition, and gradient computation through matrix calculus.
4.1 Principal Component Analysis
PCA finds orthogonal directions of maximum variance. Given a centered data matrix $X \in \mathbb{R}^{n \times p}$, the sample covariance is $C = \frac{1}{n-1}X^TX$. The principal components are the eigenvectors of $C$, ordered by eigenvalue magnitude.
Dimensionality reduction to $k$ components retains a fraction $\sum_{i=1}^k \lambda_i / \sum_i \lambda_i$ of the total variance—the optimal rank-$k$ approximation by the Eckart–Young theorem.
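A minimal NumPy sketch of this procedure on synthetic correlated 2D data (the data and seed are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated 2D data: 200 samples
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.2, 0.5]])
X = X - X.mean(axis=0)                 # center the data

C = X.T @ X / (X.shape[0] - 1)         # sample covariance
eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]      # reorder by descending eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = X @ eigvecs[:, :1]                 # project onto the first component (k = 1)
explained = eigvals[0] / eigvals.sum() # fraction of variance retained
```

Because `C` is symmetric, `np.linalg.eigh` is the appropriate (and numerically stable) eigensolver here; in practice the SVD of `X` itself is often used instead of forming `C` explicitly.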
4.2 Linear Regression and Regularization
Linear regression $y = X\beta + \epsilon$ is solved by the normal equations $\hat\beta = (X^TX)^{-1}X^Ty$. Ridge regression adds an $L^2$ penalty: $\hat\beta_\alpha = (X^TX + \alpha I)^{-1}X^Ty$, while LASSO uses $L^1$: $\min \|y - X\beta\|^2 + \alpha\|\beta\|_1$, promoting sparsity.
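Both closed-form estimators can be sketched in a few lines of NumPy (the synthetic design matrix, true coefficients, and $\alpha$ below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# OLS via the normal equations (np.linalg.lstsq is more stable in practice)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: add alpha * I to the Gram matrix before solving, shrinking coefficients
alpha = 1.0
beta_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
```

Note that ridge always yields a coefficient vector of smaller (or equal) norm than OLS; LASSO has no closed form and requires an iterative solver such as coordinate descent.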
4.3 Recommender Systems
The user-item rating matrix $R \approx UV^T$ is approximated via low-rank factorization. The SVD gives the optimal rank-$k$ approximation. In practice, alternating least squares (ALS) or stochastic gradient descent handle missing entries and scale to millions of users and items.
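A sketch of ALS on a synthetic partially observed low-rank matrix (the sizes, rank, regularization $\lambda$, and 40% observation rate are all illustrative assumptions): holding $V$ fixed, each user's factors solve a small ridge problem over that user's observed ratings, and vice versa.

```python
import numpy as np

rng = np.random.default_rng(2)
n_users, n_items, k = 20, 15, 3
# Synthetic rank-3 ratings with ~40% of entries observed
U_true = rng.normal(size=(n_users, k))
V_true = rng.normal(size=(n_items, k))
R = U_true @ V_true.T
mask = rng.random(R.shape) < 0.4

U = rng.normal(size=(n_users, k))
V = rng.normal(size=(n_items, k))
lam = 0.1
for _ in range(20):
    # Fix V, solve a ridge problem per user over its observed ratings
    for i in range(n_users):
        Vi = V[mask[i]]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k), Vi.T @ R[i, mask[i]])
    # Fix U, solve a ridge problem per item over its observed ratings
    for j in range(n_items):
        Uj = U[mask[:, j]]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k), Uj.T @ R[mask[:, j], j])

rmse = np.sqrt(np.mean((U @ V.T - R)[mask] ** 2))  # error on observed entries
```

Each inner solve is only $k \times k$, which is why ALS parallelizes well across users and items at scale.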
4.4 Graph Theory and Spectral Methods
The graph Laplacian $L = D - W$ (degree matrix minus adjacency) encodes graph connectivity. Its eigenvalues reveal structure: the number of zero eigenvalues equals the number of connected components, and the Fiedler vector (the eigenvector of the second-smallest eigenvalue) yields an approximate minimum-cut bisection—the sign pattern of its entries partitions the vertices.
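These facts are easy to check on a toy graph (the 5-vertex graph below, with components $\{0,1,2\}$ and $\{3,4\}$, is an illustrative assumption):

```python
import numpy as np

# Adjacency matrix of a graph with two components: {0,1,2} and {3,4}
W = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
n_components = int(np.sum(np.abs(eigvals) < 1e-10))  # count zero eigenvalues
fiedler = eigvecs[:, 1]      # eigenvector of the second-smallest eigenvalue
```

Here `n_components` comes out to 2, matching the two connected components; on a connected graph, thresholding `fiedler` at zero gives the spectral bisection.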
PageRank computes the dominant eigenvector of the stochastic transition matrix $G = \alpha H + (1-\alpha)\frac{1}{n}ee^T$, where $\alpha \approx 0.85$ is the damping factor.
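The dominant eigenvector can be found by power iteration; a sketch on an assumed four-page toy graph (the link structure is illustrative):

```python
import numpy as np

# Toy link structure (assumed): page i links to the pages in links[i]
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = 4
H = np.zeros((n, n))         # row-stochastic hyperlink matrix
for i, outs in links.items():
    H[i, outs] = 1.0 / len(outs)

alpha = 0.85                 # damping factor
G = alpha * H + (1 - alpha) * np.ones((n, n)) / n

# Power iteration: r converges to the dominant left eigenvector of G
r = np.ones(n) / n
for _ in range(100):
    r = G.T @ r
r = r / r.sum()              # normalize to a probability distribution
```

Page 2, which receives links from all other pages, ends up with the largest rank. (Real crawl graphs also need handling of dangling pages with no outlinks, omitted here.)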
4.5 Neural Networks
A neural network layer computes $h = \sigma(Wx + b)$ where $W$ is a weight matrix. Backpropagation is the chain rule applied to compositions of matrix operations. The attention mechanism in transformers computes $\text{Attention}(Q,K,V) = \text{softmax}(QK^T/\sqrt{d_k})V$—a sequence of matrix multiplications. Understanding the linear algebra of weight matrices, their singular values, and their condition numbers is essential for training stability and generalization.
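The attention formula above reduces to two matrix products around a row-wise softmax; a minimal NumPy sketch (the shapes and random inputs are illustrative assumptions):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax; subtracting the row max avoids overflow
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(3)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))   # one value vector per key
out = attention(Q, K, V)      # shape (4, 8)
```

Each output row is a convex combination of the rows of $V$, with mixing weights determined by query–key similarity; the $\sqrt{d_k}$ scaling keeps the dot products from saturating the softmax as the key dimension grows.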
Computational Laboratory
This simulation demonstrates PCA on 2D data, spectral clustering of concentric circles, PageRank computation, and matrix factorization for recommender systems.