-
Consider a matrix \(X_{n\times p}\) with \(n \gg p\). It induces a linear map \(X: \mathbb{R}^p \to \mathbb{R}^n\), \(v \mapsto u = Xv\). A maximizer of \(f(v)=\|Xv\|^2 = v^T X^T X v\) subject to \(g(v)=\|v\|^2 = 1\) can be found with a Lagrange multiplier \(\lambda\): setting the gradient of \(f(v) - \lambda\,(g(v) - 1)\) to zero gives \[X^T X v = \lambda v, \qquad v^T v = 1.\] Say \(v_1\) is a solution…
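The stationarity condition above can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated \(X\) (the sizes, seed, and variable names are illustrative choices, not taken from the text). It takes the eigenvector of \(X^TX\) with the largest eigenvalue and compares \(\|Xv\|^2\) there against random unit vectors.

```python
import numpy as np

# Illustrative data: a tall matrix with n much larger than p (assumed sizes).
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))

# X^T X is symmetric, so eigh applies; it returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X)
lam1 = eigvals[-1]        # largest eigenvalue
v1 = eigvecs[:, -1]       # corresponding unit eigenvector

# Check the Lagrange conditions: X^T X v = lambda v and v^T v = 1.
residual = np.linalg.norm(X.T @ X @ v1 - lam1 * v1)
f_v1 = np.linalg.norm(X @ v1) ** 2   # objective value at v1

# Compare against the objective at many random unit vectors.
V = rng.normal(size=(1000, p))
V /= np.linalg.norm(V, axis=1, keepdims=True)
f_random = np.linalg.norm(X @ V.T, axis=0) ** 2

print("stationarity residual:", residual)
print("f(v1) equals lambda_1:", np.isclose(f_v1, lam1))
print("no random unit vector beats v1:", f_random.max() <= f_v1 + 1e-9)
```

At a solution, \(f(v) = v^T X^T X v = \lambda\), which is why the maximizer is the eigenvector with the largest eigenvalue.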
-
1. Linear regression seeks \(\hat\beta\) minimizing \(\|y - X\beta\|^2\), yielding (assuming \(X^TX\) is invertible) the OLS estimator \[\hat\beta = (X^TX)^{-1}X^Ty.\] The feature matrix \(X\) encodes \(n\) sample points in \(\mathbb{R}^p\), and the matrix \(X^TX\) captures the geometric spread of these points.
2. PCA as a Description of Sample Spread. Centering \(\tilde{X} = X - \mathbf{1}\bar{x}^T\), the sample covariance \(S =…
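As an illustration of the OLS estimator in item 1, here is a minimal sketch assuming NumPy and synthetic data (`beta_true`, the noise level, and the sizes are hypothetical values chosen for the example). It solves the normal equations \(X^TX\beta = X^Ty\) directly rather than forming the inverse, and compares the result with NumPy's least-squares routine.

```python
import numpy as np

# Synthetic regression problem (all values here are illustrative assumptions).
rng = np.random.default_rng(1)
n, p = 500, 4
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])   # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: solve (X^T X) beta = X^T y without an explicit inverse.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At the minimizer the residual is orthogonal to every column of X.
grad = X.T @ (y - X @ beta_normal)

print("normal equations:", beta_normal)
print("lstsq:           ", beta_lstsq)
print("norm of X^T residual:", np.linalg.norm(grad))
```

Solving the linear system is generally preferred over computing \((X^TX)^{-1}\) explicitly; when \(X^TX\) is ill-conditioned, a QR- or SVD-based solver such as `np.linalg.lstsq` is the safer choice.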