PCA (principal component analysis) is a method of multivariate analysis that synthesizes, from a large number of correlated variables, a small number of uncorrelated variables, called principal components, that best represent the overall variability. It is used to reduce the dimensionality of the data.

The transformation that yields the principal components is chosen to maximize the variance of the first principal component, under the constraint that each subsequent principal component is orthogonal to those already determined. Maximizing the variance of the principal components ensures that they have as much explanatory power as possible for the variation in the observed values. The resulting principal components are mutually orthogonal and can represent a given set of observations as a linear combination; in other words, the principal components form an orthogonal basis for the set of observations. Their orthogonality follows from the fact that the principal component vectors are eigenvectors of the covariance matrix (or correlation matrix), and the covariance matrix is a real symmetric matrix.
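The idea above can be sketched numerically: center the data, form the covariance matrix, and take its eigenvectors as the principal component directions. This is a minimal illustration using NumPy (the data and variable names are made up for the example, not taken from any particular library's API):

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 observations of 3 correlated variables (arbitrary mixing matrix)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])

Xc = X - X.mean(axis=0)            # center each variable
cov = np.cov(Xc, rowvar=False)     # covariance matrix (real symmetric)

# eigh is for symmetric matrices; it returns eigenvalues in ascending order,
# so sort descending to get the largest-variance component first
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Columns of eigvecs are the principal component directions.
# Because cov is real symmetric, they are orthonormal:
assert np.allclose(eigvecs.T @ eigvecs, np.eye(3))

# Each observation expressed as a linear combination of the components
scores = Xc @ eigvecs
```

The variance of the first column of `scores` equals the largest eigenvalue, reflecting that the first principal component captures the most variability.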

The contribution rate (proportion of the variance) for each axis can be obtained as the eigenvalue for the eigenvector corresponding to that axis divided by the sum of all eigenvalues.
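This computation is direct: divide each eigenvalue by the sum of all eigenvalues. A small sketch, using made-up example eigenvalues:

```python
import numpy as np

# Example eigenvalues (largest first); values are illustrative only
eigvals = np.array([4.0, 1.5, 0.5])

# Contribution rate of each axis: its eigenvalue / sum of all eigenvalues
contribution = eigvals / eigvals.sum()   # [2/3, 1/4, 1/12]

# Cumulative contribution rate, often used to decide how many
# components to keep
cumulative = np.cumsum(contribution)
```

Here the first two axes together account for 11/12 of the total variance, so keeping two components would preserve most of the variability in this example.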

Principal Component Analysis was introduced by Karl Pearson in 1901.


This page is auto-translated from /nishio/PCA using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I’m very happy to spread my thoughts to non-Japanese readers.