principal component analysis

(PCA) (statistical strategy of devising uncorrelated variables)

Principal component analysis (PCA) is a statistical procedure for exploring multidimensional data and identifying what drives the variation in it. The strategy is to devise a list of new variables, each a linear combination of the data coordinates, that are mutually uncorrelated (according to the data). The first variable in the list has the maximum possible variance (it is termed the first principal component), and each following variable has the maximum possible remaining variance. The result is a set of linear transforms that map the data into these variables. Each transform can be used independently; they help in ignoring some of the sources of variation, as well as in finding relations between the variables in the source data. One aim is to find a source of variation that is non-random.
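
The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: it assumes the data is an array with one row per observation and one column per coordinate, and it uses an eigendecomposition of the sample covariance matrix, which is one common way to obtain the components. The function name `pca` and the synthetic two-column data set are assumptions made for the example.

```python
import numpy as np

def pca(data):
    """Sketch of PCA for an (n_samples, n_features) array.

    Returns (components, variances, scores):
      components: rows are the linear combinations (unit vectors)
      variances:  variance of the data along each component, descending
      scores:     the data expressed in the new variables
    """
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]      # largest variance first
    variances = eigvals[order]
    components = eigvecs[:, order].T
    scores = centered @ components.T
    return components, variances, scores

# Hypothetical data: two coordinates that are strongly related plus some noise.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
data = np.column_stack([x, 2.0 * x + 0.1 * rng.normal(size=500)])

components, variances, scores = pca(data)
print("fraction of variance per component:", variances / variances.sum())
```

With data like this, nearly all of the variance should fall on the first component, and the covariance matrix of `scores` should be diagonal up to rounding, which is the sense in which the new variables are uncorrelated.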

The technique is used in several areas of astrophysics, including demographics of various kinds, galaxy morphology (classification of shapes), and analysis of observational data such as light curves. In some cases, it can be used to automate classification, such as identifying a particular kind of object or event.

A common additional strategy is to augment the data under analysis with variables that are non-linear transformations of the original coordinates, e.g., their logarithm, exponential, or square. There may be some well-established motivation regarding which transformations to try.
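
As a rough illustration of why such transformed variables can help, the sketch below builds synthetic data following a power law and compares how much of the (standardized) variance the first component captures before and after taking logarithms. The power-law exponent, the noise level, the variable names, and the helper `explained_fraction` are all assumptions made for the example. For clarity the log columns are analyzed in place of the raw ones, though they could equally be appended as extra columns.

```python
import numpy as np

def explained_fraction(data):
    """Fraction of total variance along each principal component.

    Columns are standardized first so that coordinates with
    different units or scales are comparable.
    """
    centered = data - data.mean(axis=0)
    standardized = centered / centered.std(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(standardized, rowvar=False))
    eigvals = eigvals[::-1]                # descending order
    return eigvals / eigvals.sum()

# Synthetic, illustrative data: a power law with multiplicative scatter.
rng = np.random.default_rng(1)
mass = rng.uniform(1.0, 10.0, size=400)
luminosity = mass**3.5 * np.exp(0.05 * rng.normal(size=400))

raw = np.column_stack([mass, luminosity])
logged = np.column_stack([np.log10(mass), np.log10(luminosity)])

frac_raw = explained_fraction(raw)
frac_log = explained_fraction(logged)
print("first-component fraction, raw coordinates:", frac_raw[0])
print("first-component fraction, log coordinates:", frac_log[0])
```

Because a power law is a straight line in log-log coordinates, the first component of the logged data should account for almost all of the variance, whereas in the raw coordinates the relation is curved and a single linear combination captures less of it.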

(statistics) Further reading:
https://en.wikipedia.org/wiki/Principal_component_analysis
https://towardsdatascience.com/a-one-stop-shop-for-principal-component-analysis-5582fb7e0a9c
https://builtin.com/data-science/step-step-explanation-principal-component-analysis
https://royalsocietypublishing.org/doi/10.1098/rsta.2015.0202
