First Principal Component

Forrest Young's Notes

Copyright © 1999 by Forrest W. Young.


    The First Principal Component
    The first principal component is the linear combination of the set of variables that fits the variables as well as possible in a least squares sense. This linear combination has certain properties:
    • The first principal component line is as close as possible, in a specific average least squares sense, to all of the points.
    • The first principal component line identifies the "central tendency" of the set of variables, just as the mean identifies the "central tendency" of a single variable.
    • The first principal component line provides a simplified description (a model) of the set of variables.
    • The first principal component line gives us a way to summarize the set of variables by a single linear combination.
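The "as close as possible" property can be checked numerically for two variables. The sketch below is illustrative only: the data are made up, "close" is taken to mean the sum of squared perpendicular distances from the points to a line through the means (the usual reading for principal components, though the notes do not spell it out), and the names `points`, `squared_distance_to_line`, and `pc_angle` are hypothetical. It uses the closed-form orientation of the principal axis that is available in the two-variable case and compares it with a grid of other lines.

```python
import math

# Hypothetical toy data: five observations on two variables.
points = [(2.0, 1.0), (3.0, 3.0), (4.0, 3.5), (5.0, 6.0), (6.0, 6.5)]

n_obs = len(points)
mean_x = sum(p[0] for p in points) / n_obs
mean_y = sum(p[1] for p in points) / n_obs
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Sums of squares and cross-products of the centered data.
sxx = sum(x * x for x, y in centered)
syy = sum(y * y for x, y in centered)
sxy = sum(x * y for x, y in centered)


def squared_distance_to_line(angle):
    """Sum of squared perpendicular distances from the points to the line
    through the means that makes the given angle with the first axis."""
    ux, uy = math.cos(angle), math.sin(angle)
    # The perpendicular offset of (x, y) is its component along the normal (-uy, ux).
    return sum((-uy * x + ux * y) ** 2 for x, y in centered)


# Closed-form orientation of the principal axis (two variables only).
pc_angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
pc_loss = squared_distance_to_line(pc_angle)

# Compare against lines through the means at every whole degree.
grid_losses = [squared_distance_to_line(math.pi * k / 180.0) for k in range(180)]
best_grid_loss = min(grid_losses)

print("principal axis angle (degrees):", round(math.degrees(pc_angle), 2))
print("loss on principal axis:", round(pc_loss, 4))
print("best loss on the grid:", round(best_grid_loss, 4))
```

No line on the grid should do better than the principal axis. The comparison is restricted to lines through the means, which is where the best-fitting line in this sense passes.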

    Equation for the First Principal Component
    The n variables, denoted
    Y1, Y2, ..., Yn
      are described by the following linear equation, where X1 is the vector of scores on the first principal component, and b1 holds the coefficients of the first principal component, one for each variable:
      Y1, Y2, ..., Yn = a + b1X1 + r
      where a is the intercept and r is the "residual" information in the Y's not fit by the component's linear combination.
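As a rough illustration of the equation, the sketch below splits a small made-up data set into a, b1, X1, and r. Several choices here are assumptions rather than anything stated in the notes: a is set to the variable means, b1 is scaled to unit length, and b1 is found by power iteration on the cross-product matrix of the centered variables, which is one of several ways to obtain it. The data matrix `Y` and the helper names are hypothetical.

```python
# Hypothetical toy data: rows are observations, columns are Y1, Y2, Y3.
Y = [
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 5.5],
    [4.0, 3.5, 6.0],
    [5.0, 6.0, 8.5],
    [6.0, 6.5, 9.0],
]
n_obs, n_vars = len(Y), len(Y[0])

# a: one intercept per variable (assumption: the variable means).
a = [sum(row[j] for row in Y) / n_obs for j in range(n_vars)]
centered = [[row[j] - a[j] for j in range(n_vars)] for row in Y]

# Cross-product matrix of the centered variables.
S = [[sum(row[j] * row[k] for row in centered) for k in range(n_vars)]
     for j in range(n_vars)]

# Power iteration: multiply by S and renormalize until the vector settles
# on the leading eigenvector, which holds the coefficients b1.
b1 = [1.0] * n_vars
for _ in range(500):
    v = [sum(S[j][k] * b1[k] for k in range(n_vars)) for j in range(n_vars)]
    norm = sum(x * x for x in v) ** 0.5
    b1 = [x / norm for x in v]

# X1: score of each observation on the first principal component.
X1 = [sum(row[j] * b1[j] for j in range(n_vars)) for row in centered]

# r: what remains of each Y value after removing a + b1 * X1.
r = [[Y[i][j] - (a[j] + b1[j] * X1[i]) for j in range(n_vars)]
     for i in range(n_obs)]

explained = sum(x * x for x in X1)
total = sum(S[j][j] for j in range(n_vars))
print("a  =", [round(x, 3) for x in a])
print("b1 =", [round(x, 3) for x in b1])
print("share of variation fit by the component:", round(explained / total, 3))
```

By construction, every observed value equals a + b1X1 + r, and the residuals for each variable are uncorrelated with the scores X1, so r carries only what the component leaves unfit.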