Overview

Principal component analysis

Hervé Abdi1∗ and Lynne J. Williams2

1School of Behavioral and Brain Sciences, The University of Texas at Dallas, MS: GR4.1, Richardson, TX 75080-3021, USA
2Department of Psychology, University of Toronto Scarborough, Ontario, Canada
∗Correspondence to: herve@utdallas.edu
DOI: 10.1002/wics.101

Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices. © 2010 John Wiley & Sons, Inc. WIREs Comp Stat 2010, 2: 433–459

Principal component analysis (PCA) is probably the most popular multivariate statistical technique and it is used by almost all scientific disciplines. It is also likely to be the oldest multivariate technique. In fact, its origin can be traced back to Pearson1 or even Cauchy2 [see Ref 3, p. 416], or Jordan4 and also Cayley, Sylvester, and Hamilton [see Refs 5,6 for more details], but its modern instantiation was formalized by Hotelling7, who also coined the term principal component. PCA analyzes a data table representing observations described by several dependent variables, which are, in general, inter-correlated. Its goal is to extract the important information from the data table and to express this information as a set of new orthogonal variables called principal components. PCA also represents the pattern of similarity of the observations and the variables by displaying them as points in maps [see Refs 8–10 for more details].

PREREQUISITE NOTIONS AND NOTATIONS

Matrices are denoted in upper case bold, vectors are denoted in lower case bold, and elements are denoted in lower case italic. Matrices, vectors, and elements from the same matrix all use the same letter (e.g., A, a, a). The transpose operation is denoted by the superscript T. The identity matrix is denoted I.

The data table to be analyzed by PCA comprises $I$ observations described by $J$ variables and it is represented by the $I \times J$ matrix $\mathbf{X}$, whose generic element is $x_{i,j}$. The matrix $\mathbf{X}$ has rank $L$, where $L \leq \min\{I, J\}$.

In general, the data table will be preprocessed before the analysis. Almost always, the columns of $\mathbf{X}$ will be centered so that the mean of each column is equal to 0 (i.e., $\mathbf{X}^{\top}\mathbf{1} = \mathbf{0}$, where $\mathbf{0}$ is a $J$ by 1 vector of zeros and $\mathbf{1}$ is an $I$ by 1 vector of ones). If, in addition, each element of $\mathbf{X}$ is divided by $\sqrt{I}$ (or $\sqrt{I-1}$), the analysis is referred to as a covariance PCA because, in this case, the matrix $\mathbf{X}^{\top}\mathbf{X}$ is a covariance matrix. In addition to centering, when the variables are measured with different units, it is customary to standardize each variable to unit norm. This is obtained by dividing each variable by its norm (i.e., the square root of the sum of all the squared elements of this variable).
In this case, the analysis is referred to as a correlation PCA because, then, the matrix $\mathbf{X}^{\top}\mathbf{X}$ is a correlation matrix (most statistical packages use correlation preprocessing as a default).

The matrix $\mathbf{X}$ has the following singular value decomposition [SVD, see Refs 11–13 and Appendix B for an introduction to the SVD]:

$$\mathbf{X} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\top} \tag{1}$$

where $\mathbf{P}$ is the $I \times L$ matrix of left singular vectors, $\mathbf{Q}$ is the $J \times L$ matrix of right singular vectors, and $\boldsymbol{\Delta}$ is the diagonal matrix of singular values.
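As an illustration of this preprocessing and of the decomposition in Eq. 1, the following NumPy sketch (not part of the original article; the small data table and the function name are invented for the example) centers a table, optionally rescales it for a covariance or correlation PCA, and verifies the SVD:

```python
import numpy as np

def preprocess(X, kind="correlation"):
    """Center the columns of X and, optionally, rescale them.

    kind = "covariance":  divide the centered table by sqrt(I - 1) (or sqrt(I))
                          so that X.T @ X is a covariance matrix.
    kind = "correlation": divide each centered column by its norm so that
                          X.T @ X is a correlation matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # column means become 0: X.T @ 1 = 0
    if kind == "covariance":
        return Xc / np.sqrt(X.shape[0] - 1)
    if kind == "correlation":
        return Xc / np.linalg.norm(Xc, axis=0)
    return Xc                                  # centering only

# A small invented table: I = 5 observations, J = 3 variables.
X = np.array([[3.0, 14.0, 2.0],
              [6.0,  7.0, 1.0],
              [2.0, 11.0, 4.0],
              [6.0,  9.0, 3.0],
              [9.0,  4.0, 5.0]])

Z = preprocess(X, kind="correlation")
P, delta, Qt = np.linalg.svd(Z, full_matrices=False)   # Eq. 1: Z = P @ diag(delta) @ Q.T
print(np.allclose(Z, P @ np.diag(delta) @ Qt))          # True: the SVD reconstructs Z
print(np.allclose(np.diag(Z.T @ Z), 1.0))               # True: unit-norm (correlation) columns
```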
Note that $\boldsymbol{\Delta}^2$ is equal to $\boldsymbol{\Lambda}$, which is the diagonal matrix of the (nonzero) eigenvalues of $\mathbf{X}^{\top}\mathbf{X}$ and $\mathbf{X}\mathbf{X}^{\top}$.

The inertia of a column is defined as the sum of the squared elements of this column and is computed as

$$\gamma_j^2 = \sum_{i=1}^{I} x_{i,j}^2 . \tag{2}$$

The sum of all the $\gamma_j^2$ is denoted $\mathcal{I}$ and it is called the inertia of the data table or the total inertia. Note that the total inertia is also equal to the sum of the squared singular values of the data table (see Appendix B).

The center of gravity of the rows [also called centroid or barycenter, see Ref 14], denoted $\mathbf{g}$, is the vector of the means of each column of $\mathbf{X}$. When $\mathbf{X}$ is centered, its center of gravity is equal to the $1 \times J$ row vector $\mathbf{0}^{\top}$.

The (Euclidean) distance of the $i$-th observation to $\mathbf{g}$ is equal to

$$d_{i,g}^2 = \sum_{j=1}^{J} \left( x_{i,j} - g_j \right)^2 . \tag{3}$$

When the data are centered, Eq. 3 reduces to

$$d_{i,g}^2 = \sum_{j=1}^{J} x_{i,j}^2 . \tag{4}$$

Note that the sum of all the $d_{i,g}^2$ is equal to $\mathcal{I}$, the inertia of the data table.

GOALS OF PCA

The goals of PCA are to
(1) extract the most important information from the data table;
(2) compress the size of the data set by keeping only this important information;
(3) simplify the description of the data set; and
(4) analyze the structure of the observations and the variables.

In order to achieve these goals, PCA computes new variables called principal components which are obtained as linear combinations of the original variables. The first principal component is required to have the largest possible variance (i.e., inertia), and therefore this component will 'explain' or 'extract' the largest part of the inertia of the data table. The second component is computed under the constraint of being orthogonal to the first component and of having the largest possible inertia. The other components are computed likewise (see Appendix A for a proof). The values of these new variables for the observations are called factor scores, and these factor scores can be interpreted geometrically as the projections of the observations onto the principal components.

Finding the Components

In PCA, the components are obtained from the SVD of the data table $\mathbf{X}$. Specifically, with $\mathbf{X} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\top}$ (cf. Eq. 1), the $I \times L$ matrix of factor scores, denoted $\mathbf{F}$, is obtained as:

$$\mathbf{F} = \mathbf{P}\boldsymbol{\Delta}. \tag{5}$$

The matrix $\mathbf{Q}$ gives the coefficients of the linear combinations used to compute the factor scores. This matrix can also be interpreted as a projection matrix because multiplying $\mathbf{X}$ by $\mathbf{Q}$ gives the values of the projections of the observations on the principal components. This can be shown by combining Eqs. 1 and 5 as:

$$\mathbf{F} = \mathbf{P}\boldsymbol{\Delta} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\top}\mathbf{Q} = \mathbf{X}\mathbf{Q}. \tag{6}$$

The components can also be represented geometrically by the rotation of the original axes. For example, if $\mathbf{X}$ represents two variables, the length of a word ($Y$) and the number of lines of its dictionary definition ($W$), such as the data shown in Table 1, then PCA represents these data by two orthogonal factors. The geometric representation of PCA is shown in Figure 1. In this figure, we see that the factor scores give the length (i.e., distance to the origin) of the projections of the observations on the components. This procedure is further illustrated in Figure 2. In this context, the matrix $\mathbf{Q}$ is interpreted as a matrix of direction cosines (because $\mathbf{Q}$ is orthonormal). The matrix $\mathbf{Q}$ is also called a loading matrix. In this context, the matrix $\mathbf{X}$ can be interpreted as the product of the factor score matrix by the loading matrix as:

$$\mathbf{X} = \mathbf{F}\mathbf{Q}^{\top} \quad \textrm{with} \quad \mathbf{F}^{\top}\mathbf{F} = \boldsymbol{\Delta}^2 \quad \textrm{and} \quad \mathbf{Q}^{\top}\mathbf{Q} = \mathbf{I}. \tag{7}$$
This decomposition is often called the bilinear decomposition of $\mathbf{X}$ [see, e.g., Ref 15].
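The relations in Eqs. 5–7 can be checked numerically. In this sketch (NumPy, illustrative only; the centered table is made up), the factor scores are obtained both as PΔ and as XQ, and X is recovered from its bilinear decomposition:

```python
import numpy as np

# A centered data table (each column sums to zero), invented for illustration.
X = np.array([[-3.0,  6.0],
              [ 0.0, -1.0],
              [-4.0,  3.0],
              [ 0.0,  1.0],
              [ 7.0, -9.0]])

P, delta, Qt = np.linalg.svd(X, full_matrices=False)
Q = Qt.T
Delta = np.diag(delta)

F = P @ Delta                                    # Eq. 5: factor scores
print(np.allclose(F, X @ Q))                     # Eq. 6: F = P Delta = X Q
print(np.allclose(X, F @ Qt))                    # Eq. 7: X = F Q^T
print(np.allclose(F.T @ F, Delta ** 2))          #        F^T F = Delta^2
print(np.allclose(Qt @ Q, np.eye(Q.shape[1])))   #        Q^T Q = I
```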
TABLE 1 | Raw Scores, Deviations from the Mean, Coordinates, Squared Coordinates on the Components, Contributions of the Observations to the Components, Squared Distances to the Center of Gravity, and Squared Cosines of the Observations for the Example Length of Words (Y) and Number of Lines (W)

Word           Y    W    y    w     F1     F2   ctr1   ctr2    F1²    F2²   d²   cos²1   cos²2
Bag            3   14   -3    6   6.67   0.69     11      1   44.52   0.48   45      99       1
Across         6    7    0   -1  -0.84  -0.54      0      1    0.71   0.29    1      71      29
On             2   11   -4    3   4.68  -1.76      6      6   21.89   3.11   25      88      12
Insane         6    9    0    1   0.84   0.54      0      1    0.71   0.29    1      71      29
By             2    9   -4    1   2.99  -2.84      2     15    8.95   8.05   17      53      47
Monastery      9    4    3   -4  -4.99   0.38      6      0   24.85   0.15   25      99       1
Relief         6    8    0    0   0.00   0.00      0      0    0.00   0.00    0       0       0
Slope          5   11   -1    3   3.07   0.77      3      1    9.41   0.59   10      94       6
Scoundrel      9    5    3   -3  -4.14   0.92      5      2   17.15   0.85   18      95       5
With           4    8   -2    0   1.07  -1.69      0      5    1.15   2.85    4      29      71
Neither        7    2    1   -6  -5.60  -2.38      8     11   31.35   5.65   37      85      15
Pretentious   11    4    5   -4  -6.06   2.07      9      8   36.71   4.29   41      90      10
Solid          5   12   -1    4   3.91   1.30      4      3   15.30   1.70   17      90      10
This           4    9   -2    1   1.92  -1.15      1      3    3.68   1.32    5      74      26
For            3    8   -3    0   1.61  -2.53      1     12    2.59   6.41    9      29      71
Therefore      9    1    3   -7  -7.52  -1.23     14      3   56.49   1.51   58      97       3
Generality    10    4    4   -4  -5.52   1.23      8      3   30.49   1.51   32      95       5
Arise          5   13   -1    5   4.76   1.84      6      7   22.61   3.39   26      87      13
Blot           4   15   -2    7   6.98   2.07     12      8   48.71   4.29   53      92       8
Infectious    10    6    4   -2  -3.83   2.30      4     10   14.71   5.29   20      74      26
Σ            120  160    0    0      0      0    100    100     392     52  444

The column sums of F1² and F2² give the eigenvalues, λ1 = 392 and λ2 = 52; the sum of the d² column gives the total inertia, 444.

MW = 8, MY = 6. The following abbreviations are used to label the columns: w = (W − MW); y = (Y − MY). The contributions (ctr) and the squared cosines (cos²) are multiplied by 100 for ease of reading. The positive important contributions are italicized, and the negative important contributions are represented in bold.
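The factor scores and eigenvalues reported in Table 1 can be recomputed directly from the raw data. Here is a NumPy sketch (not from the original article); because the sign of each singular vector is arbitrary, a given software run may return components whose signs are flipped relative to the table:

```python
import numpy as np

# Raw data of Table 1: number of letters (Y) and number of lines of the
# dictionary definition (W) for the 20 words, in table order.
words = ["Bag", "Across", "On", "Insane", "By", "Monastery", "Relief", "Slope",
         "Scoundrel", "With", "Neither", "Pretentious", "Solid", "This", "For",
         "Therefore", "Generality", "Arise", "Blot", "Infectious"]
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10])
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6])

X = np.column_stack([Y, W]).astype(float)
X -= X.mean(axis=0)                        # center: M_Y = 6, M_W = 8

P, delta, Qt = np.linalg.svd(X, full_matrices=False)
F = P @ np.diag(delta)                     # factor scores (Eq. 5)

print(np.round(delta ** 2))                # eigenvalues: [392.  52.]
print(np.round(F[words.index("Neither")], 2))   # about [-5.60 -2.38], up to sign
```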
Projecting New Observations onto the Components

Equation 6 shows that matrix $\mathbf{Q}$ is a projection matrix which transforms the original data matrix into factor scores. This matrix can also be used to compute factor scores for observations that were not included in the PCA. These observations are called supplementary or illustrative observations. By contrast, the observations actually used to compute the PCA are called active observations. The factor scores for supplementary observations are obtained by first positioning these observations into the PCA space and then projecting them onto the principal components. Specifically, a $1 \times J$ row vector $\mathbf{x}_{\textrm{sup}}^{\top}$ can be projected into the PCA space using Eq. 6. This gives the $1 \times L$ vector of factor scores, denoted $\mathbf{f}_{\textrm{sup}}^{\top}$, which is computed as:

$$\mathbf{f}_{\textrm{sup}}^{\top} = \mathbf{x}_{\textrm{sup}}^{\top}\mathbf{Q}. \tag{8}$$

If the data table has been preprocessed (e.g., centered or normalized), the same preprocessing should be applied to the supplementary observations prior to the computation of their factor scores.

As an illustration, suppose that, in addition to the data presented in Table 1, we have the French word 'sur' (it means 'on'). It has $Y_{\textrm{sur}} = 3$ letters, and our French dictionary reports that its definition has $W_{\textrm{sur}} = 12$ lines. Because sur is not an English word, we do not want to include it in the analysis, but we would like to know how it relates to the English vocabulary. So, we decided to treat this word as a supplementary observation.

The first step is to preprocess this supplementary observation in a manner identical to the active observations. Because the data matrix was centered, the values of this observation are transformed into deviations from the English center of gravity. We find the following values:

$$y_{\textrm{sur}} = Y_{\textrm{sur}} - M_Y = 3 - 6 = -3 \quad \textrm{and} \quad w_{\textrm{sur}} = W_{\textrm{sur}} - M_W = 12 - 8 = 4.$$

Then we plot the supplementary word in the graph that we have already used for the active analysis. Because the principal components and the original variables are in the same space, the projections of the supplementary observation give its coordinates (i.e., factor scores) on the components. This is shown in Figure 3.

FIGURE 1 | The geometric steps for finding the components of a principal component analysis. To find the components (1) center the variables, then plot them against each other. (2) Find the main direction (called the first component) of the cloud of points such that we have the minimum of the sum of the squared distances from the points to the component. Add a second component orthogonal to the first such that the sum of the squared distances is minimum. (3) When the components have been found, rotate the figure in order to position the first component horizontally (and the second component vertically), then erase the original axes.
Note that the final graph could have been obtained directly by plotting the observations from the coordinates given in Table 1.
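The same projection can be written in a few lines of NumPy (illustrative only). The loading matrix Q below uses the rounded values reported for this example in Eq. 9; the supplementary word is centered with the means of the active observations before being multiplied by Q, as Eq. 8 requires:

```python
import numpy as np

# Loadings Q of the word example (rows correspond to the variables Y and W);
# these are the rounded values reported in Eq. 9.
Q = np.array([[-0.5369, 0.8437],
              [ 0.8437, 0.5369]])

# Raw scores of the supplementary word 'sur', centered with the means of the
# active observations (M_Y = 6, M_W = 8).
x_sup = np.array([3 - 6, 12 - 8], dtype=float)    # [-3.,  4.]

f_sup = x_sup @ Q                                  # Eq. 8
print(np.round(f_sup, 4))                          # about [ 4.9855 -0.3835], i.e., the
                                                   # values of Eq. 9 up to rounding of Q
```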
FIGURE 2 | Plot of the centered data, with the first and second components. The projections (or coordinates) of the word 'neither' on the first and the second components are equal to −5.60 and −2.38.

Equivalently, the coordinates of the projections on the components can be directly computed from Eq. 8 (see also Table 3 for the values of Q) as:

$$\mathbf{f}_{\textrm{sup}}^{\top} = \mathbf{x}_{\textrm{sup}}^{\top}\mathbf{Q} = \begin{bmatrix} -3 & 4 \end{bmatrix} \times \begin{bmatrix} -0.5369 & 0.8437 \\ 0.8437 & 0.5369 \end{bmatrix} = \begin{bmatrix} 4.9853 & -0.3835 \end{bmatrix}. \tag{9}$$

FIGURE 3 | How to find the coordinates (i.e., factor scores) on the principal components of a supplementary observation: (a) the French word sur is plotted in the space of the active observations from its deviations to the W and Y variables; and (b) the projections of sur on the principal components give its coordinates.

INTERPRETING PCA

Inertia Explained by a Component

The importance of a component is reflected by its inertia or by the proportion of the total inertia 'explained' by this factor. In our example (see Table 2), the inertia of the first component is equal to 392 and this corresponds to 88% of the total inertia.

Contribution of an Observation to a Component

Recall that the eigenvalue associated with a component is equal to the sum of the squared factor scores for this component. Therefore, the importance of an observation for a component can be obtained by the ratio of the squared factor score of this observation to the eigenvalue associated with that component. This ratio is called the contribution of the observation to the component. Formally, the contribution of observation $i$ to component $\ell$, denoted $ctr_{i,\ell}$, is obtained as:

$$ctr_{i,\ell} = \frac{f_{i,\ell}^2}{\sum_i f_{i,\ell}^2} = \frac{f_{i,\ell}^2}{\lambda_\ell} \tag{10}$$

where $\lambda_\ell$ is the eigenvalue of the $\ell$-th component. The value of a contribution is between 0 and 1 and, for a given component, the sum of the contributions of all observations is equal to 1. The larger the value of the contribution, the more the observation contributes to the component. A useful heuristic is to base the interpretation of a component on the observations whose contribution is larger than the average contribution (i.e., observations whose contribution is larger than $1/I$). The observations with high contributions and different signs can then be opposed to help interpret the component because these observations represent the two endpoints of this component.

The factor scores of the supplementary observations are not used to compute the eigenvalues and therefore their contributions are generally not computed.

Squared Cosine of a Component with an Observation

The squared cosine shows the importance of a component for a given observation.
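A short NumPy sketch (not from the article) that recomputes these interpretation indices for the word example: the proportion of inertia explained by each component, the contributions of Eq. 10, and the squared cosines. In line with the columns of Table 1, the squared cosine of an observation for a component is taken here as the squared factor score divided by the squared distance of the observation to the center of gravity; component signs may again be flipped relative to the table.

```python
import numpy as np

# Word example of Table 1: letters (Y) and dictionary lines (W), table order.
Y = np.array([3, 6, 2, 6, 2, 9, 6, 5, 9, 4, 7, 11, 5, 4, 3, 9, 10, 5, 4, 10])
W = np.array([14, 7, 11, 9, 9, 4, 8, 11, 5, 8, 2, 4, 12, 9, 8, 1, 4, 13, 15, 6])
X = np.column_stack([Y, W]).astype(float)
X -= X.mean(axis=0)                               # center the data

P, delta, Qt = np.linalg.svd(X, full_matrices=False)
F = P @ np.diag(delta)                            # factor scores
eig = delta ** 2                                  # eigenvalues: [392., 52.]

ctr = F ** 2 / eig                                # Eq. 10: contributions
d2 = (X ** 2).sum(axis=1)                         # squared distances to the center of gravity
cos2 = np.divide(F ** 2, d2[:, None],             # squared cosines (cos² columns of Table 1)
                 out=np.zeros_like(F), where=d2[:, None] > 0)

print(np.round(eig / eig.sum(), 2))               # proportion of inertia: [0.88 0.12]
print(np.allclose(ctr.sum(axis=0), 1.0))          # True: contributions sum to 1 per component
print(np.round(100 * ctr[0]))                     # 'Bag': about [11.  1.], as in Table 1
print(np.round(100 * cos2[0]))                    # 'Bag': about [99.  1.]
```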