Acta Anthropologica Sinica ›› 2005, Vol. 24 ›› Issue (03): 221-231.

The methodology of principal component analysis based on the averaged covariance matrix for the analysis of human population genetic structures

XUE Fuzhong, WANG Jiezhen, GUO Yishou, HU Ping

  • Online: 2005-09-15 Published: 2005-09-15

Abstract: Objective: To examine whether principal component analysis (PCA) of the covariance matrix of a centered (or mean-normalized) gene frequency matrix, referred to here as the "averaged covariance matrix", is applicable and well-founded for studying the genetic structure of human populations. Methods: Starting from the structural features of the gene frequency matrix, we compared PCA of the centered or mean-normalized covariance matrix with PCA of the standardized correlation matrix in terms of eigenvalues, eigenvectors, and dimensionality reduction. We then used a worked example, the HLA-A locus in 26 Chinese Han populations, to compare how reasonably each method explains population genetic structure. Results: The principal components of the centered (or mean-normalized) covariance matrix reflect both the "variance information weight", that is, the degree of variation of each gene, and the "correlation information weight", that is, the degree of mutual influence between genes. The principal components of the standardized correlation matrix reflect only the correlation information weight and not the variance information weight. In the comparison of the two analyses of the HLA-A locus in 26 Chinese Han populations, centered covariance matrix PCA was superior to standardized correlation matrix PCA in its eigenvalues and eigenvectors, in the number of principal components retained, and in how reasonably the principal components could be interpreted in population genetic terms. Conclusion: When applying PCA to population genetic structure, one should first remove the effect of differing magnitudes in the gene frequency matrix by centering (or mean normalization) and then extract the principal components from the resulting covariance matrix, rather than from the standardized correlation matrix.

Key words: Human population; Genetic structure; Principal component analysis; Averaged (centered or mean-normalized) covariance matrix; HLA-A
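
The contrast drawn in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code and does not use the HLA-A data: the frequency matrix below is synthetic (26 hypothetical populations, 6 hypothetical alleles drawn from a Dirichlet distribution), and the function names are chosen here for illustration only. The sketch eigen-decomposes the covariance matrix of the column-centered frequencies and, separately, the correlation matrix of the standardized frequencies, so the two sets of eigenvalues can be compared side by side.

```python
import numpy as np


def sorted_eigen(symmetric_matrix):
    """Eigen-decompose a symmetric matrix, largest eigenvalue first."""
    values, vectors = np.linalg.eigh(symmetric_matrix)
    order = np.argsort(values)[::-1]
    return values[order], vectors[:, order]


def covariance_pca(freq):
    """PCA on the covariance matrix of column-centered gene frequencies."""
    centered = freq - freq.mean(axis=0)
    cov = centered.T @ centered / (freq.shape[0] - 1)
    return sorted_eigen(cov)


def correlation_pca(freq):
    """PCA on the correlation matrix (centered and scaled to unit variance)."""
    centered = freq - freq.mean(axis=0)
    standardized = centered / freq.std(axis=0, ddof=1)
    corr = standardized.T @ standardized / (freq.shape[0] - 1)
    return sorted_eigen(corr)


# Synthetic stand-in for a gene frequency matrix (assumption, not real data):
# rows are populations, columns are alleles, each row sums to 1.
rng = np.random.default_rng(0)
n_pop, n_alleles = 26, 6
freq = rng.dirichlet([20, 10, 5, 2, 1, 1], size=n_pop)

cov_vals, cov_vecs = covariance_pca(freq)
corr_vals, corr_vecs = correlation_pca(freq)

# Share of total variation carried by each principal component.
cov_share = cov_vals / cov_vals.sum()
corr_share = corr_vals / corr_vals.sum()

print("covariance PCA shares: ", np.round(cov_share, 3))
print("correlation PCA shares:", np.round(corr_share, 3))
```

In this sketch the covariance eigenvalues sum to the total of the per-allele variances, so alleles that vary more across populations weigh more in the leading components. The correlation eigenvalues always sum to the number of alleles, because standardization gives every allele unit variance. That is one way to read the abstract's distinction between the variance information weight and the correlation information weight.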