Development and Extended Applications of Standardized Processes for Genome-Wide Association Studies
Main Author: | |
---|---|
Format: | Article |
Language: | Chinese |
Published: | Editorial Department of Chinese Journal of Stroke, 2025-06-01 |
Series: | Zhongguo cuzhong zazhi |
Subjects: | |
Online Access: | https://www.chinastroke.org.cn/CN/10.3969/j.issn.1673-5765.2025.06.002 |
Summary: |
Abstract: Objective To develop a standardized GWAS workflow and a multi-omics analysis framework, providing an efficient analytical pipeline for pharmaceutical reverse engineering for cerebrovascular diseases based on multi-omics cohorts.
Methods A modular analysis system was constructed based on international GWAS quality control standards and multi-omics integration and analysis strategies. Pre-GWAS data quality control module: this module applied stringent quality control to sample and variant call rates, population genetic structure and stratification, and kinship. Within the population of samples that passed quality control, genetic variants with a minor allele frequency > 0.5% were retained for GWAS. Association analysis module: GWAS was performed with PLINK, SAIGE, and Regenie, using generalized linear models and generalized linear mixed models. GWAS quality was evaluated with the genomic inflation factor and quantile-quantile plots. The module was tested using whole-genome sequencing and clinical data from the China National Stroke Registry Ⅲ. Multi-omics analysis module: this module integrated polygenic risk score, cross-cohort meta-analysis, Mendelian randomization, and colocalization analysis procedures to support molecular mechanism interpretation and target screening based on GWAS results.
Results The pre-GWAS data quality control module established in this study performed quality control and assessment from two aspects: genetic data quality and population genetics. After quality control, 9632 and 7265 samples were included in the GWAS of baseline triglyceride (TG) levels and 3-month post-stroke mortality, respectively. The Manhattan plots obtained from the different software packages showed similar trends. However, when case and control sample sizes were highly unbalanced, SAIGE provided more appropriate correction and relatively more robust statistical testing than PLINK and Regenie. In the multi-omics analysis module, standardized analysis workflows for polygenic risk score, meta-analysis, Mendelian randomization, and colocalization analysis were built to enable in-depth exploration of GWAS results.
Conclusions The standardized GWAS workflow established in this study is modular and highly extensible, and it meets the needs of complex phenotype analysis and of multi-omics data integration and analysis. It provides a methodological foundation for pharmaceutical reverse engineering based on genetic associations. |
---|---|
ISSN: | 1673-5765 |
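
Illustrative note: the abstract names two numeric checks, a minor allele frequency cutoff of 0.5% and the genomic inflation factor, without giving implementation details. The following is a minimal Python sketch of how such checks are commonly computed, not the authors' pipeline. All function names are hypothetical, variants are assumed biallelic in diploid samples, and p-values are assumed to lie in (0, 1].

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549),
# derived from the standard normal quantile: P(Z^2 <= m) = 0.5.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def minor_allele_frequency(alt_allele_count, n_samples):
    """MAF of a biallelic variant, assuming diploid samples (hypothetical helper)."""
    alt_freq = alt_allele_count / (2 * n_samples)
    return min(alt_freq, 1.0 - alt_freq)


def passes_maf_filter(alt_allele_count, n_samples, threshold=0.005):
    """Keep a variant only if its MAF exceeds the threshold (0.5% in the abstract)."""
    return minor_allele_frequency(alt_allele_count, n_samples) > threshold


def genomic_inflation_factor(p_values):
    """Median observed 1-df chi-square statistic divided by its expected median.

    Each two-sided p-value is converted back to a chi-square statistic through
    the standard normal quantile function; p-values must lie in (0, 1].
    """
    std_normal = NormalDist()
    chi2_stats = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Evenly spaced p-values mimic a well-calibrated null distribution.
    n = 1001
    null_p_values = [(i - 0.5) / n for i in range(1, n + 1)]
    print(genomic_inflation_factor(null_p_values))
    print(passes_maf_filter(alt_allele_count=200, n_samples=10000))
```

A value near 1 indicates little inflation of test statistics, while values well above 1 can point to residual population stratification or relatedness. In practice, tools such as PLINK apply allele frequency filters directly, so a sketch like this is mainly useful for spot-checking summary statistics.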