ADSP Whole Genome Sequencing (WGS) Release 5 data update from Genome Center for Alzheimer’s Disease

Bibliographic Details
Published in: Alzheimer's & Dementia, 2024-12, Vol. 20 (S1), p. n/a
Main Authors: Carter, Luke; Leung, Yuk Yee; Lee, Wan‐Ping; Kuzma, Amanda B.; Gangadharan, Prabhakaran; Nicaretta, Heather Issen; Qu, Liming; Ren, Youli; Valladares, Otto; Zhao, Yi; Iqbal, Taha; Schmidt, Michael A.; Mena, Pedro R.; Dalgard, Clifton L.; Kunkle, Brian W.; Bush, William S.; Martin, Eden R.; Naj, Adam C.; Haines, Johnathan L.; Pericak‐Vance, Margaret A.; Wang, Li‐San; Schellenberg, Gerald D.
Format: Article
Language: English
Description
Summary:

Background: The Genome Center for Alzheimer’s Disease (GCAD) coordinates the integration and meta‐analysis of all available whole genome sequencing (WGS) data relevant to Alzheimer’s disease (AD), with the goal of identifying genetic variants that confer AD risk or protection and, eventually, therapeutic targets. The WGS datasets are generated through a collaboration between scientists from the Alzheimer’s Disease Sequencing Project (ADSP) and GCAD. To minimize data heterogeneity introduced by different sequencing protocols and machines, GCAD processes all samples using identical pipelines.

Methods: The raw sequencing data are first mapped to GRCh38/hg38, and variants (SNVs and indels) are called using GATK. Compact VCF and GDS‐formatted files are also generated for researchers who prefer to work with smaller pVCFs. SNVs and indels are annotated using the ADSP annotation pipeline. Lastly, structural variants (SVs) are called using Smoove and Manta and jointly genotyped using GraphTyper2.

Results: The dataset (ADSP Release 5, R5, 2024) includes ∼60,000 genomes from >50 diverse cohorts spanning 4 major ancestries: 47% Non‐Hispanic White, 29% Hispanic or Latino, 16% Black or African American, and 8% Asian. Samples are deeply sequenced (average genome coverage: >30x). CRAMs, gVCFs from GATK, and SV VCFs for a subset of the R5 samples (n = 36,361) were deposited into the NIAGADS Data Sharing Service (DSS) (https://dss.niagads.org/) for public distribution in 2022; the new samples in R5 will be released in the same way once joint calling is complete. In addition, joint‐genotyped VCFs for SNVs, indels, and SVs will be made available after undergoing the full quality control and annotation process.

Conclusion: The ADSP and GCAD generate high‐quality genotype and SV calls. The project is currently processing ∼60,000 WGS samples sequenced primarily through the ADSP Follow‐Up Study, which will contain a more ancestrally diverse set of populations. We anticipate that this 2024 release will continue to benefit the research community studying AD genetics.
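
The ancestry percentages and sample totals in the summary can be turned into rough per-group counts. The following Python sketch is illustrative only: the abstract gives the total as approximately 60,000 and the percentages as whole numbers, so every derived count is an approximation rather than a reported result, and the names used (`approximate_counts`, `pending_release`) are hypothetical.

```python
# Figures taken from the abstract. The total is stated only as "~60,000",
# so all derived counts below are approximations, not reported results.
TOTAL_GENOMES = 60_000
DEPOSITED_2022 = 36_361  # subset deposited into NIAGADS DSS in 2022

ANCESTRY_PERCENT = {
    "Non-Hispanic White": 47,
    "Hispanic or Latino": 29,
    "Black or African American": 16,
    "Asian": 8,
}


def approximate_counts(total, percents):
    """Return rough per-group counts implied by whole-number percentages."""
    return {group: total * pct // 100 for group, pct in percents.items()}


counts = approximate_counts(TOTAL_GENOMES, ANCESTRY_PERCENT)

# Samples in R5 beyond the 2022 deposit (approximate, since the total is rounded).
pending_release = TOTAL_GENOMES - DEPOSITED_2022

for group, n in counts.items():
    print(f"{group}: ~{n:,}")
print(f"Not yet released (approx.): ~{pending_release:,}")
```

Because the inputs are rounded, these numbers are useful only for order-of-magnitude planning, for example estimating how many samples per ancestry group an analysis might draw on.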
ISSN: 1552-5260, 1552-5279
DOI: 10.1002/alz.087495