The goal of our project is to accelerate the discovery of DNA variation relevant to health and disease by analyzing data from more than 225,000 ethnically and racially diverse patients who will undergo genome sequencing. Of particular importance is ensuring that we have powerful statistical methods for analyzing data from underserved groups, including U.S. minority populations. Achieving this goal requires expertise across many domains of knowledge, including medical and population genomics, algorithm development for disease mapping, and the management of large-scale databases.
The NHGRI Genome Sequencing Program (GSP) will identify genomic variants relevant to health and disease by sequencing the genomes of more than 225,000 participants across a wide range of diseases. The GSP will also serve as a pilot for the Precision Medicine Initiative, which aims to enroll and sequence more than one million people representative of U.S. ethnic diversity. Here, we propose a GSP analysis center focused on Multi- and Trans-ethnic Mapping of Mendelian and Complex Diseases. There is growing recognition of the substantial scientific advantages, as well as the public health importance, of conducting biomedical research across ethnically diverse cohorts. We propose to develop scalable methods that incorporate ancestry to optimize medical genomic study design and to improve power for uncovering the role of common and rare variants in disease. Achieving this goal requires expertise across diverse domains of knowledge, including medical and population genomics, algorithm development for complex disease mapping, and the management of large-scale databases. To this end, we have assembled a world-class team of medical and population geneticists, computer scientists, statisticians, and clinicians with leading expertise in developing novel and scalable strategies for characterizing sequence variants and their role in disease. Importantly, our group has been at the forefront of developing resources, study designs, and methods that enable genomic research in U.S. minority populations. Our project has three main objectives. First, we will develop an Automated Scalable Ancestry Pipeline (ASAP) for common disease mapping in diverse populations. Through ASAP, we will improve the computational efficiency of existing state-of-the-art methods for ancestry inference and develop important extensions to linear mixed models (LMMs) and other mapping strategies that leverage local and global ancestry.
We will also develop methods to refine phenotypes, define endpoints, and identify common controls for disease studies. Second, we will develop tools and resources for trans- and multi-population rare variant discovery that incorporate patterns of local and sub-continental ancestry. We will also develop machine-learning tools for variant annotation that leverage ancestral information, patterns of sequence evolution, and protein structure in a unified framework. Furthermore, we will incorporate population-specific patterns of cellular phenotypes to improve functional prediction algorithms for both coding and non-coding variants. Third, we will disseminate our results through web-based resources that empower the biomedical research community. We will augment existing resources, including ClinGen, by annotating and characterizing pathogenic variants across diverse populations. We will develop a secure web server that allows sharing of summary statistics and analysis pipelines to enable the discovery, fine-mapping, and functional prediction of genetic variants. Our team has extensive experience in NIH-funded consortia and is dedicated to meeting the overall GSP project goals through collaborative work with NHGRI leadership and other funded investigators.