Computational and Functional Annotation of the Zebrafish Genome Regulatory Toolbox
Location(s): United States
The zebrafish, with its growing arsenal of tools for generating transgenics, gene knockdowns and knockouts, and mutant resources, coupled with its high throughput and cost efficiency, is quickly becoming the major animal model for drug screens and gene-related studies. However, as with other vertebrate genomes, the majority of the zebrafish genome (97%) is made up of non-genic sequences whose functional necessity remains largely unknown. One vital function that is clearly embedded in these regions is gene regulation: instructing genes when and where to turn on or off. Yet unlike genes, for which we know the genomic location, the code, and the consequences of nucleotide changes within them, we lack this knowledge for gene regulatory sequences. This knowledge is vital: a wide variety of clinical and molecular data indicate that these sequences are important drivers of development, evolution, diversity, and disease. In this proposal, we will combine advanced computational tools with high-throughput zebrafish functional studies to annotate this noncoding terrain. Using and refining multiple vertebrate genome alignments, we have generated an unprecedented set of 166,693 zebrafish conserved noncoding elements (CNEs), of which at least 8,805 have a direct ortholog in the human genome. Preliminary studies of a portion of these sequences, using a zebrafish transgenic enhancer assay, find that 41% of the tested sequences function as enhancers at 24 to 48 hours post fertilization. Taking advantage of this transgenic assay, we aim to screen 200 sequences a year for enhancer activity. These sequences will be selected from three sources: our large CNE set, sequences whose enhancer activity and tissue-timepoint specificity will be predicted using computational tools, and community-requested sequences.
This characterization will not only allow the functional annotation of these sequences, but will also generate a novel and important toolkit of gene regulatory elements that can drive expression of any gene of interest at precise locations and precise developmental time points. In addition, we will use the annotated regulatory landscape to discover novel genes with potentially important developmental functions. This will be carried out by analyzing the expression patterns, and the functional consequences of knockdown, of less-characterized genes that lie in regions rich in regulatory elements, a common sign of important developmental gene regulators. Additional computational techniques will be used to discover genes under tight regulation in novel tissue contexts, as well as pathways that have not yet been studied in the contexts in which we find them enriched. All the data generated in this proposal, both computational and functional, will be made available to the community through a dedicated web browser (http://zebrafish.stanford.edu/) as well as through integration into ZFIN, Ensembl, and the UCSC Genome Browser. Together, this work will advance zebrafish as the major animal model for annotating and characterizing the noncoding portion of the vertebrate genome.