Pan-genomics and the structural diversity of plant genomes
Johns Hopkins University
A central task of genetics research is to uncover genotypes linked to important phenotypes. However, many genomic loci are incompletely or inaccurately represented in genetics studies, obscuring their function and evolution. New technology can accurately and continuously sequence large segments of genomic DNA at affordable cost and unprecedented scale, raising the possibility of complete and accurate representations of genomes across the tree of life. Realizing this possibility requires new computational methods to automatically finish, validate, and curate the forthcoming wave of genome assemblies enabled by these technologies. Researchers must also devise analytical approaches for comparing previously unresolved and usually repetitive genomic loci within and between species. Here, we introduce RaGOO and RagTag, new methods that leverage genome maps to automatically scaffold and improve draft genome assemblies into chromosome-scale representations. By applying these new methods to a bread wheat genome, we show how the established reference falsely collapsed functional paralogs genome-wide. In Arabidopsis thaliana, we present a new reference assembly that completely resolves all five centromeres for the first time, revealing centromere architecture, genetics, epigenetics, and evolution. Finally, we present a catalog of natural structural variants (SVs) across 100 diverse tomato accessions, revealing exceptional genetic diversity via artificial introgression as well as broad and specific examples of how SVs influence molecular, domestication, and improvement phenotypes. This work underscores the potential to accelerate genetics research with complete and diverse genotype data and to apply these findings to plant breeding and engineering.
genome, genomics, plants, genetics, genetic variants, genome assembly, comparative genomics, genome structure, genome function, genome evolution, genotype