COMPUTATIONAL METHODS FOR GENOMIC, TRANSCRIPTOMIC, AND EPI-OMIC ANALYSIS WITH LONG-READ SEQUENCING

Date
2022-03-29
Publisher
Johns Hopkins University
Abstract
Long-read single-molecule sequencing has proven to be a powerful tool for decoding genomic and transcriptomic complexity. Long reads can span features such as structural variants and alternatively spliced transcripts, which have historically been challenging to resolve with conventional short-read sequencing. Nanopore long-read sequencers are also inexpensive and highly portable, making sequencing more broadly accessible. These technologies can also detect epigenetic and epitranscriptomic modifications, collectively part of the “epi-ome”, which regulates many facets of biology. Long-read sequencing has made all of these features easier to uncover; however, these technologies require new algorithms to process novel data types. Here we present several methods that advance the use of long-read sequencing to explore complex genomic and transcriptomic features. First, we discuss StringTie2, which assembles long RNA reads into full transcripts based on genome alignments. StringTie2 can also locally assemble short reads into synthetic long reads, known as super-reads. We show that StringTie2 has greater sensitivity and precision than existing methods when using either short or long reads on human and plant data, and that super-reads further improve performance. Next, we describe UNCALLED, an algorithm that enables software-based targeted nanopore sequencing without any additional library preparation. This is accomplished by rapidly mapping nanopore signals and selectively ejecting reads before they finish sequencing. We use UNCALLED to deplete several bacteria in a mock community and to enrich a panel of 148 genes associated with hereditary cancer, enabling robust structural variant and methylation calling. I then discuss a project in which we used short-fragment nanopore sequencing to detect copy number alterations in cancer genomes. Finally, I introduce recent work on visualization and analysis of nanopore signal alignments with Uncalled4. This work focuses on detecting RNA modifications using nanopore signal alignments and could extend to many other applications.
Keywords
Genomics, Transcriptomics, Epigenetics, Epitranscriptomics, Long-read sequencing, Single-molecule sequencing