High-resolution gene expression analysis
Frazee, Alyssa Christine
RNA sequencing (RNA-seq) measures gene expression in cell populations at an unprecedented resolution. The advent of this new technology around 2008 spurred the need for new techniques for finding scientific meaning in the resulting data. Early statistical techniques for analyzing RNA-seq data were inspired by methods for microarray data analysis, and they involved quantifying gene expression by counting the RNA-seq reads falling within the boundaries of pre-specified genes. However, RNA-seq data is very high-resolution, and much of that resolution is lost during the gene counting process. To address this limitation, this thesis introduces novel statistical methods and software for analyzing RNA-seq data at a resolution beyond that of gene counting. First, we propose a technique for segmenting the genome into regions of differential expression between two populations using single-base-level measures of signal. Next, we focus on transcript-level differential expression analysis; in particular, we introduce tools for finding statistical differential expression signal in transcriptomes that were assembled de novo from RNA-seq reads. Finally, we create a tool for evaluating the statistical properties of RNA-seq differential expression methods: our new tool generates RNA-seq reads to simulate an experiment with known transcript-level differential expression. These statistical and computational contributions to the RNA-seq analysis literature further our ability to draw meaningful biological conclusions from high-throughput RNA sequencing data.
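As a rough illustration of the resolution lost by the gene counting described above, the following minimal sketch contrasts a per-gene read count with a single-base-level coverage vector for the same reads. It is not the thesis software: the read coordinates, the gene annotation, and the function names `gene_counts` and `base_coverage` are all hypothetical and chosen only for this example, and real pipelines must also handle strands, spliced reads, and multiple chromosomes.

```python
# Toy data (hypothetical): reads and genes as half-open (start, end)
# intervals on a single chromosome.
reads = [(100, 150), (120, 170), (400, 450), (420, 470), (430, 480)]
genes = {"geneA": (90, 200), "geneB": (390, 500)}


def gene_counts(reads, genes):
    """Count the reads lying entirely within each gene's boundaries."""
    counts = {name: 0 for name in genes}
    for r_start, r_end in reads:
        for name, (g_start, g_end) in genes.items():
            if g_start <= r_start and r_end <= g_end:
                counts[name] += 1
    return counts


def base_coverage(reads, start, end):
    """Return the number of reads covering each base in [start, end)."""
    cov = [0] * (end - start)
    for r_start, r_end in reads:
        # Clip each read to the requested window before counting.
        for pos in range(max(r_start, start), min(r_end, end)):
            cov[pos - start] += 1
    return cov


if __name__ == "__main__":
    # One number per gene: where the reads fall inside the gene is lost.
    print(gene_counts(reads, genes))
    # One number per base: the within-gene shape of the signal is kept.
    print(base_coverage(reads, *genes["geneA"]))
```

In this toy setting, the gene-level summary reduces each gene to one count, while the coverage vector keeps where within the gene the reads pile up, which is the kind of single-base-level signal a segmentation approach can work from.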