Johns Hopkins University
Since the structure of DNA was discovered in 1953, a great deal of effort has gone into studying this molecule in detail. We now know that DNA comprises an organism's genetic makeup and constitutes a blueprint for life. The study of DNA has dramatically increased our knowledge of cell function and evolution and has led to remarkable discoveries in biology and medicine. Just as DNA is replicated during cell division, several chemical marks are also passed on to progeny during this process. Epigenetics is the study of these marks, and their crucial role makes it a fascinating research area. Among all known epigenetic marks, 5mC DNA methylation is probably one of the most important, given its well-established association with biological processes such as development and aging and with diseases such as cancer. The work in this dissertation focuses primarily on this epigenetic mark, although it has the potential to be applied to other heritable marks. In the 1940s, Waddington introduced the term epigenetic landscape to describe cell pluripotency and differentiation conceptually. This concept remained abstract until Jenkinson et al. (2017, 2018) estimated actual epigenetic landscapes from whole-genome bisulfite sequencing (WGBS) data, work that led to startling results with biological implications in development and disease. Here, we introduce an array of novel computational methods that draw on that work. First, we present CPEL, a method that uses a variant of the original landscape proposed by Jenkinson et al. which, together with a new hypothesis testing framework, allows for the detection of DNA methylation imbalances between homologous chromosomes. Then, we present CpelTdm, a method that builds upon CPEL to perform differential methylation analysis between groups of samples using targeted bisulfite sequencing data. Finally, we extend the original probabilistic model proposed by Jenkinson et al. to estimate methylation landscapes and perform differential analysis from nanopore data. Overall, this work addresses immediate needs in the study of DNA methylation. The methods presented here can lead to a better characterization of this critical epigenetic mark and enable biological discoveries with implications for diagnosing and treating complex human diseases.
Computational Genomics, Epigenetics, DNA methylation, Statistical Mechanics, Statistical Inference, EM algorithm, Allele-specific methylation, Whole-genome bisulfite sequencing, Reduced-representation bisulfite sequencing, Nanopore sequencing