MATHEMATICAL ANALYSES OF GENOME COMPLEXITY AND POPULATION DIVERSITY USING NEXT- AND THIRD-GENERATION SEQUENCING TECHNOLOGIES

dc.contributor.advisor: Timp, Winston G
dc.contributor.committeeMember: Schatz, Michael C
dc.contributor.committeeMember: McCoy, Rajiv C
dc.creator: Ranallo-Benavidez, Timothy Rhyker
dc.creator.orcid: 0000-0003-3137-8439
dc.date.accessioned: 2021-06-25T12:54:29Z
dc.date.created: 2021-05
dc.date.issued: 2020-12-08
dc.date.submitted: May 2021
dc.date.updated: 2021-06-25T12:54:29Z
dc.description.abstract: The advent and continued improvement of DNA sequencing methods promise deeper insights into the genomes of living organisms. The deluge of data from second-generation and third-generation sequencing technologies requires large-scale bioinformatics tools. These tools must account not only for genome complexity within a single individual or species but also for the population diversity across individuals and species. This dissertation presents three tools that leverage mathematical insights and modeling to profile and analyze genome complexity and population diversity. First, GenomeScope applies combinatorial theory to establish a detailed mathematical model of how k-mer frequencies are distributed in heterozygous and polyploid genomes. GenomeScope accurately estimates genomic characteristics such as length, heterozygosity, and repetitiveness, even for complex organisms. Second, SVCollector analyzes a population-level VCF file from a low-resolution genotyping study and uses a greedy algorithm to compute a ranked list of samples that maximizes the total number of variants present in a subset of a given size. SVCollector runs quickly on thousands of samples and allows for a more cost-efficient way to identify and validate variants within large populations. Finally, HetSmoother uses a k-mer-based approach to identify sequencing errors and heterozygous regions within sequencing reads. Once the errors are removed, the heterozygous regions are consistently edited to a single haplotype, which improves the contiguity and reduces the duplication of assemblies for heterozygous genomes.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/63927
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: genomics
dc.subject: computational biology
dc.subject: bioinformatics
dc.subject: DNA sequencing
dc.subject: population diversity
dc.subject: mathematical modeling
dc.title: MATHEMATICAL ANALYSES OF GENOME COMPLEXITY AND POPULATION DIVERSITY USING NEXT- AND THIRD-GENERATION SEQUENCING TECHNOLOGIES
dc.type: Thesis
dc.type.material: text
local.embargo.lift: 2022-05-01
local.embargo.terms: 2022-05-01
thesis.degree.department: Biomedical Engineering
thesis.degree.discipline: Biomedical Engineering
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: School of Medicine
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
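
The abstract describes SVCollector as using a greedy algorithm to rank samples so that a subset of a given size contains as many variants as possible. The sketch below illustrates that general idea with the standard greedy heuristic for maximum coverage. It is not SVCollector's actual code: the function name `greedy_rank`, the dictionary input (sample name mapped to a set of variant identifiers, standing in for a parsed VCF), and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_rank(sample_variants: Dict[str, Set[str]], n: int) -> List[Tuple[str, int]]:
    """Rank up to n samples by how many not-yet-covered variants each adds.

    Hypothetical helper: at every step, pick the sample that contributes
    the most variants not already present in the chosen samples.
    """
    covered: Set[str] = set()
    remaining = dict(sample_variants)  # shallow copy; the input is not modified
    ranking: List[Tuple[str, int]] = []
    while remaining and len(ranking) < n:
        # Iterating in sorted order makes ties resolve to the
        # alphabetically first sample name (an assumption of this sketch).
        best = max(sorted(remaining), key=lambda s: len(remaining[s] - covered))
        gain = len(remaining[best] - covered)
        covered |= remaining.pop(best)
        ranking.append((best, gain))
    return ranking


# Toy input: variant identifiers are placeholders, not real calls.
samples = {
    "S1": {"v1", "v2", "v3"},
    "S2": {"v3", "v4"},
    "S3": {"v1", "v2"},
    "S4": {"v5"},
}
print(greedy_rank(samples, 3))
```

In this toy input, S3 is never chosen because both of its variants are already covered once S1 is selected. Greedy selection of this kind is a heuristic: it is fast and generally performs well for coverage problems, but it does not guarantee the best possible subset in every case.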
Files
Original bundle
Name: RANALLO-BENAVIDEZ-DISSERTATION-2021.pdf
Size: 60.55 MB
Format: Adobe Portable Document Format