    MINIMAL PANELS OF RNA MARKERS FOR CELL TYPES USING SINGLE-CELL DATA

    JI-DISSERTATION-2022.pdf (8.648Mb) (embargoed until: 2026-05-01)
    Date
    2022-03-18
    Author
    Ji, Lanlan
    Abstract
    Single-cell RNA sequencing (scRNA-seq) technologies measure the number of RNA molecules in many thousands of individual cells. These measurements are a rich source of information about attributes of cell populations, such as cell types and the cell-to-cell variation in gene expression, that are not available from bulk RNA sequencing data [1–5]. A core challenge in the analysis of scRNA-seq data is to find “marker genes” for some class of cells, e.g., a cell type. Another challenge is to describe, let alone quantify, how the individual marker genes cooperate to determine cell labels. Most existing methods of scRNA-seq analysis operate at the univariate (single-gene) level, even though the relevant biology is often decidedly multivariate. In this thesis we introduce a method that formulates marker gene selection as a variation of the well-known “minimal set-covering problem” in combinatorial optimization. Here, the “covering” elements are genes, and the objects to be covered are the cells of a subpopulation with a particular label k. To draw this link between marker panels and set coverings, we binarize the raw mRNA counts into “expressed” (positive count) or “not expressed” (zero count). The resulting paradigm, based on covering a target class, differs fundamentally from most standard approaches, in which optimal panels are determined by optimizing their weights with a fixed panel size. In addition to enabling the link to set covering, binarization facilitates the biological interpretation of marker genes and of the manner in which they characterize and discriminate among types of cells. Using the covering paradigm, we can predict cell types, or transfer marker panels to identify shared cellular processes across data sets in related biological contexts, using extremely transparent discriminants such as the number of expressed panel genes.
We illustrate this new methodology in the context of neocortical neurogenesis during mid-gestation, when the vast majority of neurons in the brain are produced. To investigate some basic properties of covering marker panels, we discuss the stability of covering marker sets as well as the gene interactions within a marker set. We then introduce some generalizations and extensions of the covering algorithm, including a semi-supervised version of marker panel construction for cases in which cell labeling is incomplete or some marker genes are already known. Finally, we introduce a marker panel based on pairs of genes, which characterizes the transitions between cell states.
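The covering idea described in the abstract can be illustrated with a small sketch. The code below is not the thesis's algorithm: it uses the standard greedy heuristic for set covering as a stand-in, and the function names (`greedy_cover_panel`, `panel_score`), the `min_hits` parameter, and the toy count matrix are all assumptions made for illustration. It binarizes counts into expressed / not expressed, picks genes until every cell of the target class expresses at least `min_hits` panel genes, and then scores cells by the number of expressed panel genes. It makes no attempt to keep the chosen genes silent outside the target class.

```python
import numpy as np

def greedy_cover_panel(counts, labels, target, min_hits=1):
    """Greedy heuristic for a small gene panel covering one cell class.

    counts   : (n_cells, n_genes) array of raw mRNA counts
    labels   : (n_cells,) sequence of cell labels
    target   : the label k whose cells should be covered
    min_hits : each target cell must express at least this many panel genes
    """
    expressed = np.asarray(counts) > 0                 # binarize: positive count = expressed
    target_cells = expressed[np.asarray(labels) == target]
    need = np.full(target_cells.shape[0], min_hits)    # coverage still required per cell
    available = np.ones(target_cells.shape[1], dtype=bool)
    panel = []
    while (need > 0).any():
        # Gain of a gene: number of not-yet-covered target cells that express it.
        gain = (target_cells & (need > 0)[:, None]).sum(axis=0)
        gain[~available] = 0
        best = int(gain.argmax())
        if gain[best] == 0:
            break                                      # no unused gene helps the remaining cells
        panel.append(best)
        available[best] = False
        need[target_cells[:, best]] -= 1
    return panel

def panel_score(counts, panel):
    """Number of panel genes expressed in each cell (a transparent discriminant)."""
    return (np.asarray(counts)[:, panel] > 0).sum(axis=1)

# Toy data (hypothetical): 5 cells x 4 genes, three cells of type "A", two of type "B".
counts = np.array([
    [3, 0, 0, 1],
    [0, 2, 0, 0],
    [5, 0, 0, 0],
    [0, 0, 4, 0],
    [0, 0, 1, 2],
])
labels = ["A", "A", "A", "B", "B"]

panel = greedy_cover_panel(counts, labels, "A")
scores = panel_score(counts, panel)
print(panel)            # [0, 1]
print(scores.tolist())  # [1, 1, 1, 0, 0]
```

In this toy case, genes 0 and 1 together cover every "A" cell, and thresholding the score at 1 separates "A" from "B". The greedy heuristic does not guarantee a minimum-size panel; an exact solution would typically be posed as an integer program.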
    URI
    http://jhir.library.jhu.edu/handle/1774.2/67091
    Collections
    • ETD -- Doctoral Dissertations

    DSpace software copyright © 2002-2016  DuraSpace
    Policies | Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     
