Topic Modeling in Theory and Practice

dc.contributor.advisor: Van Durme, Benjamin
dc.contributor.committeeMember: Dredze, Mark
dc.contributor.committeeMember: Yarowsky, David
dc.creator: May, Chandler Camille
dc.creator.orcid: 0000-0002-1655-6527
dc.date.accessioned: 2022-07-25T17:55:22Z
dc.date.available: 2022-07-25T17:55:22Z
dc.date.created: 2022-05
dc.date.issued: 2022-03-28
dc.date.submitted: May 2022
dc.date.updated: 2022-07-25T17:55:23Z
dc.description.abstract: Topic models can decompose a large corpus of text into a relatively small set of interpretable themes or topics, potentially enabling a domain expert to explore and analyze a corpus more efficiently. However, in my work, I have found that theories put forth by topic modeling research are not always borne out in practice. In this dissertation, I use case studies to explore four theories of topic modeling. While these theories are not explicitly stated, I show that they are communicated implicitly, some within an individual study and others more diffusely. I show that this implicit knowledge fails to hold in practice in the settings I consider. While my work is confined to topic modeling research and moreover concentrated on the latent Dirichlet allocation topic model, I argue that these kinds of gaps may pervade scientific research and present an obstacle to improving the diversity of the research community.
dc.format.mimetype: application/pdf
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/67095
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.publisher.country: USA
dc.subject: natural language processing
dc.subject: machine learning
dc.subject: artificial intelligence
dc.subject: topic modeling
dc.subject: reproducibility
dc.title: Topic Modeling in Theory and Practice
dc.type: Thesis
dc.type.material: text
thesis.degree.department: Computer Science
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
Files
Original bundle
Name: MAY-DISSERTATION-2022.pdf
Size: 2.08 MB
Format: Adobe Portable Document Format
License bundle
Name: LICENSE.txt
Size: 2.67 KB
Format: Plain Text