Topic Modeling in Theory and Practice
May, Chandler Camille
Topic models can decompose a large corpus of text into a relatively small set of interpretable themes or topics, potentially enabling a domain expert to explore and analyze a corpus more efficiently. However, in my work, I have found that theories put forth by topic modeling research are not always borne out in practice. In this dissertation, I use case studies to explore four theories of topic modeling. Although these theories are not explicitly stated, I show that they are communicated implicitly, some within an individual study and others more diffusely. I show that this implicit knowledge fails to hold in practice in the settings I consider. My work is confined to topic modeling research and concentrated on the latent Dirichlet allocation topic model in particular; nevertheless, I argue that these kinds of gaps may pervade scientific research and present an obstacle to improving the diversity of the research community.