dc.contributor.advisor          Dredze, Mark
dc.creator                      Benton, Adrian
dc.date.accessioned             2019-03-07T03:12:56Z
dc.date.available               2019-03-07T03:12:56Z
dc.date.created                 2018-12
dc.date.issued                  2018-10-25
dc.date.submitted               December 2018
dc.identifier.uri               http://jhir.library.jhu.edu/handle/1774.2/60120
dc.description.abstract         Social media users routinely interact by posting text updates, sharing images and videos, and establishing connections with other users through friending. User representations are commonly used by platform developers in recommendation systems, by marketers for targeted advertising, and by public policy researchers to gauge public opinion across demographic groups. Computer scientists consider the problem of inferring user representations more abstractly: how does one extract a stable user representation, effective for many downstream tasks, from a medium as noisy and complicated as social media? The quality of a user representation is ultimately task-dependent (e.g., does it improve classifier performance, or make more accurate recommendations in a recommendation system?), but there are also proxies that are less sensitive to the specific task. Is the representation predictive of latent properties such as a person’s demographic features, socio-economic class, or mental health state? Is it predictive of the user’s future behavior? In this thesis, we begin by showing how user representations can be learned from multiple types of user behavior on social media. We apply several extensions of generalized canonical correlation analysis to learn these representations and evaluate them on three tasks: predicting future hashtag mentions, friending behavior, and demographic features. We then show how user features can be employed as distant supervision to improve topic model fit. We extend a standard supervised topic model, Dirichlet Multinomial Regression (DMR), to make better use of high-dimensional supervision. Finally, we show how user features can be integrated into existing classifiers, and improve them, within the multitask learning framework. We treat the prediction of user attributes (ground-truth gender and mental health features) as auxiliary tasks to improve mental health state prediction.
We also use distributed user representations learned in the first chapter to improve tweet-level stance classifiers, showing that distant user information can inform classification tasks at the granularity of a single message.
dc.format.mimetype              application/pdf
dc.language.iso                 en_US
dc.publisher                    Johns Hopkins University
dc.subject                      social media
dc.subject                      machine learning
dc.subject                      representation learning
dc.subject                      multiview
dc.subject                      multitask learning
dc.subject                      topic model
dc.title                        Learning Representations of Social Media Users
dc.type                         Thesis
thesis.degree.discipline        Computer Science
thesis.degree.grantor           Johns Hopkins University
thesis.degree.grantor           Whiting School of Engineering
thesis.degree.level             Doctoral
thesis.degree.name              Ph.D.
dc.date.updated                 2019-03-07T03:12:56Z
dc.type.material                text
thesis.degree.department        Computer Science
dc.contributor.committeeMember  Arora, Raman
dc.contributor.committeeMember  Yarowsky, David
dc.contributor.committeeMember  Hovy, Dirk
dc.publisher.country            USA
dc.creator.orcid                0000-0003-3915-4085
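
The abstract names generalized canonical correlation analysis (GCCA) as the basis for learning user representations from multiple views of user behavior. As a rough illustration only, the following is a minimal NumPy sketch of the classic MAXVAR formulation of GCCA, not the thesis's own extensions (which this record does not describe). It assumes each view is a dense user-by-feature matrix whose rows are aligned by user; the function name `maxvar_gcca`, the ridge parameter `reg`, and the per-view centering are illustrative assumptions.

```python
import numpy as np


def maxvar_gcca(views, k, reg=1e-3):
    """Minimal MAXVAR-style GCCA sketch (illustrative, not the thesis code).

    views: list of arrays, each (n_users, d_j), rows aligned by user.
    k: dimensionality of the shared user representation.
    reg: ridge term added to each view's covariance (assumed value).
    Returns (G, Us): G is (n_users, k) with orthonormal columns, and
    Us[j] is a (d_j, k) map from view j into the shared space.
    """
    n = views[0].shape[0]
    M = np.zeros((n, n))
    centered = []
    for X in views:
        # Center each view so the projections reflect covariance, not means.
        X = X - X.mean(axis=0)
        C_inv = np.linalg.inv(X.T @ X + reg * np.eye(X.shape[1]))
        centered.append((X, C_inv))
        # Accumulate this view's regularized projection X (X^T X + rI)^{-1} X^T.
        M += X @ C_inv @ X.T
    # Shared representation: top-k eigenvectors of the summed projections
    # (eigh returns eigenvalues in ascending order, so reverse the sort).
    eigvals, eigvecs = np.linalg.eigh(M)
    G = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Per-view maps: ridge regression from each centered view onto G.
    Us = [C_inv @ X.T @ G for X, C_inv in centered]
    return G, Us


# Usage on random toy data: three "views" of the same 50 hypothetical users.
rng = np.random.default_rng(0)
views = [rng.normal(size=(50, 8)), rng.normal(size=(50, 5)), rng.normal(size=(50, 12))]
G, Us = maxvar_gcca(views, k=3)
print(G.shape, [U.shape for U in Us])  # prints (50, 3) [(8, 3), (5, 3), (12, 3)]
```

Forming the n × n matrix `M` is only practical for a small number of users; large-scale implementations typically avoid materializing it, which this sketch does not attempt.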