dc.contributor.advisor: Park, Youngser
dc.creator: Yoder, Jordan
dc.date.accessioned: 2017-07-26T17:52:03Z
dc.date.available: 2017-07-26T17:52:03Z
dc.date.created: 2016-05
dc.date.issued: 2016-03-16
dc.date.submitted: May 2016
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/40715
dc.description.abstract: We consider an extension of model-based clustering to the semi-supervised case, where some of the data are pre-labeled. We provide a derivation of the Bayesian Information Criterion (BIC) approximation to the Bayes factor in this setting. We then use the BIC to select the number of clusters and the variables useful for clustering. We discuss some considerations for $O(1)$ terms in information criteria when performing model-based clustering. Next, we explore a novel method for the initialization of the EM algorithm for the semi-supervised case using modifications to the k-means++ algorithm to account for the labels. Then, we derive an improved theoretical bound on expected cost and observe improved performance in simulated and real data examples. This analysis provides theoretical justification for a typically linear-time semi-supervised clustering algorithm. We show how this algorithm outperforms related semi-supervised k-means-style algorithms on several datasets. Finally, we demonstrate semi-supervised model-based clustering with our improved k-means++ initialization on two applications. First, we identify behaviotypes in a fly larva dataset. Next, we nominate interesting vertices in graphs using two types of supervision.
dc.format.mimetype: application/pdf
dc.language: en
dc.publisher: Johns Hopkins University
dc.subject: Semi-supervised clustering [en_US]
dc.subject: k-means [en_US]
dc.subject: clustering [en_US]
dc.subject: k-means++ [en_US]
dc.subject: approximation algorithm [en_US]
dc.subject: GMM [en_US]
dc.title: On Model-Based Semi-Supervised Clustering
dc.type: Thesis
thesis.degree.discipline: Applied Mathematics & Statistics
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
dc.date.updated: 2017-07-26T17:52:04Z
dc.type.material: text
thesis.degree.department: Applied Mathematics and Statistics
dc.contributor.committeeMember: Priebe, Carey E.
dc.contributor.committeeMember: Lyzinski, Vince
dc.contributor.committeeMember: Tang, Minh
dc.publisher.country: USA

