
dc.contributor.advisor: Khudanpur, Sanjeev
dc.creator: Snyder, David
dc.date.accessioned: 2020-06-21T20:07:17Z
dc.date.available: 2020-06-21T20:07:17Z
dc.date.created: 2020-05
dc.date.issued: 2020-03-24
dc.date.submitted: May 2020
dc.identifier.uri: http://jhir.library.jhu.edu/handle/1774.2/62539
dc.description.abstract: Speaker recognition is the task of identifying speakers based on their speech signal. Typically, this involves comparing speech from a known speaker with recordings from unknown speakers, and making same-or-different speaker decisions. If the lexical contents of the recordings are fixed to some phrase, the task is considered text-dependent; otherwise, it is text-independent. This dissertation is primarily concerned with this second, less constrained problem. Since speech data lives in a complex, high-dimensional space, it is difficult to directly compare speakers. Comparisons are facilitated by embeddings: mappings from complex input patterns to low-dimensional Euclidean spaces where notions of distance or similarity are defined in natural ways. For almost ten years, systems based on i-vectors--a type of embedding extracted from a traditional generative model--have been the dominant paradigm in this field. However, in other areas of applied machine learning, such as text or vision, embeddings extracted from discriminatively trained neural networks are the state of the art. Recently, this line of research has become very active in speaker recognition as well. Neural networks are a natural choice for this purpose, as they are capable of learning extremely complex mappings and, when training data resources are abundant, tend to outperform traditional methods. In this dissertation, we develop a next-generation neural embedding--denoted by x-vector--for speaker recognition. These neural embeddings are demonstrated to substantially improve upon the state of the art on a number of benchmark datasets.
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Johns Hopkins University
dc.subject: speech processing
dc.subject: speaker recognition
dc.subject: speaker diarization
dc.subject: spoken language recognition
dc.title: X-VECTORS: ROBUST NEURAL EMBEDDINGS FOR SPEAKER RECOGNITION
dc.type: Thesis
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Johns Hopkins University
thesis.degree.grantor: Whiting School of Engineering
thesis.degree.level: Doctoral
thesis.degree.name: Ph.D.
dc.date.updated: 2020-06-21T20:07:17Z
dc.type.material: text
thesis.degree.department: Computer Science
dc.contributor.committeeMember: Povey, Daniel
dc.contributor.committeeMember: Koehn, Philipp
dc.contributor.committeeMember: Dehak, Najim
dc.publisher.country: USA

