Child Speech Recognition as Low Resource Automatic Speech Recognition

Date
2020-05-12
Publisher
Johns Hopkins University
Abstract
This thesis investigates child speech recognition as a low-resource scenario of automatic speech recognition (ASR), and explores multiple methods to improve the performance of both hybrid and end-to-end ASR models in recognizing children's speech. Like ASR for adults, child speech recognition aims to automatically transcribe the content of audio recordings into text. Because of differences in vocal characteristics, ASR models trained only on adult speech are inadequate for recognizing child speech. With few publicly available child speech corpora, recognizing child speech calls for more data-efficient methods of developing ASR systems. In this thesis, three strategies widely used in low-resource ASR are investigated for child speech recognition:

- Using compact model parameterization: factorized time delay neural networks (TDNN-F) are used as more data-efficient acoustic models (AM) for deep neural network (DNN)-HMM hybrid ASR models;
- Adapting models trained on out-of-domain data: transfer learning is used to adapt an end-to-end ASR model trained on adult speech for child speech recognition;
- Making creative use of available in-domain data: different data augmentation methods are applied to enhance existing child speech data for training hybrid ASR models.

Empirical results are presented on several publicly available data sets and compared with previously published results on the same data sets.
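The abstract does not name the specific augmentation methods used; as one illustration, a common choice in low-resource ASR pipelines is speed perturbation, which resamples each training utterance at a few factors (e.g. 0.9 and 1.1) to create extra copies. The sketch below is a minimal, dependency-light version using linear interpolation; the factor values, function name, and use of `numpy.interp` are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def speed_perturb(samples: np.ndarray, factor: float) -> np.ndarray:
    """Resample a waveform to change its speed (and pitch) by `factor`.

    factor > 1.0 speeds the audio up (fewer output samples);
    factor < 1.0 slows it down. Linear interpolation keeps this
    sketch dependency-free; production pipelines use a proper
    band-limited resampler instead.
    """
    n_out = int(round(len(samples) / factor))
    # Fractional positions in the original signal for each output sample.
    positions = np.linspace(0, len(samples) - 1, num=n_out)
    return np.interp(positions, np.arange(len(samples)), samples)

# A 1-second 440 Hz tone at 8 kHz; perturbing at 0.9x and 1.1x yields
# two extra training copies, tripling the effective amount of audio.
sr = 8000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
slow = speed_perturb(tone, 0.9)  # longer signal, lower pitch
fast = speed_perturb(tone, 1.1)  # shorter signal, higher pitch
```

Each perturbed copy is treated as an independent training utterance, which is one way the "creative use of available in-domain data" strategy stretches a small child-speech corpus.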
Keywords
Automatic Speech Recognition (ASR), Child speech recognition