A Bayesian Approach to Speech Production
Kirov, Christo Nikolaev
The production of speech is one of the most widely studied topics among scientists interested in how the human mind and brain function, since it is uniquely characteristic of human cognition. Despite the everyday ubiquity of speech, the mechanisms that make it possible have proven extremely complex and difficult to model formally. Like other cognitive systems, the speech production system is assumed to be composed of several component stages, or levels of representation/processing. The production of a spontaneous utterance is typically described as involving first selecting a message to convey, with its corresponding semantic representation, then encoding that message through syntax, morphology, and phonology, and ultimately creating an articulatory plan that can be used to drive the tongue and other speech organs. These different levels of processing must communicate with each other in some way, and multiple similar representations are simultaneously active during processing at each level. Beyond this basic understanding, many of the details of the actual mechanisms behind these levels of processing remain up for debate, and experimental results collected by linguists, psycholinguists, and neuroscientists present a number of apparent paradoxes. For example, many experiments have measured the relative effect of distractors or primes on the time required to plan and initiate a target utterance. While the tasks used in these experiments appear to differ only slightly, researchers have found in some cases that high similarity between the target utterance and the prime results in faster speech planning, and in other cases that similarity is associated with slower planning. In the absence of a strong underlying theory of speech production, such apparently contradictory results have led to complicated, somewhat ad hoc models that focus on explaining either facilitatory or inhibitory effects, but not both.
In addition, while there has been some preliminary empirical investigation of the effects of distractors and primes on phonetic variation, a systematic account explaining the range of these effects has yet to emerge. This dissertation contributes to the empirical understanding of speech production by presenting novel experimental results describing how contextual competition in the speech environment leads to hyperarticulation. The results of these experiments bear on the levels of processing at which competition takes place, and on how similarity between competitors affects the degree of competition. They suggest that competition occurs at specific mismatching positions between competing utterances (e.g., their onsets) rather than only at a more holistic level. Furthermore, the effects of similarity are non-linear --- a competitor must differ from the target minimally in order to exert a measurable effect on speech production. The results also support the notion that hyperarticulation and planning latency are mechanistically related. Both effects follow qualitatively similar patterns; speech seems to be hyperarticulated in just the cases when it would also take longer to plan. In an attempt to clarify the mechanisms underlying these new results and the body of previous empirical findings, as well as to unify the formal study of speech production with that of speech perception and other cognitive functions, this dissertation applies Bayesian methods to speech production modeling. The basic hypothesis is that the levels of processing involved in speech production communicate with one another in the technical sense of information theory. A particular level of processing receives noisy signals from other levels indicating which representational state it should adopt. For example, the phonological encoding level receives noisy signals from a higher level that represents lexical items (lemmas).
Each signal causes the receiving level to update its probabilistic belief distribution over possible representations through the operation of Bayesian inference. If the model's assumptions are correct, this Bayesian decoding method is guaranteed to find the optimal interpretation of a signal given sufficiently many noisy samples. Formalizing the communication between levels of processing as Bayesian belief updating allows us to move towards an account of the apparent contradictions in the empirical reaction-time literature, as well as an account of the range of possible hyperarticulatory effects, with a formally simple, unified mechanism. It provides a framework general enough to predict the outcome of various production tasks, given knowledge of the relationships among the evidence passed between the levels of processing involved, the target utterance, and the structure of competing representations. In particular, I show how the model provides a qualitative account of pervasive patterns in priming and Stroop-like distractor studies. In many priming paradigms, target utterances are facilitated by identical primes but slowed by non-identical, very similar primes. In Stroop-like tasks, depending on the relationship between targets and distractors (i.e., whether they are similar along semantic or phonological dimensions), the presence of the distractors may lead to either facilitation or inhibition. I also use the model to explain the apparent correlation between hyperarticulation and latency observed in the empirical portion of this dissertation. Furthermore, I show that the probabilistic Bayesian approach to speech production can be extended to account for phonotactic effects that have been largely ignored in previous modeling, namely that phonotactically difficult utterances take longer to plan and are more error-prone.
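The core belief-updating mechanism can be illustrated with a minimal sketch. The candidate forms, noise level, and likelihood function below are illustrative assumptions, not the dissertation's actual model: a receiving level maintains a probability distribution over candidate representations and renormalizes it after each noisy sample arrives from a higher level.

```python
# Minimal sketch of Bayesian belief updating between processing levels.
# All candidates, the noise level, and the sample sequence are hypothetical.
candidates = ["cat", "cap", "dog"]
prior = {w: 1.0 / len(candidates) for w in candidates}

def likelihood(signal, candidate, noise=0.2):
    # Probability of observing this noisy sample if `candidate` is the
    # intended representation: high on a match, low otherwise.
    if signal == candidate:
        return 1.0 - noise
    return noise / (len(candidates) - 1)

def bayes_update(belief, signal):
    # One inference step: posterior proportional to likelihood times prior.
    posterior = {w: likelihood(signal, w) * p for w, p in belief.items()}
    z = sum(posterior.values())
    return {w: p / z for w, p in posterior.items()}

# Noisy samples arriving from the lemma level; most point to "cat".
belief = dict(prior)
for sample in ["cat", "cat", "cap", "cat"]:
    belief = bayes_update(belief, sample)
```

Even with one mismatching sample, the belief distribution concentrates on the candidate most consistent with the evidence overall, which is the sense in which the decoding is optimal given enough samples.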
I also show how representing probability distributions as graphical models known as factor graphs allows active phonological processes such as syllabification and allophonic variation (and potentially higher-level morphological and syntactic processes) to be included within the framework of the speech production system.