Monday, July 30, 2007

The child and the puzzle: How our general views of acoustic cues in speech perception influence our models of speech acquisition

By Susan Nittrouer

In the history of any science, the specific hypotheses we choose to test arise from our larger, collective view of the problem, or the puzzle, under study. As a consequence, we may fail to test hypotheses that would fundamentally affect our conceptualization of that problem, all the while building increasingly intricate models around issues that deal with only small pieces of the larger puzzle.

In the study of speech perception, we have extensively examined the role of the acoustic cue in phonetic decision making. For much of this history we have had neither a unifying view of the relations among cues nor an understanding of the roles served by signal attributes that do not neatly conform to the notion of a cue. These constraints on experimental design have, in turn, curtailed our understanding of how native speech perception is acquired. This presentation will: (1) examine the history of the acoustic cue, (2) provide a unifying basis for cues, and (3) present evidence for contributions to speech perception by properties that do not strictly adhere to the notion of a cue.

First, the evolution of the acoustic cue will be reviewed, showing that the concept originally grew out of very practical research objectives, such as building a reading machine for the blind. Out of that early work arose the general approach of manipulating separate bits of the speech signal (or acoustic cues, as they came to be termed) to examine how each bit influenced phonetic decisions. That general approach shaped the way we have studied speech acquisition.

Next, an argument will be made that the various properties of the acoustic speech signal actually arise from different components of production, and so shape different levels of structure in the signal. For example, relatively slow changes in vocal tract geometry create slow modulation in formant frequencies (short stretches of which are known as formant transitions), while abrupt changes in local constrictions lead to brief segments of silence and/or turbulence noise. The first of these shapes the global structure of the signal, while the second shapes the fine structure, creating what we call acoustic cues.

Finally, results will be presented from studies demonstrating that levels of structure other than fine structure influence speech perception by adults and children. Based on those results, a comprehensive model of how the various levels of acoustic structure interact in speech acquisition will be presented.

In summary, this presentation will discuss the notion that we have historically been examining one piece of a larger puzzle, and that this approach has influenced the models of acquisition we have constructed. Ideas for how to fit other pieces of empirical data into that puzzle will be offered.

2 comments:

Alex said...

Hi Susan,

You say "The first of these shapes the global structure of the signal, while the second shapes the fine structure, creating what we call acoustic cues."
Does this mean that slow modulations are *not* acoustic cues?

Also, how do you relate these concepts to Ying and Jeff's findings that acoustic cues are useful for inducing manner, but not place, features, while motor cues contribute to determining place features?

Anonymous said...

Interesting to know.