Monday, July 30, 2007

Integrality of acoustic cues depends on language- and contrast-specific experience

By Grant McGuire

Phonetic categories are generally assumed to have multiple cues that listeners can reliably use in identification. Fricatives in particular are identified through two kinds of information: static spectral information resulting from the narrow fricative constriction, and dynamic formant transition information provided by movement into or out of a preceding or following vowel (Harris 1958, Heinz and Stevens 1961, Whalen 1991). These cues are perceived as an integral unit linguistically, but not psycho-acoustically (Tomiak et al. 1986). The weighting of the cues varies: formant transitions are weighted more heavily for contrasts between spectrally similar fricatives, while fricative noise is the primary cue for spectrally distinct fricatives (Harris 1958, Wagner et al. 2006). Weighting also changes over the course of development, as young children listen preferentially to dynamic cues (Nittrouer et al. 1993, Nittrouer and Miller 1997). This has led to the general hypothesis that young children preferentially attend to large-scale dynamic changes in the speech signal (Developmental Weighting Shift, Nittrouer and Miller 1997).

This paper presents results from two experiments exploring the relative weighting and integration of sibilant fricative cues by non-native listeners. Specifically, Mandarin and American English listeners discriminated and labeled examples of Polish alveopalatal [ɕ] and retroflex [ʂ] sibilants. These two language groups were chosen because this contrast is extremely similar to a native Mandarin one (Stevens et al. 2004) but is difficult for English listeners to discriminate, as they generally assimilate the two sounds to their native palato-alveolar category (Lisker 2001). The first experiment used a same-different paradigm to assess both language groups' discrimination of modified, naturally produced pairs of syllables that (A) had fully correlated fricative noise and formant transition cues, (B) differed in a single dimension (fricative noise or formant transition), or (C) had fully conflicting cues. In the second experiment, the same listeners labeled a two-dimensional continuum of the same modified, naturally produced Polish sounds, varying in fricative noise and formant transition cues, in which syllables had differing degrees of correlated and conflicting cues.

An analysis of subjects' reaction times demonstrates that Mandarin listeners were significantly slowed by conflicting-cue stimuli in both experiments, whereas English listeners showed no such effect. Moreover, in the labeling experiment, English listeners relied solely on formant transition information and ignored fricative noise variation, in contrast to Mandarin listeners, who used both dimensions for categorization. These results demonstrate that English listeners do not integrate the two cues. Although they are able to use fricative noise information in the discrimination experiment, and although they rely heavily on such cues for their native sibilant contrast, they do not use static spectral information to categorize this contrast. This suggests that listeners do not generalize knowledge of cues beyond the contexts in which they experience them natively, and that cue integration is a learned phenomenon heavily dependent on local context. The English listeners' reliance on formant transitions for a novel contrast also supports Nittrouer's Developmental Weighting Shift hypothesis as a general perceptual strategy not limited to language acquisition.
