Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli - JASA (1977) Patricia K. Kuhl & James D. Miller
Introduction
• To account for the data from speech experiments, it is useful to distinguish between auditory levels of processing and phonetic levels of processing.
• The evidence supporting a dichotomy between the two levels:
  - a lack of invariance between the acoustic cues and our percepts (Liberman et al., 1967)
  - the discovery of perceptual behaviors like categorical perception
  - evidence from studies of selective adaptation
Questions about the dichotomy
• Invariant cues for stop consonants may be found in the dynamic configurations of spectral energy over time (Fant, 1973; Stevens, 1975).
• Perceptual behaviors such as categorical perception have been demonstrated for complex nonspeech signals (Pisoni, 1977).
• The effects of selective adaptation, once thought to provide evidence for phonetic feature detectors, now appear to be attributable to auditory levels of processing.
Solution
• A direct test of the distinction between auditory and phonetic levels of processing: use an animal listener that has no phonetic resources.
• The rationale for this comparative approach is to separate general auditory perceptual effects from those that are unique to speech-sound processing.
Previous Analyses
• Liberman et al. (1967): categorical perception is considered unique to the processing of speech sounds.
• Categorical perception has also been reported for nonspeech stimuli: Cutting and Rosner (1974), with “plucked” versus “bowed” sounds, and Pisoni (1977).
• Eimas (1974b): Infants perceive the stimuli in a linguistic mode and may categorically discriminate voicing contrasts that are not phonemic in the infant’s linguistic environment.
• Kuhl (1978): While the infant’s perceptual proclivities are linguistically relevant, their origins may reflect constraints that are psychoacoustic rather than specifically linguistic.
Goals
• In an attempt to differentiate perceptual effects that are attributable to “auditory” and “phonetic” levels of processing in speech perception, Kuhl & Miller (1977) undertook a series of experiments with animal listeners.
• The results obtained with alveolar stimuli are reported in experiment I; results obtained with labial and velar stimuli are reported in experiments II and III.
• Experiment IV reports the results obtained when the stimuli from all three continua were used.
Experiment I
A. Stimuli
• The speech sounds were synthesized at the Haskins Laboratories on the parallel-resonance synthesizer.
• For a specified VOT, the upper two formants were excited with thermal noise for the duration of the interval; at the end of this interval, the two formants were excited with periodic pulses.
• The first formant was off throughout the VOT interval. 5 ms for b/p and d/t, 20 ms for g/k stimuli.
• Stimuli with VOTs from 0 to +80 ms in 10-ms steps were recorded on a full-track tape recorder and then re-recorded onto a disk pack of RAP.
Subjects
• Four chinchillas, each about two years of age
• Two of the four animals had previously been trained to categorize naturally produced alveolar tokens as either /d/ or /t/ syllables.
• The other two animals had never been trained.
• Four English-speaking adults
Apparatus
• A double-grille cage with a loudspeaker in a sound-treated booth.
• The cage was divided by a midline barrier and had a door buzzer at one end.
• Presentation of a speech sound was initiated by the experimenter and controlled by punched paper tape and a high-speed paper-tape reader.
• The punched tape was prepared according to the randomization specifications.
Discrimination training
• On positive trials, the animal had to cross the midline barrier to avoid a mild shock and the sounding of the buzzer.
• On negative trials, the animal could remain at the drinking tube. If the animal successfully inhibited the crossing response, it was rewarded with free water.
• At the end of the earlier experiment with natural speech, two animals had learned to classify correctly the voiced and voiceless CV syllables produced by eight different talkers in six different vowel contexts.
• Positive and negative trials were randomized using computer-punched paper tapes.
Generalization testing
• On half of the trials: the endpoint stimuli, 0 and +80 ms VOT
• On the other half of the trials: the stimuli between these endpoints, +10 to +70 ms VOT
• During generalization testing, shock was never presented and all feedback was arranged to indicate that the animal was always correct.
• Testing human subjects:
  - four human subjects, tested in the same sound-treated booth
  - the same trial structure
  - instructed to label the stimuli as /da/ or /ta/
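The half-and-half trial structure described above can be sketched in a few lines of Python. This is only an illustration: the function name `generalization_schedule`, the trial count, the seed, and the choice to cycle evenly through the stimuli are all assumptions, not details from the paper, which prepared its randomization on punched paper tape.

```python
import random

ENDPOINTS = [0, 80]                      # ms VOT, endpoint stimuli from the slide
INTERMEDIATE = list(range(10, 80, 10))   # +10 ... +70 ms VOT


def generalization_schedule(n_trials, seed=None):
    """Return a shuffled list of VOT values: half endpoints, half intermediate.

    Hypothetical helper: n_trials, seed, and the even cycling through
    stimuli are illustrative choices, not taken from the original study.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even")
    rng = random.Random(seed)
    half = n_trials // 2
    # Cycle through each stimulus list so every stimulus is used about equally.
    endpoint_trials = [ENDPOINTS[i % len(ENDPOINTS)] for i in range(half)]
    middle_trials = [INTERMEDIATE[i % len(INTERMEDIATE)] for i in range(half)]
    trials = endpoint_trials + middle_trials
    rng.shuffle(trials)
    return trials


schedule = generalization_schedule(28, seed=1)
```

With 28 trials, 14 are endpoint presentations and each of the seven intermediate VOT values appears twice.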
Results - Experiment I
• Location of the phonetic boundaries:
  - The phonetic boundaries of the fitted curves: 35.2 ms VOT for English-speaking adults, 33.3 ms VOT for chinchillas
  - The range of boundary values: 29.9 ms to 42.0 ms for humans, 26.7 ms to 36 ms for chinchillas
  - Exposure to natural speech had no effect on the location of the boundary:
    a. the two animals trained on natural speech: 31.4 ms VOT
    b. the two animals with no such training: 32.8 ms VOT
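To make the idea of a "phonetic boundary" concrete, here is a minimal sketch that locates the 50% point of an identification function by linear interpolation between adjacent VOT steps. This is a simpler stand-in for the curve fitting reported in the paper, whose exact procedure is not given in these slides. The function name `boundary_50` and the response percentages are hypothetical and are not data from the study.

```python
def boundary_50(vots, pct_voiced):
    """Estimate the VOT (ms) at which voiced responses cross 50%.

    Uses linear interpolation between the two adjacent stimuli that
    straddle 50%; assumes the percentages fall as VOT increases.
    """
    points = list(zip(vots, pct_voiced))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= 50.0 >= y1 and y0 != y1:
            return x0 + (y0 - 50.0) * (x1 - x0) / (y0 - y1)
    raise ValueError("identification function never crosses 50%")


# Hypothetical identification data (percent /d/ responses), for illustration only.
vots_ms = list(range(0, 81, 10))
pct_d = [100, 100, 98, 80, 20, 4, 0, 0, 0]
print(boundary_50(vots_ms, pct_d))  # 35.0
```

Interpolation uses only the two points around the crossing, so it is more sensitive to noise than a curve fitted to the whole function.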
Experiment II
• Subjects:
  - Two of the four chinchillas used in experiment I served as subjects.
  - One had originally been trained on natural speech, while the other had been trained only on the synthetic tokens.
  - The same four English-speaking adults
Results - Experiment II
1. Transfer from the alveolar stimuli to the stimuli with a labial place of articulation
2. Location of the phonetic boundaries: the phonetic boundaries of the fitted curves are 26.8 ms VOT for English-speaking adults and 23.3 ms VOT for chinchillas.
3. Boundary width: each subject’s fitted curve was matched at the 50% point.
Experiment III
• Subjects: the same subjects as in experiment II.
• Procedure
  a. Discrimination training
  b. Generalization testing
• Results
  a. Transfer to stimuli with a velar place of articulation
  b. The 50% points of the fitted curves: 42.3 ms VOT for English-speaking adults, 42.5 ms VOT for chinchillas
  c. Each subject’s fitted curve was matched at the 50% point.
Experiment IV
• Stimuli: the labial, alveolar, and velar stimuli previously described were used.
• Subject: a single animal, the one for whom voiced stimuli were positive
• Procedure:
  a. Discrimination training: with the endpoint stimuli (0 ms VOT and +80 ms VOT) of all three continua
  b. Generalization testing: six endpoint stimuli and 21 stimuli between the endpoints (+10 ms VOT to +70 ms VOT from the labial, alveolar, and velar continua)
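The stimulus counts for Experiment IV follow directly from three continua with nine VOT steps each. A short sketch enumerates the set and confirms the 6 + 21 split; the variable names are illustrative.

```python
from itertools import product

PLACES = ["labial", "alveolar", "velar"]
VOTS_MS = list(range(0, 81, 10))   # 0 to +80 ms VOT in 10-ms steps

# Every (place, VOT) pair across the three continua.
stimuli = list(product(PLACES, VOTS_MS))
endpoints = [(p, v) for p, v in stimuli if v in (0, 80)]
intermediate = [(p, v) for p, v in stimuli if 0 < v < 80]

print(len(stimuli), len(endpoints), len(intermediate))  # 27 6 21
```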
Results - Experiment IV
• Location of the phonetic boundaries:
  - The relative locations of the three boundaries did not change when place of articulation was varied randomly.
• Boundary width: the boundary widths of the fitted curves from experiments I, II, and III are very similar to those from experiment IV.
Statistical Analyses
• Phonetic boundary
  - two-factor analysis of variance (species x place of articulation)
  - The main effect of species was not significant (F = 0.376), while the main effect of place of articulation was highly significant (p < 0.001).
  - no significant interactions
• Boundary width
  - two-factor analysis of variance
  - Both the main effect of species (p < 0.05) and the main effect of place of articulation (p < 0.05) were significant.
  - no significant interactions
  - the steepest slopes of the identification functions: velar stimuli
  - the shallowest slopes: labial stimuli
Discussion
A. Comparison of the “labeling” functions for human and nonhuman listeners
• Agreement between the identification functions for humans and animals for all three stimulus sets
  - The slopes are slightly less steep for chinchillas.
• The shifts in boundary value with place of articulation are similar for the two groups of subjects.
  - A complementary relation between VOT and the F1-onset frequency
  - The lower the onset frequency of F1, the greater the VOT
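The agreement between the two groups can be checked arithmetically from the boundary values reported for experiments I to III. The sketch below tabulates them, computes the human-minus-chinchilla difference for each place of articulation, and confirms that both species order the three boundaries the same way. The helper names are illustrative.

```python
# Mean phonetic boundaries (ms VOT) as reported in experiments I-III.
BOUNDARIES_MS = {
    "labial": {"human": 26.8, "chinchilla": 23.3},
    "alveolar": {"human": 35.2, "chinchilla": 33.3},
    "velar": {"human": 42.3, "chinchilla": 42.5},
}


def species_difference(place):
    """Human boundary minus chinchilla boundary, rounded to 0.1 ms."""
    b = BOUNDARIES_MS[place]
    return round(b["human"] - b["chinchilla"], 1)


def rank_by_boundary(species):
    """Places of articulation ordered from shortest to longest boundary."""
    return sorted(BOUNDARIES_MS, key=lambda place: BOUNDARIES_MS[place][species])


for place in BOUNDARIES_MS:
    print(place, species_difference(place))
# labial 3.5
# alveolar 1.9
# velar -0.2
```

Both `rank_by_boundary("human")` and `rank_by_boundary("chinchilla")` give labial, alveolar, velar. The largest difference between species is 3.5 ms, which is small next to the roughly 16 to 19 ms shift from the labial to the velar boundary.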
Discussion
B. Implications for theories of speech perception and the evolution of a speech-sound repertoire
  - A mammal with the appropriate auditory capabilities and no linguistic experience is predisposed to hear an abrupt qualitative change in the short voicing-lag region of the VOT continuum.
  - Psychoacoustic considerations in the selection of candidates for a speech-sound repertoire
  - Speech sounds were selected to exploit the perceptual discontinuities that are a natural result of the functions of the mammalian auditory system.
Discussion
C. Interpretations of the human infant’s perceptual behavior
  - The infant’s accomplishments might reflect psychoacoustic predispositions that are favorable to speech-sound perception (Kuhl, 1978).
D. Exploring the nature of complex auditory perception using an “animal model”