Speech perception by the chinchilla: Identification functions for synthetic VOT stimuli - JASA (1977) Patricia K. Kuhl & James D. Miller

Page created by Nicole Lewis
 
Speech perception by the chinchilla:
Identification functions for synthetic
             VOT stimuli

    Patricia K. Kuhl & James D. Miller
               JASA (1977)
Introduction
• To account for the data in speech experiments, it is
   useful to distinguish between auditory levels of
   processing and phonetic levels of processing.
• The evidence supporting a dichotomy between the
   two levels:
  -a lack of acoustic invariance between the acoustic
   cues and our percepts (Liberman et al., 1967)
 -the discovery of perceptual behaviors like categorical
   perception
 -evidence from studies of selective adaptation
Questions about the dichotomy
• Invariant cues for stop consonants may be
  found in the dynamic configurations of
  spectral energy over time (Fant, 1973; Stevens,
  1975).
• Perceptual behaviors such as categorical perception
  have been demonstrated for complex nonspeech
  signals (Pisoni, 1977).
• The effects of selective adaptation, thought to
  provide evidence for phonetic feature detectors,
  now appear to be attributable to auditory-level
  processing.
Solution
• A direct test of the distinction between
  auditory and phonetic levels of processing:
  use an animal listener that has no phonetic
  resources.
• The rationale for this comparative approach is
  to separate general auditory perceptual effects
  from those that are unique to speech-sound
  processing.
Previous Analyses
•  Liberman et al.(1967): categorical perception is considered
  unique to the processing of speech sounds
• Categorical perception has also been reported for nonspeech
  stimuli: Cutting and Rosner (1974), “plucked” vs. “bowed”
  sounds; Pisoni (1977)
• Eimas (1974b): Infants perceive the stimuli in a linguistic
  mode and may categorically discriminate voicing contrasts that
  are not phonemic in the infant’s linguistic environment.
• Kuhl (1978) : While the infant’s perceptual proclivities are
  linguistically relevant, their origins may reflect constraints that
  are psychoacoustic rather than specifically linguistic.
Goals
• In an attempt to differentiate perceptual effects that
  are attributable to “auditory” and “phonetic” levels of
  processing in speech perception, Kuhl & Miller
  (1977) undertook a series of experiments with animal
  listeners.
• The results obtained with alveolar stimuli are reported
  in experiment I; results obtained with labial and velar
  stimuli are reported in experiments II and III.
• Experiment IV is a report of results obtained when
  the stimuli from all three continua were used.
Experiment I
A.   Stimuli
•     The speech sounds were synthesized at the Haskins
     Laboratories on the parallel-resonance synthesizer.
•    For a specified VOT, the upper two formants were excited
     with thermal noise for the duration of the interval; at the end
     of this interval, the two formants were excited with periodic
     pulses.
•    The first formant was off throughout the VOT interval.
      5 ms for b/p and d/t, 20 ms for g/k stimuli
•    Stimuli with VOTs from 0 to +80 ms in 10-ms steps were
     recorded on a full-track tape recorder and then re-recorded
     onto a disk pack of RAP.
Subjects
• Four chinchillas, each about two years of age
• Two of the four animals had been previously
  trained to categorize naturally produced
  alveolar tokens as either /d/ or /t/ syllables.
• The other two animals had never been trained
• Four English-speaking adults
Apparatus
• A double-grille cage with a loudspeaker in a
  sound-treated booth.
• The cage was divided by a midline barrier and
  had a door buzzer at one end.
• Presentation of a speech sound was initiated by
  the experimenter and controlled by punched
  paper tape and a high-speed paper-tape reader.
• The punched tape was prepared according to
  the randomization specifications.
Discrimination training
• On positive trials, the animal had to cross the midline barrier
  to avoid a mild shock and the sounding of the buzzer.
• On negative trials, the animal could remain at the drinking
  tube. If the animal successfully inhibited the crossing response,
  it was rewarded with free water.
• By the end of the earlier natural-speech experiment, two animals
  had learned to classify correctly the voiced and voiceless CV
  syllables produced by eight different talkers in six different
  vowel contexts.
• Positive and negative trials were randomized by
  computer-punched paper tapes.
Generalization testing
• On half of the trials: the endpoint stimuli, 0 and +80
  ms VOT
• On the other half of trials: the stimuli between these
  endpoints, +10 to +70 ms VOT
• During generalization testing, shock was never
  presented and all feedback was arranged to tell the
  animal it was always correct.
• Testing human subjects:
  -Four human subjects were tested in the same sound-treated booth
  -the same trial structure was used
  -subjects were instructed to label the stimuli as /da/ or /ta/.
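The trial structure above (half endpoint trials, half intermediate trials) can be sketched in a few lines. This is only an illustration: the function name make_session and the parameters n_trials and seed are hypothetical, and the slides do not state how many trials a session contained or how the original paper tapes were randomized.

```python
import random

ENDPOINTS = [0, 80]                       # ms VOT, the trained endpoint stimuli
INTERMEDIATES = list(range(10, 80, 10))   # +10 to +70 ms VOT in 10-ms steps


def make_session(n_trials, seed=None):
    """Return a shuffled list of VOT values (ms) for one test session.

    Half of the trials draw from the endpoints and half from the
    intermediate stimuli. n_trials and seed are hypothetical parameters,
    not values taken from the study.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even to split the trials in half")
    rng = random.Random(seed)
    half = n_trials // 2
    trials = [rng.choice(ENDPOINTS) for _ in range(half)]
    trials += [rng.choice(INTERMEDIATES) for _ in range(half)]
    rng.shuffle(trials)                   # interleave the two trial types
    return trials


session = make_session(40, seed=1)
```

Each call returns one randomized order; a fixed seed makes the order reproducible, which plays the role of a prepared tape.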
Results-Experiment I
• Location of the phonetic boundaries:
  -The phonetic boundaries of the fitted curves: 35.2
   ms VOT for English-speaking adults, 33.3 ms VOT
   for chinchillas
 -The boundary value range: 29.9 ms to 42.0 ms for
   humans, 26.7 ms to 36 ms for chinchillas
 -Exposure to natural speech had no effect on the
   location of the boundary
   a. the two animals trained- 31.4 ms VOT
   b. the two animals having no training- 32.8 ms VOT
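A boundary read off a fitted curve is the VOT at which the curve crosses 50%. The sketch below shows one way to get such a value, assuming a logistic identification function fitted by least squares over a coarse grid. The slides do not say which curve or fitting method the authors used, so the logistic form, the grid ranges, and the response proportions are all assumptions made for illustration.

```python
import math


def logistic(vot, boundary, scale):
    """Predicted proportion of voiceless (/ta/) responses at a VOT in ms."""
    return 1.0 / (1.0 + math.exp(-(vot - boundary) / scale))


def fit_boundary(vots, proportions):
    """Least-squares grid search for the 50% point and the slope scale.

    Returns (boundary_ms, scale_ms). The grid limits are assumptions.
    """
    best = None
    for b10 in range(0, 801):             # boundaries 0.0 to 80.0 ms, 0.1-ms steps
        boundary = b10 / 10.0
        for s10 in range(5, 201, 5):      # scales 0.5 to 20.0 ms, 0.5-ms steps
            scale = s10 / 10.0
            err = sum((logistic(v, boundary, scale) - p) ** 2
                      for v, p in zip(vots, proportions))
            if best is None or err < best[0]:
                best = (err, boundary, scale)
    return best[1], best[2]


vots = list(range(0, 90, 10))
# Hypothetical proportions of /ta/ responses, made up for illustration only.
made_up = [0.02, 0.05, 0.12, 0.40, 0.78, 0.93, 0.97, 0.99, 1.00]
boundary, scale = fit_boundary(vots, made_up)
print(f"estimated 50% point: {boundary:.1f} ms VOT")
```

A smaller scale value means a steeper identification function, so the same fit also gives a rough handle on how sharp the category boundary is.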
Experiment II
• Subjects:
 -Two of the four chinchillas used in
  experiment I served as subjects.
 -One had originally been trained on natural
  speech while the other had been trained only
  on the synthetic tokens.
 -The same four English-speaking adults
Results –Experiment II
1. Transfer from the alveolar stimuli to the
    stimuli with a labial place of articulation
2. Location of the phonetic boundaries:
   phonetic boundaries of the fitted curves are
    26.8 ms VOT for English-speaking adults
    and 23.3 ms for chinchillas.
3. Boundary width: Each subject’s fitted curve
    was matched at the 50% point.
Experiment III
• Subjects: the same subjects as in experiment II.
• Procedure
 a. Discrimination training
 b. Generalization testing
• Results
 a. Transfer to stimuli with a velar place of articulation
 b. The 50% points of the fitted curves: 42.3 ms VOT
  for English-speaking adults, 42.5 ms VOT for
  chinchillas
 c. Each subject’s fitted curve was matched at the 50%
  point.
Experiment IV
• Stimuli: The labial, alveolar and velar stimuli
   previously described were used
• Subject: A single animal, the one for whom voiced
   stimuli were positive
• Procedure:
 a. Discrimination training: with the endpoint stimuli
  (0 ms VOT and +80 ms VOT) of all three continua
 b. Generalization testing: six endpoint stimuli and 21
   stimuli between the endpoints (+10 ms VOT to +70
   ms VOT from labial, alveolar, and velar continua)
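The stimulus counts in the procedure follow from crossing three places of articulation with the nine VOT values (0 to +80 ms in 10-ms steps). A short enumeration confirms them; the variable names are illustrative only.

```python
PLACES = ["labial", "alveolar", "velar"]
VOTS_MS = range(0, 90, 10)   # 0 to +80 ms VOT in 10-ms steps

# Every (place, VOT) pair across the three continua.
stimuli = [(place, vot) for place in PLACES for vot in VOTS_MS]
endpoints = [s for s in stimuli if s[1] in (0, 80)]
intermediates = [s for s in stimuli if s[1] not in (0, 80)]

print(len(endpoints), len(intermediates))   # 6 21
```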
Results-Experiment IV
• Location of the phonetic boundaries:
 -The relative locations of the three boundaries
  did not change when place of articulation was
  varied randomly
• Boundary width: The boundary widths of the
  fitted curves from experiments I, II, and III are
  very similar to those from experiment IV.
Statistical Analyses
• Phonetic boundary
 -two-factor Analysis of Variance (species x place of
   articulation)
 -While the main effect of species was not significant (F= 0.376),
   the main effect of place of articulation was highly significant
   (p < 0.001).
 -no significant interactions
• Boundary width
 -two-factor Analysis of Variance
 -Both the main effect of species (p < 0.05) and of place of
   articulation (p < 0.05) were significant.
 -no significant interactions
 -the steepest slopes of the identification functions: velar stimuli
 -the shallowest slopes: labial stimuli
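For readers unfamiliar with the analysis, the following sketch computes the F ratios of a balanced two-factor ANOVA with replication (species x place of articulation). It is an assumption-laden illustration: the slides do not describe the exact design or software, the function treats all observations as independent, and the boundary values in the example are invented, not the study's data.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def two_way_anova(cells):
    """F ratios for a balanced two-factor ANOVA with replication.

    cells[i][j] is the list of observations for level i of factor A
    (e.g. species) and level j of factor B (e.g. place of articulation).
    Returns {"A": (F, df_effect, df_within), "B": ..., "AxB": ...}.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(row) != b or any(len(c) != n for c in row) for row in cells):
        raise ValueError("design must be balanced")
    if n < 2:
        raise ValueError("need at least two observations per cell")

    grand = _mean([x for row in cells for c in row for x in c])
    a_means = [_mean([x for c in row for x in c]) for row in cells]
    b_means = [_mean([x for row in cells for x in row[j]]) for j in range(b)]
    cell_means = [[_mean(c) for c in row] for row in cells]

    # Sums of squares for the two main effects, interaction, and error.
    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(a) for j in range(b))
    ss_w = sum((x - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for x in cells[i][j])

    df_a, df_b = a - 1, b - 1
    df_ab, df_w = df_a * df_b, a * b * (n - 1)
    ms_w = ss_w / df_w                    # assumes some within-cell variance
    return {
        "A": (ss_a / df_a / ms_w, df_a, df_w),
        "B": (ss_b / df_b / ms_w, df_b, df_w),
        "AxB": (ss_ab / df_ab / ms_w, df_ab, df_w),
    }


# Hypothetical boundary values (ms VOT), invented for illustration only.
# Rows: humans, chinchillas. Columns: labial, alveolar, velar.
example = [
    [[26.0, 28.0], [34.0, 36.0], [41.0, 43.0]],
    [[22.0, 25.0], [32.0, 35.0], [42.0, 44.0]],
]
result = two_way_anova(example)
```

Each F ratio would then be compared with the F distribution on the listed degrees of freedom to obtain a p value.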
Discussion
A. Comparison of the “labeling” functions for human
  and nonhuman listeners
• Agreement between the identification functions for
  humans and animals for all three stimulus sets
  -The slopes are slightly less steep for chinchillas.
• The shifts in boundary value with place of
  articulation are similar for the two groups of subjects.
 -A complementary relation between VOT and the F1-
  onset frequency.
 -The lower the onset frequency of F1, the greater the
  VOT at the boundary
Discussion
B. Implications for theories of speech perception and
   the evolution of a speech-sound repertoire
  -A mammal with the appropriate auditory capabilities
   and no linguistic experience is predisposed to hear an
   abrupt qualitative change in the short voicing-lag
   region of the VOT continuum.
 -The psychoacoustic considerations in the selection of
   candidates for a speech-sound repertoire
 - Speech sounds were selected to exploit the perceptual
   discontinuities that are a natural result of the
   functions of the mammalian auditory system.
Discussion
C. Interpretations of the human infant’s perceptual
  behavior
 -The infant’s accomplishments might reflect
  psychoacoustic predispositions that are favorable to
  speech-sound perception (Kuhl, 1978).
D. Exploring the nature of complex auditory perception
  using an “animal model”