Changes in speech production in response to formant perturbations: An overview of two decades of research - Peter Lang Publishing
Tiphaine Caudrelier, Amélie Rochet-Capellan

Abstract: One way to investigate speech motor learning is to create artificial adaptation situations by perturbing speakers’ auditory feedback in real time. Formant perturbations were introduced by Houde and Jordan (1998), who provided the first evidence that speakers adapt their pronunciation to compensate for these perturbations. Twenty years later, this chapter provides an overview of the general impact of Houde and Jordan’s work in speech research and beyond, as well as a more detailed review of studies that involve formant perturbations. The impact of Houde and Jordan’s work appears to be cross-disciplinary. Although it is mainly related to speech production and perception, it has also been cited in limb movement and even animal research, mainly as evidence of adaptive sensorimotor control. Research on formant perturbations has expanded rapidly since 2006, spreading across the world and to many research teams. We identified 77 experimental studies focused on formant perturbations, which we then analyzed with regard to technical and theoretical issues. This analysis showed that various apparatuses and procedures were used to address important topics in speech research. A primary interest has been feedback and feedforward control mechanisms in speech. These mechanisms were addressed in different populations, including adults and children with typical vs. atypical development, using behavioral or neurophysiological approaches, or both. Some formant perturbation studies focused more specifically on the integration of auditory and somatosensory feedback in speech production, while others explored the interaction between speech production and the perception of phonemic contrasts. Some research questioned the processes and the nature of speech representations by investigating the generalization of adaptation to formant perturbations.
Finally, a few studies were interested in the effect of extraneous variables such as surface effects or speakers’ general cognitive abilities. Altogether, these studies provide insights into speech motor control in general and into sensorimotor interactions in particular. The field has developed recently and may still expand, as it allows us to address fundamental topics in speech research such as perception-production links or abstract vs. exemplar representations. Future research with formant perturbations may also further connect sensorimotor adaptation to linguistic and cognitive factors, in particular to working and long-term memory.

Keywords: perturbation, real-time auditory feedback, formants, speech units, learning

Susanne Fuchs, Joanne Cleland and Amélie Rochet-Capellan - 9783631797884 Downloaded from PubFactory at 04/03/2020 12:26:51AM via free access

1. Introduction

As an “extraordinary feat of motor control” (Kelso, Tuller, Vatikiotis-Bateson, & Fowler, 1984, p. 812), speech production is a challenging research topic, highly influenced by the movement sciences (Grimme, Fuchs, Perrier, & Schöner, 2011; Maas et al., 2008). Speech motor control indeed shares numerous features with other sensorimotor systems, in particular with limb motor control. Among these features, the sensorimotor adaptability of speech is of particular interest to speech science, both because it is the basis of speech rehabilitation (Maas et al., 2008) and because it is ubiquitous in daily life. Common examples include changes in the way we speak according to our interlocutor or our surroundings, such as speaking louder when talking with someone with a hearing impairment or in a noisy environment (Garnier, Henrich, & Dubois, 2010), or spontaneously imitating our interlocutor’s speech sounds (Pardo, 2006). Speech motor control also adapts throughout the lifespan to natural or accidental alterations, temporary or more permanent, of our sensory systems or vocal tract geometry (Jones & Munhall, 2003; Lane et al., 2007). These adaptations allow us to maintain some level of intelligibility despite vocal tract growth, hearing loss, or orofacial surgery, or when wearing a dental appliance, losing teeth, or speaking while eating. Because it is essential to speech production, sensorimotor adaptation of speech has been the topic of numerous studies. In this chapter, we focus on studies that specifically perturbed formants.
Formants are frequencies corresponding to peaks of acoustic energy, and their relative values characterize vowels. Research in this field, and especially Houde and Jordan’s work, was inspired by the study of visuomotor adaptation in the limb movement literature (Houde & Jordan, 1998). Pioneering work on the adaptation of different visuomotor activities appeared at the end of the 19th century (Held, 1965; Stratton, 1897). This work introduced what is now a common approach to assessing visuomotor adaptation: investigating changes in movement in response to a systematic distortion of visual feedback, as in prism adaptation. As an illustration, Stratton (1897) reported his own extreme, everyday-life experience of wearing, for eight days, an apparatus that inverted the retinal image both upside down and left to right. On the first day, “the entire scene appeared upside down”. He felt nauseous. His movements were “laborious”, “embarrassed”, and “inappropriate” (p. 344), required a lot of attention, and were “extremely fatiguing” (p. 344). By the start of the third day, things were much better, with no sign of “nervous distress” (p. 349). At the end of the fourth day, he “preferred to keep the glasses on rather than sit blindfolded” (pp. 351–352). When the apparatus was removed on day eight, it took him some time to return to normal perception and movement.

Later work on visuomotor adaptation focused on more specific activities and on less dramatic, more local, and shorter-term changes, mainly reaching movements performed under rotations of the visual field. In this context, it has been repeatedly demonstrated that when movements are performed while the visual field is rotated by a specific angle (α), participants first miss the target by the same angle α. With repetition, however, they progressively adapt their movements to the new feedback and reach the target accurately again. When they return to normal vision, after-effects and transfer effects are observed: participants miss the training target (after-effects) and/or a new target (transfer) by an angle more or less close to –α. These effects vary as a function of the angular distance between the training and the testing targets (Krakauer, Pine, Ghilardi, & Ghez, 2000; Shadmehr & Mussa-Ivaldi, 1994).
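The geometry of the rotation paradigm can be sketched in a few lines of Python. This is an idealized illustration under stated assumptions: the target direction (90°) and rotation (α = 30°) are hypothetical values, and adaptation is assumed to be complete, whereas real participants usually adapt only partly. The cursor seen by the participant is the hand direction rotated by α; the sketch yields an initial error of about +α, an error of about 0° once the aim is shifted by –α, and an after-effect of about –α when the rotation is removed.

```python
import math


def unit(angle_deg: float) -> tuple:
    """Unit vector pointing in direction angle_deg (degrees)."""
    a = math.radians(angle_deg)
    return (math.cos(a), math.sin(a))


def rotate(vec: tuple, alpha_deg: float) -> tuple:
    """Rotate a 2-D vector counterclockwise by alpha_deg (the visual rotation)."""
    a = math.radians(alpha_deg)
    x, y = vec
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))


def direction(vec: tuple) -> float:
    """Direction of a 2-D vector, in degrees."""
    return math.degrees(math.atan2(vec[1], vec[0]))


TARGET_DEG = 90.0  # hypothetical target direction
ALPHA_DEG = 30.0   # hypothetical rotation of the visual field

# Training start: the hand aims at the target, but the seen cursor is rotated.
start_error = direction(rotate(unit(TARGET_DEG), ALPHA_DEG)) - TARGET_DEG
# Training end (idealized full adaptation): the hand aims at target - alpha.
adapted_aim = TARGET_DEG - ALPHA_DEG
end_error = direction(rotate(unit(adapted_aim), ALPHA_DEG)) - TARGET_DEG
# Return to normal vision: the adapted aim now misses, producing an after-effect.
after_effect = direction(unit(adapted_aim)) - TARGET_DEG
```

The same logic carries over to the auditory case, with formant frequencies in place of movement directions.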
Sensorimotor adaptation was attributed early on to feedforward control (i.e., predictive control based on learnt sensorimotor mappings), in contrast to feedback, or closed-loop, control (i.e., online processing of sensory inputs), which is visible in corrections to unexpected perturbations (Golfinopoulos, Tourville, & Guenther, 2010; Houde & Chang, 2015). These notions are defined later in this chapter. Twenty years ago, Houde and Jordan (1998) introduced a procedure analogous to visuomotor rotation adaptation to probe feedforward control in speech, using real-time alterations of formant frequencies in vowels. By altering the frequencies of the first and/or second formants (F1 and F2, respectively), it is possible to make a vowel sound like another vowel. For example, decreasing F1 and increasing F2 makes the vowel /ε/ sound closer to the vowel /ɪ/, as illustrated in Figure 1. This alteration displaces the auditory feedback in the same way as prism vision displaces the visual position of the target. For example, the speaker says “head” into a microphone while wearing headphones (Figure 1.A). The signal is processed in real time so that F1 and F2 are moved towards “hid” (Figure 1.B) and played back through the headphones. The consequence for the speaker is a discrepancy between the auditory target expected from the planned movements (“head”) and the auditory feedback they actually receive (~“hid”). In other words, as in visuomotor adaptation, the speaker first misses the auditory target (Figure 1.C, “Training start”). With practice, that is, repetition of the shifted utterance(s) with the same perturbation, the speaker adapts to the perturbation (Figure 1.C, “Training end”): to reach the auditory target “head” again in the presence of the perturbation, they produce formants shifted in the direction opposite to the perturbation. In our example, this corresponds to producing an utterance closer to “had”. When the feedback is returned to normal or masked with noise, after-effects are observed for the training utterance(s) and transfer effects for different utterances (Figure 1.C, columns “After-effect” and “Transfer”). This suggests that the compensation is not only an online change in feedback control but also affects the auditory-motor mappings that support feedforward control, in a more or less utterance- or segment-specific way. The procedure was later adapted to address feedback control by investigating online compensation for unexpected perturbations (Purcell & Munhall, 2006b). Adaptation to formant perturbations has been investigated per se, or used as a paradigm to address more general issues in speech science.
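The “auditory prism” described above can be illustrated numerically. The Python sketch below is not the model of Houde and Jordan or of any cited study: the /ε/ formant values and the shift are hypothetical, and adaptation is stood in for by a simple single-rate error-driven rule in which the planned formants move, after each trial, by a fraction of the error between the target and what was heard. Under these assumptions, the first shifted trial is heard closer to /ɪ/, production then drifts in the opposite direction (higher F1, lower F2, i.e. towards “had”), and the change persists as an after-effect once feedback is returned to normal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formants:
    """First and second formant frequencies, in Hz."""
    f1: float
    f2: float

    def __add__(self, other: "Formants") -> "Formants":
        return Formants(self.f1 + other.f1, self.f2 + other.f2)

    def __sub__(self, other: "Formants") -> "Formants":
        return Formants(self.f1 - other.f1, self.f2 - other.f2)

    def scaled(self, k: float) -> "Formants":
        return Formants(self.f1 * k, self.f2 * k)


# Hypothetical values, for illustration only (not taken from any cited study).
TARGET = Formants(f1=580.0, f2=1800.0)  # the speaker's /ε/ in "head"
SHIFT = Formants(f1=-100.0, f2=150.0)   # lower F1, raise F2: heard closer to /ɪ/


def simulate(n_baseline: int, n_training: int, n_after: int, rate: float = 0.1):
    """Return one (phase, produced, heard) tuple per trial.

    The planned formants are updated after every trial by a fraction `rate`
    of the auditory error (target minus heard). The shift is applied to the
    feedback only during training; afterwards feedback is returned to normal.
    """
    phases = ["baseline"] * n_baseline + ["training"] * n_training + ["after"] * n_after
    plan = TARGET
    trials = []
    for phase in phases:
        produced = plan
        heard = produced + SHIFT if phase == "training" else produced
        trials.append((phase, produced, heard))
        plan = plan + (TARGET - heard).scaled(rate)  # error-driven update
    return trials


def percent_compensation_f1(produced: Formants) -> float:
    """Change in produced F1 opposing the F1 shift, as a % of the shift."""
    return -(produced.f1 - TARGET.f1) / SHIFT.f1 * 100.0


trials = simulate(n_baseline=10, n_training=100, n_after=10)
first_training = next(t for t in trials if t[0] == "training")
first_after = next(t for t in trials if t[0] == "after")
```

Because this toy rule has no forgetting term, it converges to full (about 100%) compensation; the compensation reported in formant perturbation studies is typically incomplete, so the rule should be read as a qualitative illustration only.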
This chapter reviews research on formant perturbations by analyzing Houde and Jordan’s seminal study (Houde & Jordan, 1998, 2002) and the scientific literature that has referred to it. This approach (detailed in Section 2) allows us to assess the cross-disciplinary impact of Houde and Jordan’s work and, in particular, to identify the main topics of the literature that has cited it (Section 3). Among the collected papers, only a subset corresponded to empirical studies involving formant perturbations. Based on the analysis of these studies, including a review of their reference lists, the later parts of the chapter provide: (1) a description of the main apparatuses and paradigms used in formant perturbation studies; (2) an overview of the research topics addressed using these perturbations and the main reported results; and (3) some perspectives for future research.

Figure 1: The auditory prism adaptation. (A) The speaker speaks into a microphone; his feedback is altered so that when he produces “head”, he hears a signal closer to “hid”. (B) To do so, F1 and F2 are changed in real time. (C) Before the introduction of the perturbation (Baseline), the auditory feedback is consistent with the target. The first exposure to the perturbation (Training start) induces a discrepancy (or error) between the auditory feedback and the planned target. With repeated exposure to the perturbation, the talker changes his production to compensate for the perturbation (Training end). When the perturbation is removed, after-effects and/or transfer effects are observed.

2. Paper collection and analysis

As we were interested in the impact of Houde and Jordan’s work and also wanted to provide an analytical review of formant perturbation studies, we first analyzed the published work that referred to Houde and Jordan (1998 and/or 2002) from 1999 to 2018 (last update on July 6, 2018). This was performed using the “Cited by” function in Google Scholar. We chose this approach rather than a keyword search because we wanted to collect various sorts of publications and because it appeared to be the most systematic way to collect publications in the field. To compensate for potential errors and omissions by Google Scholar, the results were then analyzed very closely. Analyzing the Google Scholar output year by year yielded a total of 584 references (including the two papers by Houde and Jordan; see Table 1).

Table 1: Number of references in each category of the first level of selection (see text for details)
Formant shift, rejected: 26
Formant shift, kept: 72 (+2: Houde & Jordan, 1998 and 2002)
No formant shift, rejected: 140
No formant shift, kept: 287
Not in English: 35
Reference error: 22
Total: 584

As a first step, we excluded documents that were not written in English or that corresponded to reference errors (57 in total; see Table 1). Among the 527 remaining references, we distinguished between those without vs. with an empirical study that included formant perturbations. In the former category (n=427, without formant perturbations), we kept only journal papers for a thematic analysis of Houde and Jordan’s broad impact (n=287). In the latter category (n=100, with formant perturbations), we first kept all the documents except PhD or Master’s theses and conference posters or abstracts (74 references kept, 26 rejected). Note that there were 11 PhD theses, most of which were associated with journal publications. For consistency in criteria, we did not include Frank’s (2011) PhD thesis, even though it is often cited by studies investigating linguistic effects on formant adaptation. Its results were never published in peer-reviewed papers. Three more papers that included formant perturbations were added: one paper that did not cite Houde and Jordan but was found in the reference lists of the selected papers (Niziolek & Guenther, 2013), and two papers that were in the course of publication at the time of writing and that we were aware of (Caudrelier, Perrier, Schwartz, & Rochet-Capellan, 2018; Klein, Brunner, & Hoole, in this book).
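The selection steps above can be cross-checked with simple arithmetic. The Python snippet below only re-adds the counts reported in Table 1 and in the text (the dictionary labels are our own) to show how the 584 Google Scholar references reduce to the 77 formant perturbation papers that were analyzed.

```python
# Counts copied from Table 1 and the text; the labels are our own.
TABLE_1 = {
    "formant shift, rejected": 26,
    "formant shift, kept": 72 + 2,  # +2 = Houde & Jordan (1998, 2002)
    "no formant shift, rejected": 140,
    "no formant shift, kept": 287,
    "not in English": 35,
    "reference error": 22,
}
ADDED_FROM_OTHER_SOURCES = 3  # Niziolek & Guenther (2013) + two papers in press


def subtotal(prefix: str) -> int:
    """Sum all Table 1 cells whose label starts with `prefix`."""
    return sum(n for label, n in TABLE_1.items() if label.startswith(prefix))


total = sum(TABLE_1.values())
excluded = TABLE_1["not in English"] + TABLE_1["reference error"]
with_formant_shift = subtotal("formant shift")
without_formant_shift = subtotal("no formant shift")
analyzed = TABLE_1["formant shift, kept"] + ADDED_FROM_OTHER_SOURCES

print(total, excluded, total - excluded, with_formant_shift,
      without_formant_shift, analyzed)
# → 584 57 527 100 427 77
```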
The general characteristics of the documents including formant perturbations are described in Table 2. Technical papers as well as papers investigating compensation for unexpected formant perturbations were included.

Table 2: Number of papers considered for the analysis of formant perturbations, by source and type. Houde & Jordan (1998, 2002) are included.
Google Scholar: 55 journal papers, 17 proceedings papers, 2 reports/chapters (total 74)
Other sources: 1 journal paper, 1 proceedings paper, 1 report/chapter (total 3)

The full list of analyzed papers related to formant perturbations is available in Table 4, with the main research topic of each paper indicated. As the paper collection is based mainly on the “Cited by” function of Google Scholar, some papers may be missing despite our careful attention. However, we believe our analysis provides an accurate picture of the field at the time it was run.

3. Overall impact of Houde and Jordan’s seminal work

The overall impact of Houde and Jordan (1998, 2002) is illustrated in Figure 2. We distinguished seven broad categories of research: (1) formant perturbation studies (n=77); (2) studies that investigated speech compensation and/or adaptation to other auditory perturbations or equivalent situations (n=91), or (3) to an alteration of the vocal tract (n=16); (4) empirical or theoretical papers on speech production (n=61) or (5) on speech perception (n=46); (6) studies involving non-speech actions (n=25); and (7) experimental or theoretical papers involving animals (n=43). Five papers were not considered, as they were difficult to classify in these categories.

We first analyzed the journal papers that did not empirically test formant perturbations. As described above, this involved 287 articles. Broad research topics were identified mainly by reading the abstracts. A subset of papers was then read in more detail to illustrate the different topics. The articles on formant perturbations will be reviewed in detail in the next sections.
We now briefly review the research topics in the six other categories. The references cited in the following subsections are illustrative.
3.1. Compensation/adaptation of speech production to various auditory perturbations

Speech compensation and adaptation were investigated with various methods before formant perturbation studies were developed. These methods continued to be used in some of the later work that cited Houde and Jordan. About half of the papers in this first category investigated speech modifications in response to either an unexpected or a predictable modification of F0 in different populations and conditions. A number of papers on this topic were published by Jones et al. (Jones & Munhall, 2000), Larson et al. (Burnett & Larson, 2002), or Hanjun et al. (Li et al., 2016). The other half of the studies investigated speech modifications in response to other types of auditory perturbations, such as delayed auditory feedback (Chon, Kraft, Zhang, Loucks, & Ambrose, 2013), changes in intensity or noise level (Maas, Mailend, & Guenther, 2015), hearing loss (Palethorpe, Watson, & Barker, 2003), real or simulated use of cochlear implants (Casserly, 2015; Lane et al., 2007), or replacement of the auditory feedback by a stranger’s voice (Hubl et al., 2014). Other work modified consonant features such as frication (Shiller, Sato, Gracco, & Baum, 2009) or voicing (Mitsuya, MacDonald, & Munhall, 2014). Self-regulation in adaptation to formant perturbations was also linked to interpersonal auditory-motor regulation in speech, such as phonetic convergence (Pardo, 2006).

3.2. Compensation/adaptation of speech production to perturbations of the vocal tract dynamics or geometry

Research on compensation and adaptation to perturbations affecting somatosensory feedback is another field closely connected to adaptation to formant perturbations. Houde and Jordan’s work was thus cited by studies involving an alteration of the vocal tract geometry or dynamics.
These include dental prostheses (Jones & Munhall, 2003); lip tubes in children and adults (Ménard, Perrier, & Aubin, 2016); false palates (Thibeault, Ménard, Baum, Richard, & McFarland, 2011); mechanical forces applied to the jaw with a robot (Tremblay, Shiller, & Ostry, 2003); and more permanent changes, such as those induced by oropharyngeal cancer treatments (de Bruijn et al., 2012).
Figure 2: Overall impact: number of analyzed papers by year and category.

3.3. Empirical or theoretical papers on speech production

Houde and Jordan’s work is cited by empirical and theoretical research on speech production. For example, adaptation to formant perturbations is mentioned in studies providing further evidence of the role of auditory feedback in speech motor control, such as work linking auditory acuity to the production of speech contrasts (Perkell et al., 2004), linking auditory perceptual learning to improvement in production (Shiller, Rvachew, & Brosseau-Lapré, 2010), comparing overt and covert speech (Brumberg et al., 2016), or analyzing the neurophysiological activity of the auditory cortex during speech production (Curio, Neuloh, Numminen, Jousmäki, & Hari, 2000). Adaptation to formant perturbations also supports neurocomputational models of speech production such as the Directions Into Velocities of Articulators model (DIVA; Golfinopoulos et al., 2010) or the State Feedback Control model (SFC; Houde & Chang, 2015), both of which assume a feedback and a feedforward control mechanism. These control mechanisms are described further in the section reviewing the formant perturbation studies related to this topic.
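To make the distinction between the two mechanisms concrete, here is a deliberately simplified Python sketch. It is not an implementation of DIVA or SFC, which are far richer models. It assumes a one-dimensional “plant” in which produced F1 simply equals the motor command, ignores auditory processing delay, and uses arbitrary values for the feedback gain and the feedforward learning rate. Within a trial, feedback control corrects online for the error between the target and the heard F1 (compensation); across trials, the feedforward command is updated from each trial’s mean error (adaptation), which is why the change persists when the perturbation is removed.

```python
def run_trial(feedforward, target, shift, feedback_gain=0.3, n_frames=20):
    """One sustained vowel, simulated frame by frame (F1 only, in Hz).

    Produced F1 = feedforward command + online feedback correction.
    The correction accumulates a fraction of each frame's auditory error;
    auditory processing delay is ignored for simplicity.
    Returns the produced F1 per frame and the mean auditory error.
    """
    correction = 0.0
    produced, errors = [], []
    for _ in range(n_frames):
        f1 = feedforward + correction        # toy plant: output equals command
        heard = f1 + shift                   # perturbed auditory feedback
        error = target - heard               # auditory error
        correction += feedback_gain * error  # feedback (closed-loop) control
        produced.append(f1)
        errors.append(error)
    return produced, sum(errors) / n_frames


def run_session(target=580.0, shift=-100.0, n_train=200, learning_rate=0.2):
    """Update the feedforward command across trials from each trial's mean error."""
    feedforward = target
    for _ in range(n_train):
        _, mean_error = run_trial(feedforward, target, shift)
        feedforward += learning_rate * mean_error  # feedforward (predictive) update
    return feedforward
```

With these illustrative values, a single perturbed trial starting from an unadapted command (580 Hz, with a –100 Hz F1 shift) drifts towards about 680 Hz within the trial through feedback alone. After a training session, the feedforward command itself ends near 680 Hz, so the first frames of an unperturbed trial overshoot the target (an after-effect) until feedback pulls production back towards 580 Hz.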
3.4. Empirical or theoretical papers on speech perception

Adaptation to formant perturbations is also taken as evidence of sensorimotor integration in speech. As such, it is relevant to papers probing or discussing the role of the motor system in speech perception (Sato, Troille, Ménard, Cathiard, & Gracco, 2013) and to theoretical papers related to the dual-stream model of language processing. In brief, this model proposes a cortical ventral stream that maps speech sounds onto concepts and a dorsal stream for auditory-motor mapping. Adaptation to formant perturbations is cited in this context as evidence that a dorsal auditory-motor integration pathway is still functional in adulthood (Hickok & Poeppel, 2004).

3.5. Non-speech movement studies

Various non-speech studies cited Houde and Jordan’s work to illustrate sensorimotor adaptation in humans. These studies focused on activities involving auditory feedback, such as piano playing (Pfordresher & Palmer, 2006) or the learning of artificial auditory-motor maps for arm movements (van Vugt & Ostry, 2018). Some papers were also interested in other kinds of sensorimotor adaptation, such as swallowing (Wong, Domangue, Fels, & Ludlow, 2017) or visuomotor adaptation of limb movements (Wei et al., 2014). Note that because formant perturbation studies were inspired by visuomotor adaptation, they often refer to the limb movement literature. The converse does not necessarily hold: our analysis suggests that few studies on limb adaptation have cited Houde and Jordan’s work. This result should be interpreted cautiously, as limb movement research could cite other formant perturbation studies to illustrate the adaptability of speech motor control, whereas we only collected papers citing Houde and Jordan through the “Cited by” function of Google Scholar.

3.6. Animal studies

Finally, animal studies have cited Houde and Jordan’s work early and regularly (Figure 2), with a main focus on the role of auditory feedback in action control. Over half of these papers were dedicated to birdsong and published by Brainard et al., Doupe et al., and/or Sober et al. Many of them include studies of birdsong production or learning using auditory perturbations with behavioral and/or neurophysiological recordings, as well as interspecies comparative reviews of the processing of auditory feedback from self-produced sounds (Brainard & Doupe, 2000; Doupe & Kuhl, 1999; Sober & Brainard, 2009). Analogous work was done in bats (Smotherman, Zhang, & Metzner, 2003) and primates (Eliades & Miller, 2017).

To summarize, this non-exhaustive analysis of the overall impact of Houde and Jordan’s seminal work suggests that it is, as expected, cited by papers investigating speech compensation and adaptation to other types of sensory perturbations. Most of the scientific questions in this first set of papers overlap with the research topics we review later in this chapter, based on a more detailed analysis of formant perturbation studies. In a broader context, adaptation to formant perturbations is often interpreted as evidence for sensorimotor integration and sensorimotor plasticity in speech production and perception. It is cited to illustrate auditory feedback and feedforward control mechanisms in speech production, as explained below, and is taken as an example of such mechanisms (and their plasticity) in studies investigating animal vocalizations, singing, and music playing, but also interpersonal convergence or coordination of movements. Note that more research topics related to formant perturbation studies may be found by including “second-order” connections to Houde and Jordan’s work (i.e., references that cite any of the studies on formant perturbations).

4. Methods in formant perturbation studies

In this section, we provide an overview of the apparatuses used to apply real-time formant perturbations and a description of the main procedures identified in the collected papers.

4.1. Real-time formant perturbation

The systems used to shift formants in the collected papers are summarized in Table 3. Paper details can be found in Table 4.
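Because the delay requirement discussed below is central to any real-time implementation, a rough latency budget is worth sketching. The Python snippet assumes a simplified model in which the feedback delay is the duration of one input buffer and one output buffer plus a fixed processing time; the buffer sizes, sampling rate, and processing time are hypothetical, and real audio stacks add driver and converter latency that is not modeled here, which is one reason why the same code can yield different delays on different operating systems and hardware.

```python
def buffer_ms(frames: int, sample_rate_hz: int) -> float:
    """Duration of one audio buffer, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def estimated_delay_ms(frames: int, sample_rate_hz: int,
                       processing_ms: float, n_buffers: int = 2) -> float:
    """Rough feedback delay: n_buffers buffer durations plus processing time.

    n_buffers=2 assumes one input and one output buffer (an assumption);
    other sources of latency are ignored.
    """
    return n_buffers * buffer_ms(frames, sample_rate_hz) + processing_ms


def within_budget(frames: int, sample_rate_hz: int, processing_ms: float,
                  budget_ms: float = 30.0) -> bool:
    """True if the estimated delay stays below the perceptual budget."""
    return estimated_delay_ms(frames, sample_rate_hz, processing_ms) < budget_ms
```

For example, at 48 kHz with 256-frame buffers and 5 ms of processing, the estimate is about 15.7 ms, within a 30 ms budget; with 1024-frame buffers it rises to about 47.7 ms, which would exceed it.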
Regarding formant perturbation, it is important to emphasize that, in order to preserve the best possible quality of self-perception, the real-time modification of formants in speakers’ auditory feedback should meet several requirements:
(1) The signal should be processed and played back fast enough that the speaker does not perceive any delay (less than 30 ms; see Yates, 1963). Dedicated digital signal processing (DSP) boards, including systems from the music industry, were used, especially in earlier work. Nowadays, this can be achieved in software, on a PC with an appropriate sound card and software to analyze and modify formants. With the same code, the achieved delay can vary depending on the operating system and hardware.
(2) The parameters of the signal processor should be adapted to the speaker and/or to the vowel. This parameterization improves formant detection and the reliability of the perturbation.
(3) Perception of the unperturbed feedback (bone conduction and air conduction outside the headphones) should be reduced as much as possible. Different approaches were used to achieve this aim:
• using whispered speech (Houde & Jordan, 1998, 2002), although subsequent studies were run with normal speech;
• using closed headphones or insert earphones to reduce the perception of the air-conducted signal. The occlusion effect of the headphones on adaptation was recently investigated, with no significant difference in the magnitude of F1 adaptation between closed Sennheiser “HD 265” headphones and Etymotic Research ER2 insert earphones (Mitsuya & Purcell, 2016);
• increasing the level of the feedback in the headphones, up to 87 dB SPL (Villacorta et al., 2007);
• and/or mixing masking noise with the played-back signal to mask bone-conducted speech.
(4) The shifted vowel should have clearly distinguishable F1 and/or F2 values, and the shift should be consistent with these values.
For this reason, the vowel /ε/ is chosen in most studies, because shifting more extreme front or back vowels could be limited by overlaps in F1–F2 or F0–F1 frequencies (Mitsuya, MacDonald, Munhall, & Purcell, 2015), and because this vowel allows both upward and downward perturbations.

Different research groups have developed their own formant perturbation systems (Table 3), which fall into four main categories:
(1) The two systems developed by Houde, described in more detail in Houde’s PhD thesis (Houde, 1997) for whispered speech (1.a) and in Katseff, Houde, & Johnson (2012) for voiced speech (1.b).
(2) The system developed and used by Munhall, Purcell, and collaborators, which relied on dedicated hardware.
(3) The system used by Perkell’s and Guenther’s teams, which first relied on dedicated hardware (Villacorta et al., 2007) and was then adapted as free software for Matlab. It supports various auditory perturbations, including changes in F1 and/or F2 but also more complex ones such as formant trajectory perturbations (Cai, Boucek, Ghosh, Guenther, & Perkell, 2008; Tourville, Cai, & Guenther, 2013). The latest version is called “Audapter” and can be downloaded from GitHub (https://github.com/shanqing-cai/audapter_matlab; link retrieved July 6, 2018).
(4) A system developed in parallel by three teams: Max et al., Ostry et al., and Shiller et al. It uses a device from the music industry (VoiceOne, TC Helicon) that by default shifts all formants while preserving F0. This system was used to alter all formants in the same direction (Max & Maffett, 2015) or, with supplementary signal processing steps including filtering and mixing, to perturb F1 only (Rochet-Capellan & Ostry, 2011).

A few papers were dedicated to the presentation and first evaluation of these different perturbation systems. This was the case with Cai et al. (2008) and Tourville et al. (2013), and with the preliminary work by Shih, Suemitsu, & Akagi (2011). Two papers also presented a method to perturb formants in populations whose speech acoustics have deteriorated, by coupling articulatory synthesis with Audapter (Berry, North, & Johnson, 2014; Berry, North, Meyers, & Johnson, 2013).

Table 3: Main signal processing systems used in the literature to perturb formants in real time (references indicate the publications describing each system) and number of papers using each system.
System 1. References: Houde (1997); Katseff et al. (2012). Signal processing: (1.a) whispered speech: analysis-synthesis process, DSP-96 board, Ariel, Inc.; (1.b) voiced speech: “Feedback Alteration Device”, sinewave synthesis. Number of papers: 10.
System 2. References: Purcell & Munhall (2006a, 2006b). Signal processing: National Instruments PXI-8176 embedded controller. Number of papers: 23.
System 3. References: Villacorta et al. (2007); Cai et al. (2008); Tourville et al. (2013). Signal processing: Texas Instruments C6701 Evaluation Module DSP board, then a C extension (MEX) for Matlab, open access: Audapter. Number of papers: 2, then 20.
System 4. References: Feng et al. (2011); Rochet-Capellan & Ostry (2011); Shum et al. (2011). Signal processing: electronic speech processor from the music industry (VoiceOne, TC Helicon) + filters. Number of papers: 19.
Others. Signal processing: other software or hardware solutions. Number of papers: 3.

As displayed in Table 4, most of the studies involved native speakers of English, mainly from North America. Other languages were investigated in a few comparative studies or in relation to other research questions, as described in the next section. Generalization of these findings to other languages and populations should therefore be made with caution.

4.2. Main procedures in formant perturbation studies and related concepts

The main procedures identified in the collected papers on formant perturbations are summarized in Figure 3. These procedures will be referred to in relation to the research topics detailed in the next section. Two main approaches can be distinguished:
(1) Unexpected formant perturbation during the production of prolonged utterances. This first approach was used in only a few of the collected papers (n=11, ~14% of the papers with formant perturbations; see Table 4). The perturbation is applied to only a small proportion of utterances so that talkers cannot anticipate it. Moreover, the utterances are produced with long vowel durations (steady-state vowels) so that corrective responses result from online processing of the auditory feedback (cf. Figure 3, procedure P4). This correction is called compensation.
(2) Systematic and constant perturbation over a number of utterances. This second approach was used in the majority of the papers (n=66, ~86%; Table 4). The basic procedure is represented in Figure 3, procedure P1.
It generally involves the production of utterances with “natural” duration. After a baseline with unaltered auditory feedback, the perturbation is introduced either gradually or abruptly, and is then systematically applied at a constant level. Depending on the research group, changes in formant production at the end of the training phase are referred to as compensation (cf. Houde & Jordan, 1998; Purcell & Munhall, 2006b) or adaptation (cf. Rochet-Capellan, Richer, & Ostry, 2012; Martin et al., 2018), and residual changes when the feedback is returned to normal after training are referred to as adaptation or after-effect, respectively.

Figure 3: Overview of procedures used in formant perturbation studies. The durations of experimental phases and perturbations varied across studies. P1 is the basic procedure to study auditory-motor adaptation, used in Munhall et al.’s studies. It was adapted to investigate the transfer of adaptation (P1t) (MacDonald, Pile, Dajani, & Munhall, 2008; Rochet-Capellan, Richer, & Ostry, 2012) and the effect of auditory-motor adaptation on perception (P1p) (Lametti, Rochet-Capellan, Neufeld, Shiller, & Ostry, 2014), or the effect of perceptual training on sensorimotor adaptation (Lametti, Krol, Shiller, & Ostry, 2014). P2 is the procedure used in Houde & Jordan (1998) and then by Perkell et al. (Villacorta, Perkell, & Guenther, 2007). It is structured in epochs, with training words produced with feedback followed by training words and generalization words produced with masking noise. P3 is the multiple-perturbation procedure developed in Rochet-Capellan & Ostry (2011), in which words are produced in random order with a specific perturbation associated with each word. P4 is the compensation procedure for unpredictable perturbations: long steady-state vowels are produced and the perturbation is introduced randomly in a small proportion of utterances to assess online correction (Purcell & Munhall, 2006b). The grey-scale gradient in the ramp phase represents the progressive introduction of the shift.
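Procedures P1 and P4 differ mainly in how the shift is scheduled across trials. The following is a minimal sketch, assuming a single F1 shift per trial: it builds a P1-style schedule (baseline, ramp, constant hold, then unaltered feedback) and a P4-style schedule (a few unpredictable catch trials), and computes the change in produced F1 relative to baseline. The phase lengths, the 200 Hz default shift, the phase labels, and the names `Trial`, `p1_schedule`, `p4_schedule` and `phase_change` are all hypothetical illustration choices, not parameters taken from the reviewed studies.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Trial:
    index: int
    phase: str          # hypothetical labels: baseline, ramp, hold, washout, normal, catch
    f1_shift_hz: float  # shift added to F1 in the auditory feedback (0.0 = unaltered)


def p1_schedule(n_baseline=20, n_ramp=20, n_hold=40, n_washout=20, shift_hz=200.0):
    """P1-style sustained perturbation: baseline, ramp, constant hold, then
    unaltered feedback again. Use n_ramp=0 for an abrupt introduction."""
    phases = [("baseline", 0.0)] * n_baseline
    # Linear ramp: the shift grows in equal steps up to shift_hz.
    phases += [("ramp", shift_hz * k / n_ramp) for k in range(1, n_ramp + 1)]
    phases += [("hold", shift_hz)] * n_hold
    phases += [("washout", 0.0)] * n_washout
    return [Trial(i, phase, shift) for i, (phase, shift) in enumerate(phases)]


def p4_schedule(n_trials=100, n_perturbed=5, shift_hz=200.0, seed=None):
    """P4-style unpredictable perturbation: only a few randomly chosen
    trials carry the shift, so talkers cannot anticipate it."""
    rng = random.Random(seed)
    perturbed = set(rng.sample(range(n_trials), n_perturbed))
    return [
        Trial(i, "catch", shift_hz) if i in perturbed else Trial(i, "normal", 0.0)
        for i in range(n_trials)
    ]


def phase_change(f1_produced_hz, schedule, phase, last_n=None):
    """Mean produced F1 in `phase` (optionally only its last `last_n` trials)
    minus the mean produced F1 during baseline."""
    base = [f for f, t in zip(f1_produced_hz, schedule) if t.phase == "baseline"]
    vals = [f for f, t in zip(f1_produced_hz, schedule) if t.phase == phase]
    if last_n is not None:
        vals = vals[-last_n:]
    return mean(vals) - mean(base)
```

For example, with baseline F1 at 500 Hz and 440 Hz afterwards, `phase_change(produced, p1_schedule(), "hold", last_n=10)` gives -60.0, i.e. an end-of-training change opposite to an upward shift. The same function applied to the "washout" phase would quantify the residual change that some groups call adaptation and others after-effect.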
This procedure was also used to assess generalization (or transfer) of adaptation to untrained utterances, either in the course of the training phase (Figure 3, procedure P2) or after training (Figure 3, procedure P1t), as presented in the next section. Hereafter, adaptation will refer to changes observed at the end of the training phase in response to a systematic perturbation. Compensation will mainly refer to changes in response to unpredictable perturbations, but will also be used to qualify the direction of adaptive responses (by contrast with following responses, which go in the same direction as the perturbation).

5. Research topics tackled with formant perturbations

In this section, we provide a thematic review of the collected papers that included an empirical study of formant perturbation. As far as possible, we associated each paper with a main topic, although a paper could obviously relate to more than one topic. Table 4 lists all the cited references and their main associated research topics.

5.1. Properties of feedback and feedforward control

Many studies involving formant perturbations are related to the role of auditory feedback in speech motor control and distinguish between feedback and feedforward control mechanisms. Feedback control is a closed-loop system that relies on the sensory consequences of the current movement. It is regarded as too slow to account for the rapid control and adjustments observed in fast coordinated actions. Researchers in visuomotor adaptation identified the rapidity and adaptability of movement early on as evidence of a feedforward control mechanism. The core idea is that the brain predicts the sensory consequences of its actions based on an efference copy of the motor command (Houde & Jordan, 2002).
These predictions involve mappings between motor and sensory representations, also called internal models (Purcell & Munhall, 2006a) or sensorimotor memories (see Perrier, 2012, for a discussion of the nature of internal models in speech). Both the DIVA (Golfinopoulos et al., 2010) and the SFC (Houde & Chang, 2015) neurocomputational models of speech production assume feedback and feedforward control networks that involve the auditory and somatosensory systems. When
the prediction based on internal models does not match the actual sensory input, the internal representations are updated to reduce this prediction “error”, so that future movements performed under similar conditions will be accurate. This mechanism is claimed to underlie sensorimotor adaptation. In this context, a first subset of studies with formant perturbations was designed to “Investigate the nature, level of details, and use of internal models in speech production” (Max, Wallace, & Vincent, 2003, p. 1053) and to “begin to parameterize the formant feedback system” (MacDonald, Goldberg, & Munhall, 2010, p. 1060). The main contribution of these studies is to describe the role of auditory feedback in the control of formant production and the adaptability of this control. In these papers, adaptability is mainly explained by, or taken as evidence for, feedforward internal models.

To address the properties of adaptation to formant perturbations, Houde and Jordan (2002) analyzed in more detail the adaptation phenomenon introduced in Houde and Jordan (1998). The results highlight some properties of feedback and feedforward control that were subsequently discussed and investigated in later work, involving various types of formant perturbations and procedures.

The first observation of Houde and Jordan was that the changes in F1 and F2 production in talkers’ speech were compensatory responses, in the opposite direction to the perturbation. This result has been reproduced consistently in later work when data are aggregated across speakers. Individual data suggest, however, that some speakers follow the shift. For example, in a meta-analysis of their own studies of adaptation to formant perturbations, MacDonald et al. (2011) found that 26 out of 116 female speakers followed F1 or F2 shifts when their production of “head” was perturbed toward “had”.
A possible explanation is that non-adapting speakers may not be able to dissociate their own production from the auditory feedback (Vaughn & Nasir, 2015). Following the formant shift rather than compensating for it was actually the most frequent behavior observed in a preliminary study investigating compensation to unexpected perturbations of F1, F2 and F3 in Japanese speakers (Shih et al., 2011). Aside from this study, all other published work on formant perturbations observed significant compensatory adaptation in acoustic analyses, whereas preliminary analyses of the articulatory correlates of adaptation are less clear. Max et al. (2003) analyzed acoustic changes in response to a perturbation of all formants in the same direction, in relation to jaw and tongue movements during adaptation. No consistent behavior was observed in articulatory kinematics. Similar results were obtained in a pilot study of one Korean speaker with an F2 shift (Neufeld, Purcell, & Van Lieshout, 2013), while clearer compensatory tongue movements were reported in blind speakers (Trudeau-Fisette, Tiede, & Ménard, 2017). Moreover, while the majority of studies on adaptation to formant perturbations found significant compensatory responses, it was also shown that adaptation vanishes when the perturbed feedback is delayed by more than 100 ms (Max & Maffett, 2015), or is at least largely reduced (Mitsuya, Munhall, & Purcell, 2017).

Table 4: List of all the studies related to formant perturbation included in the present review. The first column provides the reference of the article. The second column gives the language of the participants (Du: Dutch, En: English, Fr: French, Ge: German, Ja: Japanese, Ko: Korean, Ma: Mandarin, Ru: Russian, Sp: Spanish). Column 3 refers to the perturbation systems, which are described in Table 3 (briefly, 1.a: Houde & Jordan, 1998; 1.b: Katseff et al., 2012; 2: Purcell & Munhall, 2006a; 3: Audapter and its previous versions; 4: VoiceOne, TC Helicon; 5: others), and column 4 indicates whether an article is mainly dedicated to the description of a perturbation system. Each study has been classified as addressing either compensation (to unpredictable perturbations, column 5) or adaptation (to sustained perturbations, column 6). Columns 7 to 14 show whether the article is related to each of the main research topics presented in this review (Surface effects & speakers’ characteristics; Perception acuity and sensory integration; Properties of feedback and feedforward control; Pathology affecting speech production; Neural basis of speech motor learning; Perceptual & phonological categories; Transfer/Specificity and speech units; Development). An X indicates that the article is cited in the corresponding subsection, while (X) indicates that the article is related to the topic but not cited there.

Reference | Language | System | Marks (columns 4 to 14)
Alsius, Mitsuya, Latif, & Munhall, 2017 | En | 2 | X (X) X
Berry, Jaeger, Wiedenhoeft, Bernal, & Johnson, 2014 | En | 3 | X X X
Berry, North, & Johnson, 2014 | En | 3 | X
Berry, North, Meyers, & Johnson, 2013 | En | 3 | X
Bourguignon, Baum, & Shiller, 2014 | En | 4 | X X
Bourguignon, Baum, & Shiller, 2015 | En | 4 | X X
Bourguignon, Baum, & Shiller, 2016 | En | 4 | X X
Cai, Beal, Ghosh, Tiede, Guenther, & Perkell, 2012 | En | 3 | X (X) X
Cai, Boucek, Ghosh, Guenther, & Perkell, 2008 | Ma | 3 | X X
Cai, Ghosh, Guenther, & Perkell, 2010 | Ma | 3 | X X X
Cai, Ghosh, Guenther, & Perkell, 2011 | En | 3 | X X
Caudrelier, Perrier, Schwartz, & Rochet-Capellan, 2016 | Fr | 3 | X (X) X
Caudrelier, Perrier, Schwartz, & Rochet-Capellan, 2018 | Fr | 3 | X (X) X
Caudrelier, Schwartz, Perrier, Gerber, & Rochet-Capellan, 2018 | Fr | 3 | X (X) X
Daliri, Wieland, Cai, Guenther, & Chang, 2018 | En | 3 | X (X) X
Lametti, Krol, Shiller, & Ostry, 2014 | En | 4 | X X
Lametti, Nasir, & Ostry, 2012 | En | 4 | X X
Lametti, Smith, Freidin, & Watkins, 2018 | En | 4 | X X
Demopoulos et al., 2018 | En | 1b | X X X (X)
Deroche, Nguyen, & Gracco, 2017 | En | 4 | X (X) X
Dimov, Katseff, & Johnson, 2012 | En | 1b | X X
Eckey & MacDonald, 2015 | Ge | 5 | X X
Feng, Gracco, & Max, 2011 | En | 4 | X X
Houde & Jordan, 1998 | En | 1a | X X X
Houde & Jordan, 2002 | En | 1a | X X
Ito, Coppola, & Ostry, 2016 | En | 4 | X (X) X
Katseff & Houde, 2008 | En | 1b | X (X)
Katseff, Houde, & Johnson, 2012 | En | 1b | X X
Klein, Brunner, & Hoole, in press | Ru | 3 | X X (X)
Lametti, Rochet-Capellan, Neufeld, Shiller, & Ostry, 2014 | En | 4 | X X
MacDonald & Munhall, 2012 | En | 2 | X X
MacDonald, Goldberg, & Munhall, 2010 | En | 2 | X X X
MacDonald, Johnson, Forsythe, Plante, & Munhall, 2012 | En | 2 | X X
MacDonald, Pile, Dajani, & Munhall, 2008 | En | 2 | X X
MacDonald, Purcell, & Munhall, 2011 | En | 2 | X X
Martin et al., 2018 | Sp | 1b | X X
Max & Maffett, 2015 | En | 4 | X X
Max, Wallace, & Vincent, 2003 | En | 5 | X X
Mitsuya & Purcell, 2016 | En | 2 | X X
Mitsuya, MacDonald, Munhall, & Purcell, 2015 | En | 2 | X X
Mitsuya, MacDonald, Purcell, & Munhall, 2011 | En | 2 | X X
Mitsuya, Munhall, & Purcell, 2017 | En | 2 | X
Mitsuya, Samson, Ménard, & Munhall, 2013 | Fr | 2 | X X
Mollaei, Shiller, & Gracco, 2013 | En | 4 | X X
Mollaei, Shiller, Baum, & Gracco, 2016 | En | 4 | X (X) X
Munhall, MacDonald, Byrne, & Johnsrude, 2009 | En | 2 | X X (X)
Neufeld, Purcell, & Van Lieshout, 2013 | Ko | 2 | X X
Niziolek & Guenther, 2013 | En | 3 | X X
Parrell, Agnew, Nagarajan, Houde, & Ivry, 2017 | En | 1b | X X X
Pile, Dajani, Purcell, & Munhall, 2007 | En | 2 | X X
Purcell & Munhall, 2006a | En | 2 | X X
Purcell & Munhall, 2006b | En | 2 | X X
Purcell & Munhall, 2008 | En | 2 | X X X
Reilly & Dougherty, 2013 | En | 3 | X X (X)
Reilly & Pettibone, 2017 | En | 3 | X X
Rochet-Capellan & Ostry, 2011 | En | 4 | X X
Rochet-Capellan, Richer, & Ostry, 2012 | En | 4 | X X
Sato & Shiller, 2018 | Fr | 3 | X (X) X X
Schuerman, Nagarajan, & Houde, 2015 | En | 1b | X X
Schuerman, Nagarajan, McQueen, & Houde, 2017 | En | 1b | X X
Schuerman, Meyer, & McQueen, 2017 | Du | 3 | X X (X)
Sengupta & Nasir, 2015 | En | 2 | X X
Sengupta & Nasir, 2016 | En | 2 | X X
Sengupta, Shah, Gore, Loucks, & Nasir, 2016 | En | 2 | X X (X)
Shih, Suemitsu, & Akagi, 2011 | Ja | 5 | X X
Shiller & Rochon, 2014 | En | 4 | X X (X)
Shiller, Lametti, & Ostry, 2013 | En | 4 | X X
Shum, Shiller, Baum, & Gracco, 2011 | En | 4 | X X
Terband & Van Brenk, 2015 | Du | 3 | X X
Terband, Van Brenk, & van Doornik-van der Zee, 2014 | Du | 3 | X (X) X (X)
Tourville, Cai, & Guenther, 2013 | | 3 | X
Tourville, Reilly, & Guenther, 2008 | En | 3 | X X
Trudeau-Fisette, Tiede, & Ménard, 2017 | Fr | 2 | X X (X)
van den Bunt, Groen, Ito, Francisco, Gracco, Pugh, & Verhoeven, 2017 | Du | 4 | X (X) X
Vaughn & Nasir, 2015 | En | 2 | X X
Villacorta, Perkell, & Guenther, 2007 | En | 3 | X X X X
Zheng, Vicente-Grabovetsky, MacDonald, Munhall, Cusack, & Johnsrude, 2013 | En | 2 | X X

Houde and Jordan also reported that maximal changes at the end of training did not fully compensate for the perturbation. This result was systematically reproduced in later studies. As an illustration, in Purcell & Munhall (2006a), the maximal adaptation to a 200 Hz upward vs. downward shift of F1 compensated for about 30 % of the perturbation, regardless of the number of repetitions during the hold phase. This also suggests that adaptation is a fast process, in agreement with Max et al.’s (2003) observation that compensatory responses occurred after only a few repetitions. However, an F1 perturbation of at least 60 Hz (80 Hz on average across conditions) was required in Purcell & Munhall (2006a) to initiate the compensatory response. Similar thresholds were reported in later work, regardless of the delay in the auditory feedback (Mitsuya et al., 2017) and of the occlusion of the headphones (Mitsuya & Purcell, 2016). Furthermore, MacDonald et al.
(2010) highlighted a linear relationship between the magnitude of the perturbation and the magnitude of changes in speakers’ utterances for perturbation magnitudes up to +200 Hz in F1 and -250 Hz in F2, with speakers compensating for 25 % of the perturbation in F1 and 30 % in F2. With larger perturbations, the response no longer increased, and it even decreased for perturbations larger than 300 Hz in F1 and 400 Hz in F2. Similar limits were observed by Katseff and colleagues (Katseff & Houde, 2008; Katseff et al., 2012), as discussed in the next section. Comparable adaptations were reported in the meta-analysis by MacDonald et al. (2011), with an average of 26.5 % for F1 and 23.2 % for F2. Moreover, in this last analysis, changes in F1 in
speakers’ production correlated only weakly with changes in F2, suggesting separate control of the two parameters and the existence of speaker-specific strategies. In Max et al. (2003), the magnitude of the response was also found to vary according to the vowel in the utterances pet, bus and law. Further work addressing this last point with regard to more specific research topics is presented in the next section.

Houde and Jordan also noticed that inter-speaker variability was not related to the speakers’ awareness of the auditory shift. When interviewed after the study, talkers reported that they were unaware of the perturbation or of any change in their production. By contrast, Purcell & Munhall (2006a) reported that 40 % of their participants indicated awareness of “some kind of change in the auditory feedback over the course of the experiment”, with only 8 % noticing that the perturbation transformed the vowel into a different one. However, the magnitude of adaptation did not seem to be related to the responses in this interview. This difference from Houde and Jordan might be related to the abrupt removal of the perturbation after training in Purcell & Munhall (2006a) (procedure P1, Figure 3), which was probably perceived by the speakers, whereas Houde and Jordan assessed how well adaptation was sustained using catch trials with masking noise (procedure P2, Figure 3). Munhall, MacDonald, Byrne, & Johnsrude (2009) then confirmed that awareness of the perturbation does not influence adaptive behavior, as discussed later in the “Surface effects & speakers’ characteristics” subsection.

Another important result in Houde and Jordan was that changes for perturbed utterances were larger than changes for utterances produced with masking noise. The authors discussed this result as evidence that “vowel production could be partly under immediate auditory feedback control” (Houde & Jordan, 2002, p. 307).
By contrast, in their preliminary study of adaptation to a shift of all formants in the same direction, Max et al. (2003) argued that the modifications in talkers’ production should be considered adaptive responses rather than reactive changes, as they already occur at vowel onset and have been observed for sustained vowels as well as for vowels of shorter duration. The variability of formant changes across different parts of the vowel was not systematically investigated in adaptation studies, as most studies used a single steady-state value, often taken around the middle of the vowel. However, in their preliminary
work, Berry, Jaeger, Wiedenhoeft, Bernal, & Johnson (2014) suggested that this single value might not be the most appropriate measure, depending on consonant context and coarticulatory effects. Vaughn and Nasir (2015) also provided evidence that analyzing the full formant trajectory might better capture adaptation phenomena. The relationship between formant values in consecutive trials (as measured with one-lag cross-correlation analyses) in the absence of any perturbation may also be predictive of adaptation magnitude (Purcell & Munhall, 2006a). Altogether, these results suggest that the changes observed over the course of adaptation to a perturbation probably result from a mix of feedback and feedforward control.

Houde and Jordan (2002) suggested investigating compensation to formant perturbations in steady-state vowels to determine the role of online feedback in formant control. Studies focusing on compensation to an unexpected formant perturbation in sustained vowels usually analyzed changes at different points of the vowel. For instance, in Purcell and Munhall (2006b), upward vs. downward perturbations of F1 were applied randomly to five utterances of “head” among 100 utterances of different CVC words. Results showed partial compensation, on average 16.3 % vs. 10.6 % of the upward vs. downward shifts, but with high variability both between utterances of the same talker and between talkers. However, this study was not designed to measure the latency of the compensatory response. Later studies found this latency to be around 160 ms, at least when F1 is shifted upward (e.g., Tourville, Reilly, & Guenther, 2008) and when more complex spatial or temporal perturbations of formant trajectories are applied during the production of short sentences (Cai, Ghosh, Guenther, & Perkell, 2011).
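The quantities discussed in this subsection (the percentage of the shift that is compensated, the direction of individual responses, the one-lag correlation between consecutive trials, and the latency of online corrections) can be computed with a few small helpers. This is a minimal sketch under stated assumptions: the sign convention (positive = opposing the shift), the threshold-and-consecutive-frames rule for latency, and all function names are hypothetical choices, not the analysis pipelines of the cited studies.

```python
import math
from statistics import mean


def percent_compensation(change_hz, shift_hz):
    """Percentage of the feedback shift opposed by the production change.
    Positive = compensation (opposite to the shift); negative = following."""
    if shift_hz == 0:
        raise ValueError("shift_hz must be non-zero")
    return -100.0 * change_hz / shift_hz


def response_direction(change_hz, shift_hz):
    """Label a response as compensating, following, or unchanged."""
    pc = percent_compensation(change_hz, shift_hz)
    if pc > 0:
        return "compensating"
    if pc < 0:
        return "following"
    return "no change"


def lag1_correlation(values):
    """Pearson correlation between each trial's formant value and the
    next trial's value (a one-lag correlation over the trial sequence)."""
    x, y = values[:-1], values[1:]
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant sequence: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def response_latency_ms(diff_hz, frame_ms, threshold_hz, min_frames=3):
    """Onset time of the first run of `min_frames` consecutive frames in which
    the |perturbed - unperturbed| formant difference exceeds threshold_hz.
    Returns None if no such run exists. (Hypothetical detection rule.)"""
    run = 0
    for i, d in enumerate(diff_hz):
        run = run + 1 if abs(d) > threshold_hz else 0
        if run == min_frames:
            return (i - min_frames + 1) * frame_ms
    return None
```

As a usage example, a 60 Hz downward change in produced F1 against a 200 Hz upward shift gives `percent_compensation(-60.0, 200.0) == 30.0`, the order of magnitude reported above for Purcell & Munhall (2006a), while a change in the same direction as the shift yields a negative value and is labeled "following". Requiring several consecutive frames above threshold is one simple way to avoid reporting a latency from a single noisy frame; the threshold and run length would need to be chosen for the data at hand.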
The smaller compensation observed in studies involving unexpected perturbations compared to studies involving systematic perturbations, as well as the delay required to observe a compensatory response, confirms the idea that responses produced in the presence of the perturbation in adaptation studies are at least partially adaptive.

One of the most intriguing outcomes of Houde and Jordan (2002) was that the modification in formants was still present when talkers came back a month later to run a control study evaluating changes in production without perturbation. This long-term effect was attributed by the authors to implicit memory of the task or specific control mechanisms for