Written pronunciation instruction to combat final position fortis/lenis neutralisation in the Dutch-English accent
Heddwen Newton (3519104)
h.m.k.newton@students.uu.nl
Master Applied Cognitive Psychology, Utrecht University
Thesis, 30 ECTS
Content supervisor: Koen Sebregts (UU English Language and Culture)
Process supervisor & first assessor: Krista Overvliet (UU Experimental Psychology)
Second assessor: Jeroen Benjamins (UU Experimental Psychology)
Date final version: 21 June 2021
Abstract

Explicit pronunciation instruction is needed for better intelligibility when speaking English. Pronunciation pedagogy research has seen a resurgence in the past decade. Pronunciation often receives little attention in English lessons, partly because of teachers’ lack of knowledge and partly because it is a difficult skill to teach in a classroom setting. Many commercial books offer pronunciation instruction in writing, but there seems to have been no research into the efficacy of this method. This study aims to fill that gap, using one specific feature of Dutch pronunciation as an example case: final position fortis/lenis neutralisation. Dutch English speakers were asked to record the words eyes and bed. A control group was compared to two experimental groups, one of which received specific written instructions to improve the fortis/lenis distinction and one of which received the same written instructions together with audio files containing spoken examples. The recordings were then presented in a second survey to English-speaking judges, who were asked to indicate whether they heard ice or eyes, and bet or bed. Results show a significant improvement between the first and the second recording for all groups, including the control group. No significant differences in improvement were found between groups. This suggests that specific written pronunciation instructions are not beneficial. It is proposed that the reason for this is that students on their own cannot judge which pronunciation instructions they should apply and how best to apply them, and that guidance is needed.
1 Introduction

1.1 The importance of English and good English pronunciation

It can hardly be disputed that being able to speak English is important. Over the past decades it has grown into the main lingua franca and the most widely spoken language in the world, spoken by 375 million people as a first language and an estimated 1.1 billion people as a second language (Dorren, 2018). For speakers of a language like Dutch, which is not widely spoken or learned in the world, it is especially important to speak English, and to speak it well, if they want to achieve success outside their own country (Mai & Hoffmann, 2014) and even within it, with many businesses choosing English as an in-house language and Dutch universities switching to English (Kotake, 2017).

Pronunciation is given less attention by teachers than other aspects of language such as grammar and vocabulary (Burri & Baker, 2020; Darcy, 2018; Gilakjani & Sabouri, 2016). The past decade has seen a resurgence of interest in pronunciation training (Martin, 2018; Zarate-Sandez, 2020; R. Zhang & Yuan, 2020), but this has not yet reached the classroom of students or even that of teacher trainees (A. A. Baker, 2014; Burri & Baker, 2020; Jarosz, 2019; Levis & Sonsaat, 2017; Zarate-Sandez, 2020). Teacher training programmes often include introductory courses which give future teachers a deeper understanding of the phonology of the language they will be teaching, but which do not help them understand how to teach pronunciation (Martin, 2018; Murphy, 2017). Most Dutch secondary school teachers of English give little priority to pronunciation training (Smakman, 2014; Van den Doel, 2006), and it is often absent from the curriculum (Hermans, 2018). Dutch teaching materials hardly ever include sections on pronunciation (Van Hattum & Rupp, 2014).
The disregard for pronunciation training is unfortunate, because good pronunciation is important for spoken communication; someone who makes grammatical mistakes but has good pronunciation is easier to understand than the inverse (Jarosz, 2019; Jenkins, 2005). Lecturers with a moderate Dutch accent are evaluated less positively than those with a slight Dutch accent or a native accent (Van Run, 2018). For British listeners, Dutch-accented English negatively impacts intelligibility compared to British accented English, more so for those listeners who were not familiar with Dutch accented English (Nejjari, 2020). Explicit pronunciation instruction can help students to be more intelligible when speaking (Thomson & Derwing, 2015; Zarate-Sandez, 2020; R. Zhang & Yuan, 2020). Students who have not been helped by their teachers to improve their pronunciation might try to improve it on their own, at home. Even before the Covid-19 pandemic, online learning saw a significant increase (Martin, 2018) and it is likely that after all learning moved online in 2020, home and online learning will remain important in the future (Lockee, 2021). Home-based pronunciation training might also be preferable to classroom teaching as teachers in a classroom usually have to take a one-size-fits-all- approach and cannot cater to individual issues. At home, students can focus on those features that they personally have problems with (Martin, 2018). Another reason to favour pronunciation training outside the classroom is that foreign language teachers are often so used to hearing foreign accented speech that they are no longer good judges as to what is intelligible and comprehensible English and what is not, and therefore are not able to set the most effective pronunciation goals (Munro & Derwing, 2006). Teachers also claim other reasons for not teaching pronunciation, namely a lack of materials and a lack of confidence in their own ability to pronounce English correctly (De Goei, 2017). 
Another advantage of teaching pronunciation at home is that
teachers are unwilling to single out and potentially embarrass students in class by correcting their pronunciation (Martin, 2018).

Research into home-based learning has focused mostly on subjects other than L2 acquisition, and within L2 acquisition, speech acquisition and pronunciation have hardly been studied at all (Martin, 2018). Of the studies that exist, many focus on technology (e.g. Deutschmann, Panichi, & Molka-Danielsen, 2009; Martin, 2018). However, one of the very few recent meta-analyses of pronunciation instruction research found a smaller effect size for interventions that used technology than for those that did not (Lee, Jang, & Plonsky, 2015).

Written pronunciation training is easily available and affordable; there is an abundance of pronunciation instruction and accent work on the market consisting of a book or set of books accompanied by an audio CD or online audio files. Some of these books are primarily meant for use with a teacher, but have been written in such a way that they may also be used without one (e.g. A. Baker, 2005; Collins & Mees, 2003, 2013; Cook, 2000; Smakman, 2014), and many commercial books can also be found that are meant for use without the aid of a teacher (e.g. Farlex International, 2017; Hoge, 2014; Mojsin, 2016; Rupp, 2013; Sampaio, 2021). However, I have not been able to find any studies that look at the effectiveness of written pronunciation training.

1.2 Aim of current paper

The current study aims to take a quantitative look at a popular but under-researched method of teaching pronunciation, namely teaching pronunciation via written text. Can pronunciation be taught on paper, and are audio files beneficial?
2 Theoretical framework

Pronunciation is very difficult to teach and to master (Murphy, 2017; Setter & Jenkins, 2005; Smakman, 2014), because conceptual patterns of first language pronunciation are internalised during childhood, and new, different patterns become more difficult to learn as we age because the cognitive functions needed to acquire these patterns disappear after childhood (Gilakjani, Ahmadi, & Ahmadi, 2011; Olea & Antonio, 2019). For example, in Dutch, the devoicing of obstruents in word-final position is categorical (Simon, 2010; Warner, Jongman, Sereno, & Kemps, 2004); obstruents are never voiced in final position, meaning a word like hond is pronounced /hont/ when the “d” is in final position, but the plural honden is pronounced /honden/, because in this case the “d” is not in final position. This influences the way the Dutch speak their second language, English (Simon, 2010), meaning a word like dog will sound like dock. Speakers of other languages, for example Russian, German, Polish, Czech and Catalan, have trouble with this feature too (Fullana & Mora, 2009; Jansen, 2004; Van den Doel, 2006; Warner et al., 2004). The feature is known as final position fortis/lenis neutralisation, final devoicing, or Auslautverhärtung.

When second-language English speakers have problems with their pronunciation such as the one mentioned above, this can lead to intelligibility issues. Intelligibility is the ability to recognize “a word or another sentence-level element of an utterance” (Kachru & Smith, 2008: p. 61). Intelligibility is a main prerequisite in second language acquisition; although context can often make clear what is being said, if the context is ambiguous or absent it is important to be able to understand an utterance at word level (Dauer, 2005). In the past few decades many researchers and teachers have stopped trying to get second-language speakers to attain native pronunciation, instead focusing on
comfortable intelligibility (Jarosz, 2019). Final position fortis/lenis neutralisation may lead to issues of intelligibility (Quené & Van Delft, 2010; Rupp, 2013). One reason for this is the high functional load of these errors; an error with a high functional load is one where a listener will find it hard to guess what a speaker is trying to say, for example because it is just one distinctive feature making the difference (minimal pairs) and/or because there are many other similar options that the word could be (Munro & Derwing, 2006). The greatest likelihood of misunderstandings occurs when the two words are in the same lexical category (e.g. both nouns), are both relatively frequent and are both semantically plausible in the context (Levis & Cortes, 2008). Minimal pairs where the contrast is in word-final position such as piece vs. peas are considered to have a high functional load (Zarate- Sandez, 2020). Indeed, German speakers found native English speakers easier to understand than German-English speakers when it came to this particular feature, a finding which goes against the interlanguage speech intelligibility benefit; the phenomenon that people with a certain language background will find English speakers with that same language background easier to understand than native speakers (Smith, Hayes-Harb, Bruss, & Harker, 2009). When it comes to classroom pronunciation teaching, current practices in pronunciation instruction in second language teaching consist of a hodgepodge of methodologies. Many teachers opt to not teach pronunciation explicitly, instead focusing on having students speak English as much as possible thereby mimicking immersion in the language; pronunciation is thought to follow automatically as students become aware that their interlocutors cannot understand them (Levis & Sonsaat, 2017). For children, this indeed works. 
Taking final position fortis/lenis neutralisation as an example again, language learners who are immersed in an English setting starting between the ages of 3 and 13 do not have trouble realising this feature correctly (Fullana & Mora, 2009). However, most adult learners, although they do become better as they spend more time with native speakers, are not able to realise it fully when placed in a naturalistic setting (Van Leeuwen, 2011). Moving away from immersion, an intuitive and often-used way to teach pronunciation is by saying a word or sentence and asking a student to copy it verbally (A. A. Baker, 2017), but if the student does not have a concept for certain patterns from childhood, this will not work without explicit and understandable instruction on how to pronounce utterances (Gilakjani et al., 2011).

To find out how a specific pronunciation feature is dealt with in Dutch classroom practice, a short questionnaire about final position fortis/lenis neutralisation (final devoicing) was circulated among (former) English teachers and (former) English pupils in the Netherlands. The results show that 18% of Dutch English teachers do not address final devoicing at all, either because they are unaware of the issue or for other reasons. 54% of teachers pay attention to the correct pronunciation of their students, including final devoicing, but do not tell their students about it explicitly. Instead, they make sure their own pronunciation is correct in this regard, and they correct or recast their students’ pronunciation if an opportune moment to do so presents itself. Of the 28% of teachers who talk about final devoicing explicitly, 81% have students listen and copy, 53% talk about the distinction between voiced and unvoiced obstruents, 51% advise making the vowel longer, and 18% give the tip to add a schwa to the end of the word.
(Approval for this teacher questionnaire was sought and given by the Ethics Review Board of the Faculty of Social & Behavioural Sciences at Utrecht University. Protocol number 21-0945. For the questionnaire, please see appendix 1.)
I have not been able to find any previous research that looks at the efficacy of teaching pronunciation on paper. Studies looking at the efficacy of pronunciation training mostly focus on classroom settings (see Thomson & Derwing, 2015 for an overview), and the studies that focus on home study invariably look at online options which include video, social media or special software (e.g. Kartal & Korucu-Kis, 2020; Martin, 2020; Nielson, 2011). Written pronunciation instructions have often been based on previous written pronunciation instructions; for example, Smakman (2014) is based on Collins & Mees (2003) and Rupp (2013) is based on Jenkins (2005). But none of these works refer to any formal testing of these instructions. Perhaps studies into written pronunciation training were done in the past and are no longer available, or perhaps these studies were never done because it makes such intuitive sense that teaching pronunciation in writing is inferior to teaching pronunciation with audio samples, which in turn is inferior to interactive teaching with software or a teacher, because it is difficult to describe what an utterance should sound like without sound, and difficult to describe how a mouth should move without moving images.

3 Research questions and hypotheses

The research questions that guide this study are:

Question 1: Can a specific written instruction help Dutch people make the distinction between final-position fortis and lenis obstruents in English?

Question 2: Is it sufficient for this instruction to be in writing only, or are audio examples of added value?

In order to find an answer to these questions, a short, written pronunciation lesson will be created to address the final position fortis/lenis issue described above and an experiment will be designed to see if these instructions have a measurable effect on the pronunciation of this feature for a group of Dutch-English speakers.
It is predicted that written pronunciation instructions will have an effect on pronunciation improvement compared to a control group but that the effects will not be large. No prediction can be made concerning the presence of audio files.

Audio recordings of the Dutch-English speakers will be presented to English-speaking judges. It is of interest to see if these judges rate these recordings differently depending on their background, because this will lead to a clearer picture of the importance of pronunciation instruction depending on which interlocutors a student is likely to meet in their future life. One of the goals of pronunciation teaching should be to allow students to be understood by a wide range of interlocutors; not only native speakers, but also non-native English speakers (Espinosa, 2017). Interlocutors whose language background differs from that of the speaker have different needs concerning the pronunciation of English when it comes to understanding their conversation partner properly (Seidlhofer, 2009), and native English speakers have been shown to rate pronunciation errors differently depending on their accent (Van den Doel, 2006). Comparing judges will also shed more light on the finding that teachers who are very familiar with their students’ accents can be biased judges when it comes to how well their students can be understood by people who are not familiar with the accent (Winke, Gass, & Myford, 2012).
This leads to the third research question.

Question 3: Do listeners with different backgrounds (education level, native language/accent, age, familiarity with similar accents) interpret words spoken with Dutch fortis/lenis neutralisation differently?

3.1 Hypotheses

3.1.1 Question 1: Can a specific written instruction help Dutch people make the distinction between final-position fortis and lenis obstruents in English?

In order to answer this question, Dutch English speakers will be divided into a control group without specific instructions on making the final position fortis/lenis distinction, and a treatment group who will read written instructions on this pronunciation matter. Due to its very form, written text as a means of instruction has many deficits compared to a teacher explaining something orally. Writing is static and fixed whereas speech is dynamic (a speaker can vary vocal properties such as rhythm, tone and loudness) and flexible (for example, a speaker can respond to their interlocutor by adding more information or making a correction) (Ha, 2016), meaning teachers are better able to respond to their particular students’ needs when a lesson is spoken rather than written. There is also the matter of cognitive load; if a written text is lexically or syntactically dense or presents a lot of new information it can be difficult for students to distil the information they need, and this varies per student (Jacob, Lachner, & Scheiter, 2020). Studies looking at second language acquisition outside of the classroom find that students find it difficult to keep up the work without an external reason to do so, such as a teacher (Nielson, 2011), and that it is important that students know how to self-regulate their studies (Tullis & Benjamin, 2011). This would suggest that a teacher will be much better able to improve pronunciation than a written text.
But compared to a control group, a written text is still expected to lead to at least some improvement in pronunciation, for the intuitive reason that some instruction is expected to be better than no instruction at all.

3.1.2 Question 2: Is it sufficient for this instruction to be in writing only, or are audio examples of added value?

With only one self-published exception (Sampaio, 2021), every pronunciation or accent training book that I reviewed has been designed to include audio files online or on CD (A. Baker, 2005; Collins & Mees, 2003, 2013; Cook, 2000; Farlex International, 2017; Hoge, 2014; Mojsin, 2016; Raifsnider, 2011; Rupp, 2013; Smakman, 2014), though these have often become unavailable over time. In the communicative learner model that has been popular in the past decades for second language learning, the need for interaction is held in very high regard, and such audio files are dismissed as tools by many researchers as they do not provide that interaction (e.g. Deutschmann et al., 2009). However, no studies have been found that give evidence on whether audio files are helpful in this context, though some authors state that they are without providing evidence (e.g. Raifsnider, 2011). Research has shown that listeners with another native language have trouble hearing certain differences in utterances pronounced by native speakers when these differences are not present in their own language (Gilakjani et al., 2011). Intuitively, one would think that written instructions with audio files would be better than written instructions without them, but no evidence could be found to support this idea.
Due to the mixed evidence, no hypothesis can be drawn up for this research question.

3.1.3 Question 3: Do listeners with different backgrounds (education level, native language/accent, age, familiarity with similar accents) interpret words spoken with Dutch final position fortis/lenis neutralisation differently?

There is no reason to suspect age or education level will affect the way listeners interpret English words spoken with Dutch fortis/lenis neutralisation. However, familiarity with accents that have this feature, including having such an accent oneself, might make people more prone to think a Dutch person is saying the voiced word even when the unvoiced word is heard; in practice, this means that when a Dutch person intends to say eyes, an experienced listener may indeed understand eyes but a non-experienced listener might hear ice. The interlanguage speech intelligibility benefit states that people find it easier to understand second-language speakers who share their own language background (Bent & Bradlow, 2003). Listeners who are familiar with a certain foreign accent have been found to find it more comprehensible (Carey, Mannell, & Dunn, 2011; Gallardo del Puerto, García Lecumberri, & Gómez Lacabex, 2015; Winke et al., 2012; Y. Zhang & Elder, 2011).

Based on these findings, I hypothesise that listeners who are familiar with accents that have the final devoicing feature will more often rate that they hear the voiced version of the minimal pair that they are presented with. I am cautious, however, as a previous study among German speakers looking at final position fortis/lenis neutralisation showed the opposite, with German speakers finding native English speakers easier to understand than German-English speakers when it came to this feature (Smith et al., 2009).
Van den Doel (2006) asked native English speakers with an American accent and speakers with an RP (British) English accent to judge Dutch-English pronunciation errors on seriousness. American speakers rated fortis/lenis neutralisation (not necessarily in final position) as a slightly more serious error than RP speakers did. However, the difference was only very slight, and the theoretical reasoning, namely that speakers are more positive about accents that are similar to their own, does not hold in this case because final devoicing is not a feature of RP or of American English. Also, we are not looking at “seriousness of an error” but at intelligibility. We therefore have no reason to assume that native-English accent will have an effect on judge ratings.

4 Method

4.1 Participants (phase 1: Speakers)

The participants of the phase 1 questionnaire will be referred to as speakers. Dutch speakers who do not speak another language with native or near-native proficiency were recruited via Facebook groups targeted at Dutch people, and to a lesser extent via Reddit, the website Surveyswap.io and the researcher’s own website. The researcher’s own social circle was not used to distribute the questionnaire. Speakers were randomly assigned to one of three groups: the control group, the audio intervention group and the written intervention group. Approval was sought and given by the Ethics Review Board of the Faculty of Social & Behavioural Sciences at Utrecht University. Protocol number 21-0945.

Of the 257 speakers who started the survey, 21 speakers stopped because they were not willing to leave an audio recording, 37 speakers stopped because they were not able to leave an audio
recording (for example because they were not in a quiet environment) and 57 speakers were led out of the survey because they did not grow up in the Netherlands or because they spoke a second native language. 19 speakers left the survey for unknown reasons before being asked to record anything. Of the 123 speakers who continued the survey, 96 made an audio recording. It is likely that the 27 speakers who did not do so encountered technical problems.

Of the 96 audio recordings, 24 were not useable. Most of these were corrupted due to the same technical issue as described above. Two people did not record all five words and two people made the first recording but not the second. One person recorded silence. Six recordings were very soft; these were amplified and kept in the sample. About ten recordings included static but were deemed by the researcher to be audible enough to be kept in the sample. In total, the researcher was left with 73 useable audio samples.

Of the 73 speakers whose audio recordings were included in the second study, 41% classed themselves as speaking English at an advanced (C1 or C2) level, 47% classed themselves at upper intermediate (B2), 11% at lower intermediate (B1) and 1% at beginner (A1 or A2). These self-categorisations need to be viewed with some caution, as Dutch people tend to overestimate how good their English is when they self-report (Van Onna & Jansen, 2006). Four people (5%) opted not to fill in the optional demographic questions of age, gender and education. The average age of the remaining 69 speakers was 37.38 years (SD=1.67). 75% of these speakers were female, 25% were male. No speakers listed their gender as “other”. Most speakers were highly educated, with 57% of the sample educated at university level. Speakers were allocated to one of the three groups automatically and randomly by the phonic.ai survey software. The spread of speaker characteristics can be seen in table 1.
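The screening counts reported above can be tallied as a quick consistency check. The sketch below only restates figures given in the text; the category labels are shorthand introduced here for illustration and are not the survey's own wording.

```python
# Counts as reported in the text; the dictionary keys are illustrative labels.
started = 257
early_exits = {
    "unwilling to record": 21,
    "unable to record": 37,
    "screened out (upbringing or second native language)": 57,
    "left before recording, reason unknown": 19,
}

# Speakers who reached the recording step.
continued = started - sum(early_exits.values())

# Speakers who reached the recording step but left no recording.
made_recording = 96
no_recording = continued - made_recording

print(continued, no_recording)  # 123 27
```

Both results match the figures of 123 continuing speakers and 27 speakers without a recording given in the text.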
Table 1 Speaker characteristics per group

| | Control | Audio | Written | Whole sample |
|---|---|---|---|---|
| Number of speakers | 24 | 22 | 27 | 73 |
| Age* | 35.59 (SD 2.38) | 39.75 (SD 3.63) | 37.07 (SD 2.56) | 37.38 (SD 1.67) |
| Gender*: female | 86.4% | 66.7% | 73.1% | 75.4% |
| Gender*: male | 13.6% | 33.3% | 26.9% | 24.6% |
| Education*: Elementary | 0.0% | 0.0% | 0.0% | 0.0% |
| Education*: VMBO | 0.0% | 0.0% | 0.0% | 0.0% |
| Education*: HAVO | 0.0% | 0.0% | 7.7% | 2.9% |
| Education*: VWO | 0.0% | 9.5% | 0.0% | 2.9% |
| Education*: MBO | 0.0% | 4.8% | 11.5% | 5.8% |
| Education*: HBO | 50.0% | 28.6% | 19.2% | 31.9% |
| Education*: WO | 50.0% | 57.1% | 61.5% | 56.5% |
| English level: A1/A2 | 0.0% | 4.5% | 0.0% | 1.4% |
| English level: B1 | 20.8% | 4.5% | 7.4% | 11.0% |
| English level: B2 | 54.2% | 45.5% | 40.7% | 46.6% |
| English level: C1/C2 | 25.0% | 45.5% | 51.9% | 41.1% |

*Age, gender and education were not compulsory and were not filled in by two people in the control group, one person in the audio group and one person in the written group.
VMBO=pre-vocational secondary education, HAVO=senior general secondary education, VWO=pre-university education, MBO=secondary vocational education, HBO=university of applied science, WO=university

The minimum number of participants required per group was determined by an a priori power analysis in G*Power (Erdfelder, Faul, Buchner, & Lang, 2009). Within the realm of psychology, a medium effect size is f = .25 (Sawilowsky, 2009) and, assuming a power of .80, we estimated a minimum sample size of 14 speakers per group to detect main effects at an alpha level of 0.05. This target has been achieved.

4.2 Design

A 3 x 2 repeated measures design was used, with group as between-subject factor (control group, intervention A, intervention B) and time of recording (pre-intervention, post-intervention) as within-subject factor.

4.3 Experimental procedure: Speaker questionnaire

Recordings were collected online, via the survey platform Phonic (www.phonic.ai). This platform allows participants to easily record themselves.
Speakers were asked if they were willing and able to make audio recordings of their speech. They were informed about the nature and procedure of the study and were asked to give informed consent. They were then asked if they had grown up in the Netherlands and whether they spoke a native language other than Dutch. Respondents who were not eligible for
the study were screened out at this point and thanked for their interest. The questions that followed were about age, gender, and participants were asked to class their level of spoken English. This question was based on the European Framework of Reference (CITE). The levels were: advanced, upper intermediate, lower intermediate and beginner. As there were not enough speakers to ensure statistical power if advanced speakers were omitted, these were left in the sample. Speakers were then presented with an audio example consisting of the words “shape, hair, moon, coffee, statue” spoken with a southern UK English (RP) accent at a slow pace. These words had been selected to be similar in length, meaning and grammatical category as the experimental words, but to not contain the feature of interest so the speakers did not hear an example of that feature just before recording. They were then asked to make their own audio recording of the words "book, eyes, mango, bed, label". The second and fourth words in this list include the final fortis/lenis feature which is difficult for Dutch people to pronounce in a native speaker fashion. The third and fifth words were chosen because they start with a sonorant which was viewed as the least likely to have an effect on the last sound of the word before (Jansen, 2004). The first word was added so that the stimulus words would not be adversely affected by any technical or performative issues connected to starting a new sentence. All five words are nouns, all five words are expected to be known by speakers of every level of English and none of the words have confusing spelling. A pilot study included a question to guess why these particular words had been chosen, so that participants who were aware of the fortis/lenis issue could be removed from the pool. 
However, as none of the pilot participants were able to guess the purpose of the words, and a number of these were linguists, the question was deemed unnecessary and left out of the final questionnaire. After making their pre-test recording, all participants were given two “neutral” tips on speaking English. Tip 1 consisted of the message that the Dutch English accent is generally perceived more negatively by Dutch people than by other English speakers (Korsten, 2020; Koster & Koet, 1993; Nejjari, 2020; Nejjari, Gerritsen, Van der Haagen, & Korzilius, 2012) and Tip 2 consisted of a recommendation to articulate well. Speakers who had been sorted into the control group, condition 1, were then asked to make a second recording of the same words. Tip 1 and tip 2 can be read in full in appendix 4. The written group (condition 3) was presented with a short written pronunciation lesson that focused on making the vowel sound of words where final devoicing occurs longer, and the final obstruent softer. This was based on the literature (Gonet, 2012; Van Leeuwen, 2011) and on a personal communication with Koen Sebregts, linguist at Utrecht University (2021). Special care was taken to make the text low in complexity in order to have a low cognitive load (Jacob et al., 2020), and understanding was tested informally on a number of pilot testers. The lesson and a translation can be found in appendix 5. Speakers in the written group only got this tip in writing, speakers in condition 2 (the audio group) were also provided with audio files with an English native speaker (RP accent) saying the word pairs. The audio files can be accessed on https://hoezegjeinhetengels.nl/uitspraaktip-final-devoicing/. After making their second recording, speakers were led to a page explaining the goal of the study and giving them the opportunity to give feedback if they wished. 
After exiting the study they were sent to a “thank-you page” external to the survey, and were then given the chance to leave their e-mail address in a separate survey if they wished to be informed of the results or if they had further questions. This was done to safeguard speaker anonymity. Speakers’ recordings were coded as follows: Condition 1, 2 or 3, Time 1 (pre-test) or Time 2 (post-test), Speaker number. Therefore, the pre-test recording of the third speaker, who was randomly sorted into condition 2, would be given the code C2T1S3. A flow chart describing the speaker survey is presented in figure 1.
Figure 1. Flow chart depicting steps in speaker survey
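The coding scheme for recordings described above can be illustrated with a short sketch. This is only an illustration of the naming convention, not the tooling used in the study; the function and class names are hypothetical, and the condition labels in the comments follow the group numbering given in the procedure (1 = control, 2 = audio, 3 = written).

```python
import re
from typing import NamedTuple


class RecordingCode(NamedTuple):
    condition: int  # 1 = control, 2 = audio, 3 = written
    time: int       # 1 = pre-test, 2 = post-test
    speaker: int    # speaker number within the study


# Pattern for codes such as "C2T1S3" (hypothetical helper, not from the thesis).
_CODE_PATTERN = re.compile(r"C([123])T([12])S(\d+)")


def make_code(condition: int, time: int, speaker: int) -> str:
    """Build a recording code from condition, time and speaker number."""
    if condition not in (1, 2, 3):
        raise ValueError("condition must be 1, 2 or 3")
    if time not in (1, 2):
        raise ValueError("time must be 1 (pre-test) or 2 (post-test)")
    if speaker < 1:
        raise ValueError("speaker number must be positive")
    return f"C{condition}T{time}S{speaker}"


def parse_code(code: str) -> RecordingCode:
    """Split a recording code back into its three components."""
    match = _CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid recording code: {code!r}")
    return RecordingCode(*(int(group) for group in match.groups()))


if __name__ == "__main__":
    print(make_code(2, 1, 3))
    print(parse_code("C2T1S3"))
```

Keeping condition, time and speaker in one string means that pre-test and post-test recordings of the same speaker can be paired by comparing the condition and speaker fields only.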
4.4 Rating procedure: Judge questionnaire
4.4.1 Participants (phase 2: Judges)
498 people started the judge questionnaire and 411 filled in all necessary questions. These participants will be referred to as judges. The questionnaire was preceded by an informed consent page. Approval for the judge questionnaire was sought from and given by the Ethics Review Board of the Faculty of Social & Behavioural Sciences at Utrecht University (protocol number 21-0945). The average age was 30.2 (SD = 11.95); 14 people opted not to fill in their age. Most respondents had a bachelor’s degree as their highest attained education level. 74% of judges were native English speakers, and the most common accent was American/Canadian English (50%). Of the non-native English speakers the most common mother tongue was German, with 5.4% of all judges speaking this language. The educational background of the judges is listed in table 1. The distribution of language backgrounds and accents can be seen in tables 2 through 5. The minimum number of respondents required to run the multiple linear regression test to answer research question 3 was not defined in advance. However, a post-hoc sensitivity analysis confirmed the sample size was large enough to achieve a power > 0.99.

Table 2
Highest attained level of education of participants who were participating as judge.
Elementary school    3.4%
High school         12.4%
Some college        18.0%
Bachelor's          33.2%
Master's            27.1%
PhD                  4.9%
Didn't say           1.0%

Table 3
Accent distribution for native English speaking judges
American/Canadian English                                  68.2%
UK English                                                 21.3%
Australian, New Zealand, South African or Irish English     6.9%
Other English accent                                        3.6%

Table 4
Bilingualism in judges’ pool
Monolingual English         65.1%
Monolingual not English     22.7%
Bilingual with English       9.3%
Bilingual without English    2.9%
Table 5
Number of speakers per language in judges’ pool
English      305
German        29
Spanish       17
Russian       13
Dutch         12
French        11
Mandarin       7
Italian        6
Portuguese     6
Cantonese      5
Swedish        5
Arabic         4
Hindi          3
Polish         3
Romanian       3
Other         30

4.4.2 Procedure phase 2: judge questionnaire
All useable recordings from the speaker questionnaire were embedded in the judge questionnaire. This questionnaire was built in Qualtrics and distributed via social media (Facebook, Twitter, Discord and Reddit). The researcher’s social circle was not used to distribute the questionnaire. The judge questionnaire presented six random recordings out of the total pool of 146 to each judge, who was then asked to rate the recordings on a Likert scale as follows:

Which words do you hear in the spaces? You can listen as often as you like.
Book, …., mango, ….., label
First word:
O It’s definitely “ice”
O I’m pretty sure it’s “ice”
O I think it’s “ice”
O I think it’s “eyes”
O I’m pretty sure it’s “eyes”
O It’s definitely “eyes”

“It’s definitely ‘ice’” corresponds to a rating of 1, “it’s definitely ‘eyes’” corresponds to a rating of 6. This is a measure of intelligibility. In pronunciation studies, some scholars use measures of subjective opinion (e.g. “how serious is this error, in your opinion?”), some scholars use measures of
comprehensibility (e.g. “how easy was it to understand what the speaker was saying, in your opinion?”) and some scholars, as in our case, choose the more objective measure of intelligibility (e.g. “what word did you hear?”) (Munro, Derwing, & Morton, 2006). The judges were then asked to note if the recordings had been clear enough to make a judgement; 6 people (1.5%) answered “no”, 16 (3.9%) answered “mostly no”. The ratings from the judges who answered “no” were removed from the sample. The researcher checked if these judges had all heard a similar batch of recordings but this was not the case, so it is assumed that the problem was on the judges’ end. They were then asked for their native language. If their native language was English, they were also asked to fill in their accent and note how familiar they were with a list of final-devoicing languages (German, Dutch, Afrikaans, Polish, Russian, Czech, Slovak, Bulgarian, Armenian, Lithuanian, Catalan and/or Turkish). They were then asked for their age and highest achieved level of education. The last page of the survey included an explanation of the study and the possibility to provide feedback, ask a question or leave an e-mail address to be kept informed of the results. 28 people left their e-mail address. The researcher was not able to change the URL of the survey (survey.uu.nl), so after the sound quality question the respondents were asked which country they thought the speakers were from, to check for bias. 37% of respondents answered “no idea”, 34% answered UK, 6% answered The Netherlands, 5% answered USA and 5% thought it was a German-speaking country. A range of 25 countries was mentioned by the remaining 13% of respondents.
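The rating procedure of section 4.4.2 can be sketched as follows: each answer option is mapped to a score from 1 to 6, ratings from judges who answered “no” to the clarity question are dropped, and the remaining scores are averaged per recording. This is a minimal sketch under assumptions: the shortened labels, the dictionary keys (`recording`, `label`, `clear_enough`) and the function names are hypothetical and do not reflect the actual Qualtrics export.

```python
from collections import defaultdict
from statistics import mean

# Answer options for the ice/eyes item, ordered from rating 1 to rating 6.
# The labels are shortened paraphrases of the options shown to the judges.
EYES_LABELS = [
    "definitely ice",
    "pretty sure ice",
    "think ice",
    "think eyes",
    "pretty sure eyes",
    "definitely eyes",
]


def label_to_score(label: str) -> int:
    """Map an answer option to its rating (1 = definitely ice, 6 = definitely eyes)."""
    return EYES_LABELS.index(label) + 1


def mean_scores(responses: list[dict]) -> dict[str, float]:
    """Average the ratings per recording, skipping judges who answered 'no'
    when asked whether the recordings were clear enough to judge."""
    by_recording = defaultdict(list)
    for response in responses:
        if response["clear_enough"] == "no":
            continue  # only "no" is excluded; "mostly no" is kept
        by_recording[response["recording"]].append(label_to_score(response["label"]))
    return {recording: mean(scores) for recording, scores in by_recording.items()}


if __name__ == "__main__":
    example = [
        {"recording": "C2T1S3", "label": "definitely eyes", "clear_enough": "yes"},
        {"recording": "C2T1S3", "label": "think ice", "clear_enough": "mostly no"},
        {"recording": "C2T1S3", "label": "definitely ice", "clear_enough": "no"},
    ]
    print(mean_scores(example))
```

In the example, the third rating is discarded because the judge reported that the recordings were not clear enough, so the mean is taken over the remaining two ratings.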
5 Results
5.1 Results of speaker performance
To answer the questions “Can a short instruction help Dutch people make the distinction between final position fortis and lenis obstruents in English?” and “Is it sufficient for this instruction to be in writing only, or must it include audio examples?”, a repeated measures ANOVA was carried out with performance (the mean scores of the judges per speaker) as dependent variable and the time of recording (pre-post; within subjects) and group (between subjects) as independent variables. Normality assumptions were assessed by plotting frequency distributions and computing the Shapiro-Wilk test for all the groups. Results indicated that normality was achieved for all (ps>.05) except for one group, namely the control group (p=.039). The sphericity assumption was not significantly violated (p>.05). Results revealed a significant main effect of time on performance for eyes, F(1,71)=9.65, p=.003, η2=.120. There was, however, no significant main effect of group (F(2,71)=0.07, p>.05, η2=.002) or interaction between time and group, F(2,71)=0.71, p=.491, η2=.020. See Figure 2.
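As a rough consistency check on the effect sizes reported above for eyes, the sketch below recomputes η2 from an F value and its degrees of freedom. It assumes that the reported η2 is partial eta squared, which the text does not state explicitly; the function name is illustrative only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error),
    which follows from F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Main effect of time for eyes: F(1,71) = 9.65
    print(round(partial_eta_squared(9.65, 1, 71), 3))
    # Time x group interaction for eyes: F(2,71) = 0.71
    print(round(partial_eta_squared(0.71, 2, 71), 3))
```

Under this assumption, both recomputed values agree to three decimals with the reported η2=.120 and η2=.020.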
Figure 2. Bar chart illustrating the mean score at time 1 and time 2 for the word eyes in the control group (white), the group receiving audio instructions (light grey) and the group receiving written instructions (dark grey). Error bars illustrate 95% Confidence Intervals. A score of one represents “definitely ice” and a score of six represents “definitely eyes”.

Results revealed a significant main effect of time on performance for bed, F(1,71)=8.03, p<.001, η2=.159. There was, however, no significant main effect of group (F(2,71)=0.74, p=.48, η2=.020) or interaction between time and group, F(2,71)=0.82, p=.259, η2=.037. See Figure 3.

Figure 3. Bar chart illustrating the mean score at time 1 and time 2 for the word “bed” in the control group (white), the group receiving audio instructions (light grey) and the group receiving written instructions (dark grey). Error bars illustrate 95% Confidence Intervals. A score of one represents “definitely bet” and a score of six represents “definitely bed”.
To explore whether an effect would be apparent, only the speakers with a rating below 4 at T1 were selected from the sample. The resulting sample was, however, not big enough to ensure power in statistical estimates (C1 eyes N=13, C1 bed N=9, C2 eyes N=9, C2 bed N=12, C3 eyes N=13, C3 bed N=10). Thus, the below trends should be considered with caution. Results revealed a significant main effect of time on performance for eyes F(1,25)=3.80, p=.025, η2=.185. There was, however, no significant main effect of group (F(1,25)=0.27, p>.124, η2=.154) or interaction between time and group, F(2,25)=0.98, p=.391, η2=.072. See Figure 4.
For model 2 (bet/bed), no significant results were observed, F(4,302)=1.833, p=.122. Specifically, investigation of the standardized beta coefficients indicated that none of the predictors carried significant explanatory power in relation to the outcome variable (all ps>.05). See table 7.

Table 7
Standardized beta coefficients for “bed”
               B        SD       β        t        p
Accent         0.005    0.052    0.006    0.105    .916
Familiarity   -0.059    0.045   -0.077   -1.324    .186
Age            0.002    0.004    0.027    0.41     .682
Education      0.077    0.047    0.108    1.646    .101

6 Discussion
6.1 Summary of results
A significant improvement was seen between the pre-recording of the two final-devoicing words and the post-recording, but this result was also apparent for the control group. No significant difference in level of improvement was seen between groups, suggesting that a written instruction can lead to improvements in intelligible pronunciation, but that it is enough that this instruction be “please articulate better” and “don’t be embarrassed about your accent” and that an instruction that was carefully designed to improve a specific feature of pronunciation makes no difference. There was no significant difference in improvement between the written group and the audio group, suggesting audio pronunciation examples give no added value. No significant results were observed when judges’ age, educational background, accent and familiarity with final-devoicing accents were looked at as predictors for their ratings, suggesting that listeners with different backgrounds do not interpret words spoken with Dutch final position fortis/lenis neutralisation differently when these words are presented without semantic context.

6.2 Written instruction without audio
The first research question posited was “Can a specific written instruction help Dutch people make the distinction between final-position fortis and lenis obstruents in English?” Based on the results of the experiment, the answer seems to be “no”.
Martin (2018) notes that pronunciation training done at home needs to be guided by a teacher, otherwise “students zoom in on the wrong features and, despite training, will not improve their pronunciation” (p. 33). The current study seems to corroborate this. Most speakers did unexpectedly well in pronouncing eyes and bed intelligibly in the pre-intervention recording, meaning the experiment was providing pronunciation instructions to people who did not need them. When these speakers were taken out of the sample, a trend was observed that the experimental groups did better than the control group for the pronunciation of bed. However, the results were not significant, which might be due to the small sample size. For eyes the removal of previously successful speakers made little difference to the results, as the control group still did comparatively well. This difference between eyes and bed may be because eyes ends in a fricative, meaning a speaker can lengthen this
final consonant, whereas the word bed ends in a stop, which cannot be lengthened. Listening to the audio recordings, the researcher heard many examples of people who, intending to lengthen the vowel, also lengthened the /s/ sound. In native-speaker English, word-final /s/ is longer than word-final /z/, which means eyes will sound more like ice if the last fricative is lengthened (Fullana & Mora, 2009). Five speakers in the audio group and one speaker in the written group who were rated on average as saying eyes in the pre-intervention recording were rated as saying ice in the post-intervention recording. This effect was not seen for a single speaker for bed. Taking this into account and the previously mentioned trend, it is possible that if this experiment were to be repeated among a larger sample of people with low-level English, a small but significant effect might be seen for final-devoicing words that end in a stop.

6.3 Written instruction with audio
The second research question was “Is it sufficient for this instruction to be in writing only, or are audio examples of added value?” There was no difference between the written group and the audio group, and there was no trend apparent in the data that would suggest a difference might be found if the sampling limitations of this study had been absent. In the design of the study, the difference between the written experimental group and the audio experimental group was kept small; the text was almost exactly the same, except for a reference to the audio file in the audio group. The audio file was of a native speaker reading the minimal pairs discussed in the pronunciation lesson. This was done to be comparable to commercial pronunciation training books. It is possible that not all speakers listened to the audio files. There was no check in the experiment to see if they did, but books on pronunciation or accent also leave it up to the consumer whether they listen to the audio files or not.
6.4 Listener background
The third research question was “Do listeners with different backgrounds (education level, native language/accent, age, familiarity with similar accents) interpret words spoken with Dutch fortis/lenis neutralisation differently?” No significant effects were found. Even though each judge only heard six recordings, the large sample size of 411 judges means the negative finding here is robust. A reason for this result might be that the judges heard a list of five words without any context. Contextual clues can help to understand words that are difficult to understand on their own (Bent & Bradlow, 2003). Previous studies which found that experienced listeners judged non-native accents to be more comprehensible than naïve listeners did used full sentences, not single words (e.g. Gallardo del Puerto et al., 2015). It is possible that people who have more experience with final-devoicing accents would be better able to draw from contextual clues what a foreign-accented speaker is trying to say, but without these contextual clues, there is no difference between listeners.

6.5 Limitations
6.5.1 Attrition in speaker survey
There was a high rate of attrition in the speaker survey. Partly, this was to be expected because people clicked on the link, discovered that they would be asked to make an audio recording, and, unwilling to do so, left again. However, I also encountered many technical challenges. The Phonic.ai platform is not able to function properly when accessed via Facebook on a smartphone. In the pilot
test, almost all recordings failed for this reason. In the actual survey I presented respondents with a workaround before the start of the survey (“If you have accessed this survey via Facebook and are on your smartphone, please click the three dots at the top right of your screen and choose ‘Open in Chrome’ (or ‘Open in…’ and then your usual web browser). Then you can start the survey. If you do not do this, the survey will not work properly. Thank you!”). It is likely that some people did not see or understand this. There were 28 speakers who filled in the questionnaire up until the point of making the first audio recording, suggesting they wanted to do so but ran into technical difficulties. Of these people, 36% had an education level of MBO or lower (compared to 12% in the final speaker sample), 18% classed themselves as a beginner level English speaker (final sample: 1%) and 29% as lower intermediate (final sample: 11%). It is clear, therefore, that due to the technical difficulties inherent in the Phonic.ai system, my speaker sample was less diverse than it could have been. As discussed above, a larger speaker sample with lower ratings in the first recording might have led to significant results for improvement in pronunciation of bed. It should be noted at this point that only two people (one from the control group and one from the written group) made the first recording but not the second, meaning the attrition during the experimental stage itself was low. Fears that the pronunciation instructions were too long and that people would stop at that point were, it seems, unfounded.

6.5.2 Choice of stimulus words
Bed might not have been the best word to choose as it includes the dress/trap vowel that many Dutch people struggle with. A number of judges remarked that they didn’t hear bed or bet but instead heard bad or bat.
This confusion is likely to have been more present for American judges as the American pronunciation of bad comes very close to the Dutch-English pronunciation of bed, especially when the vowel is lengthened. A minimal pair of nouns ending in a stop with a less problematic vowel might have been log and lock. It is possible that this minimal pair would have led to judges making more divergent ratings, which in turn might have led to different results. Eyes might not have been the best word to choose because many North American accents have a different vowel for eyes than for ice. Commonly known as Canadian raising, this affects words with /aɪ/, which becomes /ʌɪ/ when positioned in front of a voiceless obstruent. This is becoming more common in North America (Moreton, 2016). This means that for people with this pronunciation feature, eyes is less likely to sound like ice. I have a British accent and I was not aware of this until a small number of judges mentioned it in the feedback field at the end of the survey. However, if this had had a large effect, this would have been apparent in the statistical analysis for the third research question, which looks among other things at any difference in judgement between judges with an American accent and others. No effect was found. A different set of minimal pairs such as phase and face might have been a better choice, but it is not expected that these words would have led to different results.

6.5.3 Quality of intervention
The written lesson on how to better pronounce words ending in a voiced consonant, which is presented in appendix 5, was specially developed for this study and was not tested beforehand in another modality, for example by a teacher in an English lesson. Though it was developed based on sound sources and studies (Collins & Mees, 2013; Gonet, 2012; Van Leeuwen, 2011), and with the aid of a professor of phonology specialised in Dutch-English pronunciation, it is possible that there
was no significant result not because of the written modality, but because the lesson in itself was not helpful.

6.5.4 Ecological validity
Elicitation was not spontaneous: speakers read a list of words, a method that is often used in pronunciation research and pronunciation teaching but is much criticised (e.g. in Thomson & Derwing, 2015). Anecdotal evidence suggests that when speakers speak spontaneously, the final-devoicing issue is more apparent. This means a more naturalistic study, though difficult to carry out, may have led to different results. However, the current research had more speakers than a naturalistic study is likely to have been able to collect. This gives the statistical results more weight. Also, as a first study into a matter that has not yet seen any research done (as far as I have been able to find), it makes sense to use a controlled experimental setup rather than a naturalistic study.

6.6 Future directions
One of the speakers in this study reached out to say that she had been living in Canada for many years and that the final devoicing issue was one she had trouble with. Her Canadian conversation partners would often misunderstand her, and she felt the tip was very useful to her. For people like her, with specific, known needs, this kind of written pronunciation instruction might be helpful. A future study might look at the best way to match students with the pronunciation instruction that they need, and measure if in that situation a specific written instruction is sufficient for improvement. It might be beneficial to contrast written instruction with video instruction and live instruction by a teacher, to see how much of a benefit each modality presents. A number of studies have looked at technological solutions for pronunciation training without a teacher (e.g. Kartal, 2020; Martin, 2018).
A meta-analysis found that the effects were small compared to interventions with teachers (Lee et al., 2015), but the abundance of pronunciation books on the market suggests there is a need for pronunciation training that can be followed without a teacher. Perhaps future studies can uncover a technological tool that is able to give the teacher-like interaction required for effective pronunciation training.

7 Conclusion
Explicit pronunciation training is important for intelligibility (Thomson & Derwing, 2015). English teachers in the Netherlands and other countries have many reasons not to address pronunciation in their classrooms. They may feel it is unnecessary because within the communicative method, pronunciation is seen as something that does not need explicit teaching (Levis & Sonsaat, 2017), they might feel they do not have the time (Thomson & Derwing, 2015), they might not have been taught how to address pronunciation during their training (Burri & Baker, 2020), they might not want to embarrass their individual students or they might feel that a one-size-fits-all approach does not work for an entire class of students, each with their own particular pronunciation needs (Martin, 2018). However, teachers and schools need to find a way to address pronunciation because the current study suggests that leaving students to pick up a book on pronunciation is not helpful. This might be because students cannot properly judge which issues they need to work on and how best to apply the interventions they read about. Other solutions must be found. Because home-based interventions have many benefits, future studies should look at ways to provide home-based pronunciation interventions that include forms of guidance. Technology might be beneficial.
8 Literature
Baker, A. (2005). Ship or Sheep; An intermediate pronunciation course (3rd ed.). Cambridge University Press.
Baker, A. A. (2014). Exploring teachers’ knowledge of second language pronunciation techniques: Teacher cognitions, observed classroom practices, and student perceptions. TESOL Quarterly, 48(1), 136–163. https://doi.org/10.1002/tesq.99
Baker, A. A. (2017). Pronunciation teaching in the preCLT era. In O. Kang, R. Thomson, & J. M. Murphy (Eds.), The Routledge Handbook of Contemporary English Pronunciation (1st ed., pp. 249–266). Routledge. https://doi.org/10.4324/9781315145006-16
Bent, T., & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. The Journal of the Acoustical Society of America, 114, 1600–1610. https://doi.org/10.1121/1.1603234
Burri, M., & Baker, A. A. (2020). “A big influence on my teaching career and my life”: A longitudinal study of learning to teach English pronunciation. TESL-EJ: The Electronic Journal for English as a Second Language, 23(4), 1–24. Retrieved from https://ro.uow.edu.au/sspapers/4677
Carey, M. D., Mannell, R. H., & Dunn, P. K. (2011). Does a rater’s familiarity with a candidate’s pronunciation affect the rating in oral proficiency interviews? Language Testing, 28(2), 201–219. https://doi.org/10.1177/0265532210393704
Collins, B., & Mees, I. M. (2003). The phonetics of English and Dutch (5th ed.). Brill.
Collins, B., & Mees, I. M. (2013). Practical phonetics and phonology: A resource book for students (3rd ed.). Routledge. https://doi.org/10.4324/9780203080023
Cook, A. (2000). American accent training; A guide to speaking and pronouncing American English for everyone who speaks English as a second language (2nd ed.). Barron.
Darcy, I. (2018). Powerful and effective pronunciation instruction: How can we achieve it? CATESOL Journal, 30(1), 13–45. Retrieved from https://files.eric.ed.gov/fulltext/EJ1174218.pdf
Dauer, R. M. (2005). The Lingua Franca Core: A new model for pronunciation instruction? TESOL Quarterly, 39(3), 543–550. https://doi.org/10.2307/3588494
De Goei, S. (2017). Do you speak English? De implicaties van Engels als Lingua Franca voor het uitspraakonderwijs. [Master’s thesis, Fontys Hogeschool Tilburg]. https://doi.org/10.13140/RG.2.2.15560.83206
Deutschmann, M., Panichi, L., & Molka-Danielsen, J. (2009). Designing oral participation in second life - A comparative study of two language proficiency courses. ReCALL, 21(2), 206–226. https://doi.org/10.1017/S0958344009000196
Dorren, G. (2018). Babel: Around the World in 20 Languages. Grove Press.
Erdfelder, E., Faul, F., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–1160. https://doi.org/10.3758/BRM.41.4.1149
Espinosa, J. A. C. (2017). “A relaxing cup of Lingua Franca Core”: Local attitudes towards locally-accented English. Atlantis, 39(1), 11–32.