Written pronunciation instruction to combat final position fortis/lenis neutralisation in the Dutch-English accent

Heddwen Newton (3519104)
h.m.k.newton@students.uu.nl
Master Applied Cognitive Psychology
Utrecht University
Thesis
30 ECTS
Content supervisor: Koen Sebregts (UU English Language and Culture)
Process supervisor & first assessor: Krista Overvliet (UU Experimental Psychology)
Second assessor: Jeroen Benjamins (UU Experimental Psychology)
Date final version: 21 June 2021
Abstract

Explicit pronunciation instruction is needed for better intelligibility when speaking English.
Pronunciation pedagogy research has seen a resurgence in the past decade. Pronunciation often
receives little attention in English lessons due to teachers’ lack of knowledge but also because it is a
difficult skill to teach in a classroom setting. Many commercial books exist offering pronunciation
instruction in writing, but there seems to have been no research into the efficacy of this method. This
study aims to amend that, and uses one specific feature of Dutch pronunciation as an example case:
final position fortis/lenis neutralisation. Dutch English speakers were asked to record the words eyes
and bed. A control group was compared to two experimental groups, one of which received specific
written instructions to improve the fortis/lenis distinction and one of which received the same
written instructions including audio files with spoken examples. The recordings were then presented
in a second survey to English-speaking judges who were asked to indicate whether they heard ice or eyes, and
bet or bed. Results show a significant improvement between the first recording and the second
recording for all groups, including the control group. No significant differences in improvement were
found between groups. This suggests that specific written pronunciation instructions are not
beneficial. It is proposed that the reason for this lies in the fact that students on their own cannot
judge which pronunciation instructions they should apply and how best to apply them, and that
guidance is needed.

1 Introduction

1.1 The importance of English and good English pronunciation
It can hardly be disputed that being able to speak English is important. Over the past decades it has
grown into the main lingua franca and the most widely spoken language in the world, spoken by 375
million people as a first language and an estimated 1.1 billion people as a second language (Dorren,
2018). For speakers of a language like Dutch, which is not widely spoken or learned in the world, it is
especially important to speak English and to speak it well if the Dutch person in question wants to
achieve success outside their own country (Mai & Hoffmann, 2014) and even within their own
country, with many businesses choosing English as an in-house language and Dutch universities
switching to English (Kotake, 2017).

Pronunciation is given less attention by teachers than other aspects of language such as grammar
and vocabulary (Burri & Baker, 2020; Darcy, 2018; Gilakjani & Sabouri, 2016). The past decade has
seen a resurgence of interest in pronunciation training (Martin, 2018; Zarate-Sandez, 2020; R.
Zhang & Yuan, 2020), but this has not yet reached the classroom of students or even that of teacher
trainees (A. A. Baker, 2014; Burri & Baker, 2020; Jarosz, 2019; Levis & Sonsaat, 2017; Zarate-Sandez,
2020). Teacher training programmes often include introductory courses which give future teachers a
deeper understanding of the phonology of the language they will be teaching, but do not help their
understanding of how to teach pronunciation (Martin, 2018; Murphy, 2017). Most Dutch secondary
school teachers of English give little priority to pronunciation training (Smakman, 2014; Van den
Doel, 2006), and it is often absent from the curriculum (Hermans, 2018). Dutch teaching materials
hardly ever include sections on pronunciation (Van Hattum & Rupp, 2014).

The disregard for pronunciation training is unfortunate, because good pronunciation is important for
spoken communication; someone who makes grammatical mistakes but has good pronunciation is
easier to understand than the inverse (Jarosz, 2019; Jenkins, 2005). Lecturers with a moderate Dutch
accent are evaluated less positively than those with a slight Dutch accent or a native accent (Van
Run, 2018). For British listeners, Dutch-accented English negatively impacts intelligibility compared
to British-accented English, more so for those listeners who were not familiar with Dutch-accented
English (Nejjari, 2020). Explicit pronunciation instruction can help students to be more intelligible
when speaking (Thomson & Derwing, 2015; Zarate-Sandez, 2020; R. Zhang & Yuan, 2020).

Students who have not been helped by their teachers to improve their pronunciation might try to
improve it on their own, at home. Even before the Covid-19 pandemic, online learning saw a
significant increase (Martin, 2018) and it is likely that after all learning moved online in 2020, home
and online learning will remain important in the future (Lockee, 2021). Home-based pronunciation
training might also be preferable to classroom teaching as teachers in a classroom usually have to
take a one-size-fits-all approach and cannot cater to individual issues. At home, students can focus
on those features that they personally have problems with (Martin, 2018). Another reason to favour
pronunciation training outside the classroom is that foreign language teachers are often so used to
hearing foreign accented speech that they are no longer good judges as to what is intelligible and
comprehensible English and what is not, and therefore are not able to set the most effective
pronunciation goals (Munro & Derwing, 2006). Teachers also claim other reasons for not teaching
pronunciation, namely a lack of materials and a lack of confidence in their own ability to pronounce
English correctly (De Goei, 2017). Another advantage of teaching pronunciation at home is that
teachers are unwilling to single out and potentially embarrass students in class by correcting their
pronunciation (Martin, 2018).

Research into home-based learning has focused mostly on subjects other than L2 acquisition, and
within L2 acquisition, speech acquisition and pronunciation have hardly been studied at
all (Martin, 2018). Of these studies, many focus on technology (e.g. Deutschmann, Panichi, & Molka-
Danielsen, 2009; Martin, 2018). But one of the very few recent meta-analyses into pronunciation
instruction research found a smaller effect size for interventions that used technology than those
that did not (Lee, Jang, & Plonsky, 2015). Written pronunciation training is easily available and
affordable; there is an abundance of pronunciation instruction and accent work available on the
market consisting of a book or set of books accompanied by an audio CD or online audio files. Some
of these books are primarily meant for use with a teacher, but have been written in such a way that
they may also be used without one (e.g. A. Baker, 2005; Collins & Mees, 2003, 2013; Cook, 2000;
Smakman, 2014), but many commercial books can also be found that are meant for use without the
aid of a teacher (e.g. Farlex International, 2017; Hoge, 2014; Mojsin, 2016; Rupp, 2013; Sampaio,
2021). However, I have not been able to find any studies that look at the effectiveness of written
pronunciation training.

1.2 Aim of current paper
The current study aims to take a quantitative look at a popular but under-researched method of
teaching pronunciation, namely teaching pronunciation via written text. Can pronunciation be
taught on paper, and are audio files beneficial?

2 Theoretical framework
Pronunciation is very difficult to teach and to master (Murphy, 2017; Setter & Jenkins, 2005;
Smakman, 2014), because conceptual patterns of first-language pronunciation are internalised
during childhood. New, different patterns become harder to learn with age, as the
cognitive functions needed to acquire these patterns disappear after childhood (Gilakjani, Ahmadi, &
Ahmadi, 2011; Olea & Antonio, 2019). For example, in Dutch, the phonological property of devoicing
obstruents in word-final position is categorical (Simon, 2010; Warner, Jongman, Sereno, & Kemps,
2004); underlyingly voiced obstruents are never realised as voiced in final position, so a word like hond is pronounced
/hont/ when the “d” is in final position, but the plural honden is pronounced /honden/, because in
this case the “d” is not in final position. This influences the way the Dutch speak their second
language, English (Simon, 2010), meaning a word like dog will sound like dock. Speakers of other
languages, for example Russian, German, Polish, Czech, and Catalan, have trouble with this feature
too (Fullana & Mora, 2009; Jansen, 2004; Van den Doel, 2006; Warner et al., 2004). The feature
is known as final position fortis/lenis neutralisation, final devoicing, or Auslautverhärtung.

When second-language English speakers have problems with their pronunciation such as the one
mentioned above, this can lead to intelligibility issues. Intelligibility is the ability to recognize “a word
or another sentence-level element of an utterance” (Kachru & Smith, 2008: p. 61). Intelligibility is a
main prerequisite in second language acquisition; although context can also often make clear what is
being said, if the context is ambiguous or absent it is important to be able to understand an
utterance at word level (Dauer, 2005). In the past few decades many researchers and teachers have
stopped trying to get second-language speakers to attain native pronunciation, instead focusing on
comfortable intelligibility (Jarosz, 2019). Final position fortis/lenis neutralisation may lead to issues
of intelligibility (Quené & Van Delft, 2010; Rupp, 2013). One reason for this is the high functional
load of these errors; an error with a high functional load is one where a listener will find it hard to
guess what a speaker is trying to say, for example because it is just one distinctive feature making
the difference (minimal pairs) and/or because there are many other similar options that the word
could be (Munro & Derwing, 2006). The greatest likelihood of misunderstandings occurs when the
two words are in the same lexical category (e.g. both nouns), are both relatively frequent and are
both semantically plausible in the context (Levis & Cortes, 2008). Minimal pairs where the contrast is
in word-final position such as piece vs. peas are considered to have a high functional load (Zarate-
Sandez, 2020). Indeed, German speakers found native English speakers easier to understand than
German-English speakers when it came to this particular feature, a finding which goes against the
interlanguage speech intelligibility benefit, the phenomenon that people with a certain language
background will find English speakers with that same language background easier to understand
than native speakers (Smith, Hayes-Harb, Bruss, & Harker, 2009).

Current classroom practices in second-language pronunciation instruction consist of a hodgepodge
of methodologies. Many teachers opt not to
teach pronunciation explicitly, instead focusing on having students speak English as much as possible
thereby mimicking immersion in the language; pronunciation is thought to follow automatically as
students become aware that their interlocutors cannot understand them (Levis & Sonsaat, 2017).
For children, this indeed works. Taking final position fortis/lenis neutralisation as an example again,
language learners who are immersed in an English setting starting between the ages of 3 and 13 do not
have trouble realising this feature correctly (Fullana & Mora, 2009). However, most adult learners,
although they do become better as they spend more time with native speakers, are not able to
realise it fully when placed in a naturalistic setting (Van Leeuwen, 2011).

Moving away from immersion, an intuitive and often-used way to teach pronunciation is by saying a
word or sentence and asking a student to copy it verbally (A. A. Baker, 2017), but if the student does
not have a concept for certain patterns from childhood, this will not work without explicit and
understandable instruction on how to pronounce utterances (Gilakjani et al., 2011).

To find out how a certain pronunciation feature was being dealt with in the practice of the Dutch
classroom, a short questionnaire about final position fortis/lenis neutralisation (final devoicing) was
circulated among (former) English teachers and (former) English pupils in the Netherlands. The
results show that 18% of Dutch English teachers do not address final devoicing at all, either because
they are unaware of the issue or for other reasons. 54% of teachers pay attention to the correct
pronunciation of their students, including final devoicing, but do not tell their students about it
explicitly. Instead, they make sure their own pronunciation is correct in this regard, and they correct
or recast their students’ pronunciation if an opportune moment to do so presents itself. Of the 28%
of teachers who talk about final devoicing explicitly, 81% have students listen and copy, 53% talk
about the distinction between voiced and unvoiced obstruents, 51% advise students to make the
vowel longer, and 18% suggest adding a schwa to the end of the word. (Approval for this teacher
questionnaire was sought from and granted by the Ethics Review Board of the Faculty of Social &
Behavioural Sciences at Utrecht University. Protocol number 21-0945. For the questionnaire, please
see appendix 1.)

I have not been able to find any previous research that looks at the efficacy of teaching
pronunciation on paper. Studies looking at efficacy of pronunciation training mostly focus on
classroom settings (see Thomson & Derwing, 2015 for an overview), and the studies that focus on
home study invariably look at online options which include video, social media or special software
(e.g. Kartal & Korucu-Kis, 2020; Martin, 2020; Nielson, 2011). Written pronunciation instructions
have often been based on previous written pronunciation instructions, for example, Smakman
(2014) is based on Collins & Mees (2003) and Rupp (2013) is based on Jenkins (2005). But none of
these works refer to any formal testing done into these instructions.

Perhaps studies into written pronunciation training were done in the past and are no longer
available, or perhaps they were never done because it seems intuitively obvious that teaching
pronunciation in writing is inferior to teaching pronunciation with audio samples, which in turn is
inferior to interactive teaching with software or a teacher. After all, it is difficult to describe
what an utterance should sound like without sound, and difficult to describe how a mouth should
move without moving images.

3 Research questions and hypotheses
The research questions that guide this study are:

Question 1: Can a specific written instruction help Dutch people make the distinction between final-
position fortis and lenis obstruents in English?

Question 2: Is it sufficient for this instruction to be in writing only, or are audio examples of added
value?

In order to find an answer to these questions, a short, written pronunciation lesson will be created
to address the final position fortis / lenis issue described above and an experiment will be designed
to see if these instructions have a measurable effect on the pronunciation of this feature for a group
of Dutch-English speakers. It is predicted that written pronunciation instructions will have an effect
on pronunciation improvement compared to a control group but that the effects will not be large.
No prediction can be made concerning the presence of audio files.

Audio recordings of the Dutch-English speakers will be presented to English-speaking judges. It is of
interest to see if these judges rate these recordings differently depending on their background
because this will lead to a clearer picture concerning the importance of pronunciation instruction
depending on which interlocutors a student is likely to meet in their future life. One of the goals of
pronunciation teaching should be to allow students to be understood by a wide range of
interlocutors; not only native speakers, but also non-native English speakers (Espinosa, 2017).
Interlocutors with different language backgrounds than that of the speaker have different needs
concerning the pronunciation of English when it comes to understanding their conversation partner
properly (Seidlhofer, 2009), and native English speakers have been shown to rate pronunciation
errors differently depending on their accent (Van den Doel, 2006). It will also shed more light on the
finding that teachers who are very familiar with their students’ accents can be biased judges when it
comes to how well their student can be understood by people who are not familiar with the accent
(Winke, Gass, & Myford, 2012).

This leads to the third research question.

Question 3: Do listeners with different backgrounds (education level, native language/accent, age,
familiarity with similar accents) interpret words spoken with Dutch fortis/lenis neutralisation
differently?

3.1 Hypotheses
3.1.1    Question 1: Can a specific written instruction help Dutch people make the
         distinction between final-position fortis and lenis obstruents in English?
In order to answer this question, Dutch English speakers will be divided into a control group without
specific instructions on making the final position fortis/lenis distinction, and a treatment group who
will read written instructions on this pronunciation matter.

Due to its very form, written text as a means of instruction has many deficits compared to a teacher
explaining something orally. Writing is static and fixed whereas speech is dynamic (a speaker can
vary vocal properties such as rhythm, tone and loudness) and flexible (for example, a speaker can
respond to their interlocutor by adding more information or making a correction) (Ha, 2016),
meaning teachers are better able to respond to their particular students’ needs when a lesson is
spoken rather than written. There is also the matter of cognitive load; if a written text is lexically or
syntactically dense or presents a lot of new information it can be difficult for students to distil the
information they need, and this varies per student (Jacob, Lachner, & Scheiter, 2020). Studies
looking at second language acquisition outside of the classroom find that students find it difficult to
keep up the work without an external reason to do so such as a teacher (Nielson, 2011) and that it is
important that students know how to self-regulate their studies (Tullis & Benjamin, 2011).

This would suggest that a teacher will be much better able to improve pronunciation than a written
text. But compared to a control group, a written text is still expected to lead to at least some
improvement in pronunciation for the intuitive reason that some instruction is expected to be better
than no instruction at all.

3.1.2    Question 2: Is it sufficient for this instruction to be in writing only, or are audio
         examples of added value?
With only one self-published exception (Sampaio, 2021), every pronunciation or accent training book
that I reviewed has been designed to include audio files online or on CD (A. Baker, 2005; Collins &
Mees, 2003, 2013; Cook, 2000; Farlex International, 2017; Hoge, 2014; Mojsin, 2016; Raifsnider,
2011; Rupp, 2013; Smakman, 2014), though these have often become unavailable over time. As
noted previously, in the communicative learner model that has been popular in the past decades for
second language learning, the need for interaction is held in very high regard and such audio files are
dismissed as tools by many researchers because they do not provide that interaction (e.g. Deutschmann et
al., 2009). However, no studies were found that provide evidence on whether audio files are helpful
in this context, though some authors assert that they are without providing evidence (e.g. Raifsnider,
2011). Research has shown that listeners with another native language have trouble hearing certain
differences in utterances pronounced by native speakers when these differences are not present in
their own language (Gilakjani et al., 2011). Intuitively, one would think that audio files with written
instructions would be better than no audio files with written instructions, but no evidence could be
found to support this idea.

Due to the mixed evidence, no hypothesis can be drawn up for this research question.

3.1.3    Question 3: Do listeners with different backgrounds (education level, native
         language/accent, age, familiarity with similar accents) interpret words spoken
         with Dutch final position fortis/lenis neutralisation differently?
There is no reason to suspect age or education level will affect the way listeners interpret English
words spoken with Dutch fortis/lenis neutralisation. However, familiarity with accents that have this
feature, including having such an accent oneself, might lead to people being more prone to think a
Dutch person is saying the voiced word even when the unvoiced word is heard; in practice, this
means when a Dutch person intends to say eyes, an experienced listener may indeed understand
eyes but a non-experienced listener might hear ice. The interlanguage speech intelligibility benefit
states that people with the same language backgrounds find it easier to understand others with the
same background when speaking a second language (Bent & Bradlow, 2003). Listeners who are
familiar with a certain foreign accent have been found to find it more comprehensible (Carey,
Mannell, & Dunn, 2011; Gallardo del Puerto, García Lecumberri, & Gómez Lacabex, 2015; Winke et
al., 2012; Y. Zhang & Elder, 2011). Based on these findings, I hypothesise that listeners who are
familiar with accents that have the final devoicing feature will more often rate that they hear the
voiced version of the minimal pair that they are presented with. I am cautious, however, as in a
previous study among German speakers looking at final position fortis/lenis neutralisation the
opposite was shown, with German speakers finding native English speakers easier to understand
than German-English speakers when it came to this feature (Smith et al., 2009).

Van den Doel (2006) asked native English speakers with an American accent and speakers with an RP
(British) English accent to judge the seriousness of Dutch-English pronunciation errors. American
speakers rated fortis/lenis neutralisation (not necessarily in final position) as a slightly more serious
error than RP speakers did. However, the difference was very slight, and the theoretical
reasoning, namely that speakers are more positive about accents that are similar to their own, does
not hold in this case because final devoicing is not a feature of RP or of American English. Also, we
are not looking at “seriousness of an error” but at intelligibility. We therefore have no reason to
assume that native-English accent will have an effect on judge ratings.

4 Method

4.1 Participants (phase 1: Speakers)
The participants of the phase 1 questionnaire will be referred to as speakers. Dutch speakers who do
not speak another language with native or near native proficiency were recruited via Facebook
groups targeted at Dutch people, and to a lesser extent via Reddit, the website Surveyswap.io and
the researcher’s own website. The researcher’s own social circle was not used to distribute the
questionnaire. Speakers were randomly assigned to one of three groups: the control group, the
audio intervention group and the written intervention group. Approval was sought from and granted by the
Ethics Review Board of the Faculty of Social & Behavioural Sciences at Utrecht University. Protocol
number 21-0945.

Of the 257 speakers who started the survey, 21 speakers stopped because they were not willing to
leave an audio recording, 37 speakers stopped because they were not able to leave an audio
recording (for example because they were not in a quiet environment) and 57 speakers were led out
of the survey because they did not grow up in the Netherlands or because they spoke a second
native language. For unknown reasons, 19 speakers left the survey before being asked to record
anything. Of the 123 speakers who continued the survey, 96 made an audio recording. It is likely that
the 27 speakers who did not do so encountered technical problems. Of the 96 audio recordings, 24
were not useable. Most of these were corrupted due to the same technical issue as described above.
Two people did not record all five words and two people made the first recording but not the
second. One person recorded silence. Six recordings were very soft; these were amplified and kept in
the sample. About ten recordings included static but were deemed by the researcher to be audible
enough to be kept in the sample. In total, the researcher was left with 73 useable audio samples.
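
The participant flow described above can be checked with a short tally. The sketch below is an illustration only and was not part of the study's analysis. The variable names are hypothetical, and the tally covers only the steps from the start of the survey up to the number of recordings made.

```python
# Illustrative tally of participant flow; all names are hypothetical.
started = 257

# Reasons for leaving before the recording stage, as reported in the text.
dropouts = {
    "unwilling to record": 21,
    "unable to record": 37,
    "screened out (not raised in NL or second native language)": 57,
    "left for unknown reasons": 19,
}

continued = started - sum(dropouts.values())
no_recording = 27  # continued the survey but did not make a recording
recorded = continued - no_recording

print(continued)  # 123
print(recorded)   # 96
```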

Of the 73 speakers whose audio recordings were included in the second study, 41% classed
themselves as speaking English at an advanced (C1 or C2) level, 47% classed themselves at upper
intermediate (B2), 11% at lower intermediate (B1) and 1% at beginner (A1 or A2). These self-
categorisations need to be viewed with some caution, as Dutch people tend to overestimate how
good their English is when they self-report (Van Onna & Jansen, 2006).

Four people (5%) opted not to fill in the optional demographic questions of age, gender and
education. The average age of the remaining 69 speakers was 37.38 years (SD=1.67). 75% of these
speakers were female, 25% were male. No speakers listed their gender as “other”. Most speakers
were highly educated, with 57% of the sample educated at university level.

Speakers were allocated to one of the three groups automatically and randomly by the phonic.ai
survey software. The spread of speaker characteristics can be seen in table 1.

Table 1

Speaker characteristics per group

                       Control             Audio               Written            Whole sample
Number of speakers     24                  22                  27                 73

Age*                   35.59 (SD 2.38)     39.75 (SD 3.63)     37.07 (SD 2.56)    37.38 (SD 1.67)

Gender*                86.4% female        66.7% female        73.1% female       75.4% female
                       13.6% male          33.3% male          26.9% male         24.6% male

Education*             0.0% Elementary     0.0% Elementary     0.0% Elementary    0.0% Elementary
                       0.0% VMBO           0.0% VMBO           0.0% VMBO          0.0% VMBO
                       0.0% HAVO           0.0% HAVO           7.7% HAVO          2.9% HAVO
                       0.0% VWO            9.5% VWO            0.0% VWO           2.9% VWO
                       0.0% MBO            4.8% MBO            11.5% MBO          5.8% MBO
                       50.0% HBO           28.6% HBO           19.2% HBO          31.9% HBO
                       50.0% WO            57.1% WO            61.5% WO           56.5% WO

English level          0.0% A1/A2          4.5% A1/A2          0.0% A1/A2         1.4% A1/A2
                       20.8% B1            4.5% B1             7.4% B1            11.0% B1
                       54.2% B2            45.5% B2            40.7% B2           46.6% B2
                       25.0% C1/C2         45.5% C1/C2         51.9% C1/C2        41.1% C1/C2

*Age, gender and education were not compulsory and were not filled in by two people in the control
group, one person in the audio group and one person in the written group.

VMBO=pre-vocational secondary education, HAVO=senior general secondary education, VWO=pre-
university education, MBO=secondary vocational education, HBO=university of applied science,
WO=university
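
The whole-sample column is the pooled proportion across groups, not the average of the three group percentages. As a minimal sketch, the English-level rows can be reproduced from head counts. The counts below are not reported in the thesis; they are inferred from the percentages and group sizes in table 1 and should be read as an assumption. All names are hypothetical.

```python
# Head counts per English level, inferred (assumption) from the percentages
# and group sizes in table 1; order: A1/A2, B1, B2, C1/C2.
LEVELS = ("A1/A2", "B1", "B2", "C1/C2")
counts = {
    "control": (0, 5, 13, 6),    # n = 24
    "audio":   (1, 1, 10, 10),   # n = 22
    "written": (0, 2, 11, 14),   # n = 27
}

def percentages(row):
    """Return each count as a percentage of the row total, to one decimal."""
    total = sum(row)
    return tuple(round(100 * c / total, 1) for c in row)

# Pool the counts level by level, then convert to percentages.
pooled = tuple(sum(col) for col in zip(*counts.values()))
whole_sample = dict(zip(LEVELS, percentages(pooled)))

print(sum(pooled))   # 73
print(whole_sample)  # {'A1/A2': 1.4, 'B1': 11.0, 'B2': 46.6, 'C1/C2': 41.1}
```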

The minimum number of participants required per group was determined by an a priori power
analysis in G*Power (Erdfelder, Faul, Buchner, & Lang, 2009). Within the realm of psychology, a
medium effect size is f = .25 (Sawilowsky, 2009) and, assuming a power of .80, we
estimated a minimum sample size of 14 speakers per group to detect main effects at an alpha level of
0.05. This target has been achieved.
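
To give a sense of what a medium effect of f = .25 means, Cohen's f can be converted to the proportion of variance explained (eta squared) using the standard relation eta squared = f² / (1 + f²). The sketch below is illustrative only and does not reproduce the power computation itself; the function name is hypothetical.

```python
def cohen_f_to_eta_squared(f: float) -> float:
    """Convert Cohen's f to eta squared via eta^2 = f^2 / (1 + f^2)."""
    return f ** 2 / (1 + f ** 2)

eta_sq = cohen_f_to_eta_squared(0.25)
print(round(eta_sq, 3))  # 0.059
```

Under this relation, f = .25 corresponds to roughly 6% of variance explained.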

4.2 Design
The study used a 3 x 2 repeated-measures design with group as between-subject factor (control group, audio
intervention, written intervention) and time of recording (pre-intervention, post-intervention) as within-subject factor.
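
The layout of this design can be sketched as a long-format data structure in which each speaker belongs to exactly one group and contributes one observation per time point. The sketch is an assumption about how such data might be organised, not the study's actual data file. Group labels follow the naming used in section 4.1, and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

GROUPS = ("control", "audio", "written")            # between-subject factor
TIMES = ("pre-intervention", "post-intervention")   # within-subject factor

@dataclass(frozen=True)
class Observation:
    speaker_id: int
    group: str
    time: str

def observations_for(speaker_id: int, group: str) -> list[Observation]:
    """Each speaker is recorded once per time point within a single group."""
    return [Observation(speaker_id, group, t) for t in TIMES]

# The six cells of the 3 x 2 design.
cells = list(product(GROUPS, TIMES))
print(len(cells))  # 6

# A hypothetical speaker 1 in the written group yields two rows.
rows = observations_for(1, "written")
print([r.time for r in rows])
```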

4.3 Experimental procedure: Speaker questionnaire
Recordings were collected online, via the survey platform Phonic (www.phonic.ai). This platform
allows participants to easily record themselves. Speakers were asked if they were willing and able to
make audio recordings of their speech. They were informed about the nature and procedure of the
study and were asked to give informed consent. They were then asked if they had grown up in the
Netherlands or spoke another native language than Dutch. Respondents who were not eligible for
the study were screened out at this point and thanked for their interest. The questions that followed
asked about age and gender, and participants were asked to class their level of spoken English. This
question was based on the Common European Framework of Reference for Languages (CITE). The levels were: advanced,
upper intermediate, lower intermediate and beginner. As there were not enough speakers to ensure
statistical power if advanced speakers were omitted, these were left in the sample.

Speakers were then presented with an audio example consisting of the words “shape, hair, moon,
coffee, statue” spoken with a southern UK English (RP) accent at a slow pace. These words had been
selected to be similar in length, meaning and grammatical category to the experimental words, but
not to contain the feature of interest, so that the speakers did not hear an example of that feature just
before recording. They were then asked to make their own audio recording of the words "book,
eyes, mango, bed, label". The second and fourth words in this list include the final fortis/lenis
feature which is difficult for Dutch people to pronounce in a native speaker fashion. The third and
fifth words were chosen because they start with a sonorant which was viewed as the least likely to
have an effect on the last sound of the word before (Jansen, 2004). The first word was added so that
the stimulus words would not be adversely affected by any technical or performative issues
connected to starting a new sentence. All five words are nouns, all five words are expected to be
known by speakers of every level of English and none of the words have confusing spelling.

A pilot study included a question to guess why these particular words had been chosen, so that
participants who were aware of the fortis/lenis issue could be removed from the pool. However, as
none of the pilot participants were able to guess the purpose of the words, and a number of these
were linguists, the question was deemed unnecessary and left out of the final questionnaire.

After making their pre-test recording, all participants were given two “neutral” tips on speaking
English. Tip 1 consisted of the message that the Dutch English accent is generally perceived more
negatively by Dutch people than by other English speakers (Korsten, 2020; Koster & Koet, 1993;
Nejjari, 2020; Nejjari, Gerritsen, Van der Haagen, & Korzilius, 2012) and Tip 2 consisted of a
recommendation to articulate well. Speakers who had been sorted into the control group, condition
1, were then asked to make a second recording of the same words. Tip 1 and tip 2 can be read in full
in appendix 4.

The written group (condition 3) was presented with a short written pronunciation lesson that
focused on making the vowel sound of words where final devoicing occurs longer, and the final
obstruent softer. This was based on the literature (Gonet, 2012; Van Leeuwen, 2011) and on a
personal communication with Koen Sebregts, linguist at Utrecht University (2021). Special care was
taken to make the text low in complexity in order to have a low cognitive load (Jacob et al., 2020),
and understanding was tested informally on a number of pilot testers. The lesson and a translation
can be found in appendix 5.

Speakers in the written group only got this tip in writing; speakers in condition 2 (the audio group)
were also provided with audio files with an English native speaker (RP accent) saying the word pairs.
The audio files can be accessed on https://hoezegjeinhetengels.nl/uitspraaktip-final-devoicing/.

After making their second recording, speakers were led to a page explaining the goal of the study
and giving them the opportunity to give feedback if they wished. After exiting the study they were
sent to a “thank-you page” external to the survey, and were then given the chance to leave their
e-mail address in a separate survey if they wished to be informed of the results or if they had further
questions. This was done to safeguard speaker anonymity.

Speakers’ recordings were coded as follows: Condition 1, 2 or 3, Time 1 (pre-test) or Time 2
(post-test), Speaker number. For example, the pre-test recording of the third speaker who was
randomly sorted into condition 2 would be given the code C2T1S3.
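
The coding scheme described above can be illustrated with a short Python sketch. This is an illustration only and was not part of the study materials; the function names `recording_code` and `parse_recording_code` are hypothetical.

```python
import re
from typing import Tuple

# Pattern for codes such as "C2T1S3": condition 1-3, time 1-2, speaker number >= 1.
_CODE_PATTERN = re.compile(r"C([123])T([12])S([1-9]\d*)")


def recording_code(condition: int, time: int, speaker: int) -> str:
    """Build a recording code, e.g. condition 2, pre-test, speaker 3 -> 'C2T1S3'."""
    if condition not in (1, 2, 3):
        raise ValueError("condition must be 1 (control), 2 (audio) or 3 (written)")
    if time not in (1, 2):
        raise ValueError("time must be 1 (pre-test) or 2 (post-test)")
    if speaker < 1:
        raise ValueError("speaker number must be 1 or higher")
    return f"C{condition}T{time}S{speaker}"


def parse_recording_code(code: str) -> Tuple[int, int, int]:
    """Split a recording code back into (condition, time, speaker)."""
    match = _CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid recording code: {code!r}")
    condition, time, speaker = (int(part) for part in match.groups())
    return condition, time, speaker


print(recording_code(2, 1, 3))         # C2T1S3
print(parse_recording_code("C2T1S3"))  # (2, 1, 3)
```

Keeping condition, time and speaker number in one string means each audio file name carries everything needed to assign its ratings to the right cell of the design.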

A flow chart describing the speaker survey is presented in figure 1.

Figure 1. Flow chart depicting steps in speaker survey

4.4 Rating procedure: Judge questionnaire
4.4.1 Participants (phase 2: Judges)
498 people started the judge questionnaire and 411 filled in all necessary questions. These
participants will be referred to as judges. The questionnaire was preceded by an informed consent
page. Approval for the judge questionnaire was sought from and given by the Ethics Review Board of the
Faculty of Social & Behavioural Sciences at Utrecht University (protocol number 21-0945). The
average age was 30.2 (SD = 11.95; 14 people opted not to fill in their age), and the most common
highest attained education level was a bachelor’s degree. 74% of judges were native English
speakers; the most common accent was American/Canadian English (50%). Of the non-native English
speakers the most common mother tongue was German, with 5.4% of all judges speaking this
language. The educational background of the judges is listed in table 2. The distribution of language
backgrounds and accents can be seen in tables 3 through 5.

The minimum number of respondents required to run the multiple linear regression test to answer
research question 3 was not defined in advance. However, a post-hoc sensitivity analysis confirmed
the sample size was large enough to achieve a power > 0.99.

Table 2

Highest attained level of education of participants who took part as judges.

Elementary school               3.4%
High school                    12.4%
Some college                   18.0%
Bachelor's                     33.2%
Master's                       27.1%
PhD                             4.9%
Didn't say                      1.0%

Table 3

Accent distribution for native English speaking judges

American/Canadian English                                     68.2%
UK English                                                    21.3%
Australian, New Zealand, South African or Irish English        6.9%
Other English accent                                           3.6%

Table 4

Bilingualism in judges’ pool

Monolingual English            65.1%
Monolingual not English        22.7%
Bilingual with English          9.3%
Bilingual without English       2.9%

Table 5

Number of speakers per language in judges’ pool

English                   305
German                     29
Spanish                    17
Russian                    13
Dutch                      12
French                     11
Mandarin                    7
Italian                     6
Portuguese                  6
Cantonese                   5
Swedish                     5
Arabic                      4
Hindi                       3
Polish                      3
Romanian                    3
Other                      30

4.4.2 Procedure phase 2: judge questionnaire
All useable recordings from the speaker questionnaire were embedded in the judge questionnaire.
This questionnaire was built in Qualtrics and distributed via social media (Facebook, Twitter, Discord
and Reddit). The researcher’s social circle was not used to distribute the questionnaire.

The judge questionnaire presented each judge with six random recordings out of the total pool of 146.
Judges were then asked to rate the recordings on a six-point Likert scale as follows:

Which words do you hear in the spaces? You can listen as often as you like.

Book, …., mango, ….., label

First word:

O It’s definitely “ice”

O I’m pretty sure it’s “ice”

O I think it’s “ice”

O I think it’s “eyes”

O I’m pretty sure it’s “eyes”

O It’s definitely “eyes”

“It’s definitely ‘ice’” corresponds to a rating of 1, “it’s definitely ‘eyes’” corresponds to a rating of 6.
This is a measure of intelligibility. In pronunciation studies, some scholars use measures of subjective
opinion (e.g. “how serious is this error, in your opinion?”), some scholars use measures of
comprehensibility (e.g. “how easy was it to understand what the speaker was saying, in your
opinion?”) and some scholars, as in our case, choose the more objective measure of intelligibility
(e.g. “what word did you hear?”) (Munro, Derwing, & Morton, 2006).
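
As an illustration of how the six answer options translate into the scores used in the analysis, the following Python sketch maps each option for the first word to its rating and averages the ratings per recording. It is a minimal sketch under stated assumptions: the function name `mean_rating_per_recording` is hypothetical, and the example responses are invented for illustration and are not data from the study.

```python
from collections import defaultdict
from statistics import mean

# Answer options in questionnaire order: 1 = "definitely ice", 6 = "definitely eyes".
OPTIONS = [
    'It\'s definitely "ice"',
    'I\'m pretty sure it\'s "ice"',
    'I think it\'s "ice"',
    'I think it\'s "eyes"',
    'I\'m pretty sure it\'s "eyes"',
    'It\'s definitely "eyes"',
]
RATING = {label: score for score, label in enumerate(OPTIONS, start=1)}


def mean_rating_per_recording(responses):
    """Average the judges' ratings for each recording.

    `responses` is an iterable of (recording_code, answer_label) pairs.
    """
    ratings = defaultdict(list)
    for code, label in responses:
        ratings[code].append(RATING[label])
    return {code: mean(values) for code, values in ratings.items()}


# Invented example responses (not study data).
example = [
    ("C2T1S3", 'I think it\'s "ice"'),            # rating 3
    ("C2T1S3", 'It\'s definitely "eyes"'),        # rating 6
    ("C2T2S3", 'I\'m pretty sure it\'s "eyes"'),  # rating 5
]
scores = mean_rating_per_recording(example)
print(scores["C2T1S3"])  # 4.5
```

A mean above 3.5 indicates that judges leaned towards hearing the lenis word (eyes), a mean below 3.5 that they leaned towards the fortis word (ice).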

The judges were then asked to note if the recordings had been clear enough to make a judgement; 6
people (1.5%) answered “no”, 16 (3.9%) answered “mostly no”. The ratings from the judges who
answered “no” were removed from the sample. The researcher checked if these judges had all heard
a similar batch of recordings but this was not the case, so it is assumed that the problem was on the
judges’ end. They were then asked for their native language. If their native language was English,
they were also asked to fill in their accent and note how familiar they were with a list of final-
devoicing languages (German, Dutch, Afrikaans, Polish, Russian, Czech, Slovak, Bulgarian, Armenian,
Lithuanian, Catalan and/or Turkish). They were then asked for their age and highest achieved level of
education. The last page of the survey included an explanation of the study and the possibility to
provide feedback, ask a question or leave an e-mail address to be kept informed of the results. 28
people left their e-mail address.

The researcher was not able to change the URL of the survey (survey.uu.nl), so after the sound quality
question the respondents were asked which country they thought the speakers were from, to check
for bias. 37% of respondents answered “no idea”, 34% answered UK, 6% answered The Netherlands,
5% answered USA and 5% thought it was a German-speaking country. A range of 25 countries was
mentioned by the remaining 13% of respondents.

5 Results

5.1 Results of speaker performance
To answer the questions “Can a short instruction help Dutch people make the distinction between
final position fortis and lenis obstruents in English?” and “Is it sufficient for this instruction to be in
writing only, or must it include audio examples?”, a repeated measures ANOVA was carried out with
performance (the mean scores of the judges per speaker) as dependent variable and the time of
recording (pre-post; within subjects) and group (between subjects) as independent variables.

Normality assumptions were assessed by plotting frequency distributions and computing the
Shapiro-Wilk test for all the groups. Results indicated that normality was achieved for all (ps>.05)
except for one group, namely the control group (p=.039). The sphericity assumption was not
significantly violated (p>.05).

Results revealed a significant main effect of time on performance for eyes, F(1,71)=9.65, p=.003,
η2=.120. There was, however, no significant main effect of group (F(2,71)=0.07, p>.05, η2=.002) or
interaction between time and group, F(2,71)=0.71, p=.491, η2=.020. See Figure 2.
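
For readers who wish to relate the reported effect sizes to the F statistics, the following Python sketch recovers an effect size from an F value and its degrees of freedom. It assumes that the reported η2 is partial eta squared, which the text does not state explicitly; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F test: (F * df_effect) / (F * df_effect + df_error)."""
    if f_value < 0 or df_effect < 1 or df_error < 1:
        raise ValueError("F must be non-negative and both df must be at least 1")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of time for "eyes": F(1,71) = 9.65
print(round(partial_eta_squared(9.65, 1, 71), 3))  # 0.12

# Interaction between time and group for "eyes": F(2,71) = 0.71
print(round(partial_eta_squared(0.71, 2, 71), 3))  # 0.02
```

Under this assumption, both values agree with the effect sizes reported above for the word eyes (.120 and .020).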

[Figure 2: chart with vertical axis from 0 to 6; categories “eyes” time 1 and “eyes” time 2; series Control, Audio, Written]

Figure 2. Bar chart illustrating the mean score at time 1 and time 2 for the word eyes in the control
group (white), the group receiving audio instructions (light grey) and the group receiving written
instructions (dark grey). Error bars illustrate 95% Confidence Intervals. A score of one represents
“definitely ice” and a score of six represents “definitely eyes”.

Results revealed a significant main effect of time on performance for bed, F(1,71)=8.03, p<.001,
η2=.159. There was, however, no significant main effect of group (F(2,71)=0.74, p=.48, η2=.020) or
interaction between time and group, F(2,71)=0.82, p=.259, η2=.037. See Figure 3.

[Figure 3: chart with vertical axis from 0 to 6; categories “bed” time 1 and “bed” time 2; series Control, Audio, Written]

Figure 3. Bar chart illustrating the mean score at time 1 and time 2 for the word “bed” in the control
group (white), the group receiving audio instructions (light grey) and the group receiving written
instructions (dark grey). Error bars illustrate 95% Confidence Intervals. A score of one represents
“definitely bet” and a score of six represents “definitely bed”.

To explore whether an effect would be apparent, only the speakers with a rating below 4 at T1 were
selected from the sample. The resulting sample was, however, not big enough to ensure power in
statistical estimates (C1 eyes N=13, C1 bed N=9, C2 eyes N=9, C2 bed N=12, C3 eyes N=13, C3 bed
N=10). Thus, the below trends should be considered with caution.
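
The selection step described above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the function name `low_scorers` and the speaker identifiers are hypothetical, and the scores are invented for illustration.

```python
def low_scorers(mean_scores, threshold=4.0):
    """Keep only speakers whose mean rating at time 1 is below the threshold.

    `mean_scores` maps a speaker id to a (time 1 mean, time 2 mean) pair.
    """
    return {
        speaker: (t1, t2)
        for speaker, (t1, t2) in mean_scores.items()
        if t1 < threshold
    }


# Invented example scores (not study data).
example_scores = {
    "C1S1": (2.5, 3.8),
    "C1S2": (5.2, 5.4),
    "C2S4": (4.0, 4.1),  # exactly 4 is not "below 4", so this speaker is dropped
    "C3S7": (3.9, 4.6),
}
subset = low_scorers(example_scores)
print(sorted(subset))  # ['C1S1', 'C3S7']
```

Filtering on the time 1 score alone keeps the selection independent of the intervention, because the pre-test recording was made before any group received its instruction.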

Results revealed a significant main effect of time on performance for eyes, F(1,25)=3.80, p=.025,
η2=.185. There was, however, no significant main effect of group (F(2,25)=0.27, p=.124, η2=.154) or
interaction between time and group, F(2,25)=0.98, p=.391, η2=.072. See Figure 4.

[Figure 4: chart for “eyes”; vertical axis from 0 to 6]

[Chart for “bed”; vertical axis from 0 to 6]

For model 2 (bet/bed), no significant results were observed, F(4,302)=1.833, p=.122. Specifically,
investigation of the standardized beta coefficients indicated that none of the predictors carried
significant explanatory power in relation to the outcome variable (all ps>.05). See table 7.

Table 7

Standardized beta coefficients for “bed”

                   B         SD           β          t         p
Accent           0.005      0.052       0.006      0.105     .916
Familiarity     -0.059      0.045      -0.077     -1.324     .186
Age              0.002      0.004       0.027      0.41      .682
Education        0.077      0.047       0.108      1.646     .101

6 Discussion

6.1 Summary of results
A significant improvement was seen between the pre-recording of the two final-devoicing words and
the post-recording, but this result was also apparent for the control group. No significant difference
in level of improvement was seen between groups. This suggests that a written instruction can lead to
improvements in intelligible pronunciation, but that it is enough for this instruction to be “please
articulate better” and “don’t be embarrassed about your accent”; an instruction that was
carefully designed to improve a specific feature of pronunciation made no additional difference. There was no
significant difference in improvement between the written group and the audio group, suggesting
audio pronunciation examples give no added value.

No significant results were observed when judges’ age, educational background, accent and
familiarity with final-devoicing accents were looked at as predictors for their ratings, suggesting that
listeners with different backgrounds do not interpret words spoken with Dutch final position
fortis/lenis neutralisation differently when these words are presented without semantic context.

6.2 Written instruction without audio
The first research question posited was “Can a specific written instruction help Dutch people make
the distinction between final-position fortis and lenis obstruents in English?” Based on the results of
the experiment, the answer seems to be “no”.

Martin (2018) notes that pronunciation training done at home needs to be guided by a teacher,
otherwise “students zoom in on the wrong features and, despite training, will not improve their
pronunciation” (p. 33). The current study seems to corroborate this. Most speakers did unexpectedly
well in pronouncing eyes and bed intelligibly in the pre-intervention recording, meaning the
experiment was providing pronunciation instructions to people who did not need them. When these
speakers were taken out of the sample, a trend was observed that the experimental groups did
better than the control group for the pronunciation of bed. However, the results were not significant
which might be due to the small sample size. For eyes the removal of previously successful speakers
made little difference to the results, as the control group still did comparatively well. This difference
between eyes and bed may be because eyes ends in a fricative, meaning a speaker can lengthen this
final consonant, whereas the word bed ends in a stop, which cannot be lengthened. Listening to the
audio recordings the researcher heard many examples of people who, intending to lengthen the
vowel, also lengthened the /s/ sound. In native-speaker English, word-final /s/ is longer than
word-final /z/, which means eyes will sound more like ice if the final fricative is lengthened (Fullana & Mora,
2009). Five speakers in the audio group and one speaker in the written group who were rated on
average as saying eyes in the pre-intervention recording were rated as saying ice in the post-
intervention recording. This effect was not seen for a single speaker for bed. Taking this into account
and the previously mentioned trend, it is possible that if this experiment were to be repeated among
a larger sample of people with low-level English, a small but significant effect might be seen for final-
devoicing words that end in a stop.

6.3 Written instruction with audio
The second research question was “Is it sufficient for this instruction to be in writing only, or are
audio examples of added value?” There was no difference between the written group and the audio
group, and there was no trend apparent in the data that would suggest a difference might be found
if the sampling limitations of this study had been absent.

In the design of the study, the difference between the written experimental group and the audio
experimental group was kept small; the text was almost exactly the same, except for a reference to
the audio file in the audio group. The audio file was of a native speaker reading the minimal pairs
discussed in the pronunciation lesson. This was done to be comparable to commercial pronunciation
training books. It is possible that not all speakers listened to the audio files. There was no check in
the experiment to see if they did, but commercial pronunciation and accent books also leave it up to the
consumer whether to listen to the audio files.

6.4 Listener background
The third research question was “Do listeners with different backgrounds (education level, native
language/accent, age, familiarity with similar accents) interpret words spoken with Dutch fortis/lenis
neutralisation differently?” No significant effects were found. Even though each judge only heard six
recordings, the large sample size of 411 judges means the negative finding here is robust.

A reason for this result might be that the judges heard a list of five words without any context.
Contextual clues can help to understand words that are difficult to understand on their own (Bent &
Bradlow, 2003). Previous studies which found that experienced listeners judged non-native accents to be
more comprehensible than naïve listeners did were carried out with full sentences, not with single
words (e.g. Gallardo del Puerto et al., 2015). It is possible that people who have more experience
with final-devoicing accents would be better able to draw from contextual clues what a
foreign-accented speaker is trying to say, but without these contextual clues, there is no difference between
listeners.

6.5 Limitations
6.5.1 Attrition in speaker survey
There was a high rate of attrition in the speaker survey. Partly, this was to be expected because
people clicked on the link, discovered that they would be asked to make an audio recording and,
unwilling to do so, left again. However, I also encountered many technical challenges. The Phonic.ai
platform is not able to function properly when accessed via Facebook on a smartphone. In the pilot
test, almost all recordings failed for this reason. In the actual survey I presented respondents with a
workaround before the start of the survey (“If you have accessed this survey via Facebook and are
on your smartphone, please click the three dots at the top right of your screen and choose ‘Open in
Chrome’ (or ‘Open in…’ and then your usual web browser). Then you can start the survey. If you do
not do this, the survey will not work properly. Thank you!”). It is likely that some people did not see
or understand this. There were 28 speakers who filled in the questionnaire up until the point of
making the first audio recording, suggesting they wanted to do so but ran into technical difficulties.
Of these people, 36% had an education level of MBO or lower (compared with 12% in the final speaker
sample), 18% classed themselves as a beginner level English speaker (final sample: 1%) and 29% as
lower intermediate (final sample: 11%). It is clear, therefore, that due to the technical difficulties
inherent in the Phonic.ai system, my speaker sample was less diverse than it could have been. As
discussed above, a larger speaker sample with lower ratings in the first recording might have led to
significant results for improvement in pronunciation of bed.

It should be noted at this point that only two people (one from the control group and one from the
written group) made the first recording but not the second, meaning the attrition during the
experimental stage itself was low. Fears that the pronunciation instructions were too long and that
people would stop at that point were, it seems, unfounded.

6.5.2 Choice of stimulus words
Bed might not have been the best word to choose as it includes the dress/trap vowel that many
Dutch people struggle with. A number of judges remarked that they didn’t hear bed or bet but
instead heard bad or bat. This confusion is likely to have been more present for American judges as
the American pronunciation of bad comes very close to the Dutch-English pronunciation of bed,
especially when the vowel is lengthened. A minimal pair of nouns ending in a stop with a less
problematic vowel might have been log and lock. It is possible that this minimal pair would have led
to judges making more divergent ratings which in turn might have led to different results.

Eyes might not have been the best word to choose because many North American accents have a
different vowel for eyes than for ice. This feature, commonly known as Canadian raising, affects words with
/aɪ/, which becomes /ʌɪ/ when positioned in front of a voiceless obstruent. It is becoming more
common in North America (Moreton, 2016). This means that for people with this pronunciation
feature, eyes is less likely to sound like ice. I have a British accent and I was not aware of this until a
small number of judges mentioned it in the feedback field at the end of the survey. However, if this
had had a large effect, this would have been apparent in the statistical analysis for the third research
question which looks among other things at any difference in judgement between judges with an
American accent and others. No effect was found. A different minimal pair such as phase and
face might have been a better choice, but it is not expected that these words would have led to
different results.

6.5.3 Quality of intervention
The written lesson on how to better pronounce words ending in a voiced consonant, which is
presented in appendix 5, was specially developed for this study and was not tested beforehand in
another modality, for example by a teacher in an English lesson. Though it was developed based on
sound sources and studies (Collins & Mees, 2013; Gonet, 2012; Van Leeuwen, 2011), and with the
aid of a professor of phonology specialised in Dutch-English pronunciation, it is possible that there
was no significant result not because of the written modality, but because the lesson in itself was
not helpful.

6.5.4 Ecological validity
Elicitation was not spontaneous, but speakers were reading a list of words, a method in
pronunciation research and pronunciation teaching that is often used but much criticised (e.g. in
Thomson & Derwing, 2015). Anecdotal evidence suggests that when speakers speak spontaneously,
the final-devoicing issue is more apparent. This means a more naturalistic study, though difficult to
carry out, may have led to different results. However, the current research had more speakers than
a naturalistic study is likely to have been able to collect. This gives the statistical results more weight.
Also, as a first study into a matter that has not yet seen any research done (as far as I have been able
to find) it makes sense to use a controlled experimental setup rather than a naturalistic study.

6.6 Future directions
One of the speakers in this study reached out to say that she had been living in Canada for many
years and that the final devoicing issue was one she had trouble with. Her Canadian conversation
partners would often misunderstand her, and she felt the tip was very useful to her. For these kinds
of people, with specific, known needs, this kind of written pronunciation instruction might be
helpful. A future study might look at the best way to match students with the pronunciation
instruction that they need, and measure if in that situation a specific written instruction is sufficient
for improvement. It might be beneficial to contrast written instruction with video instruction and live
instruction by a teacher, to see how much of a benefit each modality presents.

A number of studies have looked at technological solutions for pronunciation training without a
teacher (e.g. Kartal, 2020; Martin, 2018). A meta-analysis found that the effects were small
compared to interventions with teachers (Lee et al., 2015), but the abundance of pronunciation
books on the market suggests there is a need for pronunciation training that can be followed without
a teacher. Perhaps future studies can uncover a technological tool that is able to give the teacher-
like interaction required for effective pronunciation training.

7 Conclusion
Explicit pronunciation training is important for intelligibility (Thomson & Derwing, 2015). English
teachers in the Netherlands and other countries have many reasons not to address pronunciation in
their classrooms. They may feel it is unnecessary because within the communicative method,
pronunciation is seen as something that does not need explicit teaching (Levis & Sonsaat, 2017),
they might feel they do not have the time (Thomson & Derwing, 2015), they might not have been
taught how to address pronunciation during their training (Burri & Baker, 2020), they might not want
to embarrass their individual students or they might feel that a one-size-fits-all approach does not
work for an entire class of students, each with their own particular pronunciation needs (Martin,
2018). However, teachers and schools need to find a way to address pronunciation because the
current study suggests that leaving students to pick up a book on pronunciation is not helpful. This
might be because students cannot properly judge which issues they need to work on and how best
to apply the interventions they read about. Other solutions must be found. Because home-based
interventions have many benefits, future studies should look at ways to provide home-based
pronunciation interventions that include forms of guidance. Technology might be beneficial.

8 Literature

Baker, A. (2005). Ship or Sheep; An intermediate pronunciation course (3rd ed.). Cambridge
     University Press.

Baker, A. A. (2014). Exploring teachers’ knowledge of second language pronunciation techniques:
     Teacher cognitions, observed classroom practices, and student perceptions. TESOL Quarterly,
     48(1), 136–163. https://doi.org/10.1002/tesq.99

Baker, A. A. (2017). Pronunciation teaching in the preCLT era. In O. Kang, R. Thomson, & J. M.
     Murphy (Eds.), The Routledge Handbook of Contemporary English Pronunciation (1st ed., pp.
     249–266). Routledge. https://doi.org/10.4324/9781315145006-16

Bent, T., & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. The Journal of the
     Acoustical Society of America, 114, 1600–1610. https://doi.org/10.1121/1.1603234

Burri, M., & Baker, A. A. (2020). “A big influence on my teaching career and my life”: A longitudinal
      study of learning to teach English pronunciation. TESL-EJ: The Electronic Journal for English as a
      Second Language, 23(4), 1–24. Retrieved from https://ro.uow.edu.au/sspapers/4677

Carey, M. D., Mannell, R. H., & Dunn, P. K. (2011). Does a rater’s familiarity with a candidate’s
     pronunciation affect the rating in oral proficiency interviews? Language Testing, 28(2), 201–
     219. https://doi.org/10.1177/0265532210393704

Collins, B., & Mees, I. M. (2003). The phonetics of English and Dutch (5th ed.). Brill.

Collins, B., & Mees, I. M. (2013). Practical phonetics and phonology: A resource book for students
      (3rd ed.). Routledge. https://doi.org/10.4324/9780203080023

Cook, A. (2000). American accent training; A guide to speaking and pronouncing American English for
     everyone who speaks English as a second language (2nd ed.). Barron.

Darcy, I. (2018). Powerful and effective pronunciation instruction: How can we achieve it? CATESOL
     Journal, 30(1), 13–45. Retrieved from https://files.eric.ed.gov/fulltext/EJ1174218.pdf

Dauer, R. M. (2005). The Lingua Franca Core: A new model for pronunciation instruction? TESOL
    Quarterly, 39(3), 543–550. https://doi.org/10.2307/3588494

De Goei, S. (2017). Do you speak English? De implicaties van Engels als Lingua Franca voor het
    uitspraakonderwijs. [Master’s thesis, Fontys Hogeschool Tilburg].
    https://doi.org/10.13140/RG.2.2.15560.83206

Deutschmann, M., Panichi, L., & Molka-Danielsen, J. (2009). Designing oral participation in second
     life - A comparative study of two language proficiency courses. ReCALL, 21(2), 206–226.
     https://doi.org/10.1017/S0958344009000196

Dorren, G. (2018). Babel: Around the World in 20 Languages. Grove Press.

Erdfelder, E., Faul, F., Buchner, A., & Lang, A. G. (2009). Statistical power analyses using G*Power
     3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41(4), 1149–
     1160. https://doi.org/10.3758/BRM.41.4.1149

Espinosa, J. A. C. (2017). “A relaxing cup of Lingua Franca Core”: Local attitudes towards
     locally-accented English. Atlantis, 39(1), 11–32.
