Learning unacceptability: Repeated exposure to acceptable sentences improves adult learners' recognition of unacceptable sentences
Abstract
People who learn a new language as adults tend to judge unconventional utterances more leniently than native speakers do, while both groups’ ratings on acceptable utterances tend to align more closely. Experiment 1 confirms this asymmetry with 61 English-speaking undergraduate students enrolled in Spanish classes. The finding that unconventional utterances are particularly hard for learners to fully appreciate raises the possibility that conventional utterances may not statistically preempt unconventional paraphrases for adult learners. To investigate this, we report a preregistered study that provides undergraduates learning Spanish with three days of exposure to conventional Spanish sentences involving one of two sets of constructions. They completed a self-paced reading task before and after exposure. Native Spanish speakers displayed the expected slow-down when reading the unconventional sentences (Exp 2), but the learners did not, regardless of exposure or proficiency. At the same time, judgment data reveal that even beginning learners at the initial assessment explicitly rate unconventional sentences somewhat lower than conventional sentences, and the recognition of unconventionality increases with proficiency. Moreover, the judgment data reveal an effect of statistical preemption, particularly on intermediate learners, as predicted: repeatedly witnessing conventional sentences significantly impacted subsequent ratings of unconventional paraphrases. Collectively, the current findings indicate that adult learners do take advantage of statistical preemption to identify unacceptable sentences, but their ability to recognize unacceptability in real time lags far behind.
Introduction
Learning multiple languages is highly beneficial in our multicultural world, as bilingualism is valued by immigrants, schools, travelers, and all heterogeneous communities. Yet learning a new language as an adult is slow and difficult for most people. The task is challenging because each language involves a unique and complex system of generalizations, subregularities, and idiosyncrasies. For instance, native English speakers judge the sentences in (8a)-(11a) to be strongly dispreferred in comparison to the conventional alternatives in (8b)-(11b):
8a. ?Lisa filled water into the cup. (Ambridge & Brandt, 2013)
9a. ?The magician disappeared the rabbit. (Robenalt & Goldberg, 2016)
10a. ?Amber explained Zach the answer. (Tachihara & Goldberg, 2020)
11a. ?Dan forced that Helen plays tennis. (Tachihara & Goldberg, 2020)
8b. Lisa filled the cup with water.
9b. The magician made the rabbit disappear.
10b. Amber explained the answer to Zach.
11b. Dan forced Helen to play tennis.
The types of utterances in (8b)-(11b) can be viewed as statistically preempting the corresponding utterance types in (8a)-(11a) for native English speakers, who strongly prefer the formulations in (8b)-(11b) over any of the formulations in (8a)-(11a) to express the intended messages.
For adults learning a new language, avoiding the types of unconventional sentences in (8a)-(11a) is especially challenging. Several studies have demonstrated that adult learners are markedly more tolerant of unconventional sentences in comparison to native speakers (Ambridge & Brandt, 2013; Brooks et al., 1999; Robenalt & Goldberg, 2016; Tachihara & Goldberg, 2020). In fact, there is a significantly larger difference between native speakers’ and learners’ acceptability ratings on unconventional sentences in comparison to conventional sentences (like the [b] sentences): judgments on conventional sentences tend to align more closely with native speakers’. Tachihara & Goldberg (2020) found that adult learners’ more lenient judgments on unconventional formulations were remarkably robust, regardless of their language background, and even though most learners were able to correctly recognize that a conventional alternative was somewhat preferred over an unconventional formulation. The unacceptability of the sentences in (8a)-(11a) may be particularly difficult to fully appreciate because each of the utterances is easy to interpret. Moreover, the same constructions are fully acceptable when used with different verbs, as illustrated in (8c)-(11c):
8c. Lisa poured water into the cup.
9c. The magician hid the rabbit.
10c. Amber told Zach the answer.
11c. Dan thought that Helen plays tennis.
The type of semi-arbitrary unacceptability evident in (8a)-(11a) appears to exist in every language, as languages are conventional systems of communication, which develop over time in complex ways. We will see, for example, that native Spanish speakers judge (12a)-(13a) to be markedly less acceptable in comparison to (12b)-(13b), while the same constructions, unacceptable in the (a) sentences, are fully acceptable when used with different main verbs (12c)-(13c).
That is, languages can be quite picky about which combinations of verbs and “argument structure” constructions are conventional and acceptable, for reasons that are not always transparent to researchers, let alone language learners.
12a. ?Rafael obligó que ellos irían al cine. “Rafael forced that they go to the movies.”
13a. ?Maya creyó a Javier a esperar por ella. “Maya believed Javier to wait for her.”
12b. Rafael los obligó a ir al cine. “Rafael forced them to go to the movies.”
13b. Maya creyó que Javier esperaría por ella. “Maya believed that Javier would wait for her.”
12c. Rafael pensó que ellos irían al cine. “Rafael thought that they would go to the movies.”
13c. Maya obligó a Javier a esperar por ella. “Maya forced Javier to wait for her.”
Adult learners are known to produce utterances like those in (8a)-(13a). This is not in itself problematic, as subcommunities of speakers routinely develop their own vernaculars and there is
no evidence that any vernacular is “better” than any other, so we might consider adult language learners to simply develop their own conventional ways of speaking (Ortega, 2013; Cheng et al., 2021). In fact, as long as what is uttered is interpreted as intended, any utterance can legitimately be said to be good enough (Goldberg & Ferreira, 2022). At the same time, many adult learners aim to reproduce a vernacular they perceive to be standard, to avoid implicit (or explicit) bias (Gluszek & Dovidio, 2010). To the extent that adult learners share this goal, the question arises as to why adult learners show a propensity to rate unconventional language leniently. We do not need to endorse a prescriptivist attitude toward differences between native speakers’ and adult learners’ speech to find the differences compelling.¹ Our own interest lies in the mechanisms involved in language learning by adults and children. It is striking that child learners inevitably learn the nuanced conventions of their language(s) and, as adults, generally recapitulate the conventional, colloquial language they witness being used by their peers; as native speakers, they rarely produce the unconventional types of sentences in any of the (a) examples above and, if asked, reliably judge such examples to be quite unacceptable.¹ How do child learners come to avoid producing unconventional sentences, and to rate them so strongly unfavorably, in their native language(s)? Studies have repeatedly found that caregivers do not reliably provide explicit corrections, as they are more concerned with the content of the child’s speech than its form. Even when corrections are offered, they often appear to be misinterpreted or ignored (Baker, 1979; Bowerman, 1988; Golinkoff & Hirsh-Pasek, 1996; Pinker, 1989).
Footnote 1. We use the term “adult language learners” to refer to adults who are not yet proficient in a new language. Other terms used in the literature are “second language (L2) speakers” or “nonnative speakers.” We prefer “language learners” to highlight the dynamic nature of language learning. We use “native speakers” to refer to fluent speakers who grew up speaking the language. While we acknowledge that using the term “native speakers” may exclude potential participants who are fluent expert speakers (Cheng et al., 2021), we use this term because we recruited participants for the current study by asking for “native speakers of .”
Presumably, child learners come to avoid unconventional formulations because they repeatedly witness conventional alternatives which come to suppress any unconventional ways of expressing the identical message over years of exposure (unless there is a particular reason to produce a novel formulation). This occurs through a process of competition or statistical preemption (e.g., Ambridge et al., 2018; Boyd & Goldberg, 2011; Perek & Goldberg, 2017; Goldberg, 1995; 2006; 2011; see also Chouinard & Clark, 2003 for a related discussion of recasts). There exist two possible interpretations of how statistical preemption works. One possibility assumes that an unconventional formulation needs to be predicted or activated before the conventional formulation is witnessed. Alternatively, statistical preemption may only require that a conventional formulation be activated at the same time or even after an unconventional formulation is witnessed. In either case, statistical preemption relies on the idea that a more conventional formulation competes with any unconventional formulation as a means of expressing the same message. The more often a conventional formulation is witnessed, the stronger it becomes, and the more strongly it comes to suppress any unconventional potential alternative. Evidence that native speakers learn what not to say via statistical preemption comes from production studies (Boyd & Goldberg, 2011), corpus data (Goldberg, 2011), and mini-artificial language paradigms (Perek & Goldberg, 2017). Other evidence comes from the fact that native speakers tend to judge novel sentences with readily available conventional paraphrases to be less
acceptable than novel sentences for which no readily available paraphrase exists, where the strength of a readily available paraphrase is estimated by the degree of convergence of speakers upon the same paraphrase (Robenalt & Goldberg, 2015). For instance, if asked to paraphrase an unconventional utterance like (8a), ?Lisa filled water into the cup, more than half of native English speakers spontaneously suggest the same paraphrase, (8b), Lisa filled the cup with water. Relatedly, combinations of verb and argument structure that appear more frequently in naturalistic data provide more readily accessible paraphrases, and several studies have found that judgments on novel utterances vary inversely with the frequency of the conventional paraphrase (Ambridge et al., 2008; Brooks & Tomasello, 1999; Robenalt & Goldberg, 2015; 2016; Theakston 2004). For example, 14a is judged less acceptable than 15a, presumably because the combination of verb and construction in 14b is more frequent than that in 15b: i.e., 14b preempts 14a more strongly than 15b preempts 15a.
14a. ??The magician disappeared the rabbit.
15a. ?The magician vanished the rabbit.
14b. MAKE DISAPPEAR
15b. MAKE VANISH
The effect of paraphrases’ frequency on (un)acceptability indicates that repeated experiences are key to statistical preemption. With sufficient exposure to a conventional formulation, it should be possible to learn that a novel alternative is unacceptable without the need for explicit correction. The current work tests whether statistical preemption plays a role in adult language learning. Few previous studies have addressed this question directly, and the few that have, have drawn different conclusions.
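The convergence measure mentioned above (how strongly speakers agree on a single paraphrase) can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the procedure used by Robenalt & Goldberg (2015): the function name `paraphrase_convergence`, the text normalization, and the ten responses are all hypothetical.

```python
from collections import Counter


def paraphrase_convergence(paraphrases):
    """Return the most frequent paraphrase and the share of speakers who gave it."""
    if not paraphrases:
        raise ValueError("need at least one paraphrase")
    # Assumption: trivial differences (case, outer whitespace, final period)
    # should not count as different paraphrases.
    normalized = [p.strip().lower().rstrip(".") for p in paraphrases]
    top, count = Counter(normalized).most_common(1)[0]
    return top, count / len(normalized)


# Hypothetical responses from ten speakers asked to paraphrase
# "?Lisa filled water into the cup."
responses = (
    ["Lisa filled the cup with water."] * 6
    + ["Lisa poured water into the cup."] * 3
    + ["Lisa put water in the cup."]
)

top, share = paraphrase_convergence(responses)
print(top, share)
```

On this reading, a higher share for the top paraphrase would correspond to a more readily available competitor, and therefore to stronger expected preemption of the unconventional formulation.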
Adult language learners’ greater acceptance of unconventional formulations suggests that statistical preemption may not work the same way in child and adult language learners, except for adults at the highest levels of expertise in the target language (Ambridge & Brandt, 2013; Navarro-Torres et al., to appear; Robenalt & Goldberg, 2016; Treffers-Daller & Calude 2015). For instance, Zhang and Mai (2018) compared two groups of highly proficient English speakers whose first language was Chinese on judgments of English denominal verbs (e.g., shirt the model; brick the path). One group consisted of English majors in their fourth year of study; the other consisted of professional English teachers. Only the teachers showed an effect of statistical preemption: they were more likely to accept a denominal verb when a preempting verb was of lower frequency, whereas the English majors were not. Tachihara & Goldberg (2020) probed this phenomenon in a series of 5 studies, testing 980 adult learners of English who had lived in the US and were reasonably proficient in English. The study confirmed, as already discussed, that learners generally knew which of the two formulations was preferable, but nonetheless tended to be more lenient than native English speakers when asked to rate unconventional sentences. In an attempt to better align learners’ judgments of unacceptable sentences with native speakers’, one study in Tachihara & Goldberg (2020) exposed adult learners to a conventional formulation immediately before being asked to rate the unconventional paraphrase. The single exposure had no effect on learners’ acceptability ratings. In accounting for adult learners’ challenge in fully appreciating unconventional language as unacceptable, the question of possible interference or “transfer” from learners’ more dominant language naturally arises. Yet experienced language teachers have long suspected that the production of
unconventional expressions (like those in (8a)-(13a)) does not result from transfer effects (Borg, 2003). An analysis in Tachihara & Goldberg (2020) confirms that transfer effects cannot be wholly responsible for the higher tolerance of unconventional sentences by adult learners. In particular, two constructions of similar complexity were examined in a group of adult Spanish speakers learning English: the double object construction and the clausal complement construction. Spanish does not share the same double-object construction as English does, but the clausal complement constructions in the two languages are quite parallel. Verbs that are translational equivalents are equally (un)acceptable with tensed clausal complements. For instance, Dan obligó que Helen juegue is the word-for-word translation of Dan forced that Helen play tennis, and both sentences are unacceptable. If judgments on English sentences were influenced by transfer from participants’ native Spanish, we would expect more nativelike judgments on the construction that was parallel in the two languages in comparison to the construction that differed. Yet that was not what was found: Spanish-speaking learners of English judged unconventional instances of both constructions more leniently than native English speakers did, and to the same degree. In the current work, the target language being learned is Spanish. We explore the possibility that adult language learners can benefit from statistical preemption if exposure to conventional formulations is repeated over a series of days. The motivation for including multiple days of exposure stems not only from a lack of evidence for an immediate effect of statistical preemption in prior work, but also from positive findings in word learning. Gaskell & Dumay (2003), for instance, taught adults novel words that partially overlapped formally with existing words (e.g., cathedruke, which overlaps with cathedral).
After participants learned the novel words, the effect of phonological competition was measured using a lexical decision task. In this paradigm, longer reaction times for trials containing familiar words (e.g., cathedral) are interpreted as resulting from lexical competition from the newly learned words (e.g., cathedruke). Participants recognized the novel words immediately after exposure, but the anticipated competition effect did not emerge until after the 4th day of repeated exposure. This, along with other studies comparing immediate and delayed testing, suggests that lexical competition requires a period of consolidation during sleep (Dumay & Gaskell 2007; Gais, Lucas, & Born, 2006; Lindsay & Gaskell 2010; Mattys & Clark 2002). While phonological competition in a lexical decision task differs from the sentence-level competition in a judgment task, we hypothesize that the competition involves the same or related processes. Therefore, we use repeated exposure over three days in order to test for an effect of competition between conventional and unconventional formulations. Because the strength of memory depends on experience, one needs sufficient experience to successfully access a memory in context. Students who are just learning a new language may have had too little experience to activate competing formulations upon witnessing a particular utterance. On the other hand, once learners are highly proficient, we can expect them to behave similarly to native speakers (Navarro-Torres et al., to appear; Robenalt & Goldberg, 2016; Tachihara & Goldberg, 2020). Thus, we predict that intermediate speakers, who have had enough experience to activate sentence formulations, but not enough experience to necessarily recognize competitive relationships, should benefit the most from exposure. In addition to the acceptability rating task, we included a self-paced reading task as an implicit, online measure of sentence processing.
Self-paced reading can reveal whether or not participants are detecting anomalies in real time as they read each word (Jegerski, 2014). The task has been used with native speakers and language learners in order to assess their implicit understanding of sentence structure (for reviews see Clahsen & Felser, 2006; VanPatten & Jegerski,
2010). We include both explicit ratings and implicit reading time measures because exposure may affect only implicit knowledge. Or the opposite may be true: learners may be able to make explicit judgments before implicit measures are influenced. The language learners in the current study were all recruited from Spanish classrooms, so they may be especially interested in gaining explicit knowledge (Larsen-Freeman, 2000). Thus, by including both explicit ratings and the more implicit measure of reading times, we can explore how explicit judgments and implicit online processing may differ. To summarize, the current work investigates how adults come to appreciate what is unconventional in the language they are currently learning: Spanish. In what follows, two studies set the stage for our primary manipulation. To foreshadow results, Experiment 1 confirms that learners differ more from native speakers on judgments of unconventional sentences than on conventional sentences for 5 different construction types in the target language of Spanish. We then report a reading-time study (Experiment 2) that confirms that the stimuli we assume to be unacceptable in Spanish evoke the expected slow-down in reading times for native Spanish speakers. Experiment 3 is our key manipulation. Groups of undergraduates learning classroom Spanish are provided with exposure to one of two sets of fully conventional Spanish sentences on each of 3 days. On the fourth day, we compare acceptability ratings and reading times on unconventional paraphrases in order to see whether exposure to conventional sentences impacts the Spanish learners’ ratings or reading times. We find an effect of exposure in the judgment data, albeit not in reading times. As predicted, the effect on judgments is concentrated among Spanish learners at the intermediate level.
We end with a discussion of the difference between judgments and reading times, the effect of proficiency, the effect of sleep, the importance of generalizability, and suggestions for future studies.
Preregistration
We preregistered the current studies on AsPredicted.org before data collection and included our hypotheses, the dependent measures, the data collection process with restrictions on participants and intended sample sizes, and all statistical analyses unless specified as exploratory. Deviations from the preregistration are described and explained in the text with additional references to supporting information (Experiment 1, https://aspredicted.org/J9V_TCN; Experiment 2, https://aspredicted.org/CJB_BSN; Experiment 3, https://aspredicted.org/KJF_KGC).
Experiment 1
Experiment 1 tests whether native speakers and language learners of Spanish differ systematically in their acceptability ratings of conventional and unconventional sentences, as has been previously reported for native speakers and language learners of English (Robenalt & Goldberg, 2016; Tachihara & Goldberg, 2020).
Method
Participants
Seventy native Spanish speakers living in Spanish-speaking countries and 70 Spanish learners in the US were recruited online through Cloud Research (Litman, Robinson, & Abberbock, 2017). For the second group, participants responded that their “native primary language is English” and “English is my first language” and rated their proficiency in Spanish to be less than 85 on a scale of 0-100.
Procedure
The consent form was written in English for English-speaking learners of Spanish, and in Spanish for native Spanish speakers. Then a message explained that all following instructions and questions would be in Spanish. Participants were instructed to exit the survey without penalty if they did not know Spanish. Participants rated the acceptability of each sentence on a gradient scale from 0 to 100 (100 being fully acceptable). The order of sentence presentation was randomized for each participant. Participants were provided with two examples to clarify the task: one unacceptable sentence (A mí me gusto la película, assigned a low rating) and one acceptable sentence (Yo vivo aquí, assigned a high rating).
Stimuli
We created 42 sentences that included 5 types of variation in the constructions used: copula choice, adjective position, grammatical gender, the double object construction, and the clausal complement construction (see Table 1). For each construction, half of the sentence stimuli were conventional sentences and half were unconventional sentences. The first four distinctions are unique to Spanish while the last is parallel in English and Spanish. Eight sentences were created for each of the first four constructions, and 10 for the last. The complete list of stimuli is available in Supporting information 1.
Table 1. Construction types and sample stimuli used in Experiments 1-3
Construction type | Unconventional formulation | Conventional formulation | Translation into English
1. ser vs. estar | ?La estación de tren es en esta calle. | La estación de tren está en esta calle. | “The train station is on this street.”
2. pre vs post nominal adjectives | ?El viejo hermano de Lola es guapo. | El hermano viejo de Lola es guapo. | “Lola’s older brother is handsome.”
3. el vs. la | ?Usamos la mapa para encontrar la casa. | Usamos el mapa para encontrar la casa. | “We use the map to find the house.”
4. double object | ?Estella envió su madre una carta. | Estella le envió a su madre una carta. | “Estella sent her mom a letter.”
5. que vs. a | ?Rafael obligó que ellos irían al cine. | Rafael obligó a ellos a ir al cine. | “Rafael forced them to go to the movie.”
Results
Figure 1 displays the descriptive results. As is evident, both native speakers and adult language learners recognize that conventional sentences are more acceptable than unconventional sentences. This is confirmed with a linear mixed effects model using judgment scores as the outcome variable, conventionality as the fixed effect, and the maximal converging random effect structure, in this case random intercepts and slopes for subjects and random intercepts for items (β = 8.38, t = 4.36, p < 0.001). At the same time, the difference between native speakers and learners is larger for unacceptable sentences than it is for acceptable sentences. Specifically, a linear mixed effects
model was fit to the data with judgment scores as the outcome and Speaker_group and Conventionality as fixed interacting effects. Random slopes and intercepts were included for subjects and items. Results show a main effect of Speaker_group (β = -25.94, t = -6.08, p < 0.0001), a main effect of Conventionality (β = 8.38, t = 7.40, p = 0.003), and most importantly, the predicted interaction (β = 36.45, t=7.40, p
random intercepts and slopes for subject and item. We find a main effect of Speaker group (β = -26.77, t = -6.29, p < 0.001) but no main effect of construction (β = 7.48, t = 1.38, p = 0.22) and no interacting effect of speaker and construction (β = -0.37, t = -0.09, p = 0.93). This means that the discrepancy in judgments between language learners and native speakers for the double object construction did not differ from that of the clausal complement. Language learners found it just as challenging to detect unacceptability in a construction that behaved similarly to their dominant language as they did for a construction that behaved differently. This suggests that transfer is not wholly responsible for the higher tolerance of unconventional sentences by language learners. Finally, unsurprisingly, we find a negative correlation between judgment scores and self-rated proficiency of language learners, with more proficient learners aligning more with native speakers (r = -0.11, p < 0.001).
Experiment 2
A self-paced reading time study was conducted with native Spanish speakers to confirm that the experimental procedure and the stimuli would work as expected. Based on prior work with self-paced reading, we expected that participants would show a slow-down for unconventional sentences compared to conventional sentences, demonstrating that they are able to detect unacceptability during online comprehension (Jegerski, 2014).
Methods
Participants
One hundred native Spanish speakers living in Spanish-speaking countries were recruited through CloudResearch, and all are included in the analysis.
Stimuli
The stimuli consisted of a total of 20 unconventional sentences, 20 conventional sentences, and 20 conventional filler sentences. The target sentences were based on the same 5 constructions from Experiment 1. To shorten the length of the experiment, each participant read half of the target sentences, with the other half counterbalanced across participants.
Thus, each participant saw 10 unconventional sentences, 10 conventional sentences, and 20 conventional filler sentences. Note that unconventional sentences made up only 25% of the stimuli in an effort to mitigate participants’ expectation of reading unconventional sentences during the task. Some conventional filler sentences were followed by yes-or-no comprehension questions about the content of the sentence, used to encourage and assess participants’ attention. The target sentences, filler sentences, and 13 comprehension questions appeared in randomized lists. For each sentence, a target region was identified prior to data collection. This included the first word at which one can detect that an unconventional sentence is unacceptable and the next two words to allow for possible spillover effects. The target regions for conventional sentences and unconventional sentences were matched as closely as possible to facilitate comparison. For example, given the unconventional sentence, La estación de tren es en esta calle, the target region was es + en + esta. For the conventional sentence, La estación de tren está en esta calle, the target region was está + en + esta. The target region never included the first or the last word in the sentence. All stimuli and target regions are available in Supporting information 5.
Procedure
We created the cumulative self-paced reading task (Just, Carpenter & Woolley, 1982) in Inquisit 6. Words appeared one at a time as participants pressed the space bar. The words remained on the screen until the end of the trial, such that the whole sentence was visible at the last word. When each sentence ended, participants clicked a button that appeared in the bottom right corner of the screen. To familiarize participants with self-paced reading, the first 25% of sentences were filler sentences.
Results
As expected, native speakers display a small but significant slow-down in the target region of unconventional sentences. A linear mixed-effects model with the log sum of reading time over the target region as the outcome and conventionality as the fixed effect was fit to the data. Random intercepts for subjects and items were included.² Native speakers are slower to read the target region in unconventional sentences compared to conventional sentences (β = -0.13, t = -8.15, p < 0.0001) (Figure 2). Thus, native speakers can detect (un)acceptability during online sentence comprehension for this set of sentences using a self-paced reading paradigm.
Figure 2: Mean sum of reaction time over target region for unconventional and conventional sentences. Error bars represent standard error. Circles represent the mean score for each participant.
Footnote 2. Our preregistration specified analysis of only the first word of the target region instead of the sum of the target region, and the use of a maximal fitting model. The first-word analysis also revealed a significant effect of conventionality for native speakers (β = -0.04, t = -2.39, p = 0.02). We report the longer time window to maximize the possibility of finding a slow-down among the language learners. See Supporting information 2 for the original analysis and an explanation of the random effects structure.
Experiment 3
To find out whether Spanish learners show heightened sensitivity to unacceptable target sentences via statistical preemption, we provided them with multiple days of exposure to conventional (acceptable) paraphrases of the target sentences. Specifically, conventional sentences are provided over 3 days to determine whether this leads to better recognition that unconventional sentences are unacceptable. The same judgment task from Experiment 1 and the self-paced reading task from Experiment 2 were repeated with participants in Experiment 3.
Methods
Participants
We preregistered a plan to analyze the data of 100 participants enrolled in Spanish classes at Princeton University. The recruitment of classroom learners served 2 objectives: participants were active learners of Spanish and their course level was known, providing an objective measure of proficiency. We recruited participants from all levels of Spanish instruction classes. A total of 128 participants took part in the study during 2 semesters of recruiting; however, only 73 completed the critical final assessment. All participants were native English speakers or highly proficient English speakers (all native languages are provided in Supporting information 3). Two participants were excluded because they indicated that their native language included Spanish and they rated their Spanish proficiency to be at ceiling. Ten additional participants were excluded for not passing the preregistered 75% threshold on comprehension questions. Participants were classified as Beginner, Intermediate, or Advanced, according to the placement test created and scored by the Spanish Department (specific tracks for the Spanish classes are provided in Supporting information 4). The breakdown of the final 61 participants by level is listed in Table 2. Their data are analyzed here.
Table 2: Participants’ proficiency grouped by class level.
Proficiency N Beginner (SPA 101, 102, 103) 19 Intermediate (SPA 105, 107) 19 Advanced (SPA 108, 200, 300) 23 Total 61 Procedure The experiment was administered on six days within an 8-day window (Table 3). During a pretest, participants registered for the experiment and responded to a questionnaire about their language backgrounds. During the initial and final assessments (days 2 and 6), participants read a combination of conventional and unconventional sentences in a self-paced reading task and completed a judgment task on the same set of sentences. During the intervening three days of exposure (days 3-4-5), participants read only conventional sentences, also using a self-paced reading task. All sessions except the pretest questionnaire included comprehension questions, which were used to encourage and assess participants’ attention. As mentioned, comprehension questions served as the preregistered exclusion criterion (75% accuracy required). In an additional effort to engage participants, we included 16 non-linguistic encouragement gifs (e.g., Jennifer Lopez clapping) which appeared at random intervals throughout the tasks. Table 3. Summary of experiment tasks with example stimuli provided on each day. 11
Day    Phase             Task                           Example stimuli
Day 1  Pretest           Questionnaire
Day 2  First assessment  Self-paced reading & Judgment  ?La estación de tren es en esta calle.
                                                        El baño está en ese piso.
Day 3  Exposure          Self-paced reading             La estación de tren está en esta calle.
Day 4  Exposure          Self-paced reading             La tienda que le gusta a Daria está en ese bloque.
Day 5  Exposure          Self-paced reading             El taxí está en la calle incorrecta.
Day 6  Final assessment  Self-paced reading & Judgment  ?La estación de tren es en esta calle.
                                                        El baño está en ese piso.
                                                        ?La iglesia es en camino a la escuela.

We divided participants into two subgroups. On days 3-4-5, one group of participants was exposed to ser vs. estar and to prenominal vs. postnominal adjectives. The other group was exposed to el vs. la and to que complements vs. a complements. This design allows us to compare the effect of exposure on particular constructions between subgroups while controlling for the delay between initial and final assessments (see footnote 3). All stimuli can be found in SI 5.

Assessments (Days 2 and 6)

The format of the judgment task was the same as used in Experiment 1, and the format of the self-paced reading task was the same as used in Experiment 2. The initial assessment consisted of 20 unconventional sentences, 20 conventional sentences, and 40 conventional filler sentences. Four unconventional sentences and four conventional sentences were double object constructions; these appeared only in the initial assessment because that construction was not part of the manipulation. The final assessment included 16 additional novel unconventional sentences based on the same construction types but including different words. This was done to determine whether any effect of exposure would generalize beyond the particulars of the sentences witnessed. Thus, the final assessment consisted of 32 unconventional sentences, 16 conventional sentences, and 64 conventional filler sentences.
Unconventional sentences made up 25% of the stimuli, much like in Experiment 2, in order to mitigate participants’ expectations of reading an unconventional sentence. Due to experimenter error, the judgment task at the final assessment consisted of a random subset of 40 unconventional and conventional sentences instead of 48. As in Experiment 2, filler sentences made up the first 25% of the task, so that participants could become familiar with self-paced reading. For the rest of the task, the order of sentences was randomized for each participant.

Exposure phase (Days 3-4-5)

During the 3 days of exposure, participants only read conventional sentences. Each day they witnessed 8 conventional sentences and 8 conventional filler sentences. On the first day of exposure, the conventional sentences directly competed with unconventional sentences in the assessment: the conventional sentences included the same verbs and noun phrases as the unconventional paraphrases learners had to read and rate. On the following two days of exposure, participants read different conventional sentences of the same construction types (these conventional sentences included distinct verbs and noun phrases). Recall Table 3 for an overview of the types of sentences participants read each day.

Footnote 3: We excluded the double object construction from exposure and final assessment in order to have an even number of constructions and reduce the amount of time of the experiment.
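The counts reported in the Methods can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable and function names are introduced here for illustration and are not part of the study materials.

```python
# Participant flow, using only counts reported in the Methods.
completed_final = 73            # of 128 who took part
excluded_native_spanish = 2     # reported Spanish as a native language
excluded_comprehension = 10     # below the preregistered 75% threshold

final_n = completed_final - excluded_native_spanish - excluded_comprehension
by_level = {"Beginner": 19, "Intermediate": 19, "Advanced": 23}  # Table 2

# Stimulus counts at the initial assessment.
initial = {"unconventional": 20, "conventional": 20, "filler": 40}
double_object = {"unconventional": 4, "conventional": 4}  # initial assessment only
new_unconventional = 16                                   # added at final assessment

# Final assessment: drop the double object items, add the new unconventional items.
final = {
    "unconventional": initial["unconventional"]
    - double_object["unconventional"]
    + new_unconventional,
    "conventional": initial["conventional"] - double_object["conventional"],
    "filler": 64,
}


def share(counts, key):
    """Proportion of all sentences in `counts` that belong to category `key`."""
    return counts[key] / sum(counts.values())


# Number of non-filler items the final judgment task was meant to contain.
intended_judgment_items = final["unconventional"] + final["conventional"]

print(final_n, sum(by_level.values()))
print(final, intended_judgment_items)
print(share(initial, "unconventional"), share(final, "unconventional"))
```

Under these counts the final sample matches the Table 2 total, and the final assessment composition matches the 32/16/64 split given in the text. The `share` helper gives the proportion of unconventional sentences at either assessment.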
Results

Judgment results at initial assessment

We first tested whether learners’ ratings distinguished conventional from unconventional sentences at the initial assessment. As expected, they did, replicating the finding in Experiment 1 as well as prior work on English (Robenalt & Goldberg, 2016; Tachihara & Goldberg, 2020). That is, participants knew Spanish well enough to assign higher acceptability ratings to conventional than to unconventional sentences. Specifically, a linear mixed model with random intercepts for subjects and items confirms that conventionality predicted acceptability judgments for the learners of Spanish, both at the initial assessment (β = 24.15, t = 7.41, p < 0.0001) and at the final assessment (β = 23.65, t = 6.74, p < 0.0001).

To examine the role of proficiency, we first analyze acceptability scores at the initial assessment as a function of class level. We ran a mixed-effects model with conventionality and class as interacting fixed effects and random intercepts for subjects and items. We found a significant interaction, meaning that as proficiency increases, the difference in judgment scores between conventional and unconventional sentences also increases (β = 4.59, t = 6.69, p < 0.0001). Figure 3 displays each class from lower to higher proficiency. As is visible, the effect was driven by the unconventional sentences (orange bars); the same model confirms a significant effect of class on the unconventional sentences, but not on the conventional sentences (unconventional: β = -3.61, t = -3.93, p = 0.0002; conventional: β = 1.07, t = 1.65, p = 0.11). In other words, as proficiency increases, judgments for unconventional sentences decrease while judgments on conventional sentences remain largely unchanged.

Figure 3. Mean acceptability scores for unconventional and conventional sentences by class levels in increasing proficiency from left to right. Error bars represent standard error. Circles represent the mean score for each participant.
Judgment results at final assessment

Our main aim is to investigate whether exposure to conventional paraphrases impacts judgments on unconventional sentences. This would constitute evidence of statistical preemption among L2 learners of Spanish. Results confirm just this: exposure to conventional sentences lowered learners’ subsequent ratings of unconventional paraphrases at the final assessment. Recall that the set of constructions witnessed was counterbalanced across participants, so that we could directly test for the effect of exposure while controlling for the delay and general familiarity with the task. We ran a linear mixed-effects model with judgment scores in the final assessment as the outcome and exposure as the fixed effect, including random intercepts for subjects and items. Exposure significantly impacted judgments: participants gave lower ratings to unconventional paraphrases after repeatedly reading the conventional sentences (β = -3.88, t = -2.79, p = 0.0053). In other words, we find evidence of learning through statistical preemption in language learners: reading conventional sentences led participants to rate unconventional paraphrases as appropriately less acceptable. To make sure that the effect was not driven by a single construction, we ran the model with an added random effect of construction type in an exploratory analysis and again found an effect of exposure (β = -3.90, t = -2.81, p = 0.005). There was no significant influence of exposure on conventional sentences (β = 0.29, t = 0.16, p = 0.87).

Recall that we had preregistered an expectation that the effect of statistical preemption would be strongest for the intermediate-level learners. Indeed, the effect of exposure is significant only for intermediate-level learners of Spanish (β = -6.25, t = -3.05, p = 0.002); it is not significant for beginner-level learners (β = 0.68, t = 0.34, p = 0.74) or advanced-level learners (β = -2.11, t = -1.16, p = 0.25).
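The exposure model described above can be written out explicitly. The following is a sketch of a standard crossed random-intercepts specification that matches the verbal description; the symbols and the 0/1 coding of exposure are notation assumed here and are not taken from the preregistration or analysis scripts.

```latex
y_{si} = \beta_0 + \beta_1\,\mathrm{Exposure}_{si} + u_s + w_i + \varepsilon_{si},
\qquad
u_s \sim \mathcal{N}(0, \sigma_u^2),\quad
w_i \sim \mathcal{N}(0, \sigma_w^2),\quad
\varepsilon_{si} \sim \mathcal{N}(0, \sigma^2)
```

Here y_{si} is the final-assessment rating that subject s gave to unconventional item i, Exposure_{si} is assumed to be 1 when subject s had read conventional sentences of the construction that item i belongs to and 0 otherwise, u_s is the subject intercept, and w_i is the item intercept. Under this coding, a negative β_1 corresponds to lower ratings for unconventional sentences after exposure.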
This is displayed in Figure 4, which shows judgments for unconventional sentences only, by course level and by whether participants were exposed to the corresponding conventional formulations or not. Of interest is whether unconventional sentences with exposure to the corresponding conventional formulations are rated lower than unconventional sentences without such exposure. As illustrated in Figure 4, they were, particularly for intermediate-level speakers, as hypothesized.

Figure 4. Mean acceptability scores for unconventional sentences with and without exposure to the conventional alternative by class levels in increasing proficiency from left to right. Error bars represent standard error. Circles represent the mean score for each participant.

Recall that in the final assessments, half of the unconventional sentences that participants rated had appeared in the initial assessment (e.g., ?La estación de tren es en esta calle), and were very close paraphrases of the specific conventional sentences the same participants had witnessed on the first day of exposure (La estación de tren está en esta calle). The other half of the unconventional sentences in the final assessment were entirely new but involved the same constructions (?La iglesia es en camino a la escuela). This allows us to investigate how generally statistical preemption applied: was the effect restricted narrowly to the specific content witnessed during the exposure, or did it apply to the new unconventional sentences that shared the same type of conventional paraphrase? To test this, we compared judgment scores for repeated and new unconventional sentences to see whether the effect of statistical preemption generalized beyond specific utterances. An exploratory mixed-effects model with judgment scores on the unconventional sentences in the final assessment as our outcome and repeated vs. new as the fixed effect was fit to the data, with random intercepts for subjects and items. We find that scores on the repeated and new unconventional sentences were not distinct (β = -2.12, t = -0.61, p = 0.55), suggesting that participants generalized the preemptive exposure beyond the specific sentences they had witnessed.

We had preregistered a plan to analyze the effect of exposure on the difference in judgment scores between initial and final assessments for unconventional sentences.
The change in judgment score was calculated by subtracting the initial assessment score from the final assessment score for each item. Because half of the items in the final assessment were new items that were not tested in the initial assessment, only the items that appeared in both the initial and the final assessment were included in the analysis of change in judgment score. We ran a mixed-effects model with change in judgment score as the outcome and exposure as the fixed effect on the repeated unconventional sentences. Random intercepts for subjects and items were included. We did not find a significant effect of exposure in this case (β = -2.65, t = -1.19, p = 0.23). This suggests not only that new sentences are comparable to repeated unconventional sentences, but also that the new sentences are necessary for the analysis to have sufficient power to detect a significant effect of exposure.

Self-paced reading task

We found no evidence that language learners slowed down during the key window when reading unconventional sentences. Specifically, we ran the same mixed-effects model that was used to demonstrate a slow-down among native Spanish speakers on the same unconventional sentences in Experiment 2: log sum of reading time over the target region as our outcome, conventionality as the fixed effect, and random intercepts for subjects and items. Adult learners, however, showed no slow-down either at the initial assessment (β = 0.003, t = 0.031, p = 0.98) or at the final assessment (β = 0.019, t = 0.35, p = 0.73) (see SI 6). The same null effect was found when exposure was taken into account: we ran a mixed-effects model with log sum of reading time over the target region as our outcome and exposure as the fixed effect on the unconventional sentences from the final assessment, with random intercepts for subjects and items included.
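The two derived measures used in these analyses, the per-item change in judgment score and the log summed reading time over the target region, can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the item IDs, ratings, reading times, and region indices are hypothetical, and the natural logarithm is an assumption, since the text says only "log".

```python
import math


def change_scores(initial, final):
    """Final minus initial rating for items rated at both assessments.

    `initial` and `final` map item IDs to ratings. Items present in only one
    assessment (for example, new items added at the final assessment) are skipped.
    """
    return {item: final[item] - initial[item] for item in initial if item in final}


def log_sum_rt(word_rts_ms, region):
    """Log of the summed reading time over a target region.

    `word_rts_ms` is a list of per-word reading times in milliseconds and
    `region` is a (start, end) pair of word positions, end exclusive.
    Natural log is assumed here.
    """
    start, end = region
    return math.log(sum(word_rts_ms[start:end]))


# Hypothetical ratings from one participant (item IDs are invented).
initial_ratings = {"ser_01": 62, "ser_02": 55, "dobj_01": 40}
final_ratings = {"ser_01": 50, "ser_02": 58, "ser_new_01": 45}
deltas = change_scores(initial_ratings, final_ratings)

# Hypothetical per-word reading times; words 3 and 4 form the target region.
rts = [310, 295, 420, 510, 480, 330]
target = log_sum_rt(rts, (3, 5))

print(deltas)
print(target)
```

Only items rated at both assessments receive a change score, which mirrors the restriction to repeated items described above.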
Participants did not slow down when reading unconventional sentences after being exposed to conventional paraphrases (β = -0.21, t = -1.52, p = 0.13) (SI 7). The accuracy on the comprehension questions for these sentences was high (M = 94.83%), indicating that participants were paying attention and understood the sentences. Additional analyses for self-paced reading can be found in Supporting information 8 & 9.

Discussion

Language learners have specific difficulty identifying unconventional sentences as unacceptable, providing a likely reason that they commonly continue to produce sentences that native speakers strongly judge to be unacceptable, even at relatively high proficiency levels (Bley-Vroman & Joo, 2001; Bley-Vroman & Yoshinaga, 1992; Hubbard & Hix, 1988; Inagaki, 1997; Martinez-Garcia & Wulff, 2012; Oh, 2010). In Experiment 1, Spanish learners displayed a larger discrepancy from native speakers on ratings of unconventional sentences than on ratings of conventional sentences. We saw that transfer from participants’ dominant language (English) is unlikely to be responsible, since learners showed the same pattern regardless of whether the acceptability pattern was the same in English and Spanish. Experiment 2 confirmed, as expected, that native Spanish speakers recognize unconventional sentences immediately upon encountering them, as they slowed down in the target region during the self-paced reading task.

In Experiment 3, our goal was to determine whether statistical preemption is operative in adult learners. Indirect support that adult learners can make use of statistical preemption is the simple fact that awareness of unacceptable sentences in Spanish increases with proficiency (recall Figure 3). Whereas previous results likewise indicate that L2 learners at the highest proficiency judge unacceptable sentences much like native speakers do, presumably by learning through statistical preemption, the current work is the first that we know of that manipulates exposure to statistical preemption across proficiency levels and finds evidence of statistical preemption.
Specifically, results on the judgment task demonstrate that repeated exposure to conventional sentences helps language learners more fully appreciate that unconventional paraphrases are markedly unacceptable, particularly at the intermediate level, as predicted. Since judgments were compared after exposure as a function of which set of constructions participants had been exposed to, we can be confident that the exposure to conventional sentences led learners to ultimately rate unconventional sentences as less acceptable.

By contrast, the results from the self-paced reading time measure are incongruent with those of the judgment task. Recall that even at the initial assessment, the judgment data revealed that undergraduate learners already recognized some distinction between conventional and unconventional sentences. And we have seen that exposure led to an even greater distinction between conventional and unconventional sentences. Yet in the self-paced reading task, learners showed no slow-down during the initial or final assessment, nor any effect of exposure. That is, comparisons of reading times for conventional and unconventional sentences reveal no evidence that the undergraduate learners detected any difference between them, despite judgment evidence that they did.

The fact that judgment data reveal evidence of a growing awareness of unacceptability while reading times do not suggests that adult learners, at the levels of proficiency included here, do not detect unacceptability in real time but require the extra time afforded in the off-line judgment task. If detecting unacceptability requires speakers to access an acceptable paraphrase, we speculate that the lack of a slow-down during online reading may reflect a relative inaccessibility of conventional paraphrases during real-time processing.
That is, if it takes learners more time to access a conventional paraphrase than it does native speakers, they may only notice that a sentence is unacceptable after it has been completed. Studies using electroencephalography (EEG) to measure brain responses of L2 speakers show that language learners sometimes show a reduced event-related potential (ERP) when reading anomalous sentences compared to native speakers (Rossi et al., 2006; Foucart & Frenck-Mestre, 2012). These effects depend on the proficiency and/or age of acquisition of the participants. These patterns of results are consistent with the idea that the accessibility of a conventional paraphrase determines whether unacceptability is detected during online comprehension.

Alternatively, it may be that recognition of unacceptability requires an explicit task, such as a judgment task. Because the participants in this study were recruited from language instruction classes and were actively trying to learn Spanish, there was likely a conscious effort to learn the (un)acceptability of sentences. Indeed, many of the constructions used in this task, such as ser vs. estar, are explicitly taught in Spanish classes. It is important to note, however, that while the judgment task itself was explicit, no feedback was provided at any point in the experiment. The learning evident in the judgment task occurred without explicit feedback, recasts, or corrections. This means that language learners are able to learn to judge unacceptability through statistical preemption, a type of indirect negative evidence, but the extent to which unacceptability is recognized in real time remains unclear.

We also find that learning through statistical preemption depended on the proficiency level of the participant. Specifically, only intermediate-level speakers showed an effect of exposure when rating the acceptability of unconventional sentences. Successful learning through statistical preemption requires competition between conventional and unconventional sentences. A learner must activate sentence structures that differ from what they are currently reading for competition to take place. The amount of experience with the language can affect how successfully one can activate sentence structures.
Past literature has shown that language learners appear to be less likely than native speakers to predict upcoming forms (Grüter, Hurtado, Marchman, & Fernald, 2014; Ito, Martin, & Nieuwland, 2017; Kaan, 2014; Kaan, Dallas, & Wijnen, 2010; Kaan, Kirkham, & Wijnen, 2016; Lew-Williams & Fernald, 2010; Martin et al., 2013). This suggests that they would also be less likely to activate sentence formulations that are not in front of them. Because beginning-level students have had little experience with the language, they likely have trouble activating sentence formulations and learning unacceptability from competition. Intermediate-level learners, who are likely to have had enough experience to activate sentence formulations, were the ones to benefit from exposure to conventional sentences, as we had predicted. Advanced students were already more closely aligned with native speakers, so there was less room for the exposure to show an effect. Thus, learning unacceptability through a competition mechanism like statistical preemption may require an optimal level of sentence activation.

Limitations

The current study was designed to be relatively easy to complete and to take a short amount of time. This was done in an effort to recruit as many students as possible across different proficiency levels. One limitation was that we were not able to recruit our target sample size of 100, although we were able to recruit roughly equally across proficiency levels. Because the recruitment periods occurred over two semesters during the COVID-19 pandemic, when classes were only offered online, it is possible that students were more reluctant to participate in an online study. The short format of the study also meant that we were limited in the number of sentences in our stimuli. Future studies with a larger sample and a larger set of sentences would be beneficial because they would allow for additional analyses comparing proficiency and sentence types.
We designed our current study such that repeated exposure occurred over days to allow for memory consolidation to take place during sleep (Dumay & Gaskell 2007; Gais, Lucas, & Born, 2006; Lindsay & Gaskell 2010; Mattys & Clark 2002). The lack of an effect of statistical preemption reported by Tachihara & Goldberg (2020) may be due to the fact that only a single conventional sentence was provided for each unconventional sentence, or to the fact that the study was conducted in a single session, so participants had no opportunity to sleep. The current design cannot disentangle the effects of repeated exposure and sleep, but future work should investigate each independently.

Generalizability

Another difference between the single exposure in Tachihara & Goldberg (2020) and the current study is the variability of conventional sentences used during exposure. Because there were three separate exposure sessions, learners received exposure to three different conventional sentences of the same construction. It is possible that this variability helped augment the effect of exposure. We found that learners are able to generalize within the same construction, since they rated unconventional sentences as unacceptable regardless of whether they were repeated at the initial and final assessments or were new sentences. Variability of sentences could increase generalizability and make statistical preemption more effective in learning unacceptability. In other words, it may be not just that learners can generalize within the same construction, but that they need the variability in the input for statistical preemption to be effective.

Generalizability is important in the practical use of statistical preemption as a pedagogical tool. If learners were only able to reduce acceptability for sentences for which they had read the exact paraphrase, statistical preemption would not be a viable tool for learning. However, since learners are able to generalize unacceptability, repeated exposure to the conventional construction is sufficient to lead to knowledge about the unacceptability of the unconventional construction.
Transfer effect

Transfer effects are a commonly assumed cause of mistakes in language learners. However, findings from this paper and recent evidence suggest that transfer effects are not wholly responsible for higher tolerance for unconventional sentences in language learners. Specifically, we tested two constructions of similar complexity: the double object construction, which behaves differently in English and Spanish, and the clausal complement construction, which behaves in the same way in English and Spanish. For the clausal complement construction, this means that the same verbs that are novel and unacceptable in Spanish are also unacceptable in English. Dan obligó que Helen juegue, the literal translation of Dan forced that Helen play tennis, is unacceptable in both languages. Thus, if learners were using knowledge of their native language, English, and transferring it onto the new language, Spanish, they should show a higher tolerance for the double object construction, but not for the clausal complement construction. Yet there was no difference between constructions, supporting the idea that transfer effects are not the cause of higher tolerance. In Tachihara & Goldberg (2020), we asked native Spanish speakers who spoke English as a nonnative or less dominant language to judge English sentences in the double object construction and the clausal complement construction. We found the same results, with higher tolerance of unacceptable formulations in both constructions and, importantly, no difference between the two constructions. Taken together, we conclude that transfer effects are not the cause of the higher acceptability of unacceptable sentences in language learners. Consistent with this idea, experienced language teachers have long noted that most mistakes are not a result of transfer (Borg, 2003).

Reduced sensitivity to competition