Error annotation in learner corpora: tools and applications in English and Italian - University of ...
1 Error annotation in learner corpora: tools and applications in English and Italian OLGA VINOGRADOVA, NIKITA LOGIN, IVAN TORUBAROV (RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS, MOSCOW) LUCIANA FORTI, STEFANIA SPINA (UNIVERSITY FOR FOREIGNERS PERUGIA, ITALY)
Part 4 2 The annotation of phraseological errors in LOCCLI (Longitudinal Corpus of Chinese Learners of Italian) Stefania Spina & Luciana Forti University for Foreigners of Perugia 13th TaLC Conference 18-21 July 2018 Faculty of Education, University of Cambridge
3 Outline
• Phraseological errors: why they are relevant in SLA and why they are challenging for researchers.
• Description of a scheme for the annotation of Italian phraseological errors in learner texts.
• Annotation lab
  ◦ data on learner errors
  ◦ data on inter-annotator agreement
• Conclusions
4 Relevance of phraseological errors in SLA research
1. Evidence from corpus linguistics & psycholinguistics:
➢ centrality of formulaic units in language acquisition, processing and use (Hoey 2005; Siyanova-Chanturia 2013; Taylor 2012; Wray 2013)
2. Evidence from learner corpus research:
➢ formulaic units as particularly challenging for learners even at higher proficiency levels (Ellis et al. 2015; Howarth 1998; Laufer & Waldman 2011);
➢ VN collocations particularly challenging (Bestgen & Granger 2014; Nesselhauf 2005; Wang 2016).
5 Relevance of phraseological errors in SLA research
Implications for SLA research and SL/FL pedagogy:
a. unit of observation in empirical research on the development of learner language through time;
b. definition and grading of learning aims: language learning principles devised by Rod Ellis and adopted by the New Zealand Ministry of Education (Ellis, 2005; Maley, 2016).
6 The analysis of phraseological errors in learner language
ISSUES
1. Phraseological units mostly analysed in terms of
a. frequency;
b. strength of association;
c. non-nativelike uses compared to native language uses.
➢ Limited evidence related to their accuracy in context (Spina, forthcoming; Thewissen, 2015)
7 The analysis of phraseological errors in learner language
ISSUES
2. Phraseological errors mostly analysed in cross-sectional studies. Limitations:
a. Most SLA theories are based on how second language learning evolves over time (Gass & Selinker, 2008);
b. Most cross-sectional learner corpora contain data from a single proficiency level (Granger et al. 2009).
➢ Limited evidence related to the analysis of phraseological errors in longitudinal learner corpora (Bestgen & Granger 2014; Qi & Ding 2011; Siyanova-Chanturia 2015; Siyanova-Chanturia & Spina in preparation; Yoon 2016)
8 The analysis of phraseological errors in learner language
ISSUES
3. Difficulty in classifying errors and automatically annotating large learner corpora.
➢ Limited evidence related to the agreement between annotators with different degrees of expertise, and between different error categories.
9 Filling in gaps
Creation and annotation of the Longitudinal Corpus of Chinese Learners of Italian (LOCCLI):
➢ represents a language with limited LCR evidence (Italian);
➢ covers a 6-month time span;
➢ includes 3 different proficiency levels;
➢ contains error annotation for different categories of collocations.
10 Error annotation scheme
A) Lexical errors
1. Word replacement
2. Non-existing combination
3. Existing combination with different meaning
B) Grammatical errors
4. Determiner
5. Modifier
6. Agreement
7. Number
Subtypes: Addition, Omission, Choice, Position
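The scheme above can be written down as a small data model, which helps when annotations are later counted or compared. The sketch below is only an illustration: the class and field names are hypothetical, and the restriction of subtypes to grammatical errors is an assumption read off the slide layout, not something the slide states.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorClass(Enum):
    LEXICAL = "A"
    GRAMMATICAL = "B"


class ErrorType(Enum):
    # (number on the slide, class it belongs to)
    WORD_REPLACEMENT = (1, ErrorClass.LEXICAL)
    NON_EXISTING_COMBINATION = (2, ErrorClass.LEXICAL)
    EXISTING_COMBINATION_DIFFERENT_MEANING = (3, ErrorClass.LEXICAL)
    DETERMINER = (4, ErrorClass.GRAMMATICAL)
    MODIFIER = (5, ErrorClass.GRAMMATICAL)
    AGREEMENT = (6, ErrorClass.GRAMMATICAL)
    NUMBER = (7, ErrorClass.GRAMMATICAL)

    @property
    def number(self) -> int:
        return self.value[0]

    @property
    def error_class(self) -> ErrorClass:
        return self.value[1]


class Subtype(Enum):
    ADDITION = "addition"
    OMISSION = "omission"
    CHOICE = "choice"
    POSITION = "position"


@dataclass(frozen=True)
class ErrorAnnotation:
    learner_text: str
    error_type: ErrorType
    target_hypothesis: str
    subtype: Optional[Subtype] = None

    def __post_init__(self) -> None:
        # Assumption: subtypes are only meaningful for grammatical errors.
        if self.subtype is not None and self.error_type.error_class is not ErrorClass.GRAMMATICAL:
            raise ValueError("subtypes are assumed to apply to grammatical errors only")


# Example taken from the determiner (omission) slide.
EXAMPLE = ErrorAnnotation(
    learner_text="abbiamo visitato Musei Vaticani",
    error_type=ErrorType.DETERMINER,
    target_hypothesis="i",
    subtype=Subtype.OMISSION,
)
```

Keeping the class (A or B) attached to each error type means per-class counts can be derived from the annotations instead of being stored separately.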
11-13 Error types: A) Lexical errors
1. Word replacement
e.g. sento molto paura (136116 A) («I have a lot of fear»)
Target hypothesis: ho
14-16 Error types: A) Lexical errors
2. Non-existing word combination
e.g. dopo aver mangiato il pranzo (136139 B) («after having eaten the lunch»)
Target hypothesis: aver pranzato
17-19 Error types: A) Lexical errors
3. Existing combination with different meaning
e.g. mi piace godo questi hobby (136815 B) («I like to enjoy these hobbies»)
Target hypothesis: dedicarmi a questi hobby
20-22 Error types: B) Grammatical errors
4. Determiner (omission)
e.g. abbiamo visitato Musei Vaticani (136736 B) («We visited Vatican Museums»)
Target hypothesis: i
23-25 Error types: B) Grammatical errors
5. Modifier (position)
e.g. Ci sono molte famose opere d’arte (136736 B) («There are many famous works of art»)
Target hypothesis: opere d’arte famose
26-28 Error types: B) Grammatical errors
6. Agreement
e.g. studio lingua italiano (136736 A) («I study the Italian language»)
Target hypothesis: lingua italiana
29-31 Error types: B) Grammatical errors
7. Number
e.g. fare una nuova amicizia (136380 B) («to make a new friendship»)
Target hypothesis: nuove amicizie
32 Annotation lab
• Participants: first-year university students of a Master’s degree in “Teaching Italian as a second language” (University for Foreigners of Perugia)
• The task was carried out in a computer lab, where a PC with an internet connection was available to each student
• Learner texts annotated using Brat
• http://brat.nlplab.org
33 Accessing the corpus
34 Longitudinal Corpus of Chinese Learners of Italian (LOCCLI)
• 350 essays;
• 175 Chinese learners of Italian, two essays per learner (beginning and end of a 6-month course);
• 3 proficiency levels (A1, A2, B1);
• Age: 17-33 years old (mean=20.5, SD=2.7; 105 females)
35 Annotation lab: word combination types
• Three different types of combinations, particularly challenging for learners:
  ◦ Verb+noun (VN) combinations, where the noun is the direct object of the verb
  ◦ Noun+adjective (NADJ)
  ◦ Adjective+noun (ADJN)
• NADJ and ADJN are the two combinations that realise the adjectival modifier grammatical dependency.
36 VN combinations
• The sequence of verb and noun can be interrupted, and its internal order can be inverted in the case of passive constructions:
  ◦ Fare la doccia (“take a shower”)
  ◦ Fare spesso la doccia (“often take a shower”)
  ◦ Fare spesso una lunga e piacevole doccia (“often take a long and nice shower”)
  ◦ La doccia deve essere fatta preferibilmente all’inizio della giornata (“the shower must be taken preferably at the beginning of the day”)
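Because the verb and the noun need not be adjacent, a VN combination is easier to find through a grammatical dependency than through word order. The sketch below assumes hand-written, Universal Dependencies style annotations (`obj`, `nsubj:pass`); the slides do not say which parser or tagset was used for LOCCLI, so the `Token` fields and relation labels are illustrative assumptions.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str
    lemma: str
    upos: str     # coarse part of speech
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


def vn_combinations(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (verb lemma, noun lemma) pairs, ignoring distance and order."""
    by_idx = {t.idx: t for t in tokens}
    pairs = []
    for tok in tokens:
        if tok.upos != "NOUN":
            continue
        head = by_idx.get(tok.head)
        if head is None or head.upos != "VERB":
            continue
        # Direct object in the active voice, passive subject in the passive.
        if tok.deprel in ("obj", "nsubj:pass"):
            pairs.append((head.lemma, tok.lemma))
    return pairs


# "Fare spesso una lunga e piacevole doccia" (hand-annotated for illustration)
ACTIVE = [
    Token(1, "Fare", "fare", "VERB", 0, "root"),
    Token(2, "spesso", "spesso", "ADV", 1, "advmod"),
    Token(3, "una", "uno", "DET", 7, "det"),
    Token(4, "lunga", "lungo", "ADJ", 7, "amod"),
    Token(5, "e", "e", "CCONJ", 6, "cc"),
    Token(6, "piacevole", "piacevole", "ADJ", 4, "conj"),
    Token(7, "doccia", "doccia", "NOUN", 1, "obj"),
]

# "La doccia deve essere fatta" (hand-annotated for illustration)
PASSIVE = [
    Token(1, "La", "il", "DET", 2, "det"),
    Token(2, "doccia", "doccia", "NOUN", 5, "nsubj:pass"),
    Token(3, "deve", "dovere", "AUX", 5, "aux"),
    Token(4, "essere", "essere", "AUX", 5, "aux:pass"),
    Token(5, "fatta", "fare", "VERB", 0, "root"),
]
```

Both the interrupted active sentence and the inverted passive one yield the same pair, fare + doccia, which is the point of working on dependencies.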
37 NADJ and ADJN combinations
• A noun in Italian can be either preceded or followed by one or more adjectives.
• Syntactic and semantic constraints: the adjective follows the noun
  ◦ if it is modified by an adverb (un libro molto interessante “a very interesting book”);
  ◦ if it is modified by a complement (un libro utile per gli studenti “a useful book for the students”);
  ◦ or if it narrows the reference of the noun, defining a subclass of its meaning (ho comprato dei fiori gialli “I bought some yellow flowers”).
• Two possible phraseological sequences:
  ◦ noun + adjective (NADJ): scuola elementare “primary school”
  ◦ adjective + noun (ADJN): bel tempo “nice weather”.
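Given an adjectival modifier dependency, the NADJ/ADJN distinction reduces to the position of the adjective relative to its noun. The following is a minimal sketch under the same kind of assumption as before: tokens are hand-annotated with a Universal Dependencies style `amod` relation, and the `Tok` structure is hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Tok(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str
    upos: str
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str


def adjective_noun_patterns(tokens: List[Tok]) -> List[Tuple[str, str, str]]:
    """Label each adjective-noun pair as ADJN (adjective first) or NADJ."""
    by_idx = {t.idx: t for t in tokens}
    found = []
    for tok in tokens:
        if tok.upos != "ADJ" or tok.deprel != "amod":
            continue
        noun = by_idx.get(tok.head)
        if noun is None or noun.upos != "NOUN":
            continue
        label = "ADJN" if tok.idx < noun.idx else "NADJ"
        found.append((label, tok.form, noun.form))
    return found


# "scuola elementare" and "bel tempo", hand-annotated for illustration
SCUOLA = [
    Tok(1, "scuola", "NOUN", 0, "root"),
    Tok(2, "elementare", "ADJ", 1, "amod"),
]
TEMPO = [
    Tok(1, "bel", "ADJ", 2, "amod"),
    Tok(2, "tempo", "NOUN", 0, "root"),
]
# "un libro molto interessante": the adverb does not break the pair
LIBRO = [
    Tok(1, "un", "DET", 2, "det"),
    Tok(2, "libro", "NOUN", 0, "root"),
    Tok(3, "molto", "ADV", 4, "advmod"),
    Tok(4, "interessante", "ADJ", 2, "amod"),
]
```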
38 Annotation lab: description
• 46 students previously instructed on the annotation scheme
• 23 groups, each formed by two students (one native speaker and one L2 speaker).
• Each student was asked to annotate 20 texts written by 10 different learners and collected at the two collection points: A (beginning of the course) and B (six months later).
39 Annotation lab: the task
• Assign a label with the word combination type, and decide whether the combination is correct or incorrect.
• For incorrect combinations, choose among the error types provided by the annotation scheme.
• Formulate a target hypothesis
• Write a final report
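Brat stores annotations in a standoff `.ann` file next to each text, so the output of the task above can be read back with a few lines of code. The sketch below parses text-bound annotations (`T` lines), attributes (`A` lines) and annotator notes (`#` lines). The label `VN`, the attribute name `ErrorType` and the use of a note for the target hypothesis are assumptions made for illustration; the slides do not show the actual LOCCLI Brat configuration.

```python
from typing import Any, Dict

# Hypothetical .ann content for "sento molto paura"; label and attribute
# names are assumptions, not the project's real configuration.
SAMPLE_ANN = (
    "T1\tVN 0 17\tsento molto paura\n"
    "A1\tErrorType T1 word_replacement\n"
    "#1\tAnnotatorNotes T1\tho\n"
)


def parse_ann(ann_text: str) -> Dict[str, Dict[str, Any]]:
    """Collect spans with their attributes and notes, keyed by span id."""
    lines = [ln for ln in ann_text.splitlines() if ln.strip()]
    spans: Dict[str, Dict[str, Any]] = {}

    # First pass: text-bound annotations, e.g. "T1<TAB>VN 0 17<TAB>text".
    for line in lines:
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            header, covered = body.split("\t", 1)
            label, offsets = header.split(" ", 1)
            # Discontinuous spans are written as "start end;start end".
            fragments = [
                tuple(int(n) for n in frag.split())
                for frag in offsets.split(";")
            ]
            spans[ann_id] = {
                "label": label,
                "fragments": fragments,
                "text": covered,
                "attributes": {},
                "notes": [],
            }

    # Second pass: attach attributes and notes to the spans they point at.
    for line in lines:
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("A"):
            parts = body.split()
            name, target = parts[0], parts[1]
            value = parts[2] if len(parts) > 2 else True
            if target in spans:
                spans[target]["attributes"][name] = value
        elif ann_id.startswith("#"):
            header, note = body.split("\t", 1)
            target = header.split()[1]
            if target in spans:
                spans[target]["notes"].append(note)
    return spans


PARSED = parse_ann(SAMPLE_ANN)
```

Two passes are used so that an attribute or note is attached correctly even if it appears in the file before the span it refers to.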
40 Example http://clizia.unistrapg.it/brat/#/
41 Annotated texts
42 Annotation lab: data
• data on learner errors in the use of the selected word combinations
  ◦ what is most difficult for Chinese learners of Italian
• data on inter-annotator agreement
  ◦ what was most difficult for annotators.
43 Annotation lab: preliminary data on errors
• Sample of 20 texts, two annotators, 393 word combinations

                               A1     A2     B1
Texts                           4      6     10
N. of word combinations        81    114    198
Word combinations per text   20.2     19   19.8
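The per-text averages in the table follow directly from the two rows above them. The short check below recomputes them from the figures on the slide; the variable names are illustrative only.

```python
# Figures copied from the slide.
TEXTS = {"A1": 4, "A2": 6, "B1": 10}
COMBINATIONS = {"A1": 81, "A2": 114, "B1": 198}

# Average number of word combinations per text, by proficiency level.
PER_TEXT = {level: COMBINATIONS[level] / TEXTS[level] for level in TEXTS}

TOTAL_TEXTS = sum(TEXTS.values())
TOTAL_COMBINATIONS = sum(COMBINATIONS.values())

for level, value in PER_TEXT.items():
    print(f"{level}: {value:.2f} word combinations per text")
print(f"Total: {TOTAL_TEXTS} texts, {TOTAL_COMBINATIONS} word combinations")
```

The totals match the 20 texts and 393 word combinations stated on the slide. The A1 average works out to 20.25, which the slide reports as 20.2.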
44 Word combination types
45 Errors per word combination type
46 Errors per word combination type
• Grammatical errors are the most frequent errors
• Lexical errors are constant across word combination types
• ADJN are the least frequent combinations, but those where errors occur most often (52%)
• NADJ are the combinations where errors occur least often (25%)
47 Grammatical errors per word combination type (x100)
48 Example: modifier position errors in ADJN combinations
• This error type is due to the wrong position of the adjective, and is likely a transfer error, since in Chinese the adjective precedes the noun.
• Anche la moda fa un importante parte di Italia.
  ◦ “Fashion too is an important part of Italy”.
• Infine, ho trovato gli spagnoli ragazzi sono non più belli di italiani ragazzi.
  ◦ “Finally, I found that Spanish boys are not more beautiful than Italian boys”.
49 Outcomes
• The lab allows students
  ◦ to have direct contact with data produced by learners;
  ◦ to discover patterns of recurrent errors through error annotation;
  ◦ to make hypotheses on the frequency of these errors and their possible causes.
50 Visualization of errors in word combination types
51 Annotation lab: preliminary data on inter-annotator agreement
• Students’ IAA (2 students on 20 texts): 0.54 (moderate agreement); z = 19.7; p < 0.001.
• Experts’ IAA (2 experts): 0.81 (near perfect agreement); z = 25.6; p < 0.001.
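The slide does not name the agreement statistic. One common chance-corrected measure for two annotators assigning categorical labels is Cohen's kappa, and the sketch below shows how such a coefficient is computed. It is offered as an assumption about the kind of measure involved, not as the authors' procedure, and the two label sequences are invented toy data, not LOCCLI annotations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(a)
    # Proportion of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each annotator's label distribution.
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[lab] * counts_b[lab] for lab in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented toy labels for six word combinations (not corpus data).
ANNOTATOR_1 = ["correct", "grammatical", "lexical", "correct", "grammatical", "correct"]
ANNOTATOR_2 = ["correct", "grammatical", "grammatical", "correct", "lexical", "correct"]

KAPPA = cohen_kappa(ANNOTATOR_1, ANNOTATOR_2)
```

On the toy data the annotators agree on four of six items, but part of that agreement is expected by chance, so the coefficient is lower than the raw agreement rate.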
52 No agreement
• The annotators disagree on 106 word combinations (27%):
  ◦ Correct combinations: 15%
  ◦ Grammatical errors: 50%
  ◦ Lexical errors: 34%
• Grammatical errors show the lowest degree of agreement between annotators.
53 No agreement in grammatical errors
54 No agreement in grammatical errors
• Errors with the lowest degree of agreement between annotators:
  ◦ modifiers (50%-100%)
  ◦ determiner addition (85%)
• B1, data collection point A
  ◦ Inoltre, nel tempo libero, se c'è possibilità, mi piace fare lo sport, per esempio nuotare e sciare.
  ◦ “In addition, in my free time, if there is the possibility, I like doing sport, for example swimming and skiing”.
55 Motivation: choice between alternative errors
• A1, data collection point B
  ◦ Abbiamo mangiato qualche la pizza
  ◦ “We ate some (the) pizza”
• Two possible target hypotheses, each implying a different error:
  ◦ Abbiamo mangiato la pizza
  ◦ Abbiamo mangiato qualche pizza
56 Motivation: choice between overlapping errors
• A1, data collection point B
  ◦ Vogliamo ascoltare musica, ci piace cantante, la cinese nome è Chen Yi Xun
  ◦ “We want to listen to the music, we like a singer, her Chinese name is Chen Yi Xun”
• Two possible errors:
  ◦ la cinese nome: agreement (choice)
  ◦ la cinese nome: modifier (position)
57 Motivation: complexity was too high
• A2, data collection point B
  ◦ Ma quando io ho fatto una passeggiata e ho veduto i italiani guardavano concorrenza in strada, anch'io stop a guardare perché è attraente
  ◦ “But when I took a walk and I saw Italians watching a competition in the street, I too stopped to watch because it’s attractive”
58 What the annotators said…
• Uno degli aspetti più difficili a mio avviso è stato proprio quello lessicale. Per capire un errore di questo tipo infatti bisogna innanzitutto interpretare il messaggio che lo studente vuole inviare.
  ◦ “One of the most difficult aspects, in my opinion, was the lexical one. To understand an error of this type, you first need to interpret the message that the learner wants to convey”.
• Avere a che fare con produzioni scritte di apprendenti stranieri è una bella sfida per un italiano madrelingua. Capire i loro errori ci dà l’opportunità di riflettere sulla nostra lingua, e ci permette di vedere l’idioma, di cui siamo abili ‘padroni’, sotto un altro punto di vista.
  ◦ “Dealing with written productions of L2 learners is quite a challenge for a native speaker of Italian. Understanding their errors gives us the opportunity to reflect upon our language, which we fully master, and to see it from a different point of view”.
59 Conclusions
Pedagogical advantages of annotation tasks for students aiming to become SL/FL teachers:
• Increased awareness of the properties of word combinations in their native language;
• Acquisition of skills in analysing annotated data and gaining insight in relation to interlanguage and contrastive analysis (comparison with learners’ L1);
• Drawing connections with SLA theories studied in previous, introductory, applied linguistics modules;
• Use of analysed data in lesson planning (selecting and grading learning aims) and in the design of pedagogical materials (building classroom activities).
60 Conclusions
Pedagogical advantages of annotation tasks for researchers:
• Valuable insight into differences in inter-annotator agreement rates
  ◦ between annotators with different levels of expertise (e.g. researchers vs. students);
  ◦ across different error types (e.g. errors involving determiners: lowest agreement)
• Opportunity to improve CALL systems from a computational perspective.
• Opportunity to trace the development of phraseological errors through time and across proficiency levels.
61 Conclusions
Potential pedagogical advantages of annotation tasks for in-service teachers:
• Awareness of new tools and resources developed by applied linguistics researchers (corpora, data extraction and annotation tools, etc.);
• Insight into the properties of word combinations and their challenges in SL/FL pedagogical practice;
• Feedback and collaboration on possible uses of annotated data in the SL/FL classroom.
62 References
Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
Ellis, N., Simpson-Vlach, R., Römer, U., O’Donnell, M., & Wulff, S. (2015). Learner corpora and formulaic language in SLA. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Cambridge handbook of learner corpus research (pp. 357–378). Cambridge: Cambridge University Press.
Ellis, R. (2005). Principles of instructed language learning. Asian TEFL Journal, 7(3), 9–29.
Gass, S. M., & Selinker, L. (2008). Second Language Acquisition. An Introductory Course. New York: Routledge.
Granger, S., Dagneaux, E., & Meunier, F. (Eds.). (2009). International corpus of learner English (Version 2). Louvain la Neuve: Presses universitaires de Louvain.
Hoey, M. (2005). Lexical priming. A new theory of words and language. London; New York: Routledge/AHRB.
Howarth, P. A. (1998). Phraseology and Second Language Proficiency. Applied Linguistics, 19(1), 24–44.
Laufer, B., & Waldman, T. (2011). Verb-Noun Collocations in Second Language Writing: A Corpus Analysis of Learners’ English. Language Learning, 61(2), 647–672.
Maley, A. (2016). Principles and Procedures in Materials Development. In M. Azarnoosh, M. Zeraatpishe, A. Faravani, & H. R. Kargozari (Eds.), Issues in Materials Development (pp. 11–30). Rotterdam: Sense Publishers.
Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam; Philadelphia: Benjamins.
Qi, Y., & Ding, Y. (2011). Use of formulaic sequences in monologues of Chinese EFL learners. System, 39, 164–174.
Siyanova-Chanturia, A. (2013). Eye-tracking and ERPs in multi-word expression research. A state-of-the-art review of the method and findings. The Mental Lexicon, 8(2), 245–268.
Siyanova-Chanturia, A. (2015). On the ‘holistic’ nature of formulaic language. Corpus Linguistics and Linguistic Theory, 11(2).
Spina, S. (forthcoming). The development of phraseological errors in Chinese learners of Italian: a longitudinal study.
Taylor, J. R. (2012). The mental corpus: how language is represented in the mind. Oxford; New York: Oxford University Press.
Thewissen, J. (2015). Accuracy across Proficiency Levels. A Learner Corpus Approach. Louvain: Presses universitaires de Louvain.
Wang, Y. (2016). The Idiom Principle and L1 Influence. A contrastive learner-corpus study of delexical verb+noun collocations. Amsterdam; Philadelphia: John Benjamins Publishing Company.
Wray, A. (2013). Formulaic language. Language Teaching, 46(03), 316–334.
Yoon, H. (2016). Association strength of verb-noun combinations in experienced NS and less experienced NNS writing: Longitudinal and cross-sectional findings. Journal of Second Language Writing, 34, 42–57.
63 THANK YOU! Stefania Spina stefania.spina@unistrapg.it @sspina Luciana Forti luciana.forti@unistrapg.it @l_for_ti