Error annotation in learner corpora: tools and applications in English and Italian

OLGA VINOGRADOVA, NIKITA LOGIN, IVAN TORUBAROV (RESEARCH UNIVERSITY HIGHER SCHOOL OF ECONOMICS, MOSCOW)
LUCIANA FORTI, STEFANIA SPINA (UNIVERSITY FOR FOREIGNERS PERUGIA, ITALY)

Part 4

The annotation of phraseological errors in LOCCLI (Longitudinal Corpus of Chinese Learners of Italian)

Stefania Spina & Luciana Forti
University for Foreigners of Perugia

13th TaLC Conference
18-21 July 2018
Faculty of Education, University of Cambridge

Outline

•   Phraseological errors: why they are relevant in SLA and why they are challenging for researchers.
•   Description of a scheme for the annotation of Italian phraseological errors in learner texts.
•   Annotation lab
    •   data on learner errors
    •   data on inter-annotator agreement
•   Conclusions

Relevance of phraseological errors in SLA research

1. Evidence from corpus linguistics & psycholinguistics:
     ➢   centrality of formulaic units in language acquisition, processing and use (Hoey 2005; Siyanova-Chanturia 2013; Taylor 2012; Wray 2013)

2. Evidence from learner corpus research:
     ➢   formulaic units as particularly challenging for learners even at higher proficiency levels (Ellis et al. 2015; Howarth 1998; Laufer & Waldman 2011);
     ➢   VN collocations particularly challenging (Bestgen & Granger 2014; Nesselhauf 2005; Wang 2016).

Implications for SLA research and SL/FL pedagogy:
a.   unit of observation in empirical research on the development of learner language through time;
b.   definition and grading of learning aims: language learning principles devised by Rod Ellis and adopted by the New Zealand Ministry of Education (Ellis, 2005; Maley, 2016).

The analysis of phraseological errors in learner language

ISSUES
1. Phraseological units are mostly analysed in terms of
    a.     frequency;
    b.     strength of association;
    c.     non-nativelike uses compared to native language uses.
    ➢    Limited evidence related to their accuracy in context (Spina, forthcoming; Thewissen, 2015)

2. Phraseological errors are mostly analysed in cross-sectional studies.
Limitations:
a.   Most SLA theories are based on how second language learning evolves over time (Gass & Selinker, 2008);
b.   Most cross-sectional learner corpora contain data from a single proficiency level (Granger et al. 2009).
     ➢   Limited evidence related to the analysis of phraseological errors in longitudinal learner corpora (Bestgen & Granger 2014; Qi & Ding 2011; Siyanova-Chanturia 2015; Siyanova-Chanturia & Spina in preparation; Yoon 2016)

3. Difficulty in classifying errors and automatically annotating large learner corpora.
    ➢    Limited evidence related to the agreement between annotators with different degrees of expertise, and across different error categories.

Filling in gaps

Creation and annotation of the Longitudinal Corpus of Chinese Learners of Italian (LOCCLI):
➢   represents a language with limited LCR evidence (Italian);
➢   covers a 6-month time span;
➢   includes 3 different proficiency levels;
➢   contains error annotation for different categories of collocations.

Error annotation scheme

A) Lexical errors
    1. Word replacement
    2. Non-existing combination
    3. Existing combination with different meaning

B) Grammatical errors (addition, omission, choice, position)
    4. Determiner
    5. Modifier
    6. Agreement
    7. Number

Error types

A) Lexical errors

1. Word replacement
     e.g. sento molto paura (136116 A)
          correction: sento → ho
          (« I have a lot of fear »)

2. Non-existing word combination
     e.g. dopo aver mangiato il pranzo (136139 B)
          correction: aver mangiato il pranzo → aver pranzato
          (« after having eaten the lunch »)

3. Existing combination with different meaning
     e.g. mi piace godo questi hobby (136815 B)
          correction: godo questi hobby → dedicarmi a questi hobby
          (« I like to enjoy these hobbies »)

B) Grammatical errors

4. Determiner (omission)
     e.g. abbiamo visitato Musei Vaticani (136736 B)
          correction: Musei Vaticani → i Musei Vaticani
          (« We visited Vatican Museums »)

5. Modifier (position)
     e.g. Ci sono molte famose opere d’arte (136736 B)
          correction: famose opere d’arte → opere d’arte famose
          (« There are many famous works of art »)

6. Agreement
     e.g. studio lingua italiano (136736 A)
          correction: lingua italiano → lingua italiana
          (« I study the Italian language »)

7. Number
     e.g. fare una nuova amicizia (136380 B)
          correction: una nuova amicizia → nuove amicizie
          (« to make a new friendship »)
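The scheme and examples above could be represented as a small data model. The following is only a sketch: the class and field names are hypothetical and not part of LOCCLI, and it assumes that the four operations (addition, omission, choice, position) qualify grammatical errors only, as the layout of the scheme suggests.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    # A) Lexical errors
    WORD_REPLACEMENT = 1
    NON_EXISTING_COMBINATION = 2
    EXISTING_COMBINATION_DIFFERENT_MEANING = 3
    # B) Grammatical errors
    DETERMINER = 4
    MODIFIER = 5
    AGREEMENT = 6
    NUMBER = 7

    @property
    def is_lexical(self) -> bool:
        return self.value <= 3


class Operation(Enum):
    ADDITION = "addition"
    OMISSION = "omission"
    CHOICE = "choice"
    POSITION = "position"


@dataclass(frozen=True)
class ErrorAnnotation:
    text_id: str            # e.g. "136736"
    collection_point: str   # "A" or "B"
    learner_form: str
    target_hypothesis: str
    error_type: ErrorType
    operation: Optional[Operation] = None

    def __post_init__(self) -> None:
        # Assumption: operations are only meaningful for grammatical errors.
        if self.operation is not None and self.error_type.is_lexical:
            raise ValueError("operations only qualify grammatical errors")


# The modifier (position) example from the slides.
example = ErrorAnnotation(
    text_id="136736",
    collection_point="B",
    learner_form="famose opere d'arte",
    target_hypothesis="opere d'arte famose",
    error_type=ErrorType.MODIFIER,
    operation=Operation.POSITION,
)
```

Keeping the target hypothesis next to the learner form mirrors the annotation task, where each error label is paired with a proposed correction.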
Annotation lab

•   Participants: first-year students on a Master’s degree in “Teaching Italian as a second language” (University for Foreigners of Perugia)
•   The task was carried out in a computer lab, where a PC with an internet connection was available to each student
•   Learner texts were annotated using Brat
    •   http://brat.nlplab.org
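Brat stores annotations in standoff `.ann` files: tab-separated lines in which text-bound annotations start with `T` and annotator notes start with `#`. The sketch below reads such a file into Python objects. The label name `VN_determiner_omission` and the choice of keeping the target hypothesis in an annotator note are assumptions made for illustration; the slides do not describe the actual LOCCLI configuration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    ann_id: str
    label: str
    offsets: List[Tuple[int, ...]]  # one (start, end) pair per fragment
    text: str
    note: str = ""


def parse_ann(content: str) -> Dict[str, Span]:
    """Parse text-bound annotations (T lines) and notes (# lines)."""
    spans: Dict[str, Span] = {}
    notes: Dict[str, str] = {}
    for line in content.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("T") and len(fields) >= 3:
            # Second field: "Label start end", fragments separated by ";".
            label, _, raw_offsets = fields[1].partition(" ")
            offsets = [
                tuple(int(n) for n in fragment.split())
                for fragment in raw_offsets.split(";")
            ]
            spans[fields[0]] = Span(fields[0], label, offsets, fields[2])
        elif line.startswith("#") and len(fields) >= 3:
            # Second field: "AnnotatorNotes T1"; third field: the note text.
            target = fields[1].split(" ")[1]
            notes[target] = fields[2]
    for target, note in notes.items():
        if target in spans:
            spans[target].note = note
    return spans


# Hypothetical annotation of the determiner omission example.
SAMPLE = (
    "T1\tVN_determiner_omission 0 31\tabbiamo visitato Musei Vaticani\n"
    "#1\tAnnotatorNotes T1\ti Musei Vaticani\n"
)
spans = parse_ann(SAMPLE)
```

Other standoff line types (relations, events, attributes) are ignored by this sketch.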
Accessing the corpus

Longitudinal Corpus of Chinese Learners of Italian (LOCCLI)

•   350 essays;
•   175 Chinese learners of Italian, two essays per learner (beginning and end of a 6-month course);
•   3 proficiency levels (A1, A2, B1);
•   Age: 17-33 years old (mean = 20.5, SD = 2.7; 105 females)
Annotation lab: word combination types

•   three different types of combinations, particularly challenging for learners:
    •   Verb+noun (VN) combinations, where the noun is the direct object of the verb
    •   Noun+adjective (NADJ)
    •   Adjective+noun (ADJN)
        •   the latter two combinations both instantiate the adjectival modifier grammatical dependency.

VN combinations

•   The sequence of verb and noun can be interrupted, and its internal order can be inverted in passive constructions:
    •   Fare la doccia (“take a shower”)
    •   Fare spesso la doccia (“often take a shower”)
    •   Fare spesso una lunga e piacevole doccia (“often take a long and nice shower”)
    •   La doccia deve essere fatta preferibilmente all’inizio della giornata (“the shower must be taken preferably at the beginning of the day”)
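Because the verb and the noun may be separated or inverted, VN combinations are easier to retrieve from dependency relations than from adjacent words. The sketch below assumes Universal Dependencies style labels (`obj` for direct objects, `nsubj:pass` for passive subjects) and uses hand-written toy parses; the slides do not state which parser or tagset was used for LOCCLI.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str
    lemma: str
    upos: str
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str


def vn_pairs(sentence: List[Token]) -> List[Tuple[str, str]]:
    """Return (verb lemma, noun lemma) pairs, whatever the surface order."""
    by_idx = {tok.idx: tok for tok in sentence}
    pairs = []
    for tok in sentence:
        if tok.upos == "NOUN" and tok.deprel in ("obj", "nsubj:pass"):
            head = by_idx.get(tok.head)
            if head is not None and head.upos == "VERB":
                pairs.append((head.lemma, tok.lemma))
    return pairs


# "Fare spesso una lunga e piacevole doccia": interrupted sequence.
active = [
    Token(1, "Fare", "fare", "VERB", 0, "root"),
    Token(2, "spesso", "spesso", "ADV", 1, "advmod"),
    Token(3, "una", "uno", "DET", 7, "det"),
    Token(4, "lunga", "lungo", "ADJ", 7, "amod"),
    Token(5, "e", "e", "CCONJ", 6, "cc"),
    Token(6, "piacevole", "piacevole", "ADJ", 4, "conj"),
    Token(7, "doccia", "doccia", "NOUN", 1, "obj"),
]

# "La doccia deve essere fatta": inverted order in a passive (truncated).
passive = [
    Token(1, "La", "il", "DET", 2, "det"),
    Token(2, "doccia", "doccia", "NOUN", 5, "nsubj:pass"),
    Token(3, "deve", "dovere", "AUX", 5, "aux"),
    Token(4, "essere", "essere", "AUX", 5, "aux:pass"),
    Token(5, "fatta", "fare", "VERB", 0, "root"),
]
```

Both toy sentences yield the same lemma pair, so the interrupted and the passive variants are counted as instances of one combination.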
NADJ and ADJN combinations

•   a noun in Italian can be either preceded or followed by one or more adjectives.
•   syntactic and semantic constraints: the adjective follows the noun
    •   if it is modified by an adverb (un libro molto interessante “a very interesting book”)
    •   if it is modified by a complement (un libro utile per gli studenti “a useful book for the students”)
    •   or if it narrows the reference of the noun, defining a subclass of its meaning (ho comprato dei fiori gialli “I bought some yellow flowers”).
•   two possible phraseological sequences:
    •   noun + adjective (NADJ): scuola elementare “primary school”
    •   adjective + noun (ADJN): bel tempo “nice weather”.
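The two sequences can be told apart by comparing the position of an adjectival modifier with that of its head noun. This is a minimal sketch under the assumption that each token comes with its position, part of speech, head position and a Universal Dependencies style `amod` relation; the tuple layout is hypothetical.

```python
from typing import List, Tuple

# (position, form, upos, head position, dependency relation)
Tok = Tuple[int, str, str, int, str]


def adjective_noun_combinations(sentence: List[Tok]) -> List[Tuple[str, str, str]]:
    """Label each adjective-noun pair as ADJN or NADJ by surface order."""
    nouns = {pos: form for pos, form, upos, _, _ in sentence if upos == "NOUN"}
    found = []
    for pos, form, upos, head, deprel in sentence:
        if upos == "ADJ" and deprel == "amod" and head in nouns:
            kind = "ADJN" if pos < head else "NADJ"
            found.append((kind, form, nouns[head]))
    return found


# The two examples given in the slides.
scuola = [(1, "scuola", "NOUN", 0, "root"), (2, "elementare", "ADJ", 1, "amod")]
tempo = [(1, "bel", "ADJ", 2, "amod"), (2, "tempo", "NOUN", 0, "root")]
```

A learner sequence labelled ADJN where the target hypothesis is NADJ would correspond to a modifier position error in the annotation scheme.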
Annotation lab: description

•   46 students previously instructed on the annotation scheme
•   23 groups:
    •   each group formed by two students (one native speaker and one L2 speaker).
•   Each student was asked to annotate 20 texts written by 10 different learners and collected at the two collection points A (beginning of the course) and B (six months later).

Annotation lab: the task

•   Assign a label with the word combination type, and decide whether the combination is correct or incorrect.
    •   if incorrect, choose among the error types provided by the annotation scheme;
•   Formulate a target hypothesis
•   Write a final report

Example

http://clizia.unistrapg.it/brat/#/

Annotated texts

Annotation lab: data

•   data on learner errors in the use of the selected word combinations
    •   what is most difficult for Chinese learners of Italian
•   data on inter-annotator agreement
    •   what was most difficult for annotators.

Annotation lab: preliminary data on errors

•   Sample of 20 texts, two annotators, 393 word combinations

                                    A1       A2       B1
    Texts                           4        6        10
    N. of word combinations         81       114      198
    Word combinations per text      20.2     19       19.8
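The figures in the table can be checked with a few lines of arithmetic. The sketch below only recomputes the totals and the per-text averages from the counts given above; the variable names are illustrative. For A1 the exact ratio is 81 / 4 = 20.25, which the table reports as 20.2.

```python
# Counts taken from the table above.
texts = {"A1": 4, "A2": 6, "B1": 10}
combinations = {"A1": 81, "A2": 114, "B1": 198}

total_texts = sum(texts.values())
total_combinations = sum(combinations.values())

# Average number of word combinations per text at each level.
per_text = {level: combinations[level] / texts[level] for level in texts}

for level, value in per_text.items():
    print(f"{level}: {value:.2f} word combinations per text")
print(f"total: {total_combinations} combinations in {total_texts} texts")
```

The totals match the sample description: 20 texts and 393 word combinations.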
Word combination types

Errors per word combination type

•   Grammatical errors are the most frequent errors
•   Lexical errors are constant across word combination types
•   ADJN are the least frequent combinations, but those where errors occur most (52%)
•   NADJ are the combinations where errors occur least (25%)

Grammatical errors per word combination type (x100)

Example: modifier position errors in ADJN combinations

•   This error type is due to the wrong position of the adjective, and is likely a transfer error, since in Chinese the adjective precedes the noun.

    •   Anche la moda fa un importante parte di Italia.
    •   “Fashion too is an important part of Italy”.

    •   Infine, ho trovato gli spagnoli ragazzi sono non più belli di italiani ragazzi.
    •   “Finally, I found that Spanish boys are not more beautiful than Italian boys”.

Outcomes

•   Allow students
    •   to have direct contact with data produced by learners
    •   to discover patterns of recurrent errors through error annotation
    •   to make hypotheses on their frequency and their motivations

Visualization of errors in word combination types

Annotation lab: preliminary data on inter-annotator agreement

•   Students’ IAA (2 students on 20 texts): 0.54 (moderate agreement); z = 19.7; p < 0.001.
•   Experts’ IAA (2 experts): 0.81 (near perfect agreement); z = 25.6; p < 0.001.
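The slides do not name the agreement coefficient. The sketch below assumes Cohen's kappa for two annotators, which corrects observed agreement for the agreement expected by chance; the labels in the toy data are invented and are not taken from LOCCLI.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(a)
    # Proportion of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label distribution.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Invented toy labels for six word combinations.
annotator_1 = ["correct", "determiner", "modifier", "correct", "lexical", "correct"]
annotator_2 = ["correct", "modifier", "modifier", "correct", "lexical", "determiner"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In the toy data the annotators agree on 4 of 6 items, chance agreement is 10/36, and kappa works out to 14/26, roughly 0.54.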
No agreement

•   Annotators disagree on 106 of the 393 word combinations (27%)
•   Correct combinations: 15%
•   Grammatical errors: 50%
•   Lexical errors: 34%
•   Grammatical errors show the lowest degree of agreement between annotators.

No agreement in grammatical errors

•   Errors with the lowest degree of agreement between annotators:
    •   modifiers (50%-100%)
    •   determiner addition (85%)

•   B1 – data collection point A
    •   Inoltre, nel tempo libero, se c'è possibilità, mi piace fare lo sport, per esempio nuotare e sciare.
    •   “In addition, in my free time, if there is the possibility, I like doing sport, for example swimming and skiing”.

Motivation: choice between alternative errors

•   A1 – data collection point B
    •   Abbiamo mangiato qualche la pizza
    •   “We ate some (the) pizza”

•   Two possible errors:
    •   Abbiamo mangiato la pizza
    •   Abbiamo mangiato qualche pizza

Motivation: choice between overlapping errors

•   A1 – data collection point B
    •   Vogliamo ascoltare musica, ci piace cantante, la cinese nome è Chen Yi Xun
    •   “We want to listen to the music, we like a singer, her Chinese name is Chen Yi Xun”

•   Two possible errors:
    •   la cinese nome: agreement choice
    •   la cinese nome: modifier position

Motivation: complexity was too high

•   A2 – data collection point B
    •   Ma quando io ho fatto una passeggiata e ho veduto i italiani guardavano concorrenza in strada, anch'io stop a guardare perché è attraente
    •   “But when I took a walk and I saw Italian looking at concurrence in the street, I stopped looking as well because it’s attractive”

What the annotators said…

•   Uno degli aspetti più difficili a mio avviso è stato proprio quello lessicale. Per capire un errore di questo tipo infatti bisogna innanzitutto interpretare il messaggio che lo studente vuole inviare.
    •   “One of the most difficult aspects, in my view, was precisely the lexical one. To understand an error of this type, you first need to interpret the message that the learner wants to convey”.
•   Avere a che fare con produzioni scritte di apprendenti stranieri è una bella sfida per un italiano madrelingua. Capire i loro errori ci dà l’opportunità di riflettere sulla nostra lingua, e ci permette di vedere l’idioma, di cui siamo abili ‘padroni’, sotto un altro punto di vista.
    •   “Dealing with the written productions of foreign learners is a real challenge for a native speaker of Italian. Understanding their errors gives us the opportunity to reflect on our own language, and lets us see the language of which we are skilled ‘masters’ from a different point of view”.

Conclusions

Pedagogical advantages of annotation tasks for students aiming to become SL/FL teachers:
•   Increased awareness of the properties of word combinations in their native language;
•   Acquisition of skills in analysing annotated data and gaining insight in relation to interlanguage and contrastive analysis (comparison with learners’ L1);
•   Drawing connections with SLA theories studied in previous, introductory, applied linguistics modules;
•   Use of analysed data in lesson planning (selecting and grading learning aims) and in the design of pedagogical materials (building classroom activities).

Pedagogical advantages of annotation tasks for researchers:
•   Valuable insight into differences in inter-annotator agreement rates
    •   between annotators with different levels of expertise (e.g. researchers vs. students);
    •   across different error types (e.g. errors involving determiners: lowest agreement)
•   Opportunity to improve CALL systems from a computational perspective.
•   Opportunity to trace the development of phraseological errors through time and across proficiency levels.

Potential pedagogical advantages of annotation tasks
for in-service teachers:
•   Awareness of new tools and resources developed by applied linguistics researchers (corpora, data extraction and annotation tools, etc.)
•   Insight into the properties of word combinations and their challenges in SL/FL pedagogical practice;
•   Feedback and collaboration on possible uses of annotated data in the SL/FL classroom.

References

Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.

Ellis, N., Simpson-Vlach, R., Römer, U., O’Donnell, M., & Wulff, S. (2015). Learner corpora and formulaic language in SLA. In S. Granger, G. Gilquin, & F. Meunier (Eds.), Cambridge handbook of learner corpus research (pp. 357–378). Cambridge: Cambridge University Press.

Ellis, R. (2005). Principles of instructed language learning. Asian TEFL Journal, 7(3), 9–29.

Gass, S. M., & Selinker, L. (2008). Second Language Acquisition. An Introductory Course. New York: Routledge.

Granger, S., Dagneaux, E., & Meunier, F. (Eds.). (2009). International corpus of learner English (Version 2). Louvain la Neuve: Presses universitaires de Louvain.

Hoey, M. (2005). Lexical priming. A new theory of words and language. London; New York: Routledge/AHRB.

Howarth, P. A. (1998). Phraseology and Second Language Proficiency. Applied Linguistics, 19(1), 24–44.

Laufer, B., & Waldman, T. (2011). Verb-Noun Collocations in Second Language Writing: A Corpus Analysis of Learners’ English. Language Learning, 61(2), 647–672.

Maley, A. (2016). Principles and Procedures in Materials Development. In M. Azarnoosh, M. Zeraatpishe, A. Faravani, & H. R. Kargozari (Eds.), Issues in Materials Development (pp. 11–30). Rotterdam: Sense Publishers.

Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam; Philadelphia: Benjamins.

Qi, Y., & Ding, Y. (2011). Use of formulaic sequences in monologues of Chinese EFL learners. System, 39, 164–174.

Siyanova-Chanturia, A. (2013). Eye-tracking and ERPs in multi-word expression research. A state-of-the-art review of the method and findings. The Mental Lexicon, 8(2), 245–268.

Siyanova-Chanturia, A. (2015). On the ‘holistic’ nature of formulaic language. Corpus Linguistics and Linguistic Theory, 11(2).

Spina, S. (forthcoming). The development of phraseological errors in Chinese learners of Italian: a longitudinal study.

Taylor, J. R. (2012). The mental corpus: how language is represented in the mind. Oxford; New York: Oxford University Press.

Thewissen, J. (2015). Accuracy across Proficiency Levels. A Learner Corpus Approach. Louvain: Presses universitaires de Louvain.

Wang, Y. (2016). The Idiom Principle and L1 Influence. A contrastive learner-corpus study of delexical verb+noun collocations. Amsterdam; Philadelphia: John Benjamins Publishing Company.

Wray, A. (2013). Formulaic language. Language Teaching, 46(3), 316–334.

Yoon, H. (2016). Association strength of verb-noun combinations in experienced NS and less experienced NNS writing: Longitudinal and cross-sectional findings. Journal of Second Language Writing, 34, 42–57.

THANK YOU!

Stefania Spina
stefania.spina@unistrapg.it
@sspina

Luciana Forti
luciana.forti@unistrapg.it
@l_for_ti