Climbing the Tower of Treebanks: Improving Low-Resource Dependency Parsing via Hierarchical Source Selection
Climbing the Tower of Treebanks: Improving Low-Resource Dependency Parsing via Hierarchical Source Selection

Goran Glavaš, University of Mannheim, Data and Web Science Group (goran@informatik.uni-mannheim.de)
Ivan Vulić, University of Cambridge, Language Technology Lab (iv250@cam.ac.uk)

Abstract

Recent work on multilingual dependency parsing has focused on developing highly multilingual parsers that can be applied to a wide range of low-resource languages. In this work, we substantially outperform such "one model to rule them all" approaches with a heuristic selection of the languages and treebanks on which to train the parser for a specific target language. Our approach, dubbed TOWER, first hierarchically clusters all Universal Dependencies languages based on their mutual syntactic similarity, computed from human-coded URIEL vectors. For each low-resource target language, we then climb this language hierarchy, starting from the leaf node of that language, and heuristically choose the hierarchy level at which to collect training treebanks. This treebank selection heuristic is based on: (i) the aggregate size of all treebanks subsumed by the hierarchy level, and (ii) the similarity of the languages in the training sample to the target language. For languages without development treebanks, we additionally use (ii) for model selection (i.e., early stopping) in order to prevent overfitting to the development treebanks of the closest languages. Our TOWER approach shows substantial gains for low-resource languages over two state-of-the-art multilingual parsers, with gains of more than 20 LAS points for some of those languages. Parsing models and code available at: https://github.com/codogogo/towerparse.

1 Introduction

Syntactic parsing – grounded in a wide variety of formalisms (Taylor et al., 2003; De Marneffe et al., 2006; Hockenmaier and Steedman, 2007; Nivre et al., 2016, inter alia) – has been the backbone of natural language processing (NLP) for decades, and an indispensable preprocessing step for tackling higher-level language understanding tasks. A recent major paradigm shift in NLP towards large-scale pretrained language models (PLMs) (Devlin et al., 2019; Liu et al., 2019; Brown et al., 2020) and their end-to-end fine-tuning for downstream tasks has reduced the downstream relevance of supervised syntactic parsing. What is more, there is mounting evidence that PLMs implicitly acquire rich syntactic knowledge through large-scale pretraining (Hewitt and Manning, 2019; Chi et al., 2020) and that exposing them to explicit syntax from human-coded treebanks does not offer significant language understanding benefits (Kuncoro et al., 2020; Glavaš and Vulić, 2021). In order to implicitly acquire syntactic competencies, however, PLMs need language-specific corpora at a scale that can only be obtained for a tiny portion of the world's 7,000+ languages. For the remaining vast majority of languages – those with limited-size monolingual corpora – explicit syntax still provides a valuable linguistic bias for more sample-efficient learning in downstream NLP tasks.

Reliable syntactic parsing requires annotated treebanks of reasonable size: this prerequisite is, unfortunately, satisfied for even fewer languages. Despite multi-year, well-coordinated annotation efforts such as the Universal Dependencies project (Nivre et al., 2016, 2020), language-specific treebanks are unlikely to appear anytime soon for most world languages. This renders the transfer of syntactic knowledge from high-resource languages with annotated treebanks a necessity. A truly zero-shot transfer for low-resource languages assumes a set of training treebanks from resource-rich source languages and a target language without any syntactic annotations. Effectively, the task is then to identify the subset of source treebanks on which the trained parser would yield the best parsing performance for the target language. An exhaustive search over all possible subsets of source treebanks is not only computationally intractable[1] but also

[1] One can create 2^N − 1 different training sets from a collection of N source treebanks.

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4878–4888, August 1–6, 2021. ©2021 Association for Computational Linguistics
uninformative in true zero-shot scenarios in which there is no development treebank (i.e., any syntactically annotated data) for the target language. Most existing transfer methods therefore either (1) choose one (or a few) best source languages for each target language (Rosa and Zabokrtsky, 2015; Agić, 2017; Lin et al., 2019; Litschko et al., 2020) or (2) train a single multilingual parser on all available treebanks; such parsers, based on pretrained multilingual encoders, currently produce the best results in low-resource parsing (Kondratyuk and Straka, 2019; Üstün et al., 2020). Other transfer approaches, e.g., those based on data augmentation (Şahin and Steedman, 2018; Vania et al., 2019), violate the zero-shot setup by assuming a small target-language treebank, a requirement unfulfilled for most world languages.[2]

In this work, we propose a simple and effective heuristic for selecting a good set of source treebanks for any given low-resource target language. In our approach, named TOWER, we first hierarchically cluster all Universal Dependencies (UD) languages. To this end, we compute the syntactic similarity of languages by comparing manually coded vectors of their syntactic properties from the URIEL database (Littell et al., 2017). We then iteratively 'climb' that language hierarchy level by level, starting from the leaf node of the target language. We stop 'climbing' (i.e., select the set of source treebanks subsumed by the current hierarchy level) when the relative decrease in the linguistic similarity of the training sample w.r.t. the target language outweighs the increase in the size of the training sample. We additionally exploit the linguistic similarity between the target language and its closest sources with existing development treebanks to inform a model selection (that is, early stopping) heuristic. TOWER substantially outperforms state-of-the-art multilingual parsers – UDPipe (Straka, 2018), UDify (Kondratyuk and Straka, 2019), and UDapter (Üstün et al., 2020) – on low-resource languages, while offering comparable performance for high-resource languages.

2 Climbing the TOWER of Treebanks

Constructing the TOWER. We start by hierarchically clustering the set of 89 languages from Universal Dependencies[3] based on their syntactic similarity. To this end, we represent each language with its syntax_knn vector from the URIEL database (Littell et al., 2017). Features of these 103-dimensional vectors correspond to individual syntactic properties from manually coded linguistic resources such as WALS (Dryer and Haspelmath, 2013) and SSWL (Collins and Kayne, 2009). URIEL's syntax_knn strategy replaces feature values missing in those resources with kNN-based predictions (cf. Littell et al. (2017) for details). We then carry out hierarchical agglomerative clustering with Ward's linkage (Anderberg, 2014), with Euclidean distances between URIEL vectors guiding the clustering. Figure 1 shows a dendrogram of one part of the resulting hierarchy; we display the complete hierarchy in the Appendix. The syntax-based clustering largely reflects memberships in language (sub)families, with a few notable exceptions: e.g., Tagalog (tl), from the Austronesian family, appears to be syntactically similar to (and is joined with) Scottish Gaelic (gd), Irish (ga), and Welsh (cy) from the Celtic branch of the Indo-European family.

[Figure 1: Part of the syntax-based hierarchical clustering of UD languages (ISO 639-1 codes).]

Treebank Selection (TBS). For a given test treebank, we start climbing the hierarchy from the leaf node of the treebank's language. Let s_l denote the number of climbing steps we take from the target leaf node l. If the target test treebank also has a corresponding training portion, in-treebank training constitutes the first training configuration (we denote this configuration with s_l = −1). For resource-rich languages with several training treebanks, we create the next training sample by concatenating all of those treebanks (we denote this level with s_l = 0).[4]

[2] For the vast majority of world languages there does not exist a single manually annotated syntactic tree.
[3] We worked with UD version 2.5.
[4] For example, for the Russian test treebank SynTagRus, the training set at s_l = −1 consists of the train portion of the SynTagRus treebank itself; at s_l = 0, we concatenate the training portions of the Russian GSD, PUD, and SynTagRus treebanks.
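The clustering step can be sketched in a few lines. The sketch below is ours, not the authors' code: the 8-dimensional binary vectors are toy stand-ins for URIEL's 103-dimensional syntax_knn features (the values are invented, not actual URIEL data), the language sample is illustrative, and Ward's linkage is reimplemented greedily rather than taken from a library.

```python
import numpy as np

# Toy stand-ins for URIEL syntax_knn vectors (real ones are 103-dimensional;
# these feature values are illustrative, not actual URIEL data).
langs = ["en", "de", "nl", "ga", "gd", "cy", "tl"]
X = np.array([
    [1, 0, 1, 1, 0, 1, 0, 0],  # en
    [1, 0, 1, 1, 0, 0, 1, 0],  # de
    [1, 0, 1, 1, 0, 0, 1, 0],  # nl
    [0, 1, 0, 0, 1, 1, 0, 1],  # ga
    [0, 1, 0, 0, 1, 1, 0, 1],  # gd
    [0, 1, 0, 0, 1, 1, 1, 1],  # cy
    [0, 1, 0, 0, 1, 1, 0, 0],  # tl
], dtype=float)

def ward_cost(a, b):
    # Ward's criterion: increase in total within-cluster variance caused by
    # merging a and b (squared centroid distance, scaled by cluster sizes).
    na, nb = len(a), len(b)
    mu_a, mu_b = X[a].mean(axis=0), X[b].mean(axis=0)
    return na * nb / (na + nb) * float(((mu_a - mu_b) ** 2).sum())

# Greedy bottom-up agglomeration; for illustration we cut at two clusters.
clusters = [[i] for i in range(len(langs))]
while len(clusters) > 2:
    i, j = min(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]),
    )
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]

groups = sorted(sorted(langs[k] for k in c) for c in clusters)
# The toy "Celtic-like" languages (with Tagalog) separate from the "Germanic" ones.
```

On these toy vectors the two-cluster cut groups {ga, gd, cy, tl} against {en, de, nl}, mirroring the paper's observation that Tagalog joins the Celtic branch in the syntax-based hierarchy.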
For low-resource target languages without any training treebanks, the first training sample is collected at s_l = 1, where the language is first joined with other languages. The training set corresponding to a hierarchy level (i.e., to each join in the tree) concatenates all training treebanks of all languages (i.e., leaf nodes) of the respective hierarchy subtree.[5]

Let {S_n}, n = 0 (or −1), ..., N, be the set of training configurations collected by climbing the hierarchy starting from the target language l, and let S_n = ∪{T_k}, k = 1, ..., K, be the n-th training set, consisting of K training treebanks. As we climb the hierarchy (i.e., as n increases), the training set S_n is bound to grow; at the same time, the sample of training languages becomes increasingly dissimilar w.r.t. the target language l. In other words, as we climb higher up the induced syntactic hierarchy of languages, we train on more data, but from a mixture of (syntactically) more distant languages. Let l_k be the language of the training treebank T_k. We then quantify the syntactic similarity sim(S_n, l) between the training set S_n and the target language l as follows:

  sim(S_n, l) = (1 / |S_n|) · Σ_{k=1}^{K} |T_k| · cos(l_k, l),   (1)

with cos(l_k, l) as the cosine similarity between the URIEL vectors of l_k and l, and the relative sizes of the individual treebanks, |T_k| / |S_n|, as weights. We then use the following simple heuristic to select the best training set S_n: we stop climbing when the relative growth of the training set becomes smaller than the relative decrease of the similarity with the target language, i.e., we select the smallest n for which the following condition is satisfied:

  |S_{n+1}| / |S_n| < sim(S_n, l) / sim(S_{n+1}, l).   (2)

Model Selection (MS). Early stopping based on model performance on a development set (dev) is an important mechanism for preventing overfitting in supervised machine learning. In a truly zero-shot transfer setup, on the one hand, we do not have any development data in the target language. Model selection based on the development set of the source language, on the other hand, overfits the model to the source language, which may hurt the effectiveness of the cross-lingual transfer (Keung et al., 2020; Chen and Ritter, 2020). For test treebanks with a respective development portion, TOWER uses that development set for model selection. For low-resource languages l without development treebanks, we compile a proxy development set D_l = ∪{D_k}, k = 1, ..., K, by collecting all development treebanks D_k from the hierarchy level closest to l that encompasses at least one treebank with a development set.[6] Intuitively, the more syntactically similar D_l is to l, the more beneficial model selection based on D_l will be for performance on l: the optimal model checkpoint w.r.t. l should then be closer to the model checkpoint exhibiting the best performance on D_l. Accordingly, with M as the model checkpoint with the best performance on D_l, we select the model checkpoint M′ = ⌊sim(D_l, l) · M⌋ (see Eq. (1)) as the "optimal" checkpoint for the target language l.

Shallow Biaffine Parser. TOWER employs the shallow biaffine parser of Glavaš and Vulić (2021), stacked on top of the pretrained XLM-R (Conneau et al., 2020). Compared to the standard deep biaffine parser (Dozat and Manning, 2017; Kondratyuk and Straka, 2019; Üstün et al., 2020), this shallow variant forwards word-level representations (aggregated from subword output) directly into the biaffine products, bypassing the deep feed-forward transformations that produce dependent- and head-specific vectors (Dozat and Manning, 2017). The shallow variant is reported to perform comparably (Glavaš and Vulić, 2021), while being faster to train.

3 Evaluation and Discussion

Treebanks and Baselines. We evaluate TOWER on 138 (test) treebanks from Universal Dependencies (Nivre et al., 2020).[7] We compare TOWER against two state-of-the-art multilingual parsers: (1) UDify (Kondratyuk and Straka, 2019) couples the multilingual BERT (mBERT) (Devlin et al., 2019)

[5] Note that the number of climbs s_l needed to reach a given hierarchy level depends on the language l: e.g., the hierarchy level joining Tagalog (tl) with Scottish Gaelic, Irish, and Welsh ({gd, ga, cy}) is reached in s_l = 1 climbs from Tagalog, s_l = 2 climbs from Scottish Gaelic, and s_l = 3 climbs from Irish and Welsh.
[6] E.g., D_l for l = tl consists of the development portions of the ga and gd treebanks, whereas D_l for l = cy consists only of the development set of ga.
[7] We work with UD v2.5. Due to mismatches between XLM-R's subword tokenizer and word-level treebank tokens we skip: all Chinese treebanks, Assyrian (AS), Old Russian (RNC and TOROT), Skolt Sami (Giellagas), Japanese (Modern and BCCWJ), Ancient Greek (Perseus), Gothic (PROIEL), Coptic (Scriptorium), Old Church Slavonic (PROIEL), and Yoruba (YTB).
with the deep biaffine parser (Dozat and Manning, 2017) and trains on all UD treebanks; (2) UDapter (Üstün et al., 2020) extends mBERT with adapter parameters (Houlsby et al., 2019; Pfeiffer et al., 2020) that are contextually generated (Platanios et al., 2018) from URIEL vectors; the parameters of the adapter generator are trained on treebanks of 13 diverse resource-rich languages selected by Kulmizev et al. (2019). We additionally quantify the contributions of TOWER's heuristic components (TBS and MS, see §2) by evaluating variants in which we (1) remove TBS and train on the closest language with training data (-TBS), (2) remove MS and simply select the model checkpoint that performs best on the proxy dev set D_l (-MS), and (3) remove both TBS and MS (-TBS-MS).

Training and Optimization Details. We limit input sequences to 128 subword tokens. We use XLM-R Base with L = 12 layers and hidden size H = 768 and apply dropout (p = 0.1) on its outputs before forwarding them to the shallow parsing head. We train in batches of 32 sentences and optimize parameters with Adam (Kingma and Ba, 2015) (starting learning rate 10^−5). We train for 30 epochs, with early stopping based on the dev loss.[8]

            ∩UDify        ∩UDapt        HIGH          LOW
Model       UAS    LAS    UAS    LAS    UAS    LAS    UAS    LAS
UDify       80.9   73.9   –      –      89.2   85.3   39.9   22.2
UDapter     –      –      63.8   52.8   90.9   87.6   43.9   29.3
TOWER       82.4   74.3   68.9   56.0   90.0   86.3   53.7   33.8
 -TBS       80.8   73.2   62.8   51.7   89.4   85.6   47.0   30.1
 -MS        82.1   74.1   67.9   55.2   89.4   85.6   51.2   32.2
 -TBS-MS    80.7   83.1   62.4   51.3   89.4   85.6   45.9   29.0

Table 1: Parsing performance (UAS, LAS) on different UD treebank subsets for the state-of-the-art multilingual parsers UDify and UDapter and for variants of our TOWER method. Bold: best performance in each column.

[Figure 2: LAS performance of UDify, UDapter, and TOWER on 12 high-resource treebanks (top) and 11 low-resource languages (bottom).]

Results and Discussion. We show detailed results for all 138 treebanks in the Appendix. In Table 1, we show averages over different treebank subsets: treebanks on which both TOWER and (1) UDify (∩UDify; 111 treebanks) or (2) UDapter (∩UDapt; 39 treebanks) have been evaluated, (3) 12 high-resource languages on which UDapter was trained (HIGH), and (4) 11 low-resource treebanks (LOW) for which all three models have been evaluated. We show LAS scores for languages from HIGH and LOW in Figure 2.

[8] For low-resource languages without a dev set, we use the proxy D_l (see §2). We checkpoint the model (i.e., measure the dev loss) 10 times per epoch and stop training when the loss does not decrease over 10 consecutive checkpoints.
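For concreteness, the shallow biaffine arc scoring described in §2 can be sketched as follows. This is an illustration of the scoring shape only, under stated assumptions: dimensions are shrunk (XLM-R Base actually outputs 768-dimensional vectors), the weights are random rather than trained, word vectors are assumed to be mean-pooled over subwords, and greedy argmax stands in for proper tree decoding.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, d = 5, 16      # toy sizes; XLM-R Base uses 768-dim hidden states

# Word-level vectors, e.g. mean-pooled over each word's subword outputs.
H = rng.normal(size=(n_words, d))

# Parameters of the biaffine head (learned in practice, random here).
U = rng.normal(size=(d, d))   # biaffine interaction matrix
w = rng.normal(size=(d,))     # linear head-bias term

# Shallow variant: H enters the biaffine product directly, with no deep
# dependent-/head-specific feed-forward transformations in between.
# S[i, j] scores word j as the syntactic head of word i; the bias term
# H @ w broadcasts over rows, adding a score per candidate head j.
S = H @ U @ H.T + H @ w       # shape: (n_words, n_words)

heads = S.argmax(axis=1)      # greedy head per word; a real parser would
                              # decode a tree, e.g. with Chu-Liu/Edmonds
```

The deep variant would first pass H through separate dependent- and head-specific feed-forward layers before the product; the shallow variant skips them, which is what makes it faster to train.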
Similar trends are observed with the UAS scores.

TOWER outperforms UDify and UDapter in all setups except HIGH, with especially pronounced gains for LOW. This renders TOWER particularly successful for its intended use case: low-resource languages without any training data. Admittedly, the fact that TOWER is built on XLM-R, whereas UDify and UDapter use mBERT, impedes a direct "apples-to-apples" comparison. Two sets of results, however, strongly suggest that it is TOWER's heuristics (TBS and MS) that drive its performance, rather than the XLM-R (instead of mBERT) encoder. First, UDapter outperforms TOWER on high-resource languages with large training treebanks (i.e., the HIGH setup). For these languages, however, TOWER effectively does not employ its heuristics: (i) TBS selects the large language-specific treebank(s), as adding any other language prohibitively reduces the perfect similarity sim(S_0, l) = 1 (see Eq. (1)); (ii) MS is not used, because each high-resource treebank has its own dedicated dev set. Second, removing TOWER's heuristics (see -TBS-MS in Table 1) brings its performance slightly below that of UDapter, rendering TBS (primarily) and MS, rather than the XLM-R encoder, crucial for TOWER's gains. Comparing -TBS and -MS reveals that, somewhat expectedly, selecting the "optimal" training sample (TBS) contributes more to the overall performance than the heuristic early stopping (MS).

Looking at individual low-resource languages (Figure 2), we observe the largest gains for Amharic (am) and Sanskrit (sa). While Sanskrit benefits from TOWER selecting training languages from the same family (Marathi, Urdu, and Hindi), Amharic (Afro-Asiatic family), interestingly, benefits from treebanks of syntactically similar languages from another family (cf. the full TOWER hierarchy in the Appendix): Tamil and Telugu (Dravidian family). Similarly, Tagalog (an Austronesian language) parsing massively benefits from training on the Scottish Gaelic and Irish treebanks (Indo-European, Celtic).

4 Conclusion

We proposed TOWER, a simple yet effective approach to the crucial problem of source language selection for multilingual and cross-lingual dependency parsing. It leverages a language hierarchy, induced from syntax-based, manually coded URIEL language vectors, and simple treebank selection heuristics to inform the source selection. A wide-scale UD evaluation and comparisons to current state-of-the-art multilingual dependency parsers validated the effectiveness of TOWER, especially on low-resource languages. Moreover, while the main experiments in this work were based on one particular state-of-the-art parsing architecture, TOWER is fully independent of the chosen underlying parsing model, and thus widely applicable.

Acknowledgments

Goran Glavaš is supported by the Baden-Württemberg Stiftung (Eliteprogramm, AGREE grant). The work of Ivan Vulić is supported by the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (no. 648909).

References

Željko Agić. 2017. Cross-lingual parser selection for low-resource languages. In Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, pages 1–10.

Michael R. Anderberg. 2014. Cluster Analysis for Applications. Academic Press.

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. In Proceedings of NeurIPS 2020.

Yang Chen and Alan Ritter. 2020. Model selection for cross-lingual transfer using a learned scoring function. arXiv preprint arXiv:2010.06127.

Ethan A. Chi, John Hewitt, and Christopher D. Manning. 2020. Finding universal grammatical relations in multilingual BERT. In Proceedings of ACL 2020, pages 5564–5577.

Chris Collins and Richard Kayne. 2009. Syntactic structures of the world's languages.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of ACL 2020, pages 8440–8451.

Marie-Catherine De Marneffe, Bill MacCartney, Christopher D. Manning, et al. 2006. Generating typed dependency parses from phrase structure parses. In Proceedings of LREC 2006, pages 449–454.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of
deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT 2019, pages 4171–4186.

Timothy Dozat and Christopher D. Manning. 2017. Deep biaffine attention for neural dependency parsing. In Proceedings of ICLR 2017.

Matthew S. Dryer and Martin Haspelmath. 2013. WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Goran Glavaš and Ivan Vulić. 2021. Is supervised syntactic parsing beneficial for language understanding? An empirical investigation. In Proceedings of EACL 2021, pages 3090–3104.

John Hewitt and Christopher D. Manning. 2019. A structural probe for finding syntax in word representations. In Proceedings of NAACL-HLT 2019, pages 4129–4138.

Julia Hockenmaier and Mark Steedman. 2007. CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3):355–396.

Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. 2019. Parameter-efficient transfer learning for NLP. In Proceedings of ICML 2019, pages 2790–2799.

Phillip Keung, Yichao Lu, Julian Salazar, and Vikas Bhardwaj. 2020. Don't use English dev: On the zero-shot cross-lingual evaluation of contextual embeddings. In Proceedings of EMNLP 2020, pages 549–554.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In Proceedings of ICLR 2015.

Dan Kondratyuk and Milan Straka. 2019. 75 languages, 1 model: Parsing Universal Dependencies universally. In Proceedings of EMNLP-IJCNLP 2019, pages 2779–2795.

Artur Kulmizev, Miryam de Lhoneux, Johannes Gontrum, Elena Fano, and Joakim Nivre. 2019. Deep contextualized word embeddings in transition-based and graph-based dependency parsing – a tale of two parsers revisited. In Proceedings of EMNLP-IJCNLP 2019, pages 2755–2768.

Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, and Phil Blunsom. 2020. Syntactic structure distillation pretraining for bidirectional encoders. Transactions of the ACL, 8:776–794.

Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang, Mengzhou Xia, Shruti Rijhwani, Junxian He, Zhisong Zhang, Xuezhe Ma, Antonios Anastasopoulos, Patrick Littell, and Graham Neubig. 2019. Choosing transfer languages for cross-lingual learning. In Proceedings of ACL 2019, pages 3125–3135.

Robert Litschko, Ivan Vulić, Željko Agić, and Goran Glavaš. 2020. Towards instance-level parser selection for cross-lingual transfer of dependency parsers. In Proceedings of COLING 2020, pages 3886–3898.

Patrick Littell, David R. Mortensen, Ke Lin, Katherine Kairis, Carlisle Turner, and Lori Levin. 2017. URIEL and lang2vec: Representing languages as typological, geographical, and phylogenetic vectors. In Proceedings of EACL 2017, pages 8–14.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Joakim Nivre, Marie-Catherine De Marneffe, Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan McDonald, Slav Petrov, Sampo Pyysalo, Natalia Silveira, et al. 2016. Universal Dependencies v1: A multilingual treebank collection. In Proceedings of LREC 2016, pages 1659–1666.

Joakim Nivre, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis Tyers, and Daniel Zeman. 2020. Universal Dependencies v2: An evergrowing multilingual treebank collection. arXiv preprint arXiv:2004.10643.

Jonas Pfeiffer, Andreas Rücklé, Clifton Poth, Aishwarya Kamath, Ivan Vulić, Sebastian Ruder, Kyunghyun Cho, and Iryna Gurevych. 2020. AdapterHub: A framework for adapting Transformers. In Proceedings of EMNLP 2020: System Demonstrations, pages 46–54.

Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, and Tom Mitchell. 2018. Contextual parameter generation for universal neural machine translation. In Proceedings of EMNLP 2018, pages 425–435.

Rudolf Rosa and Zdeněk Žabokrtský. 2015. KLcpos3 – a language similarity measure for delexicalized parser transfer. In Proceedings of ACL-IJCNLP 2015, pages 243–249.

Gözde Gül Şahin and Mark Steedman. 2018. Data augmentation via dependency tree morphing for low-resource languages. In Proceedings of EMNLP 2018, pages 5004–5009.

Milan Straka. 2018. UDPipe 2.0 prototype at CoNLL 2018 UD shared task. In Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies, pages 197–207.

Ann Taylor, Mitchell Marcus, and Beatrice Santorini. 2003. The Penn treebank: An overview. In Treebanks, pages 5–22.
Ahmet Üstün, Arianna Bisazza, Gosse Bouma, and Gertjan van Noord. 2020. UDapter: Language adaptation for truly universal dependency parsing. In Proceedings of EMNLP 2020.

Clara Vania, Yova Kementchedjhieva, Anders Søgaard, and Adam Lopez. 2019. A systematic comparison of methods for low-resource dependency parsing on genuinely low-resource languages. In Proceedings of EMNLP 2019, pages 1105–1116.
Appendix Treebank Method UAS LAS Treebank Method UAS LAS Afrikaans UDPipe 89.38 86.58 Danish UDPipe 86.88 84.31 AfriBooms (af) UDify 86.97 83.48 DDT (da) UDify 87.76 84.5 T OWER 88.26 85.28 T OWER 85.60 82.14 Akkadian UDify 27.65 4.54 Dutch UDPipe 91.37 88.38 PISANDUB (akk) UDapter 26.4 8.2 Alpino (nl) UDify 94.23 91.21 T OWER 33.45 6.12 T OWER 93.42 90.31 Amharic UDify 17.38 3.49 Dutch UDPipe 90.2 86.39 ATT (am) UDapter 12.8 5.91 LassySmall (nl) UDify 94.34 91.22 T OWER 72.64 38.41 T OWER 92.45 88.29 Ancient Greek UDPipe 85.93 82.11 English ESL (en) T OWER 30.22 6.45 PROIEL (grc) UDify 78.91 72.66 T OWER 85.04 79.85 UDPipe 89.63 86.97 English UDify 90.96 88.5 Arabic NYUAD (ar) T OWER 33.53 15.94 EWT (en) UDapter 93.12 89.67 T OWER 92.16 89.29 UDPipe 87.54 82.94 Arabic UDify 87.72 82.88 UDPipe 87.27 84.12 English PADT (ar) UDapter 88.66 84.42 UDify 89.14 85.73 T OWER 88.92 83.72 GUM (en) T OWER 90.07 86.61 Arabic PUD (ar) UDify 76.17 67.07 English UDPipe 84.15 79.71 T OWER 75.94 59.72 LinES (en) UDify 87.33 83.71 T OWER 87.12 82.91 Armenian UDPipe 78.62 71.27 ArmTDP (hy) UDify 85.63 78.61 English PUD (en) UDify 91.52 88.66 T OWER 86.57 80.51 T OWER 90.89 87.33 Bambara UDify 30.28 8.6 English UDPipe 90.29 87.27 CRB (bm) UDapter 28.7 8.1 ParTUT (en) UDify 92.84 90.14 T OWER 31.33 8.03 T OWER 89.36 85.63 UDPipe 86.11 82.86 English Pronouns (en) T OWER 89.50 85.37 Basque UDify 84.94 80.97 BDT (eu) UDapter 87.25 83.33 Erzya UDify 31.9 16.38 T OWER 84.28 80.02 JR (myv) UDapter 34.21 19.15 T OWER 36.44 19.38 UDPipe 78.58 72.72 Belarusian UDify 91.82 87.19 Estonian UDPipe 88.0 85.18 HSE (be) UDapter 84.16 79.33 EDT (et) UDify 89.53 86.67 T OWER 86.40 81.56 T OWER 90.24 87.08 UDapter 52.9 37.34 Estonian EWT (et) T OWER 88.80 84.54 Bhojpuri BHTB (bho) T OWER 52.62 35.86 Faroese UDify 67.24 59.26 Breton UDify 63.52 39.84 OFT (fo) UDapter 77.15 69.2 UDapter 72.91 58.5 T OWER 77.43 68.41 KEB (br) T OWER 67.73 44.47 Finnish UDPipe 90.68 87.89 Bulgarian UDPipe 93.38 90.35 FTB 
(fi) UDify 86.37 81.4 UDify 95.54 92.4 T OWER 91.91 89.05 BTB (bg) T OWER 95.67 92.03 Finnish PUD (fi) UDify 89.76 86.58 UDPipe 32.6 18.83 T OWER 88.24 82.48 Buryat UDify 48.43 26.28 BDT (bxr) UDPipe 89.88 87.46 UDapter 48.68 28.89 Finnish UDify 86.42 82.03 T OWER 51.53 29.16 TDT (fi) UDapter 91.87 89.01 Catalan UDPipe 93.22 91.06 T OWER 92.78 90.22 AnCora (ca) UDify 94.25 92.33 French FQB (fr) T OWER 93.36 87.00 T OWER 94.04 92.08 French FTB (fr) T OWER 28.04 14.80 Croatian UDPipe 91.1 86.78 SET (hr) UDify 94.08 89.79 French UDPipe 90.65 88.06 T OWER 92.22 87.02 GSD (fr) UDify 93.6 91.45 T OWER 94.06 91.31 Czech UDPipe 92.99 90.71 CAC (cs) UDify 94.33 92.41 French PUD (fr) UDify 88.36 82.76 T OWER 94.91 92.10 T OWER 91.02 83.52 Czech UDPipe 86.9 84.03 French UDPipe 92.17 89.63 CLTT (cs) UDify 91.69 89.96 ParTUT (fr) UDify 90.55 88.06 T OWER 94.11 91.38 T OWER 87.90 79.33 Czech UDPipe 92.91 89.75 French UDPipe 92.37 90.73 FicTree (cs) UDify 95.19 92.77 Sequoia (fr) UDify 92.53 90.05 T OWER 95.12 91.83 T OWER 92.07 89.93 Czech UDPipe 93.33 91.31 French UDPipe 82.9 77.53 PDT (cs) UDify 94.73 92.88 Spoken (fr) UDify 85.24 80.01 T OWER 95.01 92.41 T OWER 84.41 74.77 Czech PUD (cs) UDify 92.59 87.95 Galician UDPipe 86.44 83.82 T OWER 93.26 87.06 CTG (gl) UDify 84.75 80.89 T OWER 83.85 80.65 4885
Per-treebank results, format: Method UAS/LAS.

Galician TreeGal (gl): UDPipe 82.72/77.69, UDify 84.08/76.77, TOWER 77.57/66.87
German GSD (de): UDPipe 85.53/81.07, UDify 87.81/83.59, TOWER 89.11/84.19
German HDT (de): TOWER 97.65/96.54
German LIT (de): TOWER 86.55/78.74
German PUD (de): UDify 89.86/84.46, TOWER 89.15/81.02
Greek GDT (el): UDPipe 92.10/89.79, UDify 94.33/92.15, TOWER 94.13/91.16
Hebrew HTB (he): UDPipe 89.70/86.86, UDify 91.63/88.11, UDapter 91.86/88.75, TOWER 90.71/87.05
Hindi HDTB (hi): UDPipe 94.85/91.83, UDify 95.13/91.46, UDapter 95.29/91.96, TOWER 95.12/91.42
Hindi PUD (hi): UDify 71.64/58.42, TOWER 73.02/50.68
Hungarian Szeged (hu): UDPipe 84.04/79.73, UDify 89.68/84.88, TOWER 87.87/81.02
Indonesian GSD (id): UDPipe 85.31/78.99, UDify 86.45/80.10, TOWER 83.71/76.84
Indonesian PUD (id): UDify 77.47/56.90, TOWER 76.71/53.16
Irish IDT (ga): UDPipe 80.39/72.34, UDify 80.05/69.28, TOWER 80.33/66.80
Italian ISDT (it): UDPipe 93.49/91.54, UDify 95.54/93.69, UDapter 95.32/93.46, TOWER 94.47/91.98
Italian PUD (it): UDify 94.18/91.76, TOWER 94.13/89.01
Italian ParTUT (it): UDPipe 92.64/90.47, UDify 95.96/93.68, TOWER 95.06/91.57
Italian PoSTWITA (it): TOWER 86.95/81.75
Italian TWITTIRO (it): TOWER 86.93/80.91
Italian VIT (it): TOWER 91.80/87.05
Japanese GSD (ja): UDPipe 95.06/93.73, UDify 94.37/92.08, UDapter 94.87/92.84, TOWER 92.58/89.44
Japanese PUD (ja): UDify 94.89/93.62, TOWER 91.12/88.41
Karelian KKPP (krl): UDapter 61.86/48.35, TOWER 62.18/45.60
Kazakh KTB (kk): UDPipe 53.30/33.38, UDify 74.77/63.66, UDapter 74.13/60.74, TOWER 73.70/59.88
Komi Permyak UH (koi): UDapter 36.89/23.05, TOWER 42.36/25.81
Komi Zyrian IKDP (kpv): UDify 36.01/22.12, TOWER 40.87/24.71
Komi Zyrian Lattice (kpv): UDify 28.85/12.99, UDapter 28.40/12.50, TOWER 33.29/17.33
Korean GSD (ko): UDPipe 87.70/84.24, UDify 82.74/74.26, UDapter 89.39/85.91, TOWER 86.04/81.70
Korean Kaist (ko): UDPipe 88.42/86.48, UDify 87.57/84.52, TOWER 88.78/86.11
Korean PUD (ko): UDify 63.57/46.89, TOWER 61.78/38.40
Kurmanji MG (kmr): UDPipe 45.23/34.32, UDify 35.86/20.40, UDapter 26.37/12.10, TOWER 72.00/51.02
Latin ITTB (la): UDPipe 91.06/88.80, UDify 92.43/90.12, TOWER 91.25/87.67
Latin PROIEL (la): UDPipe 83.34/78.66, UDify 84.85/80.52, TOWER 83.74/77.75
Latin Perseus (la): UDPipe 71.20/61.28, UDify 78.33/69.60, TOWER 73.53/62.16
Latvian LVTB (lv): UDPipe 87.20/83.35, UDify 89.33/85.09, TOWER 92.26/88.52
Lithuanian ALKSNIS (lt): TOWER 87.35/81.58
Lithuanian HSE (lt): UDPipe 51.98/42.17, UDify 79.06/69.34, TOWER 79.25/65.47
Livvi KKPP (olo): UDapter 57.86/43.34, TOWER 62.77/44.62
Maltese MUDT (mt): UDPipe 84.65/79.71, UDify 83.07/75.56, TOWER 76.64/67.31
Marathi UFAL (mr): UDPipe 70.63/61.41, UDify 79.37/67.72, UDapter 61.01/44.40, TOWER 70.39/57.77
Mbya Guarani Dooley (gun): TOWER 18.10/5.82
Mbya Guarani Thomas (gun): TOWER 32.36/11.23
Moksha JR (mdf): UDapter 40.15/26.55, TOWER 44.21/27.45
Naija NSC (pcm): UDify 45.75/32.16, UDapter 49.24/36.72, TOWER 52.03/34.95
North Sami Giella (sme): UDPipe 78.30/73.49, UDify 74.30/67.13, TOWER 53.53/42.05
Norwegian Bokmaal (no): UDPipe 92.39/90.49, UDify 93.97/92.18, TOWER 94.77/93.12
Norwegian Nynorsk (no): UDPipe 92.09/90.01, UDify 94.34/92.37, TOWER 93.96/91.65
Norwegian NynorskLIA (no): UDPipe 68.08/60.07, UDify 75.40/69.60, TOWER 75.43/69.82
Old French SRCMF (fro): UDPipe 91.74/86.83, UDify 91.74/86.65, TOWER 89.75/83.48
Persian Seraji (fa): UDPipe 90.05/86.66, UDify 89.59/85.84, TOWER 91.29/87.43
Polish LFG (pl): UDPipe 96.58/94.76, UDify 96.67/94.58, TOWER 97.06/95.18
Polish PDB (pl): TOWER 94.99/89.95
Polish PUD (pl): TOWER 94.13/87.44
Portuguese Bosque (pt): UDPipe 91.36/89.04, UDify 91.37/87.84, TOWER 91.50/88.29
Portuguese GSD (pt): UDPipe 93.01/91.63, UDify 94.22/92.54, TOWER 93.80/91.98
Portuguese PUD (pt): UDify 87.02/80.17, TOWER 87.27/77.86
Romanian Nonstandard (ro): UDPipe 89.12/84.20, UDify 90.36/85.26, TOWER 90.59/84.41
Romanian RRT (ro): UDPipe 91.31/86.74, UDify 93.16/88.56, TOWER 93.61/87.70
Romanian SiMoNERo (ro): TOWER 91.19/86.75
Russian GSD (ru): UDPipe 88.15/84.37, UDify 90.71/86.03, TOWER 91.85/88.28
Russian PUD (ru): UDify 93.51/87.14, TOWER 94.59/88.26
Russian SynTagRus (ru): UDPipe 93.80/92.32, UDify 94.83/93.13, UDapter 94.04/92.24, TOWER 95.28/93.75
Russian Taiga (ru): UDPipe 75.45/69.11, UDify 84.02/77.80, TOWER 84.83/77.71
Sanskrit UFAL (sa): UDify 40.21/18.56, UDapter 44.32/22.22, TOWER 63.05/44.66
Scottish Gaelic ARCOSG (gd): TOWER 81.32/73.82
Serbian SET (sr): UDPipe 92.70/89.27, UDify 95.68/91.95, TOWER 94.36/90.93
Slovak SNK (sk): UDPipe 89.82/86.90, UDify 95.92/93.87, TOWER 93.77/90.87
Slovenian SSJ (sl): UDPipe 92.96/91.16, UDify 94.74/93.07, TOWER 94.91/93.50
Slovenian SST (sl): UDPipe 73.51/67.51, UDify 80.37/75.03, TOWER 78.64/73.10
Spanish AnCora (es): UDPipe 92.34/90.26, UDify 92.99/90.50, TOWER 92.67/90.44
Spanish GSD (es): UDPipe 90.71/88.03, UDify 90.82/87.23, TOWER 92.12/89.64
Spanish PUD (es): UDify 90.45/83.08, TOWER 89.66/80.23
Swedish LinES (sv): UDPipe 86.07/81.86, UDify 88.77/85.49, TOWER 88.63/85.07
Swedish PUD (sv): UDify 89.17/86.10, TOWER 89.20/84.95
Swedish Talbanken (sv): UDPipe 89.63/86.61, UDify 91.91/89.03, UDapter 92.62/90.26, TOWER 89.70/86.60
Swedish Sign Language SSLC (swl): UDPipe 50.35/37.94, UDify 40.43/26.95, TOWER 31.56/20.57
Swiss German UZH (gsw): UDapter 59.74/45.49, TOWER 55.61/40.17
Tagalog TRG (tl): UDify 64.04/40.07, UDapter 84.78/69.52, TOWER 91.78/74.32
Tamil TTB (ta): UDPipe 74.11/66.37, UDify 79.34/71.29, UDapter 70.28/46.05, TOWER 71.28/64.36
Telugu MTG (te): UDPipe 91.26/85.02, UDify 92.23/83.91, UDapter 83.52/71.10, TOWER 90.43/81.97
Thai PUD (th): UDify 49.05/26.06, TOWER 78.23/53.80
Turkish GB (tr): TOWER 75.36/59.39
Turkish IMST (tr): UDPipe 74.19/67.56, UDify 74.56/67.44, UDapter 76.97/69.63, TOWER 77.90/70.00
Turkish PUD (tr): UDify 67.68/46.07, TOWER 62.29/41.57
Ukrainian IU (uk): UDPipe 88.29/85.25, UDify 92.83/90.30, TOWER 92.54/89.89
Upper Sorbian UFAL (hsb): UDPipe 45.58/34.54, UDify 71.55/62.82, UDapter 62.28/54.20, TOWER 70.98/60.90
Urdu UDTB (ur): UDPipe 87.50/81.62, UDify 88.43/82.84, TOWER 87.43/81.62
Uyghur UDT (ug): UDPipe 78.46/67.09, UDify 65.89/48.80, TOWER 79.11/66.41
Vietnamese VTB (vi): UDPipe 70.38/62.56, UDify 74.11/66.00, TOWER 72.40/63.50
Warlpiri UFAL (wbp): UDify 21.66/7.96, UDapter 24.20/12.10, TOWER 31.85/16.24
Welsh CCG (cy): UDapter 70.75/54.43, TOWER 77.22/57.56
Wolof WTB (wo): TOWER 69.06/58.13
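The UAS and LAS figures above are the standard unlabeled and labeled attachment scores: the percentage of tokens whose predicted head is correct (UAS), and whose head and dependency label are both correct (LAS). As a reminder of what the numbers measure, here is a minimal sketch; the (head, relation) tuple layout is an illustrative assumption, not the evaluation code used in the paper:

```python
def attachment_scores(gold, pred):
    """Compute (UAS, LAS) over a list of sentences.

    Each sentence is a list of (head_index, deprel) tuples, one per token.
    This data layout is an illustrative assumption, not the paper's format.
    """
    total = uas_hits = las_hits = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for (g_head, g_rel), (p_head, p_rel) in zip(gold_sent, pred_sent):
            total += 1
            if g_head == p_head:
                uas_hits += 1        # correct head -> counts toward UAS
                if g_rel == p_rel:
                    las_hits += 1    # correct head and label -> counts toward LAS
    return 100.0 * uas_hits / total, 100.0 * las_hits / total

# Toy sentence of 4 tokens: 3 heads correct, 2 of those also correctly labeled.
gold = [[(0, "root"), (1, "nsubj"), (1, "obj"), (3, "det")]]
pred = [[(0, "root"), (1, "nsubj"), (1, "iobj"), (2, "det")]]
print(attachment_scores(gold, pred))  # (75.0, 50.0)
```

LAS can never exceed UAS, since a labeled match requires the head to be correct as well.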
Figure 3: Dendrogram of the full syntax-based hierarchical clustering of 89 languages from UD v2.5. Languages are denoted with their ISO 639-1 codes.
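A dendrogram like the one in Figure 3 can in principle be produced with agglomerative clustering over per-language syntactic feature vectors. The sketch below uses SciPy; the toy binary vectors, the six-language sample, and the cosine-distance/average-linkage choices are placeholder assumptions for illustration, not the paper's actual URIEL features or clustering configuration:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Toy binary "syntax" vectors for a handful of languages; in the paper these
# are human-coded URIEL syntax features, which we only mimic here.
langs = ["en", "de", "nl", "ja", "ko", "tr"]
feats = np.array([
    [1, 0, 1, 1, 0],  # en
    [1, 0, 1, 0, 0],  # de
    [1, 0, 1, 0, 1],  # nl
    [0, 1, 0, 1, 1],  # ja
    [0, 1, 0, 1, 0],  # ko
    [0, 1, 1, 1, 0],  # tr
])

# Pairwise cosine distances, then bottom-up (agglomerative) clustering with
# average linkage; `tree` encodes the full merge hierarchy (the dendrogram).
dists = pdist(feats, metric="cosine")
tree = linkage(dists, method="average")

# Cutting the hierarchy into two clusters separates the two groups of
# similar toy vectors (en/de/nl vs. ja/ko/tr).
labels = fcluster(tree, t=2, criterion="maxclust")
print(dict(zip(langs, labels)))
```

With `scipy.cluster.hierarchy.dendrogram(tree, labels=langs)` the same `tree` can be rendered as a plot analogous to Figure 3.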