Unsupervised Natural Language Parsing (Introductory Tutorial)
Kewei Tu1, Yong Jiang2, Wenjuan Han3, Yanpeng Zhao4
1 School of Information Science and Technology, ShanghaiTech University
2 Alibaba DAMO Academy, Alibaba Group
3 School of Computing, National University of Singapore, Singapore
4 ILCC, University of Edinburgh
tukw@shanghaitech.edu.cn yongjiang.jy@alibaba-inc.com dcshanw@nus.edu.sg yanp.zhao@ed.ac.uk

1 Introduction

Syntactic parsing is an important task in natural language processing that aims to uncover the syntactic structure (e.g., a constituent or dependency tree) of an input sentence. Such syntactic structures have been found useful in downstream tasks such as semantic parsing, relation extraction, and machine translation.

Supervised learning is the main technique used to automatically learn a syntactic parser from data. It requires the training sentences to be manually annotated with their correct parse trees. A major challenge faced by supervised parsing is that syntactically annotated sentences are not always available for a target language or domain, and building a high-quality annotated corpus is very expensive and time-consuming.

A radical solution to this challenge is unsupervised parsing, sometimes also called grammar induction, which learns a parser from training sentences without parse tree annotations. Unsupervised parsing can also serve as the basis for semi-supervised and transfer learning of syntactic parsers when there exist both unannotated sentences and (in-domain or out-of-domain) annotated sentences. In addition, the research on unsupervised parsing is deemed interesting in the field of machine learning because it is a representative task of unsupervised structured prediction, and in the field of cognitive science because it inspires and verifies cognitive research on human language acquisition.

The research on unsupervised parsing has a long history, dating back to theoretical studies in the 1960s (Gold, 1967) and algorithmic and empirical studies in the 1970s (Baker, 1979). Although deemed an interesting topic by the NLP community, unsupervised parsing had received much less attention than supervised parsing over the past few decades.

More recently, however, there has been a resurgence of interest in unsupervised parsing, with more than ten papers on unsupervised parsing published in top NLP and AI venues over the past two years, including a best paper at ICLR 2019 (Shen et al., 2019), a best paper nominee at ACL 2019 (Shi et al., 2019), and a best paper nominee at EMNLP 2020 (Zhao and Titov, 2020). This renewed interest in unsupervised parsing can be attributed to the combination of two recent trends. First, there is a general trend in deep learning towards unsupervised training or pre-training. Second, there is an emerging trend in the NLP community towards finding or modeling linguistic structures in neural models. The research on unsupervised parsing fits these two trends perfectly.

Because of the renewed attention on unsupervised parsing and its relevance to the recent trends in the NLP community, we believe a tutorial on unsupervised parsing can be timely and beneficial to many *ACL conference attendees. The tutorial will introduce to the general audience what unsupervised parsing does and how it can be useful for and beyond syntactic parsing. It will then provide a systematic overview of major classes of approaches to unsupervised parsing, namely generative and discriminative approaches, and analyze their relative strengths and weaknesses. It will cover both decade-old statistical approaches and more recent neural approaches to give the audience a sense of the historical and recent development of the field. We also plan to discuss emerging research topics such as BERT-based approaches and visually grounded learning.

We expect that by taking this tutorial, one can not only obtain a deep understanding of the literature and methodology of unsupervised parsing and become well prepared for one's own research into unsupervised parsing, but may also get inspiration from the ideas and techniques of unsupervised parsing and apply or extend them to other NLP tasks that can potentially benefit from implicitly learned linguistic structures.

2 Overview

This will be a three-hour tutorial divided into five parts.

In the first part, we will introduce the unsupervised parsing task. We will start with the problem definition and discuss the motivations and applications of unsupervised parsing. For example, we will show that unsupervised parsing approaches can be extended for semi-supervised parsing (Jia et al., 2020) and cross-lingual syntactic transfer (He et al., 2019), and we will also show applications of unsupervised parsing approaches beyond syntactic parsing (e.g., in computer vision (Tu et al., 2013)). We will then discuss how to evaluate unsupervised parsing, including the evaluation metrics and typical experimental setups. We will promote standardized setups to enable meaningful empirical comparison between approaches (Li et al., 2020). Finally, we will give an overview of unsupervised parsing approaches to be discussed in the rest of the tutorial.
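As a concrete example of the kind of metric we will cover, unsupervised constituency parses are typically scored with unlabeled bracket F1 against gold trees, while unsupervised dependency parses are scored with directed attachment accuracy. The short Python sketch below computes unlabeled bracket F1; it is a minimal illustration under our own choice of span representation, not the code of any particular evaluation toolkit.

def unlabeled_f1(pred_spans, gold_spans):
    # pred_spans and gold_spans are sets of (start, end) token spans.
    matched = len(pred_spans & gold_spans)
    precision = matched / len(pred_spans) if pred_spans else 0.0
    recall = matched / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# For "the cat sat" with gold spans {(0, 2), (0, 3)} and predicted
# spans {(1, 3), (0, 3)}, precision and recall are both 0.5, so F1 is 0.5.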
In the second and third parts, we will introduce in detail two major classes of approaches to unsupervised parsing, generative and discriminative approaches, and discuss their pros and cons.

The second part will cover generative approaches, which model the joint probability of the sentence and the corresponding parse tree. Most of the existing generative approaches are based on generative grammars, in particular context-free grammars and dependency models with valence (Klein and Manning, 2004). There are also featurized and neural extensions of generative grammars, such as Berg-Kirkpatrick et al. (2010) and Jiang et al. (2016). We will divide our discussion of learning generative grammars into two parts: structure learning and parameter learning. Structure learning concerns finding the optimal set of grammar rules. We will introduce both probabilistic methods such as Stolcke and Omohundro (1994) and heuristic methods such as Clark (2007). Parameter learning concerns learning the probabilities or weights of a pre-specified set of grammar rules. We will discuss a variety of priors and regularizations designed to improve parameter learning, such as Cohen and Smith (2010), Tu and Honavar (2012), Noji et al. (2016), and Jin et al. (2018). We will also discuss parameter learning algorithms such as expectation-maximization (Baker, 1979; Spitkovsky et al., 2010b), MCMC (Johnson et al., 2007), and curriculum learning (Spitkovsky et al., 2010a). After introducing approaches based on generative grammars, we will discuss recent approaches that are instead based on neural language models (Shen et al., 2018, 2019).
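To make the generative formulation concrete, consider a probabilistic grammar with rule probabilities θ. In generic notation of our own (rather than that of any specific paper covered in the tutorial), the joint probability of a sentence x and a parse tree t, and the unsupervised training objective over an unannotated corpus D, can be written as

\[
p_\theta(x, t) = \prod_{r \in t} \theta_r, \qquad
\theta^\ast = \arg\max_{\theta} \sum_{x \in D} \log \sum_{t \in \mathcal{T}(x)} p_\theta(x, t),
\]

where r ranges over the rule occurrences in t and \mathcal{T}(x) is the set of trees the grammar allows for x. Expectation-maximization optimizes this marginal likelihood by alternating between computing expected rule counts under p_\theta(t \mid x) (e.g., with the inside-outside algorithm) and re-estimating θ from those counts.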
The third part will cover discriminative approaches, which model the conditional probability or score of the parse tree given the sentence. We will first introduce autoencoder approaches such as Cai et al. (2017), which contain an encoder that maps the sentence to an intermediate representation (such as a parse tree) and a decoder that tries to reconstruct the sentence. Their training objective is typically the reconstruction probability. We will then introduce variational autoencoder approaches such as Kim et al. (2019), which have a similar model structure to autoencoder approaches but use the evidence lower bound as the training objective. Finally, we will briefly discuss other discriminative approaches such as Grave and Elhadad (2015).
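Schematically, and again in our own simplified notation rather than the exact formulations of Cai et al. (2017) or Kim et al. (2019), the two objectives can be contrasted as

\[
\mathcal{L}_{\mathrm{AE}} = \sum_{x \in D} \log \sum_{t} q_\phi(t \mid x)\, p_\psi(x \mid t), \qquad
\mathcal{L}_{\mathrm{ELBO}} = \sum_{x \in D} \Big( \mathbb{E}_{q_\phi(t \mid x)}\big[\log p_\psi(x \mid t)\big] - \mathrm{KL}\big(q_\phi(t \mid x) \,\|\, p(t)\big) \Big),
\]

where q_\phi is the encoder mapping the sentence x to (a distribution over) parse trees t, p_\psi is the decoder that reconstructs the sentence, and p(t) is a prior over trees. The first objective is a reconstruction probability, while the second is the evidence lower bound, which adds a KL regularizer towards the prior.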
In the fourth part, we will focus on several special topics. First, while most of the previous approaches to unsupervised parsing are unlexicalized, we will discuss the impact of partial and full lexicalization (e.g., the work by Pate and Johnson (2016) and Han et al. (2017)). Second, we will discuss whether and how big training data could benefit unsupervised parsing (Han et al., 2017). Third, we will introduce recent attempts to induce syntactic parses from pretrained language models such as BERT (Rosa and Mareček, 2019; Wu et al., 2020). Fourth, we will cover unsupervised multilingual parsing, the task of performing unsupervised parsing jointly on multiple languages (e.g., the work by Berg-Kirkpatrick and Klein (2010) and Han et al. (2019)). Fifth, we will introduce visually grounded unsupervised parsing, which tries to improve unsupervised parsing with the help of visual data (Shi et al., 2019). Finally, we will discuss latent tree models trained with feedback from downstream tasks, which are related to unsupervised parsing (Yogatama et al., 2016; Choi et al., 2018).

In the last part, we will summarize the tutorial and discuss potential future research directions of unsupervised parsing.

3 Outline

Part 1. Introduction [20 min]
• Problem definition
• Motivations and applications
• Evaluation
• Overview of approaches

Part 2. Generative Approaches [60 min]
• Overview
• Approaches based on generative grammars
  – Structure learning
  – Parameter learning
• Approaches based on language models

Coffee Break [30 min]

Part 3. Discriminative Approaches [40 min]
• Overview
• Autoencoders
• Variational autoencoders
• Other discriminative approaches

Part 4. Special Topics [50 min]
• Lexicalization
• Big training data
• BERT-based approaches
• Unsupervised multilingual parsing
• Visually grounded unsupervised parsing
• Latent tree models with downstream tasks

Part 5. Summary and Future Directions [10 min]

4 Prerequisites for the Attendees

Linguistics: Familiarity with grammars and syntactic parsing.

Machine Learning: Basic knowledge about generative vs. discriminative models, unsupervised learning algorithms (such as expectation-maximization), and deep learning.

5 Reading List

Klein and Manning (2004) – An influential generative approach to unsupervised dependency parsing that is the basis for many subsequent papers.

Jiang et al. (2016) – A neural extension of Klein and Manning (2004). One of the first modern neural approaches to unsupervised parsing.

Stolcke and Omohundro (1994) – One of the first structure learning approaches for context-free grammars in unsupervised constituency parsing.

Tu and Honavar (2012) – A parameter learning approach to unsupervised dependency parsing based on unambiguity regularization.

Cai et al. (2017) – An autoencoder approach to unsupervised dependency parsing.

Kim et al. (2019) – A variational autoencoder approach to unsupervised constituency parsing.

6 Presenters

Kewei Tu (ShanghaiTech University)
http://faculty.sist.shanghaitech.edu.cn/faculty/tukw/
Kewei Tu is an associate professor with the School of Information Science and Technology at ShanghaiTech University. His research lies in the areas of natural language processing, machine learning, and artificial intelligence in general, with a focus on the representation, learning and application of linguistic structures. He has over 50 publications in NLP and AI conferences and journals including ACL, EMNLP, AAAI, IJCAI, NeurIPS and ICCV. He served as an area chair for the syntax track at EMNLP 2020.

Yong Jiang (Alibaba DAMO Academy)
http://jiangyong.site
Yong Jiang is a researcher at Alibaba DAMO Academy, Alibaba Group. He received his Ph.D. degree from the joint program of ShanghaiTech University and the University of Chinese Academy of Sciences. He was a visiting student at the University of California, Berkeley in 2016. His research mainly focuses on machine learning and natural language processing, especially multilingual and unsupervised natural language processing. His research has been published in top-tier conferences and journals including ACL, EMNLP and AAAI.
Wenjuan Han (National University of Singapore)
http://hanwenjuan.com
Wenjuan Han is a research fellow at the National University of Singapore. She received her Ph.D. degree from the joint program of ShanghaiTech University and the University of Chinese Academy of Sciences. She was a visiting student at the University of California, Los Angeles in 2019. Her research focuses on natural language understanding, especially unsupervised syntactic parsing. Her research has been published in top-tier *ACL conferences.

Yanpeng Zhao (University of Edinburgh)
https://zhaoyanpeng.github.io/
Yanpeng Zhao is a Ph.D. student in the Institute for Language, Cognition and Computation (ILCC) at the University of Edinburgh. His research interests lie in structured prediction and latent variable models with a focus on syntactic parsing and relation induction. His work was nominated for the best paper award at ACL 2018 and received an honorable mention for the best paper award at EMNLP 2020.

References

J. K. Baker. 1979. Trainable grammars for speech recognition. In Speech Communication Papers for the 97th Meeting of the Acoustical Society of America.

Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In NAACL.

Taylor Berg-Kirkpatrick and Dan Klein. 2010. Phylogenetic grammar induction. In ACL.

Jiong Cai, Yong Jiang, and Kewei Tu. 2017. CRF autoencoder for unsupervised dependency parsing. In EMNLP.

Jihun Choi, Kang Min Yoo, and Sang-goo Lee. 2018. Unsupervised learning of task-specific tree structures with Tree-LSTMs. In AAAI.

A. Clark. 2007. Learning deterministic context free grammars: The Omphalos competition. Machine Learning, 66.

Shay B. Cohen and Noah A. Smith. 2010. Covariance in unsupervised learning of probabilistic grammars. The Journal of Machine Learning Research.

E. Mark Gold. 1967. Language identification in the limit. Information and Control, 10(5):447–474.

Edouard Grave and Noémie Elhadad. 2015. A convex and feature-rich discriminative approach to dependency grammar induction. In ACL-IJCNLP.

Wenjuan Han, Yong Jiang, and Kewei Tu. 2017. Dependency grammar induction with neural lexicalization and big training data. In EMNLP.

Wenjuan Han, Ge Wang, Yong Jiang, and Kewei Tu. 2019. Multilingual grammar induction with continuous language identification. In EMNLP.

Junxian He, Zhisong Zhang, Taylor Berg-Kirkpatrick, and Graham Neubig. 2019. Cross-lingual syntactic transfer through unsupervised adaptation of invertible projections. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3211–3223, Florence, Italy. Association for Computational Linguistics.

Zixia Jia, Youmi Ma, Jiong Cai, and Kewei Tu. 2020. Semi-supervised semantic dependency parsing using CRF autoencoders. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6795–6805, Online. Association for Computational Linguistics.

Yong Jiang, Wenjuan Han, and Kewei Tu. 2016. Unsupervised neural dependency parsing. In EMNLP.

Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, and Lane Schwartz. 2018. Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2721–2731, Brussels, Belgium. Association for Computational Linguistics.

Mark Johnson, Thomas L. Griffiths, and Sharon Goldwater. 2007. Bayesian inference for PCFGs via Markov chain Monte Carlo. In HLT-NAACL, pages 139–146.

Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, and Gábor Melis. 2019. Unsupervised recurrent neural network grammars. In NAACL.

Dan Klein and Christopher D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In ACL.

Jun Li, Yifan Cao, Jiong Cai, Yong Jiang, and Kewei Tu. 2020. An empirical comparison of unsupervised constituency parsing methods. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3278–3283, Online. Association for Computational Linguistics.

Hiroshi Noji, Yusuke Miyao, and Mark Johnson. 2016. Using left-corner parsing to encode universal structural constraints in grammar induction. In EMNLP.

John K. Pate and Mark Johnson. 2016. Grammar induction from (lots of) words alone. In COLING.

Rudolf Rosa and David Mareček. 2019. Inducing syntactic trees from BERT representations. In BlackboxNLP.
Yikang Shen, Zhouhan Lin, Chin-Wei Huang, and Aaron Courville. 2018. Neural language modeling by jointly learning syntax and lexicon. In International Conference on Learning Representations.

Yikang Shen, Shawn Tan, Alessandro Sordoni, and Aaron Courville. 2019. Ordered neurons: Integrating tree structures into recurrent neural networks. In International Conference on Learning Representations.

Haoyue Shi, Jiayuan Mao, Kevin Gimpel, and Karen Livescu. 2019. Visually grounded neural syntax acquisition. arXiv preprint arXiv:1906.02890.

Valentin I. Spitkovsky, Hiyan Alshawi, and Daniel Jurafsky. 2010a. From Baby Steps to Leapfrog: How “Less is More” in unsupervised dependency parsing. In NAACL.

Valentin I. Spitkovsky, Hiyan Alshawi, Daniel Jurafsky, and Christopher D. Manning. 2010b. Viterbi training improves unsupervised dependency parsing. In CoNLL.

Andreas Stolcke and Stephen Omohundro. 1994. Inducing probabilistic grammars by Bayesian model merging. In International Colloquium on Grammatical Inference.

Kewei Tu and Vasant Honavar. 2012. Unambiguity regularization for unsupervised learning of probabilistic grammars. In EMNLP-CoNLL.

Kewei Tu, Maria Pavlovskaia, and Song-Chun Zhu. 2013. Unsupervised structure learning of stochastic and-or grammars. In Advances in Neural Information Processing Systems 26, pages 1322–1330.

Zhiyong Wu, Yun Chen, Ben Kao, and Qun Liu. 2020. Perturbed masking: Parameter-free probing for analyzing and interpreting BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4166–4176, Online. Association for Computational Linguistics.

Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, and Wang Ling. 2016. Learning to compose words into sentences with reinforcement learning. In ICLR.

Yanpeng Zhao and Ivan Titov. 2020. Visually grounded compound PCFGs. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4369–4379, Online. Association for Computational Linguistics.