Graph-Based Meaning Representations: Design and Processing
Tutorial at LREC 2020
https://github.com/cfmrp/tutorial
Alexander Koller, Saarland University, koller@coli.uni-saarland.de
Stephan Oepen, University of Oslo, oe@ifi.uio.no
Weiwei Sun, Peking University, ws@pku.edu.cn
Abstract
This tutorial is on representing and processing sentence meaning in the form of labeled directed graphs. The tutorial will (a) briefly review relevant background in formal and linguistic semantics; (b) semi-formally define a unified abstract view on different flavors of semantic graphs and associated terminology; (c) survey common frameworks for graph-based meaning representation and available graph banks; and (d) offer a technical overview of a representative selection of different parsing approaches.
1 Tutorial Content and Relevance
All things semantic have been receiving heightened attention in recent years. Despite remarkable advances in vector-based (continuous, dense, and distributed) encodings of meaning, ‘classic’ (hierarchically structured and discrete) semantic representations continue to play an important role in ‘making sense’ of natural language. While parsing has long been dominated by tree-structured target representations, there is now growing interest in general graphs as more expressive and arguably more adequate target structures for sentence-level grammatical analysis beyond surface syntax, in particular for the representation of semantic structure.
Today, the landscape of meaning representation approaches, annotated graph banks, and techniques for parsing into these structures is complex and diverse. Graph-based semantic parsing has been a task in almost every Semantic Evaluation (SemEval) exercise since 2014. These shared tasks were based on a variety of different corpora with graph-based meaning annotations (graph banks), which differ both in their formal properties and in the facets of meaning they aim to represent. The goal of this tutorial is to clarify this landscape for our research community by providing a unifying view on these graph banks and their associated parsing problems, while working out similarities and differences between common frameworks and techniques.
Based on common-sense linguistic and formal dimensions established in its first part, the tutorial will provide a coherent, systematized overview of this field. Participants will be able to identify genuine content differences between frameworks as well as to tease apart more superficial variation, for example in terminology or packaging. Furthermore, major current processing techniques for semantic graphs will be reviewed against a high-level inventory of families of approaches. This part of the tutorial will emphasize reflections on co-dependencies with specific graph flavors or frameworks, on worst-case and typical time and space complexity, and on what guarantees (if any) are obtained on the well-formedness and correctness of output structures.
Kate and Wong (2010) suggest a definition of semantic parsing as “the task of mapping natural language sentences into complete formal meaning representations which a computer can execute for some domain-specific application.” This view brings along a tacit expectation to map (more or less) directly from a linguistic surface form to an actionable encoding of its intended meaning, e.g. in a database query or even a programming language. In this tutorial, we embrace a broader perspective on semantic parsing, as it has come to be commonly viewed in recent years. We will review graph-based meaning representations that aim to be application- and domain-independent, i.e. that seek to provide a reusable intermediate layer of interpretation that captures, in suitably abstract form, relevant constraints that the linguistic signal imposes on interpretation.
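One concrete reason general graphs are more expressive than tree-structured targets is reentrancy: a node may have more than one incoming edge, which no tree allows. The following is a minimal Python sketch under stated assumptions: the `SemanticGraph` class is hypothetical (it is not the data model of any framework or tool cited here), and the AMR-style role labels `ARG0` and `ARG1` are used purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SemanticGraph:
    """Hypothetical minimal labeled directed graph (illustration only)."""
    labels: dict[str, str] = field(default_factory=dict)             # node id -> label
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (source, role, target)

    def add_node(self, node: str, label: str) -> None:
        self.labels[node] = label

    def add_edge(self, source: str, role: str, target: str) -> None:
        self.edges.append((source, role, target))

    def reentrant_nodes(self) -> set[str]:
        """Nodes with two or more incoming edges; no tree contains such a node."""
        indegree = Counter(target for _, _, target in self.edges)
        return {node for node, count in indegree.items() if count > 1}


# "The boy wants to sleep": the boy is both the one who wants and the one who sleeps.
g = SemanticGraph()
for node, label in [("w", "want"), ("b", "boy"), ("s", "sleep")]:
    g.add_node(node, label)
g.add_edge("w", "ARG0", "b")
g.add_edge("w", "ARG1", "s")
g.add_edge("s", "ARG0", "b")
print(g.reentrant_nodes())  # prints {'b'}
```

A tree-structured target would have to either duplicate the node for "boy" or drop one of its two roles; the graph simply records both edges.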
Tutorial slides and additional materials are available at the following address: https://github.com/cfmrp/tutorial
2 Semantic Graph Banks
In the first part of the tutorial, we will give a systematic overview of the available semantic graph banks. On the one hand, we will distinguish graph banks with respect to the facets of natural language meaning they aim to represent. For instance, some graph banks focus on predicate–argument structure, perhaps with some extensions for polarity or tense, whereas others capture (some) scopal phenomena. Furthermore, while the graphs in most graph banks do not have a precisely defined model theory in the sense of classical linguistic semantics, there are still underlying intuitions about what the nodes of the graphs mean (individual entities and eventualities in the world vs. more abstract objects to which statements about scope and presupposition can attach). We will discuss the different intuitions that underlie different graph banks.
On the other hand, we will follow Kuhlmann and Oepen (2016) in classifying graph banks with respect to the relationship they assume between the tokens of the sentence and the nodes of the graph (called anchoring of graph fragments onto input sub-strings). We will distinguish three flavors of semantic graphs, which by degree of anchoring we will call type (0) to type (2). While we use ‘flavor’ to refer to formally defined sub-classes of semantic graphs, we will reserve the term ‘framework’ for a specific linguistic approach to graph-based meaning representation (typically cast in a particular graph flavor, of course).
Type (0) The strongest form of anchoring is obtained in bi-lexical dependency graphs, where graph nodes injectively correspond to surface lexical units (tokens). In such graphs, each node is directly linked to a specific token (conversely, there may be semantically empty tokens), and the nodes inherit the linear order of their corresponding tokens. This flavor of semantic graphs was popularized in part through a series of Semantic Dependency Parsing (SDP) tasks at the SemEval exercises in 2014–16 (Oepen et al., 2014, 2015; Che et al., 2016). Prominent linguistic frameworks instantiating this graph flavor include CCG word–word dependencies (CCD; Hockenmaier and Steedman, 2007), Enju Predicate–Argument Structures (PAS; Miyao and Tsujii, 2008), DELPH-IN MRS Bi-Lexical Dependencies (DM; Ivanova et al., 2012), and Prague Semantic Dependencies (PSD; a simplification of the tectogrammatical structures of Hajič et al., 2012).
Type (1) A more general form of anchored semantic graphs is characterized by relaxing the correspondence relations between nodes and tokens, while still explicitly annotating the correspondence between nodes and parts of the sentence. Some graph banks of this flavor align nodes with arbitrary parts of the sentence, including sub-token or multi-token sequences, which affords more flexibility in the representation of meaning contributed by, for example, (derivational) affixes or phrasal constructions. Some further allow multiple nodes to correspond to overlapping spans, enabling lexical decomposition (e.g. of causatives or comparatives). Frameworks instantiating this flavor of semantic graphs include Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013; featured in a SemEval 2019 task) and two variants of ‘reducing’ the underspecified logical forms of Flickinger (2000) and Copestake et al. (2005) into directed graphs, viz. Elementary Dependency Structures (EDS; Oepen and Lønning, 2006) and Dependency Minimal Recursion Semantics (DMRS; Copestake, 2009). All three frameworks serve as target representations in recent parsing research (e.g. Buys and Blunsom, 2017; Chen et al., 2018; Hershcovich et al., 2018).
Type (2) Finally, our framework review will include Abstract Meaning Representation (AMR; Banarescu et al., 2013), which in our hierarchy of graph flavors is considered unanchored, in that the correspondence between nodes and tokens is not explicitly annotated. The AMR framework deliberately backgrounds notions of compositionality and derivation. At the same time, AMR frequently invokes lexical decomposition and represents some implicitly expressed elements of meaning, such that AMR graphs quite generally appear to ‘abstract’ furthest from the surface signal. Since the first general release of an AMR graph bank in 2014, the framework has provided a popular target for semantic parsing and has been the subject of two consecutive tasks at SemEval 2016 and 2017 (May, 2016; May and Priyadarshi, 2017).
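Once anchors are represented explicitly, the three flavors can be told apart mechanically. The Python sketch below is a simplified reading of the type (0) to type (2) distinction described above, not the formal definitions of Kuhlmann and Oepen (2016) or the file format of any graph bank. It assumes a hypothetical encoding in which each node maps either to a list of (start, end) character spans or to `None` when no anchor is annotated, and in which the sentence is given as a list of token spans; the function name `anchoring_flavor` is likewise illustrative.

```python
from typing import Optional

# Hypothetical encoding (an assumption for this sketch): each node id maps to a
# list of (start, end) character spans, or to None if no anchor is annotated.
Span = tuple[int, int]
Anchors = dict[str, Optional[list[Span]]]


def anchoring_flavor(anchors: Anchors, tokens: list[Span]) -> int:
    """Return 0, 1, or 2 for the anchoring flavor of a graph (simplified)."""
    # Type (2): the node-token correspondence is not (fully) annotated.
    if any(spans is None for spans in anchors.values()):
        return 2
    token_spans = set(tokens)
    used: set[Span] = set()
    for spans in anchors.values():
        # Type (0) needs exactly one whole-token anchor per node, and no token
        # may be shared by two nodes (an injective node-to-token mapping).
        if len(spans) != 1 or spans[0] not in token_spans or spans[0] in used:
            return 1  # anchored, but not one-to-one onto whole tokens
        used.add(spans[0])
    return 0


tokens = [(0, 3), (4, 7), (8, 14)]  # "The cat sleeps"; "The" may stay unanchored
bilexical = {"cat": [(4, 7)], "sleep": [(8, 14)]}
span_anchored = {"cat": [(4, 7)], "sleep": [(8, 14)], "phrase": [(0, 7)]}
unanchored = {"cat": None, "sleep": None}
print([anchoring_flavor(g, tokens) for g in (bilexical, span_anchored, unanchored)])
# prints [0, 1, 2]
```

The sketch checks only the anchoring conditions; it does not verify other structural properties of the graphs, and treating any unanchored node as type (2) is a simplifying choice.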
3 Processing Semantic Graphs
The creation of large-scale, high-quality semantic graph banks has driven research on semantic parsing, where a system is trained to map from natural-language sentences to graphs. There is now a dizzying array of different semantic parsing algorithms, and it is a challenge to keep track of their respective strengths and weaknesses. Different parsing approaches are, of course, more or less effective for graph banks of different flavors (and, at times, even specific frameworks). We will discuss these interactions in the tutorial and categorize existing approaches into four classes.
Factorization-based approach A factorization-based parser explicitly models the target semantic structures by defining a score function that is able to evaluate the “goodness” of any candidate graph. To make the score function computable, a parser usually factorizes the score of a graph into parts for smaller substrings and can then apply dynamic programming to search for the best graph.
Composition-based approach Following the Principle of Compositionality, a semantic graph can be viewed as the result of a derivation process, in which lexical and syntactico-semantic rules are iteratively applied and evaluated. A composition-based parser explicitly models such derivation structures by defining a symbolic system to manipulate graph construction and a score function to select preferable derivations.
Transition-based approach A transition-based parser models a derivation process in a left-to-right, word-by-word way. The key to building a high-accuracy parser is to define a score function that evaluates the individual derivation decisions for each token. In order to find a good derivation among a large set, a parser usually adopts a greedy search strategy, which is sometimes psycholinguistically motivated.
Translation-based approach A translation-based parser treats a family of semantic graphs as a foreign language: a semantic graph is encoded into a string and then viewed as a “sentence” from a different language. By linearizing a graph into a string, a parser can reuse various successful seq2seq models that are at the heart of modern Neural Machine Translation.
4 Tutorial Structure
We have organized the content of the tutorial into the following blocks, which add up to a total of three hours of presentation. The references below are illustrative of the content in each block; in the tutorial itself, we will present one or two approaches per block in detail while treating others more superficially.
(1) Linguistic Foundations: Layers of Sentence Meaning
(2) Formal Foundations: Labeled Directed Graphs
(3) Meaning Representation Frameworks and Graph Banks
• Bi-Lexical semantic dependencies (Hockenmaier and Steedman, 2007; Miyao and Tsujii, 2008; Hajič et al., 2012; Ivanova et al., 2012; Che et al., 2016);
• Universal Conceptual Cognitive Annotation (UCCA; Abend and Rappoport, 2013);
• Graph-Based Minimal Recursion Semantics (EDS and DMRS; Oepen and Lønning, 2006; Copestake, 2009);
• Abstract Meaning Representation (AMR; Banarescu et al., 2013);
• Non-Graph Representations: Discourse Representation Structures (DRS; Basile et al., 2012);
• Contrastive review of selected examples across frameworks;
• Availability of training and evaluation data; shared tasks; state-of-the-art empirical results (Oepen et al., 2019).
(4) Parsing into Semantic Graphs
• Parser evaluation: quantifying semantic graph similarity;
• Parsing sub-tasks: segmentation, concept identification, relation detection, structural validation;
• Composition-based methods (Callmeier, 2000; Bos et al., 2004; Artzi et al., 2015; Groschwitz et al., 2018; Lindemann et al., 2019; Chen et al., 2018);
• Factorization-based methods (Flanigan et al., 2014; Kuhlmann and Jonsson, 2015; Peng et al., 2017; Dozat and Manning, 2018);
• Transition-based methods (Sagae and Tsujii, 2008; Wang et al., 2015; Buys and Blunsom, 2017; Hershcovich et al., 2017);
• Translation-based methods (Konstas et al., 2017; Peng et al., 2018; Stanovsky and Dagan, 2018);
• Cross-framework parsing and multi-task learning (Peng et al., 2017; Hershcovich et al., 2018; Stanovsky and Dagan, 2018);
• Cross-lingual parsing methods (Evang and Bos, 2016; Damonte and Cohen, 2018; Zhang et al., 2018);
• Contrastive discussion across frameworks, approaches, and languages.
(5) Outlook: Applications of Semantic Graphs
5 Content Breadth
Each of us has contributed research to the design of meaning representation frameworks, the creation of semantic graph banks, and/or the development of meaning representation parsing systems. Nonetheless, both the design and the processing of graph banks are highly active research areas, and our own work will not represent more than a fifth of the total tutorial content.
6 Participant Background
An understanding of basic parsing techniques (chart-based and transition-based) and a familiarity with basic neural techniques (feed-forward and recurrent networks, encoder–decoder) will be useful.
7 Presenters
The tutorial will be given jointly by three presenters with partly overlapping and partly complementary expertise. Each will contribute about one third of the content, and each will be involved in multiple parts of the tutorial.
Alexander Koller
Department of Language Science and Technology, Saarland University, Germany
koller@coli.uni-saarland.de
http://www.coli.uni-saarland.de/~koller
Alexander Koller received his PhD in 2004, with a thesis on underspecified processing of semantic ambiguities using graph-based representations. His research interests span a variety of topics including parsing, generation, the expressive capacity of representation formalisms for natural language, and semantics. Within semantics, he has published extensively on semantic parsing using both grammar-based and neural approaches. His most recent work in this field (Lindemann et al., 2019) achieved state-of-the-art semantic parsing accuracy across several graph banks using neural supertagging and dependency parsing in the context of a compositional model.
Stephan Oepen
Department of Informatics, University of Oslo, Norway
oe@ifi.uio.no
https://www.mn.uio.no/ifi/english/people/aca/oe/
Stephan Oepen studied Linguistics, German and Russian Philology, Computer Science, and Computational Linguistics at Berlin, Volgograd, and Saarbrücken. He has worked extensively on constraint-based parsing and realization, on the design of broad-coverage meaning representations and the syntax–semantics interface, and on the use of syntactico-semantic structure in natural language understanding applications. He has been a co-developer of the LinGO English Resource Grammar (ERG) since the mid-1990s, has helped create the Redwoods Treebank of scope-underspecified MRS meaning representations, and has chaired two SemEval tasks on Semantic Dependency Parsing as well as the First Shared Task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning.
Weiwei Sun
Institute of Computer Science and Technology, Peking University, China
ws@pku.edu.cn
https://wsun106.github.io/
Weiwei Sun completed her Ph.D. in the Department of Computational Linguistics at Saarland University under the supervision of Prof. Hans Uszkoreit. Before that, she studied at Peking University, where she obtained a BA in Linguistics and a BS and an MS in Computer Science. Her research lies at the intersection of computational linguistics and natural language processing. Her main topic is symbolic and statistical parsing, with a special focus on parsing into semantic graphs of various flavors. She has repeatedly led teams that submitted top-performing systems to recent SemEval shared tasks and has continuously advanced both the state of the art in semantic parsing in terms of empirical results and the understanding of how design decisions in different schools of linguistic graph representations impact formal and algorithmic complexity.
References
Omri Abend and Ari Rappoport. 2013. Universal Conceptual Cognitive Annotation (UCCA). In Proceedings of the 51st Meeting of the Association for Computational Linguistics, pages 228–238, Sofia, Bulgaria.
Yoav Artzi, Kenton Lee, and Luke Zettlemoyer. 2015. Broad-coverage CCG semantic parsing with AMR. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1699–1710, Lisbon, Portugal.
Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pages 178–186, Sofia, Bulgaria.
Valerio Basile, Johan Bos, Kilian Evang, and Noortje Venhuizen. 2012. Developing a large semantically annotated corpus. In Proceedings of the 8th International Conference on Language Resources and Evaluation, pages 3196–3200, Istanbul, Turkey.
Johan Bos, Stephen Clark, Mark Steedman, James R. Curran, and Julia Hockenmaier. 2004. Wide-coverage semantic representations from a CCG parser. In Proceedings of the 20th International Conference on Computational Linguistics, pages 1240–1246, Geneva, Switzerland.
Jan Buys and Phil Blunsom. 2017. Robust incremental neural semantic graph parsing. In Proceedings of the 55th Meeting of the Association for Computational Linguistics, pages 158–167, Vancouver, Canada.
Ulrich Callmeier. 2000. PET. A platform for experimentation with efficient HPSG processing techniques. Natural Language Engineering, 6(1):99–108.
Wanxiang Che, Yanqiu Shao, Ting Liu, and Yu Ding. 2016. SemEval-2016 task 9: Chinese semantic dependency parsing. In Proceedings of the 10th International Workshop on Semantic Evaluation, pages 1074–1080, San Diego, CA, USA.
Yufei Chen, Weiwei Sun, and Xiaojun Wan. 2018. Accurate SHRG-based semantic parsing. In Proceedings of the 56th Meeting of the Association for Computational Linguistics, pages 408–418, Melbourne, Australia.
Ann Copestake. 2009. Slacker semantics. Why superficiality, dependency and avoidance of commitment can be the right way to go. In Proceedings of the 12th Meeting of the European Chapter of the Association for Computational Linguistics, pages 1–9, Athens, Greece.
Ann Copestake, Dan Flickinger, Carl Pollard, and Ivan A. Sag. 2005. Minimal Recursion Semantics. An introduction. Research on Language and Computation, 3(4):281–332.
Marco Damonte and Shay B. Cohen. 2018. Cross-lingual Abstract Meaning Representation parsing. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics, pages 1146–1155, New Orleans, LA, USA.
Timothy Dozat and Christopher D. Manning. 2018. Simpler but more accurate semantic dependency parsing. In Proceedings of the 56th Meeting of the Association for Computational Linguistics, pages 484–490, Melbourne, Australia.
Kilian Evang and Johan Bos. 2016. Cross-lingual learning of an open-domain semantic parser. In Proceedings of the 26th International Conference on Computational Linguistics, pages 579–588, Osaka, Japan.
Jeffrey Flanigan, Sam Thomson, Jaime Carbonell, Chris Dyer, and Noah A. Smith. 2014. A discriminative graph-based parser for the Abstract Meaning Representation. In Proceedings of the 52nd Meeting of the Association for Computational Linguistics, pages 1426–1436, Baltimore, MD, USA.
Dan Flickinger. 2000. On building a more efficient grammar by exploiting types. Natural Language Engineering, 6(1):15–28.
Jonas Groschwitz, Matthias Lindemann, Meaghan Fowlie, Mark Johnson, and Alexander Koller. 2018. AMR dependency parsing with a typed semantic algebra. In Proceedings of the 56th Meeting of the Association for Computational Linguistics, pages 1831–1841, Melbourne, Australia.
Jan Hajič, Eva Hajičová, Jarmila Panevová, Petr Sgall, Ondřej Bojar, Silvie Cinková, Eva Fučíková, Marie Mikulová, Petr Pajas, Jan Popelka, Jiří Semecký, Jana Šindlerová, Jan Štěpánek, Josef Toman, Zdeňka Urešová, and Zdeněk Žabokrtský. 2012. Announcing Prague Czech-English Dependency Treebank 2.0. In Proceedings of the 8th International Conference on Language Resources and Evaluation, pages 3153–3160, Istanbul, Turkey.
Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2017. A transition-based directed acyclic graph parser for UCCA. In Proceedings of the 55th Meeting of the Association for Computational Linguistics, pages 1127–1138, Vancouver, Canada.
Daniel Hershcovich, Omri Abend, and Ari Rappoport. 2018. Multitask parsing across semantic representations. In Proceedings of the 56th Meeting of the Association for Computational Linguistics, pages 373–385, Melbourne, Australia.
Julia Hockenmaier and Mark Steedman. 2007. CCGbank. A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33:355–396.
Angelina Ivanova, Stephan Oepen, Lilja Øvrelid, and Dan Flickinger. 2012. Who did what to whom? A contrastive study of syntacto-semantic dependencies. In Proceedings of the 6th Linguistic Annotation Workshop, pages 2–11, Jeju, Republic of Korea.
Rohit J. Kate and Yuk Wah Wong. 2010. Semantic parsing. The task, the state of the art and the future. In Tutorial Abstracts of the 48th Meeting of the Association for Computational Linguistics, page 6, Uppsala, Sweden.
Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, and Luke Zettlemoyer. 2017. Neural AMR. Sequence-to-sequence models for parsing and generation. In Proceedings of the 55th Meeting of the Association for Computational Linguistics, pages 146–157, Vancouver, Canada.
Marco Kuhlmann and Peter Jonsson. 2015. Parsing to noncrossing dependency graphs. Transactions of the Association for Computational Linguistics, 3:559–570.
Marco Kuhlmann and Stephan Oepen. 2016. Towards a catalogue of linguistic graph banks. Computational Linguistics, 42(4):819–827.
Matthias Lindemann, Jonas Groschwitz, and Alexander Koller. 2019. Compositional semantic parsing across graphbanks. In Proceedings of ACL (Short Papers), Florence, Italy.
Jonathan May. 2016. SemEval-2016 Task 8. Meaning representation parsing. In Proceedings of the 10th International Workshop on Semantic Evaluation, pages 1063–1073, San Diego, CA, USA.
Jonathan May and Jay Priyadarshi. 2017. SemEval-2017 Task 9. Abstract Meaning Representation parsing and generation. In Proceedings of the 11th International Workshop on Semantic Evaluation, pages 536–545.
Yusuke Miyao and Jun’ichi Tsujii. 2008. Feature forest models for probabilistic HPSG parsing. Computational Linguistics, 34(1):35–80.
Stephan Oepen, Omri Abend, Jan Hajič, Daniel Hershcovich, Marco Kuhlmann, Tim O’Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, and Zdeňka Urešová. 2019. MRP 2019: Cross-framework Meaning Representation Parsing. In Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the 2019 Conference on Natural Language Learning, pages 1–27, Hong Kong, China.
Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Silvie Cinková, Dan Flickinger, Jan Hajič, and Zdeňka Urešová. 2015. SemEval 2015 Task 18. Broad-coverage semantic dependency parsing. In Proceedings of the 9th International Workshop on Semantic Evaluation, pages 915–926, Denver, CO, USA.
Stephan Oepen, Marco Kuhlmann, Yusuke Miyao, Daniel Zeman, Dan Flickinger, Jan Hajič, Angelina Ivanova, and Yi Zhang. 2014. SemEval 2014 Task 8. Broad-coverage semantic dependency parsing. In Proceedings of the 8th International Workshop on Semantic Evaluation, pages 63–72, Dublin, Ireland.
Stephan Oepen and Jan Tore Lønning. 2006. Discriminant-based MRS banking. In Proceedings of the 5th International Conference on Language Resources and Evaluation, pages 1250–1255, Genoa, Italy.
Hao Peng, Sam Thomson, and Noah A. Smith. 2017. Deep multitask learning for semantic dependency parsing. In Proceedings of the 55th Meeting of the Association for Computational Linguistics, pages 2037–2048, Vancouver, Canada.
Xiaochang Peng, Linfeng Song, Daniel Gildea, and Giorgio Satta. 2018. Sequence-to-sequence models for cache transition systems. In Proceedings of the 56th Meeting of the Association for Computational Linguistics, pages 1842–1852, Melbourne, Australia.
Kenji Sagae and Jun’ichi Tsujii. 2008. Shift-reduce dependency DAG parsing. In Proceedings of the 22nd International Conference on Computational Linguistics, pages 753–760, Manchester, UK.
Gabriel Stanovsky and Ido Dagan. 2018. Semantics as a foreign language. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2412–2421, Brussels, Belgium.
Chuan Wang, Nianwen Xue, and Sameer Pradhan. 2015. A transition-based algorithm for AMR parsing. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics, pages 366–375, Denver, CO, USA.
Sheng Zhang, Xutai Ma, Rachel Rudinger, Kevin Duh, and Benjamin Van Durme. 2018. Cross-lingual decompositional semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1664–1675, Brussels, Belgium.