SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Martin Schmitt and Hinrich Schütze
CIS, LMU Munich
July 29, 2019
Natural Language Inference (NLI)

Natural language inference (NLI) is the task of recognizing entailment, contradiction, or neutrality for a pair of sentences (Dagan et al., 2013; Williams et al., 2018).

Example from SNLI (Bowman et al., 2015)
Premise: Two men on bicycles competing in a race.
Hypotheses:
1. People are riding bikes. (entailment)
2. Men are riding bicycles on the street. (neutral)
3. A few people are catching fish. (contradiction)

Many different linguistic phenomena are involved in NLI, but the role of lexical knowledge is often neglected (Gururangan et al., 2018; Glockner et al., 2018).
SherLIiC: Lexical Inference in Context

Specifically: verbal semantics in context.

If person[A] is running org[B], then person[A] is leading org[B]. ✓
If computer[A] is running software[B], then computer[A] is using software[B]. ✓
If computer[A] is running software[B], then computer[A] is leading software[B]. ✗

Compared to general NLI, SherLIiC is controlled yet challenging:
- Task: binary entailment detection
- Abstract context given by knowledge graph types (Freebase)
- Very similar sentences
- High distributional similarity of positive and negative examples
Components of SherLIiC

1. Typed event graph (SherLIiC-TEG): ~190k typed relations between Freebase entities; ~17M triples. Example entity pairs: (United States of America, Germany), (Barack Obama, Angela Merkel).
2. Inference candidates (SherLIiC-InfCands): ~960k pairs of Freebase-typed relations with high distributional overlap. Examples:
   - per be forced to leave loc ⇒ per leave loc
   - org be led by per ⇒ per manage org
   - politician meet with per ⇒ politician interact with per
3. Annotated dev and test: ~4k annotated inference candidates, split 25/75 into dev and test (e.g. the first example above is labeled entailment ✓, the last one no entailment ✗).
Creation of SherLIiC (I): SherLIiC Typed Event Graph (TEG)

- Preprocess the large entity-linked corpus ClueWeb09 with a dependency parser.
- Extract shortest paths between entities in the dependency graphs ⇒ relations (see the sketch below).
- Type heterogeneous relations by finding the largest typable subsets.

[Figure: example entities (United States of America, Barack Obama, Angela Merkel, Germany) connected by textual relations such as nsubj–leader–poss, nsubj–meet–prep_with–pobj, nsubj–lead–dobj, nsubj–interact–prep_with–pobj, nsubj–chancellor–prep_of–pobj, and nsubj–support–dobj–policy–poss.]
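As a rough illustration of the path-extraction step, here is a minimal sketch that treats a dependency parse as an undirected networkx graph and reads off the shortest path between two linked entities. The toy parse, node names, and edge labels are made up for illustration and do not reflect the actual SherLIiC pipeline.

```python
# Sketch of the relation-extraction step: find the shortest path between two
# linked entities in a dependency parse. Assumes the parse is available as an
# undirected networkx graph whose nodes are tokens and whose edges carry a
# dependency label; all names and labels below are illustrative only.
import networkx as nx

def shortest_dependency_path(graph: nx.Graph, entity1: str, entity2: str) -> list:
    """Return the token / edge-label sequence connecting two entity mentions."""
    nodes = nx.shortest_path(graph, source=entity1, target=entity2)
    path = []
    for u, v in zip(nodes, nodes[1:]):
        path.extend([u, graph.edges[u, v]["label"]])
    path.append(nodes[-1])
    return path

# Toy parse for "Obama meets with Merkel" (Stanford-style dependency labels).
g = nx.Graph()
g.add_edge("meet", "Obama", label="nsubj")
g.add_edge("meet", "with", label="prep")
g.add_edge("with", "Merkel", label="pobj")

print(shortest_dependency_path(g, "Obama", "Merkel"))
# ['Obama', 'nsubj', 'meet', 'prep', 'with', 'pobj', 'Merkel']
```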
Creation of SherLIiC (II): SherLIiC-InfCands

- Score all pairs of relations according to statistical relevance, significance, and entity overlap (distributional features).
- The best-scoring relation pairs become inference candidates.

For two typed relations A, B ⊆ E × E, we compute three scores:

  Relv(A, B) := P(B | A) / P(B)

  esr(A, B) := |⋃_{i ∈ {1,2}} π_i(A ∩ B)| / (2 · |A ∩ B|)

  σ(A, B) := Σ_{H ∈ {B, ¬B}} P(H | A) · log Relv(A, H)

(π_i denotes projection onto the i-th argument slot.)
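The following is a minimal sketch of these three scores, assuming each relation is represented as a Python set of entity pairs and probabilities are estimated as plain relative frequencies over a fixed universe of observed pairs; the exact probability estimates used for SherLIiC may differ, and the reading of esr as a distinct-entity ratio is an interpretation of the slide.

```python
# Sketch of the three candidate scores. Assumptions: relations are sets of
# (arg1, arg2) entity pairs, and probabilities are plain relative frequencies
# over a fixed universe of observed pairs (the actual estimates may differ).
from math import log

def relevance(a: set, b: set, universe_size: int) -> float:
    """Relv(A, B) = P(B | A) / P(B)."""
    p_b_given_a = len(a & b) / len(a)
    p_b = len(b) / universe_size
    return p_b_given_a / p_b

def esr(a: set, b: set) -> float:
    """Distinct entities in A ∩ B divided by the 2*|A ∩ B| filled argument
    slots (one plausible reading of the entity-overlap score)."""
    overlap = a & b
    if not overlap:
        return 0.0
    entities = {e for pair in overlap for e in pair}
    return len(entities) / (2 * len(overlap))

def significance(a: set, b: set, universe: set) -> float:
    """sigma(A, B) = sum over H in {B, not-B} of P(H | A) * log Relv(A, H)."""
    score = 0.0
    for h in (b, universe - b):
        p_h_given_a = len(a & h) / len(a)
        if p_h_given_a > 0:
            score += p_h_given_a * log(relevance(a, h, len(universe)))
    return score
```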
Creation of SherLIiC (III): SherLIiC-dev and SherLIiC-test

- Annotate a random subset of SherLIiC-InfCands on Amazon Mechanical Turk.
- Collect at least 5 annotations per InfCand.
- Filter annotators with a qualification test and confidence values.

Total number of annotated InfCands     3985
Balance yes/no                         33% / 67%
Pairs with unanimous gold label        53.0%
Pairs with 1 disagreeing annotation    27.4%
Pairs with 2 disagreeing annotations   19.6%
Individual label = gold label          86.7%
Creation of SherLIiC (III), continued

[Figure: number of annotations per class label (no / yes), broken down by how many annotations disagree with the majority (0, 1, or 2).]

A small sketch of how gold labels and these agreement statistics can be derived from raw annotations follows below.
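The sketch below uses simple majority voting as the aggregation rule, which is an assumption on my part; the toy annotation lists are invented for illustration.

```python
# Sketch of computing gold labels and agreement statistics from raw crowd
# annotations. Majority voting is an assumed aggregation rule; the annotation
# lists below are made-up toy data, not actual SherLIiC annotations.
from collections import Counter

annotations = {
    "cand_1": ["yes", "yes", "yes", "yes", "yes"],   # unanimous
    "cand_2": ["no", "no", "yes", "no", "no"],        # 1 disagreeing annotation
    "cand_3": ["no", "yes", "no", "yes", "no"],       # 2 disagreeing annotations
}

disagreement_counts = Counter()
gold = {}
for cand, labels in annotations.items():
    majority_label, majority_votes = Counter(labels).most_common(1)[0]
    gold[cand] = majority_label
    disagreement_counts[len(labels) - majority_votes] += 1

for n_disagree, n_cands in sorted(disagreement_counts.items()):
    print(f"pairs with {n_disagree} disagreeing annotation(s): {n_cands}")
print("label balance:", Counter(gold.values()))
```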
State of the Art on SherLIiC

Main insights:
- Knowledge graph embeddings do not capture the necessary information.
- The supervised NLI model ESIM (Chen et al., 2017) is fooled by sentence similarity.
- The best system combines word2vec and type-informed relation embeddings.

Baseline            P in %   R in %   F1 in %
Lemma                 90.7      8.9      16.1
Always yes            33.3    100.0      49.9
TransE (typed)        33.3     99.1      49.8
ComplEx (typed)       33.7     94.9      49.7
ESIM                  39.0     83.3      53.1
w2v + tsg_rel_emb     51.8     72.7      60.5
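As a sanity check on the table, this short sketch shows how precision, recall, and F1 come out for the trivial "always yes" baseline given the 33%/67% yes/no balance; the absolute counts are invented and only the proportions match the reported balance.

```python
# Sketch: precision / recall / F1 for the trivial "always yes" baseline on a
# hypothetical test set with the 33% / 67% yes/no split reported above
# (counts are illustrative, not the actual SherLIiC-test counts).
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

n_yes, n_no = 333, 667                     # hypothetical 33/67 split
# "Always yes" predicts entailment for every pair:
p, r, f1 = precision_recall_f1(tp=n_yes, fp=n_no, fn=0)
print(f"P={p:.1%}  R={r:.1%}  F1={f1:.1%}")   # approx. P=33.3%  R=100.0%  F1=50.0%
```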
Analysis of entailment scores on SherLIiC-dev

[Figure: normalized entailment scores on the dev set for several baselines (TransE (typed), ComplEx (typed), word2vec, and relation-embedding variants such as w2v + tsg_rel_emb), separated by gold label (entailment yes vs. no).]
How can I use SherLIiC?

SherLIiC-TEG, -InfCands, -dev, and -test are publicly available: https://github.com/mnschmit/SherLIiC

Summary:
- SherLIiC-TEG: a knowledge graph with event-like relations; combines knowledge graph relations with textual relations; a large resource for relation inference in knowledge graphs.
- SherLIiC-InfCands: a large collection of unlabeled samples; similar enough to SherLIiC-dev and -test for transfer learning; contains noisy labels from the best baseline, w2v + tsg_rel_emb.
- SherLIiC-dev and -test: fine-tune on SherLIiC-dev; evaluate models of natural language inference (NLI) and/or lexical semantics; evaluate relation and graph embedding techniques.
References

Ido Dagan, Dan Roth, Mark Sammons, and Fabio Massimo Zanzotto. Recognizing Textual Entailment: Models and Applications. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2013. ISBN 9781598298352. URL http://dx.doi.org/10.2200/S00509ED1V01Y201305HLT023.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2015.

Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1101. URL https://www.aclweb.org/anthology/N18-1101.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 2018.

Max Glockner, Vered Shwartz, and Yoav Goldberg. Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Melbourne, Australia, July 2018.

Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017.
Relation Extraction → Event Graph

Example sentence from Freebase-linked ClueWeb09:
"In Japan[m.03_3d], the PS3[m.067gh] is slowly closing the huge sales gap with Wii[m.026kds]."

[Figure: dependency parse of this sentence, with the entity mentions replaced by their Freebase IDs (m.03_3d, m.067gh, m.026kds).]
Shortest paths extracted from this parse (each path is one relation in the event graph):
- m.067gh  nsubj–  closing  dobj  gap  prep  with  pobj  m.026kds
- m.067gh  nsubj–  closing  prep  In  pobj  m.03_3d
- m.03_3d  pobj–  In  prep–  closing  dobj  gap  prep  with  pobj  m.026kds
Typing the Event Graph (SherLIiC-TEG)

To type a relation R (a code sketch follows after the figure):
1. For each argument slot, identify the k entity types that induce the largest subsets.
2. Consider the k² typed subrelations of R constructed by restricting each argument slot to one of the types from the previous step.
3. Accept a typed relation if it contains at least ϑ_min entity pairs.

[Figure: type distribution for the relation nsubj–lead–dobj; frequent types for arg1 include organization, sports_team, and pro_athlete, and for arg2 organization, person, location, and person_or_entity_appearing_in_film.]
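Below is a compact sketch of this typing procedure. It assumes a hypothetical entity-to-type lookup with a single type per entity, which simplifies the real setting where Freebase entities can carry multiple types; parameter names are illustrative.

```python
# Sketch of typing one relation: pick the k most frequent types per argument
# slot, build the k*k typed subrelations, and keep those with at least
# theta_min entity pairs. `entity_types` is a hypothetical entity->type lookup
# (single type per entity, a simplification of the real multi-type setting).
from collections import Counter

def type_relation(pairs: set, entity_types: dict, k: int = 3, theta_min: int = 50) -> dict:
    """Return accepted typed subrelations keyed by (type_arg1, type_arg2)."""
    top_types = []
    for slot in (0, 1):
        counts = Counter(entity_types[pair[slot]] for pair in pairs
                         if pair[slot] in entity_types)
        top_types.append([t for t, _ in counts.most_common(k)])

    typed = {}
    for t1 in top_types[0]:
        for t2 in top_types[1]:
            subset = {p for p in pairs
                      if entity_types.get(p[0]) == t1 and entity_types.get(p[1]) == t2}
            if len(subset) >= theta_min:
                typed[(t1, t2)] = subset
    return typed
```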
Supervised Combination of Typed and Untyped Embeddings

Example: "If influencer is explaining in written work, then influencer is writing in written work." ✓
This entailment is detected by w2v + typed relation embeddings, but not by w2v + untyped relation embeddings.

w2v + tsg_rel_emb
Idea: learn for each type signature whether typed or untyped embeddings work better.
Implementation (a code sketch follows below):
- Count on the dev set, for each type signature, how often typed or untyped embeddings are more accurate.
- For seen type signatures, pick typed or untyped according to these counts.
- For unseen type signatures, count individual types as well.
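A minimal sketch of this selection rule follows. Representing dev-set results as (type signature, winner) records and the exact back-off to individual argument types are my reading of the slide, not the authors' code; all names are illustrative.

```python
# Sketch of choosing between typed and untyped relation embeddings per type
# signature: count wins on the dev set per signature and back off to counts
# over individual argument types for unseen signatures (an interpretation of
# the slide, not the authors' implementation).
from collections import Counter

def fit_selector(dev_results):
    """dev_results: iterable of ((arg1_type, arg2_type), winner) records,
    where winner is 'typed' or 'untyped'."""
    by_signature, by_type = Counter(), Counter()
    for (t1, t2), winner in dev_results:
        by_signature[((t1, t2), winner)] += 1
        by_type[(t1, winner)] += 1
        by_type[(t2, winner)] += 1
    return by_signature, by_type

def choose(signature, by_signature, by_type):
    """Pick 'typed' or 'untyped' for a given type signature."""
    t1, t2 = signature
    sig_typed = by_signature[(signature, "typed")]
    sig_untyped = by_signature[(signature, "untyped")]
    if sig_typed or sig_untyped:                       # seen signature
        return "typed" if sig_typed >= sig_untyped else "untyped"
    typed = by_type[(t1, "typed")] + by_type[(t2, "typed")]      # back-off
    untyped = by_type[(t1, "untyped")] + by_type[(t2, "untyped")]
    return "typed" if typed >= untyped else "untyped"
```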
Meta Rule Discovery

A simple meta rule algorithm (a code sketch follows below):
1. Consider InfCands where the premise or the hypothesis is contained by the other.
2. Mask the common part in both relations by X.
3. Simplify pobj, dobj, and iobj to obj.
4. Count the patterns found in this way.

Example
- Premise "A is followed by B": nsubjpass– follow prep by pobj
- Hypothesis "B is following A": dobj– follow nsubj
- After masking: nsubjpass X prep by pobj ⇒ dobj X nsubj
- After simplification: nsubjpass X prep by obj ⇒ obj X nsubj
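Here is a rough sketch of the meta-rule extraction under simplifying assumptions: dependency paths are hyphen-separated strings, and the containment check only looks at the shorter path's content lemma. This illustrates the idea, not the authors' implementation.

```python
# Sketch of meta-rule extraction: mask the shared content lemma with X,
# collapse pobj/dobj/iobj to obj, and count the resulting patterns.
# The hyphen-separated path representation and the lemma-based containment
# check are simplifying assumptions for illustration.
from collections import Counter
import re

def generalize(premise: str, hypothesis: str):
    """Return a masked (premise, hypothesis) pattern if the shorter path's
    content lemma also appears in the longer path, else None."""
    shorter, longer = sorted((premise, hypothesis), key=len)
    core = shorter.split("-")[1]          # content lemma, e.g. 'follow'
    if core not in longer:
        return None
    masked = tuple(p.replace(core, "X") for p in (premise, hypothesis))
    return tuple(re.sub(r"\b[pdi]obj\b", "obj", p) for p in masked)

rules = Counter()
candidates = [("nsubjpass-follow-prep_by-pobj", "dobj-follow-nsubj")]  # toy InfCand
for prem, hypo in candidates:
    pattern = generalize(prem, hypo)
    if pattern is not None:
        rules[pattern] += 1

print(rules.most_common(1))
# [(('nsubjpass-X-prep_by-obj', 'obj-X-nsubj'), 1)]
```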
Example Meta Rules

- nsubj-X-poss ⇔ nsubj-X-prep-of-obj ("A is B's ally" ⇔ "A is an ally of B")
- nsubjpass-X-prep-by-obj ⇔ obj-X-nsubj ("A is followed by B" ⇔ "B follows A")
- nsubj-Xer-prep-of-obj ⇔ nsubj-X-obj ("A is a teacher of B" ⇔ "A teaches B")
- nsubj-reX-obj ⇒ nsubj-X-obj ("A rewrites B" ⇒ "A writes B")
- nsubj-agree-xcomp-X-obj ⇒ nsubj-X-obj ("A agrees to buy B" ⇒ "A buys B")
- nsubjpass-force-xcomp-X-obj ⇒ nsubj-X-obj ("A is forced to leave B" ⇒ "A leaves B")
- nsubj-decide-xcomp-X-obj ⇒ nsubj-X-obj ("A decides to move to B" ⇒ "A moves to B")