SherLIiC: A Typed Event-Focused Lexical Inference Benchmark for Evaluating Natural Language Inference
Martin Schmitt and Hinrich Schütze
CIS, LMU Munich
July 29, 2019
Natural Language Inference (NLI)

Natural language inference (NLI) is the task of recognizing entailment, contradiction, or neutrality for a pair of sentences (Dagan et al., 2013; Williams et al., 2018).

Example from SNLI (Bowman et al., 2015)
Premise: Two men on bicycles competing in a race.
Hypotheses:
1. People are riding bikes. (entailment)
2. Men are riding bicycles on the street. (neutral)
3. A few people are catching fish. (contradiction)

Many different linguistic phenomena are involved in NLI, but the role of lexical knowledge is often neglected (Gururangan et al., 2018; Glockner et al., 2018).
SherLIiC: Lexical Inference in Context

Specifically: verbal semantics in context.

If person[A] is running org[B], then person[A] is leading org[B]. ✓
If computer[A] is running software[B], then computer[A] is using software[B]. ✓
If computer[A] is running software[B], then computer[A] is leading software[B]. ✗

Compared to general NLI, SherLIiC is controlled yet challenging:
- Task: binary entailment detection
- Abstract context given by knowledge graph types (Freebase)
- Very similar sentences
- High distributional similarity of positive and negative examples
Components of SherLIiC

1. Typed event graph (SherLIiC-TEG): ~190k typed relations between Freebase entities; ~17M triples. Example entity pairs: (United States of America, Germany), (Barack Obama, Angela Merkel).
2. Inference candidates (SherLIiC-InfCands): ~960k pairs of Freebase-typed relations with high distributional overlap. Examples:
   - per be forced to leave loc ⇒ per leave loc
   - org be led by per ⇒ per manage org
   - politician meet with per ⇒ politician interact with per
3. Annotated dev and test: ~4k annotated inference candidates, split 25/75 into dev and test (e.g. the first example above is labeled entailment ✓, the last one no entailment ✗).
Creation of SherLIiC (I): SherLIiC Typed Event Graph (TEG)

- Preprocess the large entity-linked corpus ClueWeb09 with a dependency parser.
- Extract shortest paths between entities in the dependency graphs ⇒ relations (see the sketch below).
- Type heterogeneous relations by finding the largest typable subsets.

[Figure: example entities (United States of America, Barack Obama, Angela Merkel, Germany) connected by textual relations such as nsubj–leader–poss, nsubj–meet–prep_with–pobj, nsubj–lead–dobj, nsubj–interact–prep_with–pobj, nsubj–chancellor–prep_of–pobj, and nsubj–support–dobj–policy–poss.]
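As a rough illustration of the path-extraction step, here is a minimal sketch that treats a dependency parse as an undirected networkx graph and reads off the shortest path between two linked entities. The toy parse, node names, and edge labels are made up for illustration and do not reflect the actual SherLIiC pipeline.

```python
# Sketch of the relation-extraction step: find the shortest path between two
# linked entities in a dependency parse. Assumes the parse is available as an
# undirected networkx graph whose nodes are tokens and whose edges carry a
# dependency label; all names and labels below are illustrative only.
import networkx as nx

def shortest_dependency_path(graph: nx.Graph, entity1: str, entity2: str) -> list:
    """Return the token / edge-label sequence connecting two entity mentions."""
    nodes = nx.shortest_path(graph, source=entity1, target=entity2)
    path = []
    for u, v in zip(nodes, nodes[1:]):
        path.extend([u, graph.edges[u, v]["label"]])
    path.append(nodes[-1])
    return path

# Toy parse for "Obama meets with Merkel" (Stanford-style dependency labels).
g = nx.Graph()
g.add_edge("meet", "Obama", label="nsubj")
g.add_edge("meet", "with", label="prep")
g.add_edge("with", "Merkel", label="pobj")

print(shortest_dependency_path(g, "Obama", "Merkel"))
# ['Obama', 'nsubj', 'meet', 'prep', 'with', 'pobj', 'Merkel']
```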
Creation of SherLIiC (II): SherLIiC-InfCands

- Score all pairs of relations according to statistical relevance, significance, and entity overlap (distributional features).
- The best-scoring relation pairs become inference candidates.

For two typed relations A, B ⊆ E × E, we compute three scores:

  Relv(A, B) := P(B | A) / P(B)

  esr(A, B) := |⋃_{i ∈ {1,2}} π_i(A ∩ B)| / (2 · |A ∩ B|)

  σ(A, B) := Σ_{H ∈ {B, ¬B}} P(H | A) · log Relv(A, H)

(π_i denotes projection onto the i-th argument slot.)
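The following is a minimal sketch of these three scores, assuming each relation is represented as a Python set of entity pairs and probabilities are estimated as plain relative frequencies over a fixed universe of observed pairs; the exact probability estimates used for SherLIiC may differ, and the reading of esr as a distinct-entity ratio is an interpretation of the slide.

```python
# Sketch of the three candidate scores. Assumptions: relations are sets of
# (arg1, arg2) entity pairs, and probabilities are plain relative frequencies
# over a fixed universe of observed pairs (the actual estimates may differ).
from math import log

def relevance(a: set, b: set, universe_size: int) -> float:
    """Relv(A, B) = P(B | A) / P(B)."""
    p_b_given_a = len(a & b) / len(a)
    p_b = len(b) / universe_size
    return p_b_given_a / p_b

def esr(a: set, b: set) -> float:
    """Distinct entities in A ∩ B divided by the 2*|A ∩ B| filled argument
    slots (one plausible reading of the entity-overlap score)."""
    overlap = a & b
    if not overlap:
        return 0.0
    entities = {e for pair in overlap for e in pair}
    return len(entities) / (2 * len(overlap))

def significance(a: set, b: set, universe: set) -> float:
    """sigma(A, B) = sum over H in {B, not-B} of P(H | A) * log Relv(A, H)."""
    score = 0.0
    for h in (b, universe - b):
        p_h_given_a = len(a & h) / len(a)
        if p_h_given_a > 0:
            score += p_h_given_a * log(relevance(a, h, len(universe)))
    return score
```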
Creation of SherLIiC (III): SherLIiC-dev and SherLIiC-test

- Annotate a random subset of SherLIiC-InfCands on Amazon Mechanical Turk.
- Collect at least 5 annotations per InfCand.
- Filter annotators with a qualification test and confidence values.

Total number of annotated InfCands     3985
Balance yes/no                         33% / 67%
Pairs with unanimous gold label        53.0%
Pairs with 1 disagreeing annotation    27.4%
Pairs with 2 disagreeing annotations   19.6%
Individual label = gold label          86.7%
Creation of SherLIiC (III), continued

[Figure: number of annotations per class label (no / yes), broken down by how many annotations disagree with the majority (0, 1, or 2).]

A small sketch of how gold labels and these agreement statistics can be derived from raw annotations follows below.
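The sketch below uses simple majority voting as the aggregation rule, which is an assumption on my part; the toy annotation lists are invented for illustration.

```python
# Sketch of computing gold labels and agreement statistics from raw crowd
# annotations. Majority voting is an assumed aggregation rule; the annotation
# lists below are made-up toy data, not actual SherLIiC annotations.
from collections import Counter

annotations = {
    "cand_1": ["yes", "yes", "yes", "yes", "yes"],   # unanimous
    "cand_2": ["no", "no", "yes", "no", "no"],        # 1 disagreeing annotation
    "cand_3": ["no", "yes", "no", "yes", "no"],       # 2 disagreeing annotations
}

disagreement_counts = Counter()
gold = {}
for cand, labels in annotations.items():
    majority_label, majority_votes = Counter(labels).most_common(1)[0]
    gold[cand] = majority_label
    disagreement_counts[len(labels) - majority_votes] += 1

for n_disagree, n_cands in sorted(disagreement_counts.items()):
    print(f"pairs with {n_disagree} disagreeing annotation(s): {n_cands}")
print("label balance:", Counter(gold.values()))
```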
State of the Art on SherLIiC

Main insights:
- Knowledge graph embeddings do not capture the necessary information.
- The supervised NLI model ESIM (Chen et al., 2017) is fooled by sentence similarity.
- The best system combines word2vec and type-informed relation embeddings.

Baseline            P in %   R in %   F1 in %
Lemma                 90.7      8.9      16.1
Always yes            33.3    100.0      49.9
TransE (typed)        33.3     99.1      49.8
ComplEx (typed)       33.7     94.9      49.7
ESIM                  39.0     83.3      53.1
w2v + tsg_rel_emb     51.8     72.7      60.5
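As a sanity check on the table, this short sketch shows how precision, recall, and F1 come out for the trivial "always yes" baseline given the 33%/67% yes/no balance; the absolute counts are invented and only the proportions match the reported balance.

```python
# Sketch: precision / recall / F1 for the trivial "always yes" baseline on a
# hypothetical test set with the 33% / 67% yes/no split reported above
# (counts are illustrative, not the actual SherLIiC-test counts).
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

n_yes, n_no = 333, 667                     # hypothetical 33/67 split
# "Always yes" predicts entailment for every pair:
p, r, f1 = precision_recall_f1(tp=n_yes, fp=n_no, fn=0)
print(f"P={p:.1%}  R={r:.1%}  F1={f1:.1%}")   # approx. P=33.3%  R=100.0%  F1=50.0%
```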
Analysis of entailment scores on SherLIiC-dev

[Figure: normalized entailment scores on the dev set for several baselines (TransE (typed), ComplEx (typed), word2vec, and relation-embedding variants such as w2v + tsg_rel_emb), separated by gold label (entailment yes vs. no).]
How can I use SherLIiC?

SherLIiC-TEG, -InfCands, -dev, and -test are publicly available: https://github.com/mnschmit/SherLIiC

Summary:
- SherLIiC-TEG: a knowledge graph with event-like relations; combines knowledge graph relations with textual relations; a large resource for relation inference in knowledge graphs.
- SherLIiC-InfCands: a large collection of unlabeled samples; similar enough to SherLIiC-dev and -test for transfer learning; contains noisy labels from the best baseline, w2v + tsg_rel_emb.
- SherLIiC-dev and -test: fine-tune on SherLIiC-dev; evaluate models of natural language inference (NLI) and/or lexical semantics; evaluate relation and graph embedding techniques.
References

Ido Dagan, Dan Roth, Mark Sammons, and Fabio Massimo Zanzotto. Recognizing Textual Entailment: Models and Applications. Synthesis Lectures on Human Language Technologies. Morgan & Claypool Publishers, 2013. ISBN 9781598298352. URL http://dx.doi.org/10.2200/S00509ED1V01Y201305HLT023.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2015.

Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1101. URL https://www.aclweb.org/anthology/N18-1101.

Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel Bowman, and Noah A. Smith. Annotation artifacts in natural language inference data. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 2018.

Max Glockner, Vered Shwartz, and Yoav Goldberg. Breaking NLI systems with sentences that require simple lexical inferences. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), Melbourne, Australia, July 2018.

Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, and Diana Inkpen. Enhanced LSTM for natural language inference. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017.
Relation Extraction → Event Graph

Example sentence from Freebase-linked ClueWeb09:
"In Japan[m.03_3d], the PS3[m.067gh] is slowly closing the huge sales gap with Wii[m.026kds]."

[Figure: dependency parse of this sentence, with the entity mentions replaced by their Freebase IDs (m.03_3d, m.067gh, m.026kds).]
Shortest paths extracted from this parse (each path is one relation in the event graph):
- m.067gh  nsubj–  closing  dobj  gap  prep  with  pobj  m.026kds
- m.067gh  nsubj–  closing  prep  In  pobj  m.03_3d
- m.03_3d  pobj–  In  prep–  closing  dobj  gap  prep  with  pobj  m.026kds
Typing the Event Graph (SherLIiC-TEG)

To type a relation R (a code sketch follows after the figure):
1. For each argument slot, identify the k entity types that induce the largest subsets.
2. Consider the k² typed subrelations of R constructed by restricting each argument slot to one of the types from the previous step.
3. Accept a typed relation if it contains at least ϑ_min entity pairs.

[Figure: type distribution for the relation nsubj–lead–dobj; frequent types for arg1 include organization, sports_team, and pro_athlete, and for arg2 organization, person, location, and person_or_entity_appearing_in_film.]
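Below is a compact sketch of this typing procedure. It assumes a hypothetical entity-to-type lookup with a single type per entity, which simplifies the real setting where Freebase entities can carry multiple types; parameter names are illustrative.

```python
# Sketch of typing one relation: pick the k most frequent types per argument
# slot, build the k*k typed subrelations, and keep those with at least
# theta_min entity pairs. `entity_types` is a hypothetical entity->type lookup
# (single type per entity, a simplification of the real multi-type setting).
from collections import Counter

def type_relation(pairs: set, entity_types: dict, k: int = 3, theta_min: int = 50) -> dict:
    """Return accepted typed subrelations keyed by (type_arg1, type_arg2)."""
    top_types = []
    for slot in (0, 1):
        counts = Counter(entity_types[pair[slot]] for pair in pairs
                         if pair[slot] in entity_types)
        top_types.append([t for t, _ in counts.most_common(k)])

    typed = {}
    for t1 in top_types[0]:
        for t2 in top_types[1]:
            subset = {p for p in pairs
                      if entity_types.get(p[0]) == t1 and entity_types.get(p[1]) == t2}
            if len(subset) >= theta_min:
                typed[(t1, t2)] = subset
    return typed
```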
Supervised Combination of Typed and Untyped Embeddings

Example: "If influencer is explaining in written work, then influencer is writing in written work." ✓
This entailment is detected by w2v + typed relation embeddings, but not by w2v + untyped relation embeddings.

w2v + tsg_rel_emb
Idea: learn for each type signature whether typed or untyped embeddings work better.
Implementation (a code sketch follows below):
- Count on the dev set, for each type signature, how often typed or untyped embeddings are more accurate.
- For seen type signatures, pick typed or untyped according to these counts.
- For unseen type signatures, count individual types as well.
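A minimal sketch of this selection rule follows. Representing dev-set results as (type signature, winner) records and the exact back-off to individual argument types are my reading of the slide, not the authors' code; all names are illustrative.

```python
# Sketch of choosing between typed and untyped relation embeddings per type
# signature: count wins on the dev set per signature and back off to counts
# over individual argument types for unseen signatures (an interpretation of
# the slide, not the authors' implementation).
from collections import Counter

def fit_selector(dev_results):
    """dev_results: iterable of ((arg1_type, arg2_type), winner) records,
    where winner is 'typed' or 'untyped'."""
    by_signature, by_type = Counter(), Counter()
    for (t1, t2), winner in dev_results:
        by_signature[((t1, t2), winner)] += 1
        by_type[(t1, winner)] += 1
        by_type[(t2, winner)] += 1
    return by_signature, by_type

def choose(signature, by_signature, by_type):
    """Pick 'typed' or 'untyped' for a given type signature."""
    t1, t2 = signature
    sig_typed = by_signature[(signature, "typed")]
    sig_untyped = by_signature[(signature, "untyped")]
    if sig_typed or sig_untyped:                       # seen signature
        return "typed" if sig_typed >= sig_untyped else "untyped"
    typed = by_type[(t1, "typed")] + by_type[(t2, "typed")]      # back-off
    untyped = by_type[(t1, "untyped")] + by_type[(t2, "untyped")]
    return "typed" if typed >= untyped else "untyped"
```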
Meta Rule Discovery

A simple meta rule algorithm (a code sketch follows below):
1. Consider InfCands where the premise or the hypothesis is contained by the other.
2. Mask the common part in both relations by X.
3. Simplify pobj, dobj, and iobj to obj.
4. Count the patterns found in this way.

Example
- Premise "A is followed by B": nsubjpass– follow prep by pobj
- Hypothesis "B is following A": dobj– follow nsubj
- After masking: nsubjpass X prep by pobj ⇒ dobj X nsubj
- After simplification: nsubjpass X prep by obj ⇒ obj X nsubj
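Here is a rough sketch of the meta-rule extraction under simplifying assumptions: dependency paths are hyphen-separated strings, and the containment check only looks at the shorter path's content lemma. This illustrates the idea, not the authors' implementation.

```python
# Sketch of meta-rule extraction: mask the shared content lemma with X,
# collapse pobj/dobj/iobj to obj, and count the resulting patterns.
# The hyphen-separated path representation and the lemma-based containment
# check are simplifying assumptions for illustration.
from collections import Counter
import re

def generalize(premise: str, hypothesis: str):
    """Return a masked (premise, hypothesis) pattern if the shorter path's
    content lemma also appears in the longer path, else None."""
    shorter, longer = sorted((premise, hypothesis), key=len)
    core = shorter.split("-")[1]          # content lemma, e.g. 'follow'
    if core not in longer:
        return None
    masked = tuple(p.replace(core, "X") for p in (premise, hypothesis))
    return tuple(re.sub(r"\b[pdi]obj\b", "obj", p) for p in masked)

rules = Counter()
candidates = [("nsubjpass-follow-prep_by-pobj", "dobj-follow-nsubj")]  # toy InfCand
for prem, hypo in candidates:
    pattern = generalize(prem, hypo)
    if pattern is not None:
        rules[pattern] += 1

print(rules.most_common(1))
# [(('nsubjpass-X-prep_by-obj', 'obj-X-nsubj'), 1)]
```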
Example Meta Rules

- nsubj-X-poss ⇔ nsubj-X-prep-of-obj ("A is B's ally" ⇔ "A is an ally of B")
- nsubjpass-X-prep-by-obj ⇔ obj-X-nsubj ("A is followed by B" ⇔ "B follows A")
- nsubj-Xer-prep-of-obj ⇔ nsubj-X-obj ("A is a teacher of B" ⇔ "A teaches B")
- nsubj-reX-obj ⇒ nsubj-X-obj ("A rewrites B" ⇒ "A writes B")
- nsubj-agree-xcomp-X-obj ⇒ nsubj-X-obj ("A agrees to buy B" ⇒ "A buys B")
- nsubjpass-force-xcomp-X-obj ⇒ nsubj-X-obj ("A is forced to leave B" ⇒ "A leaves B")
- nsubj-decide-xcomp-X-obj ⇒ nsubj-X-obj ("A decides to move to B" ⇒ "A moves to B")