Antoine Bosselut - Neural Language Models as Commonsense Representation Engines

Neural Language Models as
Commonsense Representation Engines

          Antoine Bosselut
Limitations of Symbolic CSKGs

 -   Insufficient Coverage

 -   Limited expressivity

 -   No Contextualization
Limitations of Symbolic CSKGs

  Kai knew that things were getting
   out of control and managed to
      keep his temper in check
Limitations of Symbolic CSKGs
-   Situations rarely found as-is in commonsense knowledge graphs

 ATOMIC (Sap et al., 2019)

     (X goes to the mall, Effect on X, buys clothes)
     (X goes to the mall, …)
     (X gives Y some money, Reaction of Y, grateful)
Limitations of Symbolic CSKGs
      -   Situations rarely found as-is in commonsense knowledge graphs

      -   Connecting to knowledge graphs can yield incorrect nodes

                                     X keeps X’s temper
Kai knew that things were getting
 out of control and managed to
                                    X keeps ___ under control
    keep his temper in check                                        X

                                     X keeps X's ___ in check
Limitations of Symbolic CSKGs
      -   Situations rarely found as-is in commonsense knowledge graphs

      -   Connecting to knowledge graphs can yield incorrect nodes

      -   Suitable nodes are often uncontextualized

                                     X keeps X’s temper                       X sweats
Kai knew that things were getting
 out of control and managed to                                          X wants to show strength
                                    X keeps ___ under control
    keep his temper in check                                    X

                                     X keeps X's ___ in check
                                                                          X avoids a fight
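
The linking problem on this slide can be sketched in a few lines. The node strings and the situation come from the slide; the token-overlap matcher (`jaccard`, `best_node`) is a hypothetical stand-in for a linking heuristic, not the method of any cited work.

```python
import re

# Candidate ATOMIC-style event nodes shown on the slide.
NODES = [
    "X keeps X's temper",
    "X keeps ___ under control",
    "X keeps X's ___ in check",
]

SITUATION = ("Kai knew that things were getting out of control "
             "and managed to keep his temper in check")


def tokens(text):
    """Lowercase word tokens; placeholders such as ___ are dropped."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity between two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def best_node(situation, nodes):
    """Return the node with the highest token overlap (hypothetical linker)."""
    return max(nodes, key=lambda node: jaccard(situation, node))


exact_hit = SITUATION in NODES          # exact lookup: the situation is not a node
linked = best_node(SITUATION, NODES)    # fuzzy lookup always returns *some* node
print(exact_hit, "|", linked)
```

The exact lookup fails because the situation never appears verbatim in the graph, while the fuzzy lookup returns a node regardless of whether it fits the context, and the inferences attached to that node carry no information about Kai's specific situation.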

   How do we provide machines with
large-scale commonsense knowledge?
Constructing Knowledge Graphs
                Write commonsense                                          Store facts in
Observe world
                 knowledge facts                                         knowledge graph

                                                                                               (Miller, 1995)   (Singh et al., 2002)

                                                                                                                     ATOMIC: ATlas Of MachIne Commonsense

                                                                                               (Lenat, 1995)     (Sap et al., 2019)

• Commonsense knowledge is immeasurably vast, making it
 impossible to manually enumerate
Extracting Knowledge Graphs from Text

Gather Textual               Automatically extract
                                                                                                                    Store in knowledge graph
   Corpus                        knowledge
 John went to the grocery
 store to buy some steaks.
                                   (Schubert, 2002)
                                  (Banko et al., 2007)
                                  (Zhang et al., 2020)                                                                                     Webchild

                                                                                                                    (Speer et al., 2017)
                                                                                                                                           (Tandon et al., 2019)
Learning Relations from Existing KGs

      Gather training set                                  Learn relationships               Predict new
                                                                                                                        Store in knowledge graph
     of knowledge tuples                                     among entities                 relationships

                                                                (Socher et al., 2013)
                                                                (Bordes et al., 2013)
                                                                (Riedel et al., 2013)
                                                              (Toutanova et al., 2015)
                                                                 (Yang et al., 2015)
                                                               (Trouillon et al., 2016)
                                                               (Nguyen et al., 2016)
                                                               (Dettmers et al., 2018)
Commonsense KGs are Different
                  Knowledge Graph     # Entities   # Edges   Avg. In-Degree

                  ConceptNet - 100k    78088       100000     1.25

                       ATOMIC          256570      610536     2.25

                      FB15k-237        14505       272115     16.98

                  Knowledge base completion assumes explicit connectivity

Malaviya et al., AAAI 2020
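
To make the sparsity gap concrete, here is a minimal sketch that computes graph density from the entity and edge counts in the table. The density formula (edges divided by the number of possible directed edges) is an assumption chosen for illustration; it is not a column on the slide, and the last column of the table is a different statistic.

```python
# (entities, edges) copied from the table on this slide.
GRAPHS = {
    "ConceptNet-100k": (78088, 100000),
    "ATOMIC": (256570, 610536),
    "FB15k-237": (14505, 272115),
}


def density(num_entities, num_edges):
    """Fraction of possible directed edges (no self-loops) that exist."""
    return num_edges / (num_entities * (num_entities - 1))


densities = {name: density(n, e) for name, (n, e) in GRAPHS.items()}
for name, value in densities.items():
    print(f"{name:16s} density = {value:.2e}")
```

Under this measure FB15k-237 is roughly two orders of magnitude denser than either commonsense graph, which is consistent with the slide's point that completion methods relying on explicit connectivity transfer poorly.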

• Commonsense knowledge is immeasurably vast, making it
 impossible to manually enumerate

• Commonsense knowledge is often implicit or underspecified, and
 therefore can’t be directly extracted from text

• Commonsense knowledge resources are quite sparse, making
 them difficult to extend by only learning from examples

   How else can we learn commonsense knowledge at scale?
Deep Language Models
sailed   across       oceans     in         bought     a      boat     
Output   Output       Output   Output       Output   Output   Output   Output

                                                                                          Next Word

Block    Block        Block    Block        Block    Block    Block     Block

                  …                     …                      …                         Transformer

Block    Block        Block    Block        Block    Block    Block     Block

                                                                                         Input Vector
 h0        h1           h2       h3          hT-3      hT-2    hT-1      hT
Allen    sailed       across   oceans        He      bought     a       boat
                                                                       (Radford et al., 2018, 2019, many others)
Learning in Language Models
  Text Corpus             Transformer Language Model

                Used to


                                  (Radford et al., 2018, 2019, many others)
Do language models have commonsense knowledge?
Knowledge in Language Models
Knowledge Prompting

        Memory               Query                            Answer

KG      Dante                (Dante, born-in, X)              Florence
        (Memory Access)

LM      Neural LM            “Dante was born in [Mask].”      Florence
        e.g. ELMo/BERT
        (Memory Access)

    Map relation to one or more natural language prompts

                                                         Petroni et al., EMNLP 2019
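
The "map relation to a natural language prompt" step can be sketched as a template lookup. The `TEMPLATES` table and `to_prompt` helper are hypothetical names; only the born-in template and the Dante example appear on the slide, and no actual language model is called here.

```python
# Hypothetical relation -> cloze-template table.
# Only the "born-in" wording is taken from the slide.
TEMPLATES = {
    "born-in": "{subject} was born in {mask}.",
}


def to_prompt(subject, relation, mask="[Mask]"):
    """Turn a (subject, relation, ?) query into a cloze sentence."""
    return TEMPLATES[relation].format(subject=subject, mask=mask)


prompt = to_prompt("Dante", "born-in")
print(prompt)
```

A masked language model would then be asked to fill the mask slot, and its top prediction is read off as the answer to the query.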
Knowledge Prompting
Instance:
    Christmas was a special holiday to Eric
    but not Adam since ____ was a Jew.

Question Generation:
    The LM is given the instance and a question prefix
    ("What is the definition of") and generates a
    Clarification Question:
        What is the definition of Christmas?

Answer Generation:
    The LM is given the instance and an answer prefix
    ("The definition of Christmas is") and generates:
        The definition of Christmas is the celebration of the birth of Christ.
        The purpose of Christmas is to celebrate the birth of Christ.
        The definition of Christmas is a Christian Holiday.

Question & Answer Prefixes:
    What is the definition of
    The definition of ____ is

                                                                                                 Shwartz et al., EMNLP 2021
Prompt Sensitivity
            manual       DirectX is developed by y_man
            mined        y_mine released the DirectX
            paraphrased  DirectX is created by y_para
           Top 5 predictions and log probabilities
         y_man                 y_mine                y_para
1   Intel      -1.06     Microsoft  -1.77     Microsoft  -2.23
2   Microsoft  -2.21     They       -2.43     Intel      -2.30
3   IBM        -2.76     It         -2.80     default    -2.96
4   Google     -3.40     Sega       -3.01     Apple      -3.44
5   Nokia      -3.58     Sony       -3.19     Google     -3.45
              Jiang et al., TACL 2020
              Feldman et al., EMNLP 2019
Context Sensitivity
              Weir et al., CogSci 2020
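
A small sketch of how to read the prompt-sensitivity table: it picks the top prediction per prompt and converts its log probability to a probability. The numbers are copied from the slide; treating them as natural logarithms is an assumption.

```python
import math

# Top-5 predictions and log probabilities copied from the slide.
PREDICTIONS = {
    "manual": [("Intel", -1.06), ("Microsoft", -2.21), ("IBM", -2.76),
               ("Google", -3.40), ("Nokia", -3.58)],
    "mined": [("Microsoft", -1.77), ("They", -2.43), ("It", -2.80),
              ("Sega", -3.01), ("Sony", -3.19)],
    "paraphrased": [("Microsoft", -2.23), ("Intel", -2.30), ("default", -2.96),
                    ("Apple", -3.44), ("Google", -3.45)],
}

top1 = {}
for prompt_type, ranked in PREDICTIONS.items():
    # Highest log probability wins.
    word, logp = max(ranked, key=lambda pair: pair[1])
    top1[prompt_type] = word
    # Assumes natural logs when converting back to a probability.
    print(f"{prompt_type:12s} top-1 = {word:10s} p = {math.exp(logp):.3f}")
```

The same fact about DirectX is answered incorrectly with the manual prompt and correctly with the mined and paraphrased ones, so a single prompt gives only a lower bound on what the model can express.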
Do language models have commonsense?

• Distinction between encoding commonsense knowledge and
 expressing commonsense knowledge

• Probing with prompts measures whether LMs can express
 commonsense knowledge and the results are mixed
Do language models encode commonsense knowledge?
From Unstructured to Structured Knowledge

Transformer Language Model

                               Used to

Transformer Language Models
 sailed   across   oceans     in     …   bought     a       boat      

 Output   Output   Output   Output       Output   Output   Output      Output

 Block    Block    Block    Block         Block    Block    Block       Block


 Block    Block    Block    Block         Block    Block    Block       Block

 Block    Block    Block    Block         Block    Block    Block       Block

 Input    Input    Input    Input         Input    Input    Input       Input

 Allen    sailed   across   oceans   …     He     bought      a         boat
                                                           (Radford et al., 2018, 2019, many others)
Transformer Language Models
- Trained to generate the next word given a set of preceding words

   sailed   across   oceans     in      …   bought     a      boat   

                     Language Model
   Allen    sailed    across   oceans   …    He      bought    a     boat
Transformer Language Models
- Trained to generate the next word given a set of preceding words
- Follow-up tokens can be generated using generated tokens as input

                                           bought     a      boat   

                    Language Model
   Allen   sailed    across   oceans   …    He      bought    a     boat
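
The two bullets above describe autoregressive generation, which can be sketched with a toy stand-in for the model. The lookup table `NEXT_TOKEN` is hypothetical and conditions only on the last token; a real Transformer conditions on the whole prefix.

```python
# Toy stand-in for a language model: maps the last token to the next one.
# Hypothetical table for illustration only.
NEXT_TOKEN = {"He": "bought", "bought": "a", "a": "boat", "boat": "<END>"}


def next_token(prefix):
    """Predict the next token given the tokens so far."""
    return NEXT_TOKEN.get(prefix[-1], "<END>")


def generate(prefix, max_new_tokens=10):
    """Greedy generation: each generated token is appended to the input."""
    tokens = list(prefix)
    for _ in range(max_new_tokens):
        token = next_token(tokens)
        if token == "<END>":
            break
        tokens.append(token)  # the output becomes input for the next step
    return tokens


output = generate(["He"])
print(" ".join(output))
```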
Structure of Knowledge Tuple


    person sails
                                     buy a boat
    across oceans

    head entity                      tail entity

                                 (entity to generate)
Learning Structure of Knowledge
           Given a head entity and a relation,
           learn to generate the tail entity:

           $\mathcal{L} = -\sum \log P(\text{tail words} \mid \text{head words}, \text{relation})$

                                                                             tail entity

                                                                     buy          a        boat

                                     Language Model
                            person   sails   across   oceans    

                                     head entity                 relation
Bosselut et al., ACL 2019
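
A minimal sketch of the objective on this slide: the negative log-likelihood is summed over the tail words only, with the head words and relation serving as conditioning context. The probabilities below are made up for illustration, and `tail_nll` is a hypothetical helper name.

```python
import math


def tail_nll(token_probs, tail_start):
    """Sum of -log P(token) over tail positions only.

    token_probs[i] is the probability the model assigns to the gold token
    at position i; positions before tail_start (head words and relation)
    do not contribute to the loss.
    """
    return -sum(math.log(p) for p in token_probs[tail_start:])


# Sequence: person sails across oceans <relation> buy a boat
# Made-up model probabilities for each gold token.
probs = [0.2, 0.1, 0.3, 0.4, 0.9, 0.5, 0.25, 0.8]
loss = tail_nll(probs, tail_start=5)  # only "buy a boat" is scored
print(round(loss, 4))
```

Restricting the sum to the tail is what separates this setup from plain language modeling on the concatenated tuple: the model is only asked to produce the tail, not to reproduce the head or the relation.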
Learning Structure of Knowledge
           Language Model → Knowledge Model:
           generates knowledge of the structure
           of the examples used for training
                                                                           tail entity
                                                                  buy          a         boat

                                     Knowledge Model
                            person   sails   across   oceans   

                                     head entity                relation
Bosselut et al., ACL 2019
Generate commonsense
                            knowledge for any input concept

                             COMmonsEnse Transformers

Bosselut et al., ACL 2019
Antoine gives a tutorial on knowledge
    X perceived as             smart
    Before, X needed           to be a teacher
    Others will want           to ask questions
    Others then                none, gain knowledge

COMET - ConceptNet
listens to
    location                   classroom
    motivated by               knowledge
    starts with                listen to teacher
    ends with                  they learn
Why does this work?
Transfer Learning from Language

                                         is a
                            mango               fruit

Bosselut et al., ACL 2019
Transfer Learning from Language

                                                   is a
                            mango                           fruit

                                                 used for
                            mango   ConceptNet              salsa

Bosselut et al., ACL 2019
Transfer Learning from Language
                                                       is a
                            mango                               fruit

                                                     used for
                            mango   ConceptNet                  salsa

                                     Same Model,       is a
                            mango   Not Pretrained               ?
                                     on language

Bosselut et al., ACL 2019
Transfer Learning from Language
                                                       is a
                            mango                               fruit

                                                     used for
                            mango   ConceptNet                  salsa

                                     Same Model,       is a
                            mango   Not Pretrained              spice
                                     on language

Bosselut et al., ACL 2019
Can’t language models just do this?
Do Language Models know this?
Do Masked Language Models know this?
Let’s talk about the elephant in the room.
            What about GPT-3?
Does GPT-3 have commonsense knowledge?

   Q: What is your favorite animal?
   A: My favorite animal is a dog.

   Q: Why?
   A: Because  dogs are loyal and friendly.
           Q: How many eyes does a giraffe have?
           A: A giraffe has two eyes.

                                 Q: Why don't animals have three legs?
                                 A: Animals don't have three legs because
                                 they would fall over.

  Kevin Lacker’s Blog, July 6, 2020
Knowledge Models vs. Off-the-shelf Language Models

        Correct % judged by
                                 COMET (BART)      GPT3      GPT2
                                     84.5          73.0

   COMET (BART): 435x smaller model (~400M params), informed by ATOMIC 2020
   GPT-3 (Few Shot): 175B parameters, pre-trained with a ton of web text (~500B tokens)
Hwang et al., AAAI 2021
Commonsense Transformers
- Learn implicit knowledge at scale from language models and web-scale text
- Learn explicit structure of knowledge from symbolic knowledge graphs
- Resulting knowledge model generalizes structure to other concepts

                           +                      =
           Pre-trained          Seed Knowledge
        Language Model           Graph Training       COMET
What are the implications of representing
commonsense knowledge in this manner?
Commonsense Knowledge for any Situation
  • transformer-style architecture — input format is natural language
    - event can be fully parsed
    - knowledge generated dynamically from neural knowledge model

                                                   Kai wants to avoid trouble
Kai knew that things were getting                  Kai intends to be calm
 out of control and managed to
    keep his temper in check                       Kai stays calm
                                                   Kai is viewed as cautious
Reasoning with Knowledge Graphs

                                     X keeps X’s temper                       X sweats
                                                                X   X

Kai knew that things were getting
 out of control and managed to                                          X wants to show strength
                                    X keeps ___ under control
    keep his temper in check                                    X

                                     X keeps X's ___ in check
                                                                          X avoids a fight
Dynamic Construction of Knowledge Graphs

                                                        Kai wants to avoid trouble

                                                         Kai intends to be calm
                               Kai knew that things
                               were getting out of
                             control and managed to
                             keep his temper in check        Kai stays calm

                                  Root Node             Kai is viewed as cautious

                                                        Generated Commonsense
                                                            Inference Nodes
Bosselut et al., AAAI 2021
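
Dynamic graph construction can be sketched as a breadth-first expansion from the root node, querying a knowledge model at each step. Here `knowledge_model` is a hypothetical stub that returns the four inferences shown on the slide; in practice it would call a neural knowledge model such as COMET.

```python
from collections import deque

ROOT = ("Kai knew that things were getting out of control "
        "and managed to keep his temper in check")

# Canned outputs standing in for a neural knowledge model (hypothetical stub).
CANNED = {
    ROOT: [
        "Kai wants to avoid trouble",
        "Kai intends to be calm",
        "Kai stays calm",
        "Kai is viewed as cautious",
    ],
}


def knowledge_model(node):
    """Return commonsense inferences for a node; empty if none are canned."""
    return CANNED.get(node, [])


def build_graph(root, hops):
    """Expand the graph breadth-first for a fixed number of hops."""
    edges = []
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for inference in knowledge_model(node):
            edges.append((node, inference))
            if inference not in seen:
                seen.add(inference)
                frontier.append((inference, depth + 1))
    return edges


edges = build_graph(ROOT, hops=2)
for source, target in edges:
    print(f"{source[:20]}... -> {target}")
```

Because the graph is generated on demand from the input text, there is no linking step against a fixed set of nodes, and every generated node is conditioned on the actual situation.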
root node

            generated node

                                          Kai wants to
                                          avoid trouble

                                           Kai intends
                                           to be calm

                 Kai knew that things
                 were getting out of
               control and managed to
               keep his temper in check

                                            Kai stays

                                          Kai is viewed
                                           as cautious

Bosselut et al., AAAI 2021
root node                                         Kai then

                                                           avoids trouble

            generated node

                                          Kai wants to       Kai wants
                                          avoid trouble      to be safe

                                           Kai intends
                                           to be calm        Kai feels
                 Kai knew that things
                 were getting out of
               control and managed to
               keep his temper in check

                                            Kai stays
                                             calm          Others … to
                                                          avoid trouble

                                          Kai is viewed
                                           as cautious

Bosselut et al., AAAI 2021
root node                                               relieved

            generated node

            condition node                Kai wants to
                                          avoid trouble
                                                          …         scared

                                           Kai intends
                                           to be calm
                 Kai knew that things
                 were getting out of
               control and managed to
               keep his temper in check

                                            Kai stays

                                          Kai is viewed
                                           as cautious

Bosselut et al., AAAI 2021
What about the relations? Those are fixed.
Fixed Relation Sets


                  Number of Relations


                                        20                             23
                                             ATOMIC   ConceptNet   ATOMIC 2020   TransOMCS

Da et al., 2021
Few-shot Knowledge Models
                   Using prompts induces
                  rapid knowledge model
                      adaptation in T5!

                      PersonX goes to the mall
                       to buy clothes     With
                                                 Prompts   +
                      PersonX goes to the mall
                      because they wanted to
                      buy clothes

Da et al., 2021
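
The "With Prompts" transformation on this slide can be sketched as replacing the relation with a natural-language phrase before the tuple is given to the model. The `PROMPTS` table and `verbalize` helper are hypothetical; the phrase wording and the example come from the slide, while the relation key `xIntent` is an assumption because the slide does not show the relation name.

```python
# Hypothetical relation -> natural-language phrase table.
# The key name is an assumption; the wording is taken from the slide.
PROMPTS = {"xIntent": "because they wanted"}


def verbalize(head, relation, tail=""):
    """Join head, relation phrase, and (optional) tail into one sentence."""
    parts = (head, PROMPTS[relation], tail)
    return " ".join(part for part in parts if part)


training_text = verbalize("PersonX goes to the mall", "xIntent", "to buy clothes")
query_text = verbalize("PersonX goes to the mall", "xIntent")
print(training_text)
print(query_text)
```

With the relation expressed in ordinary language, the model can lean on what it learned during pre-training instead of learning the meaning of a new relation symbol from a handful of examples.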
Few-shot Knowledge Models

                  Human Judgment - Plausibility

                                                  75         84.6
                                                                                     78.6             73.0


                                                        COMET (T5) - full     COMET (T5) - few shot   GPT3

                                                                            Just ~100 examples!
                                                                            5 examples / relation
Da et al., 2021
Can we model more complex commonsense knowledge?
Path Knowledge Models

 PersonX saw                       
                            wants to
  a fight was                                         leaves
                          avoid trouble
 breaking out

                                                           Wang et al., EMNLP Findings 2020
Path Knowledge Models
                         Path entities
                                      wants to                              Tail
                                    avoid trouble

                        Language Model

Head     PersonX saw
          a fight was
entity   breaking out

                                                                    Wang et al., EMNLP Findings 2020
Path Knowledge Models
                        Path entities
                                     wants to                              Tail
                                   avoid trouble

  Multi-step Knowledge Model

Head     PersonX saw
          a fight was
entity   breaking out

                                                                   Wang et al., EMNLP Findings 2020
Visual Commonsense Knowledge Models
[Figure: an image with people (Person1, ...) surrounded by generated inferences of three types]

 -   Before Person1 needed to …
 -   Because Person1 wanted to …
 -   After Person1 will most likely …

Example inferences shown: Sink in the water. Try to help. Save himself from drowning. Wait for help to arrive. Notice water washing in. Swim towards the statute. Swim to safety. Sense his own death. Get to the top of the deck. Be washed away. Realize the ship is […] Gasp for air. Get caught in a rush of water. Start moving against the water. […] for help.
                                                                                                               Park et al., ICCV 2020
VisualCOMET

Input:   Visual Context (ROI Feature, ROI Feature) | Head entity (… is holding onto a …) | Tail entity (Gasp for air)
Model:   Language Model
Output:  Tail entity (gasp for air)

                                                                                              Park et al., ICCV 2020
VisualCOMET

Input:   Visual Context (ROI Feature, ROI Feature) | Head entity (… is holding onto a …) | Tail entity (Gasp for air)
Model:   Multimodal Knowledge Model
Output:  Tail entity (gasp for air)

                                                                                              Park et al., ICCV 2020
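
The input layout in the diagram (visual context, then head entity, then tail entity) can be sketched as a labeled sequence. This is an illustration only: the `Slot` class, the `build_sequence` helper, and the `roi_feature_i` placeholder names are assumptions, and real ROI features would be vectors rather than strings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    segment: str  # "visual", "head", or "tail"
    content: str  # placeholder name or token text


def build_sequence(num_rois: int, head_tokens: List[str], tail_tokens: List[str]) -> List[Slot]:
    """Lay out visual features, then head tokens, then tail tokens."""
    slots = [Slot("visual", f"roi_feature_{i}") for i in range(num_rois)]
    slots += [Slot("head", tok) for tok in head_tokens]
    slots += [Slot("tail", tok) for tok in tail_tokens]
    return slots


# Head tokens are copied from the slide, including its elisions.
seq = build_sequence(2, ["…", "is", "holding", "onto", "a", "…"], ["gasp", "for", "air"])
print([slot.segment for slot in seq])
```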
Okay, what’s the catch?
Limitations of Knowledge Models

•       Base Self-supervised Model
    -     biases, history

•       Seed Knowledge Graph
    -     bias, language, relations, schema
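•       Generation Algorithm
    -     diversity, mode collapse

[Figure residue: partial knowledge-graph example with relation labels "MotivatedBy", "Starts w[…]", "Caus[…]" and nodes "Listen to […]", "classroom", "you be […]", "sit down", "good grade"]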
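
The diversity concern can be made concrete with a simple lexical measure. The sketch below uses a distinct-n style ratio (unique n-grams over total n-grams); this metric is a choice made here for illustration and is not named on the slide, and the example generations are made up.

```python
def distinct_n(generations, n=1):
    """Fraction of n-grams that are unique across all generations."""
    ngrams = []
    for text in generations:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


# Made-up outputs: the first list mimics mode collapse, the second is varied.
collapsed = ["to be calm", "to be calm", "to be calm"]
varied = ["to be calm", "avoid trouble", "stay safe"]
print(distinct_n(collapsed), distinct_n(varied))
```

A low ratio signals that the decoding algorithm keeps returning the same inference, which is one symptom of mode collapse.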
Bosselut et al. 2019 @ ACL 2019

 -   Sarcasm generation (Chakrabarty et al. 2020 @ ACL 2020)
 -   Therapy Chatbot (Kearns et al. 2020 @ CHI EA 2020)
 -   Personalized Dialogue (Majumder et al. 2020 @ EMNLP 2020)
 -   Simile generation (Chakrabarty et al. 2020 @ EMNLP 2020)
 -   Text-Based Games (Dambekodi et al. 2020 @ arXiv:2012.02757)
 -   Automated Storytelling (Ammanabrolu et al. 2021 @ AAAI 2021)

Where else can commonsense knowledge access improve our systems?
Static vs. Dynamic
Context: "Kai knew that things were getting out of control and managed to keep his temper in check"

Link to static Knowledge Graph:
 -   X keeps ___ under control
 -   X keeps X’s temper
 -   X keeps X's ___ in check
 -   X sweats
 -   X wants to show strength
 -   X avoids a fight

Generate dynamic graph with COMET:
 -   Kai stays calm
 -   Kai is viewed as cautious
 -   Kai wants to avoid trouble
 -   Kai intends to be calm

Figure labels: bad links, context-free, linking
Bosselut et al., AAAI 2021
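
The dynamic side of this comparison can be sketched as a loop that queries a knowledge model once per relation, conditioned on the actual context instead of a linked node. This is a minimal sketch, not the method from the paper: `build_dynamic_graph`, `stub_generate`, and the relation names are hypothetical, and the stub returns the outputs shown on the slide in place of a real COMET call.

```python
from typing import Callable, Dict, List


def build_dynamic_graph(
    context: str,
    relations: List[str],
    generate: Callable[[str, str], List[str]],
) -> Dict[str, List[str]]:
    """Query a knowledge model once per relation, conditioned on the context."""
    return {relation: generate(context, relation) for relation in relations}


def stub_generate(context: str, relation: str) -> List[str]:
    """Stand-in for a knowledge model; returns canned outputs from the slide."""
    canned = {
        "effect": ["Kai stays calm"],
        "viewed_as": ["Kai is viewed as cautious"],
        "wants": ["Kai wants to avoid trouble"],
        "intent": ["Kai intends to be calm"],
    }
    return canned.get(relation, [])


context = (
    "Kai knew that things were getting out of control "
    "and managed to keep his temper in check"
)
graph = build_dynamic_graph(context, ["effect", "viewed_as", "wants", "intent"], stub_generate)
for relation, nodes in graph.items():
    print(relation, "->", nodes)
```

Passing the generator in as a function keeps the graph-building step independent of any particular model, so the stub can be swapped for a real knowledge model without changing the loop.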
Language Models as CSKBs?

• Large-scale language models encode a lot of commonsense
 knowledge implicitly, but it’s not directly accessible

• We can develop methods to extract it, but they need to be
 adaptable, robust and efficient

• Need to rethink how we design commonsense knowledge graphs
 if transfer from language models is a new use case
Building Commonsense Knowledge Bases

Quality of KB is important!
    Careful validation is critical so that LMs can learn from precise examples.

More varieties of relations!
    Represent a wide range of commonsense relationships
    (Use prompts for fast adaptation)

Focus on textually less explicit examples
    These are less likely to be known by LMs, thus more impactful in knowledge transfer

Hwang et al., AAAI 2021
