IAAA / PSTALN Dialog systems - Benoit Favre last generated on January 20, 2020 - page du TP

Page created by Andrea Chen
                               Dialog systems

                Benoit Favre 

                        Aix-Marseille Université, LIS/CNRS

                     last generated on January 20, 2020

Benoit Favre (AMU)                PSTALN: Dialog             January 20, 2020   1 / 35
What is a dialog system?

   Definition :
      ▶   Input/output with natural language
      ▶   Free use of language
      ▶   Reproduces human agent behavior
      ▶   Reply (or not) in natural language
      ▶   Uni/multimodal
   Spoken Dialog System (SDS)
      ▶   Interactive system with spoken language
      ▶   Required when using an acoustic communication channel only (phone)
      ▶   Can free other modalities (hands free)
      ▶   No control of inputs
      ▶   Contextualize information
      ▶   Automatic speech recognition → transcript errors

  Online discussion
     ▶   Eliza, virtual therapy
            ⋆   https://www.eclecticenergies.com/ego/eliza
     ▶   Mitsuku (best chatbot at the Loebner Prize 2013)
            ⋆   http://www.mitsuku.com/
  Automated voice services
     ▶   “To erase a message, say erase..."
  Customer care
     ▶   1013 (in France): describe your problem freely
     ▶   Air Travel Information System (ATIS)
     ▶   Clippy
     ▶   SIRI

Learning with an intelligent agent

   Replaces teacher in MOOC

Interaction with a robot

Caroline Lyon, Chrystopher L. Nehaniv, Joe Saunders, Interactive Language
Learning by Robots: The Transition from Babbling to Word Forms, PLoS One, 2012

Nao (http://www.aldebaran-robotics.com)


  What’s a dialog?
     ▶   Study human behavior
  How to understand a sentence?
     ▶   Rule-based and template-based system
     ▶   Robust concept detection
  What strategies to make a successful dialog?
     ▶   Enforce local coherency
     ▶   Finite state machine
     ▶   Explore possible futures
  How to formulate an answer?
     ▶   Language and speech synthesis

Corpus-based study
   Human-human dialogues
     ▶   Language in the wild
     ▶   Want to build a program that replicates human behavior
     ▶   Difficult to ignore non-verbal interactions
     ▶   Humans behave differently in front of a system
   Wizard of Oz (WoZ)
     ▶   Replace the system by a human operator
     ▶   Make the user believe that it is a real system (simulate errors / wait
     ▶   Collect users' dialog strategies
   Human-machine dialog
     ▶   Collect interactions with an existing system
     ▶   Use these data to evaluate / improve the system
     ▶   From a user model, simulate human input
     ▶   Detect infinite loops
     ▶   Estimate resolution time
     ▶   Train system parameters
Human dialogs

  Sequence of spoken turns
     ▶   Between two or more people
  Who speaks after whom?
     ▶   Interruptions
     ▶   Finish someone else’s sentence
     ▶   Overlap
  Establishing common ground
     ▶   Use of a common vocabulary
     ▶   Acquiescence, reformulation, convergence

Speech acts

    Theory (Austin & Searle)
       ▶   Meaning can be expressed in terms of actions instead of concepts
           (declarative logical form)
       ▶   “I apologize"
       ▶   “Can you do this?"
    Types of dialog acts
       ▶   Verdictives, or judicial acts (acquit, convict, decree)
       ▶   Exercitives (demote, command, order, pardon, bequeath)
       ▶   Commissives (promise, vow, guarantee, bet, swear to)
       ▶   Behabitives (apologize, thank, deplore, criticize)
       ▶   Expositives (affirm, deny, postulate, remark)

Cooperation principle (Grice)

   For a conversation to be successful, speakers must cooperate:
      ▶   Quantity: give the right amount of information
      ▶   Quality: tell the truth
      ▶   Relevance: say important things
      ▶   Manner: be clear, brief and structured
   Example (Mitkov - Computational Linguistics)
      ▶   Quantity: Marie ate some chocolate → Marie did not eat all the chocolate
      ▶   Quality: (about an invoice) It costs an arm → It costs a lot
      ▶   Relevance: A: Can I watch TV? B: It’s bath time.
      ▶   Manner: Are you ready? vs Are you ready or are you not ready?
   Integration in a dialogue system:
      ▶   The user follows them if he has something to gain
      ▶   Should the system follow these principles?

Dialog acts (DA)
   Dialog acts: a specialization of speech acts
   Subdivision of a speaking turn into “intentions"
     ▶   Question
            ⋆   Closed / open questions
     ▶   Declaration
            ⋆   Short answers
     ▶   Disfluencies
            ⋆   Repetitions
            ⋆   Wrong pronunciation
     ▶   Interruption
     ▶   Filled pauses
     ▶   MRDA / DAMSL: specification of more than 160 types
   Task-oriented dialogue act categories
     ▶   Greetings
     ▶   Opening
     ▶   Negotiation
     ▶   Closing
     ▶   Good-bye
Anatomy of a dialog system
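   [Figure: processing pipeline. Top row: question → automatic transcription
   → (words) → syntactic analysis → (tree) → semantic analysis → logical
   representation. Bottom row: logical representation → syntactic generation
   → (primitive syntax) → lexical generation → (words, prosody) → speech
   synthesis.]
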
     ▶   Dialogue acts
     ▶   Grammar
     ▶   Concepts
   Dialog management
     ▶   Matching
     ▶   Finite state machines
     ▶   Exploration
     ▶   Templates
     ▶   Statistical generation
DA classification
   Model domain dialog acts (politeness, commands, information, ...)
   Corpus: (15651 instances, 66 classes)
      ▶   “je n’ai plus de tonalité sur ma ligne" (I no longer have a dial tone on my line) → interruption_ligne
      ▶   “j’ai un problème suite euh déménagement" (I have a problem after, uh, moving house) → mise_en_service
      ▶   “euh téléphone illimité en panne" (uh, unlimited phone out of order) → internet_voip
      ▶   “le clignotant reste toujours allumé" (the blinking light always stays on) → messagerie_vocale
      1   Extract word n-grams
             ⋆   je je_n'ai je_n'ai_plus n'ai_plus_de plus_de_tonalité...
      2   Use them as features of a classifier (results from mlcomp.org)
             ⋆   0.207   SMO_weka_nominal
             ⋆   0.216   boostexter
             ⋆   0.232   sgd-logistic-stepsize0.3-iter5
             ⋆   0.272   liblinear-s6-B1
      3   Most relevant features
             ⋆   “plus de tonalité"
             ⋆   “sur ma ligne"
             ⋆   “une autre demande"
             ⋆   “ai un problème"
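Step 1 can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and n-grams of length 1 to 3 joined with underscores as in the example above; the function name `word_ngrams` is hypothetical and not taken from the original toolchain.

```python
def word_ngrams(sentence, max_n=3):
    """Return all word n-grams (n = 1..max_n), joined with underscores."""
    words = sentence.split()
    grams = []
    for n in range(1, max_n + 1):
        # Slide a window of n words over the sentence.
        for i in range(len(words) - n + 1):
            grams.append("_".join(words[i:i + n]))
    return grams


features = word_ngrams("je n'ai plus de tonalité sur ma ligne")
print(features)
```

Each n-gram then becomes one binary or count feature for whichever classifier is used in step 2.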
    Extension of classic syntagmatic grammars
       ▶   Integrate domain semantics into grammar


                   Action                               Travel

                I would like   Mean       Departure     Arrival        Time

                              a train    from Paris   to Marseille   in the afternoon

    Grammar example:

Request -> $Action $Travel
Travel -> $Mean $Departure $Arrival $Time
Action -> I would like | I would have to | have you got
Mean -> a train | a flight | a plane
Departure -> from $City | starting from $City
Arrival -> to $City | arriving at $City
Time -> in the afternoon | today | tomorrow | at $Hour | the $Date
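Because this grammar is not recursive, it can be approximated by a single regular expression with one named group per semantic slot. The sketch below is only an illustration under assumptions: the city list standing in for $City is hypothetical, the $Hour and $Date alternatives are left out, and `parse_request` is an illustrative name.

```python
import re

# Hypothetical city list standing in for the $City non-terminal.
CITY = r"(?:Paris|Marseille|Lyon)"

REQUEST = re.compile(
    r"(?P<action>I would like|I would have to|have you got) "
    r"(?P<mean>a train|a flight|a plane) "
    rf"(?:from|starting from) (?P<departure>{CITY}) "
    rf"(?:to|arriving at) (?P<arrival>{CITY}) "
    r"(?P<time>in the afternoon|today|tomorrow)"
)


def parse_request(utterance):
    """Return the semantic slots of a travel request, or None if it does not match."""
    match = REQUEST.fullmatch(utterance)
    return match.groupdict() if match else None


slots = parse_request("I would like a train from Paris to Marseille in the afternoon")
print(slots)
```

A strict match like this fails on any out-of-grammar word, which is one motivation for the more robust concept detection approach.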

Concept detection
   Example: from Media corpus (WoZ, hotel reservation)
     ▶   Detect triplets (modality, type, value) linked to a task
     ▶   heu [+:reponse:oui oui] [+:connect:opposition mais] j’aimerai
         d’abord savoir si le [+:hotel:Ibis ibis] il y a un
         [?:service:jacuzzi jacuzzi] et une [?:service:piscine piscine] si
         ils acceptent [?:service:animaux les chiens]
   Formulate as sequence prediction
     ▶   BIO formalism (begin-inside-outside)
     ▶   Model type HMM/RNN/CRF
     ▶   Features on words, pos-tags...
                           ...         ...
                           occasion    O
                           de          O
                           la          B-evenement
                           fête        I-evenement
                           du          I-evenement
                           cheval      I-evenement
                           donc        B-connect
                           euh         O
                           à           B-loc-relative
                           proximité   I-loc-relative
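The BIO encoding shown above can be produced mechanically from labeled segments. The following is a small sketch, assuming the annotation has already been split into (label, words) segments, with `None` marking words outside any concept; `to_bio` is an illustrative name.

```python
def to_bio(segments):
    """Convert (label, words) segments into (word, BIO tag) pairs."""
    tagged = []
    for label, words in segments:
        for position, word in enumerate(words):
            if label is None:
                tag = "O"                # outside any concept
            elif position == 0:
                tag = "B-" + label       # first word of a concept
            else:
                tag = "I-" + label       # continuation of the concept
            tagged.append((word, tag))
    return tagged


segments = [
    (None, ["occasion", "de"]),
    ("evenement", ["la", "fête", "du", "cheval"]),
    ("connect", ["donc"]),
    (None, ["euh"]),
    ("loc-relative", ["à", "proximité"]),
]
for word, tag in to_bio(segments):
    print(f"{word}\t{tag}")
```

The resulting word/tag pairs are the training format expected by sequence labelers such as CRFs or RNN taggers.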

Dialogue management

   [Figure: the dialogue manager maps inputs to outputs (responses, commands),
   drawing on a user model, a task model, a discourse model and a world model]

  Task model: expected requests, possible commands
  User model: do not tell the user what he already knows, predict the
  following sentence (objectives, knowledge, interests)
  Discourse model: conversation history (pronoun resolution), dialogue
  states, what to do at a given time
  World model: general knowledge for understanding

ELIZA: the psychoanalyst

   Emacs: M-x doctor (Alt-x, then type doctor)

ELIZA: How does it work?
   Key words → responses
     ▶   “HELLO" → “How are you today.. What would you like to
     ▶   “CAN YOU" → “Don't you believe that I am able to
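The keyword-to-response mechanism can be sketched in a few lines. The rules below are hypothetical examples written for this illustration, not the original ELIZA script, and the sketch skips the pronoun swapping (me → you) that the real program performs.

```python
import re

# Hypothetical rules: (keyword pattern, response template).
# {0} is filled with the text captured after the keyword, if any.
RULES = [
    (re.compile(r"\bhello\b", re.I), "How are you today?"),
    (re.compile(r"\bcan you (.+)", re.I), "Don't you believe that I can {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "Why do you feel {0}?"),
]
DEFAULT = "Please tell me more."


def eliza_reply(utterance):
    """Return the response of the first rule whose keyword appears in the input."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return DEFAULT


print(eliza_reply("Hello there"))
print(eliza_reply("I feel tired."))
```

Rule order matters: the first matching keyword wins, so more specific patterns should come first.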
Dialog management: AIML (ALICE)

    AIML (Artificial Intelligence Markup Language)

    Recursive rules
    Memorize previous answer and topic (broad category)
    Learn new responses from user input

ALICE: how to make a bot?
     Available source code
        ▶   Program AB: https://code.google.com/p/program-ab/
        ▶   Knowledge base:
        ▶   Yes ("Is violet a color?")
        ▶   No ("Are fish mammals?")
        ▶   Sometimes ("Is the sky blue?")
        ▶   Synonyms: “Hello", “Hi there", “Howdy" → “Hi".
        ▶   Simplification: “I am feeling very happy right now" → “I am happy".
        ▶   Input segmentation: “Yes my name is Jim" → “Yes" + “My name is Jim"
 age = 15
 baseballteam = Red Sox
 birthday = Nov. 23, 1995
 birthplace = Bethlehem, Pennsylvania
 boyfriend = I am single
 celebrities = Oprah, Steve Carell, John Stewart, Lady Gaga
 ...
Voice XML
    W3C standard
    XML representation of a dialogue

             Please choose airline, hotel, or rental car.
             [airline hotel "rental car"]
             You have chosen .
    Can specify a grammar for the fields to be filled and the order in
    which to fill them
    Code evaluation ()
Information retrieval-based system
   Find most relevant answer for a question
      ▶   Example: Stack Overflow

   Multi-user: ask a question to another user (chatbots)

  Finite state machine to represent dialogue states
     ▶   Don’t list flights until all the fields are filled
  Entry into a state is conditioned by an interpretation of the input
  (concepts, recognized dialog acts...) and by the dialogue history
  A state corresponds to a command and/or a response from the system
  Limitation: the dialogue is locked into the structure of the automaton
     ▶   Given a state and an interpretation of the user input, we always
         go to another state
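A finite-state dialogue for the flight example can be sketched as below. The states, prompts and slot names are hypothetical, and the "interpretation" of the user input is reduced to taking the raw answer as the slot value, which a real system would replace with concept detection.

```python
# Hypothetical prompts for a flight-booking automaton.
PROMPTS = {
    "ask_departure": "Where are you leaving from?",
    "ask_arrival": "Where are you going?",
    "ask_date": "When do you want to travel?",
    "list_flights": "Here are the matching flights.",
}
# Each state fills one slot, then moves to a fixed next state.
TRANSITIONS = {
    "ask_departure": ("departure", "ask_arrival"),
    "ask_arrival": ("arrival", "ask_date"),
    "ask_date": ("date", "list_flights"),
}


def run_dialog(user_inputs):
    """Run the automaton on scripted user answers; return slots and system turns."""
    state, slots, transcript = "ask_departure", {}, []
    answers = iter(user_inputs)
    # Flights are only listed once every field has been filled.
    while state != "list_flights":
        transcript.append(PROMPTS[state])
        slot, next_state = TRANSITIONS[state]
        slots[slot] = next(answers)
        state = next_state
    transcript.append(PROMPTS[state])
    return slots, transcript


slots, transcript = run_dialog(["Marseille", "Barcelona", "tomorrow"])
print(slots)
```

The rigidity is visible in the code: a user who gives the date first cannot be handled without adding states and transitions.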

 Problem: Which trajectory to follow in the dialog automaton to
 minimize task completion time?
    ▶   We don’t know the future, we have to make assumptions
 Markov Decision Process
    ▶   At each time t, the process (the user) is in a state s (it has an
    ▶   The system chooses to perform an action a (for example it asks a
    ▶   This action randomly places the process (the user) in state $s'$,
        according to a probability $P_a(s, s')$
    ▶   This change of state gives the system a gain $R_a(s, s')$
    ▶   The choices of the process depend only on a and s and not on the rest
        of the history (Markov property)
 How to maximize the cumulative gain over the entire dialogue?
    ▶   A policy, denoted $\pi$, is a series of actions
    ▶   Use dynamic programming to find the policy that maximizes
        $\sum_{t} \gamma^t R_a(s_t, s_{t+1})$, where $\gamma$ is the discount factor
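The dynamic programming step can be illustrated with value iteration on a toy MDP. Everything in this sketch is an assumption made for illustration: the three states, the two actions, the probabilities and the rewards are invented, and the policy is represented as a mapping from states to actions, a common formulation that differs slightly from the "series of actions" wording above.

```python
# Hypothetical toy MDP: T[(state, action)] = [(probability, next_state, reward), ...]
T = {
    ("start", "ask"): [(0.8, "filled", -1.0), (0.2, "start", -1.0)],
    ("start", "book"): [(1.0, "start", -5.0)],
    ("filled", "ask"): [(1.0, "filled", -1.0)],
    ("filled", "book"): [(1.0, "done", 10.0)],
    ("done", "ask"): [(1.0, "done", 0.0)],
    ("done", "book"): [(1.0, "done", 0.0)],
}
STATES = ["start", "filled", "done"]
ACTIONS = ["ask", "book"]


def q_value(V, state, action, gamma):
    """Expected discounted gain of taking `action` in `state`, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[(state, action)])


def value_iteration(gamma=0.9, n_iter=100):
    V = {s: 0.0 for s in STATES}
    for _ in range(n_iter):
        # Bellman backup: keep the best action's expected gain in each state.
        V = {s: max(q_value(V, s, a, gamma) for a in ACTIONS) for s in STATES}
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a, gamma)) for s in STATES}
    return V, policy


V, policy = value_iteration()
print(policy)
```

With these made-up numbers, asking costs one unit per turn, so the best policy asks until the slots are filled and then books.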


  Partially Observable Markov Decision Process (POMDP)
       ▶   Extension of MDP where the state (user intent) isn’t directly observed
       ▶   Distribution over states that the user can be in given what she says
       ▶   Observations are user sentences
  [Figure: side-by-side diagrams of an MDP and a POMDP, each showing a
  transition from state s to state s' with reward R(s,s') along a
  past → future axis]

  Requires approximate inference
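For a very small state space the distribution over user states can be updated exactly with Bayes' rule, which makes the idea concrete before approximations are needed. This sketch assumes the user intent does not change between turns (no transition model), and the intents and observation likelihoods are invented for illustration.

```python
def belief_update(belief, obs_likelihood):
    """Bayes update: b'(s) is proportional to P(observation | s) * b(s)."""
    unnormalized = {s: belief[s] * obs_likelihood.get(s, 0.0) for s in belief}
    total = sum(unnormalized.values())
    if total == 0.0:
        # The observation is impossible under every state: keep the prior.
        return dict(belief)
    return {s: v / total for s, v in unnormalized.items()}


# Hypothetical intents and likelihoods of hearing "I want to fly".
prior = {"train": 0.5, "flight": 0.5}
likelihood = {"train": 0.1, "flight": 0.9}
posterior = belief_update(prior, likelihood)
print(posterior)
```

The cost of this exact update grows with the number of states, and planning over all possible beliefs is much harder still, which is where approximate inference comes in.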

Model comparison

   [Figure: comparison table of four models (Markov Chain, Markov Decision
   Process, Hidden Markov Chain, Partially-observable MDP); the cells that
   distinguish them are left as question marks]

Advanced elements
     ▶   System initiative (e.g. a Freebox support technician following a script)
     ▶   User initiative (Google Now, voice commands)
     ▶   Mixed initiative:
            ⋆   “SIRI, can you change brightness?" (user initiative)
            ⋆   “Yes, how bright?" (system initiative)
     ▶   Explicit confirmation
            ⋆   “I would like to go from Marseille to Barcelona"
            ⋆   “Is your departure city Marseille?"
            ⋆   “Yes"
            ⋆   “Is your arrival city Barcelona?"
     ▶   Implicit confirmation
            ⋆   “I would like to go from Marseille to Barcelona"
            ⋆   “When would you like to go from Marseille to Barcelona?"
  Who is the user talking to?
     ▶   Kinect / Nao

Chatbots with deep learning
   Replace classic elements
      1   Intent classification (what do you want?)
      2   Slot filling (what are the parameters of your query?)
      3   Next action prediction
   Typical frameworks (e.g. Rasa)
      ▶   Provide pretrained intents / slots
      ▶   Active learning for annotating data
      ▶   Dialog flow is pre-scripted (best control)
   Data-driven approach to conversation modeling
      ▶   Given a conversation up to a point, can we predict what will happen
      ▶   No need for linguistic analysis, but no linguistic prior
      ▶   Examples:
             1   Alternating language models
             2   Turn retrieval
             3   Machine translation (history → next turn)

Alternating language model

   A simplified version of the encoder-decoder (or seq2seq) framework
      ▶   Trained the same way as a regular word-based language model
      ▶   At prediction time, alternate between user input and generation
             ⋆   Training data needs to be in the same form

      ▶   Word-by-word prediction
      ▶   Any language model (GPT-2...)
      ▶   Attention mechanisms
   [Figure: alternating word sequences (w1 w2 w3 ...) separated by turn
   markers M and H]

Representation learning
   Create an information retrieval system
      ▶   Which can retrieve the next turn given a history
      ▶   Encode history with a first recurrent model
      ▶   Encode next turn with a second recurrent model
      ▶   Compute a similarity between those representations (dot product)
   Training objective: triplet ranking
      ▶   Make sure the correct association has a higher score than a randomly
          selected pair
   Problem: the cost of retrieving a turn
      ▶   Everything can be precomputed, just the dot product remains
      ▶   Many approaches for finding approximate nearest neighbors in a
          high-dimensional space (e.g. locality preserving hashing)

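   [Figure: bi-encoder. Two encoders read word sequences (w1 w2 ...) delimited
   by M and H markers, one for the history and one for the candidate turn;
   their representations are compared with a cosine similarity]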
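Once candidate turns are encoded offline, retrieval reduces to a dot product and an argmax. The sketch below uses invented 3-dimensional vectors in place of the outputs of the two recurrent encoders, and does an exhaustive search rather than an approximate nearest-neighbor lookup.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical precomputed representations of candidate turns.
CANDIDATES = {
    "have you tried restarting the box?": [0.9, 0.1, 0.0],
    "your invoice is available online.": [0.0, 0.8, 0.3],
    "goodbye, have a nice day.": [0.1, 0.0, 0.9],
}


def retrieve(history_vector):
    """Return the candidate turn whose representation best matches the history."""
    return max(CANDIDATES, key=lambda turn: dot(history_vector, CANDIDATES[turn]))


best = retrieve([1.0, 0.2, 0.0])
print(best)
```

The exhaustive `max` is linear in the number of candidates; an approximate nearest-neighbor index would replace it for large collections.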

Bi-encoder training
   Maximize the margin between $h_i \cdot r_i$ and $n_i \cdot r_i$
      ▶   $h_i$ is the history
      ▶   $n_i$ is a random history
      ▶   $r_i$ is the response

   $$\text{Loss} = \sum_{i} \max(0,\; 1 - h_i \cdot r_i + n_i \cdot r_i)$$
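The loss can be checked numerically on small vectors. This is a plain-Python sketch of the formula only, not the Keras model used in the course; the vectors are invented, and `ranking_loss` is an illustrative name.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ranking_loss(histories, responses, negatives, margin=1.0):
    """Sum over i of max(0, margin - h_i . r_i + n_i . r_i)."""
    return sum(
        max(0.0, margin - dot(h, r) + dot(n, r))
        for h, r, n in zip(histories, responses, negatives)
    )


# Hypothetical 2-dimensional representations.
h = [[1.0, 0.0], [0.0, 1.0]]   # true histories
r = [[2.0, 0.0], [0.0, 0.5]]   # responses
n = [[0.0, 1.0], [1.0, 0.0]]   # random histories
loss = ranking_loss(h, r, n)
print(loss)
```

The first pair already beats its negative by more than the margin and contributes nothing; the second falls short of the margin by 0.5, which is the whole loss.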

   Keras model

Experiments on Datcha corpus

   Corpus: Orange ATH TV

               Stat                Train            Valid      Test
               Conversations      16,140              698       606
               Turns             465,693           20,090    18,392
               Words           7,744,262          327,979   299,340

     ▶   Tokenization (based on penn tokenizer)
     ▶   A few rules to strip additional URLs, phone numbers, etc.
     ▶   Lower case
     ▶   Concatenate turns of the same participant with 
     ▶   Separate conversations by 
     ▶   Replace all TC[1-9] by a generic TC
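Some of these normalization rules can be sketched with regular expressions. The URL and phone patterns below are assumptions (the actual rules are not given), tokenization and turn concatenation are left out, and lowercasing is applied before the TC replacement so that the generic `TC` placeholder stays uppercase, which is also an assumption.

```python
import re

URL = re.compile(r"https?://\S+")                    # assumed URL pattern
PHONE = re.compile(r"\b(?:\d{2}[ .]?){4}\d{2}\b")    # assumed 10-digit phone format
TC = re.compile(r"\btc[1-9]\b")                      # TC1..TC9 after lowercasing


def normalize_turn(text):
    """Strip URLs and phone numbers, lowercase, and map TC[1-9] to a generic TC."""
    text = URL.sub("", text)
    text = PHONE.sub("", text)
    text = text.lower()
    text = TC.sub("TC", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = normalize_turn("Bonjour TC3 voir https://example.com ou 04 91 00 00 00 merci")
print(cleaned)
```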

Datcha results

   Evaluation metrics
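      ▶   Perplexity (PPL): $-\frac{1}{n} \sum \log P(\text{turn} \mid \text{history})$
      ▶   Better-than-random (BTR): $\frac{1}{n} \left|\{P(\text{turn} \mid \text{history}) > P(\text{turn} \mid \text{noise})\}\right|$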
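Both metrics are simple to compute once the model probabilities are available. The sketch below implements the two formulas as written, using invented probabilities. It also reports the exponential of the mean negative log-probability, since perplexity is conventionally defined that way; whether the reported PPL figures are exponentiated is not stated here.

```python
import math


def mean_neg_log_prob(probs):
    """-(1/n) * sum(log P(turn | history)), as in the PPL formula above."""
    return -sum(math.log(p) for p in probs) / len(probs)


def better_than_random(true_probs, noise_probs):
    """Fraction of cases where the true turn scores higher than with a noise history."""
    wins = sum(1 for p, q in zip(true_probs, noise_probs) if p > q)
    return wins / len(true_probs)


# Hypothetical model probabilities for three test turns.
p_true = [0.5, 0.25, 0.1]
p_noise = [0.1, 0.3, 0.05]

nll = mean_neg_log_prob(p_true[:2])
print(nll, math.exp(nll))
print(better_than_random(p_true, p_noise))
```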
   Results on the ATH TV test set (last 3 files):

                        Method                     PPL      BTR
                        Language model            17.52   69.39%
                        Information retrieval     11.85   93.91%

      ▶   LM: vocab=30k, layers=2, hidden=650, sample=1024, maxlen=35,
          batch=20, optim=sgd, epochs=8
      ▶   Bi-encoder: vocab=30k, embeddings=128 (init=w2v), hidden=256,
          maxlen=64, repr=128, batch=256, optim=Nadam, epochs=100

t-SNE Analysis

   t-SNE Projections of turn representations

   Dialogue systems
     ▶   Understanding
     ▶   Planning
     ▶   Data driven
   Open issues
     ▶   Noisy input
            ⋆   Speech recognition / spelling errors
     ▶   Unrestricted domain
            ⋆   Chit-chat
            ⋆   Rapid development of new domains
     ▶   Grounding
            ⋆   Tackle real-world objects
     ▶   End-to-end training
            ⋆   Speech-recognition → dialog management → speech synthesis
     ▶   Training without experiencing
            ⋆   Simulation
   Current state of the art
     ▶   https://nlpprogress.com/english/dialogue.html