NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School

Page created by Lonnie Bell
 
CONTINUE READING
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
NLP APPLICATIONS III: DIALOGUE SYSTEMS
MIULAB
NTU

 Yun-Nung Vivian Chen
 http://vivianchen.idv.tw
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
MIULAB
NTU

 2

 Iron Man (2008)
 What can machines achieve now or in the future?
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
MIULAB
 Language Empowering Intelligent Assistants

 Apple Siri (2011) Google Now (2012) Microsoft Cortana (2014)
NTU

 Google Assistant (2016)

 Amazon Alexa/Echo (2014) Google Home (2016) Apple HomePod (2017) Facebook Portal (2019)
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
Why Natural Language?
 • Global Digital Statistics (2018 January)
MIULAB
NTU

 Active Social Unique Mobile Active Mobile
 Total Population Internet Users Media Users Users Social Users
 7.59B 4.02B 3.20B 5.14B 2.96B

 The more natural and convenient input of devices evolves towards speech.
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
Why and When We Need?
 “I want to chat” Turing Test (talk like a human) Social Chit-Chat
 “I have a question” Information consumption
MIULAB

 Task-Oriented
 “I need to get this done” Task completion
 Dialogues
 “What should I do?” Decision support

 • What is today’s agenda?
NTU

 • What does NLP stand for?
 • Book me the train ticket from Kaohsiung to Taipei
 • Reserve a table at Din Tai Fung for 5 people, 7PM tonight 5

 • Schedule a meeting with Vivian at 10:00 tomorrow

 • Is this summer school good to attend?
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
MIULAB
NTU
 Intelligent Assistants

 Task-Oriented
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
App → Bot
 • A bot is responsible for a “single” domain, similar to an app
MIULAB
NTU

 7

 Users can initiate dialogues instead of following the GUI design
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
MIULAB
 Two Branches of Conversational AI

 Chit-Chat
NTU

 Task-Oriented
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
Task-Oriented Dialogues
MIULAB
NTU

 JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung Vivian Chen Athens NLP Summer School
Task-Oriented Dialogue Systems (Young, 2000)

 Speech Signal Hypothesis
 are there any action movies to
MIULAB

 see this weekend Language Understanding (LU)
 Speech • Domain Identification
 Recognition • User Intent Detection
 Text Input • Slot Filling
 Are there any action movies to see this weekend?
 Semantic Frame
 request_movie
NTU

 genre=action, date=this weekend

 Dialogue Management
 10 (DM)
 Natural Language
 • Dialogue State Tracking (DST)
 Text response Generation (NLG)
 • Dialogue Policy
 Where are you located?
 System Action/Policy
 request_location
 Backend Database/
 Knowledge Providers
Task-Oriented Dialogue Systems (Young, 2000)

 Speech Signal Hypothesis
 are there any action movies to
MIULAB

 see this weekend Language Understanding (LU)
 Speech • Domain Identification
 Recognition • User Intent Detection
 Text Input • Slot Filling
 Are there any action movies to see this weekend?
 Semantic Frame
 request_movie
NTU

 genre=action, date=this weekend

 Dialogue Management
 11 (DM)
 Natural Language
 • Dialogue State Tracking (DST)
 Text response Generation (NLG)
 • Dialogue Policy
 Where are you located?
 System Action/Policy
 request_location
 Backend Action /
 Knowledge Providers
Language Understanding (LU)
 • Pipelined
MIULAB

 1. Domain 2. Intent
NTU

 3. Slot Filling
 Classification Classification
 12
1. Domain Identification
 Requires Predefined Domain Ontology

 User
MIULAB

 find a good eating place for taiwanese food
NTU

 Restaurant DB Taxi DB Movie DB
 13

 Intelligent Organized Domain Knowledge (Database)
 Agent
 Classification!
2. Intent Detection
 Requires Predefined Schema

 User
MIULAB

 find a good eating place for taiwanese food

 FIND_RESTAURANT
NTU

 Restaurant DB
 FIND_PRICE
 FIND_TYPE
 14
 :
 Intelligent
 Agent
 Classification!
3. Slot Filling
 Requires Predefined Schema

 O O B-rating O O O B-type O
 User
MIULAB

 find a good eating place for taiwanese food

 Restaurant Rating Type
 Rest 1 good Taiwanese
 Rest 2 bad Thai
NTU

 Restaurant DB : : :
 15

 FIND_RESTAURANT SELECT restaurant {
 Intelligent rating=“good” rest.rating=“good”
 Agent type=“taiwanese” rest.type=“taiwanese”
 Semantic Frame }
 Sequence Labeling
Slot Tagging (Yao et al, 2013; Mesnil et al, 2015)
 • Variations: http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380

 a. RNNs with LSTM cells
MIULAB

 b. Input, sliding window of n-grams
 c. Bi-directional LSTMs

 0 1 2 
NTU

 0 1 2 0 1 2 

 ℎ0 ℎ1 ℎ2 ℎ 
 ℎ0 ℎ1 ℎ2 ℎ ℎ0 ℎ1 ℎ2 ℎ ℎ0
 
 ℎ1
 
 ℎ2 ℎ 
 0 1 2 0 1 2 0 1 2 
 (a) LSTM (b) LSTM-LA (c) bLSTM
Slot Tagging (Kurata et al., 2016; Simonnet et al., 2015)
 • Encoder-decoder networks http://www.aclweb.org/anthology/D16-1223

 • Leverages sentence level information 0 1 2 
MIULAB

 ℎ ℎ2 ℎ1 ℎ0

 2 1 0 0 1 2 
 • Attention-based encoder-decoder
 • Use of attention (as in MT) in the encoder-decoder network
NTU

 • Attention is estimated using a feed-forward network with input: ht and st at
 time t 0 1 2 

 0 1 2 
 ℎ0 ℎ1 ℎ2 ℎ 

 0 1 2 
 ci

 ℎ0 …ℎ 
Joint Semantic Frame Parsing
 • Slot filling and • Intent prediction
 intent prediction and slot filling are
MIULAB

 Sequence- in the same Parallel (Liu performed in two
 based (Hakkani- branches
 Tur et al., 2016) output sequence and Lane, 2016)
NTU

 taiwanese food please EOS
 U U U U

 ht-1 ht ht+1 hT+1
 W W W W
 V V V V
 B-type O O FIND_REST

 Slot Filling Intent Prediction
MIULAB
 Joint Model Comparison

 Attention Intent-Slot
 Mechanism Relationship
 Joint bi-LSTM X Δ (Implicit)
 Attentional Encoder-Decoder √ Δ (Implicit)
NTU

 Slot Gate Joint Model √ √ (Explicit)

 19
Slot-Gated Joint SLU (Goo+, 2018)

 Slot 1 2 3 4 
MIULAB

 Sequence
 Intent Attention 
 Slot
 Gate 

 Slot Attention BLSTM tanh

 Word 1 
NTU

 2 3 4
 BLSTM Sequence
 
 Word Slot Gate
 1 2 3 4
 Sequence = ∑ ∙ tanh + ∙ 
 Slot Prediction
 = ℎ + ∙ + 

 will be larger if slot and intent are better related
Contextual Language Understanding
 • User utterances are highly ambiguous in isolation
MIULAB

 Restaurant
 Book a table for 10 people tonight.
 Booking

 Which restaurant would you like to book a table for?
NTU

 Cascal, for 6.

 ?
 #people time
End-to-End Memory Networks (Sukhbaatar et al, 2015)

 U: “i d like to purchase tickets to see deepwater horizon” m0
MIULAB

 S: “for which theatre”
 U: “angelika”
 S: “you want them for angelika theatre?”
 U: “yes angelika” mi
 S: “how many tickets would you like ?”
NTU

 U: “3 tickets for saturday”
 S: “What time would you like ?”
 U: “Any time on saturday is fine”
 mn-1
 S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm”
 U: “Let’s do 5:40” u
E2E MemNN for Contextual LU (Chen+, 2016)
 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf

 1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding
MIULAB

 pi Knowledge Attention Distribution slot tagging sequence y
 RNN Tagger
 Contextual yt-1 yt
 Sentence Encoder
 V V
 mi Weighted
 RNNmem Sum ht-1 ht
 h W W W
NTU

 x1 x2 … xi Memory Representation
 M U M U
 Sentence Inner wt-1 wt
 Encoder Product
 RNNin
 ∑ Wkg 23

 x1 …
 x2 xi Knowledge Encoding
 c u Representation o
 history utterances {xi}
 current utterance

 Idea: additionally incorporating contextual knowledge during slot tagging
 → track dialogue states in a latent way
E2E MemNN for Contextual LU (Chen et al., 2016)

 U: “i d like to purchase tickets to see deepwater horizon” 0.69
MIULAB

 S: “for which theatre”
 U: “angelika”
 S: “you want them for angelika theatre?”
 U: “yes angelika”
 S: “how many tickets would you like ?” 0.13
NTU

 U: “3 tickets for saturday”
 S: “What time would you like ?”
 U: “Any time on saturday is fine”
 S: “okay , there is 4:10 pm , 5:40 pm and 9:20 pm” 0.16

 U: “Let’s do 5:40”
Recent Advances in NLP
 • Contextual Embeddings (ELMo & BERT)
 • Boost many understanding performance
MIULAB

 with pre-trained natural language

 ?
NTU
NTU MIULAB
NTU MIULAB
Task-Oriented Dialogue Systems (Young, 2000)

 Speech Signal Hypothesis
 are there any action movies to
MIULAB

 see this weekend Language Understanding (LU)
 Speech • Domain Identification
 Recognition • User Intent Detection
 Text Input • Slot Filling
 Are there any action movies to see this weekend?
 Semantic Frame
 request_movie
NTU

 genre=action, date=this weekend

 Dialogue Management
 28 (DM)
 Natural Language
 • Dialogue State Tracking (DST)
 Text response Generation (NLG)
 • Dialogue Policy
 Where are you located?
 System Action/Policy
 request_location
 Backend Action /
 Knowledge Providers
Mismatch between Written and Spoken Languages

 Training Testing
MIULAB

 • Written language • Spoken language
 • Include recognition errors

 • Goal: ASR-Robust Contextualized Embeddings
NTU

 ✓learning contextualized word embeddings specifically for spoken language
 ✓achieves better performance on spoken language understanding tasks
Adapting Transformer to ASR Lattices (Huang and Chen, 2019)
 
 • Idea: lattices may include correct words Linear

 • Goal: feed lattices into Transformer Transformer Encoder
MIULAB

 1 ...
 1 2 −1 
 air fair
 0.4 1
 1 0.3 1 1 1
 cheapest airfare to Milwaukee 
 0.3
NTU

 1
 affair
ASR-Robust Contextualized Embeddings
 • Confusion-Aware Fine-Tuning
 • Supervised
MIULAB
NTU

 • Unsupervised
LU Evaluation
 • Metrics
 • Sub-sentence-level: intent accuracy, intent F1, slot F1
MIULAB

 • Sentence-level: whole frame accuracy
NTU

 32
Task-Oriented Dialogue Systems (Young, 2000)

 Speech Signal Hypothesis
 are there any action movies to
MIULAB

 see this weekend Language Understanding (LU)
 Speech • Domain Identification
 Recognition • User Intent Detection
 Text Input • Slot Filling
 Are there any action movies to see this weekend?
 Semantic Frame
 request_movie
NTU

 genre=action, date=this weekend

 Dialogue Management
 33 (DM)
 Natural Language
 • Dialogue State Tracking (DST)
 Text response Generation (NLG)
 • Dialogue Policy
 Where are you located?
 System Action/Policy
 request_location
 Backend Action /
 Knowledge Providers
MIULAB
 Dialogue State Tracking

 request (restaurant; foodtype=Thai)

 inform (area=centre)
NTU

 34
 request (address)

 bye ()
Dialogue State Tracking
 Requires Hand-Crafted States

 User
MIULAB

 find a good eating place for taiwanese food

 i want it near to my office
NTU

 NULL

 location rating type
 35

 Intelligent
 loc, rating rating, type loc, type
 Agent

 all
Dialogue State Tracking
 Requires Hand-Crafted States

 User
MIULAB

 find a good eating place for taiwanese food

 i want it near to my office
NTU

 NULL

 location rating type
 36

 Intelligent
 loc, rating rating, type loc, type
 Agent

 all
Dialogue State Tracking
 Handling Errors and Confidence

 User
MIULAB

 find a good eating place for taixxxx food

 FIND_RESTAURANT FIND_RESTAURANT FIND_RESTAURANT
 rating=“good” rating=“good” rating=“good”
 type=“taiwanese” type=“thai”
 rating=“good”, ?
NTU

 NULL type=“thai”

 ? rating=“good”,
 type
 ?
 location rating 37
 type=“taiwanese”

 Intelligent ?
 loc, rating rating, type loc, type
 Agent

 all
Dialogue State Tracking (DST)
 • Maintain a probabilistic distribution instead of a 1-best prediction for
 better robustness to SLU errors or ambiguous input
MIULAB

 Slot Value
 # people 5 (0.5)
NTU

 time 5 (0.5) How can I help you?

 Book a table at
 38 Sumiko for 5

 Slot Value How many people?

 # people 3 (0.8) 3

 time 5 (0.8)
Multi-Domain Dialogue State Tracking
 • A full representation of the system's belief of the user's goal at any
 point during the dialogue
MIULAB

 • Used for making API calls
 I wanna buy two
 tickets for tonight at
 the Shoreline theater.

 Which movie are you
 interested in?

 Inferno.
NTU

 Movies
 Date 11/15/17

 8 pm 9 pm
 39
 Time 6 pm 7 pm

 #People 2

 Century 16
 Theater
 Shoreline

 Movie Inferno

 Less More
 Likely Likely
Multi-Domain Dialogue State Tracking
 • A full representation of the system's belief of the user's goal at any
 point during the dialogue
MIULAB

 • Used for making API calls
 I wanna buy two
 tickets for tonight at
 the Shoreline theater.

 Which movie are you
 interested in?

 Inferno.
NTU

 Inferno showtimes at
 Movies Restaurants Century 16 Shoreline are
 6:30pm, 7:30pm, 8:45pm
 11/15/17 Date 11/15/17
 Date and 9:45pm. What time
 do you prefer? 40
 Time 6:30 pm 7:30 pm 8:45 pm 9:45 pm Time 6:00 pm 6:30 pm 7:00 pm

 #People Restaurant Cascal We'd like to eat dinner
 2
 before the movie at
 Theater Century 16
 #People 2 Cascal, can you check
 Shoreline
 what time i can get a
 Movie Inferno table?

 Less More
 Likely Likely
Multi-Domain Dialogue State Tracking
 • A full representation of the system's belief of the user's goal at any
 point during the dialogue
MIULAB

 • Used for making API calls
 Inferno.

 Inferno showtimes at
 Century 16 Shoreline are
 6:30pm, 7:30pm, 8:45pm
 and 9:45pm. What time
 do you prefer?

 We'd like to eat dinner
NTU

 before the movie at
 Movies Restaurants Cascal, can you check
 what time i can get a
 11/15/17 Date 11/15/17
 Date table?
 41
 Time 6:30 pm 7:30 pm 8:45 pm 9:45 pm Time 6:00 pm 6:30 pm 7:00 pm Cascal has a table for 2
 at 6pm and 7:30pm.
 #People 2 Restaurant Cascal
 OK, let me get the
 Theater Century 16
 #People 2 table at 6 and tickets
 Shoreline
 for the 7:30 showing.
 Movie Inferno

 Less More
 Likely Likely
MIULAB
 DNN for DST

 feature
 DNN
 extraction

 A slot value distribution
NTU

 for each slot
 state of this turn
 42

 multi-turn conversation
RNN-CNN DST (Mrkšić+, 2015)
 http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
MIULAB
NTU

 43

 (Figure from Wen et al, 2016)
Neural Belief Tracker (Mrkšić+, 2016)
 • Candidate pairs are considered https://arxiv.org/abs/1606.03777
MIULAB
NTU

 Previous Belief State: [bt-1]

 Belief State Updates: [bt]
Global-Locally Self-Attentive DST (Zhong+, 2018)
 • More advanced encoder http://www.aclweb.org/anthology/P18-1135

 • Global modules share parameters for all slots
 • Local modules learn slot-specific feature representations
MIULAB
NTU

 45
Dialog State Tracking Challenge (DSTC)
 (Williams et al. 2013, Henderson et al. 2014, Henderson et al. 2014, Kim et al. 2016, Kim et al. 2016)
MIULAB

 Challenge Type Domain Data Provider Main Theme
 DSTC1 Human-Machine Bus Route CMU Evaluation Metrics
 DSTC2 Human-Machine Restaurant U. Cambridge User Goal Changes
 DSTC3 Human-Machine Tourist Information U. Cambridge Domain Adaptation
 DSTC4 Human-Human Tourist Information I2R Human Conversation
NTU

 DSTC5 Human-Human Tourist Information I2R Language Adaptation
DSTC4-5
 • Type: Human-Human {Topic: Accommodation; Type: Hostel; Pricerange:
 Cheap; GuideAct: ACK; TouristAct: REQ}
 • Domain: Tourist Information
MIULAB

 Tourist: Can you give me some uh- tell me some cheap rate
 {Topic: Accommodation; NAME: InnCrowd
 hotels, because I'm planning just to leave my bags there
 Backpackers Hostel; GuideAct: REC; TouristAct: ACK} and go somewhere take some pictures.
 Guide: Let's try this one, okay? Guide: Okay. I'm going to recommend firstly you want to have a
 backpack type of hotel, right?
 Tourist: Okay.
 Tourist: Yes. I'm just gonna bring my backpack and my buddy with
 Guide: It's InnCrowd Backpackers Hostel in Singapore. If me. So I'm kinda looking for a hotel that is not that
 you take a dorm bed per person only twenty
NTU

 expensive. Just gonna leave our things there and, you
 dollars. If you take a room, it's two single beds at know, stay out the whole day.
 fifty nine dollars.
 Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit
 Tourist: Um. Wow, that's good. uh not so roomy like
 47 hotel because you just back to sleep.

 Guide: Yah, the prices are based on per person per bed Tourist: Yes. Yes. As we just gonna put our things there and then
 or dorm. But this one is room. So it should be go out to take some pictures.
 fifty nine for the two room. So you're actually
 paying about ten dollars more per person only. Guide: Okay, um-
 Tourist: Hm.
 Tourist: Oh okay. That's- the price is reasonable actually.
 It's good.
DST Evaluation
 • Metric
 • Tracked state accuracy with respect to user goal
MIULAB

 • Recall/Precision/F-measure individual slots
NTU

 48
Dialogue Policy Optimization

 greeting ()
MIULAB

 request (restaurant; foodtype=Thai)

 request (area)

 inform (area=centre)
NTU

 inform (restaurant=Bangkok city,
 area=centre of town, foodtype=Thai)

 49
 request (address)

 inform (address=24 Green street)

 bye ()
Supervised v.s. Reinforcement
 • Supervised
 “Hello” Say “Hi”
MIULAB

 Learning from teacher
 “Bye bye” Say “Good bye”

 • Reinforcement
 OXX???!
NTU

 ……. ……. ……
 50

 Hello ☺ ……
 Bad
 Learning from critics
Dialogue Policy Optimization
 • Dialogue management in a RL framework
MIULAB

 User
 Environment

 Natural Language Generation Language Understanding
NTU

 Action A Reward R Observation O
 51

 Dialogue Manager Agent

 Select the best action that maximizes the future reward
Reward for RL ≅ Evaluation for System
 • Dialogue is a special RL task
 • Human involves in interaction and rating (evaluation) of a dialogue
MIULAB

 • Fully human-in-the-loop framework
 • Rating: correctness, appropriateness, and adequacy
NTU

 - Expert rating high quality, high cost
 - User rating unreliable quality,
 52 medium cost
 - Objective rating Check desired aspects, low cost
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)

 wi wi+1 EOS
MIULAB

 wi wi+1 EOS
 wi wi+1 EOS
 NLG w Natural Language LU
 w0 w1 2 EOS O 
 O 
 Are there any action movies O 
 to see this weekend?

 DST Knowledge
 User Goal 
 Database
NTU

 Dialogue Policy Learning
 Dialogue Policy
 User Model 1 2 …… 
 53 Neural
 User Simulation request_location
 Dialogue
 1 2 … System

 Idea: SL for each component and RL for end-to-end training the neural dialogue system
Reinforcement Learning Agent (Li et al., 2017)
 User goal – Two tickets for “Deadpool” tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.

 RULE BASED SYSTEM REINFORCEMENT LEARNING SYSTEM
 User: Do you have any ticket for movie ‘Deadpool’ at 9:00 pm? User: I want two tickets for movie ‘Deadpool’.
MIULAB

 Agent: Which theater do you want? Agent: What is the starttime you prefer?
 User: AMC pacific place 11 theater, please! User: 9:00 PM
 Agent: How many tickets do you want? Agent: Which day?
 User: Two! User: Tomorrow
 Agent: What date? Agent: Which city?
 User: Tomorrow! User: Seattle
 Agent: Which city do you want? Agent: Which theater do you prefer?
 User: Seattle! User: AMC Pacific Place 11 theater.
NTU

 Agent: Do you have any specification for zip code? Agent: Okay, I will issue 2 tickets for you, tomorrow
 User: No, I do not care. 9:00 pm at AMC pacific place 11 theater, Seattle, movie
 Agent: Do you have any distance constraint? ‘Deadpool’. 54
 User: No, I do not care. Issue: no notion about User: Okay, thanks.
 Agent: How about video format? 2d? Imax 3d? what requests can be Agent: Thanks!
 User: Any kind is okay. skipped
 Agent: Do you have any constraint to price?
 User: No!
 Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie ‘Deadpool’.
 User: Okay, thanks.
 Agent: Thanks!
 Skip the requests the user may not care about to improve efficiency
Planning – Deep Dyna-Q (Peng+, 2018)
 • Issues: sample-inefficient, discrepancy between simulator & real user
 • Idea: learning with real users with planning
MIULAB

 Human
 Conversational Data
 Supervised Imitation
 Learning Learning

 Policy Acting
NTU

 Planning Model
 World Direct User 55
 Model Reinforcement
 Learning
 World Model
 Learning Real
 Experience

 Policy learning suffers from the poor quality of fake experiences
Robust Planning – D3Q (Su+, 2018)
 • Idea: add a discriminator to filter out the bad experiences (to appear) EMNLP 2018
MIULAB

 Real Semantic
 Experience Frame
 NLU DST Human
 Imitation
 Simulated
 Conversational Data
 Learning
 Experience Supervised
 Learning Controlled Planning
 Discriminator State
 Representation Policy Acting
 Discriminator
NTU

 Model
 User World Discriminative Direct
 World Model Reinforcement
 Model Training User 56
 Policy Learning
 NLG
 Learning Real
 System Action (Policy) World Model Experience
 Learning
Robust Planning – D3Q (Su+, 2018)
 (to appear) EMNLP 2018
MIULAB
NTU

 S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, “Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning," (to appear) in Proc. of EMNLP, 2018.

 The policy learning is more robust and shows the improvement in human evaluation
Dialogue Management Evaluation
 • Metrics
 • Turn-level evaluation: system action accuracy
MIULAB

 • Dialogue-level evaluation: task success rate, reward
NTU

 58
RL-Based DM Challenge
 • SLT 2018 Microsoft Dialogue Challenge:
 End-to-End Task-Completion Dialogue Systems
MIULAB

 • Domain 1: Movie-ticket booking
 • Domain 2: Restaurant reservation
 • Domain 3: Taxi ordering
NTU
Task-Oriented Dialogue Systems (Young, 2000)

 Speech Signal Hypothesis
 are there any action movies to
MIULAB

 see this weekend Language Understanding (LU)
 Speech • Domain Identification
 Recognition • User Intent Detection
 Text Input • Slot Filling
 Are there any action movies to see this weekend?
 Semantic Frame
 request_movie
NTU

 genre=action, date=this weekend

 Dialogue Management (DM)
 Natural Language
 • Dialogue State Tracking (DST)
 Text response Generation (NLG)
 • Dialogue Policy
 Where are you located?
 System Action/Policy
 request_location
 Backend Action /
 Knowledge Providers
Natural Language Generation (NLG)
 • Mapping dialogue acts into natural language
MIULAB

 inform(name=Seven_Days, foodtype=Chinese)
NTU

 Seven Days is a nice Chinese restaurant

 61
Template-Based NLG
 • Define a set of rules to map frames to natural language
MIULAB

 Semantic Frame Natural Language
 confirm() “Please tell me more about the product your are looking for.”
 confirm(area=$V) “Do you want somewhere in the $V?”
 confirm(food=$V) “Do you want a $V restaurant?”
 confirm(food=$V,area=$W) “Do you want a $V restaurant in the $W.”
NTU

 Pros: simple, error-free, easy to control 62

 Cons: time-consuming, rigid, poor scalability
RNN-Based LM NLG (Wen et al., 2015)

 Input Inform(name=Din Tai Fung, food=Taiwanese) dialogue act 1-hot
 representation
MIULAB

 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…

 SLOT_NAME serves SLOT_FOOD . 

 conditioned on
 the dialogue act
NTU

 SLOT_NAME serves SLOT_FOOD .
 Output Din Tai Fung serves Taiwanese .
 delexicalisation
Semantic Conditioned LSTM (Wen et al., 2015)
 • Issue: semantic repetition
 • Din Tai Fung is a great Taiwanese xt ht-1 xt ht-1 xt ht-1
MIULAB

 restaurant that serves Taiwanese.
 • Din Tai Fung is a child friendly
 ht-1 ft
 restaurant, and also allows kids.
 xt ht
 it Ct ot

 LSTM cell
NTU

 DA cell
 dt-1 dt
 rt

 64
 d0
 xt ht-1
 dialog act 1-hot
 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, …
 representation
 Inform(name=Seven_Days, food=Chinese)

 Idea: using gate mechanism to control the generated semantics (dialogue act/slots)
Issues in NLG
 • Issue
 • NLG tends to generate shorter sentences
MIULAB

 • NLG may generate grammatically-incorrect sentences
 • Solution
 • Generate word patterns in a order
 • Consider linguistic patterns
NTU

 65
Hierarchical NLG w/ Linguistic Patterns (Su et al., 2018)
 GRU Decoder Near All Bar One is a moderately priced Italian place it is
 … is a moderately … called Midsummer House
 1. Repeat-input
MIULAB

 2. Inner-Layer Teacher Forcing
 3. Inter-Layer Teacher Forcing DECODING LAYER4 4. Others
 4. Curriculum Learning
 All Bar One is moderately priced Italian place it is called
 last output − …All Bar One is a …
 Midsummer House
 output from last layer − 
 … All Bar One is moderately…
 DECODING LAYER3 3. ADJ + ADV
 Bidirectional GRU
NTU

 Encoder
 All Bar One is priced place it is called Midsummer House
 66

 name … Italian priceRange… DECODING LAYER2 2. VERB
 Semantic 1-hot [ … 1, 0, 0, 1, 0, …]
 All Bar One place it Midsummer House
 Representation
 enc
 Input name[Midsummer House], food[Italian], ENCODER DECODING LAYER1 1. NOUN + PROPN + PRON
 Semantics priceRange[moderate], near[All Bar One] Hierarchical Decoder
NLG Evaluation
 • Metrics
 • Subjective: human judgement (Stent+, 2005)
MIULAB

 • Adequacy: correct meaning
 • Fluency: linguistic fluency
 • Readability: fluency in the dialogue context
 • Variation: multiple realizations for the same concept
 • Objective: automatic metrics
 • Word overlap: BLEU (Papineni+, 2002), METEOR, ROUGE
NTU

 • Word embedding based: vector extrema, greedy matching, embedding average
 67

 There is a gap between human perception and automatic metrics
MIULAB
 Evolution Roadmap
 Dialogue depth (complexity)

 I feel sad…

 I’ve got a cold what do I do?
NTU

 Tell me a joke.
 Single Multi- Open
 Extended
 What is influenza? 68
 domain domain domain
 systems
 systems systems systems

 Dialogue breadth (coverage)
Dialogue Systems
 Task-Oriented Dialogue
MIULAB

 Understanding
 input x State tracker
 (NLU)
 Database
 DB Memory
 Generation External knowledge
 output y Dialog policy
 (NLG)
NTU
 Fully Data-Driven

 Understanding
 input x State tracker
 (NLU)

 output y Generation DB
 Dialog policy
 (NLG)
 69
MIULAB

 Chit-Chat Social Bots
NTU
Neural Response Generation (Sordoni et al., 2015; Vinyals & Le, 2015)

 Source: conversation history
MIULAB

 … because of your game? EOS Yeah I’m on my
NTU

 encoder decoder

 Target:
 Yeah I’m on my way
 response

 Learns to generate dialogues from offline data (no state, action, intent, slot, etc.)
Sci-Fi Short Film - SUNSPRING
 https://www.youtube.com/watch?v=LY7x2Ihqj
MIULAB
NTU

 72
Issue 1: Blandness Problem

 Wow sour starbursts really do make your mouth water... mm drool.
MIULAB

 Can I have one?
 Of course!
 Milan apparently selling Zlatan to balance the books... Where next, Madrid?
 I don’t know.
NTU

 ‘tis a fine brew on a day like this! Strong 32% responses
 though, howare general
 many and meaningless
 is sensible?
 “I don’t know”
 I'm not sure yet, I'll let “Iyou
 don’t know what
 know ! you are talking about”
 “I don’t think that is a good idea”
 “Oh my god”
 Well he was on in Bromley a while ago... still touring.
 I don't even know what he's talking about.
Mutual Information for Neural Generation (Li et al., 2016)
 • Mutual information objective
MIULAB
NTU

 standard anti-LM
 likelihood
MMI for Response Diversity (Li et al., 2016)

 Wow sour starbursts really do make your mouth water... mm drool.
MIULAB

 Can I have one?
 Of course you can! They’re delicious!
 Milan apparently selling Zlatan to balance the books... Where next, Madrid?
 I think he'd be a good signing.
NTU

 ‘tis a fine brew on a day like this! Strong though, how many is sensible?
 Depends on how much you drink!
 Well he was on in Bromley a while ago... still touring.
 I’ve never seen him live.
MMI for Response Diversity (Li et al., 2016)

 Wow sour starbursts really do make your mouth water... mm drool.
MIULAB

 Can I have one?
 Of course you can! They’re delicious!
 Milan apparently selling Zlatan to balance the books... Where next, Madrid?
 I think he'd be a good signing.
NTU

 ‘tis a fine brew on a day like this! Strong though, how many is sensible?
 Depends on how much you drink!
 Well he was on in Bromley a while ago... still touring.
 I’ve never seen him live.
Real-World Conversations
  Multimodality
 • Conversation history
MIULAB

 • Persona
 • User profile data
 (bio, social graph, etc.)
 • Visual signal
 (camera, picture etc.)
NTU

 • Knowledge base …
 Because of your game? EOS Yeah I’m
 • Mood
 • Geolocation
 • Time

 77
MIULAB
NTU
 Issue 2: Response Inconsistency

 78
Personalized Response Generation (Li et al., 2016)

 D_Gomes25 Jinnmeow3
 Speaker embeddings (70k)

 u.s. london
MIULAB

 Word embeddings (50k)
 skinnyoflynny2 england
 TheCharlieZ
 great
 Rob_712
 Dreamswalls good
 Tomcoatez
 Bob_Kelly2
 Kush_322 monday live okay
 kierongillen5
 This_Is_Artful tuesday
 DigitalDan285
 stay
 The_Football_Bar
NTU

 where do you live Rob EOS Rob in Rob england Rob .

 in england . EOS
Persona Model for Speaker Consistency (Li et al., 2016)

 Baseline model → inconsistency Persona model using speaker embedding → consistency
MIULAB
NTU
Issue 3: Dialogue-Level Optimization via RL

 Language Collect rewards
MIULAB

 User input (o)
 understanding Dialogue ( , , , ’)
 Manager
 = ( )
 Language Optimize
 Response (response)
 ( , )
 generation
NTU

 Application State Action Reward
 Task Completion Bots User input + Context Dialog act + slot-value Task success rate
 81
 (Movies, Restaurants, …) # of turns
 Info Bots Question + Context Clarification questions, Relevance of answer
 (Q&A bot over KB, Web etc.) Answers # of turns
 Social Bot Conversation history Response Engagement(?)
 (XiaoIce)
Deep RL for Response Generation (Li et al., 2016)

 Input message Supervised Learning Agent Reinforcement Learning Agent
MIULAB
NTU

 • RL agent generates more interactive responses
 • RL agent tends to end a sentence with a question and hand the
 conversation over to the user
MIULAB
 Issue 4: No Grounding (Sordoni et al., 2015; Li et al., 2016)

 Social Chat Task-Oriented
 Engaging, Human-Like Interaction Task Completion, Decision Support
 (Ungrounded) (Grounded)

 The weather is so depressing these days.
NTU

 I know, I dislike rain too.
 What about a day trip to eastern Washington?83

 Any recommendation?

 Try Dry Falls, it’s spectacular!
 83
Knowledge-Grounded Responses (Ghazvininejad et al., 2017)

 Dialogue
 Going to Kusakabe tonight Encoder
 Σ Decoder Try omakase, the best in town
MIULAB

 Conversation History Response

 Fact Encoder

 Consistently the best omakase
NTU

 Amazing sushi tasting […]

 ... A They were out of kaisui […]
 ...
 World “Facts” Contextually-Relevant “Facts”
MIULAB
 Conversation and Non-Conversation Data

 You know any good A restaurant in B?

 Try C, one of the best D in the city.
NTU

 Conversation Data

 Knowledge85 Resource

 You know any good Japanese restaurant in Seattle?

 Try Kisaku, one of the best sushi restaurants in the city.
MIULAB
NTU
 Knowledge-Grounded Responses (Ghazvininejad et al., 2017)

 86

 Results (23M conversations) outperforms competitive neural baseline (human + automatic eval)
MIULAB
 Evolution Roadmap
 Dialogue depth (complexity)

 I feel sad…
 Empathetic systems

 I’ve got a cold what do I do?
 Common sense system
NTU

 Tell me a joke.
 What is influenza? 87

 Knowledge based system

 Dialogue breadth (coverage)
Multimodality & Personalization (Chen et al., 2018)
 • Task: user intent prediction
 • Challenge: language ambiguity
MIULAB

 send to vivian v.s.
 Communication
 Email? Message?
 User preference
NTU

 ✓ Some people prefer “Message” to “Email”
 ✓ Some people prefer “Ping” to “Text”
 App-level contexts
 ✓ “Message” is more likely to follow “Camera”
 ✓ “Email” is more likely to follow “Excel”
 Behavioral patterns in history helps intent prediction.
High-Level Intention Learning (Sun et al., 2016; Sun et al., 2016)
 • High-level intention may span several domains
MIULAB

 Schedule a lunch with Vivian.

 find restaurant check location contact
NTU

 What kind of restaurants do you prefer?
 The distance is …
 Should I send the restaurant information to Vivian?

 Users interact via high-level descriptions and the system learns how to plan the dialogues
Empathy in Dialogue System (Fung et al., 2016)
 • Embed an empathy module
 • Recognize emotion using multimodality
MIULAB

 • Generate emotion-aware responses

 text
NTU

 speech 90

 vision
 Emotion Recognizer
Cognitive Behavioral Therapy (CBT)

 Mood Tracking Content Providing
MIULAB

 Pattern Mining Always Be There
NTU

 Depression Reduction Know You Well
MIULAB

 Challenges & Conclusions
NTU
Challenge Summary

 The human-machine interface is a hot topic but several components must be integrated!
MIULAB

 Most state-of-the-art technologies are based on DNN
 • Requires huge amounts of labeled data
 • Several frameworks/models are available

 Fast domain adaptation with scarse data + re-use of rules/knowledge
NTU

 Handling reasoning and personalization

 93
 Data collection and analysis from un-structured data

 Complex-cascade systems require high accuracy for working good as a whole
MIULAB
NTU

 Her (2013)
 What can machines achieve now or in the future?
MIULAB

 Yun-Nung (Vivian) Chen
 Assistant Professor, National Taiwan University
NTU

 y.v.chen@ieee.org / http://vivianchen.idv.tw
You can also read