NLP APPLICATIONS III: DIALOGUE SYSTEMS - Yun-Nung (Vivian) Chen, Athens NLP Summer School
Language Empowering Intelligent Assistants
• Apple Siri (2011) • Google Now (2012) • Microsoft Cortana (2014) • Google Assistant (2016) • Amazon Alexa/Echo (2014) • Google Home (2016) • Apple HomePod (2017) • Facebook Portal (2019)
Why Natural Language?
• Global Digital Statistics (January 2018):
  • Total Population: 7.59B
  • Internet Users: 4.02B
  • Active Social Media Users: 3.20B
  • Unique Mobile Users: 5.14B
  • Active Mobile Social Users: 2.96B
Device input keeps evolving towards speech, the more natural and convenient modality.
Why and When Do We Need Dialogue Systems?
• Social chit-chat ("I want to chat"): Turing Test, talking like a human
• Task-oriented dialogues:
  • Information consumption ("I have a question")
  • Task completion ("I need to get this done")
  • Decision support ("What should I do?")
Examples:
• What is today's agenda? What does NLP stand for?
• Book me the train ticket from Kaohsiung to Taipei
• Reserve a table at Din Tai Fung for 5 people, 7PM tonight
• Schedule a meeting with Vivian at 10:00 tomorrow
• Is this summer school good to attend?
App → Bot
• A bot is responsible for a "single" domain, similar to an app.
• Users can initiate dialogues instead of following the GUI design.
Task-Oriented Dialogues
• JARVIS – Iron Man's Personal Assistant
• Baymax – Personal Healthcare Companion
Task-Oriented Dialogue Systems (Young, 2000)
• Pipeline: Speech Recognition → Language Understanding (LU: domain identification, user intent detection, slot filling) → Dialogue Management (DM: dialogue state tracking, dialogue policy) → Natural Language Generation (NLG), backed by a database / knowledge providers.
• Example: the hypothesis "are there any action movies to see this weekend" is parsed into the semantic frame request_movie(genre=action, date=this weekend); the dialogue policy selects the system action request_location, which is realized as the text response "Where are you located?"
Language Understanding (LU)
• Pipelined: 1. Domain Classification → 2. Intent Classification → 3. Slot Filling
1. Domain Identification (requires a predefined domain ontology)
• User: "find a good eating place for taiwanese food" → Restaurant DB (vs. Taxi DB, Movie DB)
• Classification over organized domain knowledge (databases).
2. Intent Detection (requires a predefined schema)
• User: "find a good eating place for taiwanese food" → FIND_RESTAURANT (vs. FIND_PRICE, FIND_TYPE, …)
• Classification over the domain's intent set.
3. Slot Filling (requires a predefined schema)
• User: "find a good eating place for taiwanese food"
• BIO tags: O O B-rating O O O B-type O
• Semantic frame: FIND_RESTAURANT(rating="good", type="taiwanese"), which maps to a backend query such as SELECT restaurant WHERE rest.rating="good" AND rest.type="taiwanese" over the Restaurant DB (e.g., Rest 1: good, Taiwanese; Rest 2: bad, Thai).
• Sequence labeling task.
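The step from BIO tags to a semantic frame is purely mechanical; a minimal sketch (not from the slides, tag set taken from the example above):

```python
# Turn per-token BIO slot tags plus a predicted intent into a semantic frame.
def bio_to_frame(words, tags, intent):
    """Collect B-/I- tagged spans into {slot: value} pairs."""
    slots, current_slot, current_words = {}, None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if current_slot:
                slots[current_slot] = " ".join(current_words)
            current_slot, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_slot:
            current_words.append(word)
        else:  # an "O" tag closes any open span
            if current_slot:
                slots[current_slot] = " ".join(current_words)
            current_slot, current_words = None, []
    if current_slot:
        slots[current_slot] = " ".join(current_words)
    return {"intent": intent, "slots": slots}

words = "find a good eating place for taiwanese food".split()
tags = ["O", "O", "B-rating", "O", "O", "O", "B-type", "O"]
print(bio_to_frame(words, tags, "FIND_RESTAURANT"))
# {'intent': 'FIND_RESTAURANT', 'slots': {'rating': 'good', 'type': 'taiwanese'}}
```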
Slot Tagging (Yao et al., 2013; Mesnil et al., 2015)
http://131.107.65.14/en-us/um/people/gzweig/Pubs/Interspeech2013RNNLU.pdf; http://dl.acm.org/citation.cfm?id=2876380
• Variations:
  a. RNNs with LSTM cells
  b. Input: sliding window of n-grams
  c. Bi-directional LSTMs
[Figure: architectures (a) LSTM, (b) LSTM-LA, (c) bLSTM]
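A minimal sketch of variant (c), a bi-directional LSTM tagger in PyTorch; all dimensions are illustrative choices, not the papers' configurations:

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bidirectional LSTM slot tagger: one BIO tag prediction per input token."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, token_ids):               # (batch, seq_len)
        h, _ = self.lstm(self.emb(token_ids))   # (batch, seq_len, 2*hidden)
        return self.out(h)                      # per-token tag logits

# toy usage: a batch with one 8-token utterance
model = BiLSTMTagger(vocab_size=1000, num_tags=5)
logits = model(torch.randint(0, 1000, (1, 8)))
print(logits.shape)  # torch.Size([1, 8, 5])
```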
Slot Tagging (Kurata et al., 2016; Simonnet et al., 2015)
http://www.aclweb.org/anthology/D16-1223
• Encoder-decoder networks: leverage sentence-level information (the encoder reads the whole sentence before tagging).
• Attention-based encoder-decoder: uses attention (as in MT) in the encoder-decoder network; the attention is estimated by a feed-forward network with ht and st as input at time t.
[Figure: encoder-decoder and attention-based slot-tagging architectures]
Joint Semantic Frame Parsing
• Sequence-based (Hakkani-Tür et al., 2016): slot filling and intent prediction in the same output sequence; the intent is predicted as the tag of the final EOS token.
• Parallel (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches over a shared encoder.
[Figure: bi-directional RNN over "taiwanese food please EOS" producing slot tags B-type O O and intent FIND_REST]
Joint Model Comparison
• Joint bi-LSTM: no attention mechanism; implicit intent-slot relationship
• Attentional encoder-decoder: attention mechanism; implicit intent-slot relationship
• Slot-gated joint model: attention mechanism; explicit intent-slot relationship
Slot-Gated Joint SLU (Goo+, 2018)
• Architecture: a BLSTM over the word sequence with separate slot attention and intent attention; a slot gate injects the intent context into each slot context before slot prediction.
• Slot gate: g = Σ v · tanh(c_i^S + W · c^I)
• Slot prediction: y_i^S = softmax(W_hy^S (h_i + c_i^S · g))
• g will be larger if the slot and the intent are better related.
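A sketch of the slot-gate computation above in PyTorch; tensor shapes and the exact placement of the gate are illustrative assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn

class SlotGate(nn.Module):
    """Gate g = sum(v * tanh(c_slot + W * c_intent)), one scalar per token,
    used to reweight the slot context before slot prediction."""
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, dim, bias=False)
        self.v = nn.Parameter(torch.randn(dim))

    def forward(self, slot_context, intent_context):
        # slot_context: (batch, seq_len, dim); intent_context: (batch, dim)
        gate = torch.tanh(slot_context + self.W(intent_context).unsqueeze(1))
        g = (self.v * gate).sum(dim=-1, keepdim=True)  # (batch, seq_len, 1)
        return slot_context * g                        # gated slot context

gate = SlotGate(dim=64)
out = gate(torch.randn(2, 8, 64), torch.randn(2, 64))
print(out.shape)  # torch.Size([2, 8, 64])
```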
Contextual Language Understanding
• User utterances are highly ambiguous in isolation:
  U: "Book a table for 10 people tonight."
  S: "Which restaurant would you like to book a table for?"
  U: "Cascal, for 6." ← does "6" mean #people or time?
End-to-End Memory Networks (Sukhbaatar et al., 2015)
History utterances are stored as memories m0 … mn-1 and the current utterance is encoded as u:
  U: "i d like to purchase tickets to see deepwater horizon"
  S: "for which theatre"
  U: "angelika"
  S: "you want them for angelika theatre?"
  U: "yes angelika"
  S: "how many tickets would you like ?"
  U: "3 tickets for saturday"
  S: "What time would you like ?"
  U: "Any time on saturday is fine"
  S: "okay , there is 4:10 pm , 5:40 pm and 9:20 pm"
  U (current, u): "Let's do 5:40"
E2E MemNN for Contextual LU (Chen+, 2016)
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/IS16_ContextualSLU.pdf
• 1. Sentence encoding: history utterances {xi} and the current utterance c are encoded into memory representations mi and a sentence representation u.
• 2. Knowledge attention: inner products followed by a softmax give the knowledge attention distribution pi over the memories.
• 3. Knowledge encoding: the attention-weighted sum o is combined with u and fed to an RNN tagger that produces the slot tagging sequence y.
• Idea: additionally incorporating contextual knowledge during slot tagging → track dialogue states in a latent way.
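The knowledge-attention read operation is the standard memory-network read; a minimal sketch (single hop, one shared embedding for memories, sizes are arbitrary):

```python
import torch
import torch.nn.functional as F

def memory_attention(u, memory):
    """End-to-end memory network read: p_i = softmax(u . m_i),
    o = sum_i p_i * m_i over the encoded history utterances."""
    scores = memory @ u            # (num_history,) inner products
    p = F.softmax(scores, dim=0)   # knowledge attention distribution
    o = p @ memory                 # weighted sum of history encodings
    return o, p

u = torch.randn(64)           # encoding of the current utterance
memory = torch.randn(9, 64)   # encodings of the history utterances
o, p = memory_attention(u, memory)
print(p.sum().item())         # ~1.0
```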
E2E MemNN for Contextual LU (Chen et al., 2016)
Example attention distribution over the history for the current utterance "Let's do 5:40":
• 0.69 on the opening turns ("i d like to purchase tickets to see deepwater horizon" / theatre turns)
• 0.13 on the middle turns ("how many tickets would you like ?" / "3 tickets for saturday")
• 0.16 on the final system turn ("okay , there is 4:10 pm , 5:40 pm and 9:20 pm")
Recent Advances in NLP
• Contextual embeddings (ELMo & BERT)
• Pre-trained language models boost performance on many understanding tasks.
Task-Oriented Dialogue Systems (Young, 2000): pipeline recap (Speech Recognition → LU → DM → NLG → backend); the focus stays on language understanding, now for spoken input.
Mismatch between Written and Spoken Languages
• Training: written language; Testing: spoken language, which includes recognition errors.
• Goal: ASR-robust contextualized embeddings
  ✓ learning contextualized word embeddings specifically for spoken language
  ✓ achieves better performance on spoken language understanding tasks
Adapting Transformer to ASR Lattices (Huang and Chen, 2019)
• Idea: lattices may include the correct words even when the 1-best hypothesis does not.
• Goal: feed lattices into a Transformer encoder (followed by a linear output layer).
[Figure: lattice for "cheapest airfare to Milwaukee" with competing arcs air / fair / affair, weights 0.4 / 0.3 / 0.3]
ASR-Robust Contextualized Embeddings
• Confusion-aware fine-tuning, in supervised and unsupervised variants.
LU Evaluation
• Metrics
  • Sub-sentence-level: intent accuracy, intent F1, slot F1
  • Sentence-level: whole-frame accuracy
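Whole-frame accuracy is the strictest of these metrics; a small sketch of how it could be computed over frames in the format used earlier (an illustrative helper, not an official evaluation script):

```python
# Sentence-level (whole-frame) accuracy: a prediction counts as correct only
# if both the intent and every slot value match the gold frame exactly.
def frame_accuracy(predictions, golds):
    correct = sum(1 for p, g in zip(predictions, golds)
                  if p["intent"] == g["intent"] and p["slots"] == g["slots"])
    return correct / len(golds)

gold = [{"intent": "FIND_RESTAURANT", "slots": {"rating": "good", "type": "taiwanese"}}]
pred = [{"intent": "FIND_RESTAURANT", "slots": {"rating": "good", "type": "thai"}}]
print(frame_accuracy(pred, gold))  # 0.0 despite a correct intent and one correct slot
```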
Task-Oriented Dialogue Systems (Young, 2000): pipeline recap; next component: Dialogue Management (dialogue state tracking and dialogue policy).
Dialogue State Tracking
Example user dialogue acts across turns: request(restaurant; foodtype=Thai) → inform(area=centre) → request(address) → bye()
Dialogue State Tracking Requires Hand-Crafted States
• User: "find a good eating place for taiwanese food", "i want it near to my office"
• Hand-crafted state space: NULL, location, rating, type, {loc, rating}, {rating, type}, {loc, type}, all
Dialogue State Tracking: Handling Errors and Confidence
• User: "find a good eating place for taixxxx food" (recognition error on the food type)
• Competing LU hypotheses: FIND_RESTAURANT(rating="good", type="taiwanese"), FIND_RESTAURANT(rating="good", type="thai"), FIND_RESTAURANT(rating="good")
• The tracker must decide which state to move to (rating only? rating + thai? rating + taiwanese?) given uncertain evidence.
Dialogue State Tracking (DST)
• Maintain a probabilistic distribution instead of a 1-best prediction for better robustness to SLU errors or ambiguous input.
• Example:
  S: "How can I help you?"  U: "Book a table at Sumiko for 5" → #people: 5 (0.5); time: 5 (0.5)
  S: "How many people?"  U: "3" → #people: 3 (0.8); time: 5 (0.8)
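A minimal sketch of how such per-slot distributions could be refreshed from noisy SLU hypotheses each turn; the linear mixing rule and its weight are assumptions for illustration, not a tracker from the literature:

```python
# belief: {slot: {value: prob}}; slu_hypotheses: {slot: {value: confidence}}
def update_belief(belief, slu_hypotheses, alpha=0.8):
    """Blend the prior per-slot distribution with this turn's SLU scores."""
    for slot, scored_values in slu_hypotheses.items():
        prior = belief.get(slot, {})
        posterior = {}
        for value in set(prior) | set(scored_values):
            posterior[value] = (1 - alpha) * prior.get(value, 0.0) \
                               + alpha * scored_values.get(value, 0.0)
        total = sum(posterior.values()) or 1.0
        belief[slot] = {v: p / total for v, p in posterior.items()}
    return belief

belief = {"#people": {"5": 0.5}, "time": {"5": 0.5}}     # after "for 5"
belief = update_belief(belief, {"#people": {"3": 0.9}})  # after "3"
print(belief["#people"])  # most of the mass shifts to "3"
```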
Multi-Domain Dialogue State Tracking
• A full representation of the system's belief of the user's goal at any point during the dialogue; used for making API calls.
• Example conversation:
  U: I wanna buy two tickets for tonight at the Shoreline theater.
  S: Which movie are you interested in?
  U: Inferno.
  S: Inferno showtimes at Century 16 Shoreline are 6:30pm, 7:30pm, 8:45pm and 9:45pm. What time do you prefer?
  U: We'd like to eat dinner before the movie at Cascal, can you check what time i can get a table?
  S: Cascal has a table for 2 at 6pm and 7:30pm.
  U: OK, let me get the table at 6 and tickets for the 7:30 showing.
• Tracked belief (distributions over values, from less to more likely):
  Movies: date = 11/15/17; time ∈ {6:30 pm, 7:30 pm, 8:45 pm, 9:45 pm}; #people = 2; theater = Century 16 Shoreline; movie = Inferno
  Restaurants: date = 11/15/17; time ∈ {6:00 pm, 6:30 pm, 7:00 pm}; restaurant = Cascal; #people = 2
DNN for DST
• Input: features extracted from the multi-turn conversation (the state of this turn)
• Output: a slot-value distribution for each slot
RNN-CNN DST (Mrkšić+, 2015)
http://www.anthology.aclweb.org/W/W13/W13-4073.pdf; https://arxiv.org/abs/1506.07190; https://arxiv.org/abs/1606.03777
(Figure from Wen et al., 2016)
Neural Belief Tracker (Mrkšić+, 2016)
https://arxiv.org/abs/1606.03777
• Candidate slot-value pairs are considered one at a time; the previous belief state b(t-1) is updated into the new belief state b(t).
Global-Locally Self-Attentive DST (Zhong+, 2018)
http://www.aclweb.org/anthology/P18-1135
• More advanced encoder:
  • Global modules share parameters across all slots
  • Local modules learn slot-specific feature representations
Dialog State Tracking Challenge (DSTC)
(Williams et al., 2013; Henderson et al., 2014; Henderson et al., 2014; Kim et al., 2016; Kim et al., 2016)
Challenge | Type | Domain | Data Provider | Main Theme
DSTC1 | Human-Machine | Bus Route | CMU | Evaluation Metrics
DSTC2 | Human-Machine | Restaurant | U. Cambridge | User Goal Changes
DSTC3 | Human-Machine | Tourist Information | U. Cambridge | Domain Adaptation
DSTC4 | Human-Human | Tourist Information | I2R | Human Conversation
DSTC5 | Human-Human | Tourist Information | I2R | Language Adaptation
DSTC4-5
• Type: Human-Human; Domain: Tourist Information
Annotated example (sub-dialogue labels in braces):
{Topic: Accommodation; Type: Hostel; Pricerange: Cheap; GuideAct: ACK; TouristAct: REQ}
Tourist: Can you give me some uh- tell me some cheap rate hotels, because I'm planning just to leave my bags there and go somewhere take some pictures.
Guide: Okay. I'm going to recommend firstly you want to have a backpack type of hotel, right?
Tourist: Yes. I'm just gonna bring my backpack and my buddy with me. So I'm kinda looking for a hotel that is not that expensive. Just gonna leave our things there and, you know, stay out the whole day.
Guide: Okay. Let me get you hm hm. So you don't mind if it's a bit uh not so roomy like hotel because you just back to sleep.
Tourist: Yes. Yes. As we just gonna put our things there and then go out to take some pictures.
Guide: Okay, um-
{Topic: Accommodation; NAME: InnCrowd Backpackers Hostel; GuideAct: REC; TouristAct: ACK}
Guide: Let's try this one, okay?
Tourist: Okay.
Guide: It's InnCrowd Backpackers Hostel in Singapore. If you take a dorm bed per person only twenty dollars. If you take a room, it's two single beds at fifty nine dollars.
Tourist: Um. Wow, that's good.
Guide: Yah, the prices are based on per person per bed or dorm. But this one is room. So it should be fifty nine for the two room. So you're actually paying about ten dollars more per person only.
Tourist: Oh okay. That's- the price is reasonable actually. It's good.
DST Evaluation
• Metrics
  • Tracked state accuracy with respect to the user goal
  • Recall / precision / F-measure of individual slots
Dialogue Policy Optimization
Example sequence of dialogue acts exchanged between user and system:
greeting() → request(restaurant; foodtype=Thai) → request(area) → inform(area=centre) → inform(restaurant=Bangkok city, area=centre of town, foodtype=Thai) → request(address) → inform(address=24 Green street) → bye()
Supervised vs. Reinforcement Learning
• Supervised: learning from a teacher, e.g. "Hello" → say "Hi"; "Bye bye" → say "Good bye".
• Reinforcement: learning from critics; feedback (e.g. "Bad") arrives only after a whole sequence of turns (delayed reward).
Dialogue Policy Optimization
• Dialogue management in an RL framework:
  • The user is the environment; the dialogue manager is the agent.
  • Language understanding converts the observation O into the agent's input; natural language generation realizes the agent's action A; the reward R comes back from the interaction.
  • Select the best action that maximizes the future reward.
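A minimal tabular Q-learning sketch of this loop, where states are (hashable) dialogue states and actions are system dialogue acts; the environment interface (a user simulator with reset/step) is an assumption for illustration, not a specific system from the slides:

```python
import random
from collections import defaultdict

def train_policy(env, actions, episodes=1000, alpha=0.1, gamma=0.9, eps=0.1):
    """Epsilon-greedy tabular Q-learning over dialogue states and system acts."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < eps:                       # explore
                action = random.choice(actions)
            else:                                           # exploit
                action = max(actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)     # assumed interface
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q
```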
Reward for RL ≅ Evaluation for the System
• Dialogue is a special RL task: a human is involved in the interaction and in rating (evaluating) the dialogue.
• Fully human-in-the-loop framework.
• Rating dimensions: correctness, appropriateness, and adequacy.
  - Expert rating: high quality, high cost
  - User rating: unreliable quality, medium cost
  - Objective rating: checks desired aspects, low cost
E2E Task-Completion Bot (TC-Bot) (Li et al., 2017)
• Components: LU, DST, dialogue policy, and NLG on the system side; a neural user simulator with its own user goal and agenda on the user side; a knowledge database in the backend.
• Idea: supervised learning (SL) for each component, then reinforcement learning (RL) for end-to-end training of the neural dialogue system.
Reinforcement Learning Agent (Li et al., 2017)
User goal: two tickets for "Deadpool" tomorrow 9PM at AMC Pacific Place 11 theater, Seattle.
RULE-BASED SYSTEM:
User: Do you have any ticket for movie 'Deadpool' at 9:00 pm?
Agent: Which theater do you want?
User: AMC pacific place 11 theater, please!
Agent: How many tickets do you want?
User: Two!
Agent: What date?
User: Tomorrow!
Agent: Which city do you want?
User: Seattle!
Agent: Do you have any specification for zip code?
User: No, I do not care.
Agent: Do you have any distance constraint?
User: No, I do not care.
Agent: How about video format? 2d? Imax 3d?
User: Any kind is okay.
Agent: Do you have any constraint to price?
User: No!
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie 'Deadpool'.
User: Okay, thanks.
Agent: Thanks!
(Issue: no notion of which requests can be skipped.)
REINFORCEMENT LEARNING SYSTEM:
User: I want two tickets for movie 'Deadpool'.
Agent: What is the starttime you prefer?
User: 9:00 PM
Agent: Which day?
User: Tomorrow
Agent: Which city?
User: Seattle
Agent: Which theater do you prefer?
User: AMC Pacific Place 11 theater.
Agent: Okay, I will issue 2 tickets for you, tomorrow 9:00 pm at AMC pacific place 11 theater, Seattle, movie 'Deadpool'.
User: Okay, thanks.
Agent: Thanks!
The learned policy skips the requests the user may not care about, improving efficiency.
Planning – Deep Dyna-Q (Peng+, 2018)
• Issues: RL policy learning is sample-inefficient, and there is a discrepancy between the simulator and real users.
• Idea: learning with real users, with planning. A world model is trained on real experience (world model learning); the policy is updated both from real experience (direct reinforcement learning) and from simulated experience produced by the world model (planning); human conversational data provides supervised / imitation learning. A sketch of the planning loop follows.
• Limitation: policy learning suffers from the poor quality of fake experiences.
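A pseudocode-style sketch of the Dyna-Q loop described above; the `agent` and `world_model` interfaces are assumptions for illustration only:

```python
def dyna_q_step(agent, world_model, real_experience, k_planning=5):
    """One turn of Deep Dyna-Q-style training with assumed interfaces."""
    agent.update(real_experience)        # direct reinforcement learning
    world_model.fit(real_experience)     # world model learning
    for _ in range(k_planning):          # planning with simulated experience
        fake_experience = world_model.sample()
        # D3Q (next slide) would pass fake_experience through a
        # discriminator here and drop low-quality samples.
        agent.update(fake_experience)
```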
Robust Planning – D3Q (Su+, 2018) (to appear in EMNLP 2018)
• Idea: add a discriminator, trained to distinguish real from simulated experience, to filter out the bad simulated experiences before they are used for planning (controlled planning).
Robust Planning – D3Q (Su+, 2018)
S.-Y. Su, X. Li, J. Gao, J. Liu, and Y.-N. Chen, "Discriminative Deep Dyna-Q: Robust Planning for Dialogue Policy Learning," in Proc. of EMNLP, 2018 (to appear).
• Policy learning becomes more robust and shows improvement in human evaluation.
Dialogue Management Evaluation
• Metrics
  • Turn-level evaluation: system action accuracy
  • Dialogue-level evaluation: task success rate, reward
RL-Based DM Challenge
• SLT 2018 Microsoft Dialogue Challenge: End-to-End Task-Completion Dialogue Systems
  • Domain 1: Movie-ticket booking
  • Domain 2: Restaurant reservation
  • Domain 3: Taxi ordering
Task-Oriented Dialogue Systems (Young, 2000): pipeline recap; next component: Natural Language Generation (NLG).
Natural Language Generation (NLG)
• Mapping dialogue acts into natural language, e.g. inform(name=Seven_Days, foodtype=Chinese) → "Seven Days is a nice Chinese restaurant".
Template-Based NLG
• Define a set of rules mapping semantic frames to natural language:
  confirm() → "Please tell me more about the product you are looking for."
  confirm(area=$V) → "Do you want somewhere in the $V?"
  confirm(food=$V) → "Do you want a $V restaurant?"
  confirm(food=$V, area=$W) → "Do you want a $V restaurant in the $W?"
• Pros: simple, error-free, easy to control
• Cons: time-consuming, rigid, poor scalability
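A minimal sketch of the rule table above as code (the key scheme and template strings follow the slide's examples; everything else is an assumption):

```python
# Each (act, sorted slot names) pattern maps to a surface template.
TEMPLATES = {
    ("confirm",): "Please tell me more about the product you are looking for.",
    ("confirm", "area"): "Do you want somewhere in the {area}?",
    ("confirm", "food"): "Do you want a {food} restaurant?",
    ("confirm", "area", "food"): "Do you want a {food} restaurant in the {area}?",
}

def generate(act, **slots):
    key = (act, *sorted(slots))          # e.g. ("confirm", "area", "food")
    return TEMPLATES[key].format(**slots)

print(generate("confirm", food="Thai", area="centre"))
# Do you want a Thai restaurant in the centre?
```

The cons listed above show up immediately: every new act/slot combination needs its own hand-written entry.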
RNN-Based LM NLG (Wen et al., 2015)
• Input: the dialogue act as a 1-hot representation, e.g. inform(name=Din Tai Fung, food=Taiwanese) → 0, 0, 1, 0, 0, …, 1, 0, 0, …, 1, 0, 0, 0, 0, 0…
• An RNN language model, conditioned on the dialogue act, generates the delexicalised surface form "SLOT_NAME serves SLOT_FOOD ."
• Output after re-lexicalisation: "Din Tai Fung serves Taiwanese ."
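A small sketch of the delexicalisation / re-lexicalisation steps around such a generator; the placeholder naming convention is taken from the slide, the helper functions are illustrative:

```python
# Replace slot values by placeholder tokens before training/generation,
# and substitute them back into the generated template afterwards.
def delexicalise(utterance, slot_values):
    for slot, value in slot_values.items():
        utterance = utterance.replace(value, f"SLOT_{slot.upper()}")
    return utterance

def relexicalise(template, slot_values):
    for slot, value in slot_values.items():
        template = template.replace(f"SLOT_{slot.upper()}", value)
    return template

slots = {"name": "Din Tai Fung", "food": "Taiwanese"}
print(delexicalise("Din Tai Fung serves Taiwanese .", slots))
# SLOT_NAME serves SLOT_FOOD .
print(relexicalise("SLOT_NAME serves SLOT_FOOD .", slots))
# Din Tai Fung serves Taiwanese .
```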
Semantically Conditioned LSTM (Wen et al., 2015)
• Issue: semantic repetition, e.g. "Din Tai Fung is a great Taiwanese restaurant that serves Taiwanese." or "Din Tai Fung is a child friendly restaurant, and also allows kids."
• Idea: add a DA (dialogue act) cell alongside the LSTM cell; a reading gate rt updates the dialogue-act vector dt from its initial 1-hot encoding d0 (e.g. inform(name=Seven_Days, food=Chinese)) at each step, using a gate mechanism to control the generated semantics (dialogue act / slots).
Issues in NLG
• Issue
  • NLG tends to generate shorter sentences
  • NLG may generate grammatically incorrect sentences
• Solution
  • Generate word patterns in a given order
  • Consider linguistic patterns
Hierarchical NLG w/ Linguistic Patterns (Su et al., 2018)
• A bidirectional GRU encoder reads the semantic 1-hot representation of the input semantics, e.g. name[Midsummer House], food[Italian], priceRange[moderate], near[All Bar One].
• A hierarchical GRU decoder generates the sentence in stages following linguistic patterns:
  1. NOUN + PROPN + PRON: "All Bar One place it Midsummer House"
  2. + VERB
  3. + ADJ + ADV
  4. + others: "Near All Bar One is a moderately priced Italian place, it is called Midsummer House"
• Training techniques: repeat-input, inner-layer teacher forcing, inter-layer teacher forcing, curriculum learning.
NLG Evaluation
• Metrics
  • Subjective: human judgement (Stent+, 2005)
    • Adequacy: correct meaning
    • Fluency: linguistic fluency
    • Readability: fluency in the dialogue context
    • Variation: multiple realizations for the same concept
  • Objective: automatic metrics
    • Word overlap: BLEU (Papineni+, 2002), METEOR, ROUGE
    • Word-embedding based: vector extrema, greedy matching, embedding average
• There is a gap between human perception and automatic metrics.
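For the word-overlap family, a hedged example of sentence-level BLEU with NLTK; the smoothing choice is an assumption, and corpus-level BLEU is more common for reporting results:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["seven days is a nice chinese restaurant".split()]
candidate = "seven days is a good chinese restaurant".split()

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))  # high overlap, but overlap alone says little about adequacy
```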
Evolution Roadmap
• Dialogue breadth (coverage): single-domain systems → multi-domain systems → open-domain systems → extended systems
• Dialogue depth (complexity): from "What is influenza?" and "Tell me a joke." to "I've got a cold, what do I do?" and "I feel sad…"
Dialogue Systems
• Task-oriented dialogue: understanding (NLU) → state tracker → dialog policy → generation (NLG), with a database / external knowledge and memory.
• Fully data-driven: the same components (understanding, state tracking, policy, generation) learned jointly from input x to output y, with the DB folded into the model.
Chit-Chat Social Bots
Neural Response Generation (Sordoni et al., 2015; Vinyals & Le, 2015)
• Sequence-to-sequence: the encoder reads the conversation history ("… because of your game?"), the decoder generates the response ("Yeah I'm on my way").
• Learns to generate dialogues from offline data (no state, action, intent, slot, etc.).
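A minimal encoder-decoder sketch of this setup in PyTorch (GRU instead of the papers' exact architectures, dimensions illustrative):

```python
import torch
import torch.nn as nn

class Seq2SeqResponder(nn.Module):
    """Encode the conversation history, decode a response token by token
    (teacher forcing at training time)."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, history_ids, response_ids):
        _, context = self.encoder(self.emb(history_ids))       # final state
        dec_out, _ = self.decoder(self.emb(response_ids), context)
        return self.out(dec_out)                                # next-token logits

model = Seq2SeqResponder(vocab_size=5000)
logits = model(torch.randint(0, 5000, (1, 12)), torch.randint(0, 5000, (1, 6)))
print(logits.shape)  # torch.Size([1, 6, 5000])
```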
Sci-Fi Short Film - SUNSPRING
https://www.youtube.com/watch?v=LY7x2Ihqj
Issue 1: Blandness Problem
• "Wow sour starbursts really do make your mouth water... mm drool. Can I have one?" → "Of course!"
• "Milan apparently selling Zlatan to balance the books... Where next, Madrid?" → "I don't know."
• "'tis a fine brew on a day like this! Strong though, how many is sensible?" → "I'm not sure yet, I'll let you know !"
• "Well he was on in Bromley a while ago... still touring." → "I don't even know what he's talking about."
• 32% of responses are general and meaningless: "I don't know", "I don't know what you are talking about", "I don't think that is a good idea", "Oh my god"
Mutual Information for Neural Generation (Li et al., 2016)
• Mutual information objective: instead of the standard likelihood argmax_T log p(T | S), choose T̂ = argmax_T { log p(T | S) − λ log p(T) }, where the second (anti-LM) term penalizes responses that are likely regardless of the input.
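One common way to apply this objective is as a reranker over N-best candidates; a small sketch with made-up probabilities (the 0.5 trade-off weight is an illustrative assumption):

```python
import math

def mmi_rerank(candidates, lam=0.5):
    """candidates: list of (response, logp_t_given_s, logp_t); pick the response
    maximizing log p(T|S) - lam * log p(T), penalizing generic responses."""
    scored = [(resp, lp_ts - lam * lp_t) for resp, lp_ts, lp_t in candidates]
    return max(scored, key=lambda x: x[1])

candidates = [
    ("i don't know",        math.log(0.20), math.log(0.10)),    # likely but generic
    ("of course you can !", math.log(0.10), math.log(0.001)),   # less likely, specific
]
print(mmi_rerank(candidates)[0])  # "of course you can !"
```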
MMI for Response Diversity (Li et al., 2016)
• "Wow sour starbursts really do make your mouth water... mm drool. Can I have one?" → "Of course you can! They're delicious!"
• "Milan apparently selling Zlatan to balance the books... Where next, Madrid?" → "I think he'd be a good signing."
• "'tis a fine brew on a day like this! Strong though, how many is sensible?" → "Depends on how much you drink!"
• "Well he was on in Bromley a while ago... still touring." → "I've never seen him live."
Real-World Conversations: Multimodality
• Conversation history
• Persona
• User profile data (bio, social graph, etc.)
• Visual signal (camera, picture, etc.)
• Knowledge base
• Mood
• Geolocation
• Time
Issue 2: Response Inconsistency
Personalized Response Generation (Li et al., 2016)
• Adds speaker embeddings (70k speakers) alongside word embeddings (50k words); speakers cluster by attributes (e.g. u.s. / london / england; great / good / okay; monday / tuesday).
• Example decoding: "where do you live" → "in england ." conditioned on the speaker embedding (Rob).
Persona Model for Speaker Consistency (Li et al., 2016)
• Baseline model → inconsistency
• Persona model using speaker embeddings → consistency
Issue 3: Dialogue-Level Optimization via RL
• Language understanding maps the user input o to a state s; the dialogue manager picks an action a = π(s); language generation produces the response; rewards are collected over (s, a, r, s').
Application | State | Action | Reward
Task-completion bots (movies, restaurants, …) | user input + context | dialog act + slot-value | task success rate, # of turns
Info bots (Q&A bot over KB, Web, etc.) | question + context | clarification questions, answers | relevance of answer, # of turns
Social bot (XiaoIce) | conversation history | response | engagement(?)
Deep RL for Response Generation (Li et al., 2016)
• The RL agent generates more interactive responses than the supervised-learning agent.
• The RL agent tends to end a sentence with a question and hand the conversation over to the user.
Issue 4: No Grounding (Sordoni et al., 2015; Li et al., 2016)
• Social chat: engaging, human-like interaction (ungrounded)
• Task-oriented: task completion, decision support (grounded)
Example:
U: The weather is so depressing these days.
S: I know, I dislike rain too. What about a day trip to eastern Washington?
U: Any recommendation?
S: Try Dry Falls, it's spectacular!
Knowledge-Grounded Responses (Ghazvininejad et al., 2017)
• A dialogue encoder reads the conversation history ("Going to Kusakabe tonight"), and a fact encoder reads contextually relevant "facts" retrieved from world facts (e.g. "Consistently the best omakase", "Amazing sushi tasting […]", "They were out of kaisui […]"); their representations are summed and fed to the decoder, which produces "Try omakase, the best in town".
Conversation and Non-Conversation Data
• Conversation data provides the response pattern: "You know any good A restaurant in B?" → "Try C, one of the best D in the city."
• A knowledge resource fills in the grounded content: "You know any good Japanese restaurant in Seattle?" → "Try Kisaku, one of the best sushi restaurants in the city."
Knowledge-Grounded Responses (Ghazvininejad et al., 2017)
• Results (23M conversations): outperforms a competitive neural baseline in both human and automatic evaluation.
Evolution Roadmap
• Dialogue depth (complexity): knowledge-based systems ("What is influenza?", "Tell me a joke.") → common-sense systems ("I've got a cold, what do I do?") → empathetic systems ("I feel sad…")
• Dialogue breadth (coverage)
Multimodality & Personalization (Chen et al., 2018)
• Task: user intent prediction
• Challenge: language ambiguity, e.g. "send to vivian" → communicate via Email? or Message?
• User preference:
  ✓ Some people prefer "Message" to "Email"
  ✓ Some people prefer "Ping" to "Text"
• App-level contexts:
  ✓ "Message" is more likely to follow "Camera"
  ✓ "Email" is more likely to follow "Excel"
• Behavioral patterns in the history help intent prediction.
High-Level Intention Learning (Sun et al., 2016; Sun et al., 2016)
• A high-level intention may span several domains, e.g. "Schedule a lunch with Vivian." → find restaurant, check location, contact:
  S: What kind of restaurants do you prefer?
  S: The distance is …
  S: Should I send the restaurant information to Vivian?
• Users interact via high-level descriptions and the system learns how to plan the dialogues.
Empathy in Dialogue Systems (Fung et al., 2016)
• Embed an empathy module
• Recognize emotion using multimodality (text, speech, vision)
• Generate emotion-aware responses
Cognitive Behavioral Therapy (CBT)
• Mood tracking
• Content providing
• Pattern mining
• Always be there
• Depression reduction
• Know you well
Challenges & Conclusions
Challenge Summary
• The human-machine interface is a hot topic, but several components must be integrated.
• Most state-of-the-art technologies are based on DNNs:
  • They require huge amounts of labeled data.
  • Several frameworks/models are available.
• Fast domain adaptation with scarce data + re-use of rules/knowledge
• Handling reasoning and personalization
• Data collection and analysis from unstructured data
• Complex cascaded systems require high accuracy in each component to work well as a whole.
Her (2013)
• What can machines achieve now or in the future?
Yun-Nung (Vivian) Chen
Assistant Professor, National Taiwan University
y.v.chen@ieee.org / http://vivianchen.idv.tw