Generating Personalized Dialogue via Multi-Task Meta-Learning
Lee Jing Yang (1), Lee Kong Aik (2), Gan Woon Seng (1)
(1) School of Electrical and Electronic Engineering, Nanyang Technological University
(2) Institute for Infocomm Research, A*STAR
jingyang001@e.ntu.edu.sg, lee_kong_aik@i2r.a-star.edu.sg, ewsgan@ntu.edu.sg

Abstract

Conventional approaches to personalized dialogue generation typically require a large corpus as well as predefined persona information. However, in a real-world setting, neither a large corpus of training data nor persona information is readily available. To address these practical limitations, we propose a novel multi-task meta-learning approach which involves training a model to adapt to new personas without relying on a large corpus or on any predefined persona information. Instead, the model is tasked with generating personalized responses based only on the dialogue context. Unlike prior work, our approach leverages the provided persona information only during training, via the introduction of an auxiliary persona reconstruction task. In this paper, we introduce two frameworks that adopt the proposed multi-task meta-learning approach: the Multi-Task Meta-Learning (MTML) framework and the Alternating Multi-Task Meta-Learning (AMTML) framework. Experimental results show that utilizing MTML and AMTML results in dialogue responses with greater persona consistency.

1 Introduction

Personalized dialogue generation involves generating dialogue responses that incorporate the personality of the interlocutors, leading to more natural and human-like dialogue. Thus far, approaches to personalized dialogue generation have typically required a large number of persona-specific dialogue examples. Certain approaches also require persona information presented in the form of several predefined persona statements (e.g., "I love dogs", "I am an engineering student."). However, in a real-world system, large amounts of persona-specific dialogue are rarely available, and collecting descriptive persona statements from every interlocutor is intractable.

To address these practical issues, Persona Agnostic Meta-Learning (PAML) (Madotto et al., 2019), a framework which aims to train a model capable of rapid adaptation to new unseen personas, was proposed. The PAML framework is based on the popular Model-Agnostic Meta-Learning (MAML) framework (Finn et al., 2017). The recently proposed Customized Model-Agnostic Meta-Learning (CMAML) (Song et al., 2020b) framework largely follows the PAML framework, with the exception of an additional network structure optimization component. Both the PAML and CMAML frameworks were benchmarked on the PersonaChat corpus (Zhang et al., 2018), a popular personalized dialogue generation corpus which provides persona statements describing each interlocutor in addition to the persona-specific dialogues. As it is infeasible to collect persona statements from interlocutors in a real-world setting, the PAML framework does not utilize the available persona statements during either meta-learning or inference. However, even though it is impractical to utilize the persona statements during inference, they can still be used during meta-learning to further improve model performance.

Hence, we introduce a novel multi-task meta-learning approach which leverages predefined persona statements only during meta-learning, via an additional persona reconstruction task. Essentially, this task involves generating all corresponding persona statements in their entirety given the dialogue context. We hypothesize that the introduction of the persona reconstruction task would result in parameters capable of effectively inducing the persona information from the dialogue context, which would lead to the generation of persona-consistent dialogue. The persona statements are not used during inference. Prior usage of multi-task learning for personalized dialogue generation involved the addition of a persona classification task (Yang et al., 2021; Su et al., 2019a) and a distraction utterance binary classification task (Na et al., 2021). To our knowledge, this is the first attempt at incorporating persona statement reconstruction.
Our contributions comprise two multi-task meta-learning frameworks which leverage the persona reconstruction task only during training: the Multi-Task Meta-Learning (MTML) framework, as well as a variant known as the Alternating Multi-Task Meta-Learning (AMTML) framework. While both MTML and AMTML involve the addition of a persona reconstruction task only during meta-learning, MTML combines the losses derived from generating the response and reconstructing the persona, whereas AMTML constantly alternates between the two tasks. Experimental results on the PersonaChat corpus reveal that utilizing MTML and AMTML results in responses which reflect the interlocutor's persona to a larger extent than prior work.

2 Methodology

Our approach to personalized dialogue generation extends the PAML framework (Madotto et al., 2019) by introducing a persona reconstruction task only during the meta-learning stage. The PAML framework is essentially an adaptation of the general MAML framework to personalized dialogue generation. The MAML framework involves learning a parameter initialization capable of generalizing and adapting rapidly, via gradient descent, to new tasks unseen during training. Specifically, updated parameters are obtained by adjusting the model parameters using data from several related tasks. The original parameters are then optimized based on the updated parameters and a separate test set by computing second-order derivatives (the Hessian matrix). In the PAML framework, each persona is viewed as a unique task. Consequently, the goal of the PAML framework is to obtain a parameter initialization capable of rapidly adapting to new personas. We hypothesize that the introduction of the persona reconstruction task would result in parameters which induce the persona from the dialogue context to a larger extent.

2.1 Multi-task Learning

Unlike traditional approaches that utilize both the dialogue context and persona information as model inputs during training, our framework involves reconstructing the persona statements as well as generating the dialogue response given only the dialogue context. The primary task of generating the response and the corresponding loss function $L_{res}$ can be expressed as:

$f_\phi(x_t \mid x_{1:t-1}) = p(x_t \mid x_{1:t-1}; \phi)$  (1)

$L_{res}(\phi) = -\sum_{t=1}^{T} \log p(x_t \mid x_{1:t-1}; \phi)$  (2)

where $x_t$ and $x_{1:t-1}$ refer to the response and dialogue context respectively, and $\phi$ represents the model parameters. We hypothesize that the persona reconstruction task would result in model parameters capable of inducing the persona information from the dialogue context to a larger extent, due to the increased emphasis on persona consistency in the persona reconstruction task. Also, rather than generating only selected keywords or phrases, the persona reconstruction task involves generating all corresponding persona statements in their entirety given the dialogue context, because generating complete sentences also requires the model to account for fluency in addition to persona consistency, which is likewise a vital aspect of dialogue generation. The persona statements $P_{1:N}$ are concatenated to form a single sequence $P$. The auxiliary persona reconstruction task and the corresponding loss function $L_{rec}$ are formalized as:

$P = \mathrm{concat}(P_{1:N})$  (3)

$f_\phi(P \mid x_{1:t-1}) = p(P \mid x_{1:t-1}; \phi)$  (4)

$L_{rec}(\phi) = -\sum_{t=1}^{T} \log p(P \mid x_{1:t-1}; \phi)$  (5)

During training, the persona reconstruction loss and the response generation loss are weighted and summed. Hence, the total multi-task loss is expressed as:

$L(\phi) = \alpha L_{res}(\phi) + (1 - \alpha) L_{rec}(\phi)$  (6)

where $\alpha$ determines the contribution of the response generation and persona reconstruction losses to the learning process.
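The following is a minimal PyTorch sketch of how the multi-task loss in Equation (6) could be computed. The seq2seq `model` interface, tensor shapes, and argument names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def multi_task_loss(model, context, response, persona_seq, alpha=0.8):
    # L_res: token-level cross-entropy for generating the response
    # from the dialogue context (Equation 2), with teacher forcing.
    res_logits = model(src=context, tgt=response[:, :-1])  # assumed interface
    l_res = F.cross_entropy(
        res_logits.reshape(-1, res_logits.size(-1)),
        response[:, 1:].reshape(-1),
    )
    # L_rec: cross-entropy for reconstructing the concatenated persona
    # statements P from the same dialogue context (Equation 5).
    rec_logits = model(src=context, tgt=persona_seq[:, :-1])
    l_rec = F.cross_entropy(
        rec_logits.reshape(-1, rec_logits.size(-1)),
        persona_seq[:, 1:].reshape(-1),
    )
    # Weighted sum (Equation 6): alpha trades response generation
    # off against persona reconstruction.
    return alpha * l_res + (1 - alpha) * l_rec
```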
Figure 1: Diagrams depicting the computation path for MTML and AMTML (figure omitted in this transcription; a batch size of 1 is assumed). (a) The MTML framework: the weighted sum of the persona reconstruction loss and the response generation loss is used to update the parameters $\phi$. (b) The AMTML framework: the computation flow alternates between the solid arrows and the dotted arrows after every iteration.

Table 1: Dialogue responses generated by the implemented models.

Persona:
  i have three guns and love hunting
  my family lives down the street from me
  i go to church every sunday
  i drive a ford pickup truck
  i am very conservative

Dialogue Context:
  User1: hey there how are you i am shy
  User2: well i am conservative so sounds like a match to me
  User1: i like that , did you go to college ?
  User2: no i just got my high school diploma and opened my owns shooting range
  User1: i went an got a computer science degree
  User2: wish i had continued school but i love what i do hunting , going to church and driving
  User1: gotcha , i build spaceships , the models
  User2: o ok my family lives down from me and they build garages and farms
  User1: that is cool , my mom is a doctor

Responses:
  Ref: you ever eat at wendys ? i went to washington school before i quit .
  10-shot
    Std:      that is cool where are you from ?
    Std_p:    that is cool what do you do for work ?
    PAML:     he is good to church every week
    MTML_0.8: sure , i pick him up for church every sunday with my ford pickup
    AMTML:    my parents live in the house i grew up in , just down the street .
  5-shot
    Std:      that is cool what kind of doctor do you do ?
    Std_p:    that is cool what kind of garages do you drive ?
    PAML:     that is good where do you live ?
    MTML_0.8: that is good what do you do for work ?
    AMTML:    oh okay lol what do you do for work ?
2.2 Multi-Task Meta-Learning Framework

Similar to PAML, MTML aims to learn a set of parameters $\phi$ capable of quickly adapting to an unseen persona in order to generate contextually appropriate responses which reflect the persona, without relying on user persona statements. However, unlike PAML, which does not utilize the persona statements during either meta-learning or inference, we leverage the persona information during meta-learning only; the persona statements are not used during inference. The persona information is incorporated during meta-learning via multi-task learning (Section 2.1).

We begin by dividing the corpus into a training set, validation set and test set, denoted by $D_{train}$, $D_{valid}$ and $D_{test}$ respectively. For each iteration $i$, a batch of $M$ distinct personas $P^{1:M}_{1:N}$ and a corresponding set of $M$ dialogues are randomly sampled from the training set $D_{train}$ to form the meta-training set $T^i$ (support set). Then, another set of $M$ dialogues corresponding to the same batch of personas $P^{1:M}_{1:N}$ is randomly sampled from $D_{train}$ to form the meta-optimization set $O^i$ (query set). We refer to the meta-training set and meta-optimization set collectively as $C^i$, i.e., $C^i = (T^i, O^i)$.

During meta-training, the multi-task loss $L^{T^i}(\phi)$ is computed as a weighted sum of the response generation loss $L^{T^i}_{res}$ and the persona reconstruction loss $L^{T^i}_{rec}$. To calculate $L^{T^i}_{rec}$, the persona statements $P^{1:M}_{1:N}$ of each persona are concatenated into a single sequence, resulting in $P^{1:M}$. For an arbitrary persona in the meta-training set, the equations for computing the multi-task loss are as follows:

$L^{T^i}_{res}(\phi) = -\sum_t \log p(x_t \mid x_{1:t-1}; \phi)$  (7)

$P^{1:M} = \mathrm{concat}(P^{1:M}_{1:N})$  (8)

$L^{T^i}_{rec}(\phi) = -\sum_t \log p(P \mid x_{1:t-1}; \phi)$  (9)

$L^{T^i}(\phi) = \alpha L^{T^i}_{res}(\phi) + (1 - \alpha) L^{T^i}_{rec}(\phi)$  (10)

where $\alpha$ accounts for the distribution between the two losses. Subsequently, the parameters $\phi$ are updated via SGD. The updated model parameters $\phi'$ can be expressed as:

$\phi' = \phi - \eta_t \nabla_\phi L^{T^i}(\phi)$  (11)

where $\eta_t$ refers to the inner-loop learning rate and $L^{T^i}(\phi)$ represents the multi-task training loss attained.

During meta-optimization, the original model $\phi$ is optimized on the multi-task loss attained on the meta-optimization set $O^i$ with the updated parameters $\phi'$. This meta-objective can be expressed as:

$\min_\phi \sum_{C^i \sim D_{train}} L^{O^i}(\phi') = \sum_{C^i \sim D_{train}} L^{O^i}(\phi - \eta_t \nabla_\phi L^{T^i}(\phi))$  (12)

where $L^{O^i}$ refers to the multi-task loss attained on the sampled meta-optimization set $O^i$. To obtain $L^{O^i}(\phi')$, we first compute $L^{O^i}_{res}(\phi')$ and $L^{O^i}_{rec}(\phi')$ by applying Equations 7-9 on $O^i$. Then, we compute the weighted sum of both losses:

$L^{O^i}(\phi') = \alpha L^{O^i}_{res}(\phi') + (1 - \alpha) L^{O^i}_{rec}(\phi')$  (13)

where $\alpha$ determines the respective contributions of the response generation and persona reconstruction losses. When $\alpha = 1$, MTML is analogous to MAML/PAML. Next, we sum the $L^{O^i}(\phi')$ attained for every sampled persona $P^{1:M}_{1:N}$. The original parameters $\phi$ are then updated using the average loss, obtained by dividing the summed loss by the batch size $M$:

$\phi = \phi - \eta_o \nabla_\phi \frac{1}{M} \sum_{O^i} L^{O^i}(\phi')$  (14)

where $\eta_o$ refers to the outer-loop learning rate and $L^{O^i}(\phi')$ represents the multi-task loss attained by the updated parameters $\phi'$ on the sampled meta-optimization set $O^i$. This computation involves obtaining the gradient of a gradient, i.e., second-order differentiation. A summary of the MTML framework is provided in Algorithm 1, and an overview in Figure 1(a).
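As a rough illustration of one MTML meta-iteration (Equations 7-14), the sketch below uses a first-order approximation: it stops gradients at $\phi'$ instead of back-propagating through the inner update, whereas the paper computes full second-order derivatives. The task sampling and `multi_task_loss` (sketched earlier) are assumed placeholders.

```python
import copy
import torch

def mtml_step(model, meta_opt, tasks, alpha=0.8, eta_t=0.005):
    """One first-order meta-iteration over a batch of M personas."""
    meta_opt.zero_grad()
    M = len(tasks)
    for support, query in tasks:  # T^i (support) and O^i (query) per persona
        # Inner loop: adapt a copy of phi on the support set via SGD
        # using the multi-task loss (Equations 10-11).
        learner = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(learner.parameters(), lr=eta_t)
        multi_task_loss(learner, *support, alpha=alpha).backward()
        inner_opt.step()        # phi' = phi - eta_t * grad (Equation 11)
        inner_opt.zero_grad()
        # Outer loop: evaluate phi' on the query set (Equation 13) and
        # average the resulting gradients over the M personas (Equation 14).
        (multi_task_loss(learner, *query, alpha=alpha) / M).backward()
        # First-order shortcut: treat the query gradient at phi' as the
        # meta-gradient at phi (the paper differentiates through Eq. 11).
        for p, lp in zip(model.parameters(), learner.parameters()):
            p.grad = lp.grad.clone() if p.grad is None else p.grad + lp.grad
    meta_opt.step()  # Adam with lr eta_o in the paper's setup (Equation 14)
```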
2.3 Alternating Multi-Task Meta-Learning Framework

Instead of combining the response generation loss and the persona reconstruction loss, Alternating MTML (AMTML) constantly alternates between the two loss functions: at every iteration, meta-training and meta-optimization are conducted with different loss functions. AMTML does not use the $\alpha$ parameter. At every iteration, during meta-training, either the response generation loss $L_{res}$ or the persona reconstruction loss $L_{rec}$ is used to update the parameters in the inner loop. The alternate loss is then used to compute the loss for meta-optimization in the outer loop. For example, if the response generation loss is used to update the parameters during meta-training, i.e., $L^{T^i}_{res}$, the persona reconstruction loss is used to derive the meta-optimization loss, i.e., $L^{O^i}_{rec}$.

In our implementation, this is achieved by utilizing the response generation loss for meta-training when the iteration count is even, and the persona reconstruction loss for meta-training when the iteration count is odd. Due to the alternating loss functions, the computational complexity and memory requirements of AMTML are lower than those of MTML, which requires computing both loss functions during both meta-training and meta-optimization. A summary of the AMTML framework is provided in Algorithm 2, and an overview in Figure 1(b). The training set $D_{train}$ is used to train the model, and the validation set $D_{valid}$ is used to facilitate early stopping.
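Under the same assumptions as the MTML sketch above, the alternation could look like the following, mirroring Algorithm 2 below. Here `response_loss` and `reconstruction_loss` are hypothetical single-task counterparts of `multi_task_loss` (Equations 2 and 5), each consuming the fields it needs from a batch.

```python
def amtml_step(model, meta_opt, tasks, iteration, eta_t=0.005):
    # Choose the inner/outer losses from the iteration parity
    # (Algorithm 2): the roles of the two losses swap every iteration.
    if iteration % 2 == 0:
        inner_fn, outer_fn = response_loss, reconstruction_loss
    else:
        inner_fn, outer_fn = reconstruction_loss, response_loss
    meta_opt.zero_grad()
    for support, query in tasks:
        learner = copy.deepcopy(model)
        inner_opt = torch.optim.SGD(learner.parameters(), lr=eta_t)
        inner_fn(learner, support).backward()   # inner-loop update loss
        inner_opt.step()
        inner_opt.zero_grad()
        # The alternate loss supplies the meta-optimization objective.
        (outer_fn(learner, query) / len(tasks)).backward()
        for p, lp in zip(model.parameters(), learner.parameters()):
            p.grad = lp.grad.clone() if p.grad is None else p.grad + lp.grad
    meta_opt.step()
```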
Algorithm 1: MTML
  Require: hyperparameters $\alpha$, $\eta_t$, $\eta_o$; dataset $D_{train}$
  for iteration $i$ = 1 to $n$ do
    Sample $C^i = (T^i, O^i) \sim D_{train}$
    for each persona in $T^i$ do
      Compute $L^{T^i}(\phi) = \alpha L^{T^i}_{res}(\phi) + (1-\alpha) L^{T^i}_{rec}(\phi)$
      Update $\phi' = \phi - \eta_t \nabla_\phi L^{T^i}(\phi)$
      Compute $L^{O^i}(\phi') = \alpha L^{O^i}_{res}(\phi') + (1-\alpha) L^{O^i}_{rec}(\phi')$
    end for
    Update $\phi = \phi - \eta_o \nabla_\phi \frac{1}{M} \sum_{O^i} L^{O^i}(\phi')$
  end for

Algorithm 2: Alternating MTML
  Require: hyperparameters $\eta_t$, $\eta_o$; dataset $D_{train}$
  for iteration $i$ = 1 to $n$ do
    Sample $C^i = (T^i, O^i) \sim D_{train}$
    for each persona in $T^i$ do
      if $i$ is even then
        Compute $L^{T^i}_{res}(\phi)$
        Update $\phi' = \phi - \eta_t \nabla_\phi L^{T^i}_{res}(\phi)$
        Compute $L^{O^i}(\phi') = L^{O^i}_{rec}(\phi')$
      else if $i$ is odd then
        Compute $L^{T^i}_{rec}(\phi)$
        Update $\phi' = \phi - \eta_t \nabla_\phi L^{T^i}_{rec}(\phi)$
        Compute $L^{O^i}(\phi') = L^{O^i}_{res}(\phi')$
      end if
    end for
    Update $\phi = \phi - \eta_o \nabla_\phi \frac{1}{M} \sum_{O^i} L^{O^i}(\phi')$
  end for

3 Experiment and Results

3.1 Corpus

The proposed MTML and AMTML frameworks were evaluated on the PersonaChat dialogue corpus (Zhang et al., 2018). The corpus comprises 1155 distinct personas, each consisting of several persona statements. The PersonaChat corpus was chosen because its persona statements can be used to compute the persona reconstruction loss. In our experiments, the corpus is divided into a training, validation and test set; the validation and test sets each consist of 100 unique personas. We utilize the training and validation sets during the meta-learning stage and the test set during the testing stage.

3.2 Implementation

Following Madotto et al. (2019), we adopt the standard Transformer architecture (Vaswani et al., 2017) consisting of 6 encoder layers, 6 decoder layers and 4 attention heads, along with GloVe embeddings (Pennington et al., 2014). The dimensions of the word embedding and the hidden dimension of the Transformer are fixed at 300. We use SGD ($\eta_t$ = 0.005, $M$ = 16) during meta-training and Adam ($\eta_o$ = 0.003, $M$ = 16) during meta-optimization. For MTML, we define an additional hyperparameter, $\alpha$, which accounts for the distribution between the persona reconstruction loss and the response generation loss.
3.3 Evaluation

Automatic Metrics Similar to Madotto et al. (2019), we compute the BLEU score (Papineni et al., 2002), the perplexity (PPL) and the C score (Madotto et al., 2019) to evaluate the quality of the generated responses. The BLEU score measures the similarity between the generated response and the reference response. PPL refers to the perplexity of the response, i.e., the exponentiated average negative log-likelihood of the response under the model. The C score reflects the amount of persona information present in the generated response by measuring persona consistency with respect to the corresponding persona statements via a BERT-based Natural Language Inference (NLI) model, which is finetuned to indicate whether the generated response entails or contradicts the corresponding persona statements. A high C score implies greater persona consistency.
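A possible realization of the C score, based on our reading of Madotto et al. (2019): an NLI model scores each (response, persona statement) pair as entailment (+1), neutral (0) or contradiction (-1), and the scores are summed. The checkpoint name, its label set, and the premise/hypothesis ordering are placeholders, not the authors' exact model.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NLI_CKPT = "some-finetuned-nli-bert"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(NLI_CKPT)
nli = AutoModelForSequenceClassification.from_pretrained(NLI_CKPT)
LABEL_SCORES = {"contradiction": -1, "neutral": 0, "entailment": 1}

def c_score(response: str, persona_statements: list[str]) -> int:
    total = 0
    for statement in persona_statements:
        # Treat the persona statement as premise and the response as
        # hypothesis (ordering is an assumption in this sketch).
        inputs = tokenizer(statement, response,
                           return_tensors="pt", truncation=True)
        with torch.no_grad():
            pred = nli(**inputs).logits.argmax(dim=-1).item()
        total += LABEL_SCORES.get(nli.config.id2label[pred].lower(), 0)
    return total
```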
Human Evaluation We engaged three graduates to evaluate 50 responses for each model using three criteria: persona consistency, fluency and contextual coherence. Consistency reflects how much of the persona information in the corresponding persona statements is reflected in the generated response. Fluency accounts for any grammatical, spelling and phrasing issues, while Coherence reflects the appropriateness of the response with respect to the dialogue history. Responses were assigned a rating from -1 to 1 for each criterion. For consistency, -1 indicates contradiction, 0 indicates neutrality and 1 indicates persona consistency (i.e., the response accurately reflects information from the corresponding persona statements). For coherence, the evaluators were told to assign -1 to incoherent, contextually illogical responses, 0 to moderately coherent responses and 1 to coherent, contextual responses. Finally, for fluency, -1 is assigned to responses with several fluency issues, 0 to responses with one or two fluency errors and 1 to perfectly fluent responses.

3.4 Experimental Settings

We benchmark MTML/AMTML on the PersonaChat corpus. We train the following Transformer models:

Std: a standard Transformer pretrained with only the dialogue context as input.
Std_p: a standard Transformer pretrained with both the dialogue context and persona statements as inputs.
PAML: a standard Transformer pretrained via PAML (Madotto et al., 2019).
MTML_α: a standard Transformer pretrained via MTML (Section 2.2), where α = [0.9, 0.8, 0.7, 0.6, 0.5].
AMTML: a standard Transformer pretrained via AMTML (Section 2.3).

During testing, the pretrained models described above were further trained, or finetuned, on dialogues corresponding to personas from the test set $D_{test}$. As highlighted earlier, the models were finetuned using only the dialogue context, which was constructed by concatenating all previous utterances in the dialogue; the length of the dialogue context in each dialogue example therefore varies with the number of turns, and no restriction was placed on the number of dialogue turns. No persona statements were used during finetuning. In the 5-shot and 10-shot settings, 5 and 10 dialogues corresponding to a persona were used to finetune the model parameters respectively. The model was then tasked with generating responses for the same persona; a sketch of this protocol is given below. Samples of the dialogue responses generated by each model are provided in Table 1. Tables 2 and 3 depict the results of automatic evaluation in the 10-shot and 5-shot settings respectively, while Tables 4 and 5 depict the results of human evaluation in the 10-shot and 5-shot settings respectively.
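A sketch of this k-shot evaluation protocol, under the same assumptions as the earlier sketches; `response_loss`, `generate`, and the data layout are hypothetical placeholders.

```python
import copy
import torch

def k_shot_evaluate(model, test_personas, k=10, lr=0.005, steps=1):
    """Finetune the meta-learned parameters on k dialogues per unseen
    persona (context only, no persona statements), then generate."""
    outputs = {}
    for persona_id, dialogues in test_personas.items():
        finetuned = copy.deepcopy(model)  # start from meta-learned phi
        opt = torch.optim.SGD(finetuned.parameters(), lr=lr)
        for _ in range(steps):
            for context, response in dialogues[:k]:  # k = 5 or 10 dialogues
                opt.zero_grad()
                response_loss(finetuned, (context, response)).backward()
                opt.step()
        # Generate responses for held-out dialogues of the same persona.
        outputs[persona_id] = [generate(finetuned, ctx)
                               for ctx, _ in dialogues[k:]]
    return outputs
```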
3.5 Results & Discussion

MTML_α generally achieves higher C scores and Consistency scores than PAML, indicating responses which incorporate a larger amount of persona information. This confirms our hypothesis that introducing the persona reconstruction task during meta-learning results in a model which induces the persona from the dialogue context to a larger extent. However, PPL increases as α decreases. Since PPL scores have been found to correlate negatively with human likeness (Adiwardana et al., 2020), a high PPL score is undesirable. This finding is supported by the human evaluation results, where the Fluency and Coherence scores drop as α decreases in both the 5-shot and 10-shot settings. This implies a trade-off between general fluency and persona consistency in the generated response. During meta-learning, if the reconstruction weight (1 - α) is too large, the combined loss can be effectively reduced by minimizing the persona reconstruction loss alone. The model would then be trained to generate responses which contain as much persona information as possible without considering fluency or context. While this would result in persona-consistent responses, the responses would be largely incoherent and unnatural. Since α = 0.8 strikes a balance between the PPL and C scores, we conclude that the optimal value of α is 0.8.

Both MTML_0.8 and AMTML improved the persona consistency of the generated responses. In terms of C score, MTML_0.8 demonstrated a 78.9% (10-shot) and 100% (5-shot) improvement over PAML. In terms of persona consistency, MTML_0.8 demonstrated a 64.0% (10-shot) and 15.4% (5-shot) improvement over PAML.
Similarly, compared to PAML, AMTML achieved a 52.6% (10-shot) and 73.3% (5-shot) improvement in C score. In terms of Consistency, AMTML demonstrated a 68.0% (10-shot) and 76.9% (5-shot) improvement over PAML. Compared to MTML_0.8, in both the 10-shot and 5-shot settings, AMTML achieved similar Consistency scores and slightly lower C scores. However, while MTML_0.8 is only comparable to PAML with regard to Fluency and PPL, AMTML outperformed PAML, MTML_0.8 and all other MTML variants in terms of Fluency. When it comes to Coherence, PAML, MTML_0.8 and AMTML generally achieved similar results.

Based on the results attained, responses generated via MTML have a slight edge in terms of persona consistency, while responses generated via AMTML are more fluent. It should also be highlighted that the BLEU score did not correlate with any aspect of human evaluation, further emphasizing the unsuitability of the BLEU score as an evaluation metric for dialogue generation (Liu et al., 2016).

Table 2: Automatic evaluation results (10-shot).
  Method     PPL    BLEU   C-score
  Std        35.87  0.93    0.00
  Std_p      38.72  1.66    0.10
  PAML       41.80  0.71    0.19
  MTML_0.5   77.32  0.53    0.46
  MTML_0.6   57.10  0.53    0.41
  MTML_0.7   52.44  0.57    0.47
  MTML_0.8   43.28  0.42    0.34
  MTML_0.9   40.39  0.71    0.21
  AMTML      48.66  0.48    0.29

Table 3: Automatic evaluation results (5-shot).
  Method     PPL    BLEU   C-score
  Std        36.75  1.02   -0.02
  Std_p      38.78  1.79    0.09
  PAML       40.46  0.65    0.15
  MTML_0.5   76.38  0.41    0.50
  MTML_0.6   55.19  0.53    0.48
  MTML_0.7   50.69  0.44    0.45
  MTML_0.8   41.42  0.38    0.30
  MTML_0.9   39.94  0.62    0.13
  AMTML      44.90  0.42    0.26

Table 4: Human evaluation results (10-shot).
  Method     Consistency  Fluency  Coherence
  Std        0.16         0.92     0.24
  Std_p      0.19         0.89     0.29
  PAML       0.25         0.77     0.28
  MTML_0.5   0.47         0.24     0.20
  MTML_0.6   0.45         0.57     0.27
  MTML_0.7   0.45         0.60     0.30
  MTML_0.8   0.41         0.80     0.32
  MTML_0.9   0.39         0.88     0.17
  AMTML      0.42         0.89     0.33

Table 5: Human evaluation results (5-shot).
  Method     Consistency  Fluency  Coherence
  Std        0.10         0.87     0.20
  Std_p      0.09         0.89     0.21
  PAML       0.13         0.84     0.27
  MTML_0.5   0.22         0.03    -0.12
  MTML_0.6   0.24         0.65     0.15
  MTML_0.7   0.29         0.65     0.13
  MTML_0.8   0.22         0.78     0.29
  MTML_0.9   0.15         0.77     0.19
  AMTML      0.23         0.85     0.20

3.5.1 PersonaChat SOTA Comparison

Additionally, we compare our proposed frameworks with the current state-of-the-art framework for PersonaChat: P2Bot (Liu et al., 2020). P2Bot involves finetuning the GPT pretrained language model on the training set via a transmitter-receiver framework, which models the user's perception of the other party's persona in addition to the user's own persona. Hence, unlike MTML and AMTML, P2Bot is provided the persona statements along with the dialogue context during inference and testing.

Table 6: Automatic and human evaluation results attained by P2Bot.
  Automatic:  PPL 18.1, BLEU 0.61, C-score 0.33
  Human:      Consistency 0.39, Fluency 0.91, Coherence 0.43

From Table 6, it can be observed that the C score attained by MTML_0.8 in the 10-shot setting is comparable to that of P2Bot. In terms of the Consistency score, in the 10-shot setting, both MTML_0.8 and AMTML outperformed P2Bot.
This implies that the responses generated by MTML_0.8 and AMTML generally reflect the corresponding persona information to a greater extent than those of P2Bot, despite the persona statements not being provided during testing. However, in terms of Fluency and Coherence, P2Bot still outperforms both MTML_0.8 and AMTML. This could be partially attributed to the use of the GPT pretrained model, which enhances the overall quality of the generated responses.

3.5.2 Persona Reconstruction

In this section, we provide a brief discussion of the persona reconstruction task. Based on the results attained, it is clear that introducing the persona reconstruction task during meta-learning further incentivizes the model to incorporate more persona information in the generated responses. Under the proposed frameworks, it is challenging to evaluate the model solely on the persona reconstruction task. However, based on the observed responses and loss values, persona reconstruction tends to be more successful when a longer dialogue context $x_{1:t-1}$ (a greater number of turns) is provided. This is expected, as a short dialogue context would not contain sufficient persona information for the model to reconstruct.

Persona reconstruction is an interesting and challenging task that could be explored in future work. For this task, the model has to successfully infer the persona from the dialogue context as well as ensure the fluency of the generated description. Finetuning pretrained language models would be a good starting point. Also, to prevent a mix-up between the personas of the two interlocutors, only the dialogue utterances of the corresponding interlocutor should be utilized during training and inference.

4 Related Work

Multi-task Learning Multi-task learning broadly refers to the process of learning more than one task/objective concurrently with a shared model. In addition to personalized dialogue generation, multi-task learning has been applied to task-oriented dialogue subtasks including response generation (Zhu et al., 2019), dialogue state tracking (GM and Sengupta, 2019; Trinh et al., 2018; Rastogi et al., 2018) and dialogue act selection (McLeod et al., 2019), as well as to conditional open-domain dialogue generation (Zeng and Nie, 2021; Ide and Kawahara, 2021).

Meta-learning Meta-learning involves teaching models how to learn efficiently and quickly. There are three broad categories of meta-learning algorithms: optimization-based (Finn et al., 2017; Nichol et al., 2018), metric-based (Snell et al., 2017; Vinyals et al., 2016) and model-based (Mishra et al., 2018; Santoro et al., 2016). Optimization-based approaches, which directly update the model's parameters to allow rapid adaptation to unseen tasks, have been applied to various dialogue tasks, including task-oriented dialogue generation (Mi et al., 2019; Dai et al., 2020; Peng et al., 2020), domain adaptation (Qian and Yu, 2019) and dialogue state tracking (Peng et al., 2020; Huang et al., 2020).

Personalized Dialogue Generation There are numerous forms of personalized dialogue generation. The form covered in this paper leverages both the persona information and the dialogue context. Another form involves conditioning the response on external profile/identity information. Thus far, many architectures (Wu et al., 2020; Song et al., 2019; Wolf et al., 2019; Kottur et al., 2017; Joshi et al., 2017) and training frameworks (Song et al., 2020a; Liu et al., 2020; Zheng et al., 2019; Su et al., 2019b) which utilize encoded persona or personality information and the dialogue context as input have been proposed. For certain dialogue corpora, such as DailyDialog (Li et al., 2017) and PERSONALDIALOG (Zheng et al., 2020), persona descriptions are not provided; instead, a representation of the interlocutor's personality must be inferred from the dialogue history.

5 Conclusion

In this work, we proposed MTML and AMTML, two meta-learning frameworks which adopt our multi-task learning approach involving the addition of a persona reconstruction task. Empirical results demonstrate that both MTML and AMTML effectively increase the amount of persona information reflected in the generated dialogue responses compared to prior work. However, there is still room for improvement in the fluency and contextual coherence of the generated responses. Future work could involve improving these aspects by incorporating pretrained language models into the meta-learning framework.
References

Daniel Adiwardana, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, Romal Thoppilan, Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, and Quoc V. Le. 2020. Towards a human-like open-domain chatbot.

Yinpei Dai, Hangyu Li, Chengguang Tang, Yongbin Li, Jian Sun, and Xiaodan Zhu. 2020. Learning low-resource end-to-end goal-oriented dialog for fast and reliable system deployment. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 609-618, Online.

Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1126-1135, Sydney, Australia. PMLR.

Sushravya GM and Shubhashis Sengupta. 2019. Unsupervised multi-task learning dialogue management. pages 196-202.

Yi Huang, Junlan Feng, Min Hu, Xiaoting Wu, Xiaoyu Du, and Shuo Ma. 2020. Meta-reinforced multi-domain state generator for dialogue systems. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7109-7118, Online.

Tatsuya Ide and Daisuke Kawahara. 2021. Multi-task learning of generation and classification for emotion-aware dialogue response generation. CoRR, abs/2105.11696.

Chaitanya K. Joshi, Fei Mi, and Boi Faltings. 2017. Personalization in goal-oriented dialog. CoRR, abs/1706.07503.

Satwik Kottur, Xiaoyu Wang, and Vitor Carvalho. 2017. Exploring personalized neural conversational models. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 3728-3734.

Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, and Shuzi Niu. 2017. DailyDialog: A manually labelled multi-turn dialogue dataset. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 986-995, Taipei, Taiwan.

Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122-2132, Austin, Texas.

Qian Liu, Yihong Chen, Bei Chen, Jian-Guang Lou, Zixuan Chen, Bin Zhou, and Dongmei Zhang. 2020. You impress me: Dialogue generation via mutual persona perception. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 1417-1427, Online.

Andrea Madotto, Zhaojiang Lin, Chien-Sheng Wu, and Pascale Fung. 2019. Personalizing dialogue agents via meta-learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5454-5459, Florence, Italy.

Sarah McLeod, Ivana Kruijff-Korbayova, and Bernd Kiefer. 2019. Multi-task learning of system dialogue act selection for supervised pretraining of goal-oriented dialogue policies. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 411-417, Stockholm, Sweden.

Fei Mi, Minlie Huang, Jiyong Zhang, and Boi Faltings. 2019. Meta-learning for low-resource natural language generation in task-oriented dialogue systems. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19, pages 3151-3157.

Nikhil Mishra, Mostafa Rohaninejad, Xi Chen, and Pieter Abbeel. 2018. A simple neural attentive meta-learner. In ICLR.

Young Na, Junekyu Park, and Kyung-Ah Sohn. 2021. Is your chatbot perplexing?: Confident personalized conversational agent for consistent chit-chat dialogue. In Proceedings of the 13th International Conference on Agents and Artificial Intelligence - Volume 2: ICAART, pages 1226-1232. INSTICC, SciTePress.

Alex Nichol, Joshua Achiam, and John Schulman. 2018. On first-order meta-learning algorithms. ArXiv, abs/1803.02999.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318, Philadelphia, Pennsylvania, USA.

Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, and Jianfeng Gao. 2020. Few-shot natural language generation for task-oriented dialog. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 172-182, Online.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, Doha, Qatar.

Kun Qian and Zhou Yu. 2019. Domain adaptive dialog generation via meta learning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2639-2649, Florence, Italy.

Abhinav Rastogi, Raghav Gupta, and Dilek Hakkani-Tur. 2018. Multi-task learning for joint language understanding and dialogue state tracking. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 376-384, Melbourne, Australia.

Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. 2016. Meta-learning with memory-augmented neural networks. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 1842-1850, New York, New York, USA. PMLR.

Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, and Ting Liu. 2019. Exploiting persona information for diverse generation of conversational responses.

Haoyu Song, Wei-Nan Zhang, Jingwen Hu, and Ting Liu. 2020a. Generating persona consistent dialogues by exploiting natural language inference. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):8878-8885.

Yiping Song, Zequn Liu, Wei Bi, Rui Yan, and Ming Zhang. 2020b. Learning to customize model structures for few-shot dialogue generation tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5832-5841, Online.

Feng-Guang Su, Aliyah R. Hsu, Yi-Lin Tuan, and Hung-yi Lee. 2019a. Personalized dialogue response generation learned from monologues. In INTERSPEECH.

Feng-Guang Su, Aliyah R. Hsu, Yi-Lin Tuan, and Hung-Yi Lee. 2019b. Personalized dialogue response generation learned from monologues. In Proc. Interspeech 2019, pages 4160-4164.

Anh Duong Trinh, Robert J. Ross, and John D. Kelleher. 2018. A multi-task approach to incremental dialogue state tracking. In Proceedings of the 22nd Workshop on the Semantics and Pragmatics of Dialogue - Full Papers, Aix-en-Provence, France. SEMDIAL.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc.

Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Koray Kavukcuoglu, and Daan Wierstra. 2016. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc.

Thomas Wolf, Victor Sanh, Julien Chaumond, and Clement Delangue. 2019. TransferTransfo: A transfer learning approach for neural network based conversational agents. CoRR, abs/1901.08149.

Bowen Wu, Mengyuan Li, Zongsheng Wang, Yifu Chen, Derek Wong, Qihang Feng, Junhong Huang, and Baoxun Wang. 2020. Guiding variational response generator to exploit persona.

M. Yang, W. Huang, W. Tu, Q. Qu, Y. Shen, and K. Lei. 2021. Multitask learning and reinforcement learning for personalized dialog generation: An empirical study. IEEE Transactions on Neural Networks and Learning Systems, 32(1):49-62.

Yan Zeng and Jian-Yun Nie. 2021. A simple and efficient multi-task learning approach for conditioned dialogue generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4927-4939, Online.

Saizheng Zhang, Emily Dinan, Jack Urbanek, Arthur Szlam, Douwe Kiela, and Jason Weston. 2018. Personalizing dialogue agents: I have a dog, do you have pets too? In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2204-2213, Melbourne, Australia.

Yinhe Zheng, Guanyi Chen, Minlie Huang, Song Liu, and Xuan Zhu. 2019. Personalized dialogue generation with diversified traits. CoRR, abs/1901.09672.

Yinhe Zheng, Rongsheng Zhang, Minlie Huang, and Xiaoxi Mao. 2020. A pre-training based personalized dialogue generation model with persona-sparse data. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05):9693-9700.

Chenguang Zhu, Michael Zeng, and Xuedong Huang. 2019. Multi-task learning for natural language generation in task-oriented dialogue. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 1261-1266, Hong Kong, China.