Generating Personalized Dialogue via Multi-Task Meta-Learning

Lee Jing Yang1, Lee Kong Aik2, Gan Woon Seng3
School of Electrical and Electronic Engineering, Nanyang Technological University1,3
Institute for Infocomm Research, A*STAR2

Abstract

Conventional approaches to personalized dialogue generation typically require a large corpus, as well as predefined persona information. However, in a real-world setting, neither a large corpus of training data nor persona information is readily available. To address these practical limitations, we propose a novel multi-task meta-learning approach which involves training a model to adapt to new personas without relying on a large corpus or on any predefined persona information. Instead, the model is tasked with generating personalized responses based only on the dialogue context. Unlike prior work, our approach leverages the provided persona information only during training via the introduction of an auxiliary persona reconstruction task. In this paper, we introduce 2 frameworks that adopt the proposed multi-task meta-learning approach: the Multi-Task Meta-Learning (MTML) framework, and the Alternating Multi-Task Meta-Learning (AMTML) framework. Experimental results show that utilizing MTML and AMTML results in dialogue responses with greater persona consistency.

1   Introduction

Personalized dialogue generation involves generating dialogue responses which incorporate the personality of the interlocutors, leading to more natural and human-like dialogue. Thus far, approaches to personalized dialogue generation typically require a large number of persona-specific dialogue examples. Certain approaches also require persona information presented in the form of several predefined persona statements (e.g. 'I love dogs', 'I am an engineering student.'). However, in a real-world system, large amounts of persona-specific dialogue are rarely available, and collecting descriptive persona statements from every interlocutor is intractable.

To address these practical issues, Persona Agnostic Meta-Learning (PAML) (Madotto et al., 2019), a framework which aims to train a model capable of rapid adaptation to new unseen personas, was proposed. The PAML framework was based on the popular Model-Agnostic Meta-Learning (MAML) framework (Finn et al., 2017). The recently proposed Customized Model Agnostic Meta-Learning (CMAML) (Song et al., 2020b) framework largely follows the PAML framework, with the exception of an additional network structure optimization component. Both the PAML and CMAML frameworks were benchmarked on the PersonaChat corpus (Zhang et al., 2018), a popular personalized dialogue generation corpus which provides persona statements describing each interlocutor in addition to the persona-specific dialogues. As it is infeasible to collect persona statements from interlocutors in a real-world setting, the PAML framework does not utilize the available persona statements during either meta-learning or inference. However, even though it is impractical to utilize the persona statements during inference, the persona statements can be used during meta-learning to further improve model performance.

Hence, we introduce a novel multi-task meta-learning approach which leverages predefined persona statements only during meta-learning via an additional persona reconstruction task. Essentially, this task involves generating all corresponding persona statements in their entirety given the dialogue context. We hypothesize that the introduction of the persona reconstruction task would result in parameters capable of effectively inducing the persona information from the dialogue context, which would lead to the generation of persona-consistent dialogue. The persona statements are not used during inference. Prior usage of multi-task learning for personalized dialogue generation involved the addition of a persona classification task (Yang et al.,
2021; Su et al., 2019a) and a distraction utterance binary classification task (Na et al., 2021). To our knowledge, this is the first attempt at incorporating persona statement reconstruction.

Our contributions include 2 multi-task meta-learning frameworks which leverage the persona reconstruction task only during training: the Multi-Task Meta-Learning (MTML) framework, and a variant known as the Alternating Multi-Task Meta-Learning (AMTML) framework. While both MTML and AMTML involve the addition of a persona reconstruction task only during meta-learning, MTML involves combining the losses derived from generating the response and reconstructing the persona. AMTML, on the other hand, functions by constantly alternating between both tasks. Experimental results on the PersonaChat corpus reveal that utilizing MTML and AMTML results in responses which reflect the interlocutor's persona to a larger extent compared to prior work.

2     Methodology

Our approach to personalized dialogue generation involves extending the PAML framework (Madotto et al., 2019) by introducing a persona reconstruction task only during the meta-learning stage. The PAML framework is essentially an adaptation of the general MAML framework for personalized dialogue generation. The MAML framework involves learning a parameter initialization capable of generalizing and adapting rapidly, via gradient descent, to new tasks unseen during the training process. Specifically, this involves obtaining updated parameters by adjusting the model parameters using data from several related tasks. The original parameters are then optimized based on the updated parameters and a separate test set by computing the second order derivatives (Hessian matrix). In the PAML framework, each persona is viewed as a unique task. Consequently, the goal of the PAML framework is to obtain a parameter initialization capable of rapidly adapting to new personas. We hypothesize that the introduction of the persona reconstruction task would result in parameters which induce the persona from the dialogue context to a larger extent.

2.1     Multi-task Learning

Unlike traditional approaches that involve utilizing both the dialogue context and persona information as model inputs during training, our framework involves reconstructing the persona statements as well as generating the dialogue response given the dialogue context. The primary task of generating the response and the corresponding loss function $\mathcal{L}_{res}$ can be expressed in the following equations:

    $f_\phi(x_t \mid x_{1:t-1}) = p(x_t \mid x_{1:t-1}; \phi)$    (1)

    $\mathcal{L}_{res}(\phi) = -\sum \log p(x_t \mid x_{1:t-1}; \phi)$    (2)

where $x_t$ and $x_{1:t-1}$ refer to the response and dialogue context respectively, and $\phi$ represents the model parameters. We hypothesize that the persona reconstruction task would result in model parameters capable of inducing the persona information from the dialogue context to a larger extent. This is due to the increased emphasis on persona consistency in the task of persona reconstruction. Also, rather than generating only selected keywords or phrases, the persona reconstruction task involves generating all corresponding persona statements in their entirety given the dialogue context. This is because generating complete sentences also requires the model to account for fluency in addition to persona consistency, and fluency is also a vital aspect of dialogue generation. The persona statements $P_{1:N}$ are concatenated to form a single sequence $P$. The auxiliary persona reconstruction task and the corresponding loss function $\mathcal{L}_{rec}$ are formalized in the following expressions:

    $P = \mathrm{concat}(P_{1:N})$    (3)

    $f_\phi(P \mid x_{1:t-1}) = p(P \mid x_{1:t-1}; \phi)$    (4)

    $\mathcal{L}_{rec}(\phi) = -\sum_{t=1}^{T} \log p(P \mid x_{1:t-1}; \phi)$    (5)

During training, the persona reconstruction loss and the response generation loss are weighted and summed. Hence, the total multi-task loss is expressed as:

    $\mathcal{L}(\phi) = \alpha \mathcal{L}_{res}(\phi) + (1 - \alpha) \mathcal{L}_{rec}(\phi)$    (6)

where $\alpha$ determines the relative contribution of the response generation loss and the persona reconstruction loss to the learning process.

Figure 1: Diagrams depicting the computation path for MTML and AMTML. A batch size of 1 is assumed. (a): Diagram depicting the MTML framework. The weighted sum of the persona reconstruction loss and the response generation loss is used to update the parameters φ. (b): Diagram depicting the AMTML framework. The computation flow alternates between the solid arrows and the dotted arrows after every iteration.
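
The two computation paths in Figure 1 can be sketched on a toy problem. This is a minimal illustration under strong assumptions, not the authors' implementation: the model is a single scalar parameter, the response and reconstruction losses are replaced by hypothetical quadratic stand-ins of the form (phi - target)^2, the outer update is a plain gradient step for a single persona, and all names (`weighted_grad`, `meta_step`, `mtml_weights`, `amtml_weights`) are invented for this example. What it does show is the gradient flowing through the inner update, and the even/odd swap of losses between the inner and outer loops.

```python
def weighted_grad(phi, res_target, rec_target, w_res, w_rec):
    """d/dphi of w_res*(phi - res_target)**2 + w_rec*(phi - rec_target)**2."""
    return 2.0 * w_res * (phi - res_target) + 2.0 * w_rec * (phi - rec_target)


def meta_step(phi, support, query, inner_w, outer_w, eta_t, eta_o):
    """One meta-update for a single toy persona.

    support and query are (res_target, rec_target) pairs standing in for the
    meta training and meta optimization sets; inner_w and outer_w are the
    (w_res, w_rec) loss weights used in the inner and outer loops.
    """
    # Inner loop: phi' = phi - eta_t * grad of the inner loss at phi.
    phi_prime = phi - eta_t * weighted_grad(phi, *support, *inner_w)
    # d(phi')/d(phi): the second derivative of each toy quadratic is 2.
    dphi_prime = 1.0 - 2.0 * eta_t * (inner_w[0] + inner_w[1])
    # Outer loop: chain rule through the inner update (gradient of a gradient).
    outer_grad = weighted_grad(phi_prime, *query, *outer_w) * dphi_prime
    return phi - eta_o * outer_grad


def mtml_weights(alpha):
    """MTML path: the same weighted sum in the inner and outer loops."""
    return (alpha, 1.0 - alpha), (alpha, 1.0 - alpha)


def amtml_weights(iteration):
    """AMTML path: one loss in the inner loop, the other in the outer loop."""
    if iteration % 2 == 0:
        return (1.0, 0.0), (0.0, 1.0)  # inner: response, outer: reconstruction
    return (0.0, 1.0), (1.0, 0.0)      # inner: reconstruction, outer: response


support = (1.0, 2.0)  # hypothetical targets for the two toy losses
query = (1.0, 2.0)
phi_mtml = phi_amtml = 0.0
for i in range(4):
    phi_mtml = meta_step(phi_mtml, support, query, *mtml_weights(0.8),
                         eta_t=0.1, eta_o=0.1)
    phi_amtml = meta_step(phi_amtml, support, query, *amtml_weights(i),
                          eta_t=0.1, eta_o=0.1)
```

In this sketch each AMTML step evaluates only one loss per loop, whereas each MTML step evaluates both losses in both loops.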

                                         Persona
 i have three guns and love hunting
 my family lives down the street from me
 i go to church every sunday
 i drive a ford pickup truck
 i am very conservative
                                         Dialogue Context
User1:        hey there how are you i am shy
User2:        well i am conservative so sounds like a match to me
User1:        i like that , did you go to college ?
User2:        no i just got my high school diploma and opened my owns shooting range
User1:        i went an got a computer science degree
User2:        wish i had continued school but i love what i do hunting , going to church and driving
User1:        gotcha , i build spaceships , the models
User2:        o ok my family lives down from me and they build garages and farms
User1:        that is cool , my mom is a doctor
Ref           you ever eat at wendys ? i went to washington school before i quit .
Std           that is cool where are you from ?
Std_p         that is cool what do you do for work ?
PAML          he is good to church every week
MTML_0.8      sure , i pick him up for church every sunday with my ford pickup
AMTML         my parents live in the house i grew up in , just down the street .
Std           that is cool what kind of doctor do you do ?
Std_p         that is cool what kind of garages do you drive ?
PAML          that is good where do you live ?
MTML_0.8      that is good what do you do for work ?
AMTML         oh okay lol what do you do for work ?

                     Table 1: Dialogue responses generated by the implemented models.

2.2     Multi-Task Meta Learning Framework

Similar to PAML, MTML aims to learn a set of parameters $\phi$ capable of quickly adapting to an unseen persona in order to generate contextually appropriate responses which reflect the persona, without relying on user persona statements. However, unlike PAML, which does not utilize the persona statements during either meta-learning or inference, we leverage the persona information only during meta-learning. The persona statements are not used during inference. The persona information is incorporated during meta-learning via multi-task learning (Section 2.1).

We begin by dividing the corpus into a training set, a validation set and a test set, denoted by $D_{train}$, $D_{valid}$ and $D_{test}$ respectively. For each iteration $i$, a batch of distinct personas $P^{1:M}_{1:N}$ and a corresponding set of $M$ dialogues are randomly sampled from the training set $D_{train}$ to form the meta training set $T^i$ (support set). Then, another set of $M$ dialogues corresponding to the same batch of personas $P^{1:M}_{1:N}$ is randomly sampled from the training set $D_{train}$ to form the meta optimization set $O^i$ (query set). We refer to the meta training set and meta optimization set collectively as $C^i$, i.e., $C^i = (T^i, O^i)$.

During meta-training, the multi-task loss $\mathcal{L}^{T^i}$ is computed via a weighted sum of the response generation loss $\mathcal{L}^{T^i}_{res}$ and the persona reconstruction loss $\mathcal{L}^{T^i}_{rec}$. To calculate $\mathcal{L}^{T^i}_{rec}$, the persona statements $P^{1:M}_{1:N}$ of each persona are concatenated into a single sequence, resulting in $P^{1:M}$. For an arbitrary persona in the meta training set, the equations for computing the multi-task loss are as follows:

    $\mathcal{L}^{T^i}_{res}(\phi) = -\sum \log p(x_t \mid x_{1:t-1}; \phi)$    (7)

    $P^{1:M} = \mathrm{concat}(P^{1:M}_{1:N})$    (8)

    $\mathcal{L}^{T^i}_{rec}(\phi) = -\sum \log p(P \mid x_{1:t-1}; \phi)$    (9)

    $\mathcal{L}^{T^i}(\phi) = \alpha \mathcal{L}^{T^i}_{res}(\phi) + (1 - \alpha) \mathcal{L}^{T^i}_{rec}(\phi)$    (10)

where $\alpha$ accounts for the distribution between the 2 losses. Subsequently, the parameters $\phi$ are updated via SGD. The updated model parameters $\phi'$ can be expressed as:

    $\phi' = \phi - \eta_t \nabla_\phi \mathcal{L}^{T^i}(\phi)$    (11)

where $\eta_t$ refers to the inner loop learning rate and $\mathcal{L}^{T^i}(\phi)$ represents the multi-task training loss attained.

During meta-optimization, the original model $\phi$ is optimized on the multi-task loss attained on the meta optimization set $O^i$ with the updated parameters $\phi'$. This meta-objective function can be expressed as:

    $\min_\phi \sum_{C^i \sim D_{train}} \mathcal{L}^{O^i}(\phi') = \sum_{C^i \sim D_{train}} \mathcal{L}^{O^i}(\phi - \eta_t \nabla_\phi \mathcal{L}^{T^i}(\phi))$    (12)

where $\mathcal{L}^{O^i}$ refers to the multi-task loss attained on the sampled meta optimization set $O^i$. To obtain $\mathcal{L}^{O^i}(\phi')$, we first compute $\mathcal{L}^{O^i}_{res}(\phi')$ and $\mathcal{L}^{O^i}_{rec}(\phi')$ by applying Equations 7 - 9 on $O^i$. Then, we compute the weighted sum of both losses:

    $\mathcal{L}^{O^i}(\phi') = \alpha \mathcal{L}^{O^i}_{res}(\phi') + (1 - \alpha) \mathcal{L}^{O^i}_{rec}(\phi')$    (13)

where $\alpha$ and $1 - \alpha$ weight the response generation loss and the persona reconstruction loss respectively. When $\alpha = 1$, MTML is analogous to MAML/PAML. Next, we sum the $\mathcal{L}^{O^i}(\phi')$ attained for every sampled persona $P^{1:M}_{1:N}$. The original parameters $\phi$ are then updated using the average loss, obtained by dividing the summed loss by the batch size $M$. This can be formalized as:

    $\phi = \phi - \eta_o \nabla_\phi \frac{1}{M} \sum_{O^i} \mathcal{L}^{O^i}(\phi')$    (14)

where $\eta_o$ refers to the outer loop learning rate and $\mathcal{L}^{O^i}(\phi')$ represents the multi-task loss attained by the updated parameters $\phi'$ on the sampled meta optimization set $O^i$. This computation involves obtaining the gradient of a gradient, i.e., second order differentiation. A summary of the MTML framework is provided in Algorithm 1. Additionally, an overview of the MTML framework is provided in Figure 1(a).

2.3   Alternating Multi-Task Meta-Learning Framework

Instead of combining the response generation loss and persona reconstruction loss, Alternating MTML (AMTML) involves constantly alternating between the two loss functions. Essentially, at every iteration, meta-training and meta-optimization are conducted with different loss functions. For AMTML, the $\alpha$ parameter would not be used. At
every iteration, during meta-training, either the response generation loss $\mathcal{L}_{res}$ or the persona reconstruction loss $\mathcal{L}_{rec}$ will be used to update the parameters in the inner loop. Then, the alternate loss would be used to compute the loss for meta-optimization in the outer loop. For example, if the response generation loss is used to update the parameters during meta-training, i.e. $\mathcal{L}^{T^i}_{res}$, the persona reconstruction loss would be used to derive the meta-optimization loss, i.e. $\mathcal{L}^{O^i}_{rec}$.

In our implementation, this is achieved by utilizing the response generation loss during meta-training and the persona reconstruction loss during meta-optimization when the iteration count is even, and swapping the two losses when the iteration count is odd. Due to the alternating loss functions, the computational complexity and memory requirements of AMTML are lower than those of MTML, which requires computing both loss functions during both meta-training and meta-optimization. A summary of the AMTML framework is provided in Algorithm 2. Additionally, an overview of the AMTML framework is provided in Figure 1(b). The training set $D_{train}$ is used to train the model and the validation set $D_{valid}$ is used to facilitate early stopping.

Algorithm 1 MTML
Require: Hyperparameters $\alpha$, $\eta_t$, $\eta_o$
Require: Dataset $D_{train}$
   for iteration i = 1 to n do
      Sample $C^i = (T^i, O^i) \sim D_{train}$
      for each persona in $T^i$ do
         Calc $\mathcal{L}^{T^i}(\phi) = \alpha \mathcal{L}^{T^i}_{res}(\phi) + (1 - \alpha) \mathcal{L}^{T^i}_{rec}(\phi)$
         Update $\phi' = \phi - \eta_t \nabla_\phi \mathcal{L}^{T^i}(\phi)$
         Calc $\mathcal{L}^{O^i}(\phi') = \alpha \mathcal{L}^{O^i}_{res}(\phi') + (1 - \alpha) \mathcal{L}^{O^i}_{rec}(\phi')$
      end for
      Update $\phi = \phi - \eta_o \nabla_\phi \frac{1}{M} \sum_{O^i} \mathcal{L}^{O^i}(\phi')$
   end for

Algorithm 2 Alternating MTML
Require: Hyperparameters $\eta_t$, $\eta_o$
Require: Dataset $D_{train}$
   for iteration i = 1 to n do
      Sample $C^i = (T^i, O^i) \sim D_{train}$
      for each persona in $T^i$ do
         if i is even then
            Calc $\mathcal{L}^{T^i}_{res}(\phi)$
            Update $\phi' = \phi - \eta_t \nabla_\phi \mathcal{L}^{T^i}_{res}(\phi)$
            Calc $\mathcal{L}^{O^i}(\phi') = \mathcal{L}^{O^i}_{rec}(\phi')$
         else if i is odd then
            Calc $\mathcal{L}^{T^i}_{rec}(\phi)$
            Update $\phi' = \phi - \eta_t \nabla_\phi \mathcal{L}^{T^i}_{rec}(\phi)$
            Calc $\mathcal{L}^{O^i}(\phi') = \mathcal{L}^{O^i}_{res}(\phi')$
         end if
      end for
      Update $\phi = \phi - \eta_o \nabla_\phi \frac{1}{M} \sum_{O^i} \mathcal{L}^{O^i}(\phi')$
   end for

3     Experiment and Results

3.1     Corpus

The proposed MTML and AMTML frameworks were evaluated on the PersonaChat dialogue corpus (Zhang et al., 2018). The corpus comprises 1155 distinct personas, each consisting of several persona statements. The PersonaChat dialogue corpus was chosen due to the available persona statements, which are used to compute the persona reconstruction loss. In our experiment, the corpus is divided into a training, validation and test set. The validation and test sets each consist of 100 unique personas. For our experiments, we utilize the training and validation sets during the meta-learning stage and the test set during the testing stage.

3.2     Implementation

Following Madotto et al., we adopt the standard Transformer architecture (Vaswani et al., 2017) consisting of 6 encoder layers, 6 decoder layers and 4 attention heads, along with the GloVe embedding (Pennington et al., 2014). The word embedding dimension and the hidden dimension of the Transformer are fixed at 300. We use SGD ($\eta_t$ = 0.005, M = 16) during meta-training and Adam ($\eta_o$ = 0.003, M = 16) during meta-optimization. For MTML, we define an additional hyperparameter, $\alpha$, which accounts for the distribution between the persona reconstruction loss and the response generation loss.

3.3   Evaluation

Automatic Metrics Similar to Madotto et al., we compute the BLEU score (Papineni et al., 2002), the perplexity (PPL) and the C score (Madotto et al., 2019) to evaluate the quality of the generated response. The BLEU score measures the similarity between the generated response and the reference response. PPL is the exponential of the average per-token negative log-likelihood of the response. The C-score reflects the amount of persona information present in the generated response by measuring the persona consistency with respect to the corresponding persona statements via
a BERT-based Natural Language Inference (NLI) model, which is finetuned to indicate if the generated response entails or contradicts the corresponding persona statements. A high C score would imply greater persona consistency.

Human Evaluation We engaged 3 graduates to evaluate 50 responses for each model using 3 criteria: persona consistency, fluency and contextual coherence. Consistency reflects how much of the persona information in the persona statements is reflected in the generated response. Fluency accounts for any grammatical, spelling and phrasing issues, while Coherence reflects the appropriateness of the response with respect to the dialogue history. Responses were assigned a rating from -1 to 1 for each criterion. For consistency, -1 indicates contradiction, 0 indicates neutrality and 1 indicates persona consistency (i.e. the response accurately reflects information from the corresponding persona statements). For coherence, the individuals were told to assign a score of -1 for incoherent, contextually illogical responses, 0 for moderately coherent responses and 1 for coherent, contextual responses. Finally, for fluency, -1 is assigned to responses with several fluency issues, 0 to responses with one or two fluency errors and 1 to perfectly fluent responses.

3.4     Experimental Settings

We benchmark MTML/AMTML with the PersonaChat corpus. We train the following Transformer models:
   Std: We pretrain a standard Transformer model with only the dialogue context as input.
   Std_p: We pretrain a standard Transformer model with both the dialogue context and persona statements as inputs.
   PAML: We pretrain a standard Transformer via PAML (Madotto et al., 2019).
   MTML_α: We pretrain a standard Transformer via MTML (Section 2.2) where α = [0.9, 0.8, 0.7, 0.6, 0.5].
   AMTML: We pretrain a standard Transformer via AMTML (Section 2.3).

During testing, the pretrained models described above were further trained, or finetuned, on dialogues corresponding to personas from the test set $D_{test}$. As highlighted earlier, the models were finetuned using only the dialogue context, which was constructed by concatenating all previous utterances in the dialogue. For our experiment, the length of the dialogue context in each dialogue example would vary according to the number of turns. No restriction was placed on the number of dialogue turns required. No persona statements were used during finetuning. In the 5-shot and 10-shot settings, 5 and 10 dialogues corresponding to a persona were used to finetune the model parameters respectively. Then, the model is tasked with generating the response corresponding to the same persona. Samples of the dialogue responses generated by each model are provided in Table 1. Tables 2 and 3 depict the results of automatic evaluation in the 10-shot and 5-shot settings respectively, while Tables 4 and 5 depict the results of the human evaluation in the 10-shot and 5-shot settings respectively.

3.5   Results & Discussion

MTML_α generally achieves higher C-scores and Consistency scores compared to PAML, indicating responses which incorporate a larger amount of persona information. This supports our hypothesis that the introduction of the persona reconstruction task during meta-learning would result in a model which induces the persona from the dialogue context to a larger extent. However, it can be seen that PPL increases as α decreases. Since PPL scores have been found to correlate negatively with human likeness (Adiwardana et al., 2020), a high PPL score is undesirable. This finding is supported by the human evaluation results, where the Fluency and Coherence scores generally drop as α decreases in both the 5-shot and 10-shot settings. This implies that there is a trade-off between general fluency and persona consistency in the generated response. During meta-learning, if α is too small, the combined loss can be effectively reduced by minimizing the persona reconstruction loss, which is weighted by 1 − α. Hence, the model would be trained to generate responses which contain as much persona information as possible without considering fluency or context. While this would result in persona-consistent responses, the responses would be largely incoherent and unnatural. Since α = 0.8 strikes a balance between the PPL and C-scores, we conclude that the optimal value of α is 0.8.

Both MTML_0.8 and AMTML improved the persona consistency of the generated responses. In terms of C-score, MTML_0.8 demonstrated a 78.9% (10-shot) and 100% (5-shot) improvement over PAML. In terms of persona consistency, MTML_0.8 demonstrated a 64.0% (10-shot) and
15.4% (5-shot) improvement over PAML. Similarly, compared to PAML, AMTML achieved a 52.6% (10-shot) and 73.3% (5-shot) improvement when it comes to C-score. In terms of Consistency, AMTML demonstrated a 68.0% (10-shot) and 76.9% (5-shot) improvement over PAML. Compared to MTML_0.8, in both the 10-shot and 5-shot settings, AMTML achieved similar Consistency scores and slightly lower C-scores. However, while MTML_0.8 is comparable to PAML with regard to Fluency and PPL, AMTML outperformed PAML, MTML_0.8 and all other MTML variants in terms of Fluency. When it comes to coherence, PAML, MTML_0.8 and AMTML generally achieved similar results.

Based on the results attained, while responses generated via MTML have a slight edge in terms of persona consistency, responses generated via AMTML are more fluent. On a side note, it should also be highlighted that the BLEU score did not correlate with any aspect of human evaluation. This further emphasizes the unsuitability of the BLEU score as an evaluation metric for dialogue generation (Liu et al., 2016).

   Method        PPL     BLEU    C-score
   Std          35.87    0.93     0.00
   Std_p        38.72    1.66     0.10
   PAML         41.80    0.71     0.19
   MTML_0.5     77.32    0.53     0.46
   MTML_0.6     57.10    0.53     0.41
   MTML_0.7     52.44    0.57     0.47
   MTML_0.8     43.28    0.42     0.34
   MTML_0.9     40.39    0.71     0.21
   AMTML        48.66    0.48     0.29

 Table 2: Automatic evaluation results (10-shot).

   Method        PPL     BLEU    C-score
   Std          36.75    1.02    -0.02
   Std_p        38.78    1.79     0.09
   PAML         40.46    0.65     0.15
   MTML_0.5     76.38    0.41     0.50
   MTML_0.6     55.19    0.53     0.48
   MTML_0.7     50.69    0.44     0.45
   MTML_0.8     41.42    0.38     0.30
   MTML_0.9     39.94    0.62     0.13
   AMTML        44.90    0.42     0.26

 Table 3: Automatic evaluation results (5-shot).

   Method       Consistency    Fluency    Coherence
   Std             0.16          0.92       0.24
   Std_p           0.19          0.89       0.29
   PAML            0.25          0.77       0.28
   MTML_0.5        0.47          0.24       0.20
   MTML_0.6        0.45          0.57       0.27
   MTML_0.7        0.45          0.60       0.30
   MTML_0.8        0.41          0.80       0.32
   MTML_0.9        0.39          0.88       0.17
   AMTML           0.42          0.89       0.33

 Table 4: Human evaluation results (10-shot).

   Automatic       PPL           BLEU       C-score
   P2Bot           18.1          0.61       0.33
   Human           Consistency   Fluency    Coherence
   P2Bot           0.39          0.91       0.43

 Table 6: Automatic and human evaluation results attained by P2Bot.

3.5.1 PersonaChat SOTA Comparison

Additionally, we compare our proposed frameworks with the current state-of-the-art framework for PersonaChat: P2Bot (Song et al., 2020a). P2Bot involves finetuning the GPT pretrained language model on the training set via a transmitter
                                                   receiver framework. This framework models the
              Consistency Fluency Coherence user’s perception of the other party’s persona in
   Std           0.10          0.87           0.20 addition to the user’s own persona. Hence, unlike
  Stdp           0.09          0.89           0.21 MTML and AMTML, in the case of P 2 Bot, per-
 P AM L          0.13          0.84           0.27 sona statements are provided to the model along
M T M L0.5       0.22          0.03          -0.12 with the dialogue context during inference and test-
M T M L0.6       0.24          0.65           0.15 ing.
M T M L0.7       0.29          0.65           0.13    From Table 6, it can be observed that the C-
M T M L0.8       0.22          0.78           0.29 score attained by M T M L0.8 , in the 10-shot set-
M T M L0.9       0.15          0.77           0.19 ting, was comparable to P 2 Bot. When it comes
AM T M L         0.23          0.85           0.20 to the Consistency score, in the 10-shot setting,
   Table 5: Human evaluation results (5-shot).     both M T M L0.8 and AM T M L outperformed
                                                   P 2 Bot. This implies that the responses generated

by MTML_0.8 and AMTML generally reflect the corresponding persona information
to a greater extent compared to P^2 Bot, despite not being provided the persona
statements during testing. However, in terms of Fluency and Coherence, P^2 Bot
still outperforms both MTML_0.8 and AMTML. This could be partially attributed
to the use of the GPT pretrained model, which enhanced the overall quality of
the generated responses.

3.5.2 Persona Reconstruction
In this section, we provide a brief discussion regarding the persona
reconstruction task. Based on the results attained, the introduction of the
persona reconstruction task during meta-learning further incentivizes the model
to incorporate more persona information in the generated responses. Under the
proposed frameworks, it is challenging to evaluate the performance of the model
solely on the persona reconstruction task. However, based on the observed
responses and loss values, persona reconstruction tends to be more successful
when a longer dialogue context x_{1:t-1} (greater number of turns) is provided.
This is expected, as a short dialogue context would not contain sufficient
persona information.

   Persona reconstruction is an interesting and challenging task that could be
explored in future work. For this task, the model has to successfully infer the
persona from the dialogue context as well as ensure the fluency of the
generated description. Finetuning pretrained language models would be a good
starting point for future work. Also, to prevent a mix-up between the personas
of each interlocutor, only the dialogue utterances of the corresponding
interlocutor should be utilized during training.

4   Related Work
Multi-task Learning. Multi-task learning broadly refers to the process of
learning more than one task or objective concurrently with a shared model. In
addition to personalized dialogue generation, multi-task learning has been
applied to task-oriented dialogue subtasks including response generation (Zhu
et al., 2019), dialogue state tracking (GM and Sengupta, 2019; Trinh et al.,
2018; Rastogi et al., 2018), dialogue act selection (McLeod et al., 2019), as
well as conditional open-domain dialogue generation (Zeng and Nie, 2021; Ide
and Kawahara, 2021).

Meta-learning. Meta-learning involves teaching models how to learn efficiently
and quickly. There are three broad categories of meta-learning algorithms:
optimization-based (Finn et al., 2017; Nichol et al., 2018), metric-based
(Snell et al., 2017; Vinyals et al., 2016), and model-based (Mishra et al.,
2018; Santoro et al., 2016). Optimization-based meta-learning approaches, which
involve directly updating the model's parameters to allow for rapid adaptation
to unseen tasks, have been applied to various dialogue tasks. Examples of such
applications include task-oriented dialogue generation (Mi et al., 2019; Dai et
al., 2020; Peng et al., 2020), domain adaptation (Qian and Yu, 2019) and
dialogue state tracking (Peng et al., 2020; Huang et al., 2020).

Personalized Dialogue Generation. There are numerous forms of personalized
dialogue generation. The form covered in this paper requires leveraging both
the persona information and dialogue context. Another form of personalized
dialogue generation involves conditioning the response on external profile or
identity information. Thus far, many different architectures (Wu et al., 2020;
Song et al., 2019; Wolf et al., 2019; Kottur et al., 2017; Joshi et al., 2017)
and training frameworks (Song et al., 2020a; Liu et al., 2020; Zheng et al.,
2019; Su et al., 2019b) which utilize the encoded persona or personality
information and dialogue context as input have been proposed. For certain
dialogue corpora such as DailyDialog (Li et al., 2017) and PERSONALDIALOG
(Zheng et al., 2020), the persona descriptions are not provided. Instead, a
representation of the interlocutor's personality must be inferred from the
dialogue history.

5   Conclusion
In this work, we proposed MTML and AMTML, two meta-learning frameworks which
adopt our multi-task learning approach involving the addition of a persona
reconstruction task. Empirical results demonstrate that both MTML and AMTML
effectively increase the amount of persona information reflected in the
generated dialogue responses compared to prior work. However, there is still
room for improvement when it comes to the fluency and contextual coherence of
the generated responses. Future work could involve improving these aspects of
the responses by incorporating pretrained language models into the
meta-learning framework.
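
The relative improvements over PAML quoted in the evaluation discussion can be
re-derived from the C-score and Consistency values reported in Tables 2 to 5.
The sketch below is illustrative only: it assumes the percentages were computed
as (score - baseline) / baseline, which the paper does not state explicitly,
and the dictionary layout and function names are hypothetical.

```python
# Scores copied from Tables 2-5. The relative-improvement formula below is an
# assumption; the paper does not spell out how its percentages were computed.
C_SCORE = {
    "PAML":     {"10-shot": 0.19, "5-shot": 0.15},
    "MTML_0.8": {"10-shot": 0.34, "5-shot": 0.30},
    "AMTML":    {"10-shot": 0.29, "5-shot": 0.26},
}
CONSISTENCY = {
    "PAML":     {"10-shot": 0.25, "5-shot": 0.13},
    "MTML_0.8": {"10-shot": 0.41, "5-shot": 0.22},
    "MTML_0.9": {"10-shot": 0.39, "5-shot": 0.15},
    "AMTML":    {"10-shot": 0.42, "5-shot": 0.23},
}


def relative_improvement(score: float, baseline: float) -> float:
    """Percentage gain of `score` over `baseline`, rounded to one decimal."""
    return round((score - baseline) / baseline * 100, 1)


def improvement_over_paml(table: dict, method: str, setting: str) -> float:
    """Relative improvement of `method` over the PAML row of the same table."""
    return relative_improvement(table[method][setting], table["PAML"][setting])


if __name__ == "__main__":
    for name, table in (("C-score", C_SCORE), ("Consistency", CONSISTENCY)):
        for method in table:
            if method == "PAML":
                continue
            for setting in ("10-shot", "5-shot"):
                gain = improvement_over_paml(table, method, setting)
                print(f"{name:12s} {method:9s} {setting:8s} {gain:6.1f}%")
```

Under this assumption, the tabulated values reproduce the AMTML figures (52.6%
and 73.3% for C-score, 68.0% and 76.9% for Consistency) and the 64.0% 10-shot
Consistency figure for MTML_0.8. For the 5-shot Consistency scores, the
MTML_0.8 row yields 69.2%, while 15.4% corresponds to the MTML_0.9 row.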

