CONTROL PREFIXES for Text Generation

Jordan Clive (Imperial College London), Kris Cao (DeepMind, London, UK), Marek Rei (Imperial College London)

{jordan.clive19,marek.rei}@imperial.ac.uk, kriscao@deepmind.com

arXiv:2110.08329v1 [cs.CL] 15 Oct 2021

Abstract

Prompt learning methods adapt pre-trained language models to downstream applications by using a task-specific prompt together with the input. Most of the current work on prompt learning in text generation relies on a shared dataset-level prompt for all examples in the dataset. We extend this approach and propose a dynamic method, CONTROL PREFIXES, which allows for the inclusion of conditional input-dependent information in each prompt. CONTROL PREFIXES is at the intersection of prompt learning and controlled generation, empowering the model to have finer-grained control during text generation. The method incorporates attribute-level learnable representations into different layers of a pre-trained transformer, allowing for the generated text to be guided in a particular direction. We provide a systematic evaluation of the technique and apply it to five datasets from the GEM benchmark for natural language generation (NLG). We present state-of-the-art results on several data-to-text datasets, including WebNLG.

1   Introduction

Recently, approaches in text generation have been dominated by adapting one large-scale, pre-trained language model (PLM) to various downstream tasks. Such adaptation is often performed via fine-tuning, which necessitates updating and storing all of the parameters, resulting in multiple new language models (LMs), one for each task. This poses a considerable deployment challenge as the scale of PLMs continues to climb from millions to billions of parameters. Moreover, full fine-tuning has been shown to be unnecessarily profligate through overwriting natural language understanding (NLU) that could otherwise be shared among tasks (Peters et al., 2019); it has also been shown that fine-tuned networks do not deviate substantially from the pre-trained one in parameter space (Aghajanyan et al., 2020; Radiya-Dixit and Wang, 2020), implying the existence of parameter efficient alternatives.

Many researchers have sought to alleviate these issues by using fixed-LM techniques, where all the parameters of the base LM always remain unchanged. An ever-growing subsection of these methods can be classed under prompt learning, where language models are adapted to downstream tasks with the aid of a prompt accompanying the input. A recent survey on prompt learning (Liu et al., 2021a), however, notes the dearth of research exploring dynamic prompts, which are input-dependent. This work considers dynamic prompts, and is inspired by how traditional controlled generation methods utilize controllable attributes to generate target sentences with desired qualities. Existing controlled generation techniques either aim to generate text with specific target qualities, independent of overall task performance, or are methods that have the benefit of updating not only the attribute-level parameters but, at the same time, all the LM parameters.

We propose the dynamic prompting method CONTROL PREFIXES, which extends prefix-tuning. The prefix-tuning method integrates static task-specific prompts at every layer of a model, adding only 0.1–2% additional parameters to the base LM. With CONTROL PREFIXES we aim to preserve the fixed-LM property, while also allowing datapoint-specific attributes to act as guidance signals at the input level. This is done by employing modular control prefixes, which change alongside the input according to the guidance signal. Operating together with the static prompt parameters, these dynamic prompts can steer the frozen PLM to extend finer-grained control. The chosen attributes can provide additional information about the input, for example the domain of a data-to-text tripleset, or they can specify some aspect of the desired output, such as the target length for text simplification.

We evaluate our method on an array of text generation tasks, leveraging additional input-level information specific to each dataset. Our results show that our fixed-LM architecture outperforms previous approaches, usually based on fine-tuning, on the WebNLG (Gardent et al., 2017), DART (Radev et al., 2020) and E2E Clean (Dušek et al., 2019) data-to-text datasets. In addition, our method attains higher human-assessed performance than existing systems for summarization. This work establishes that the parameters learnt, corresponding to similar labels of a single attribute, share properties. We also demonstrate that zero-shot learning with CONTROL PREFIXES can be effective for conditioning on input-level information previously unseen during training.

2   Related Work

Prompt Learning  Prompt learning (Liu et al., 2021a; Sun et al., 2021; Schick and Schütze, 2021) is a nascent field, instigated by the arrival of GPT-3 (Brown et al., 2020), involving task-specific adaptation of large LMs via prepending an instruction. Several successive works (Logeswaran et al., 2020; Liu et al., 2021b; Lester et al., 2021) employ prompt-embedding tuning, which trains continuous embeddings prepended to the input embeddings. Li and Liang (2021) discovered that prefix-tuning was more effective than prompt-embedding tuning for text generation. In prefix-tuning, additional trainable key-value pairs, which are fixed across all examples, are used to augment the left context in every attention computation. Therefore, the prompt has constituents at every layer rather than being confined to steer the frozen LM only through the input as in embedding tuning.

Controlled generation  A complementary field to prompt learning is controlled generation, which aims to incorporate various types of guidance (e.g. length specifications (Kikuchi et al., 2016) or highlighted phrases (Grangier and Auli, 2018)) beyond the input text into the generation model. Johnson et al. (2016) successfully trained a multilingual translation model with control tokens to encode each language. Keskar et al. (2019) pre-trained a 1.63B parameter model, also alongside conditional control tokens, and demonstrated these learnt to govern style, content, and task-specific behaviour. However, these examples are undesirable in not being fixed-LM techniques: the whole underlying LM can adapt alongside the control tokens.

Alternatives exist, such as plug-and-play perturbations of the LM hidden states towards a target attribute (Nguyen et al., 2016; Dathathri et al., 2020). These methods are fixed-LM and are able to control target qualities such as sentiment and topic. However, they are slow at inference time due to requiring multiple passes for a single batch. The shift in conditional probability can also lead to text degeneration (Holtzman et al., 2019).

Dynamic prompts  There have been few works exploring dynamic prompts (Liu et al., 2021a; Tsimpoukelli et al., 2021), which are input-dependent. Perhaps most similar to our work is that of Yu et al. (2021), who use an attribute alignment function to form dynamic prompts. Unlike our work, the prompt does not have a static component and aims to generate text with specific target attributes, independent of task performance. With CONTROL PREFIXES, the intention is to also maximize task-specific performance, which is why we maintain a large static prompt component to specify the task itself.

Auxiliary scaffold tasks  Incorporating auxiliary scaffold tasks via multitask learning has been previously used for improving span-labeling and text classification (Swayamdipta et al., 2018; Cohan et al., 2019). Cachola et al. (2020) demonstrate that control tokens can be used to effectively incorporate scaffold tasks alongside the main task for BART-LARGE. Inspired by this form of data augmentation, we apply a similar procedure with CONTROL PREFIXES when training on DART, a dataset formed from an accumulation of heterogeneous sub-datasets.

3   CONTROL PREFIXES

3.1   Background

This work considers sequence-to-sequence tasks where the objective is to model the conditional probability P(Y | X) with X and Y representing the tokenized input and output sequences respectively. For example, in summarization, X is an article and Y is a short target summary.

To model P(Y | X), this paper adopts T5-large (Raffel et al., 2020) or BART-LARGE (Lewis et al., 2020) as the underlying pre-trained LM with parameters φ; and as we consider fixed-LM methods, φ always remains frozen. These models are Transformer encoder-decoder models where decoding proceeds auto-regressively. Let d denote the hidden state dimension and L the number of layers. We use (E, Dc, Dm) to de-
[Figure: comparison of prefix-tuning and Control Prefixes. Panel "Prefix-tuning": General Task Prefix (400k - 8M params), Single Task Batch. Panel "Control Prefixes": General Task Prefix (400k - 8M params), Control Prefixes (70k - 400k params each), Single Task Batch. Prefix labels in the figure: P, CA, CB, CC.]

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                              AAAB7nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTQWGIiHwlcyN4ywIa9vcvungm58CNsLDTG1t9j579xgSsUfMkkL+/NZGZeEAuujet+O7mt7Z3dvfx+4eDw6PikeHrW1lGiGLZYJCLVDahGwSW2DDcCu7FCGgYCO8G0sfA7T6g0j+SjmcXoh3Qs+YgzaqzUaZQHaX1eHhRLbsVdgmwSLyMlyNAcFL/6w4glIUrDBNW657mx8VOqDGcC54V+ojGmbErH2LNU0hC1ny7PnZMrqwzJKFK2pCFL9fdESkOtZ2FgO0NqJnrdW4j/eb3EjO78lMs4MSjZatEoEcREZPE7GXKFzIiZJZQpbm8lbEIVZcYmVLAheOsvb5J2teLdVKoP1VKtnsWRhwu4hGvw4BZqcA9NaAGDKTzDK7w5sfPivDsfq9ack82cwx84nz9TfY7m         AAAB7nicbVA9SwNBEJ3zM8avqKXNYiJYhbtYaBlMYxnBfEByhL3NXrJkb+/YnRPCkR9hY6GIrb/Hzn/jJrlCEx8MPN6bYWZekEhh0HW/nY3Nre2d3cJecf/g8Oi4dHLaNnGqGW+xWMa6G1DDpVC8hQIl7yaa0yiQvBNMGnO/88S1EbF6xGnC/YiOlAgFo2ilTqMyyBqzyqBUdqvuAmSdeDkpQ47moPTVH8YsjbhCJqkxPc9N0M+oRsEknxX7qeEJZRM64j1LFY248bPFuTNyaZUhCWNtSyFZqL8nMhoZM40C2xlRHJtVby7+5/VSDG/9TKgkRa7YclGYSoIxmf9OhkJzhnJqCWVa2FsJG1NNGdqEijYEb/XlddKuVb3rau2hVq7f5XEU4Bwu4Ao8uIE63EMTWsBgAs/wCm9O4rw4787HsnXDyWfO4A+czx9VA47n

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     P
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     AAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            i) guidance                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                               ii) control prefixes
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               hX, Y i                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               hX, Y, Gi

          P                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 P
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
AAACCnicbVA9SwNBEN2L3/Hr1NJmNQgWIdxFQcughZYKxkRyIext5pIle3vH7pwQQmob/4qNhSK2/gI7/42bmEITHww83pthZl6YSmHQ876c3Nz8wuLS8kp+dW19Y9Pd2r41SaY5VHkiE10PmQEpFFRRoIR6qoHFoYRa2Dsf+bV70EYk6gb7KTRj1lEiEpyhlVruXiAhwkAy1ZFA60V6V6QXgRadLgZ6LLbcglfyxqCzxJ+QApngquV+Bu2EZzEo5JIZ0/C9FJsDplFwCcN8kBlIGe+xDjQsVSwG0xyMXxnSA6u0aZRoWwrpWP09MWCxMf04tJ0xw66Z9kbif14jw+i0ORAqzRAU/1kUZZJiQke50LbQwFH2LWFcC3sr5V2mGUebXt6G4E+/PEtuyyX/qFS+Pi5UziZxLJNdsk8OiU9OSIVckitSJZw8kCfyQl6dR+fZeXPef1pzzmRmh/yB8/EN94qZzg==

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               AAACB3icbVDLSgNBEJyNrxhfUY+CDAbBg4TdKOgx6MVjBPOQbAizk95kyOzsMtMrhJCbF3/FiwdFvPoL3vwbJ4+DJhY0FFXddHcFiRQGXffbySwtr6yuZddzG5tb2zv53b2aiVPNocpjGetGwAxIoaCKAiU0Eg0sCiTUg/712K8/gDYiVnc4SKAVsa4SoeAMrdTOH/oSQvQlU10JtHFK730tuj309URp5wtu0Z2ALhJvRgpkhko7/+V3Yp5GoJBLZkzTcxNsDZlGwSWMcn5qIGG8z7rQtFSxCExrOPljRI+t0qFhrG0ppBP198SQRcYMosB2Rgx7Zt4bi/95zRTDy9ZQqCRFUHy6KEwlxZiOQ6EdoYGjHFjCuBb2Vsp7TDOONrqcDcGbf3mR1EpF76xYuj0vlK9mcWTJATkiJ8QjF6RMbkiFVAknj+SZvJI358l5cd6dj2lrxpnN7JM/cD5/AI8qmR0=

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                  CB
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    1                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                      1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      1
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       AAAB7nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTQWGIiHwlcyN4ywIa9vcvungm58CNsLDTG1t9j579xgSsUfMkkL+/NZGZeEAuujet+O7mt7Z3dvfx+4eDw6PikeHrW1lGiGLZYJCLVDahGwSW2DDcCu7FCGgYCO8G0sfA7T6g0j+SjmcXoh3Qs+YgzaqzUaZQHaX1eHhRLbsVdgmwSLyMlyNAcFL/6w4glIUrDBNW657mx8VOqDGcC54V+ojGmbErH2LNU0hC1ny7PnZMrqwzJKFK2pCFL9fdESkOtZ2FgO0NqJnrdW4j/eb3EjO78lMs4MSjZatEoEcREZPE7GXKFzIiZJZQpbm8lbEIVZcYmVLAheOsvb5J2teLdVKoP1VKtnsWRhwu4hGvw4BZqcA9NaAGDKTzDK7w5sfPivDsfq9ack82cwx84nz9TfY7m

           AAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
AAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

          PAAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    2                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                      2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                 CA
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                  AAAB7nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstERpLDGRjwQuZG9ZYMPe3mV3zoRc+BE2Fhpj6++x89+4wBUKvmSSl/dmMjMviKUw6LrfTm5jc2t7J79b2Ns/ODwqHp+0TJRoxpsskpHuBNRwKRRvokDJO7HmNAwkbweT+txvP3FtRKQecRpzP6QjJYaCUbRSu17up7ezcr9YcivuAmSdeBkpQYZGv/jVG0QsCblCJqkxXc+N0U+pRsEknxV6ieExZRM64l1LFQ258dPFuTNyYZUBGUbalkKyUH9PpDQ0ZhoGtjOkODar3lz8z+smOLzxU6HiBLliy0XDRBKMyPx3MhCaM5RTSyjTwt5K2JhqytAmVLAheKsvr5NWteJdVaoP1VLtLosjD2dwDpfgwTXU4B4a0AQGE3iGV3hzYufFeXc+lq05J5s5hT9wPn8AUfeO5Q==

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 P
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
AAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
[Figure: diagram residue. Recoverable labels: "Pre-trained Model (0.4B params)" (appears twice), component labels P and C, and index markers 2, 3, 4.]
                                                                                                                                                  CB                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   AAAB7nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTQWGIiHwlcyN4ywIa9vcvungm58CNsLDTG1t9j579xgSsUfMkkL+/NZGZeEAuujet+O7mt7Z3dvfx+4eDw6PikeHrW1lGiGLZYJCLVDahGwSW2DDcCu7FCGgYCO8G0sfA7T6g0j+SjmcXoh3Qs+YgzaqzUaZQHaX1eHhRLbsVdgmwSLyMlyNAcFL/6w4glIUrDBNW657mx8VOqDGcC54V+ojGmbErH2LNU0hC1ny7PnZMrqwzJKFK2pCFL9fdESkOtZ2FgO0NqJnrdW4j/eb3EjO78lMs4MSjZatEoEcREZPE7GXKFzIiZJZQpbm8lbEIVZcYmVLAheOsvb5J2teLdVKoP1VKtnsWRhwu4hGvw4BZqcA9NaAGDKTzDK7w5sfPivDsfq9ack82cwx84nz9TfY7m

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 P
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
AAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      4

          PAAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    5                                                                                                                                                                                                                                                                                                                                                                                                                                  
                                                                                                                                      5
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                 CA
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                  AAAB7nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstERpLDGRjwQuZG9ZYMPe3mV3zoRc+BE2Fhpj6++x89+4wBUKvmSSl/dmMjMviKUw6LrfTm5jc2t7J79b2Ns/ODwqHp+0TJRoxpsskpHuBNRwKRRvokDJO7HmNAwkbweT+txvP3FtRKQecRpzP6QjJYaCUbRSu17up7ezcr9YcivuAmSdeBkpQYZGv/jVG0QsCblCJqkxXc+N0U+pRsEknxV6ieExZRM64l1LFQ258dPFuTNyYZUBGUbalkKyUH9PpDQ0ZhoGtjOkODar3lz8z+smOLzxU6HiBLliy0XDRBKMyPx3MhCaM5RTSyjTwt5K2JhqytAmVLAheKsvr5NWteJdVaoP1VLtLosjD2dwDpfgwTXU4B4a0AQGE3iGV3hzYufFeXc+lq05J5s5hT9wPn8AUfeO5Q==

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 P
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
AAAB6nicbVA9TwJBEJ3DL8Qv1NJmI5hYkTsstCTaWGIUJIEL2Vv2YMPe3mV3zoRc+Ak2Fhpj6y+y89+4wBUKvmSSl/dmMjMvSKQw6LrfTmFtfWNzq7hd2tnd2z8oHx61TZxqxlsslrHuBNRwKRRvoUDJO4nmNAokfwzGNzP/8YlrI2L1gJOE+xEdKhEKRtFK99VmtV+uuDV3DrJKvJxUIEezX/7qDWKWRlwhk9SYrucm6GdUo2CST0u91PCEsjEd8q6likbc+Nn81Ck5s8qAhLG2pZDM1d8TGY2MmUSB7YwojsyyNxP/87ophld+JlSSIldssShMJcGYzP4mA6E5QzmxhDIt7K2EjaimDG06JRuCt/zyKmnXa95FrX5XrzSu8ziKcAKncA4eXEIDbqEJLWAwhGd4hTdHOi/Ou/OxaC04+cwx/IHz+QNh2I0y


Figure 1: High-level diagram contrasting prefix-tuning and CONTROL PREFIXES in the single-task setup for a
PLM such as BART_LARGE. The same single-task batch (examples 1, 2, 3, 4 and 5) is considered for both setups.
Left: Prefix-tuning has one general prefix P for all examples. Right: CONTROL PREFIXES utilizes additional
attribute information at the input-level, G, in i). This conditional information is used in ii) to dictate which control
prefix (C_A, C_B, C_C) to use for a particular example in a batch. This takes advantage of prefix-tuning's capacity to
include different prefixes in one forward pass.
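
The per-example selection described in the caption can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the function `build_batch_prefixes`, the toy vectors, the label assignment in `G`, and the choice to place the control prefix before the general prefix. In an actual model the prefixes would be trainable key and value tensors at every attention layer rather than plain lists.

```python
from typing import Dict, List

Vector = List[float]
Prefix = List[Vector]  # a prefix is a short sequence of d-dimensional vectors


def build_batch_prefixes(
    general_prefix: Prefix,
    control_prefixes: Dict[str, Prefix],
    guidance: List[str],
) -> List[Prefix]:
    """Return one prefix per example in the batch.

    Each example receives the shared general prefix plus the control
    prefix selected by its guidance label.
    """
    batch = []
    for label in guidance:
        control = control_prefixes[label]  # lookup dictated by G
        # Assumed ordering: control prefix first, then the general prefix.
        batch.append(control + general_prefix)
    return batch


# Toy values (assumptions): d = 2, general prefix of length 2,
# control prefixes of length 1.
P = [[0.1, 0.2], [0.3, 0.4]]
C = {"A": [[1.0, 0.0]], "B": [[0.0, 1.0]], "C": [[1.0, 1.0]]}
G = ["A", "B", "A", "C", "B"]  # hypothetical labels for examples 1..5

prefixes = build_batch_prefixes(P, C, G)
```

Plain prefix-tuning corresponds to the degenerate case in which every example receives only `P`; the sketch differs only in the lookup keyed by `G`, which is why different control prefixes can coexist in a single batch.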

note the three classes of attention present in each layer: self-attention in the encoder ($E$), decoder cross-attention ($D_c$) and decoder masked-attention ($D_m$). For an attention computation in the $l$-th layer, the query, key and value matrices are denoted $Q_l \in \mathbb{R}^{N \times d}$ and $K_l, V_l \in \mathbb{R}^{M \times d}$, where $N$ is the number of tokens in the series relating to queries, and $M$ is the number of tokens in the series relating to keys and values.

remains static, and train at the same time $C_\theta$ ("attribute-level parameters"): a set of prefixes that change depending on the input. This requires attribute-level information or guidance $G$ to indicate which control prefixes are to be used while processing $X$.² Let us consider the parallel corpus $Z = \{\langle X^j, Y^j, G^j \rangle\}_{j=1,\dots,N}$, where $G^j$ indicates all the conditional attribute-level information for the sample $j$. The goal is to optimize through gradient descent the final inference parameters, $\theta$, whilst

3.2   Intuition                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                      the underlying φ parameters of the pre-trained LM
We believe having fixed PLM parameters that cap-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                      remain frozen:
ture broad natural language understanding, shared                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               N
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               X
task-specific parameters which specify the task it-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                      θ∗ = arg max                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      log p Y j | X j , Gj ; Pθ , Cθ , φ .
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        
self, and attribute-level parameters which integrate                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             θ
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               j=1
(1)

input-level information has a range of benefits. The general task-specific parameters, which channel the frozen LM to carry out the overall task, can themselves adapt to modular control prefixes which change according to the guidance signal, for each input $X$. This demarcation of parameters enables fine-grained control to be extended to aid performance on a downstream task. CONTROL PREFIXES is, therefore, able to leverage input-level information while being a fixed-LM, parameter-efficient¹ method. For this work, we only consider attributes made up of discrete labels as the guidance signal.

General Prefix  For each attention class $(E, D_c, D_m)$, a distinct prefix of key-value pairs is learnt, $P = \{P_1, \dots, P_L\}$, where $P_l \in \mathbb{R}^{\rho \times 2d}$ $\forall l \in \{1, \dots, L\}$. $P \in \mathbb{R}^{\rho \times 2dL}$ and $\rho$ is the prompt length, i.e. the number of additional key-value pairs in each attention computation. In prefix-tuning³, for an attention computation in the $l$-th layer, $K_l$ and $V_l$ are augmented to become

$$K'_l = [P_{l,K}; K_l], \quad V'_l = [P_{l,V}; V_l] \tag{2}$$

where $K'_l, V'_l \in \mathbb{R}^{(\rho+M) \times d}$. The overall
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                      general
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                       E Dc  prefix, parameterized by θ, is Pθ =
3.3   Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                       P , P , P Dm , where Pθ ∈ Rρ×6dL .
The idea is to have a general task prefix Pθ ("task-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                          2
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                           We discuss cases where G is not present in §6.2.
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                          3
specific parameters"), as in prefix-tuning which                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                           There has been confusion in recent work concerning dif-
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                      ferent forms of prefix-tuning (Li and Liang, 2021). For details
   1
     We use the term parameter efficient to denote methods                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
                                                                                                                                      and observations of the benefits (previously unremarked upon)
adding
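To make the shapes concrete, the following minimal sketch builds a toy general prefix and prepends its layer-specific key and value rows to the keys and values of one layer. It assumes, by analogy with Eq. (3), that $K'_l = [P_{l,K}; K_l]$ and $V'_l = [P_{l,V}; V_l]$. The way the $6dL$ columns are split across attention classes, layers and key/value halves is an assumption of this sketch, as are the toy sizes and all function names.

```python
import random

random.seed(0)

# Toy sizes (hypothetical): prefix length, sequence length, hidden size, layers.
rho, M, d, L = 3, 5, 4, 2
ATTENTION_CLASSES = ("E", "Dc", "Dm")  # class names as used in the text


def rand_matrix(rows, cols):
    """Random matrix stored as a list of rows."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# P_theta in R^{rho x 6dL}: 3 attention classes x L layers x (key, value) x d.
P_theta = rand_matrix(rho, 6 * d * L)


def prefix_kv(attn_class, layer):
    """Return (P_{l,K}, P_{l,V}), each rho x d. The column layout is assumed."""
    start = (ATTENTION_CLASSES.index(attn_class) * L + layer) * 2 * d
    keys = [row[start:start + d] for row in P_theta]
    values = [row[start + d:start + 2 * d] for row in P_theta]
    return keys, values


def augment(K_l, V_l, attn_class, layer):
    """Prepend the prefix rows: K'_l = [P_{l,K}; K_l], V'_l = [P_{l,V}; V_l]."""
    P_K, P_V = prefix_kv(attn_class, layer)
    return P_K + K_l, P_V + V_l


K_l, V_l = rand_matrix(M, d), rand_matrix(M, d)
K_aug, V_aug = augment(K_l, V_l, "Dc", layer=1)
print(len(K_aug), len(K_aug[0]))  # (rho + M) rows, d columns
```

The original keys and values are left untouched; only extra rows are prepended, which is why the base LM parameters can stay frozen.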
Control Prefixes  Let us consider one attribute with $R$ possible labels,⁴ such as the news domain of an article (e.g. sport, technology etc.), with control prefixes $C_\theta = \{C_{\theta,1}, \ldots, C_{\theta,R}\}$, where $C_{\theta,r} \in \mathbb{R}^{\rho_c \times 6dL}$, $\forall r \in \{1, \ldots, R\}$. $C_{\theta,r}$ represents the control prefix learnt for the $r$-th attribute label and the parameter $\rho_c$ denotes the control prompt length for this particular attribute. Let $A$ be a function which returns the corresponding control prefix for the attribute label indicated by $G$. In CONTROL PREFIXES the $K_l$ and $V_l$ are augmented to become

$$K''_l = [A(G)_{l,K};\, P_{l,K};\, K_l], \qquad V''_l = [A(G)_{l,V};\, P_{l,V};\, V_l] \tag{3}$$

where $K''_l, V''_l \in \mathbb{R}^{(\rho_c+\rho+M)\times d}$.

Shared Re-parameterization  As in Li and Liang (2021), optimization is stabilized by an increase in the trainable parameters. However, rather than one network, we use three distinct two-layer large feed-forward neural networks, one for each attention class, applied row-wise. For each attention class (E, Dc, Dm), $P = \text{MLP}(\tilde{P})$ where $\tilde{P} \in \mathbb{R}^{\rho \times d}$ is smaller than the matrix $P \in \mathbb{R}^{\rho \times 2dL}$, and each MLP has an intermediate dimension $k$ which we set to 800. The distinct MLPs and each $\tilde{P}$ are parameterized by training parameters $\tilde{\theta}$; thus, $\theta$ is a function of $\tilde{\theta}$ and $|\theta| < |\tilde{\theta}|$. Once training is complete, the final $\theta$ parameters can be saved for use at inference and the re-parameterization parameters dispensed with.

As described for the general prefix, $P_\theta$, each control prefix, $C_{\theta,r}$, comprises three constituents, one for each attention class: $C_{\theta,r} = \{C_r^{E}, C_r^{Dc}, C_r^{Dm}\}$. The re-parameterization of $C_{\theta,r}$ occurs in exactly the same manner as $P_\theta$, sharing the same $\text{MLP}^{E}$, $\text{MLP}^{Dc}$ and $\text{MLP}^{Dm}$. When using a disjoint set of re-parameterizations for the control prefixes, learning becomes unstable and performance degrades.⁵

Recent work by Buhai et al. (2020) shows that over-parameterization can smooth the optimization landscape. With this in mind, the three distinct re-parameterizations compel each prefix element to coordinate control for the particular attention class. For example, the rows of $P^{E}$ and $C_r^{E}$ lie in a vector space better coordinated for moderating the processing of the input sequence $X$ than $P^{Dm}$ and $C_r^{Dm}$. This is due to being formed from the shared mapping $\text{MLP}^{E}$.

4   Experimental Setup

4.1   Datasets, Guidance and Metrics

Data-to-text  The objective of data-to-text generation is to produce fluent text from structured input, viz. a tripleset (a set of subject-predicate-objects). As in Li and Liang (2021), we elect to evaluate on the data-to-text datasets DART and WebNLG. However, we implement prefix-tuning for T5-large rather than GPT-2, since T5-large provides a much stronger baseline and enables comparison with state-of-the-art (SOTA) systems.⁶ Results are also reported on E2E Clean, a dataset solely focused on the restaurant domain. We use the official evaluation scripts and report a selection of BLEU (Papineni et al., 2002), METEOR (Lavie and Agarwal, 2007), and TER (Snover et al., 2006) for each dataset.⁷

WebNLG contains triplesets from DBpedia (Auer et al., 2007). The test set is divided into two partitions: Seen, which contains 10 DBpedia categories present in the training set, and Unseen, which covers 5 categories never seen during training.⁸ These categories, such as Airport or Food, are used as the guidance signal in our experiments (indicated by A1 in Table 1); our approach for unseen categories is discussed in §6.2.

Providing the category explicitly as guidance with CONTROL PREFIXES may enable inductive biases relating to properties of triples belonging to a specific WebNLG category to be captured more effectively. This idea is encouraged by studies where there is a clear disparity in performance on different categories between different model types (Moryossef et al., 2019; Castro Ferreira et al., 2020).

DART is an open-domain, multi-source corpus, with six sources: internal and external human annotation of both Wikipedia tables and WikiSQL, as well as the two existing datasets WebNLG and E2E Clean. Radev et al. (2020) revealed fine-tuning

   4  It is easy to see how the procedure can be generalized to multiple attributes; we use up to four attributes and varying control prompt lengths.
   5  This also results in a significant increase in the number of training parameters $\tilde{\theta}$. In contrast, with the methodology outlined, each additional control prefix relates to only an additional $d\rho_c$ training parameters.
   6  BART_LARGE exhibits inferior performance to T5-large on data-to-text; for example, 9.7 BLEU points lower on WebNLG Unseen (Ribeiro et al., 2020).
   7  Full results from the evaluation scripts including machine-learned metrics can be found in Appendix A.
   8  Every training category label can be seen in Appendix B, where we visualize control prefixes corresponding to each training category.
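The construction of §3.3 can be sketched in a few lines of plain Python. The sketch covers a single attention class: one shared two-layer MLP re-parameterizes both the general prefix and every control prefix, and Eq. (3) then becomes a row-wise concatenation. The tanh activation, the omission of bias terms, the column layout inside the $2dL$ dimension, the toy sizes and all function names are assumptions made for illustration, not details taken from the paper.

```python
import math
import random

random.seed(0)


def rand_matrix(rows, cols):
    """Random matrix stored as a list of rows."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mlp(X, W1, W2):
    """Two-layer feed-forward map applied row-wise (tanh is an assumed activation)."""
    hidden = [[math.tanh(h) for h in row] for row in matmul(X, W1)]
    return matmul(hidden, W2)


# Toy sizes (hypothetical; the paper sets the intermediate dimension k to 800).
d, L, k = 4, 2, 8          # hidden size, layers, MLP intermediate size
rho, rho_c, R, M = 3, 2, 5, 6  # prefix lengths, number of labels, sequence length

# One shared MLP for this attention class: R^d -> R^k -> R^{2dL}.
W1, W2 = rand_matrix(d, k), rand_matrix(k, 2 * d * L)

P_tilde = rand_matrix(rho, d)                         # seed of the general prefix
C_tilde = [rand_matrix(rho_c, d) for _ in range(R)]   # one seed per attribute label

P = mlp(P_tilde, W1, W2)                # rho x 2dL
C = [mlp(c, W1, W2) for c in C_tilde]   # each rho_c x 2dL, same MLP as P


def layer_kv(prefix, layer):
    """Slice the key and value rows for one layer (layout within 2dL is assumed)."""
    start = 2 * d * layer
    keys = [row[start:start + d] for row in prefix]
    values = [row[start + d:start + 2 * d] for row in prefix]
    return keys, values


def augment(K_l, V_l, label, layer):
    """Eq. (3): K''_l = [A(G)_{l,K}; P_{l,K}; K_l], and likewise for V''_l."""
    C_K, C_V = layer_kv(C[label], layer)  # A(G): look up the prefix for this label
    P_K, P_V = layer_kv(P, layer)
    return C_K + P_K + K_l, C_V + P_V + V_l


K_l, V_l = rand_matrix(M, d), rand_matrix(M, d)
K_aug, V_aug = augment(K_l, V_l, label=2, layer=0)
print(len(K_aug), len(K_aug[0]))  # (rho_c + rho + M) rows, d columns
```

Because the control prefixes reuse `W1` and `W2`, each additional label contributes only its $\rho_c \times d$ seed matrix to the trainable parameters of this attention class, in line with footnote 5. After training, only `P` and `C` would need to be kept.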
T5-large on the WebNLG dataset with only the human annotated portion of DART achieves SOTA performance, whilst using the whole DART dataset is not as effective. Nevertheless, this inspired the idea of using the six DART sub-dataset sources as a controllable attribute (represented by A2 in Table 1) as a data augmentation strategy.

Simplification  We use WikiLarge (Zhang and Lapata, 2017) as training data and evaluate on the two benchmarks TurkCorpus (Xu et al., 2016) and ASSET (Alva-Manchego et al., 2020). Both benchmarks are composed of the same 2000 validation source and 359 test source sentences. Martin et al. (2020) introduced ‘BART_LARGE with ACCESS’, which is a fine-tuned BART_LARGE model trained alongside control tokens to condition on four simplification-specific attributes, such as the length compression ratio (the length of the target sequence relative to the source sequence). We use the same controllable attributes in this work to directly compare with Martin et al. (2020) (Table 2). The control ratios are discretized into bins of fixed-width 0.05, capped to a maximum ratio of 2. At inference time, once the model has been trained with these oracle controls, the control ratios are set to desired values by tuning on the respective validation set.

We report the non-learned metrics SARI (Xu et al., 2016) and FKGL (Kincaid et al., 1975).⁹ Unlike previous studies, we also use the machine-learned Q&A metric QuestEval (Scialom et al., 2021) to assess our text simplification models.

Summarization  As in Li and Liang (2021), we report results on the XSum dataset (Narayan et al., 2018) using BART_LARGE. XSum comprises 226,711 British Broadcasting Corporation (BBC) articles coupled with their single-sentence summaries, where each sample corresponds to a unique URL. The URL contains information on whether the sub-directory is from the BBC Sport or BBC News page (A1 in Table 3), and further sub-directory information (A2 in Table 3, where A2 has 40 labels), for example (‘sport’, ‘formula1’) or (‘news’, ‘science’). The motivation for using this as guidance is that different sub-directories are likely to share properties relating to how the information is presented; journalists are also usually confined to one domain. In addition to reporting the customary ROUGE scores (Lin, 2004), we submit our CONTROL PREFIXES model outputs to the GENIE external human evaluation framework (Khashabi et al., 2021), where 300 instances are assessed across 5 intrinsic dimensions.

4.2   Architecture and Hyper-parameters

All implementations in this study are built on top of the Transformers library (Wolf et al., 2020). As T5 has relative position biases, we set these biases to zero, in all layers, for offsets where the key is part of a prefix. For BART_LARGE we adapt the original implementation (Li and Liang, 2021). For the data-to-text datasets, we follow Ribeiro et al. (2020) and linearize the triples, prepending the special tokens , , and  before the subject, predicate, and object of an individual triple.¹⁰ We also prepend “translate Graph to English: ” to every input (Raffel et al., 2020).

The general prompt length and each control prompt length are architecture-specific parameters that we vary to try to maximize performance on the validation set. We use gradient accumulation across batches to maintain an effective batch size above 64, a linear learning rate scheduler for all models and beam-search decoding. The hyper-parameters we consider are principally the learning rate and the optimizer: AdamW (Loshchilov and Hutter, 2017) or AdaFactor (Shazeer and Stern, 2018).¹¹ We chose the checkpoint with the highest validation set score using BLEU for data-to-text, SARI for simplification and ROUGE-2 for summarization. For all tasks, we train our models on single Tesla V100-SXM2-16GB machines, with mixed precision for BART_LARGE based models (fp16) and full precision for T5-large based models (fp32).

5   Results

5.1   Data-to-Text

For DART, both CONTROL PREFIXES (A2) and prefix-tuning attain higher performance (Table 1) than the current SOTA, which is a fine-tuned T5-large (Radev et al., 2020), by 1.29 and 0.54 BLEU points respectively. This indicates CONTROL PREFIXES can extend control of the frozen T5-large more effectively than prefix-tuning.

The SOTA for WebNLG is a T5-large model

   9  We use the FKGL and latest version of SARI implemented in EASSE (Alva-Manchego et al., 2019) which is used in Martin et al. (2020).
   10  The embeddings relating to these special tokens are the only embeddings we train, as our work is focused on fixed-LM methods.
   11  Full details can be found in Appendix D.
                                    DART                          WebNLG                        E2E Clean
                              φ%    BLEU    METEOR  TER ↓   φ%    S       U       A       φ%    BLEU    METEOR

  T5-large fine-tuned         100   50.66   40      43      100   64.89   54.01   59.95   100   38.74   37.4
  SOTA                        100   50.66   40      43      100   65.82   56.01   61.44   100   43.6    39
  Prefix-tuning               1.0   51.20   40.62   43.13   1.0   66.95   55.39   61.73   1.0   43.66   39.0
  CONTROL PREFIXES (A1)       -     -       -       -       1.4   67.32   55.38   61.94   -     -       -
+Data: DART
  Prefix-tuning               1.0   51.20   40.62   43.13   1.0   67.05   55.37   61.78   1.0   43.04   38.7
  CONTROL PREFIXES (A2)       1.1   51.95   41.07   42.75   1.0   66.99   55.56   61.83   1.0   44.15   39.2
  CONTROL PREFIXES (A1,A2)    -     -       -       -       1.4   67.15   56.41   62.27   -     -       -

Table 1: Data-to-text test set results reported on the respective official evaluation scripts. φ% denotes the
number of additional parameters required at inference time, as a percentage of the number of fixed-LM
parameters. T5-large fine-tuned results for WebNLG are from Ribeiro et al. (2020), and for E2E Clean are
calculated from public model outputs (Gehrmann et al., 2021). Several of the baseline results were only reported
to the significant figures shown. A1 signifies models trained with control prefixes for the WebNLG category
attribute, and A2 with control prefixes for the DART sub-dataset source attribute. For WebNLG, S, U and A refer
to BLEU scores for the Seen, Unseen and All portions of the dataset. The DART results are reported on the
official evaluation script for v1.1.1, the same version as the official leaderboard. A CONTROL PREFIXES model
attains state-of-the-art results for each dataset.12
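
As a small worked illustration of the φ% column, the sketch below computes the ratio described in the caption of Table 1. The function name and the parameter counts are hypothetical, chosen only so the arithmetic is easy to follow; they are not the actual sizes of T5-large or of any prefix used in this work.

```python
def phi_percent(additional_params: int, fixed_lm_params: int) -> float:
    """Additional inference-time parameters as a percentage of the fixed LM size."""
    if fixed_lm_params <= 0:
        raise ValueError("fixed_lm_params must be positive")
    return 100.0 * additional_params / fixed_lm_params


# Hypothetical counts, for illustration only.
fixed_lm = 400_000_000    # parameters of a frozen base LM
prefix_only = 4_000_000   # extra parameters stored for one task

print(phi_percent(prefix_only, fixed_lm))  # 1.0
print(phi_percent(fixed_lm, fixed_lm))     # 100.0
```

Under this reading, a fully fine-tuned model is listed with φ% = 100 because a complete task-specific copy of the LM has to be stored, whereas the prefix-based rows add only around 1% on top of the shared frozen model.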

fine-tuned on WebNLG and the human annotated portion of DART (Radev et al., 2020). CONTROL PREFIXES
achieves a 0.83 higher BLEU overall, and 1.33 on the Seen categories than this model. Notably, CONTROL
PREFIXES (A1) outperforms CONTROL PREFIXES (A1,A2) on the Seen component of the dataset, but does not
generalize as well to the unseen categories. We argue this illustrates the benefit of using both
controllable attributes. The prefix-tuning model with additional DART data, like the SOTA, is trained on
only the human annotated portion and yields a minor performance increase of 0.05 BLEU compared to
prefix-tuning solely trained on WebNLG. We believe this indicates that for fine-tuning, training on a
complementary type of additional data allows the PLM to maintain more NLU by not over-fitting a narrow
distribution. Therefore the LM can generalize better. Whilst for prefix-tuning, much of this gain has
already been realized by retaining the original frozen parameters.
   The SOTA (Harkous et al., 2020) for E2E Clean consists of a fine-tuned GPT-2 with a semantic fidelity
classifier trained on additional generated

comparing our CONTROL PREFIXES to fine-tuned ‘BARTLARGE with ACCESS’ there is comparable performance in
terms of SARI for ASSET, and better FKGL results. However on TurkCorpus, CONTROL PREFIXES yields lower
performance on average for SARI and FKGL. For text simplification, Martin et al. (2020) indicate the
gains from using the controllable attributes, as assessed by SARI and FKGL, are mostly due to being able
to calibrate the length ratio, with validation and test sets being drawn from the same distribution, as
opposed to the WikiLarge training distribution. We highlight the Gold Reference score for TurkCorpus,
which produces inferior results for SARI and FKGL compared to both guided models. The Gold Reference
result is computed via a leave-one-out scenario where each reference is evaluated against all others,
and then an average is taken.

5.3     Summarization
Our research is not solely focused on parameter efficiency, but more on the effectiveness of adapting an
already parameter efficient, fixed-LM method (adding
                                      ASSET                           TurkCorpus
                              φ%      SARI     FKGL ↓   QuestEval     SARI     FKGL ↓   QuestEval

  Gold Reference              -       44.87    6.49     0.63∗         40.04    8.77     0.66∗
  BARTLARGE with ACCESS†      100     43.63    6.25     0.64∗         42.62    6.98     0.66∗
  BARTLARGE fine-tuned        100     39.91∗   7.73∗    -             39.55∗   7.73∗    -
  Prefix-tuning               1.8     40.12    7.28     -             39.06    7.28     -
  CONTROL PREFIXES            1.8     43.58    5.97     0.64          42.32    7.74     0.66

Table 2: Simplification results on ASSET and TurkCorpus test sets. † This model is from Martin et al. (2020),
where the authors fine-tuned a BARTLARGE model alongside control tokens for the four attributes. The CONTROL
PREFIXES model is trained with control prefixes for these same four attributes. Prefix-tuning and CONTROL
PREFIXES use BARTLARGE as the fixed LM. The ∗ denotes baseline results calculated in this study; the model
outputs of Martin et al. (2020) are publicly available. The BARTLARGE with ACCESS and CONTROL PREFIXES
results are averages of the test set results over 5 random seeds.
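
The Gold Reference row of Table 2 is described in the text as a leave-one-out computation: each reference is scored against all the others, and the scores are averaged. A minimal sketch of that procedure for a single example is given below. The function names are hypothetical, and `toy_overlap` is only a stand-in word-overlap metric so that the example runs on its own; it is not SARI or FKGL, and a real SARI computation would also take the source sentence as input.

```python
from typing import Callable, List, Sequence


def leave_one_out_score(
    references: Sequence[str],
    metric: Callable[[str, List[str]], float],
) -> float:
    """Score each reference against the remaining ones and average the results."""
    if len(references) < 2:
        raise ValueError("need at least two references for leave-one-out")
    scores = []
    for i, held_out in enumerate(references):
        others = [ref for j, ref in enumerate(references) if j != i]
        scores.append(metric(held_out, others))
    return sum(scores) / len(scores)


def toy_overlap(candidate: str, references: List[str]) -> float:
    """Stand-in metric (assumption): mean Jaccard word overlap with each reference."""
    cand = set(candidate.lower().split())
    overlaps = []
    for ref in references:
        ref_words = set(ref.lower().split())
        overlaps.append(len(cand & ref_words) / len(cand | ref_words))
    return sum(overlaps) / len(overlaps)


refs = ["the cat sat", "the cat sat", "a dog ran"]
print(leave_one_out_score(refs, toy_overlap))
```

For a full test set, the per-example leave-one-out scores would then be averaged over all examples.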

                                      Human               Human               Human               Human               Human
                              φ%      overall             conciseness         fluency             no-hallucination    informativeness     R-1      R-2      R-L

  BARTLARGE fine-tuned        100     0.49 (+0.03/−0.04)  0.50 (+0.03/−0.03)  0.50 (+0.03/−0.03)  0.52 (+0.03/−0.03)  0.49 (+0.03/−0.03)  45.14∗   22.27∗   37.25∗
  Prefix-tuning               3.0     -                   -                   -                   -                   -                   43.53    20.66    35.63
  CONTROL PREFIXES (A1,A2)    2.8     0.51 (+0.03/−0.03)  0.53 (+0.02/−0.02)  0.51 (+0.03/−0.03)  0.53 (+0.03/−0.03)  0.49 (+0.03/−0.03)  43.81    20.84    35.81

Table 3: Summarization results on XSum.13 R-1, R-2 and R-L refer to ROUGE-1, ROUGE-2 and ROUGE-L. The
human-assessed results are from the GENIE benchmark, where the 95% confidence intervals are computed with
bootstrap re-sampling. Note the BARTLARGE fine-tuned results for the human-assessed dimensions are transcribed
from Khashabi et al. (2021), whilst the automatic metric results, indicated by ∗, are from Lewis et al. (2020).
Prefix-tuning and CONTROL PREFIXES (A1,A2) use BARTLARGE as the fixed LM. A1 refers to the BBC news/sport page
attribute and A2 the further sub-directory attribute.
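
To make the confidence intervals in Table 3 more concrete, the following is a minimal sketch of a percentile bootstrap for the mean of per-example human scores. It is an illustration under stated assumptions, not the GENIE implementation: the function name, the number of resamples, the percentile method and the synthetic scores are all choices made here for the example.

```python
import random
from typing import List, Tuple


def bootstrap_ci(
    scores: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement and record the mean of every resample.
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return sum(scores) / n, lower, upper


# Synthetic per-example scores (assumption): 300 judgements, half 0 and half 1.
synthetic_scores = [0.0, 1.0] * 150
mean, low, high = bootstrap_ci(synthetic_scores)
print(f"mean={mean:.2f}, 95% CI=[{low:.2f}, {high:.2f}]")
```

With a few hundred binary-like judgements the interval spans several hundredths, which is the same order as the gaps between systems in Table 3.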

   Table 3 shows that despite CONTROL PREFIXES underperforming fine-tuning according to automatic
metrics, CONTROL PREFIXES attains higher human-assessed results. CONTROL PREFIXES also holds the highest
overall human evaluation ranking on the GENIE platform (higher than T5-large and PEGASUSLARGE (Zhang et
al., 2019) fine-tuned). This study is limited in not being able to compare human assessment of
prefix-tuning, which yields slightly lower ROUGE scores than CONTROL PREFIXES, as participants of GENIE
are limited to one submission. The confidence intervals indicate that this result is not necessarily
definitive, but it at least highlights the problems with evaluation for XSum, and that the quality of
generations in this domain is not captured fully with ROUGE. A sample size of 300 is much larger than is
typical where authors construct their own evaluation (Narayan et al. (2018) use 50 and Dou et al. (2020)
use 100).

6      Analysis

6.1      Visualizing Control Prefixes
Fig. 2 displays t-SNE (Maaten and Hinton, 2008) visualizations of the length compression control
prefixes learnt as part of our simplification CONTROL PREFIXES model.15 We plot only the decoder
self-attention constituent of each control prefix (comprising multiple key-value pairs at each layer) as
the length ratio directly concerns the target.16 The relationship learnt by the control prefixes is very
manifest, aided by the near uniform distribution of length ratios in the WikiLarge training dataset from
0 to 1.1.
   Fig. 2 establishes that for this simplistic attribute, different control prefixes corresponding to
similar attribute labels (i.e., varying length ratios for the length attribute) share properties.
Interestingly, the decoder cross-attention of the control prefix is not as manifest. We believe this is
due to BARTLARGE being accustomed to the same cross-attention key-value pairs in each layer.

   15 A perplexity of 5 is used for all plots.
   16 Plots for the encoder and decoder cross-attention constituents can be found in Appendix E.
   16 The public GENIE leaderboard is available at
https://leaderboard.allenai.org/genie-xsum/submissions/public
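
The length attribute discussed in Section 6.1 takes one label per length-ratio bin (fixed width 0.05, from 0 to 1.1). As a rough sketch of how a source/target pair could be mapped to such a label, the code below computes a ratio and snaps it to the nearest bin. The character-level ratio, the nearest-bin rounding, the clipping of out-of-range ratios and the function names are all assumptions made for illustration; the section itself does not specify them.

```python
def length_ratio(source: str, target: str) -> float:
    """Target length divided by source length, in characters (assumption)."""
    if not source:
        raise ValueError("source must be non-empty")
    return len(target) / len(source)


def ratio_bin(ratio: float, width: float = 0.05, lo: float = 0.0, hi: float = 1.1) -> float:
    """Snap a ratio to the nearest bin label, clipping to the range [lo, hi]."""
    clipped = min(max(ratio, lo), hi)
    index = round((clipped - lo) / width)
    return round(lo + index * width, 2)


# All labels available to the length attribute under these assumptions.
all_bins = [round(i * 0.05, 2) for i in range(23)]

print(ratio_bin(length_ratio("abcdefghij", "abcdef")))  # 0.6
print(ratio_bin(1.7))                                    # 1.1
print(len(all_bins))                                     # 23
```

Each of the resulting 23 labels would index its own control prefix, which is consistent with the 23 points labelled 0.0 to 1.1 in Fig. 2.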
[Figure 2: t-SNE plot of the control prefixes; each point is labelled with its length ratio, from 0.0 to
1.1 in steps of 0.05.]

Figure 2: t-SNE visualizations for the decoder self-attention constituent of the length compression
control prefixes of the simplification model. Each circle represents a control prefix corresponding to
each length ratio (bins of fixed width 0.05, from 0 to 1.1).

6.2    Zero-shot Learning
We argue that even for more complicated attributes, such as the WebNLG category attribute, if the
attribute labels are similar, the respective control prefixes will similarly guide both the general,
task-specific prefix parameters and the frozen LM parameters. Previous work has discussed the notion of
task similarity (Achille et al., 2019) for prompt learning methods (Lester et al., 2021); however, we
argue prefixes concerning different labels of one attribute are more likely to overlap in terms of
learnable properties than different tasks or whole datasets.
   In the case of WebNLG, although no examples of the unseen categories are present during training, a
textual label for each category exists.17 This gives us some prior on the properties of the unseen
categories, which we show is enough to successfully zero-shot transfer with control prefixes. For each
WebNLG model with the category attribute, we map each category’s textual label, in-

examples relating to the unseen category Athlete.19
   Table 4 shows a comparison of using an out-of-vocabulary (OOV) control prefix for each example with an
unseen category, and the zero-shot transfer method for both WebNLG datasets.20 The OOV control prefix is
trained on a random 2% of the data for each accumulated batch. These results indicate that zero-shot
transfer is more promising than a learned OOV representation. The result fundamentally depends on the
WebNLG categories, and on whether similar textual labels pertain to similar triplesets that CONTROL
PREFIXES can utilize.

                                    Unseen Component
                           # Examples   # Categories   BLEU
  WebNLG                   891          5
    OOV Representation                                 56.35
    Zero-shot                                          56.41
  WebNLG+ 2020             896          3
    OOV Representation                                 50.02
    Zero-shot                                          50.39

Table 4: A comparison of the performance on the Unseen portions for WebNLG test sets, with i) a single
OOV Control Prefix used for all samples from unseen categories, or ii) the zero-shot transfer approach
outlined, utilizing the available textual labels.

7        Conclusion
We introduce CONTROL PREFIXES, a controlled generation technique, which integrates a task-specific
prompt alongside dynamic prompts to leverage additional input-level information. The method extends
prefix-tuning, enabling the model to have finer-grained control over generated text, and assists in
maximizing downstream task performance.
   We demonstrate that CONTROL PREFIXES outperforms prefix-tuning, as well as existing approaches, on an
array of natural language generation tasks. Our method attains state-of-the-art results on several
data-to-text datasets including WebNLG. This is despite learning
References

Alessandro Achille, Michael Lam, Rahul Tewari, Avinash Ravichandran, Subhransu Maji, Charless
  Fowlkes, Stefano Soatto, and Pietro Perona. 2019. Task2vec: Task embedding for meta-learning. In
  2019 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6429–6438.

Armen Aghajanyan, Luke Zettlemoyer, and Sonal Gupta. 2020. Intrinsic dimensionality explains the
  effectiveness of language model fine-tuning. CoRR, abs/2012.13255.

Fernando Alva-Manchego, Louis Martin, Carolina Scarton, and Lucia Specia. 2019. Easse: Easier
  automatic sentence simplification evaluation. arXiv preprint arXiv:1908.04567.

Fernando Emilio Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, and
  Lucia Specia. 2020. ASSET: A dataset for tuning and evaluation of sentence simplification models
  with multiple rewriting transformations. CoRR, abs/2005.00481.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives.
  2007. Dbpedia: A nucleus for a web of open data. In The Semantic Web, pages 722–735, Berlin,
  Heidelberg. Springer Berlin Heidelberg.

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind
  Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss,
  Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens
  Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack
  Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020.
  Language models are few-shot learners. In NeurIPS.

Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, and David Sontag. 2020. Empirical study
  of the benefits of overparameterization in learning latent variable models.

Isabel Cachola, Kyle Lo, Arman Cohan, and Daniel S. Weld. 2020. TLDR: extreme summarization of
  scientific documents. CoRR, abs/2004.15011.

Thiago Castro Ferreira, Claire Gardent, Nikolai Ilinykh, Chris van der Lee, Simon Mille, Diego
  Moussallem, and Anastasia Shimorina. 2020. The 2020 bilingual, bi-directional WebNLG+ shared task:
  Overview and evaluation results (WebNLG+ 2020). In Proceedings of the 3rd International Workshop
  on Natural Language Generation from the Semantic Web (WebNLG+), pages 55–76, Dublin, Ireland
  (Virtual). Association for Computational Linguistics.

Arman Cohan, Waleed Ammar, Madeleine van Zuylen, and Field Cady. 2019. Structural scaffolds for
  citation intent classification in scientific publications. CoRR, abs/1904.01608.

Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski,
  and Rosanne Liu. 2020. Plug and play language models: A simple approach to controlled text
  generation. In International Conference on Learning Representations.

Zi-Yi Dou, Pengfei Liu, Hiroaki Hayashi, Zhengbao Jiang, and Graham Neubig. 2020. Gsum: A general
  framework for guided neural abstractive summarization. CoRR, abs/2010.08014.

Ondřej Dušek, David M. Howcroft, and Verena Rieser. 2019. Semantic noise matters for neural natural
  language generation. In Proc. of the 12th International Conference on Natural Language Generation,
  pages 421–426, Tokyo, Japan. Association for Computational Linguistics.

Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. 2017. The WebNLG
  challenge: Generating text from RDF data. In Proceedings of the 10th International Conference on
  Natural Language Generation, pages 124–133, Santiago de Compostela, Spain. Association for
  Computational Linguistics.

Sebastian Gehrmann, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu
  Anuoluwapo, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das,
  Kaustubh D. Dhole, Wanyu Du, Esin Durmus, Ondrej Dusek, Chris Emezue, Varun Gangal, Cristina
  Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza
  Jolly, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood,
  Bodhisattwa Prasad Majumder, Pedro Henrique Martins, Angelina McMillan-Major, Simon Mille, Emiel
  van Miltenburg, Moin Nadeem, Shashi Narayan, Vitaly Nikolaev, Rubungo Andre Niyongabo, Salomey
  Osei, Ankur P. Parikh, Laura Perez-Beltrachini, Niranjan Ramesh Rao, Vikas Raunak, Juan Diego
  Rodriguez, Sashank Santhanam, João Sedoc, Thibault Sellam, Samira Shaikh, Anastasia Shimorina,
  Marco Antonio Sobrevilla Cabezudo, Hendrik Strobelt, Nishant Subramani, Wei Xu, Diyi Yang, Akhila
  Yerukola, and Jiawei Zhou. 2021. The GEM benchmark: Natural language generation, its evaluation
  and metrics. CoRR, abs/2102.01672.

David Grangier and Michael Auli. 2018. QuickEdit: Editing text & translations by crossing words out.
  In Proceedings of the 2018 Conference of the North American Chapter of the Association for
  Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 272–282, New
  Orleans, Louisiana. Association for Computational Linguistics.

Hamza Harkous, Isabel Groves, and Amir Saffari. 2020. Have your text and use it too! end-to-end
  neural data-to-text generation with semantic fidelity. CoRR, abs/2004.06577.

Ari Holtzman, Jan Buys, Maxwell Forbes, and Yejin Choi. 2019. The curious case of neural text
  degeneration. CoRR, abs/1904.09751.

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, and Weizhu
  Chen. 2021. Lora: Low-rank adaptation of large language models. CoRR, abs/2106.09685.

Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat,
  Fernanda B. Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016.
  Google’s multilingual neural machine translation system: Enabling zero-shot translation. CoRR,
  abs/1611.04558.

N. Keskar, B. McCann, L. R. Varshney, Caiming Xiong, and R. Socher. 2019. Ctrl: A conditional
  transformer language model for controllable generation. ArXiv, abs/1909.05858.

Daniel Khashabi, Gabriel Stanovsky, Jonathan Bragg, Nicholas Lourie, Jungo Kasai, Yejin Choi, Noah
  A. Smith, and Daniel S. Weld. 2021. GENIE: A leaderboard for human-in-the-loop evaluation of text
  generation. CoRR, abs/2101.06561.

Yuta Kikuchi, Graham Neubig, Ryohei Sasano, Hiroya Takamura, and Manabu Okumura. 2016. Controlling
  output length in neural encoder-decoders. In Proceedings of the 2016 Conference on Empirical
  Methods in Natural Language Processing, pages 1328–1338, Austin, Texas. Association for
  Computational Linguistics.

J. Peter Kincaid, Robert P Fishburne Jr., Richard L. Rogers, and Brad S. Chissom. 1975. Derivation
  of new readability formulas (automated readability index, fog count and flesch reading ease
  formula) for navy enlisted personnel.

Alon Lavie and Abhaya Agarwal. 2007. METEOR: An automatic metric for MT evaluation with high levels
  of correlation with human judgments. In Proceedings of the Second Workshop on Statistical Machine
  Translation, pages 228–231, Prague, Czech Republic. Association for Computational Linguistics.

Brian Lester, Rami Al-Rfou, and Noah Constant. 2021. The power of scale for parameter-efficient
  prompt tuning. CoRR, abs/2104.08691.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin
  Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for
  natural language generation, translation, and comprehension. In Proceedings of the 58th Annual
  Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for
  Computational Linguistics.

Xiang Lisa Li and Percy Liang. 2021. Prefix-tuning: Optimizing continuous prompts for generation.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization
  Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.

Pengfei Liu, Weizhe Yuan, Jinlan Fu, Zhengbao Jiang, Hiroaki Hayashi, and Graham Neubig. 2021a.
  Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language
  processing.

Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2021b. GPT
  understands, too. CoRR, abs/2103.10385.

Lajanugen Logeswaran, Ann Lee, Myle Ott, Honglak Lee, Marc’Aurelio Ranzato, and Arthur Szlam. 2020.
  Few-shot sequence learning with transformers. CoRR, abs/2012.09543.

Ilya Loshchilov and Frank Hutter. 2017. Fixing weight decay regularization in adam. CoRR,
  abs/1711.05101.

L. V. D. Maaten and Geoffrey E. Hinton. 2008. Visualizing data using t-sne. Journal of Machine
  Learning Research, 9:2579–2605.

Louis Martin, Angela Fan, Éric de la Clergerie, Antoine Bordes, and Benoît Sagot. 2020. Multilingual
  unsupervised sentence simplification. CoRR, abs/2005.00352.

Amit Moryossef, Ido Dagan, and Yoav Goldberg. 2019. Improving quality and efficiency in plan-based
  neural data-to-text generation.

Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018. Don’t give me the details, just the
  summary! topic-aware convolutional neural networks for extreme summarization. CoRR,
  abs/1808.08745.

Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, and Jeff Clune. 2016. Plug & play
  generative networks: Conditional iterative generation of images in latent space. CoRR,
  abs/1612.00005.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: A method for automatic
  evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for
  Computational Linguistics, ACL ’02, page 311–318, USA. Association for Computational Linguistics.

Nivranshu Pasricha, Mihael Arcan, and Paul Buitelaar. 2020. NUIG-DSI at the WebNLG+ challenge:
  Leveraging transfer learning for RDF-to-text generation. In Proceedings of the 3rd International
  Workshop on Natural Language Generation from the Semantic Web (WebNLG+), pages 137–143, Dublin,
  Ireland (Virtual). Association for Computational Linguistics.

Jeffrey Pennington, Richard Socher, and Christopher         Yu Sun, Shuohuan Wang, Shikun Feng, Siyu Ding,
   Manning. 2014. GloVe: Global vectors for word              Chao Pang, Junyuan Shang, Jiaxiang Liu, Xuyi
   representation. In Proceedings of the 2014 Confer-         Chen, Yanbin Zhao, Yuxiang Lu, Weixin Liu, Zhi-
   ence on Empirical Methods in Natural Language              hua Wu, Weibao Gong, Jianzhong Liang, Zhizhou
   Processing (EMNLP), pages 1532–1543, Doha,                 Shang, Peng Sun, Wei Liu, Xuan Ouyang, Dianhai
   Qatar. Association for Computational Linguistics.          Yu, Hao Tian, Hua Wu, and Haifeng Wang. 2021.
                                                              ERNIE 3.0: Large-scale knowledge enhanced pre-
Matthew E. Peters, Sebastian Ruder, and Noah A.               training for language understanding and generation.
 Smith. 2019. To tune or not to tune? adapting pre-           CoRR, abs/2107.02137.
 trained representations to diverse tasks. In Proceed-
 ings of the 4th Workshop on Representation Learn-          Swabha Swayamdipta, Sam Thomson, Kenton Lee,
 ing for NLP (RepL4NLP-2019), pages 7–14, Flo-                Luke Zettlemoyer, Chris Dyer, and Noah A. Smith.
 rence, Italy. Association for Computational Linguis-         2018. Syntactic scaffolds for semantic structures. In
 tics.                                                        EMNLP.

                                                            Maria Tsimpoukelli, Jacob Menick, Serkan Cabi,
Dragomir R. Radev, Rui Zhang, Amrit Rau, Abhi-
                                                             S. M. Ali Eslami, Oriol Vinyals, and Felix Hill. 2021.
  nand Sivaprasad, Chiachun Hsieh, Nazneen Fatema
                                                             Multimodal few-shot learning with frozen language
  Rajani, Xiangru Tang, Aadit Vyas, Neha Verma,
                                                             models. CoRR, abs/2106.13884.
  Pranav Krishna, Yangxiaokang Liu, Nadia Irwanto,
  Jessica Pan, Faiaz Rahman, Ahmad Zaidi, Murori            Thomas Wolf, Lysandre Debut, Victor Sanh, Julien
  Mutuma, Yasin Tarabar, Ankit Gupta, Tao Yu,                 Chaumond, Clement Delangue, Anthony Moi, Pier-
  Yi Chern Tan, Xi Victoria Lin, Caiming Xiong,               ric Cistac, Tim Rault, Rémi Louf, Morgan Funtow-
  and Richard Socher. 2020. DART: open-domain                 icz, Joe Davison, Sam Shleifer, Patrick von Platen,
  structured data record to text generation. CoRR,            Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu,
  abs/2007.02871.                                             Teven Le Scao, Sylvain Gugger, Mariama Drame,
                                                              Quentin Lhoest, and Alexander M. Rush. 2020.
Evani Radiya-Dixit and Xin Wang. 2020. How fine can           Transformers: State-of-the-art natural language pro-
  fine-tuning be? learning efficient language models.         cessing. In Proceedings of the 2020 Conference on
  In Proceedings of the Twenty Third International            Empirical Methods in Natural Language Processing:
  Conference on Artificial Intelligence and Statistics,       System Demonstrations, pages 38–45, Online. Asso-
  volume 108 of Proceedings of Machine Learning Re-           ciation for Computational Linguistics.
  search, pages 2435–2443, Online. PMLR.
                                                            Wei Xu, Courtney Napoles, Ellie Pavlick, Quanze
Colin Raffel, Noam Shazeer, Adam Roberts, Kather-             Chen, and Chris Callison-Burch. 2016. Optimizing
  ine Lee, Sharan Narang, Michael Matena, Yanqi               statistical machine translation for text simplification.
  Zhou, Wei Li, and Peter J. Liu. 2020. Exploring            Transactions of the Association for Computational
  the limits of transfer learning with a unified text-to-    Linguistics, 4:401–415.
  text transformer. Journal of Machine Learning Re-
  search, 21(140):1–67.                                     Dian Yu, Kenji Sagae, and Zhou Yu. 2021. Attribute
                                                              alignment: Controlling text generation from pre-
Leonardo F. R. Ribeiro, Martin Schmitt, Hinrich               trained language models. CoRR, abs/2103.11070.
  Schütze, and Iryna Gurevych. 2020. Investigating
                                                            Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Pe-
  pretrained language models for graph-to-text gener-
                                                               ter J. Liu. 2019. PEGASUS: pre-training with ex-
  ation. arXiv.
                                                               tracted gap-sentences for abstractive summarization.
                                                               CoRR, abs/1912.08777.
Timo Schick and H. Schutze. 2021. Exploiting cloze-
  questions for few-shot text classification and natural    Xingxing Zhang and Mirella Lapata. 2017. Sen-
  language inference. In EACL.                                tence simplification with deep reinforcement learn-
                                                              ing. arXiv preprint arXiv:1703.10931.
Thomas Scialom, Louis Martin, Jacopo Staiano,
  Éric Villemonte de la Clergerie, and Benoît Sagot.
  2021. Rethinking automatic evaluation in sentence
  simplification. CoRR, abs/2104.07560.

Noam Shazeer and Mitchell Stern. 2018. Adafactor:
  Adaptive learning rates with sublinear memory cost.
  CoRR, abs/1804.04235.

Matthew Snover, Bonnie Dorr, Richard Schwartz, Lin-
 nea Micciulla, and Ralph Weischedel. 2006. A study
 of translation error rate with targeted human annota-
 tion. In In Proceedings of the Association for Ma-
 chine Transaltion in the Americas (AMTA 2006.