GeDi: Generative Discriminator Guided Sequence Generation
                      WARNING: This paper contains GPT-3 outputs which are offensive in nature.

        Ben Krause∗, Akhilesh Deepak Gotmare∗, Bryan McCann†, Nitish Shirish Keskar
                     Shafiq Joty, Richard Socher†, Nazneen Fatema Rajani

                                    Salesforce Research
                      {bkrause,akhilesh.gotmare}@salesforce.com

∗Equal contribution.
†Work performed while at Salesforce Research.

Abstract

While large-scale language models (LMs) are able to imitate the distribution of natural language well enough to generate realistic text, it is difficult to control which regions of the distribution they generate. This is especially problematic because datasets used for training large LMs usually contain significant toxicity, hate, bias, and negativity. One promising approach to address this is to use discriminators to guide decoding from LMs, but existing methods for this are too slow to be useful in practice for many applications. We present GeDi as a significantly more efficient discriminator-based approach for guiding decoding. GeDi guides generation at each step by computing classification probabilities for all possible next tokens via Bayes rule by normalizing over two class-conditional distributions: one conditioned on the desired attribute, or control code, and another conditioned on the undesired attribute, or anti control code. We find that GeDi gives controllability on par with or better than previous controllable generation methods. GeDi results in significantly faster generation speeds than the only previous method that achieved comparable controllability in our experiments. We also show that GeDi can make GPT-2 and GPT-3 significantly less toxic while maintaining linguistic fluency, without sacrificing significantly on generation speed. Lastly, we find that training GeDi on only three topics allows us to controllably generate new topics zero-shot from just a keyword.

1   Introduction

Natural language generation has seen great progress with the advent of Transformers (Vaswani et al., 2017) and large-scale training (Radford et al., 2017, 2018, 2019; Brown et al., 2020). Large language models (LMs) like GPT-2 (Radford et al., 2019) and GPT-3 (Brown et al., 2020) are able to learn the distribution of their training set well enough to generate realistic text. However, simply imitating the distribution of the training data during generation has many drawbacks (Bender et al., 2021); large-scale text training sets are crawled from the web, which is imbued with toxicity, bias, and misinformation. Methods for controlling generation are valuable for making LMs trained on such data safer and more useful for downstream applications.

Existing approaches to controlling LMs have limitations. Class-conditional LMs (CC-LMs) such as CTRL (Keskar et al., 2019) attempt to control text generation by conditioning on a control code, which is an attribute variable representing a data source. However, using a specific control code can reduce sample diversity across prompts, as samples will generally resemble the data source of the control code.

Another approach for controlling LMs is to use discriminators to guide decoding, but existing methods to do this are very computationally intensive. Weighted decoding (Holtzman et al., 2018) requires feeding candidate next tokens into a discriminator, and thus scales linearly in computation with the number of tokens to be re-weighted. Plug and Play LM (PPLM; Dathathri et al., 2020) applies up to 10 updates to the generating LM's latent states per time step using gradients from a discriminator, also making it many times slower than generating from the LM directly.

We present GeDi (pronounced "Jedi"; code available at https://github.com/salesforce/GeDi) as a significantly more efficient algorithm for discriminator-guided decoding. Our proposed method uses class-conditional LMs as generative discriminators (GeDis) to steer language generation towards desired attributes. We use GeDis to compute classification likelihoods for all candidate next tokens during generation using Bayes rule, saving many thousand-fold in computation as compared with using a standard (non-generative) discriminator of the same size to compute this for large vocabulary sizes. We then show how these likelihoods can guide decoding from large language models via weighted decoding.

Models can assign probabilities to sequences by iteratively predicting a distribution over the next token given the previous tokens. Generating from language models requires iteratively sampling from Pθ(xt | x<t).
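As a point of reference for the guided decoding discussed throughout the paper, the following is a minimal sketch (not code from the paper) of plain autoregressive sampling from Pθ(xt | x<t) with a Hugging Face GPT-2 checkpoint; the prompt and the number of sampled tokens are arbitrary choices.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The party was", return_tensors="pt").input_ids
for _ in range(20):                                       # sample 20 tokens
    with torch.no_grad():
        logits = model(ids).logits[0, -1]                 # unnormalized Ptheta(x_t | x_<t)
    next_id = torch.multinomial(torch.softmax(logits, dim=-1), num_samples=1)
    ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)    # feed the sampled token back in

print(tok.decode(ids[0]))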
Figure 1: A toy example of how GeDi-guided decoding uses Bayes rule to efficiently compute classification probabilities for possible next tokens at each generation timestep using only element-wise operations. These classification probabilities can then be used to guide generation from a language model (e.g., GPT-2) to achieve attribute control across domains. If a class-conditional language model was trained on movie reviews for sentiment control, its direct class-conditional predictions will be biased towards predicting movie review words (illustrated by next word prediction of "cinematic"). However, the bias towards movie reviews can be canceled out by contrasting the predictions of opposing control codes via Bayes rule.

Figure 2: A toy example of using a language model with a discriminator head to guide next token generation. This requires feeding in each word in the vocabulary to compute the probability that the resulting generation would have positive sentiment, and using these probabilities to guide the base language model (e.g., GPT-2) towards positive sentiment. This requires |V| times the amount of computation to compute the final hidden states of the network as compared with using GeDi, if computing for the full vocabulary and using the same neural architecture for both methods.
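To make the cost contrast in Figure 2 concrete, here is a minimal sketch (not from the paper) of discriminator re-scoring in the style of Figure 2: each candidate next token is appended to the prompt and fed through an external sentiment classifier, one forward pass per candidate. The classifier checkpoint, the positive-class index, and the restriction to a top-20 shortlist are illustrative assumptions; scoring the full vocabulary this way would take one classifier pass per token, which is the |V|-fold overhead that GeDi's element-wise Bayes-rule computation avoids.

import torch
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer)

# Base LM proposes candidates; a separate discriminator re-scores them (Figure 2 style).
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")
clf_name = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed sentiment classifier
clf_tok = AutoTokenizer.from_pretrained(clf_name)
clf = AutoModelForSequenceClassification.from_pretrained(clf_name)

prompt = "The party was"
ids = lm_tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    next_logits = lm(ids).logits[0, -1]          # LM's distribution over the next token
topk = torch.topk(next_logits, k=20)             # shortlist to keep the loop cheap

scores = []
for tok_id in topk.indices:
    candidate = prompt + lm_tok.decode(int(tok_id))   # one classifier pass per candidate
    clf_in = clf_tok(candidate, return_tensors="pt")
    with torch.no_grad():
        p_pos = torch.softmax(clf(**clf_in).logits, dim=-1)[0, 1]  # assumed: index 1 = positive
    scores.append(p_pos.item())

# Re-weight the LM's top-k candidates by the discriminator's positive-sentiment probability.
reweighted = torch.softmax(topk.values, dim=-1) * torch.tensor(scores)
best = topk.indices[reweighted.argmax()]
print(lm_tok.decode(int(best)))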

Equation (4) is normalized over all xt in the vocabulary to obtain Pw(xt | x<t, c). [...] that this does not give strong attribute control.

In practice, applying Equation (5) to long sequences often results in poorly calibrated distributions later in the sequence that assign classification probabilities of 1 or 0 to all candidate next words, which provides no useful signal. We addressed this by normalizing probabilities by current sequence length.

Algorithm 1 GeDi-guided decoding
Inputs: base LM PLM, CC-LM Pθ, vocabulary V, posterior mixing weight ω, decoding scheme
 1: P(x|c) ← 1
 2: P(x|c̄) ← 1
 3: for t = 1, ..., N do
 4:     pLM ← [PLM(xt = v | x<t)] for all v ∈ V
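A minimal sketch, assuming a two-class setup with a uniform class prior and a 1/t length normalization, of the per-step Bayes-rule classification that the (truncated) algorithm above points at: given next-token distributions from the CC-LM under the control code c and the anti control code c̄, it returns, for every candidate token in the vocabulary, the probability that the resulting sequence has the desired attribute. The function and variable names are illustrative, not the paper's implementation.

import torch

def gedi_classify_step(logp_c, logp_cbar, running_c, running_cbar, t, prior_c=0.5):
    """Bayes-rule attribute probabilities for every candidate next token.

    logp_c, logp_cbar: [V] log Ptheta(x_t = v | x_<t, c) and log Ptheta(x_t = v | x_<t, c_bar)
    running_c, running_cbar: scalars, log-likelihood of the tokens generated so far
                             under each control code
    Returns a [V] vector of P(c | x_t = v, x_<t) and the updated running log-likelihoods.
    """
    seq_c = running_c + logp_c          # sequence log-likelihood if token v were chosen
    seq_cbar = running_cbar + logp_cbar

    # Length normalization (assumed here as a 1/t exponent, i.e. average log-likelihood
    # per token) keeps the classifier from saturating at 0 or 1 on long sequences.
    norm_c, norm_cbar = seq_c / t, seq_cbar / t

    # Bayes rule over the two classes with an assumed uniform prior.
    log_num = torch.log(torch.tensor(prior_c)) + norm_c
    log_alt = torch.log(torch.tensor(1.0 - prior_c)) + norm_cbar
    p_attr = torch.exp(log_num - torch.logsumexp(torch.stack([log_num, log_alt]), dim=0))
    return p_attr, seq_c, seq_cbar

# Toy usage with a 5-token vocabulary at decoding step t = 3.
V = 5
p_attr, _, _ = gedi_classify_step(torch.randn(V).log_softmax(0), torch.randn(V).log_softmax(0),
                                  running_c=-4.2, running_cbar=-3.9, t=3)
print(p_attr)  # one attribute probability per candidate next token

These per-token attribute probabilities are what the weighted decoding step combines with the base LM distribution PLM(xt | x<t), as described in the surrounding text.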
2017; Yu et al., 2017; Hu et al., 2017) or using a discriminator to guide decoding (Ghazvininejad et al., 2017; Holtzman et al., 2018; Dathathri et al., 2020). Keskar et al. (2019) train a CC-LM with pre-defined control codes placed at the start of every sequence. GeDi also uses CC-LMs, but instead of generating from them directly, GeDi uses them as discriminators to guide decoding from another language model. This is much more computationally efficient than previous methods for discriminator-guided decoding. Holtzman et al. (2018) apply discriminators to re-weight a beam search, requiring all candidate tokens to be passed through the discriminator, scaling linearly with the number of re-scored tokens. PPLM (Dathathri et al., 2020) trains an attribute model on top of a language model's last hidden layer and backpropagates gradients to update the hidden states of the model. This is computationally intensive because it requires multiple forward and backward passes for each generation step. For instance, applying PPLM with 10 update steps as done in Dathathri et al. (2020) requires roughly 20 times the computation (10 forward passes and 10 backward passes) of base LM generation at the first decoding timestep. This factor also increases as the sequence length increases, since PPLM updates the previously stored keys and values. GeDi, in comparison, only adds constant overhead that is independent of the size of the base LM, and this constant will be minimal if the GeDi is significantly smaller than the base LM.

GeDi also relates to the rational speech acts framework for computational pragmatics (Frank and Goodman, 2012; Goodman and Stuhlmüller, 2013), where a "listener" model and a "speaker" model interactively generate a sequence such that the listener can recover the input. GeDi most closely relates to distractor-based pragmatics (Andreas and Klein, 2016; Cohn-Gordon et al., 2018; Shen et al., 2019), where a single model processes a true input and a distractor input, and uses Bayes rule to produce text that fits the true input but not the distractor input. GeDi differs from previous pragmatics-based approaches in that it trains a separate class-conditional language model (which acts as the listener) on a single attribute, allowing that attribute to be isolated, and uses it to guide generation from a separate language model (which acts as the speaker).

Other previous works seek to understand and address toxicity and hate speech in language generation. RealToxicityPrompts (Gehman et al., 2020) gives an automatic evaluation of toxicity using generations from different language models on a set of webtext prompts. Gehman et al. (2020) also test methods for mitigating toxicity, and find that applying PPLM was more effective than simpler decoding-based detoxification methods such as swear word filters. Xu et al. (2020) develop a human-in-the-loop method for adversarially probing toxic responses in conversational agents, and train a model to give preset responses when encountering potentially unsafe probes. Other work has focused on removing gender bias from language models (Bordia and Bowman, 2019; Dinan et al., 2020; Bolukbasi et al., 2016). Related to the problem of addressing toxicity in generation is toxicity detection, which can be performed using the Perspective API or a classifier trained on a labelled toxicity dataset such as the Jigsaw Toxic Comment Classification Dataset (Borkan et al., 2019). Toxicity detection is difficult, as toxicity labelling is subjective and often has poor annotator agreement (Waseem, 2016; Ross et al., 2017). Additionally, existing toxicity classifiers are often biased in that they overestimate the toxicity of text that mentions sexual orientations or racial minorities (Dixon et al., 2018; Sap et al., 2019; Hutchinson et al., 2020).

5   Experiments

We experiment with GeDi-guided decoding for sentiment, detoxification, and topic control. We fine-tune GPT2-medium (345M parameters) (Radford et al., 2019) using the loss in Equation (3) with control codes specific to each task to form a class-conditional language model. We use these CC-LMs as GeDis to guide generation from GPT2-XL (1.5B parameters), and from GPT-3 (Brown et al., 2020) in our detoxification experiments. All experiments were performed using adaptations of Huggingface Transformers (Wolf et al., 2020).

We include experiments with greedy decoding with a repetition penalty (Keskar et al., 2019) (conditioning on varying prompts to give diversity across generations), which we found to give the best quality generations, and with top-p sampling (Holtzman et al., 2020). Our hyper-parameter settings for GeDi-guided generation are given in Appendix C.1. We also perform ablation studies in Appendix D, and find that combining both the weighted decoding and filtering heuristics appears to be beneficial although is not critical to the success of the method,
and that applying a very small LSTM (Hochreiter and Schmidhuber, 1997) discriminator that can match the efficiency of GeDi is not as effective for controlling generation.

Model                        Generation time (sec/token)
GPT2-XL                      0.060
GeDi-guided (w/ GPT2-XL)     0.095
PPLM (w/ GPT2-XL)            3.116

Table 1: Average generation time in seconds per token for generating sequences of length 256 on a V100 GPU.

5.1   Controlling sentiment of generations from book prompts

We experiment with GeDi-guided decoding from GPT-2 for sentiment control using CC-LMs fine-tuned on IMDb movie reviews. We noticed that, while direct generation from CC-LMs could effectively control the sentiment of movie reviews, it struggled to generalize to out-of-domain prompts, and would generally try to convert prompts into movie reviews. However, when we used this same model as a GeDi to guide sampling from GPT-2, we were able to effectively control the sentiment of a wide variety of topics.

To experimentally verify that GeDi can generalize the concepts of "positivity" and "negativity" beyond its training domain, we evaluate on a task where models conditionally generate text from the start of book chapters from Bookcorpus (Zhu et al., 2015); each prompt is at least 150 characters and ends on the first word break after the minimum length. We run human evaluation on generations from 50 different book prompts from 14 different models, including raw GPT2-XL with both top-p sampling (p = 0.9) and greedy decoding (repetition penalty of 1.2), and the following models with both positive and negative sentiment: 1. GPT2-XL guided by GeDi, greedy decoding (repetition penalty of 1.2). 2. GPT2-XL guided by GeDi, top-p sampling with p = 0.9 (repetition penalty of 1.05). 3. PPLM (w/ GPT2-XL), greedy decoding (repetition penalty of 1.2). 4. PPLM (w/ GPT2-XL), top-p sampling with p = 0.9. 5. CC-LM trained on movie reviews (same model used as GeDi, but with direct CTRL-style generation), greedy decoding (repetition penalty of 1.2). 6. CTRL (Keskar et al., 2019) using control codes for Amazon review sentiment, greedy decoding (repetition penalty of 1.2).

CTRL was applied using the control codes corresponding to positive and negative Amazon reviews used during training by Keskar et al. (2019). The PPLM discriminator was trained on SST-5 as in Dathathri et al. (2020), with the step size parameter retuned for GPT2-XL (since Dathathri et al. (2020) used GPT2-medium). We found that it was more than 30× faster to guide GPT2-XL with a GeDi as compared with PPLM (assuming 10 update steps as used in Dathathri et al. (2020) and in our experiments), as shown in Table 1.

Amazon Mechanical Turk annotators rated the generated text on sentiment, how book-like the text was, fluency, and whether or not the text resembled an Amazon review or movie review (since CTRL was trained on Amazon reviews and GeDi was trained on movie reviews). Instructions given to annotators are provided in Appendix G. The results of the experiment are given in Table 2. Using GeDi to guide GPT2-XL generated book-like and linguistically fluent text while giving strong control over the tone. In the greedy setting, GeDi also gave roughly equivalent positive sentiment control and statistically significantly stronger negative sentiment control compared with PPLM (p < 0.01 by a two-tailed Wilcoxon signed rank test). In the top-p setting, GeDi achieved statistically significantly stronger sentiment control than PPLM for both positive and negative sentiment (p = 0.01 and p = 0.005 for positive and negative sentiment respectively). p-values for all significance tests are given in Appendix E. We include samples from all greedy decoding models in Tables 11, 12, and 13.

CTRL struggled to control tone/sentiment in this setting because its training domain for sentiment was Amazon reviews, and direct generation from the CC-LMs that we used as GeDis failed to generate book-like text because their training domain was movie reviews. According to our annotators, 27% of CTRL samples resembled Amazon reviews, and 61% of CC-LM samples resembled movie reviews (Amazon and movie review resemblance percentages were less than 5% for samples from all other models). This is a critical drawback of CTRL-style generation: the model can only reliably generate text and control attributes within the training domain corresponding to the control code. Samples that illustrate this are given in Table 14.
Model                      Positivity   Book-like ↑   Fluency ↑   Label fidelity ↑   Perplexity score ↓
                 GeDi-guided-pos (greedy)     3.73          4.18         4.43           96 %                12.8
                 GeDi-guided-pos (top-p)      3.82          4.17         4.35          100 %                17.3
                 PPLM-pos (greedy)            3.70          4.31         4.37           76 %                14.0
                 PPLM-pos (top-p)             3.47          4.24         4.00           66 %                21.4
                 CC-LM-pos (greedy)           3.13          3.18         3.83           62 %                14.7
                 CTRL-pos (greedy)            2.85          3.76         3.99           48 %                 9.7
                 GPT2-XL (greedy)             3.16          4.45         4.35             -                 10.4
                 GPT2-XL (top-p)              2.89          4.45         4.16             -                 13.8
                 CTRL-neg (greedy)            2.87          3.59         4.07           48 %                 9.7
                 CC-LM-neg (greedy)           2.30          2.70         3.68           76 %                14.3
                 PPLM-neg (top-p)             2.56          4.15         4.03           62 %                32.3
                 PPLM-neg (greedy)            2.57          4.31         4.21           78 %                15.8
                 GeDi-guided-neg (top-p)      2.04          4.01         3.88           98 %                26.7
                 GeDi-guided-neg (greedy)     2.15          4.21         4.06           96 %                14.2

Table 2: Human and automatic evaluation for sentiment on book text generation (rated for positivity, book resem-
blance and fluency all on a scale of 1-5). For human evaluation, we average three annotations on generations from
50 prompts for each model, where prompts are from the start of book chapters, and are a minimum of 150 char. For
automatic evaluation, we use a RoBERTa classifier trained on SST-2 (Socher et al., 2013) to measure label fidelity
(how often the sample is classified as having the same label as the control code), and measure the perplexity of
generations under GPT-2 to compute perplexity scores. We compare using a CC-LM as a GeDi to guide GPT2-XL
(GeDi-guided), vs. direct class conditional generation (CC-LM). GeDi gives the strongest control over sentiment.
PPLM also gives strong sentiment control, but its generation is roughly 30× slower.

Discriminator-guided methods GeDi and PPLM result in text rated more book-like that very rarely, if ever, reverts back to the domain that the discriminator was trained on. However, as compared with PPLM, GeDi was able to generate 30× faster, with sentiment control that was on par with or better than PPLM in all settings.

5.2   Detoxifying GPT-2 and GPT-3

We test GeDi's ability to detoxify language generation. We train a CC-LM on the Jigsaw Toxic Comment Classification Dataset (Borkan et al., 2019), which contains text samples labeled as "toxic" or "non-toxic". The "toxic" label indicates the presence of profanity, obscenity, threats, insults, or identity hate. We train the model on an even split of toxic and non-toxic examples, with "clean" and "dirty" control codes to specify non-toxic and toxic text, respectively. For evaluation, we use generations conditioned on RealToxicityPrompts (Gehman et al., 2020). We consider two toxicity evaluations: one based on automatic toxicity evaluations from a large number of prompts following Gehman et al. (2020), and one using human annotations on a smaller number of trigger prompts that tend to lead to especially toxic generations from LMs. We experiment with the same models as in the previous section (except for pretrained CTRL, which does not have a detoxification control code), but also add results using: 1. GPT-3 via the OpenAI API, greedy (repetition penalty of 1.2); and 2. GPT-3 via the OpenAI API, guided by GeDi, greedy (repetition penalty of 1.2). We add details of how we apply GeDi to GPT-3 in Appendix B.

For our large-scale automatic evaluation, we select 5000 prompts from RealToxicityPrompts at random and draw generations from each model. Following Gehman et al. (2020), we measure the expected toxicity score and toxicity probability separately for generations from toxic and non-toxic prompts using the Perspective API (https://www.perspectiveapi.com/), which is a toxicity classifier that returns a probability between 0 and 1 that the submitted text is toxic. The expected toxicity is given by the average classification probability under Perspective's toxicity classifier of continuations from a given model, whereas the toxicity probability is the fraction of generations that the Perspective API classifies as having a toxicity probability greater than 0.5. For models that use sampling, we draw 10 generations from each prompt, and use the most toxic continuation as evaluated by the Perspective API to measure all statistics, following the expected max toxicity scores and probabilities used by Gehman et al. (2020). The results are given in Table 3. GeDi was able to reduce the toxicity of GPT-2 and GPT-3 and gave a stronger detoxification effect as compared with PPLM (the reductions in expected toxicity of GeDi vs. PPLM, GeDi vs. GPT-2, and GeDi vs. GPT-3 were strongly statistically significant in all comparisons by a paired sample t-test).
Model                                                     Expected toxicity ↓                    Toxicity probability ↓
                                                          toxic prompt    non-toxic prompt       toxic prompt    non-toxic prompt
GPT2-XL (top-p, most toxic of 10 per prompt)              0.79 (0.14)     0.35 (0.23)            0.98            0.25
GeDi-guided GPT-2 (top-p, most toxic of 10 per prompt)    0.71 (0.16)     0.21 (0.14)            0.89            0.04
PPLM (top-p, most toxic of 10 per prompt)                 0.75 (0.14)     0.30 (0.19)            0.94            0.15
GPT2-XL (greedy)                                          0.67 (0.18)     0.17 (0.16)            0.79            0.05
GeDi-guided GPT-2 (greedy)                                0.61 (0.21)     0.12 (0.11)            0.67            0.01
PPLM (greedy)                                             0.63 (0.19)     0.14 (0.12)            0.71            0.02
CC-LM (greedy)                                            0.69 (0.19)     0.17 (0.18)            0.83            0.10
GPT-3 da-vinci (greedy)                                   0.67 (0.18)     0.17 (0.16)            0.79            0.05
GeDi-guided GPT-3 (greedy)                                0.61 (0.22)     0.11 (0.10)            0.69            0.01

Table 3: RealToxicityPrompts automated toxicity evaluation. We measure the expected toxicity score (with standard deviation given in parentheses) and toxicity probability of continuations from toxic (Perspective toxicity score > 0.5) and non-toxic (Perspective toxicity score < 0.5) prompts for 9 models. Generations from 5000 prompts were used (1054 toxic, 3946 non-toxic, approximately matching the ratios used by Gehman et al. (2020)). For models that use top-p sampling, we measure the expected toxicity and toxicity probability of the most toxic sample out of 10 generations per prompt. For greedy models we simply average these metrics across prompts. GeDi significantly reduced the toxicity of GPT-2 and GPT-3 and resulted in a stronger detoxification effect as compared with PPLM.

The advantage of GeDi over PPLM was especially pronounced in the case of top-p sampling, where PPLM generated at least one toxic sample (out of 10 samples per prompt) from a non-toxic prompt more than 3 times as often, suggesting that GeDi is more robust to worst-case scenarios when applying sampling.

We also applied human evaluation to measure toxicity using a smaller number of prompts that probe LMs to generate toxic text. To identify strong triggers, we selected a subset of prompts with Perspective API toxicity probabilities between 0.3 and 0.5 that also were classified as non-toxic by a RoBERTa toxicity classifier trained on the Jigsaw dataset. We used GPT2-XL to draw 32 samples from each prompt, and selected the 100 prompts with the highest average toxicity probability over their 32 completions according to the RoBERTa toxicity classifier. Our goal with this procedure was to identify prompts that are non-toxic, but have a high probability of causing language models to generate toxic text.

We ran human evaluation to measure toxicity and linguistic fluency [1: very low fluency, 5: very high fluency]. Results are given in Table 4, and generations from evaluated models are given in Table 15. GeDi was able to significantly reduce the toxicity of GPT-2 and GPT-3 (p < 0.001 by a two-proportion z-test in all settings). GeDi resulted in similar toxicity as compared with PPLM for greedy decoding and was significantly less toxic than PPLM for sampling (p = 0.02), while also achieving 30× faster generation speeds.

Model                          Toxicity ↓ (human eval)    Fluency ↑ (human eval)
GPT2-XL (top-p)                49 %                       4.10
GeDi-guided GPT-2 (top-p)      16 %                       4.07
PPLM (top-p)                   30 %                       4.19
GPT2-XL (greedy)               60 %                       4.32
GeDi-guided GPT-2 (greedy)     27 %                       4.47
PPLM (greedy)                  28 %                       4.41
CC-LM (greedy)                 37 %                       4.19
GPT-3 da-vinci (greedy)        57 %                       4.32
GeDi-guided GPT-3 (greedy)     21 %                       4.23

Table 4: Human evaluation of toxicity on 100 trigger prompts. We collect 3 annotations of toxicity labels (where we classify each sample based on majority) and linguistic fluency scores (scale of 1-5) for each model. We find that GeDi is effective for detoxifying GPT-2 and GPT-3 while maintaining fluency.

5.3   Extending GeDi to the multi-class setting

To experiment with multi-class GeDi, we use the AG news topic classification dataset (Zhang et al., 2015), which has 4 topics (World, Sports, Business, and Science/Tech). In order to test GeDi's ability to generate never-before-seen classes zero-shot, we trained 4 different CC-LMs, each trained on only 3 out of the 4 AG news classes, with one class held out. We then compare direct (CTRL-style) generation from CC-LMs with GeDi-guided decoding from GPT-2, on topics included in training and on held-out (zero-shot) topics. To evaluate topic relevance, we use a RoBERTa classifier trained on all 4 AG news topics to estimate the topic of each generation. We obtain generations conditioning on short (minimum 30 characters, ending on a space) prompts from the Multi-News dataset (Fabbri et al., 2019), and report results in Table 5.
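As a concrete illustration of the label-fidelity metric used above, the sketch below classifies each generation with a topic classifier and counts how often the predicted topic matches the control code that produced it. The classifier checkpoint, the assumed AG News label order, and the toy generations are placeholders rather than the paper's actual evaluation artifacts (the paper trains its own RoBERTa topic classifier).

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed off-the-shelf AG News topic classifier; any 4-way topic model would serve here.
clf_name = "textattack/roberta-base-ag-news"
tok = AutoTokenizer.from_pretrained(clf_name)
clf = AutoModelForSequenceClassification.from_pretrained(clf_name)

# Assumed label order of the ag_news dataset: 0=World, 1=Sports, 2=Business, 3=Sci/Tech.
AG_LABELS = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}

# (control code used for generation, generated text) pairs: placeholders for real model outputs.
generations = [
    ("Sports", "The home side sealed the title with a late goal in front of a packed stadium."),
    ("Business", "Shares of the retailer fell after the company cut its quarterly forecast."),
]

def label_fidelity(samples):
    """Fraction of samples whose predicted topic matches the control code that generated them."""
    hits = 0
    for code, text in samples:
        inputs = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            pred = clf(**inputs).logits.argmax(dim=-1).item()
        hits += int(AG_LABELS[pred] == code)
    return hits / len(samples)

print(f"label fidelity: {label_fidelity(generations):.0%}")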
Topic      Model          Trained on class (Label fidelity)    Zero-shot (Label fidelity)
World      GPT2-XL        -                                    22 %
           GeDi-guided    72 %                                 30 %
           CC-LM          53 %                                 28 %
Sports     GPT2-XL        -                                    6 %
           GeDi-guided    91 %                                 62 %
           CC-LM          49 %                                 12 %
Business   GPT2-XL        -                                    4 %
           GeDi-guided    55 %                                 36 %
           CC-LM          35 %                                 10 %
Science    GPT2-XL        -                                    68 %
           GeDi-guided    83 %                                 84 %
           CC-LM          59 %                                 50 %

Table 5: Automatic label fidelity on topics, measured by how often a RoBERTa classifier's label matches the control code used to generate the sample. We trained 4 different CC-LMs, each with 1 class held out, and we considered direct CTRL-style generation (CC-LM) and GeDi-guided decoding from these models. The "trained on class" label fidelity averages the label fidelities from the 3 models trained with the given class as one of the training classes. The "zero-shot" label fidelity for each class uses generations from the model trained on the other 3 classes, using a zero-shot control code for the desired class. We include results from raw GPT2-XL to show how much GeDi and CC-LM are influencing generation. We find that GeDi is able to influence generation more effectively than CC-LM when conditioning on both training classes and held-out classes.

GeDi was able to generate topics included in training with a higher label fidelity than CTRL-style generation from a CC-LM. Unlike CC-LM, GeDi was able to bias generation towards never-before-seen zero-shot control codes that are held out from training. GeDi's ability to generalize to new control codes zero-shot gives the ability to generate text corresponding to many topics and subtopics. This ability likely emerges because generative classifiers can classify unseen topics zero-shot from learned word embeddings (Yogatama et al., 2017), and GeDi uses a generative classifier to guide generation. While GPT-3 can also generate topics zero-shot by conditioning on prompts such as "Write an article about sports:", zero-shot generation with GeDi does not necessarily need to be an article or have any other constraints that would come about from the prompt. We provide examples of zero-shot topic generation with GeDi in Table 6.

Topic      GeDi-guided generation

Space      In a shocking finding NASA have announced the discovery of a mysterious object orbiting our nearest neighbour, Proxima Centauri.
           Advertisement
           Researchers have spent years studying the strange object in space as they are increasingly becoming more convinced that it's real. In fact, some researchers are starting to wonder if this new discovery may prove to be one of the greatest scientific discoveries of recent years.

Fire       In a shocking finding police believe two fire crews, including a senior paramedic, were deliberately set alight as part of a revenge plot.
           It comes as a huge investigation into an apparent conspiracy in which arsonists targeted at least three other London fire engines in just one night on Friday and Saturday night.

History    In a shocking finding historians believe to be "unprecedented" British documents have been unearthed which reveal the true history of King Richard II and show that he was not only the son of Godfrey of Gloucester, but also descended from King Henry VIII.
           Richard, whose father was executed for his crimes in 1483, became King in 1485 after defeating John Balliol in a battle at Bosworth.

Table 6: Controlling topic of generation (zero-shot) with GeDi (greedy decoding). This topic GeDi was trained on only three classes: science, sports, and business. The topics of Space, Fire, and History were not a part of the GeDi training set. The boldfaced string "In a shocking finding" indicates the context provided to the language model, followed by its generation.

6   Conclusion

We present GeDi as an approach for controllable generation that uses generative discriminators to classify candidate next tokens on the fly during inference, making it far more efficient than previous methods that use discriminators to guide decoding. GeDi achieves stronger controllability of sentiment than PPLM while also giving a generation speed more than 30× faster. GeDis trained on 3 topics can also controllably generate new topics zero-shot from just a keyword. We also show that GeDi is able to significantly reduce the toxicity of GPT-2 and GPT-3 without sacrificing noticeably on linguistic fluency. GeDi moves towards unifying natural language generation with classification, and suggests that we may be able to efficiently generate text that corresponds to any attribute that we can accurately classify. This could have broad implications for improving text generation systems by making them more controllable.
References                                                  deep bidirectional transformers for language under-
                                                            standing. In Proceedings of the 2019 Conference
Jacob Andreas and Dan Klein. 2016. Reasoning about          of the North American Chapter of the Association
   pragmatics with neural listeners and speakers. In        for Computational Linguistics: Human Language
   Proceedings of the 2016 Conference on Empirical          Technologies, Volume 1 (Long and Short Papers),
  Methods in Natural Language Processing, pages             pages 4171–4186, Minneapolis, Minnesota. Associ-
  1173–1182, Austin, Texas. Association for Compu-          ation for Computational Linguistics.
   tational Linguistics.
                                                          Emily Dinan, Angela Fan, Adina Williams, Jack Ur-
Emily M Bender, Timnit Gebru, Angelina McMillan-
                                                            banek, Douwe Kiela, and Jason Weston. 2020.
  Major, and Shmargaret Shmitchell. 2021. On the
                                                            Queens are powerful too: Mitigating gender bias in
  dangers of stochastic parrots: Can language models
                                                            dialogue generation. In Proceedings of the 2020
  be too big. In Proceedings of the 2020 Conference
                                                            Conference on Empirical Methods in Natural Lan-
  on Fairness, Accountability, and Transparency; As-
                                                            guage Processing (EMNLP), pages 8173–8188, On-
  sociation for Computing Machinery: New York, NY,
                                                            line. Association for Computational Linguistics.
  USA.
                                                          Lucas Dixon, John Li, Jeffrey Sorensen, Nithum Thain,
Tolga Bolukbasi, Kai-Wei Chang, James Zou,
                                                            and Lucy Vasserman. 2018. Measuring and mitigat-
  Venkatesh Saligrama, and Adam Kalai. 2016. Man
                                                            ing unintended bias in text classification. In Pro-
  is to computer programmer as woman is to home-
                                                            ceedings of the 2018 AAAI/ACM Conference on AI,
  maker? debiasing word embeddings. In Advances
                                                            Ethics, and Society, pages 67–73.
  in neural information processing systems.
Shikha Bordia and Samuel R. Bowman. 2019. Identify-       Alexander Fabbri, Irene Li, Tianwei She, Suyi Li, and
  ing and reducing gender bias in word-level language       Dragomir Radev. 2019. Multi-news: A large-scale
  models. In Proceedings of the 2019 Conference of          multi-document summarization dataset and abstrac-
  the North American Chapter of the Association for         tive hierarchical model. In Proceedings of the 57th
  Computational Linguistics: Student Research Work-         Annual Meeting of the Association for Computa-
  shop, pages 7–15, Minneapolis, Minnesota. Associ-         tional Linguistics, pages 1074–1084, Florence, Italy.
  ation for Computational Linguistics.                      Association for Computational Linguistics.

Daniel Borkan, Lucas Dixon, Jeffrey Sorensen, Nithum      Angela Fan, David Grangier, and Michael Auli. 2018.
  Thain, and Lucy Vasserman. 2019. Nuanced met-             Controllable abstractive summarization. In Proceed-
  rics for measuring unintended bias with real data for     ings of the 2nd Workshop on Neural Machine Trans-
  text classification. In Companion proceedings of the      lation and Generation, pages 45–54, Melbourne,
  2019 world wide web conference, pages 491–500.            Australia. Association for Computational Linguis-
                                                            tics.
Tom B Brown, Benjamin Mann, Nick Ryder, Melanie
  Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind        Jessica Ficler and Yoav Goldberg. 2017. Controlling
  Neelakantan, Pranav Shyam, Girish Sastry, Amanda           linguistic style aspects in neural language genera-
  Askell, et al. 2020. Language models are few-shot          tion. In Proceedings of the Workshop on Stylis-
  learners. In Advances in neural information process-       tic Variation, pages 94–104, Copenhagen, Denmark.
  ing systems.                                               Association for Computational Linguistics.

Alvin Chan, Yew-Soon Ong, Bill Pung, Aston Zhang,         Michael C Frank and Noah D Goodman. 2012. Pre-
  and Jie Fu. 2021. Cocon: A self-supervised ap-            dicting pragmatic reasoning in language games. Sci-
  proach for controlled text generation. ICLR.              ence, 336(6084):998–998.

Reuben Cohn-Gordon, Noah Goodman, and Christo-            Samuel Gehman, Suchin Gururangan, Maarten Sap,
  pher Potts. 2018. Pragmatically informative image         Yejin Choi, and Noah A. Smith. 2020. RealToxi-
  captioning with character-level inference. In Pro-        cityPrompts: Evaluating neural toxic degeneration
  ceedings of the 2018 Conference of the North Amer-        in language models. In Findings of the Association
  ican Chapter of the Association for Computational         for Computational Linguistics: EMNLP 2020, pages
  Linguistics: Human Language Technologies, Vol-            3356–3369, Online. Association for Computational
  ume 2 (Short Papers), pages 439–443, New Orleans,         Linguistics.
  Louisiana. Association for Computational Linguis-
  tics.                                                   Marjan Ghazvininejad, Xing Shi, Jay Priyadarshi, and
                                                           Kevin Knight. 2017. Hafez: an interactive poetry
Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane        generation system. In Proceedings of ACL 2017,
  Hung, Eric Frank, Piero Molino, Jason Yosinski, and      System Demonstrations, pages 43–48, Vancouver,
  Rosanne Liu. 2020. Plug and play language mod-           Canada. Association for Computational Linguistics.
  els: a simple approach to controlled text generation.
  ICLR.                                                   Noah D Goodman and Andreas Stuhlmüller. 2013.
                                                            Knowledge and implicature: Modeling language un-
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and               derstanding as social cognition. Topics in cognitive
   Kristina Toutanova. 2019. BERT: Pre-training of          science, 5(1):173–184.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The curious case of neural text degeneration. ICLR.

Ari Holtzman, Jan Buys, Maxwell Forbes, Antoine Bosselut, David Golub, and Yejin Choi. 2018. Learning to write with cooperative discriminators. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1638–1649, Melbourne, Australia. Association for Computational Linguistics.

Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, and Eric P. Xing. 2017. Toward controlled generation of text. In Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1587–1596, International Convention Centre, Sydney, Australia. PMLR.

Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, and Stephen Denuyl. 2020. Social biases in NLP models as barriers for persons with disabilities. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5491–5501, Online. Association for Computational Linguistics.

Nitish Shirish Keskar, Bryan McCann, Lav R. Varshney, Caiming Xiong, and Richard Socher. 2019. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.

Andrea Madotto, Etsuko Ishii, Zhaojiang Lin, Sumanth Dathathri, and Pascale Fung. 2020. Plug-and-play conversational models. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 2422–2433, Online. Association for Computational Linguistics.

Alec Radford, Rafal Jozefowicz, and Ilya Sutskever. 2017. Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444.

Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving language understanding by generative pre-training.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.

Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, and Richard Socher. 2019. Explain yourself! Leveraging language models for commonsense reasoning. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4932–4942, Florence, Italy. Association for Computational Linguistics.

Björn Ross, Michael Rist, Guillermo Carbonell, Benjamin Cabrera, Nils Kurowsky, and Michael Wojatzki. 2017. Measuring the reliability of hate speech annotations: The case of the European refugee crisis. In Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication (Bochum), Bochumer Linguistische Arbeitsberichte, vol. 17, September 2016, pages 6–9.

Maarten Sap, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A. Smith. 2019. The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1668–1678, Florence, Italy. Association for Computational Linguistics.

Sheng Shen, Daniel Fried, Jacob Andreas, and Dan Klein. 2019. Pragmatically informative text generation. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4060–4067, Minneapolis, Minnesota. Association for Computational Linguistics.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.

Zeerak Waseem. 2016. Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In Proceedings of the First Workshop on NLP and Computational Social Science, pages 138–142, Austin, Texas. Association for Computational Linguistics.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Jing Xu, Da Ju, Margaret Li, Y-Lan Boureau, Jason Weston, and Emily Dinan. 2020. Recipes for safety in open-domain chatbots. arXiv preprint arXiv:2010.07079.
Dani Yogatama, Chris Dyer, Wang Ling, and Phil Blunsom. 2017. Generative and discriminative text classification with recurrent neural networks. arXiv preprint arXiv:1703.01898.

Lantao Yu, Weinan Zhang, Jun Wang, and Yong Yu. 2017. SeqGAN: Sequence generative adversarial nets with policy gradient. In Thirty-First AAAI Conference on Artificial Intelligence.

Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-level convolutional networks for text classification. In Advances in neural information processing systems, pages 649–657.

Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In The IEEE International Conference on Computer Vision (ICCV).

Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593.
A    GeDi filtering heuristic

In addition to Equation 4, we used a filtering heuristic that was beneficial for steering generation more aggressively. This heuristic, inspired by top-p sampling (Holtzman et al., 2020), samples from the set Vm ⊆ V, which contains the minimum number of tokens possible from the head of the distribution for Pθ(c|xt, x<t).

C    Additional model and hyper-parameter details

C.1    Hyper-parameters for GeDi guided generation

GeDi used ρ = 0.7 and ω = 30 for sentiment, ρ = 0.8 and ω = 30 for GPT-2 detoxification, and ρ = 0.8 and ω = 90 for GPT-3 detoxification. For the ablations in Tables 7 and 8, we also compare against a lightweight unidirectional LSTM discriminator, which is small
enough to efficiently classify every candidate next
token. For the weighted decoding setting, we set
ρ = 0 which turns off filtering, and tune ω to give
a similar perplexity score to the combined heuristic
(higher ω results in more aggressive steering and
generally gives better attribute control and a worse
perplexity score). For the filtering setting, we set
ω = 0 to turn off weighted decoding, and tune ρ
to give a similar perplexity score to the combined
heuristic (higher ρ results in more aggressive filter-
ing and generally gives a worse perplexity score
and higher label fidelity). For evaluation, we mea-
sure the label fidelity according to an external clas-
sifier, and perplexity scores under GPT-2-XL, using
the prompts corresponding to the experiments in
Tables 2 and 4 for sentiment and detoxification re-
spectively. For tuning parameters, we use prompts
from IMDb to condition on for sentiment genera-
tions, and an additional 100 trigger prompts (that
do not overlap with the evaluation prompts) for
detoxification. We tune hyperparameters ρ and ω
to give a good trade-off between label fidelity (as
measured by RoBERTa) and perplexity scores. For
the LSTM discriminator, we train a unidirectional
LSTM with 600 hidden units, use mean pooling,
and tune the training learning rate to give the best
held out accuracy. The LSTM discriminator is then
used to guide generation by applying a forward
pass for each candidate token across the full vocab-
ulary, and applying Equations 4 and 7 to guide gen-
eration. This results in generation that is slightly slower than GeDi (assuming we batch the LSTM forward passes across the vocabulary) and in higher memory usage.
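To make the roles of ω and ρ concrete, the following sketch applies both heuristics at a single decoding step. It assumes the weighted decoding rule takes the form pw(xt|x<t, c) ∝ pLM(xt|x<t) · Pθ(c|xt, x<t)^ω, and it stands in for the exact construction of Vm with a simple quantile cut on the class probabilities; the toy vocabulary, probabilities, and constants below are placeholders rather than values from our experiments.

import numpy as np

def combined_heuristic(p_lm, p_class, omega=30.0, rho=0.8):
    """One decoding step of weighted decoding (omega) plus class-probability
    filtering (rho). omega = 0 disables weighting; rho = 0 disables filtering.

    p_lm:    base LM distribution over the vocabulary, shape (V,)
    p_class: P(desired class | candidate token, prefix) per token, shape (V,)
    """
    # Weighted decoding: up-weight tokens the discriminator scores as on-attribute.
    p_w = p_lm * np.power(p_class, omega)
    p_w = p_w / p_w.sum()

    # Filtering (illustrative stand-in): keep only tokens whose class probability
    # lies above the rho-quantile, then renormalize. Higher rho removes more
    # tokens, i.e. more aggressive filtering. The paper's heuristic instead builds
    # the minimal head set Vm described in Appendix A.
    if rho > 0:
        keep = p_class >= np.quantile(p_class, rho)
        p_w = np.where(keep, p_w, 0.0)
        p_w = p_w / p_w.sum()
    return p_w

# Toy 5-token vocabulary.
p_lm = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
p_class = np.array([0.20, 0.90, 0.60, 0.10, 0.80])
print(combined_heuristic(p_lm, p_class, omega=30.0, rho=0.7))

Setting ρ = 0 in this sketch recovers the weighted-decoding-only setting, and setting ω = 0 recovers the filtering-only setting used in Tables 7 and 8.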
   Results are given in Table 7 for sentiment and
Table 8 for detoxification. Both the filtering and
weighted decoding methods are able to control gen-
eration on their own, but the combined heuristic
appears to perform slightly better for detoxifica-
tion, and may be more robust to settings where one
method or the other does not work as well in isolation.
Using a lightweight LSTM discriminator to guide
generation gave weaker control over sentiment and
detoxification as compared with using GeDi.
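To illustrate why the LSTM baseline is slower and more memory-hungry, the sketch below shows the per-step scoring it requires: every candidate token is appended to the prefix and pushed through the classifier, so each decoding step costs on the order of |V| forward passes (batched here in chunks). The embedding size, number of classes, and chunk size are illustrative assumptions, not the exact training configuration.

import torch
import torch.nn as nn

class LSTMDiscriminator(nn.Module):
    """Unidirectional LSTM classifier with mean pooling (600 hidden units, as above)."""
    def __init__(self, vocab_size, emb_dim=300, hidden=600, n_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, token_ids):                   # token_ids: (batch, seq_len)
        states, _ = self.lstm(self.emb(token_ids))  # (batch, seq_len, hidden)
        pooled = states.mean(dim=1)                 # mean pooling over time
        return self.head(pooled)                    # (batch, n_classes) logits

def class_probs_for_all_candidates(disc, prefix_ids, vocab_size, desired_class, chunk=4096):
    """Return P(desired_class | prefix + candidate) for every candidate token.

    Each decoding step needs a forward pass per candidate (batched in chunks),
    which is why this baseline is slower and uses more memory than GeDi.
    prefix_ids: (1, prefix_len) tensor of token ids.
    """
    probs = []
    for start in range(0, vocab_size, chunk):
        cands = torch.arange(start, min(start + chunk, vocab_size))
        batch = torch.cat([prefix_ids.repeat(len(cands), 1),
                           cands.unsqueeze(1)], dim=1)   # (chunk, prefix_len + 1)
        with torch.no_grad():
            logits = disc(batch)
        probs.append(logits.softmax(-1)[:, desired_class])
    return torch.cat(probs)                               # (vocab_size,)

V = 50257                                  # e.g. the GPT-2 BPE vocabulary
disc = LSTMDiscriminator(vocab_size=V)
prefix = torch.randint(0, V, (1, 12))      # dummy 12-token prefix
scores = class_probs_for_all_candidates(disc, prefix, V, desired_class=1)
print(scores.shape)                        # torch.Size([50257])

GeDi avoids this per-candidate scoring loop, which is where the speed and memory differences discussed above come from.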
Model                                                                          Label fidelity ↑    Perplexity ↓
GeDi-guided (combined heuristic, ρ = 0.7, ω = 20)                                   96%                13.5
GeDi-guided (weighted decoding heuristic, ρ = 0, ω = 600)                           86%                13.6
GeDi-guided (filtering heuristic, ρ = 0.7, ω = 0)                                   95%                13.3
Lightweight LSTM discriminator greedy (combined heuristic, ρ = 0.8, ω = 30)         73%                16.6

Table 7: Sentiment label fidelity and perplexity scores for the weighted decoding heuristic (ρ = 0), the filtering heuristic (ω = 0), and the combined weighted decoding and filtering heuristic, along with a comparison against a lightweight LSTM discriminator in place of the generative discriminator.

Model                                                                          Toxicity (RoBERTa) ↓    Perplexity ↓
GeDi-guided greedy (combined heuristic, ρ = 0.8, ω = 30)                                8%                  10.9
GeDi-guided greedy (weighted decoding heuristic, ρ = 0, ω = 150)                       13%                  10.8
GeDi-guided greedy (filtering heuristic, ρ = 0.85, ω = 0)                              24%                  10.7
Lightweight LSTM discriminator greedy (combined heuristic, ρ = 0.8, ω = 30)            18%                  10.9

Table 8: Toxicity and perplexity scores for the weighted decoding heuristic (ρ = 0), the filtering heuristic (ω = 0), and the combined weighted decoding and filtering heuristic, along with a comparison against a lightweight LSTM discriminator in place of the generative discriminator.

E    Statistical significance tables for human evaluation experiments

                    Model 1             Model 2               p-value positivity    p-value book resemblance        p-value fluency
                    GeDi-pos greedy     GPT2-XL greedy                 4E-05                   0.16                      0.44
                    GeDi-pos top-p      GPT2-XL top-p                  2E-07                   0.04                      0.09
                    GeDi-pos greedy     PPLM-pos greedy                 0.99                   0.49                      0.47
                    GeDi-pos top-p      PPLM-pos top-p                  0.01                   0.72                      0.01
                    GeDi-pos greedy     CCLM-pos greedy                 3E-4                  2E-05                     3E-05
                    GeDi-pos greedy     CTRL-pos greedy                2E-06                   0.06                     8E-4
                    GPT-2 greedy         GPT-2 top-p                    0.07                   0.65                      0.05
                    GeDi-neg greedy     GPT2-XL greedy                 2E-07                   0.04                      0.01
                    GeDi-neg top-p      GPT2-XL top-p                  4E-07                  0.001                      0.06
                    GeDi-neg greedy     PPLM-neg greedy                0.005                   0.33                      0.35
                    GeDi-neg top-p      PPLM-neg top-p                  5E-4                   0.19                      0.21
                    GeDi-neg greedy     CCLM-neg greedy                 0.46                  2E-07                      0.01
                    GeDi-neg greedy     CTRL-neg greedy                 1E-4                  0.001                      0.67

Table 9: Statistical significance p-values for the sentiment results in Table 2. We use a Wilcoxon signed-rank test for paired measures, since all models generate from the same set of prompts (and because a non-parametric test is appropriate for an ordinal scale). All p-values are two-tailed and compare the aligned models in the first two columns for positivity, book resemblance, and fluency.

                    Model 1                       Model 2                      p-value toxicity         p-value fluency
                    GeDi gpt2 greedy              GPT2 greedy                      3E-06                           0.005
                    GeDi gpt3 greedy              GPT3 greedy                      2E-07                            0.16
                    GeDi gpt2 top-p               GPT2 top-p                       6E-07                            0.64
                    GeDi gpt2 greedy              PPLM greedy                       0.87                            0.14
                    GeDi gpt2 top-p               PPLM top-p                        0.02                            0.18
                    GeDi gpt2 greedy              CCLM greedy                       0.13                           3E-04
                    GPT-2 greedy                  GPT-2 top-p                       0.12                           0.006

Table 10: Statistical significance p-values for the toxicity results in Table 4. For toxicity, we use a two-proportion z-test. For fluency, we use a Wilcoxon signed-rank test for paired measures, as in Table 9. All p-values are two-tailed and compare the aligned models in the first two columns for toxicity and fluency.
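Both tests are standard and can be reproduced with SciPy and statsmodels as sketched below; the rating vectors and toxic counts are made-up stand-ins, since the per-prompt annotations are not reproduced here.

from scipy.stats import wilcoxon
from statsmodels.stats.proportion import proportions_ztest

# Paired ordinal ratings (e.g., fluency) for the same prompts under two models.
# These numbers are placeholders, not the actual annotation data.
ratings_model_1 = [4, 3, 5, 2, 4, 4, 3, 5, 2, 4]
ratings_model_2 = [3, 3, 4, 2, 3, 4, 2, 4, 1, 3]
stat, p_paired = wilcoxon(ratings_model_1, ratings_model_2)  # two-sided by default

# Toxicity is a binary judgement per generation, so a two-proportion z-test
# compares the toxic counts out of the number of evaluated prompts per model.
toxic_counts = [8, 30]     # placeholder counts for model 1 vs. model 2
n_prompts = [100, 100]
z, p_prop = proportions_ztest(count=toxic_counts, nobs=n_prompts)  # two-sided

print(f"Wilcoxon signed-rank p = {p_paired:.3g}")
print(f"two-proportion z-test p = {p_prop:.3g}")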
F     Generation Samples
F.1   Sentiment/tone samples

      Model                                      Generation
                  21
                  "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in."
                  She dragged another chair out to the front of her desk and motioned for us
 GeDi-            all to sit down on it while she made herself a sandwich. "What are you
                  doing here? I thought we were supposed to be having a private meeting!”
 guided           She snapped angrily through the mouthpiece of her headset before taking
 (negative)       a bite out of it with an angry huff that sounded like someone had taken
                  their toothbrush and tried to use it as an axe blade instead. "You're
 (greedy)         just wasting your time here." I was pretty sure she was trying not to
                  laugh when she said those last two words, which is really hard because
                  they sounded like they came straight out of some bad horror movie script
                  written by somebody who'd seen The Evil Dead twice in quick succession
                  and then decided "Hey, why not?" It wasn't until after my third or fourth
                  attempt that I finally managed something resembling humor: "And this
                  is why you should never ever ever ever ever call me again...* * * "Oh
                  hey guys." A voice boomed from somewhere off-screen (probably behind me).
                  "Sorry if there arent́ any other important things going on around here..

                  21
                  "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in." She
                  dragged another chair out to the front of her desk and motioned for us all
 GeDi-            to sit as she began speaking with me about my work with them.
 guided           I have been working with these girls since I started this position at
                  the end of last year, which is a little over two years ago, when I got
 (positive)       my start in this field here at the school and also at The Ranch Academy
                  where I am now located in Arizona where most of these girls are being
 (greedy)         educated on an intensive basis while they are here before going back home
                  to their families and communities once they get older.” It is a really
                  unique experience because there are very few places like it anywhere in
                  the country that can provide young girls so much support and care from day
                  one until they reach adulthood--from birth right up to graduation day!” I
                  think it is something that every family deserves! It’s really exciting
                  how things have worked out so far! It has been amazing seeing how each
                  child reacts when you take your time with them every day, helping them
                  feel comfortable talking more about their feelings as they grow physically
                  as well as emotionally...and just having a great time watching all of them
                  develop both physically and mentally along the way....

                 21
                 "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in."
                 She dragged another chair out to the front of her desk and motioned
 CC-LM           for us all to sit down.i’m sorry i couldn’t get a better
                 seat.this is one of those movies that you can watch over and
                 over again without ever getting bored or annoyed by it. it’s not really
 (negative)      about the characters but rather how they interact with each other in this
                 movie - what happens when two people meet at a party? what happens when
 (greedy)        they go on vacation together? how do we make friends while traveling
                 alone? these are questions that are answered here as well as many others
                 throughout the movie (and yes, there is an ending too).the
                 acting was good enough but nothing special - just average at best.there were some funny moments though...1) jennifer:
                  i don’t think anyone would have thought me capable of such
                 stupidity! 2) tracy:
Model                                        Generation
                  21
                  "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in."    She
                  dragged another chair out to the front of her desk and motioned for us   all
 GPT-2            to sit down on it as she sat down herself, a little awkwardly at first   but
                  then settled into the chair with a smile on her face and an arm around   my
 (greedy)         shoulder.

                  21
                  "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in."
                  She dragged another chair out to the front of her desk and motioned for
 PPLM             us all to sit down on it as she walked over to a chair across from me and
 (negative)       sat down next to me with her feet crossed on the floor behind my desk.
                  I’m sorry I didn’t have time today but I wanted you all here so we could
 (greedy)         get this off our plates before we move forward with anything else that’s
                  important at work or school or wherever else you are right now! She
                  looked up at me and then asked: You know what? You’re not going anywhere
                  until you’ve cleared this up! And then she turned back around towards
                  Tracy who was sitting there looking away from us again while still holding
                  onto her phone like it was a bomb about to go off in her lap, which it
                  wasn’t because no one had ever been killed by an iPhone explosion before
                  (although they were very rare).

                  21
                  "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in." She
                  dragged another chair out to the front of her desk and motioned for us all
 PPLM             to sit down on it as she sat herself down across from us with a smile on
                  her face. "Iḿ very excited about this yearś crop," I told her as we sat
 (positive)       across from each other at the front of my desk, "and I think youĺl see
 (greedy)         that there are some really great things happening in our industry." She
                  smiled back at me and told me that she was looking forward to seeing what
                  else we could do together over the next few years--a sentiment I shared
                  wholeheartedly with everyone who had come into contact with my company
                  during this past year or so--but then quickly added: "But first let's get
                  started! We're going to have a lot more fun working together!" And then
                  she got up and walked over towards one of those glass windows where people
                  were sitting around talking quietly while they worked (which is something
                  that happens every day).

Table 12: Generation with raw GPT-2 and PPLM when controlling the sentiment/tone of the text (greedy decoding).
Boldfaced indicates the context provided to the language model followed by its generation. Like GeDi,
PPLM was able to achieve domain transfer of sentiment. However, GeDi led to 30x faster generation.
Model                                        Generation
                 21
                 "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in." She
                 dragged another chair out to the front of her desk and motioned for us all
 CTRL            to sit. I sat down in one of the chairs and she sat in another. We were
 (negative)      all staring at her. Her hair was a mess from her morning shower. She
                 looked like she had just gotten out of bed. Her eyes were red as if they
 (greedy)        had been crying. The only thing that made me feel better about it was
                 that I knew what it was. It wasn’t my fault. It wasn’t my fault. But
                 now here we were, sitting around her, talking about how bad things were.
                 How could this be happening again? What did you do to make your life so
                 miserable? Why didn’t you tell me? Why didn’t you try harder or talk
                 more with me? You never told me. You never tried harder. And now here
                 we are, sitting here, talking about it. Tracy, why don’t you tell me what
                 happened? What happened between you? Did anything happen? Did anything
                 ever happen between us? Because I’m not going to. I’m not going to.
                 If you’re telling me, then it’s because something happened. Something
                 bad. Something really bad. And if you’re not, then it’s because nothing
                 happened. Nothing ever happened between us. So why are we here? What’s
                 wrong with us? Is there something wrong with our relationship? Are we...
                 is there something wrong? Is there...” Tracy stopped herself before she
                 said any more. “I don’t know. It’s just... I can’t explain it. It’s
                 too much. I’ve been thinking a lot lately. Too much.” She looked up at
                 me.

                 21
                 "Hello, Jennifer, Bella." Tracy said. "Thanks so much for coming in." She
                 dragged another chair out to the front of her desk and motioned for us all
 CTRL            to sit. I sat down in one of the chairs and she sat in another. We were
                 all staring at her. Her hair was a mess from her morning shower. She
 (positive)      looked like she had just gotten out of bed. Her eyes were red as if they
 (greedy)        had been crying. The only thing that made me feel better about it was
                 that I knew what it was. It wasn’t my fault. It wasn’t my fault. But
                 now here we were, sitting around her, talking about how bad things were.
                 How could this be happening again? What did you do to make your life so
                 miserable? Why didn’t you tell me? Why didn’t you try harder or talk
                 more with me? You never told me. You never tried harder. And now here
                 we are, sitting here, talking about it. Tracy, why don’t you tell me what
                 happened? What happened between you? Did anything happen? Did anything
                 ever happen between us? Because I’m not going to. I’m not going to.
                 If you’re telling me, then it’s because something happened. Something
                 bad. Something really bad. And if you’re not, then it’s because nothing
                 happened. Nothing ever happened between us. So why are we here? What’s
                 wrong with us? Is there something wrong with our relationship? Are we...
                 is there something wrong? Is there...” Tracy stopped herself before she
                 said any more. “I don’t know. It’s just... I can’t explain it. It’s
                 too much. I’ve been thinking a lot lately. Too much.” She looked up at
                 me.

Table 13: Controlling sentiment/tone of the text with CTRL (greedy decoding). Boldfaced indicates the context
provided to the language model followed by its generation. CTRL is unable to control the sentiment of book
text because it was trained on Amazon reviews.