Energy-Based Models for Code Generation under Compilability Constraints


Tomasz Korbak,1,∗ Hady Elsahar,2 Marc Dymetman,2 Germán Kruszewski2
t.korbak@sussex.ac.uk
{hady.elsahar,marc.dymetman,german.kruszewski}@naverlabs.com
1 University of Sussex, United Kingdom
2 Naver Labs Europe, France

∗ Work done during a research internship at Naver Labs Europe.
Abstract

Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data, such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples.

1 Introduction

Code completion is an essential feature of any modern Integrated Development Environment (IDE). It supports developers with recommendations about the next token to write given a context, speeding up software development and reducing the number of mistakes. A large body of work has relied on statistical language modeling, treating programming languages as natural languages using probabilistic grammars (Raychev et al., 2014; Bielik et al., 2016), and more recently relying on neural language models (Liu et al., 2016a; Svyatkovskiy et al., 2020a,b; Arkesteijn et al., 2020; Ciniselli et al., 2021).1 In particular, neural autoregressive language models have been favoured due to their scalability and generic training procedure that can exploit large codebases (e.g. open-source code repositories available on GitHub) through self-supervised training.

1 See Allamanis et al. (2018) for a survey.

Despite these desirable traits, neural language models, trained in the standard way, are known to suffer from myopia and to overlook global sequence-level features that are present in the data and which might be crucial for the quality of generated sequences (Parshakova et al., 2019b). This leads to repetitions, hallucinations and failing to capture long-distance consistency requirements. In a code generation context, this is demonstrated in compilation errors that are a common failure mode in such tasks as translation between programming languages (Roziere et al., 2020). This problem has inspired a large body of work on different fronts on injecting sequence-level priors, either by directly optimizing sequence-level features (Ranzato et al., 2016) or through fusion with grammars and automata (Xiao et al., 2016). These techniques aim to balance the desirable traits and fast inference of neural autoregressive models trained in the standard way against the satisfaction of global sequence-level features.

In this work, we formulate compilable code generation as a constraint satisfaction problem. We show that this formulation leads to a unique distribution represented by an Energy-Based Model (EBM). This unique distribution by definition fully satisfies the compilability constraints while having a minimal KL divergence from the original autoregressive generative model trained through cross-entropy. We then train an auto-regressive generative model to approximate the underlying distribution of this EBM using the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021).
In our experiments, we show that our approach significantly improves compilability rates without sacrificing diversity or complexity of the generated examples. This alleviates the drawbacks of reinforcement learning fine-tuning techniques that maximize compilability but deviate significantly from the original generative model, which leads to severe loss in diversity and complexity of the generated samples. Finally, we complement our experiments with a qualitative analysis of the effect of several fine-tuning approaches on the distribution of compilation errors.

2 Related Work

Imposing compilability constraints on generative models
There is a body of work focusing on unconditional code generation or code completion: generating a piece of source code given a preceding piece of source code (Nguyen et al., 2013; Raychev et al., 2014; Karpathy et al., 2015; Bielik et al., 2016). That work, however, focuses on perplexity and similarity with respect to ground-truth completions (in terms of exact-match accuracy, Levenshtein distance and ROUGE scores) (Svyatkovskiy et al., 2020a; Lu et al., 2021), usually failing to measure and control for compilability of generated sequences or semantic and syntactic constraints in general.2 On the other hand, semantic and syntactic constraints are frequently considered in language-to-code translation or program synthesis. For instance, Zhong et al. (2017) used policy gradients to train a model for translating natural language questions to corresponding SQL queries and, in addition to rewarding query execution results, added a penalty for syntactically invalid queries. Taking that one step further, Kulal et al. (2019) use compilation errors (with their precise location) to guide search over the space of possible programs.

2 One exception is the work of Maddison and Tarlow (2014), who augment neural probabilistic context-free grammars with semantic constraints and use them for unconditional generation.

Optimizing sequence-level rewards for text generation
Most previous attempts at steering autoregressive models to conform to global constraints defined over entire sequences have employed reinforcement learning (RL). This includes using Reinforce (Williams, 1992a) for machine translation (Ranzato et al., 2016) or actor-critic (Konda and Tsitsiklis, 2000) for abstractive summarization (Paulus et al., 2018), caption generation (Liu et al., 2016b), dialogue (Li et al., 2016b), and video captioning (Pasunuru and Bansal, 2017). Some approaches (for instance, in machine translation and summarization (Ranzato et al., 2016; Bahdanau et al., 2017)) directly optimize performance metrics such as BLEU and ROUGE at training time. Others use heuristic rewards (for instance, Li et al. (2016b) for dialogue generation and Tambwekar et al. (2019) for story generation) in order to obtain certain a priori desirable features of generated sequences that then incentivize good performance on target metrics. A weakness of using RL in fine-tuning generative models is the problem of catastrophic forgetting: maximizing global, sequence-level rewards leads to very large deviations from the original autoregressive model trained through cross-entropy. This often results in significant reductions in fluency and diversity of generated samples. The catastrophic forgetting problem is sometimes addressed by imposing a penalty term on the rewards, such as the KL divergence between the trained policy and the auto-regressive model. This approach, termed “conservative fine-tuning”, was applied to generating melodies with music theory rewards and organic molecules with synthesizability rewards by Jaques et al. (2017), as well as to fine-tuning language models for controllable language generation by Ziegler et al. (2019). This solution does not have an explicit notion of the optimal policy and often has a hard time balancing between the reward term and the KL penalty term, leading to instability in training (Khalifa et al., 2021). Unlike this approach, our formulation defines the optimal distribution that satisfies both requirements.

Energy-based models for text
Energy-based models (EBMs) (Hinton, 2002; LeCun et al., 2006; Ranzato et al., 2007) are a family of probabilistic graphical models in which learning and inference are done by associating an unnormalized probability with each configuration of observed and latent variables. Early examples of EBMs applied to natural language processing include sequence labeling problems (e.g. tagging) exploiting global properties of a sequence (Andor et al., 2016; Belanger and McCallum, 2016). A recent surge of interest in EBMs (Du and Mordatch, 2019) has not left text generation unaffected (see (Bakhtin et al., 2020) for a survey). Tu et al. (2020) proposed energy-based inference networks for non-autoregressive machine translation.
Parshakova et al. (2019b) and Deng et al. (2020) augment autoregressive language models with an additional global factor to obtain a lower perplexity on the training data. Khalifa et al. (2021) develop a novel approach to distributional controllable text generation by constructing an EBM satisfying desired statistical constraints imposed on the set of generated sequences (such as topic or gender statistics over the sequences) and then training an autoregressive policy to approximate it, which can be sampled from efficiently. We build on Khalifa et al.'s approach by applying it to a novel domain outside natural language and by defining a new kind of constraint: compilability.

3 Method

Following Khalifa et al. (2021), we formulate compilable code generation as a constraint satisfaction problem over a space of generative models. There are two constraints that a target generative model p must satisfy. First, p must have minimal divergence, in the distribution space, from an original generative model a pre-trained using a standard autoregressive language modeling objective. Second, it must generate only sequences that satisfy a certain sequence-level constraint b. In our case, b(x) = 1 iff x is a syntactically correct Python program and b(x) = 0 otherwise. These two constraints can be represented as a product-of-experts (Hinton, 2002) energy-based model

    P(x) = a(x)b(x).    (1)

p(x) can be obtained from P(x) by dividing it by a normalization constant Z:

    p(x) = (1/Z) P(x),    (2)

where

    Z = Σx P(x).    (3)

This EBM P is unique: it represents a distribution p that optimally reconciles the two constraints. It is a special case of the generalized maximum entropy formulation presented in (Csiszár and Shields, 2004) for applying constraints over distributions.

However, one problem still remains: it is not straightforward to draw samples x ∼ p(x), or even to evaluate the probability p(x), under this optimal unique distribution. A simple method for drawing samples from the p distribution could be sampling sequences from a and filtering on b(x).
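In code, this filtering amounts to simple rejection sampling. A minimal sketch, assuming a hypothetical sample_sequence() callable that draws one complete sequence from a and the binary compilability scorer b described in Section 4.1:

    def sample_from_p(sample_sequence, b, num_samples):
        # Draw samples from p by sampling from a and keeping only
        # sequences that compile (b(x) = 1).
        accepted = []
        while len(accepted) < num_samples:
            x = sample_sequence()  # one full sequence from a
            if b(x):
                accepted.append(x)
        return accepted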
While this method sounds simple, there is no direct way of using it for interactive code completion, as sampling full sequences until the end is necessary to filter through the sequence-level filter b(x). Therefore our objective here is to obtain another autoregressive policy πθ to directly approximate p.

To attain this, Khalifa et al. (2021) (following Parshakova et al. (2019a)) developed a training procedure called KL-Adaptive Distributional Policy Gradients (KL-DPG) to train πθ to minimize the KL divergence between p and πθ. The gradient of this KL turns out to be tractable:

    ∇θ DKL(p, πθ) = ∇θ Ex∼p log [p(x)/πθ(x)]    (4)
                  = −∇θ Ex∼p log πθ(x)    (5)
                  = −Ex∼p ∇θ log πθ(x)    (6)
                  = −(1/Z) Σx P(x) ∇θ log πθ(x)    (7)

Let us now absorb the constant −1/Z into a learning rate α(θ) and estimate the expectation over p(x) using importance sampling (Owen, 2013) from yet another generative model q:

    ∇θ DKL(p, πθ) ∝ Ex∼q [P(x)/q(x)] ∇θ log πθ(x).    (8)

During training, both πθ and q are initialized as a. Then, q is periodically updated to πθ if πθ surpasses q in being closer to p (in terms of KL). For pseudocode of the whole KL-DPG training procedure, see Algorithm 1.

The gradient in (8) is similar to an estimate obtained using policy gradient methods in standard reinforcement learning (Sutton et al., 1999), with P(x)/q(x) playing the role of a pseudoreward. This similarity, however, is superficial. Our objective is approximating a target generative model p by minimizing DKL(p, πθ), rather than maximizing the expected reward b(x), P(x) or P(x)/q(x). As we show in Section 5, these objectives produce vastly different policies, which diverge from p and catastrophically forget what the pretrained model a knew about its training domain. Furthermore, since q will always be close to πθ, our pseudoreward P(x)/q(x) effectively depends on the policy parameters θ.
Algorithm 1 KL-DPG
Require: EBM P, initial generative model a
 1: πθ ← a
 2: q ← a
 3: for each iteration do
 4:     for each episode do
 5:         sample x from q(x)
 6:         θ ← θ + α(θ) [P(x)/q(x)] ∇θ log πθ(x)
 7:     if DKL(p||πθ) < DKL(p||q) then
 8:         q ← πθ
Ensure: πθ
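To make the update in step 6 of Algorithm 1 concrete, the following is a minimal sketch of one KL-DPG gradient step in PyTorch. The log_prob(x) interface on the models and the way the batch of sequences is produced are illustrative assumptions, not the authors' released implementation.

    import torch

    def kl_dpg_step(pi_theta, q, a, b, optimizer, batch):
        # One stochastic update of Eq. (8):
        # theta <- theta + alpha(theta) * [P(x)/q(x)] * grad log pi_theta(x),
        # where P(x) = a(x) b(x) and `batch` holds sequences sampled from q.
        optimizer.zero_grad()
        for x in batch:
            with torch.no_grad():
                # Importance weight P(x)/q(x); it is zero for uncompilable sequences.
                weight = float(b(x)) * torch.exp(a.log_prob(x) - q.log_prob(x))
            # Minimizing this loss ascends weight * log pi_theta(x).
            loss = -(weight * pi_theta.log_prob(x)) / len(batch)
            loss.backward()
        optimizer.step()

In the full procedure, q is additionally replaced by the current πθ whenever the latter gets closer to p in terms of KL, as in steps 7–8 of Algorithm 1.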
4 Experiments

4.1 Setup

Dataset: To prepare the training dataset, we started from the Python150 dataset, which consists of 150k Python source code files obtained from GitHub (Raychev et al., 2016). Then, using the code from Roziere et al. (2020), we extracted 713k Python functions (both methods and standalone functions) from it (250 MB of raw text data). The additional filtering criteria were compilability (according to b(x)) and being less than 128 BPE tokens long. The dataset was then split into a training subset Dtrain and a test subset Dtest.
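As an illustration, the filtering step can be expressed as follows; the tokenizer interface and names here are assumptions made for the sketch, not the released preprocessing code.

    def keep_function(source, tokenizer, b, max_tokens=128):
        # Keep a function if it compiles (b(x) = 1) and is shorter than 128 BPE tokens.
        return b(source) == 1 and len(tokenizer.encode(source)) < max_tokens

    # Hypothetical usage:
    # dataset = [f for f in extracted_functions if keep_function(f, bpe_tokenizer, b)]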
Initial generative model a: We implemented a using the GPT-2 (Radford et al., 2019) architecture with 117M parameters (gpt2-small) and kept all the original hyperparameters (see Table 1 in the Appendix). We trained a byte-level BPE tokenizer (Sennrich et al., 2016) with special BOS and EOS tokens to obtain a vocabulary of 50k tokens. The model was trained for one epoch.
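A sketch of how such a model could be instantiated with the HuggingFace Transformers library; the snippet only shows the general shape (a randomly initialized gpt2-small configuration with a 50k-token vocabulary) and is an assumption rather than the authors' training script.

    from transformers import GPT2Config, GPT2LMHeadModel

    config = GPT2Config(vocab_size=50_000)  # remaining hyperparameters left at gpt2-small defaults
    a = GPT2LMHeadModel(config)             # randomly initialized, then trained for one epoch on Dtrain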
Compilability Scorer b: To check for compilability, we call the compile_command function from the codeop module of the Python Standard Library3 with a sequence x as an argument and check whether it returns a code object. We apply no postprocessing other than removing BOS and EOS tokens. codeop.compile_command is the implementation that Python interactive interpreters use in the read-eval-print loop (REPL) to determine whether a string is valid Python code. The method tries to compile a string of Python code and raises an exception if there is a problem with it: in particular, a SyntaxError for invalid Python syntax and a ValueError or OverflowError if there is an invalid literal. This notion of compilability is concerned only with syntactic correctness and does not execute the body of a function. However, we found the initial compilability rate Ex∼a b(x) of functions x sampled from a(x) to be only 0.56, which leaves a large margin for improvement.4

3 https://docs.python.org/3/library/codeop.html
4 Note that the initial compilability rate will be equal to our Z because Ex∼a b(x) = Σx a(x)b(x) = Σx P(x) = Z.
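A minimal sketch of such a scorer; it assumes the sequence has already been detokenized and stripped of BOS and EOS tokens, and it is an illustration rather than the exact evaluation code.

    import codeop

    def b(source):
        # Return 1 if `source` is a complete, syntactically valid Python fragment, 0 otherwise.
        # codeop.compile_command returns a code object on success, None for incomplete input,
        # and raises SyntaxError, ValueError or OverflowError on invalid input.
        try:
            code_obj = codeop.compile_command(source)
        except (SyntaxError, ValueError, OverflowError):
            return 0
        return 1 if code_obj is not None else 0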
KL-DPG training: πθ and q share their architecture with a but have separate weights, which are only initially identical to a's. Throughout the training, πθ will be updated to approximate p. See Table 2 in the Appendix for a complete list of hyperparameters used for training πθ and q with KL-DPG.

4.2 Baselines

We compare our method to a common approach of using standard reinforcement learning to fine-tune a generative model to conform to desired constraints. We use the Reinforce algorithm (Williams, 1992b), which instead of minimizing divergence from the target distribution p tries to maximize the expected reward Eπθ R(x). We consider two kinds of reward R(x) (see the sketch after this list):

• R(x) = b(x), where the generative model is simply rewarded for generating sequences that compile;

• R(x) = P(x), where the generative model is simply rewarded proportionally to the score our EBM assigns to x. Intuitively, this objective gives reward for both compilability and respecting the original generative model a.
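For comparison with the KL-DPG step sketched after Algorithm 1, a Reinforce update with these rewards could look as follows; the same hypothetical log_prob interface is assumed, and `batch` now holds sequences sampled from πθ itself.

    import torch

    def reinforce_step(pi_theta, R, optimizer, batch):
        # theta <- theta + alpha * R(x) * grad log pi_theta(x), with R(x) = b(x) or R(x) = P(x).
        optimizer.zero_grad()
        for x in batch:
            loss = -(float(R(x)) * pi_theta.log_prob(x)) / len(batch)
            loss.backward()
        optimizer.step()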
4.3 Evaluation Metrics

We evaluate KL-DPG and two baselines in terms of the following metrics:

1. Ex∼πθ b(x), the compilability rate of sequences sampled from πθ(x),

2. DKL(p, πθ), the forward KL divergence from the optimal distribution p,

3. DKL(πθ, a), the reverse KL divergence from the original pretrained generative model,

4. Distinct-1 score, a measure of text diversity in terms of the frequency of token repetitions in a sample x, proposed in the context of NLP by (Li et al., 2016a),
5. Self-BLEU-5, a measure of text diversity across samples, proposed in the context of NLP by (Zhu et al., 2018),

6. Perplexity measured on Dtest, a held-out subset of the data used for training a, calculated as exp[−(1/N) Σx∈Dtest log πθ(x)], where N is the overall number of tokens in Dtest,

7. Sequence length, the average number of characters in a generated sequence x after detokenization,

8. AST node count, the average number of nodes in an abstract syntax tree (AST) of sequences that compile. Samples are parsed to their corresponding ASTs using the ast module from the Python Standard Library.5 Intuitively, this metric should indicate the logical (as opposed to surface) complexity of generated programs,

9. PEP8 error frequency, the average number of violations of PEP8, the style guide for Python,6 measured using pycodestyle,7 an off-the-shelf linter (static code analysis tool). We report the average number of errors per character to avoid confounding by sequence length (see the sketch after this list).

5 https://docs.python.org/3/library/ast.html
6 https://www.python.org/dev/peps/pep-0008/
7 https://github.com/PyCQA/pycodestyle
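A sketch of how metrics 8 and 9 could be computed; the pycodestyle invocation shown is one plausible setup and an assumption rather than the paper's exact evaluation code.

    import ast
    import pycodestyle

    def ast_node_count(source):
        # Metric 8: number of nodes in the AST of a sample that compiles.
        return sum(1 for _ in ast.walk(ast.parse(source)))

    def pep8_errors_per_char(source):
        # Metric 9: PEP8 violations per character, counted with the pycodestyle checker.
        checker = pycodestyle.Checker(lines=source.splitlines(keepends=True))
        num_errors = checker.check_all()
        return num_errors / max(len(source), 1)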
                                                          which it quickly reaches perfect compilability at a
While a high compilability rate is the target, the remaining metrics control for various aspects of fluency, quality and diversity of generated samples. Most but not all of these aspects reduce to the constraint of staying close to a; for instance, it is possible for πθ to actually outperform a in matching the statistics of a's own training distribution p∗(x).

5 Results

We present the evolution of nine evaluation metrics as a function of gradient updates in Figures 1 and 2.

Figure 1: Compilability rate Ex∼πθ b(x) (↑ better) of samples from policies obtained from KL-DPG, and two baselines: Reinforce with reward R(x) = b(x) and with reward R(x) = P(x).

Reinforce with R(x) = b(x) quickly improves compilability by a large margin, but this improvement is mirrored by an equally large divergence from p and a. This divergence translates into generating sequences much shorter (in terms of the number of characters) and logically simpler (in terms of the number of nodes in their ASTs) than an average sequence sampled from a. This heavily decreased sequence length (most of the generated functions are one-liners) seems to artificially increase diversity metrics (Self-BLEU-5 and Distinct-1).

Reinforce with R(x) = P(x) does not improve compilability rate until an inflection point, after which it quickly reaches perfect compilability at the price of heavily diverging from both a and (perhaps counterintuitively) p. The reason behind that, however, is that the policy heavily peaks around a single sequence that is compilable. To understand what causes this behavior, first note that the objective for Reinforce with R(x) = P(x) is to maximize Ex∼πθ [a(x)b(x)]. Because R(x) = 0 for uncompilable sequences, compilation rate will improve. But for compilable sequences, the effective reward is R(x) = a(x), meaning that πθ is rewarded most for generating the most probable sequences (according to a(x)), making them even more probable. Eventually, Ex∼πθ a(x) is maximized by a policy peaking on a single sample x that was the most probable one according to a(x). This failure mode is reflected in diversity metrics and perplexity. The sequence the policy peaks on is also shorter and less complex than an average sequence sampled from a.
Figure 2: Evaluation metrics KL(p|πθ) (↓ better), KL(πθ|a) (↓ better), Self-BLEU-5 (↓ better), Distinct-1 (↑ better), AST node count (↑ better), PEP8 error count (↓ better), sequence length (↑ better), and perplexity (↓ better) for policies obtained from KL-DPG, and two baselines: Reinforce with reward R(x) = b(x) and with reward R(x) = P(x).

KL-DPG is the only method that consistently improves compilability rate while decreasing divergence from p, maintaining the diversity of a, and only slightly decreasing sequence length and the number of nodes in ASTs. Moreover, as a by-product of improving compilability, KL-DPG is also able to slightly decrease the perplexity and the frequency of PEP8 violations per character. We conjecture the decrease in perplexity is because compilability provides a training signal enabling πθ to fit a's training distribution p∗(x) better than a was able to.8 The decrease in the frequency of PEP8 violations might be due to the fact that compilability is correlated with PEP8 compliance.

8 This mirrors the results obtained by Parshakova et al. (2019b), who also defined an EBM augmenting an autoregressive model with prior knowledge about features of the training set and observed a decrease in perplexity compared to pure autoregressive training.

5.1 Qualitative evaluation

To further analyze the effects of different fine-tuning approaches on sample diversity, we measured the frequency of BPE tokens in generated samples. For each of the four analyzed generative models, we sampled 1000 sequences using pure ancestral sampling. We then computed the frequency of each BPE token (the number of times it occurs) and its rank (its index in a sorted list of tokens). We plotted these results in Figure 4. This qualitative evaluation paints a similar picture: fine-tuning using Reinforce incurs a large (with R(x) = b(x)) or extreme (with R(x) = P(x)) decrease in token diversity. In contrast, KL-DPG is able to maintain a relatively long tail of token frequencies, not departing too far from a.

Moreover, in order to gain a better understanding of how different fine-tuning methods affect generative models, we measured the frequency of different categories of compilation errors for samples from a and from the fine-tuned policies. This analysis is presented in Figure 3. We categorized errors using error messages produced by the Python interpreter when trying to compile an uncompilable sequence. invalid syntax is the most common failure mode (30% of all sequences sampled from a), with a long tail of other error categories. We can see that both KL-DPG and Reinforce with R(x) = b(x) consistently decrease error frequency across almost all the categories.
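One plausible way to extract such categories is shown below; this sketch uses the built-in compile for brevity (the scorer in Section 4.1 goes through codeop) and is an illustration rather than the exact analysis code.

    from collections import Counter

    def error_category(source):
        # Return the interpreter's error message for an uncompilable sequence
        # (e.g. "invalid syntax", "unexpected EOF while parsing"), or None if it compiles.
        try:
            compile(source, "<sample>", "exec")
        except SyntaxError as exc:
            return exc.msg
        except (ValueError, OverflowError) as exc:
            return type(exc).__name__
        return None

    def error_histogram(samples):
        # Frequency of each error category over a list of sampled sequences.
        return Counter(c for c in map(error_category, samples) if c is not None)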
Finally, in the Appendix we present randomly generated samples from each discussed policy. Tables 3–6 contain samples obtained through unconditional generation. In addition, to illustrate the applicability of the obtained policies for code completion, in Tables 7–9 we present samples obtained through conditional generation, i.e. x ∼ πθ(x|c), where the context c is a function name. In either case, samples were obtained using pure ancestral sampling.
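For the conditional samples, the function name can simply be used as a prompt. A sketch with HuggingFace-style generation is shown below; prompting with "def <name>(" and the particular decoding arguments (sampling with no top-k or nucleus truncation, which corresponds to pure ancestral sampling) are assumptions about one way to realize this, not the exact generation script.

    def complete(pi_theta, tokenizer, function_name, max_length=128):
        # Conditional generation x ~ pi_theta(x|c), where the context c is a function name.
        prompt = tokenizer.encode("def " + function_name + "(", return_tensors="pt")
        output = pi_theta.generate(prompt, do_sample=True, top_k=0, top_p=1.0,
                                   max_length=max_length)
        return tokenizer.decode(output[0])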
[Figure 3 consists of one bar plot per compilation error category: invalid syntax, EOL while scanning string literal, unexpected EOF while parsing, duplicate argument, unindent does not match any outer indentation level, unexpected indent, keyword argument repeated, unexpected character after line continuation character, positional argument follows keyword argument, EOF while scanning triple-quoted string literal, invalid character in identifier, invalid token, positional argument follows keyword argument unpacking, non-default argument follows default argument.]

Figure 3: The frequency (measured as the percentage of samples from πθ(x) causing a given error) of each kind of compilation error for the original generative model a and policies fine-tuned using KL-DPG and Reinforce with R(x) = b(x). The policy fine-tuned using Reinforce with R(x) = P(x) was excluded because the single sequence it produces causes no compilation errors. Percentages were computed using 500 samples, while confidence intervals were based on 3 repeats of the sampling procedure.

Figure 4: Token frequency against token rank computed for tokens found in samples from KL-DPG and two baselines. Longer tails imply more diverse samples.

6 Discussion

In this paper, we presented a new energy-based model formulation for the problem of imposing the constraint of compilability on an autoregressive generative model for source code. In contrast with standard reinforcement learning approaches, the solution we propose – KL-DPG – is able to improve compilability rate without sacrificing diversity and complexity of generated samples.

One obvious application of the presented approach is improving the accuracy of code completion, i.e. tools assisting in programming by predicting the next tokens based on context (Svyatkovskiy et al., 2020a). The fact that fine-tuning using KL-DPG has a beneficial effect on perplexity and PEP8 error frequency suggests that it can provide a training signal complementary to that in a language modeling objective. The benefits of this auxiliary training signal would arguably diminish with increased training time and dataset size, but that still leaves room for significant improvement in low-resource domains.

A limitation of the current KL-DPG approach is that it is restricted to unconditional generation. This is because for a conditional EBM P(x, c) the proportionality constant −1/Z from (4) would depend on the context c. Nevertheless, one can imagine using a policy πθ fine-tuned using KL-DPG as initialization of a decoder for conditional generation, e.g. transpilation (translation between programming languages) or program synthesis (translation from a natural language to a programming language).
References

Miltiadis Allamanis, Earl T. Barr, Premkumar T. Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Comput. Surv., 51(4):81:1–81:37.

Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally normalized transition-based neural networks.
Youri Arkesteijn, Nikhil Saldanha, and Bastijn Kostense. 2020. Code completion using neural attention and byte pair encoding. CoRR, abs/2004.06343.

Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.

A. Bakhtin, Y. Deng, S. Gross, Myle Ott, Marc'Aurelio Ranzato, and Arthur Szlam. 2020. Energy-based models for text. ArXiv, abs/2004.10188.

David Belanger and Andrew McCallum. 2016. Structured prediction energy networks. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML'16, pages 983–992. JMLR.org.

Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic model for code. In Proceedings of the 33rd International Conference on International Conference on Machine Learning - Volume 48, ICML'16, pages 2933–2942. JMLR.org.

Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Denys Poshyvanyk, Massimiliano Di Penta, and Gabriele Bavota. 2021. An empirical study on the usage of BERT models for code completion. CoRR, abs/2103.07115.

Imre Csiszár and Paul C. Shields. 2004. Information theory and statistics: A tutorial. Commun. Inf. Theory, 1(4):417–528.

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, and Marc'Aurelio Ranzato. 2020. Residual energy-based models for text generation. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.

Yilun Du and Igor Mordatch. 2019. Implicit generation and modeling with energy based models. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

Geoffrey E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Comput., 14(8):1771–1800.

Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, Jose Miguel Hernandez Lobato, Richard E. Turner, and Doug Eck. 2017. Tuning recurrent neural networks with reinforcement learning.

A. Karpathy, J. Johnson, and Li Fei-Fei. 2015. Visualizing and understanding recurrent networks. ArXiv, abs/1506.02078.

Muhammad Khalifa, Hady Elsahar, and Marc Dymetman. 2021. A distributional approach to controlled text generation. In International Conference on Learning Representations.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Vijay Konda and John Tsitsiklis. 2000. Actor-critic algorithms. In Advances in Neural Information Processing Systems, volume 12. MIT Press.

Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, and Percy S. Liang. 2019. SPoC: Search-based pseudocode to code. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.

Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato, and Fu Jie Huang. 2006. A tutorial on energy-based learning. In Predicting Structured Data. MIT Press.

Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119, San Diego, California. Association for Computational Linguistics.

Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, and Jianfeng Gao. 2016b. Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1192–1202. The Association for Computational Linguistics.

Chang Liu, Xin Wang, Richard Shin, Joseph E. Gonzalez, and Dawn Song. 2016a. Neural code completion.

Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, and Kevin Murphy. 2016b. Optimization of image description metrics using policy gradient methods. CoRR, abs/1612.00370.

Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. CoRR, abs/2102.04664.

Chris J. Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML'14, pages II–649–II–657. JMLR.org.

Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2013. A statistical semantic language model for source code. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 532–542, New York, NY, USA. Association for Computing Machinery.

Art B. Owen. 2013. Importance sampling. In Monte Carlo theory, methods and examples, chapter 9.

Tetiana Parshakova, Jean-Marc Andreoli, and Marc Dymetman. 2019a. Distributional reinforcement learning for energy-based sequential models. CoRR.

Tetiana Parshakova, Jean-Marc Andreoli, and Marc Dymetman. 2019b. Global autoregressive models for data-efficient sequence learning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 900–909, Hong Kong, China. Association for Computational Linguistics.

Ramakanth Pasunuru and Mohit Bansal. 2017. Reinforced video captioning with entailment rewards. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 979–985. Association for Computational Linguistics.

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.

Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.

Marc'Aurelio Ranzato, Y-Lan Boureau, Sumit Chopra, and Yann LeCun. 2007. A unified energy-based framework for unsupervised learning. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, AISTATS 2007, San Juan, Puerto Rico, March 21-24, 2007, volume 2 of JMLR Proceedings, pages 371–379. JMLR.org.

Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.

Veselin Raychev, Pavol Bielik, and Martin Vechev. 2016. Probabilistic model for code with decision trees. SIGPLAN Not., 51(10):731–747.

Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. SIGPLAN Not., 49(6):419–428.

Baptiste Roziere, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. 2020. Unsupervised translation of programming languages. Advances in Neural Information Processing Systems, 33.

Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.

Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, pages 1057–1063, Cambridge, MA, USA. MIT Press.

Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, and Neel Sundaresan. 2020a. IntelliCode Compose: Code generation using transformer. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, pages 1433–1443, New York, NY, USA. Association for Computing Machinery.

Alexey Svyatkovskiy, Sebastian Lee, Anna Hadjitofi, Maik Riechert, Juliana Franco, and Miltiadis Allamanis. 2020b. Fast and memory-efficient neural code completion. CoRR, abs/2004.13651.

Pradyumna Tambwekar, Murtaza Dhuliawala, Lara J. Martin, Animesh Mehta, Brent Harrison, and Mark O. Riedl. 2019. Controllable neural story plot generation via reward shaping. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5982–5988. ijcai.org.

Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, and Kevin Gimpel. 2020. ENGINE: Energy-based inference networks for non-autoregressive machine translation. ArXiv, abs/2005.00850.

Ronald J. Williams. 1992a. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256.
Ronald J. Williams. 1992b. Simple statistical gradient-
  following algorithms for connectionist reinforce-
  ment learning. In Machine Learning, pages 229–
  256.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien
  Chaumond, Clement Delangue, Anthony Moi, Pier-
  ric Cistac, Tim Rault, Rémi Louf, Morgan Funtow-
  icz, and Jamie Brew. 2019. Huggingface’s trans-
  formers: State-of-the-art natural language process-
  ing. CoRR, abs/1910.03771.
Chunyang Xiao, Marc Dymetman, and Claire Gardent.
  2016. Sequence-based structured prediction for se-
  mantic parsing. In Proceedings of the 54th An-
  nual Meeting of the Association for Computational
  Linguistics (Volume 1: Long Papers), pages 1341–
  1350, Berlin, Germany. Association for Computa-
  tional Linguistics.

Victor Zhong, Caiming Xiong, and Richard Socher.
  2017.    Seq2sql: Generating structured queries
  from natural language using reinforcement learning.
  arXiv preprint arXiv:1709.00103.

Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo,
  Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texy-
  gen: A benchmarking platform for text generation
  models. In The 41st International ACM SIGIR Con-
  ference on Research & Development in Information
  Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-
  12, 2018, pages 1097–1100. ACM.
Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B.
  Brown, Alec Radford, Dario Amodei, Paul Chris-
  tiano, and Geoffrey Irving. 2019. Fine-tuning lan-
  guage models from human preferences. CoRR,
  abs/1909.08593.
A       Hyperparameters and implementation
        details
We implemented all models using PyTorch (Paszke
et al., 2019) and the HuggingFace Transformers library (Wolf
et al., 2019). Training the initial generative model took 10 days
on 3 Nvidia Tesla T4 GPUs. For a detailed list of
hyperparameter values, see Table 1; a minimal loading sketch follows the table.

 Hyperparameter            Value
 base LM                   gpt2-small
 number of params          117m
 number of layers          12
 number of heads           12
 vocabulary size           50257
 sequence length           128
 hidden state size         768
 activation function       gelu
 optimizer                 Adam (Kingma and Ba, 2014)
 initial learning rate     5 × 10^-5
 learning rate scheduler   linear
 batch size                24
 total gradient updates    20069
 dropout rate              0.1

Table 1: Hyperparameters used for training the initial
generative model a
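
As a rough illustration of the settings in Table 1, the sketch below shows how the base gpt2-small model, the Adam optimizer, and a linear learning-rate schedule could be instantiated with the HuggingFace Transformers and PyTorch APIs. It is a minimal sketch rather than the training code itself; the number of warmup steps is an assumption, as it is not reported in Table 1.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer, get_linear_schedule_with_warmup

    # Base LM: gpt2-small (12 layers, 12 heads, 768-dim hidden states, 50257-token vocabulary).
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Optimizer and schedule following Table 1; warmup steps assumed to be 0 (not reported).
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=0, num_training_steps=20069)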

  The implementation of KL-DPG was based on
code published by Khalifa et al. (2021).9 Each fine-
tuning run took approximately 5 days on 2 Nvidia
V100 GPUs. For a detailed list of hyperparameter
values, see Table 2.

 Hyperparameter             Value
 optimizer                  Adam (Kingma and Ba, 2014)
 learning rate α(θ)         1.41 × 10^-6
 learning rate scheduler    linear
 batch size                 2048
 warmup gradient updates    100
 total gradient updates     250
 sequence length            128
 dropout rate               0.1

Table 2: Hyperparameters used for training πθ using
KL-DPG and Reinforce
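
For concreteness, the following is a schematic sketch of a single KL-DPG gradient step, written against the description in Khalifa et al. (2021) rather than the released implementation. The objects pi_theta, q, and a are assumed to expose sample and log_prob helpers (hypothetical interfaces standing in for autoregressive sampling and sequence log-probabilities), and compiles is a stand-in compilability scorer b(x) based on Python's compile() builtin.

    import torch

    def compiles(source):
        # Stand-in b(x): 1 if the decoded program compiles as Python, else 0.
        try:
            compile(source, "<generated>", "exec")
            return 1.0
        except (SyntaxError, ValueError, TypeError):
            return 0.0

    def kl_dpg_step(pi_theta, q, a, optimizer, batch_size=2048):
        # Sample a batch of programs (decoded source strings) from the proposal q.
        samples = q.sample(batch_size)

        # Unnormalized EBM score P(x) = a(x) b(x) and importance weights P(x)/q(x),
        # computed in log space; sequences with b(x) = 0 get an (effectively) zero weight.
        log_a = a.log_prob(samples)
        log_q = q.log_prob(samples)
        b = torch.tensor([compiles(x) for x in samples])
        log_P = log_a + torch.log(b + 1e-45)
        weights = torch.exp(log_P - log_q).detach()

        # Distributional policy gradient: push pi_theta towards p(x) = P(x)/Z.
        log_pi = pi_theta.log_prob(samples)
        loss = -(weights * log_pi).mean()

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

In the full algorithm the proposal q is periodically updated to the current πθ whenever the estimated KL(p ∥ πθ) improves; this adaptive step is what distinguishes KL-DPG from plain DPG.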

    9 https://github.com/naver/gdc
b(x)   Program

       def test_3_invalid(self):
           serializer = serializer.validated_manager['quarterly_ cred']
0
           serializer.user = 'token'
           self.verify_token(epsg = serializer.DBModes,[serializer.user])

       def delete(self,username,password = None):
           if username:
0              if username.startswith("oil",None)or username.startswith('"",True):
                   raise HttpRequest()
           db.model.delete.assert_called_with(username,'password')

       def mode(self):
1          self._mode = 'modeM_GB'
           return self

       def _update_update_tbl(self,new_worksheet):
           self._merge_tbl(new_worksheet,old_worksheet)
           self._create_where('x1')
           self._update_tbl('x1',{ }).extend([str(new_fh.getvalue()))
0
           self._clear_sql()
           self.clear_lstrip()
           self.pop.set('x1')[int(col)for param in['x1','y1']]
           self.flush.update()

       def _callResourceCost(self,server):
           response = urllib.Request('GET','//api//log//%s//detected//' % server.id)
1          body = urllib. urllib2.urlencode(body)
           response.headers['X-Basic-Control-Authorization']= self.oauth_client.Client.CertResponse(response.body)
           return response

       def _pre_save(self,data):
0           self.calculate_updates([item.resolve(data['output')]= yield
           ,→ data['output'].find('top',['mybounce','geodeIB'])))

       def read(self):
           self.offset -= 1
1          start = O8(self)
           while time.time()- start:
               return self.get_index(start)

       def Pub(self):
           r = PCHAP()
           r['where']= struct.unpack('!T',self.digest))
0
           response = MKchronosOPS('R')
           self.sendMessage(response)
           return self.Response(response)

       def __init__(self,current_node):
           self.current_node = current_loadbalancer
           self.assign_current_node = None
1          self.parenting = None
           if self.menu:
               self.getNodeSelector(Index(RemovelineToRow,self.parent.position),0,2.0,5.0)
           self.show_parent()

       def get_response_data(self):
            return {
1
           ,→ 'from_blob_client':self.to_blob_key,'as_blob_secret':self.to_project_secret.to_secret(),'json':self.to_storage
           ,→ }

       def put(self,key,expire = True):
            if not invert:
                dict = { }
0
                dict.update(key,self.__TestStepities[key])
            self.cs.put(self._uZED_ATTRIBUTES_ =[("sequential_command","duration",key,expire)]= "//?modified:%r" %
           ,→ key,queue_text = self.__kneeators["expires"])

       def testPath(self):
           t = Gaffer.Reader(self.callback)
           dupe = ""
1          f.mkdir(t)
           f = sys.stdout.tell()
           f.write('_')
           self.assertEqual(f,dataponCollision)

       def get_count(self):
1
           return self.get_implicit_count()

       def is_alive(self):
1
           return(self.pid,)and(self.pid == 400)

                            Table 3: Sequences sampled from the original generative model a
b(x)   Program

       def fetch_size(self,page):
           response = self.fetch(page,max((2))
0          constant(response.json(),response.pop('utf-8'))
           payload = "%s//%s//%s//%s//%s" %(self.resource.id,page.format_from_bytes())
           return payload

       def setUp(self):
           self.project_loader = testutil.FileSentenceDependencyGraph(extensions =['file','path'])
0
           self.schema =RelatedPackage preserveLoader(root_loader)
           self.extension_context = XMLLoader()

       def __getattr__(self,perm):
1
           return self._memo.get(perm)

       def expand(self,text):
1          value.strip()
           return extract_cseq(text)

       def test_Obze(self):
1          w = Command()
           self.assertEqual(w.callHeader.callHeader,self.result)

       def start_stream(self,addressFamily,opcode):
           logger.info("OpenlibwriteStructBegin chunkon.csv',OperationalError())
           error_message = self.get_stream([None,None])
0
           message,message = self.block_messages[0]
           message = message[0]
           self._process_message(message,message,message,message)

       def set_dense(self,srs,fit_to):
           if dup in self.scalar:
               return
0
           if not isinstance(modality,(pyobj):
               self.sq =SUBNET
           self.basic = asim.bin.sample(srs,rng = self.ctypes,trials = self.rng,dtype = self.dtype)

       def _act(self,value):
1
           self._result.set_argument('value',value)

       def _verify_ssling_access_admin(self,ip_name):
1
           self._check_proxy(ip_name)

       def __str__(self):
           r =[]
           for s in self.__dict__.items():
0              if s[0]in BoundCacheContents():
                   break
           if s[:- 1]:Elements([("Unsupported Ct%s]" % ','.join(self.__class__.__name__))
           return "Data attribute '%s' % ','.join("%sCHOICES from %s" %(WARNING,str(r)))

       def test_FaceIP_3D_14(self):
0
           self.assertTrue(self.doTestFace(self.doTestFace([self.doTestFace([False,False)])

       def __init__(self,** options):
           super(_ChoiceTest,self).__init__(** options)
0          self.action_classes = options["cells_store"]
           self.choices =(1.2,** options["mysql"]= FakeMissingTuple())
           self.parser = Message(list.__init__(option_forms))

       def main(self,client):
1          remove_home_config(client,"client_snapshot_url")
           self.client.client_snapshot.update(client)

       def _stop_signal(self,emitter,datafile,for_attachment):
1
           vim.gui.target_cancel()

                          Table 4: Sequences sampled from a policy fine-tuned using KL-DPG
b(x)   Program

       def invalidateKey(self):
1
           self.action.rooms = { }

       def get(self):
1
           return self.handler.identifier

       def flush(self):
1
           self.write("ready")

       def get_flavor(self,resource,path,** metadata):
1
           return self.context.get(resource,path,** metadata)

       def test_api_set_to_result(self):
1          X = T.ListHead()
           self.assertEquals(quantiles(X),self._cache.annotations)

       def is_cmp(self,other):
1
           return not self._safe_eq(other,self.link)

       def __iter__(self):
1
           return iter(self._reverse())

       def cancel(self):
1
           return self.enhanced_window.set_timeout()

       def __str__(self):
1
           return str(self.repository)

       def summary(self):
1
           return self._series

       def Lazypeer(self):
1
           return self._peer

       def ByteSize(self):
           n = 0
1
           n += self.lengthString(len(self.parameters_))
           return n + self.lengthString(number(self.value_))

       def setUp(self):
           super(TestMaUserRoleTestCase,self).setUp()
1
           self.core =BER()
           self.topsetup_existing = False

       def __init__(self,** kwargs):
1
           self.sourcemersListComp = kwargs.get('stretch {}'.format(self.__class__.twsourceCentOS_text))

                 Table 5: Sequences sampled from a policy fine-tuned using Reinforce with R(x) = b(x)
b(x)   Program

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

       def set_OwnerId(self,OwnerId):
1
           self.add_query_param('OwnerId',OwnerId)

                  Table 6: Sequences sampled from a policy fine-tuned using Reinforce with R(x) = P(x)
b(x)   Program

                                               Sequences sampled from the original generative model a
       def closeEvent(self):
1          self._isalive = False
           self._original_resume = True

       def close_file(self):
1
           pass

       def closeWorking(self):
1
           pass

                                              Sequences sampled from a policy fine-tuned using KL-DPG
       def close(self):
           if not self.closed:
1
               self.closed = True
           self.translation.close()

       def close(self):
           self.queue.Importer.close(self.info)
1
           self.open_input.close()
           self.graph.close(self.gamma)

       def close(self):
           try:
1              self.srv.get_browser.mac(self.bus_process.name,vm_output = True)
           except suspended as ex:
               self.socket.stop(ex)

                                    Sequences sampled from a policy fine-tuned using Reinforce with R(x) = b(x)
       def close(self):
1
           self._stdout.close()

       def close(self):
1
           self.idb.close()

       def close(self):
           self.reuse = subprocess.Popen('CONNECTION','').unregisterProducer()
1          p = subprocess.Popen()
           p.communicate().close()
           return u.close()

                                Sequences sampled from a policy fine-tuned using Reinforce with R(x) = P(x)
       def close(self,object):
1
           self.api.close(self.uid.length)

       def close(self):
1
           self.job_closed.remove(self)

       def close(self):
1
           self.buffer.flush()

                         Table 7: Samples obtained from policies conditioned on prompt def close
b(x)   Program

                                        Sequences sampled from the original generative model a
       def fit_pdf(self,hop,theta,theta):
           asserttriangular is self._fit_rewrite(hop, kernel,theta,theta)- gtheta,70)
           assertworkspace isTType.ACCEPTED_ignore
0
           assert subset in(coeff,Y)
           assert self._Xfd != xOpenStackBackendError
           assert isinstance(750,Win,T,Vector)

       def fit(self,X,y):
           self._ y = y
           self._children -= 1
           assert isinstance(self._labels,_MOD_'")
           x[:]= 0
0
           y[:]=Bio_OFFSET
           y *= self._labels
           y * y * y
           y //= y
           return y

       def fit(self,X = None,y = None,result = None):
1          sts = self.get_appId(self.mesh_filename,X,y = y,d = result)
           self.mirror_logpdf([0x9]* indented)

                                       Sequences sampled from a policy fine-tuned using KL-DPG
       def fit(self,X,y,* args,** kwargs):
           X = self.transform(X,y,* args,** kwargs)
           data = np.DataFrame(data)
1
           for i in self.fallback_array.iteration_two(* data):
               data[i].labels[i].tolist()
           return data

       def fit(self, initial_output = None):
           if initial_output:
               self.force_input = False
           else:
0              self.cells_done = tuple(initial_output)
           if initial_input == self.WK_MASK:
               self.output_output += self.osfstorage_NORMAL
               self.outputs = list([self.inputState.NORMAL_READ valid])
           return 1

       def fit(self,reshape,a,b):
1
           return frappe. filediff(islice(a,b),b)

                                   Sequences sampled from a policy fine-tuned using Reinforce with R(x) = b(x)
       def fit(self,X,y):
1
           self.x = y

       def fit(self,fit,d):
1          self.fit =followers
           return super(PositionUntilLockedSequence,self).fit(marks)

       def fit(self,X_acc):
           X_exog = self.xc1.exog
           y = self.instance.exog
           y,= self.model.w2 preserve_uniform(os.environ.XMANllf,y_y))
0          y += self.model.t2le continX
           y = self.transition.fit(y)
           y.y = self.model.y * y
           y.red = self.model.gw.urmpopow(y)
           return y

                                 Sequences sampled from a policy fine-tuned using Reinforce with R(x) = P(x)
       def fit(self,fit,X,y,z):
0          self.learning = indices[np.zeros(axis = 1Dot,y = y,motion = self. np.loss,y = res.scale)]
           self.index = y

       def fit(self,params):
1
           self.params_param = params

       def fit(self,X,y = None):
1          self.x = x
           self.y = x

                          Table 8: Samples obtained from policies conditioned on prompt def fit
b(x)    Program

                                         Sequences sampled from the original generative model a
        def generate_samples_with_prompt(self,input_value,decimal = False):
            use_full = False
0           full_input_string = escape_input[decimal]
            newprefix = local_input_format.split("