Energy-Based Models for Code Generation under Compilability Constraints
Energy-Based Models for Code Generation under Compilability Constraints

Tomasz Korbak,1,∗ Hady Elsahar,2 Marc Dymetman,2 Germán Kruszewski2
t.korbak@sussex.ac.uk
{hady.elsahar,marc.dymetman,german.kruszewski}@naverlabs.com
1 University of Sussex, United Kingdom
2 Naver Labs Europe, France

arXiv:2106.04985v1 [cs.LG] 9 Jun 2021

Abstract

Neural language models can be successfully trained on source code, leading to applications such as code completion. However, their versatile autoregressive self-supervision objective overlooks important global sequence-level features that are present in the data, such as syntactic correctness or compilability. In this work, we pose the problem of learning to generate compilable code as constraint satisfaction. We define an Energy-Based Model (EBM) representing a pre-trained generative model with an imposed constraint of generating only compilable sequences. We then use the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021) to train a generative model approximating the EBM. We conduct experiments showing that our proposed approach is able to improve compilability rates without sacrificing diversity and complexity of the generated samples.

1 Introduction

Code completion is an essential feature of any modern Integrated Development Environment (IDE). It supports developers with recommendations about the next token to write given a context, speeding up software development and reducing the number of mistakes. A large body of work has relied on statistical language modeling, treating programming languages as natural languages, using probabilistic grammars (Raychev et al., 2014; Bielik et al., 2016) and, more recently, neural language models (Liu et al., 2016a; Svyatkovskiy et al., 2020a,b; Arkesteijn et al., 2020; Ciniselli et al., 2021).1 In particular, neural autoregressive language models have been favoured due to their scalability and generic training procedure that can exploit large codebases (e.g. open source code repositories available on GitHub) through self-supervised training.

Despite these desirable traits, neural language models trained in the standard way are known to suffer from myopia and to overlook global sequence-level features that are present in the data and which might be crucial for the quality of generated sequences (Parshakova et al., 2019b). This leads to repetitions, hallucinations and failures to capture long-distance consistency requirements. In a code generation context, this is demonstrated by compilation errors, which are a common failure mode in tasks such as translation between programming languages (Roziere et al., 2020). This problem has inspired a large body of work on injecting sequence-level priors, either by directly optimizing sequence-level features (Ranzato et al., 2016) or through fusion with grammars and automata (Xiao et al., 2016). These techniques aim to balance the desirable traits and fast inference of neural autoregressive models trained in the standard way with the satisfaction of global sequence-level features.

In this work, we formulate compilable code generation as a constraint satisfaction problem. We show that this formulation leads to a unique distribution represented by an Energy-Based Model (EBM). This unique distribution by definition fully satisfies the compilability constraints while having a minimal KL divergence from the original autoregressive generative model trained through cross-entropy. We then train an autoregressive generative model to approximate the underlying distribution of this EBM using the KL-Adaptive Distributional Policy Gradient algorithm (Khalifa et al., 2021). In our experiments, we show that our approach significantly improves compilability rates without sacrificing diversity or complexity of the generated examples. This alleviates the drawbacks of reinforcement learning fine-tuning techniques that maximize compilability but deviate significantly from the original generative model, which leads to a severe loss in diversity and complexity of the generated samples. Finally, we complement our experiments with a qualitative analysis of the effect of several fine-tuning approaches on the distribution of compilation errors.

∗ Work done during a research internship at Naver Labs Europe.
1 See Allamanis et al. (2018) for a survey.
2 Related Work

Imposing compilability constraints on generative models  There is a body of work focusing on unconditional code generation or code completion: generating a piece of source code given a preceding piece of source code (Nguyen et al., 2013; Raychev et al., 2014; Karpathy et al., 2015; Bielik et al., 2016). That work, however, focuses on perplexity and similarity with respect to ground-truth completions (in terms of exact-match accuracy, Levenshtein distance and ROUGE scores) (Svyatkovskiy et al., 2020a; Lu et al., 2021), usually failing to measure and control for compilability of generated sequences, or for semantic and syntactic constraints in general.2 On the other hand, semantic and syntactic constraints are frequently considered in language-to-code translation or program synthesis. For instance, Zhong et al. (2017) used policy gradients to train a model for translating natural language questions into corresponding SQL queries and – in addition to rewarding query execution results – added a penalty for syntactically invalid queries. Taking that one step further, Kulal et al. (2019) use compilation errors (with their precise location) to guide search over the space of possible programs.

Optimizing sequence-level rewards for text generation  Most previous attempts at steering autoregressive models to conform to global constraints defined over entire sequences have employed reinforcement learning (RL). This includes using Reinforce (Williams, 1992a) for machine translation (Ranzato et al., 2016) or actor-critic (Konda and Tsitsiklis, 2000) for abstractive summarization (Paulus et al., 2018), caption generation (Liu et al., 2016b), dialogue (Li et al., 2016b), and video captioning (Pasunuru and Bansal, 2017). Some approaches (for instance, in machine translation and summarization (Ranzato et al., 2016; Bahdanau et al., 2017)) directly optimize performance metrics such as BLEU and ROUGE at training time. Others use heuristic rewards (for instance, Li et al. (2016b) for dialogue generation and Tambwekar et al. (2019) for story generation) in order to obtain certain a priori desirable features of generated sequences that then incentivize good performance on target metrics. A weakness of using RL for fine-tuning generative models is the problem of catastrophic forgetting: maximizing global, sequence-level rewards leads to very large deviations from the original autoregressive model trained through cross-entropy. This often results in significant reductions in the fluency and diversity of generated samples. The catastrophic forgetting problem is sometimes addressed by adding a penalty term to the rewards, such as the KL divergence between the trained policy and the autoregressive model. This approach, termed "conservative fine-tuning", was applied to generating melodies with music theory rewards and organic molecules with synthesizability rewards by Jaques et al. (2017), as well as to fine-tuning language models for controllable language generation by Ziegler et al. (2019). This solution does not have an explicit notion of the optimal policy and often has a hard time balancing the reward term against the KL penalty term, leading to instability in training (Khalifa et al., 2021). Unlike this approach, our formulation defines the optimal distribution that satisfies both requirements.

Energy-based models for text  Energy-based models (EBMs) (Hinton, 2002; LeCun et al., 2006; Ranzato et al., 2007) are a family of probabilistic graphical models in which learning and inference are done by associating an unnormalized probability with each configuration of observed and latent variables. Early examples of EBMs applied to natural language processing include sequence labeling problems (e.g. tagging) exploiting global properties of a sequence (Andor et al., 2016; Belanger and McCallum, 2016). A recent surge of interest in EBMs (Du and Mordatch, 2019) has not left text generation unaffected (see Bakhtin et al. (2020) for a survey). Tu et al. (2020) proposed energy-based inference networks for non-autoregressive machine translation. Parshakova et al. (2019b) and Deng et al. (2020) augment an autoregressive language model with an additional global factor to obtain a lower perplexity on the training data. Khalifa et al. (2021) develop a novel approach to distributional controllable text generation by constructing an EBM satisfying desired statistical constraints imposed on the set of generated sequences (such as topic or gender statistics over the sequences) and then training an autoregressive policy to approximate it, which can be sampled from efficiently. We build on Khalifa et al.'s approach by applying it to a novel domain outside natural language and defining a new kind of constraint: compilability.

2 One exception is the work of Maddison and Tarlow (2014), who augment neural probabilistic context-free grammars with semantic constraints and use them for unconditional generation.
3 Method

Following Khalifa et al. (2021), we formulate compilable code generation as a constraint satisfaction problem over a space of generative models. There are two constraints that a target generative model p must satisfy. First, p must have minimal divergence, in distribution space, from an original generative model a pre-trained using the standard autoregressive language modeling objective. Second, it must generate only sequences that satisfy a certain sequence-level constraint b. In our case, b(x) = 1 iff x is a syntactically correct Python program and b(x) = 0 otherwise. These two constraints can be represented as a product-of-experts (Hinton, 2002) energy-based model

    P(x) = a(x) b(x).    (1)

p(x) can be obtained from P(x) by dividing it by a normalization constant Z:

    p(x) = (1/Z) P(x),    (2)

where

    Z = Σ_x P(x).    (3)

This EBM P is unique: it represents the distribution p that optimally reconciles the two constraints. It is a special case of the generalized maximum entropy formulation presented in (Csiszár and Shields, 2004) for applying constraints over distributions.

However, one problem remains: it is not straightforward to draw samples x ∼ p(x), or even to evaluate the probability p(x), under this optimal unique distribution. A simple method for drawing samples from p would be to sample sequences from a and filter them with b(x). While this method sounds simple, there is no direct way of using it for interactive code completion, since full sequences must be sampled to the end before they can be passed through the sequence-level filter b(x).
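The product-of-experts score in (1) and the filter-based sampling scheme just described can be written down in a few lines. The sketch below is purely illustrative and is not code from the paper; sequence_logprob and compiles are placeholder callables standing in for the model's sequence log-probability and the compilability scorer b.

```python
import math
from typing import Callable, List

def ebm_log_score(x: str,
                  sequence_logprob: Callable[[str], float],
                  compiles: Callable[[str], bool]) -> float:
    """Unnormalized log-score of the EBM P(x) = a(x) * b(x), cf. Eq. (1).

    Returns log a(x) when x compiles and -inf otherwise.
    """
    return sequence_logprob(x) if compiles(x) else -math.inf

def filter_sample(sample_from_a: Callable[[], str],
                  compiles: Callable[[str], bool],
                  n: int) -> List[str]:
    """Naive (rejection) sampling from p: draw full sequences from a and
    keep only those satisfying the compilability constraint b(x) = 1."""
    kept = []
    while len(kept) < n:
        x = sample_from_a()
        if compiles(x):
            kept.append(x)
    return kept
```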
Therefore, our objective here is to obtain another autoregressive policy πθ that directly approximates p.

To attain this, Khalifa et al. (2021), following Parshakova et al. (2019a), developed a training procedure called KL-Adaptive Distributional Policy Gradients (KL-DPG), which trains πθ to minimize the KL divergence between p and πθ. The gradient of this KL turns out to be tractable:

    ∇θ DKL(p, πθ) = ∇θ E_{x∼p} log [p(x)/πθ(x)]    (4)
                  = −∇θ E_{x∼p} log πθ(x)    (5)
                  = −E_{x∼p} ∇θ log πθ(x)    (6)
                  = −(1/Z) Σ_x P(x) ∇θ log πθ(x).    (7)

Let us now absorb the constant −1/Z into a learning rate α(θ) and estimate the expectation over p(x) using importance sampling (Owen, 2013) from yet another generative model q:

    ∇θ DKL(p, πθ) ∝ E_{x∼q} [P(x)/q(x)] ∇θ log πθ(x).    (8)

During training, both πθ and q are initialized as a. Then, q is periodically updated to πθ whenever πθ surpasses q in being closer to p (in terms of KL). Pseudocode for the whole KL-DPG training procedure is given in Algorithm 1.

Algorithm 1 KL-DPG
Require: EBM P, initial generative model a
1: πθ ← a
2: q ← a
3: for each iteration do
4:   for each episode do
5:     sample x from q(x)
6:     θ ← θ + α(θ) [P(x)/q(x)] ∇θ log πθ(x)
7:   if DKL(p||πθ) < DKL(p||q) then
8:     q ← πθ
Ensure: πθ

The gradient in (8) is similar to an estimate obtained using policy gradient methods in standard reinforcement learning (Sutton et al., 1999), with P(x)/q(x) playing the role of a pseudoreward. This similarity, however, is superficial. Our objective is to approximate a target generative model p by minimizing DKL(p, πθ), rather than to maximize an expected reward b(x), P(x) or P(x)/q(x). As we show in Section 5, these objectives produce vastly different policies, which diverge from p and catastrophically forget what the pretrained model knew about its training domain. Furthermore, since q always stays close to πθ, our pseudoreward P(x)/q(x) effectively depends on the policy parameters θ.
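The inner loop of Algorithm 1 (lines 4-6) amounts to an importance-weighted log-likelihood gradient step. The snippet below is a minimal PyTorch-style sketch of one such step, not the authors' released implementation (which follows Khalifa et al. (2021)); sample_from and seq_logprob are assumed helper interfaces for drawing sequences from a model and for computing a differentiable sequence log-probability.

```python
import torch

def kl_dpg_step(pi_theta, q, a, b, sample_from, seq_logprob, optimizer, batch_size=16):
    """One KL-DPG update, sketched.

    pi_theta, q, a: autoregressive models; b(x): compilability scorer in {0, 1};
    sample_from(model, n): draws n sequences; seq_logprob(model, x): log-probability
    of sequence x under the model (differentiable for pi_theta).
    """
    xs = sample_from(q, batch_size)
    loss = 0.0
    for x in xs:
        with torch.no_grad():
            if b(x):
                # Pseudoreward P(x)/q(x) = a(x)/q(x) for compilable x, Eq. (8).
                weight = torch.exp(seq_logprob(a, x) - seq_logprob(q, x))
            else:
                weight = torch.zeros(())  # b(x) = 0 makes P(x)/q(x) vanish
        # Ascent on weight * log pi_theta(x) == descent on its negation.
        loss = loss - weight * seq_logprob(pi_theta, x)
    loss = loss / batch_size
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```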
4 Experiments

4.1 Setup

Dataset  To prepare the training dataset, we started from the Python150 dataset, which consists of 150k Python source code files obtained from GitHub (Raychev et al., 2016). Then, using the code from Roziere et al. (2020), we extracted 713k Python functions (both methods and standalone functions) from it (250 MB of raw text data). The additional filtering criteria were compilability (according to b(x)) and a length of less than 128 BPE tokens. The dataset was then split into a training subset Dtrain and a test subset Dtest.

Initial generative model a  We implemented a using the GPT-2 (Radford et al., 2019) architecture with 117M parameters (gpt2-small) and kept all the original hyperparameters (see Table 1 in the Appendix). We trained a byte-level BPE tokenizer (Sennrich et al., 2016) with special BOS and EOS tokens to obtain a vocabulary of 50k tokens. The model was trained for one epoch.

Compilability scorer b  To check for compilability, we call the compile_command function from the codeop module of the Python Standard Library3 with a sequence x as its argument and check whether it returns a code object. We apply no postprocessing other than removing BOS and EOS tokens. codeop.compile_command is the implementation that Python interactive interpreters use in the read-eval-print loop (REPL) to determine whether a string is valid Python code. The method tries to compile a string of Python code and raises an exception if there is a problem with it, in particular a SyntaxError for invalid Python syntax and a ValueError or OverflowError if there is an invalid literal.

This notion of compilability is concerned only with syntactic correctness and does not execute the body of a function. However, we found the initial compilability rate E_{x∼a} b(x) of functions x sampled from a(x) to be only 0.56, which leaves a large margin for improvement.4

3 https://docs.python.org/3/library/codeop.html
4 Note that the initial compilability rate will be equal to our Z because E_{x∼a} b(x) = Σ_x a(x)b(x) = Σ_x P(x) = Z.
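A compilability scorer along the lines described above can be sketched as follows. This is a minimal illustration of the behaviour described in Section 4.1, not the authors' exact code; in particular, the BOS/EOS token strings and the treatment of incomplete input (for which compile_command returns None rather than a code object) are assumptions.

```python
import codeop

def b(x: str, bos: str = "<|BOS|>", eos: str = "<|EOS|>") -> int:
    """Sequence-level compilability constraint: 1 iff x parses as Python.

    Uses codeop.compile_command, the routine Python's interactive interpreter
    uses to decide whether a string is valid code. The BOS/EOS strings are
    placeholders for whatever special tokens the tokenizer uses.
    """
    source = x.replace(bos, "").replace(eos, "")
    try:
        code_object = codeop.compile_command(source)
    except (SyntaxError, ValueError, OverflowError):
        return 0
    # compile_command returns None for syntactically valid but incomplete
    # input (e.g. an unterminated block); only a real code object counts.
    return 1 if code_object is not None else 0
```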
KL-DPG training  πθ and q share their architecture with a but have separate weights, which are only initially identical to a's. Throughout training, πθ is updated to approximate p. See Table 2 in the Appendix for a complete list of hyperparameters used for training πθ and q with KL-DPG.

4.2 Baselines

We compare our method to the common approach of using standard reinforcement learning to fine-tune a generative model to conform to desired constraints. We use the Reinforce algorithm (Williams, 1992b), which, instead of minimizing divergence from the target distribution p, tries to maximize the expected reward E_{πθ} R(x). We consider two kinds of reward R(x):

• R(x) = b(x), where the generative model is simply rewarded for generating sequences that compile;

• R(x) = P(x), where the generative model is rewarded proportionally to the score our EBM assigns to x. Intuitively, this objective rewards both compilability and respecting the original generative model a.

4.3 Evaluation Metrics

We evaluate KL-DPG and the two baselines in terms of the following metrics:

1. E_{x∼πθ} b(x), the compilability rate of sequences sampled from πθ(x);

2. DKL(p, πθ), the forward KL divergence from the optimal distribution p;

3. DKL(πθ, a), the reverse KL divergence from the original pretrained generative model;

4. Distinct-1 score, a measure of text diversity in terms of the frequency of token repetitions within a sample x, proposed in the context of NLP by Li et al. (2016a);

5. Self-BLEU-5, a measure of text diversity across samples, proposed in the context of NLP by Zhu et al. (2018);

6. Perplexity measured on Dtest, a held-out subset of the data used for training a, calculated as

    exp[ −(1/N) Σ_{x∈Dtest} log πθ(x) ],

where N is the overall number of tokens in Dtest;

7. Sequence length, the average number of characters in a generated sequence x after detokenization;

8. AST node count, the average number of nodes in the abstract syntax tree (AST) of sequences that compile. Samples are parsed into their corresponding ASTs using the ast module from the Python Standard Library.5 Intuitively, this metric should indicate the logical (as opposed to surface) complexity of generated programs;

9. PEP8 error frequency, the average number of violations of PEP8, the style guide for Python,6 measured using pycodestyle,7 an off-the-shelf linter (static code analysis tool). We report the average number of errors per character to avoid confounding by sequence length.

5 https://docs.python.org/3/library/ast.html
6 https://www.python.org/dev/peps/pep-0008/
7 https://github.com/PyCQA/pycodestyle

While a high compilability rate is the target, the remaining metrics control for various aspects of fluency, quality and diversity of the generated samples. Most, but not all, of these aspects reduce to the constraint of staying close to a; for instance, it is possible for πθ to actually outperform a in matching the statistics of a's own training distribution p∗(x). Minimal sketches of three of these metrics (Distinct-1, AST node count, and PEP8 error frequency) are shown below.
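The following sketches show one plausible way to compute these three metrics. They follow the definitions above but are not the authors' evaluation code; in particular, writing the sample to a temporary file and reading report.total_errors is an assumption about pycodestyle's standard API usage.

```python
import ast
import os
import tempfile
from typing import List

import pycodestyle

def distinct_1(tokens: List[str]) -> float:
    """Fraction of unique tokens within a single sample (Li et al., 2016a)."""
    return len(set(tokens)) / max(len(tokens), 1)

def ast_node_count(source: str) -> int:
    """Number of nodes in the AST of a compilable sample."""
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))

def pep8_errors_per_char(source: str) -> float:
    """PEP8 violations per character, computed with the pycodestyle linter."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        report = pycodestyle.StyleGuide(quiet=True).check_files([path])
        return report.total_errors / max(len(source), 1)
    finally:
        os.unlink(path)
```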
5 Results

We present the evolution of nine evaluation metrics as a function of gradient updates in Figures 1 and 2.

[Figure 1: Compilability rate E_{x∼πθ} b(x) (↑ better) of samples from policies obtained from KL-DPG and two baselines: Reinforce with reward R(x) = b(x) and with reward R(x) = P(x).]

Reinforce with R(x) = b(x) quickly improves compilability by a large margin, but this improvement is mirrored by an equally large divergence from p and a. This divergence translates into generated sequences that are much shorter (in terms of the number of characters) and logically simpler (in terms of the number of nodes in their ASTs) than an average sequence sampled from a. This heavily decreased sequence length (most of the generated functions are one-liners) seems to artificially inflate the diversity metrics (Self-BLEU-5 and Distinct-1).

Reinforce with R(x) = P(x) does not improve the compilability rate until an inflection point, after which it quickly reaches perfect compilability at the price of diverging heavily from both a and (perhaps counterintuitively) p. The reason is that the policy peaks heavily around a single compilable sequence. To understand what causes this behavior, first note that the objective for Reinforce with R(x) = P(x) is to maximize E_{x∼πθ}[a(x)b(x)]. Because R(x) = 0 for uncompilable sequences, the compilation rate will improve. But for compilable sequences, the effective reward is R(x) = a(x), meaning that πθ is rewarded most for generating the most probable sequences (according to a(x)), making them even more probable. Eventually, E_{x∼πθ} a(x) is maximized by a policy that peaks on the single sample x that was most probable according to a(x). This failure mode is reflected in the diversity metrics and perplexity. The sequence the policy peaks on is also shorter and less complex than an average sequence sampled from a. A toy illustration of this collapse is given below.
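The collapse can be seen on a toy example. The snippet below is a purely illustrative calculation with made-up probabilities (not data from the paper); it compares the expected reward E_{x∼πθ}[a(x)b(x)] of a policy that matches the target p with that of a policy concentrating all its mass on the single most probable compilable sequence.

```python
import numpy as np

# Toy domain of three sequences with made-up probabilities under a(x)
# and compilability flags b(x); only x0 and x1 compile.
a = np.array([0.5, 0.3, 0.2])   # a(x)
b = np.array([1.0, 1.0, 0.0])   # b(x)

P = a * b          # unnormalized EBM scores, Eq. (1)
p = P / P.sum()    # target distribution p(x), Eq. (2)

# Expected reward E_{x~pi}[a(x) b(x)] for two candidate policies:
pi_match = p                              # policy matching the target p
pi_peaked = np.array([1.0, 0.0, 0.0])     # point mass on argmax_x a(x)b(x)

print((pi_match * P).sum())    # 0.425 -- lower expected reward
print((pi_peaked * P).sum())   # 0.5   -- maximal: Reinforce with R(x) = P(x)
                               #          is pulled toward this degenerate policy
```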
KL-DPG is the only method that consistently improves the compilability rate while decreasing divergence from p, maintaining the diversity of a, and only slightly decreasing sequence length and the number of nodes in ASTs. Moreover, as a by-product of improving compilability, KL-DPG also slightly decreases perplexity and the frequency of PEP8 violations per character. We conjecture that the decrease in perplexity arises because compilability provides a training signal that enables πθ to fit a's training distribution p∗(x) better than a itself was able to.8 The decrease in the frequency of PEP8 violations might be due to the fact that compilability is correlated with PEP8 compliance.

[Figure 2: Evaluation metrics KL(p, πθ) (↓ better), KL(πθ, a) (↓ better), Self-BLEU-5 (↓ better), Distinct-1 (↑ better), AST node count (↑ better), PEP8 error count (↓ better), sequence length (↑ better), and perplexity (↓ better) for policies obtained from KL-DPG and two baselines: Reinforce with reward R(x) = b(x) and with reward R(x) = P(x).]

8 This mirrors the results obtained by Parshakova et al. (2019b), who also defined an EBM augmenting an autoregressive model with prior knowledge about features of the training set and observed a decrease in perplexity compared to pure autoregressive training.
5.1 Qualitative evaluation

To further analyze the effects of different fine-tuning approaches on sample diversity, we measured the frequency of BPE tokens in generated samples. For each of the four analyzed generative models, we sampled 1000 sequences using pure ancestral sampling. We then computed the frequency of each BPE token (the number of times it occurs) and its rank (its index in the list of tokens sorted by frequency). We plot these results in Figure 4; a sketch of this computation is shown below. This qualitative evaluation paints a similar picture: fine-tuning using Reinforce incurs a large (with R(x) = b(x)) or extreme (with R(x) = P(x)) decrease in token diversity. In contrast, KL-DPG is able to maintain a relatively long tail of token frequencies, not departing too far from a.
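A minimal version of the frequency-rank computation used for Figure 4 might look as follows; the tokenize argument stands in for whatever BPE tokenizer the models use and is not a specific API from the paper.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

def token_frequency_by_rank(samples: Iterable[str],
                            tokenize: Callable[[str], List[str]]) -> List[Tuple[int, int]]:
    """Return (rank, frequency) pairs for all BPE tokens found in the samples.

    Ranks start at 1 for the most frequent token; plotting frequency against
    rank on log-log axes gives curves like those in Figure 4.
    """
    counts = Counter()
    for x in samples:
        counts.update(tokenize(x))
    frequencies = sorted(counts.values(), reverse=True)
    return [(rank, freq) for rank, freq in enumerate(frequencies, start=1)]
```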
Moreover, in order to gain a better understanding of how different fine-tuning methods affect generative models, we measured the frequency of different categories of compilation errors for samples from a and from the fine-tuned policies. This analysis is presented in Figure 3. We categorized errors using the error messages produced by the Python interpreter when trying to compile an uncompilable sequence. invalid syntax is the most common failure mode (30% of all sequences sampled from a), with a long tail of other error categories. We can see that both KL-DPG and Reinforce with R(x) = b(x) consistently decrease error frequency across almost all the categories.

[Figure 3: The frequency (measured as the percentage of samples from πθ(x) causing a given error) of each kind of compilation error for the original generative model a and policies fine-tuned using KL-DPG and Reinforce with R(x) = b(x). The policy fine-tuned using Reinforce with R(x) = P(x) was excluded because the single sequence it produces causes no compilation errors. Percentages were computed using 500 samples, while confidence intervals were based on 3 repeats of the sampling procedure.]

[Figure 4: Token frequency against token rank computed for tokens found in samples from KL-DPG and two baselines. Longer tails imply more diverse samples.]

Finally, in the Appendix we present randomly generated samples from each discussed policy. Tables 3-6 contain samples obtained through unconditional generation. In addition, to illustrate the applicability of the obtained policies for code completion, in Tables 7-9 we present samples obtained through conditional generation, i.e. x ∼ πθ(x|c), where the context c is a function name. In either case, samples were obtained using pure ancestral sampling.

6 Discussion

In this paper, we presented a new energy-based model formulation for the problem of imposing the constraint of compilability on an autoregressive generative model for source code. In contrast with standard reinforcement learning approaches, the solution we propose – KL-DPG – is able to improve the compilability rate without sacrificing the diversity and complexity of generated samples.

One obvious application of the presented approach is improving the accuracy of code completion, i.e. tools assisting programming by predicting the next tokens based on context (Svyatkovskiy et al., 2020a). The fact that fine-tuning using KL-DPG has a beneficial effect on perplexity and PEP8 error frequency suggests that it can provide a training signal complementary to that of a language modeling objective. The benefits of this auxiliary training signal would arguably diminish with increased training time and dataset size, but that still leaves room for significant improvement in low-resource domains.

A limitation of the current KL-DPG approach is that it is restricted to unconditional generation. This is because, for a conditional EBM P(x, c), the proportionality constant −1/Z absorbed into the learning rate in (8) would depend on the context c. Nevertheless, one can imagine using a policy πθ fine-tuned with KL-DPG as the initialization of a decoder for conditional generation, e.g. transpilation (translation between programming languages) or program synthesis (translation from a natural language to a programming language).

References

Miltiadis Allamanis, Earl T. Barr, Premkumar T. Devanbu, and Charles Sutton. 2018. A survey of machine learning for big code and naturalness. ACM Comput. Surv., 51(4):81:1–81:37.
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. 2016. Globally normalized transition-based neural networks.
Youri Arkesteijn, Nikhil Saldanha, and Bastijn Kostense. 2020. Code completion using neural attention and byte pair encoding. CoRR, abs/2004.06343.
Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron C. Courville, and Yoshua Bengio. 2017. An actor-critic algorithm for sequence prediction. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net.
A. Bakhtin, Y. Deng, S. Gross, Myle Ott, Marc'Aurelio Ranzato, and Arthur Szlam. 2020. Energy-based models for text. ArXiv, abs/2004.10188.
David Belanger and Andrew McCallum. 2016. Structured prediction energy networks. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 983–992. JMLR.org.
Pavol Bielik, Veselin Raychev, and Martin Vechev. 2016. PHOG: Probabilistic model for code. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48, ICML'16, pages 2933–2942. JMLR.org.
Matteo Ciniselli, Nathan Cooper, Luca Pascarella, Denys Poshyvanyk, Massimiliano Di Penta, and Gabriele Bavota. 2021. An empirical study on the usage of BERT models for code completion. CoRR, abs/2103.07115.
Imre Csiszár and Paul C. Shields. 2004. Information theory and statistics: A tutorial. Commun. Inf. Theory, 1(4):417–528.
Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, and Marc'Aurelio Ranzato. 2020. Residual energy-based models for text generation. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net.
Yilun Du and Igor Mordatch. 2019. Implicit generation and modeling with energy based models. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
Geoffrey E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Comput., 14(8):1771–1800.
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, Jose Miguel Hernandez Lobato, Richard E. Turner, and Doug Eck. 2017. Tuning recurrent neural networks with reinforcement learning.
A. Karpathy, J. Johnson, and Li Fei-Fei. 2015. Visualizing and understanding recurrent networks. ArXiv, abs/1506.02078.
Muhammad Khalifa, Hady Elsahar, and Marc Dymetman. 2021. A distributional approach to controlled text generation. In International Conference on Learning Representations.
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Vijay Konda and John Tsitsiklis. 2000. Actor-critic algorithms. In Advances in Neural Information Processing Systems, volume 12. MIT Press.
Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, and Percy S. Liang. 2019. SPoC: Search-based pseudocode to code. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
Yann LeCun, Sumit Chopra, Raia Hadsell, Marc'Aurelio Ranzato, and Fu Jie Huang. 2006. A tutorial on energy-based learning. In Predicting Structured Data. MIT Press.
Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119, San Diego, California. Association for Computational Linguistics.
Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, and Jianfeng Gao. 2016b. Deep reinforcement learning for dialogue generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016, pages 1192–1202. The Association for Computational Linguistics.
Chang Liu, Xin Wang, Richard Shin, Joseph E. Gonzalez, and Dawn Song. 2016a. Neural code completion.
Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, and Kevin Murphy. 2016b. Optimization of image description metrics using policy gradient methods. CoRR, abs/1612.00370.
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, Daxin Jiang, Duyu Tang, Ge Li, Lidong Zhou, Linjun Shou, Long Zhou, Michele Tufano, Ming Gong, Ming Zhou, Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, and Shujie Liu. 2021. CodeXGLUE: A machine learning benchmark dataset for code understanding and generation. CoRR, abs/2102.04664.
Chris J. Maddison and Daniel Tarlow. 2014. Structured generative models of natural source code. In Proceedings of the 31st International Conference on Machine Learning - Volume 32, ICML'14, pages II-649–II-657. JMLR.org.
Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N. Nguyen. 2013. A statistical semantic language model for source code. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 532–542, New York, NY, USA. Association for Computing Machinery.
Art B. Owen. 2013. Importance sampling. In Monte Carlo theory, methods and examples, chapter 9.
Tetiana Parshakova, Jean-Marc Andreoli, and Marc Dymetman. 2019a. Distributional reinforcement learning for energy-based sequential models. CoRR.
Tetiana Parshakova, Jean-Marc Andreoli, and Marc Dymetman. 2019b. Global autoregressive models for data-efficient sequence learning. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL), pages 900–909, Hong Kong, China. Association for Computational Linguistics.
Ramakanth Pasunuru and Mohit Bansal. 2017. Reinforced video captioning with entailment rewards. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 979–985. Association for Computational Linguistics.
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc.
Romain Paulus, Caiming Xiong, and Richard Socher. 2018. A deep reinforced model for abstractive summarization. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net.
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
Marc'Aurelio Ranzato, Y-Lan Boureau, Sumit Chopra, and Yann LeCun. 2007. A unified energy-based framework for unsupervised learning. In Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, AISTATS 2007, San Juan, Puerto Rico, March 21-24, 2007, volume 2 of JMLR Proceedings, pages 371–379. JMLR.org.
Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings.
Veselin Raychev, Pavol Bielik, and Martin Vechev. 2016. Probabilistic model for code with decision trees. SIGPLAN Not., 51(10):731–747.
Veselin Raychev, Martin Vechev, and Eran Yahav. 2014. Code completion with statistical language models. SIGPLAN Not., 49(6):419–428.
Baptiste Roziere, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. 2020. Unsupervised translation of programming languages. Advances in Neural Information Processing Systems, 33.
Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715–1725, Berlin, Germany. Association for Computational Linguistics.
Richard S. Sutton, David McAllester, Satinder Singh, and Yishay Mansour. 1999. Policy gradient methods for reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, pages 1057–1063, Cambridge, MA, USA. MIT Press.
Alexey Svyatkovskiy, Shao Kun Deng, Shengyu Fu, and Neel Sundaresan. 2020a. IntelliCode Compose: Code generation using transformer. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, pages 1433–1443, New York, NY, USA. Association for Computing Machinery.
Alexey Svyatkovskiy, Sebastian Lee, Anna Hadjitofi, Maik Riechert, Juliana Franco, and Miltiadis Allamanis. 2020b. Fast and memory-efficient neural code completion. CoRR, abs/2004.13651.
Pradyumna Tambwekar, Murtaza Dhuliawala, Lara J. Martin, Animesh Mehta, Brent Harrison, and Mark O. Riedl. 2019. Controllable neural story plot generation via reward shaping. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 5982–5988. ijcai.org.
Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, and Kevin Gimpel. 2020. ENGINE: Energy-based inference networks for non-autoregressive machine translation. ArXiv, abs/2005.00850.
Ronald J. Williams. 1992a. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229–256.
Ronald J. Williams. 1992b. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Machine Learning, pages 229–256.
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. CoRR, abs/1910.03771.
Chunyang Xiao, Marc Dymetman, and Claire Gardent. 2016. Sequence-based structured prediction for semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1341–1350, Berlin, Germany. Association for Computational Linguistics.
Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
Yaoming Zhu, Sidi Lu, Lei Zheng, Jiaxian Guo, Weinan Zhang, Jun Wang, and Yong Yu. 2018. Texygen: A benchmarking platform for text generation models. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR 2018, Ann Arbor, MI, USA, July 08-12, 2018, pages 1097–1100. ACM.
Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. CoRR, abs/1909.08593.
A Hyperparameters and implementation details

We implemented all models using PyTorch (Paszke et al., 2019) and HuggingFace (Wolf et al., 2019). Training the initial generative model took 10 days on 3 Nvidia Tesla T4 GPUs. For a detailed list of hyperparameter values, see Table 1.

Table 1: Hyperparameters used for training the initial generative model a

    base LM                  gpt2-small
    number of params         117M
    number of layers         12
    number of heads          12
    vocabulary size          50257
    sequence length          128
    hidden state size        768
    activation function      gelu
    optimizer                Adam (Kingma and Ba, 2014)
    initial learning rate    5 × 10−5
    learning rate scheduler  linear
    batch size               24
    total gradient updates   20069
    dropout rate             0.1

The implementation of KL-DPG was based on code published by Khalifa et al. (2021).9 Each fine-tuning run took approximately 5 days on 2 Nvidia V100 GPUs. For a detailed list of hyperparameter values, see Table 2.

Table 2: Hyperparameters used for training πθ using KL-DPG and Reinforce

    optimizer                Adam (Kingma and Ba, 2014)
    learning rate α(θ)       1.41 × 10−6
    learning rate scheduler  linear
    batch size               2048
    warmup gradient updates  100
    total gradient updates   250
    sequence length          128
    dropout rate             0.1

9 https://github.com/naver/gdc
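An architecture matching Table 1 could be instantiated with the HuggingFace transformers library roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' released code; the exact configuration keyword values (e.g. the "gelu_new" activation name and the three dropout fields) are assumptions about the library's GPT-2 config.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# gpt2-small-like architecture with the Table 1 settings: 50257-token vocabulary,
# 128-token context, hidden size 768, 12 layers, 12 heads, gelu activations, dropout 0.1.
config = GPT2Config(
    vocab_size=50257,
    n_positions=128,
    n_embd=768,
    n_layer=12,
    n_head=12,
    activation_function="gelu_new",
    resid_pdrop=0.1,
    embd_pdrop=0.1,
    attn_pdrop=0.1,
)
model = GPT2LMHeadModel(config)  # randomly initialized, then trained for one epoch on Dtrain
print(sum(p.numel() for p in model.parameters()) // 10**6, "M parameters")
```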
b(x) Program def test_3_invalid(self): serializer = serializer.validated_manager['quarterly_ cred'] 0 serializer.user = 'token' self.verify_token(epsg = serializer.DBModes,[serializer.user]) def delete(self,username,password = None): if username: 0 if username.startswith("oil",None)or username.startswith('"",True): raise HttpRequest() db.model.delete.assert_called_with(username,'password') def mode(self): 1 self._mode = 'modeM_GB' return self def _update_update_tbl(self,new_worksheet): self._merge_tbl(new_worksheet,old_worksheet) self._create_where('x1') self._update_tbl('x1',{ }).extend([str(new_fh.getvalue())) 0 self._clear_sql() self.clear_lstrip() self.pop.set('x1')[int(col)for param in['x1','y1']] self.flush.update() def _callResourceCost(self,server): response = urllib.Request('GET','//api//log//%s//detected//' % server.id) 1 body = urllib. urllib2.urlencode(body) response.headers['X-Basic-Control-Authorization']= self.oauth_client.Client.CertResponse(response.body) return response def _pre_save(self,data): 0 self.calculate_updates([item.resolve(data['output')]= yield ,→ data['output'].find('top',['mybounce','geodeIB']))) def read(self): self.offset -= 1 1 start = O8(self) while time.time()- start: return self.get_index(start) def Pub(self): r = PCHAP() r['where']= struct.unpack('!T',self.digest)) 0 response = MKchronosOPS('R') self.sendMessage(response) return self.Response(response) def __init__(self,current_node): self.current_node = current_loadbalancer self.assign_current_node = None 1 self.parenting = None if self.menu: self.getNodeSelector(Index(RemovelineToRow,self.parent.position),0,2.0,5.0) self.show_parent() def get_response_data(self): return { 1 ,→ 'from_blob_client':self.to_blob_key,'as_blob_secret':self.to_project_secret.to_secret(),'json':self.to_storage ,→ } def put(self,key,expire = True): if not invert: dict = { } 0 dict.update(key,self.__TestStepities[key]) self.cs.put(self._uZED_ATTRIBUTES_ =[("sequential_command","duration",key,expire)]= "//?modified:%r" % ,→ key,queue_text = self.__kneeators["expires"]) def testPath(self): t = Gaffer.Reader(self.callback) dupe = "" 1 f.mkdir(t) f = sys.stdout.tell() f.write('_') self.assertEqual(f,dataponCollision) def get_count(self): 1 return self.get_implicit_count() def is_alive(self): 1 return(self.pid,)and(self.pid == 400) Table 3: Sequences sampled from the original generative model a
b(x) Program def fetch_size(self,page): response = self.fetch(page,max((2)) 0 constant(response.json(),response.pop('utf-8')) payload = "%s//%s//%s//%s//%s" %(self.resource.id,page.format_from_bytes()) return payload def setUp(self): self.project_loader = testutil.FileSentenceDependencyGraph(extensions =['file','path']) 0 self.schema =RelatedPackage preserveLoader(root_loader) self.extension_context = XMLLoader() def __getattr__(self,perm): 1 return self._memo.get(perm) def expand(self,text): 1 value.strip() return extract_cseq(text) def test_Obze(self): 1 w = Command() self.assertEqual(w.callHeader.callHeader,self.result) def start_stream(self,addressFamily,opcode): logger.info("OpenlibwriteStructBegin chunkon.csv',OperationalError()) error_message = self.get_stream([None,None]) 0 message,message = self.block_messages[0] message = message[0] self._process_message(message,message,message,message) def set_dense(self,srs,fit_to): if dup in self.scalar: return 0 if not isinstance(modality,(pyobj): self.sq =SUBNET self.basic = asim.bin.sample(srs,rng = self.ctypes,trials = self.rng,dtype = self.dtype) def _act(self,value): 1 self._result.set_argument('value',value) def _verify_ssling_access_admin(self,ip_name): 1 self._check_proxy(ip_name) def __str__(self): r =[] for s in self.__dict__.items(): 0 if s[0]in BoundCacheContents(): break if s[:- 1]:Elements([("Unsupported Ct%s]" % ','.join(self.__class__.__name__)) return "Data attribute '%s' % ','.join("%sCHOICES from %s" %(WARNING,str(r))) def test_FaceIP_3D_14(self): 0 self.assertTrue(self.doTestFace(self.doTestFace([self.doTestFace([False,False)]) def __init__(self,** options): super(_ChoiceTest,self).__init__(** options) 0 self.action_classes = options["cells_store"] self.choices =(1.2,** options["mysql"]= FakeMissingTuple()) self.parser = Message(list.__init__(option_forms)) def main(self,client): 1 remove_home_config(client,"client_snapshot_url") self.client.client_snapshot.update(client) def _stop_signal(self,emitter,datafile,for_attachment): 1 vim.gui.target_cancel() Table 4: Sequences sampled from a policy fine-tuned using KL-DPG
Table 5: Sequences sampled from a policy fine-tuned using Reinforce with R(x) = b(x). All shown samples compile (b(x) = 1).

def invalidateKey(self):
    self.action.rooms = { }

def get(self):
    return self.handler.identifier

def flush(self):
    self.write("ready")

def get_flavor(self,resource,path,** metadata):
    return self.context.get(resource,path,** metadata)

def test_api_set_to_result(self):
    X = T.ListHead()
    self.assertEquals(quantiles(X),self._cache.annotations)

def is_cmp(self,other):
    return not self._safe_eq(other,self.link)

def __iter__(self):
    return iter(self._reverse())

def cancel(self):
    return self.enhanced_window.set_timeout()

def __str__(self):
    return str(self.repository)

def summary(self):
    return self._series

def Lazypeer(self):
    return self._peer

def ByteSize(self):
    n = 0
    n += self.lengthString(len(self.parameters_))
    return n + self.lengthString(number(self.value_))

def setUp(self):
    super(TestMaUserRoleTestCase,self).setUp()
    self.core =BER()
    self.topsetup_existing = False

def __init__(self,** kwargs):
    self.sourcemersListComp = kwargs.get('stretch {}'.format(self.__class__.twsourceCentOS_text))
Table 6: Sequences sampled from a policy fine-tuned using Reinforce with R(x) = P(x). All 16 sampled sequences are identical and compile (b(x) = 1):

def set_OwnerId(self,OwnerId):
    self.add_query_param('OwnerId',OwnerId)
b(x) Program Sequences sampled from the original generative model a def closeEvent(self): 1 self._isalive = False self._original_resume = True def close_file(self): 1 pass def closeWorking(self): 1 pass Sequences sampled from a policy fine-tuned using KL-DPG def close(self): if not self.closed: 1 self.closed = True self.translation.close() def close(self): self.queue.Importer.close(self.info) 1 self.open_input.close() self.graph.close(self.gamma) def close(self): try: 1 self.srv.get_browser.mac(self.bus_process.name,vm_output = True) except suspended as ex: self.socket.stop(ex) Sequences sampled from a policy fine-tuned using Reinforce with R(x) = b(x) def close(self): 1 self._stdout.close() def close(self): 1 self.idb.close() def close(self): self.reuse = subprocess.Popen('CONNECTION','').unregisterProducer() 1 p = subprocess.Popen() p.communicate().close() return u.close() Sequences sampled from a policy fine-tuned using Reinforce with R(x) = P (x) def close(self,object): 1 self.api.close(self.uid.length) def close(self): 1 self.job_closed.remove(self) def close(self): 1 self.buffer.flush() Table 7: Samples obtained from policies conditioned on prompt def close
b(x) Program Sequences sampled from the original generative model a def fit_pdf(self,hop,theta,theta): asserttriangular is self._fit_rewrite(hop, kernel,theta,theta)- gtheta,70) assertworkspace isTType.ACCEPTED_ignore 0 assert subset in(coeff,Y) assert self._Xfd != xOpenStackBackendError assert isinstance(750,Win,T,Vector) def fit(self,X,y): self._ y = y self._children -= 1 assert isinstance(self._labels,_MOD_'") x[:]= 0 0 y[:]=Bio_OFFSET y *= self._labels y * y * y y //= y return y def fit(self,X = None,y = None,result = None): 1 sts = self.get_appId(self.mesh_filename,X,y = y,d = result) self.mirror_logpdf([0x9]* indented) Sequences sampled from a policy fine-tuned using KL-DPG def fit(self,X,y,* args,** kwargs): X = self.transform(X,y,* args,** kwargs) data = np.DataFrame(data) 1 for i in self.fallback_array.iteration_two(* data): data[i].labels[i].tolist() return data def fit(self, initial_output = None): if initial_output: self.force_input = False else: 0 self.cells_done = tuple(initial_output) if initial_input == self.WK_MASK: self.output_output += self.osfstorage_NORMAL self.outputs = list([self.inputState.NORMAL_READ valid]) return 1 def fit(self,reshape,a,b): 1 return frappe. filediff(islice(a,b),b) Sequences sampled from a policy fine-tuned using Reinforce with R(x) = b(x) def fit(self,X,y): 1 self.x = y def fit(self,fit,d): 1 self.fit =followers return super(PositionUntilLockedSequence,self).fit(marks) def fit(self,X_acc): X_exog = self.xc1.exog y = self.instance.exog y,= self.model.w2 preserve_uniform(os.environ.XMANllf,y_y)) 0 y += self.model.t2le continX y = self.transition.fit(y) y.y = self.model.y * y y.red = self.model.gw.urmpopow(y) return y Sequences sampled from a policy fine-tuned using Reinforce with R(x) = P (x) def fit(self,fit,X,y,z): 0 self.learning = indices[np.zeros(axis = 1Dot,y = y,motion = self. np.loss,y = res.scale)] self.index = y def fit(self,params): 1 self.params_param = params def fit(self,X,y = None): 1 self.x = x self.y = x Table 8: Samples obtained from policies conditioned on prompt def fit
[Table 9 is truncated; only the header and the beginning of its first sample, drawn from the original generative model a with b(x) = 0, are preserved:]

def generate_samples_with_prompt(self,input_value,decimal = False):
    use_full = False
    full_input_string = escape_input[decimal]
    newprefix = local_input_format.split(