Predicting Electron Paths
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Predicting Electron Paths John Bradshaw Matt J. Kusner Brooks Paige arXiv:1805.10970v1 [physics.chem-ph] 23 May 2018 University of Cambridge Alan Turing Institute Alan Turing Institute Max Planck Institute, Tübingen University of Warwick University of Cambridge jab255@cam.ac.uk mkusner@turing.ac.uk bpaige@turing.ac.uk Marwin H. S. Segler José Miguel Hernández-Lobato BenevolentAI University of Cambridge marwin.segler@benevolent.ai Alan Turing Institute jmh233@cam.ac.uk Abstract Chemical reactions can be described as the stepwise redistribution of electrons in molecules. As such, reactions are often depicted using “arrow-pushing” diagrams which show this movement as a sequence of arrows. We propose an electron path prediction model (E LECTRO) to learn these sequences directly from raw reaction data. Instead of predicting product molecules directly from reactant molecules in one shot, learning a model of electron movement has the benefits of (a) being easy for chemists to interpret, (b) incorporating constraints of chemistry, such as balanced atom counts before and after the reaction, and (c) naturally encoding the sparsity of chemical reactions, which usually involve changes in only a small number of atoms in the reactants. We design a method to extract approximate reaction paths from any dataset of atom-mapped reaction SMILES strings. Our model achieves state-of-the-art results on a subset of the UPSTO reaction dataset. Furthermore, we show that our model recovers a basic knowledge of chemistry without being explicitly trained to do so. 1 Introduction The ability to reliably predict the products of chemical reactions is of tremendous importance for areas as diverse as health care, renewable energy, and construction, providing molecules which serve as medicines, energy capturing devices, and nanomaterials. Theoretically, all chemical reactions can be described by the stepwise rearrangement of electrons in molecules [6]. This sequence of bond-making and breaking is known as the reaction mechanism. Understanding the reaction mechanism is crucial because it not only determines the products (formed at the last step of the mechanism), but it also provides insight into why the products are formed on an atomistic level. Mechanisms can be treated at different levels of abstraction. On the lowest level, quantum-mechanical simulations of the electronic structure can be performed, which is prohibitively computationally expensive for most systems of interest. On the other end, chemical reactions can be treated as rules that “rewrite” reactant molecules to products, which abstracts away the individual electron redistribution steps into a single, global transformation step. To combine the advantages of both approaches, chemists use a tremendously powerful intermediate conceptual model, which simplifies the stepwise electron shifts using sequences of arrows which indicate the path of electrons throughout molecular graphs [6]. Preprint. Work in progress.
product prediction mechanism prediction target 2 target target 3 1 + + reactant 1 reactant 2 reagent product 1 product 2 reactant 1 reactant 2 reagent product 1 product 2 Figure 1: (Left) The reaction product prediction problem: Given the reactants and reagents, predict the structure of the product. (Right) The reaction mechanism prediction problem: Given the reactants and reagents, predict how the reaction occurred to form the products. Recently, there have been a number of machine learning models proposed for directly predicting the products of chemical reactions [2, 7, 20, 21, 22, 24], largely using graph-based or machine translation models. The task of reaction product prediction is shown on the left-hand side of Figure 1. In this paper we propose a machine learning model to predict the reaction mechanism, as shown on the right-hand side of Figure 1, for a particularly important subset of organic reactions. We argue that not only is our model more interpretable than product prediction models but it is easier to encode in it the constraints imposed by chemistry. Proposed approaches to predicting reaction mechanisms have been based on combining hand-coded heuristics and quantum mechanics [1, 11, 17, 18, 23, 26], rather than machine learning. We call our model E LECTRO, as it directly predicts the path of electrons through molecules (i.e., the reaction mechanism). To train the model we devise a general technique to obtain approximate reaction mechanisms purely from data about the reactants and products. This allows one to train our a model on large, unannotated reaction datasets such as USPTO [16]. We demonstrate that not only does our model achieve state-of-the-art results, surprisingly it also learns chemical properties it was not explicitly trained on. 2 Background In Figure 1 we show the two challenges we tackle in this paper. On the left we show product prediction, where the goal is to predict the reaction products, given a set of reactants and reagents. However, for this task we do not care how the reactants react. This job of finding the how is the target of reaction mechanism prediction. Before describing how our model predicts mechanisms, we use this section to detail the types of reactions we consider in this paper, and their properties. Molecules and Chemical reactions. Organic molecules (those involving predominantly carbon) can be modeled as a graph structure, where each node is an atom and each edge is a covalent bond. These covalent bonds represent the fact that one or more pairs of electrons are shared between the atoms that the bond connects. Just as electrons describe the current structure of molecules, they also describe how molecules react with other molecules to produce new ones. All chemical reactions involve the stepwise movement of electrons along the atoms in a set of reactant molecules. This movement causes the formation and breaking of chemical bonds that changes the reactants into a new set of product molecules [5]. Reaction mechanisms can be classified by the topology of their "electron-pushing arrows". Here, the class of reactions with linear electron flow topology is by far the most important, followed by those with cyclic topology [5]. In this work, we will only consider reactions with linear topology that involve the movement of pairs of electrons. Reactions as single electron paths. If reactions fall into this class, then a chemical reaction can be modeled as pairs of electrons moving in a single path through the reactant atoms. Further, this electron path will alternately remove existing bonds in molecules, and form new ones. We show this alternating structure in the right hand part of Figure 1. The reaction formally starts by taking the pair of electrons between the Li and C atoms and moving them to the C atom (step 1); this is a remove bond step. Next comes an add step where electrons are moved from the C atom to form a bond between the two reactant molecules (step 2). Then a pair of electrons are removed between the 2
C and O atoms and moved to the O atom, giving rise to the products (step 3). Predicting the final product is thus a byproduct of predicting this series of electron steps. However, there are a number of benefits of predicting electron paths over predicting the outcomes of reactions directly (as is done in previous work [7, 20]): • Easy to interpret: If the model makes a mistake, it is easy to see where, and possibly why, it goes wrong by comparing the steps of the path with the correct steps. • Sparse: Reactions often only affect between 3 and 7 atoms out of anywhere from 10-50 reactant atoms. Modeling the reaction as a path allows us to exploit this sparsity. • Chemically consistent: Learning a path allows us to easily incorporate chemical constraints, such as the alternating removal and addition of bonds, among others. • Generalizable: As reaction paths follow similar trends in different molecules, our model naturally generalizes to unseen molecules. The only other work we are aware of to use machine learning to predict reaction mechanisms are [3, 8, 9, 10]. All of these model a chemical reaction as an interaction between atoms which function as electron donors and those which function as electron acceptors. They predict the reaction mechanism via two independent models: one that identifies these likely electron sources and sinks, and another that ranks all combinations of them. However, this combining and then ranking of electron sources and sinks can be a slow process, as many plausible reactions need to be considered (the number of subgraphs of n reacting atoms, where single bonds are either added or removed is 2n(n−1)/2 ). These models also have only so far been successfully trained on small hand-curated datasets. 3 Model In this section we define a probabilistic model that describes the movement of electrons that define linear topology reactions. We represent a set of molecules as a set of graphs M, with atoms A as vertices and bonds B as edges; each connected component of the graph defines an individual molecule. We can associate an ordering over all the atoms in all the molecules in the set using an atom map number: an integer label assigned to each non-hydrogen atom in both the reactants and the products which both permits easy matching between atoms before and after the reaction, and gives us a consistent way to index particular atoms. Each atom v ∈ A includes a set of features, such as its atom type (e.g. carbon, oxygen, . . . ); the full list of input atom features can be found in Table 3 of the appendix. Molecules input into the model are first put in a Kekulé form, a process which makes explicit the location of single and double bonds in aromatic structures; each bond b ∈ B is either a single, double, or triple bond. Given an initial set of reactant molecules M0 and a set of reagent molecules Mr , our model defines a conditional distribution over a sequence of atoms (which we also refer to as actions) P0:T = (a0 , a1 , . . . , aT ), which fully characterizes the electron path. This electron path in turn deterministically defines both a final product MT +1 , denoting the outcome of the reaction, as well as a sequence of intermediate products Mt , for t = 1, . . . , T , which correspond to the state of the graph after the first t steps in the subsequence P0:t = (a0 , . . . , at ) are applied to the initial M0 . We also define a stopping sequence St = (s0 , . . . , sT +1 ) which indicates if the reaction should stop (i.e, st = 1 if the reaction should stop and is 0 otherwise). We propose to learn a distribution pθ (P0:T | M0 , Mr ) over electron movements. We first detail the generative process that specifies pθ , before describing how to train the model’s parameters, θ. 3.1 Generative process First note that since our reactions are a single path of electrons through the reactants then, at any point, the next step in the path depends only on (i) the intermediate molecule formed by the action path up to that point, (ii) the previous action taken (indicating where the free pair of electrons are) and (iii) the point of time through the path, indicating whether we are on an add or remove bond step. We make the simplifying assumption that the stop probability and the actions after the initial action 3
pstart premove pstop Yes previously selected selected No 1 reactant 1 reactant 2 2 probability of selection 3 masking bond to remove 4 electron pair padd pstop bond to remove premove pstop Yes Yes No No 5 bond to add 6 7 8 9 product 1 product 2 0 1 probability Figure 2: This figure shows the sequence of actions in transforming the reactants in box 1 to the products in box 9. The sequence of actions will result in a sequence of pairs of atoms, between which bonds will alternately be removed and created, creating a series of intermediate products. At each step the model sees the current intermediate product graph (shown in the boxes) as well as the previous action, if applicable, shown by the grey circle. It uses this to decide on the next action. We represent the characteristic probabilities the model may have over these next actions as colored circles over each atom. Some actions are disallowed on certain steps, for instance you cannot remove a bond that does not exist; these blocked actions are shown as red crosses. a0 do not depend on the reagents. This leads to a parameterized model with dependency structure: pθ (P0:T | M0 , Mr ) = pθ (s00 | M0 )pθ (a0 | M0 , Mr ) (1) "T # Y 0 × pθ (st | Mt )pθ (at | Mt , at−1 , t) pθ (sT +1 | MT +1 ), t=1 where we have defined pθ (s0t | Mt ) ≡ 1 − pθ (st | Mt ) to be the probability of continuing a reaction given the current molecule set Mt . The other terms include pθ (a0 | M0 , Mr ), the probability of the initial state a0 given the reactants and reagents; the conditional probability pθ (at | Mt , at−1 , t) of next state at given the intermediate products Mt for t > 0; and the probability pθ (st | Mt ) that the reaction terminates with final product Mt . It is possible to stop prior to selecting a first atom a0 , indicating that no reaction would take place. However, we restrict our model to not stop at step t = 1, as it is necessary to pick up a complete electron pair. Given any particular selected atom at which extends the reaction path, we can deterministically update the previous molecular graph Mt to produce the next set of (intermediate) products Mt+1 . Given our reaction assumptions, then, as stated earlier, there are two types of electron movements that alternate: (i) movement that removes an existing bond, and (ii) movement that adds a new bond. We define atoms with free electrons as having a self-bond. Thus, all reactions start by first selecting an atom, removing a bond (between two different atoms, or a self-bond), and then alternately adding and removing bonds; we can determine whether a particular step is an add step or a remove step by inspecting t. Note that M1 = M0 , as the initial action of selecting a0 does not form or remove any bonds. Figure 2 presents a simple example reaction which demonstrates all the critical features of the model; the subfigures show the sequence of intermediate products and the distributions over actions. Each of the conditional probabilities in Eq. (1) are parameterized by neural networks. At each step of the electron path, the network takes the current intermediate graphs, the previous action, and the reagents if relevant, and computes a probability distribution over next possible actions (i.e., selecting a particular atom, or stopping). The structure of these networks is described in the following section. 3.2 Computing atom and molecule features We are left now with defining the functional form of our conditional distributions for continu- ing pθ (s0t | Mt ), picking the initial action pθ (a0 | M0 , Mr ), and picking subsequent actions pθ (at | Mt , at−1 , t). However, before describing these modules we need to describe how we com- 4
pute node embeddings and graph embeddings, as these are essential to each. Full architectural details (e.g. number of layers and hidden units) and training settings are deferred to the appendix. Node embeddings are representations of all the atoms in all the molecules present in Mt . We denote them by the matrix HMt ⊆ R|A|×d . Each row contains a d-dimensional embedding of an atom. A natural ordering for the rows are the atom-map numbers assigned to each atom. We define the function gA to map a set of graphs representing each molecule to these node embeddings. In general gA could be any graph model that uses the graph structure of Mt to get graph-isomorphic node features, usually via message-passing techniques [4]; we choose to use gated graph neural network (GGNN) message functions [13]. It is also useful to be able to calculate graph embeddings hg , which are vectors that represent groups of nodes belonging to one or more graphs; i.e. an entire molecule or set of molecules. We define the function that maps node features belonging to each atom to their graph embedding by r. These are similar to the readout functions used for regressing on graphs detailed in [4, Eq. 3] and the graph embeddings described in Li et al. [14, §B.1]. Specifically, r consists of three functions, fgate , fup and fdown , which could be any multi-layer perceptron (MLP) but in practice we find that linear functions suffice. They are used to form the graph embedding, hg , as ! X hg = r(HMt ) = fdown σ (fgate (HMt ,v )) fup (HMt ,v ) . (2) v∈A0 Where σ is a sigmoid function. We can break this equation down into two stages. In stage (i), similar to Li et al. [14, §B.1], we form an embedding of one or more molecules (with vertices A0 and with A0 ⊂ A) by performing a gated sum over the node features. In this manner the function fgate is used to decide how much that node should contribute towards the embedding, and fup projects the node embedding up to a higher dimensional space; following Li et al. [14, §B.1], we choose this to be double the dimension of the node features. Having formed this embedding of the graphs, we project this down to a lower dimensional space in stage (ii), which is done by the function fdown . 3.3 Computing probabilities over actions Having described how we compute node and graph embeddings, we are now ready to define each of our parameterized distributions over actions. The simplest of these is pθ (s0t | Mt ), which is the probability of continuing given the set of intermediate products at time t. This probability is computed from a graph embedding via the function rstop , which projects down to a single dimension. We then map this to the [0, 1] interval via the sigmoid function σ which yields the overall expression pθ (s0t | Mt ) = σ(rstop (HMt )). (3) Each of the three parameterized conditional probability distributions for the start, add and remove steps have similar forms, each defining a probability vector over actions. The transition distribution pθ (at | Mt , at−1 , t) which selects the next atom in the sequence P can be split into two distributions depending on the parity of t: the remove bond step distribution premove θ (at | Mt , at−1 ) is used when t is odd, and the add bond step distribution paddθ (at | M t , at−1 ) is used when t is even. These three modules each have the same overall functional form s = f (HMt , c), (4) pθ (at | · · · ) ∝ β softmax(s) (5) where f is one of the networks fstart , fadd , or fremove ; c is a context vector, and β is a binary mask. Each of the three actions has a different context and mask. The add step padd θ (at | Mt , at−1 ) and remove step premove θ (at | Mt , at−1 ), have as context the node embedding of the atom selected at the previous step, cat−1 = HMt ,at−1 . For the initial step, this context vector creagent is an embedding of all the reagents present, computed by a graph embedding function rreagent . When computing the output probabilities, we use the binary vector β to mask out specific actions known to be impossible. The value of this differs for the start, add and remove steps; for the start step any action can be picked, so β is the all-ones vector. For the remove step, βremove masks out (i.e. is set to zero for) any bonds that do not currently exist and thus cannot be removed (noting though that self-bonds are permitted in the first remove step). For the add step, βadd only masks out the previous action, preventing the model from stalling in the same state for multiple time-steps. 5
Training We can learn the parameters θ of all the parameterized functions, by maximizing the log likelihood of a full path log pθ (P0:T | M0 , Mr ). This is evaluated by using a known electron path a?t and intermediate products M?t extracted from training data, rather than on simulated values. This allows us to train on all stages of the reaction at once, given electron path data. We train our models using Adam [12] and an initial learning rate of 10−4 , with minibatches of size one, where minibatches often consist of multiple intermediate graphs. Prediction Once trained, we can use our model to sample chemically-valid paths given an input set of reactants M0 and reagents Mr , simply by simulating from the conditional distributions until sampling a stop value st . We instead would like to find a ranked list of the top-K predicted paths, and do so using a modified beam search, in which we roll out a beam of width K until a maximum path length T max , while recording all paths which have terminated. This search procedure is described in detail in Algorithm 1 in the Appendix. 4 Reaction Mechanism Identification Our dataset for evaluating our model is a collection of chemical reactions extracted from the US patent database [15]. We take as our starting point the 479,035 reactions, along with the training, validation, and testing splits which were used by Jin et al. [7], referred to as the USPTO dataset. This data consists of, per reaction, a group of bond changes and reaction SMILES strings [25]. The bond changes indicate pairs of atoms which are connected differently in the reactants and products. The SMILES strings encode the molecules present in a text based format. Before we can apply our method, we perform two data preprocessing tasks described in the subsections below (using the open-source chemo-informatics software RDKit [19]). These steps automatically extract a subset of data appropriate for training our model of electron movement during a reaction. 4.1 Reactant and reagent separation Typically, reaction SMILES strings are split into three parts — reactants, reagents, and products. The reactant molecules are those which are consumed during the course of the chemical reaction to form the product, while the reagents are any additional molecules which provide context under which the reaction occurs (for example, catalysts), but do not explicitly take part in the reaction itself; we see this in the example in Figure 1. Unfortunately, the USPTO dataset as extracted does not differentiate between reagents and reactants. We elect to preprocess the entire USPTO dataset by separating out the reagents from the reactants using the process outlined in Schwaller et al. [20], where we classify as a reagent any molecule for which either (i) none of its constituent atoms appear in the product, or (ii) the molecule appears in the product SMILES completely unchanged from the pre-reaction SMILES. This allows us to properly model molecules which are included in the dataset but do not materially contribute to the reaction. 4.2 Identifying reactions with linear electron topology To train our model, it is necessary to extract a ground-truth representation of the electron paths from the SMILES strings and bond changes. Furthermore, not every reaction in the USPTO dataset has a linear electron topology; such reactions (for example, multi-step reactions and cycloadditions) will not have a single unique path through the atoms which describes the movement of the electrons. The first step is to look at the bond changes present in a reaction. Each atom on the ends of the path will be involved in exactly one bond change; the atoms in the middle will be involved in two. We can then line up bond change pairs so that neighboring pairs have one atom in common, with this ordering forming a path. For instance, given the pairs "11-13, 14-10, 10-13" we form the unordered path "14-10, 10-13, 13-11". If we are unable to form such a path, for instance due to two paths being present as a result of multiple reaction stages, then we discard the reaction. For training our model we want to find the ordering of our path, so that we know in which direction the electrons flow. To do this we examine the changes of the properties of the atoms at the two ends of our path. In particular, we look at changes in charge and attached implicit hydrogen counts. The gain of negative charge (or analogously the gain of hydrogen as H+ ions without changing charge) 6
Table 1: Results when using E LECTRO Accuracies (%) for reaction mechanism prediction. Here we count a prediction as correct if the Model Name Top-1 Top-2 Top-3 Top-5 atom mapped action sequences predicted E LECTRO -L ITE 70.3 82.8 87.7 92.2 by our model match exactly those ex- E LECTRO 77.8 89.2 92.4 94.7 tracted from the USPTO dataset. indicates that electrons have arrived at this atom, implying that this is the end of the path; vice-versa for the start of the path. However, sometimes the difference is not available in the USPTO data, as unfortunately only major products are recorded, and so details of what happens to some of the reactant molecules’ atoms may be missing. In these cases we fall back to using an element’s electronegativity to estimate the direction of our path, with more electronegative atoms attracting electrons towards them and so being at the end of the path. The next step of filtering checks that the path alternates between add steps (+1) and remove steps (-1). This is done by analyzing and comparing the bond changes on the path in the reactant and product molecules. Reactions that involve greater than one change (for instance going from no bond between two atoms in the reactants to a double bond between the two in the products) can indicate multi-step reactions with identical paths, and so are discarded. Finally, as a last sanity check, we use RDKit to produce all the intermediate and final products induced by our path acting on the reactants, to confirm that the final product that is produced by our extracted electron path is consistent with the major product SMILES in the USPTO dataset. The end result of this is extracted reaction paths for those entries in the USPTO dataset which correspond to reactions of linear topology. This comprises 73% of the dataset, containing 349,898 total reactions, of which 29,360 form the held-out test set. 5 Experiments and Evaluation In our experiments we consider two variants of our model: the first model E LECTRO is exactly as defined in Section 3, including all reagent information; a second version we call E LECTRO -L ITE ignores the reagents when selecting the initial action (i.e. no context vector creagent is input into fstart ), allowing us to gauge the importance of reagents in determining what reaction occurs. We now evaluate our model on the reaction mechanism prediction and reaction product prediction problems. 5.1 Reaction Mechanism Prediction For mechanism prediction we are interested in ensuring we obtain the exact sequence of electron actions correctly. For instance, when forming a bond between two pairs of atoms we want to know which one of the atoms donated the electron pair needed to form the bond, even if the end result is the same. The representation of the reaction mechanism produced by our model is a sequence of atoms, detailing the path taken by the electrons in a series of alternating steps in which bonds are broken and formed; using the atom mapping from the USPTO dataset this takes the form of a series of integers. The most straightforward approach then to evaluate our accuracy at predicting reaction mechanisms is to check whether the sequence of integers extracted from the raw data as described earlier is an exact match with the sequence of integers output by E LECTRO; the top-1, top-2, top-3, and top-5 accuracies evaluated in this manner are reported in Table 1. 5.2 Reaction Product Prediction Reaction mechanism prediction is useful for ensuring that we formed the correct product in the correct way. However, this can underestimate the model’s actual predictive accuracy: although a single atom mapping is provided as part of the USPTO dataset, in general atom mappings are not unique (e.g., if a molecule contains symmetries). Here this would manifest as multiple different sequences of integers which correspond to chemically- identical electron paths. The first figure in the appendix shows an example of a reaction with symmetries, where different electron paths produce the same product. 7
Table 2: Results when using E LECTRO for Accuracies (%) reaction product prediction, following the product matching procedure in Section 5.2. Model Name Top-1 Top-2 Top-3 Top-5 The WLDN accuracy is computed using E LECTRO -L ITE 78.2 87.7 91.5 94.4 their evaluation code and pretrained model E LECTRO 87.0 92.6 94.5 95.9 outputs on our test set. We were unable to evaluate the Seq2Seq model on our test WLDN [7] 84.0 89.2 91.1 92.3 set, instead quoting their reported numbers Seq2Seq [20] 80.3? 84.7? 86.2? 87.5? with an asterisk. 1st Choice Br 2nd Choice Cl Br 3rd Choice OH Cl B OH HO CH3 I Figure 3: (Left) Nucleophilic substitutions SN 2-reactions, (right) Suzuki-coupling (please note that in the “real” mechanism of the Suzuki coupling, the reaction would proceed via oxidative insertion, transmetallation and reductive elimination at a Palladium catalyst. As these mechanistic details are not contained in training data, we treat Palladium implicitly as a reagent). In both cases, our model has correctly picked up the trend that halides lower in the period table usually react preferably (I > Br > Cl). Recent machine learning approaches to reaction product prediction [7, 20] have evaluated whether the major product reported in the test dataset matches predicted candidate products generated by their system, independent of mechanism. In our case, the top-5 accuracy for a particular reaction may include multiple different electron paths that ultimately yield the same product molecule. Identifying whether two product molecules are chemically the same is equivalent to solving a graph isomorphism over the atoms and bond types, comparing the output of our system to the product molecule. To perform this comparison, we take the sequence of edits on the reactants graph defined by an electron path, apply these edits to define a product graph, and then define a deterministic mapping from the edited graph to a canonical string representation. This is done by mapping the molecule to Kekulé form, then applying the sequence of edits to the reactants graph, setting explicit charges or hydrogen counts on the first and last atom in the electron path in order to satisfy valence constraints. We then strip all atom map numbers from the graph. If this graph corresponds to a valid product molecule, we can use RDKit to express the molecule in a canonical SMILES string format (predicted electron paths which yield chemically infeasible products are considered failures). We can then evaluate whether a predicted electron path matches the ground truth by a string comparison. To use our model to produce a ranked list of predicted products, we compute the canonicalized product SMILES for each of the predictions found by beam search over electron paths, removing duplicates along the way. These product-level accuracies are reported in Table 2. For product prediction we compare with the state-of-the-art graph-based method [7]; we use their evaluation code and pre-trained model1 , re-evaluated on our extracted test set. We also compare against the Seq2Seq model proposed by [20]; unfortunately, however, we were unable to evaluate it on our extracted test set, and instead quote their reported performance on a different USPTO test set for reference. Overall, E LECTRO outperforms all other approaches on this metric, with 87% top-1 accuracy and 95.9% top-5 accuracy. Omitting the reagents in E LECTRO degrades top-1 accuracy slightly, but maintains a high top-3 and top-5 accuracy, suggesting that reagent information is necessary to provide context in disambiguating plausible reaction paths. If the ultimate desired goal is to predict the product molecule rather than the reaction mechanism, a benefit of our approach is the predicted electron paths can then serve as an explanation. In this manner, when showing predicted products, we can list, alongside the maximum likelihood path, any other candidate paths that result in the same product. 1 https://github.com/wengong-jin/nips17-rexgen 8
5.3 Qualitative Analysis Complex molecules often feature several potentially reactive functional groups, which compete for reaction partners. To predict the selectivity, that is which functional group will predominantly react in the presence of other groups, students of chemistry learn heuristics and trends, which have been established over the course of three centuries of experimental observation. To qualitatively study whether the model has learned such trends from data we queried the model with several typical text book examples from the chemical curriculum (see Figure 3 and the appendix). We found that the model predicts most examples correctly. In the few incorrect cases, interpreting the model’s output reveals that the model made chemically plausible predictions. 6 Discussion In this paper we proposed E LECTRO, a model for predicting electron paths. These electron paths, or reaction mechanisms provide a detailed description of how two reactants react together. Our model (i) produces output that is easy for chemists to interpret, and (ii) is able to exploit the sparsity involved in chemical reactions, in which often only small numbers of atoms in the reactants interact. As a byproduct of predicting reaction mechanisms we are also able to perform reaction product prediction, and achieve state-of-the-art performance on this task. Acknowledgements. We would like to thank Jennifer Wei, Dennis Sheberla, and David Duvenaud for their very helpful discussions. BP and MK are supported by The Alan Turing Institute under the EPSRC grant EP/N510129/1. JB acknowledges support from an EPSRC studentship. References [1] Maike Bergeler, Gregor N Simm, Jonny Proppe, and Markus Reiher. Heuristics-guided explo- ration of reaction mechanisms. Journal of chemical theory and computation, 11(12):5712–5722, 2015. [2] Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen. Prediction of organic reaction outcomes using machine learning. ACS central science, 3(5): 434–443, 2017. [3] David Fooshee, Aaron Mood, Eugene Gutman, Mohammadamin Tavakoli, Gregor Urban, Frances Liu, Nancy Huynh, David Van Vranken, and Pierre Baldi. Deep learning for chemical reaction prediction. Molecular Systems Design & Engineering, 2018. [4] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In ICML, 2017. [5] Rainer Herges. Coarctate transition states: The discovery of a reaction principle. Journal of chemical information and computer sciences, 34(1):91–102, 1994. [6] Rainer Herges. Organizing principle of complex reactions and theory of coarctate transition states. Angewandte Chemie International Edition in English, 33(3):255–276, 1994. [7] Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction outcomes with Weisfeiler-Lehman network. In NIPS, pages 2604–2613, 2017. [8] Matthew A Kayala and Pierre Baldi. Reactionpredictor: Prediction of complex chemical reactions at the mechanistic level using machine learning. J. Chem. Inf. Mod., 52(10):2526– 2540, 2012. [9] Matthew A. Kayala and Pierre F. Baldi. A machine learning approach to predict chemical reactions. In NIPS, 2011. [10] Matthew A Kayala, Chloé-Agathe Azencott, Jonathan H Chen, and Pierre Baldi. Learning to predict chemical reactions. J. Chem. Inf. Mod., 51(9):2209–2222, 2011. [11] Yeonjoon Kim, Jin Woo Kim, Zeehyo Kim, and Woo Youn Kim. Efficient prediction of reaction paths through molecular graph and reaction network analysis. Chemical Science, 2018. [12] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015. [13] Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated graph sequence neural networks. In ICLR, 2016. 9
[14] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep generative models of graphs. arXiv preprint arXiv:1803.03324, 2018. [15] Daniel Lowe. Chemical reactions from US patents (1976-Sep2016). 6 2017. doi: 10.6084/m9.figshare.5104873.v1. URL https://figshare.com/articles/Chemical_ reactions_from_US_patents_1976-Sep2016_/5104873. [16] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD thesis, University of Cambridge, 2012. [17] Surajit Nandi, Suzanne R McAnanama-Brereton, Mark P Waller, and Anakuthil Anoop. A tabu- search based strategy for modeling molecular aggregates and binary reactions. Computational and Theoretical Chemistry, 1111:69–81, 2017. [18] Dmitrij Rappoport, Cooper J Galvin, Dmitry Yu Zubarev, and Alán Aspuru-Guzik. Complex chemical reaction networks from heuristics-aided quantum chemistry. Journal of chemical theory and computation, 10(3):897–907, 2014. [19] RDKit, online. RDKit: Open-source cheminformatics. http://www.rdkit.org. [Online; accessed 01-February-2018]. [20] Philippe Schwaller, Theophile Gaudin, David Lanyi, Costas Bekas, and Teodoro Laino. " found in translation": Predicting outcome of complex organic chemistry reactions using neural sequence-to-sequence models. arXiv preprint arXiv:1711.04810, 2017. [21] Marwin HS Segler and Mark P Waller. Neural-symbolic machine learning for retrosynthesis and reaction prediction. Chem. Eur. J., 23(25):5966–5971, 2017. [22] Marwin HS Segler, Mike Preuss, and Mark P Waller. Planning chemical syntheses with deep neural networks and symbolic ai. Nature, 555(7698):604, 2018. [23] Gregor N Simm and Markus Reiher. Context-driven exploration of complex chemical reaction networks. Journal of chemical theory and computation, 13(12):6108–6119, 2017. [24] Jennifer N Wei, David Duvenaud, and Alán Aspuru-Guzik. Neural networks for the prediction of organic chemistry reactions. ACS central science, 2(10):725–732, 2016. [25] David Weininger. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of chemical information and computer sciences, 28 (1):31–36, 1988. [26] Paul M Zimmerman. Automated discovery of chemically reasonable elementary reaction steps. Journal of computational chemistry, 34(16):1385–1392, 2013. 10
A Example of symmetry affecting evaluation of electron paths In the main text we described the challenges of how to evaluate our model, as different electron paths can form the same products, for instance due to symmetry. Figure 4 is an example of this. 5 8 H3 C9 CH 3 4 1 6 Cl 10 3N 7 2 15 2 11 7 HN 20 22 24 26 N3 H 3 C 18 O CH 3 + O O + Cl 17 + 6 4 + Na 16 + H2 O 27 13 N12 11 NH 21 23 9 14 12 19 25 H3 C CH 3 13 8 5 14 10 N H15 (a) Reaction as defined by USPTO SMILES 1 1 Cl Cl 10 2 10 2 15 15 7 11 7 N3 11 N3 HN HN 6 6 4 4 NH 9 NH 9 14 14 12 H3 C CH 3 12 H3 C CH 3 8 5 8 5 13 13 (b) Possible action sequences that all result in same major product. Figure 4: This example shows how symmetry can affect the evaluation of electron paths. In this example, although one electron path is given in the USPTO dataset, the initial N that reacts could be either 15 or 12, with no difference in the final product. This is why judging purely based on electron path accuracy can sometimes be misleading. B More training details In this section we go through more specific model architecture details omitted from the main text. B.1 Model architectures In this section we provide further details of our model architectures. Section 3 of the main paper discusses our model. In particular we are interested in computing three conditional probability terms: (1) pθ (a0 | M0 , Mr ), the probability of the initial state a0 given the reactants and reagents; (2) the conditional probability pθ (at | Mt , at−1 , t) of next state at given the intermediate products Mt for t > 0; and (3) the probability pθ (st | Mt ) that the reaction terminates with final product Mt . Each of these is parametrized by NNs. We can split up the components of these NNs into a series of modules, all introduced in the main text: gA , rstop , rreagent , fadd , fremove and fstart . In this section we shall go through each of these in turn. The function gA computes node embeddings, HMt , which are used as input to all the other modules. For this we use Gated Graph Neural Networks (GGNN) [13, 4]. We use 4 propagation steps. The atom features we feed in are detailed in Table 3. These are calculated using RDKit. In total there are 101 features and we maintain this dimensionality in the hidden layers during the propagation steps of the GGNN. Three edge labels are defined: single bonds, double bonds and triple bonds. RDKit is used to Kekulize the reactant molecules. As mentioned in Section 3 of the main paper both rstop , rreagent consist of three linear functions. For both, the function fgate is used to decide how much each node should contribute towards the embedding and so projects down to a scalar value. Again for both, fup projects the node embedding up to a higher dimensional space, which we choose to be 202 dimensions. This is double the dimension of the node features, and similar to the approach taken by Li et al. [14, §B.1]. Finally, fdown differs between the two modules, as for rstop it projects down to one dimension (to later go through a sigmoid function and compute a stop probability), whereas for rreagent , fdown projects to a dimensionality of 100 to form the reagent embedding. 11
Table 3: Atom features Feature Description Atom type 72 possible elements in total, one hot Degree One hot (0, 1, 2, 3, 4, 5, 6, 7, 10) Explicit Valence One hot (0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14) Hybridization One hot (SP, SP2, SP3, Other) H count integer Electronegativity float Atomic number integer Part of an aromatic ring boolean The modules for fadd and fremove , that operate on each node to produce a action logit, are both NNs consisting of one hidden layer of 100 units. Concatenated onto the node features going into these networks are the node features belonging to the previous atom on the path. The final function, fstart , is represented by an NN with hidden layers of 100 units. When conditioning on reagents (ie for E LECTRO) the reagent embeddings calculated by freagent are concatenated onto the node embeddings and we use two hidden layers for our NN. When ignoring reagents (ie for E LECTRO -L ITE) we use one hidden layer for this network. In total E LECTRO has approximately 250,000 parameters and E LECTRO -L ITE has approximately 190,000. B.2 Training We train everything using Adam [12] and an initial learning rate of 0.0001, which we decay after 5 and 9 epochs by a factor of 0.1. We train for a total of 10 epochs. For training we use reaction minibatch sizes of one, although these can consist of multiple intermediate graphs. C Prediction using our model At predict time, as discussed in the main text, we use beam search to find high probable chemically- valid paths from our model. Further details are given in Algorithm 1. 12
Algorithm 1 Predicting electron paths at test time. Input: Molecule M0 (consisting of atoms A), reagents Mr , beam width K, time steps T max 1: P̂ = {(∅, log(1 − calc_prob_continue(M0 )))} .This set will store all completed paths. 2: Fremove = 1 .Remove flag 3: 4: B̂ = ∅. .This set will store all possible open paths. Cleared at start of each timestep. 5: for all v ∈ A do 6: ρ = (v) 7: ppath = log calc_prob_continue(M0 ) + log calc_prob_initial(v, M0 , Mr ) 8: B̂ = B̂ ∪ {(ρ, ppath )} 9: end for 10: B0 = pick_topK_actions(B̂) .We filter down to the top K most promising actions. 11: 12: for t in (1, . . . , T max ) do 13: B̂ = ∅ 14: for all (ρ, ppath ) ∈ Bt−1 do 15: Mρ = calc_intermediate_mol(M0 , ρ) 16: pc = calc_prob_continue(Mρ ) 17: P̂ = P̂ ∪ {(ρ, ppath + log(1 − pc ))} 18: for all v ∈ A do 19: ρ0 = ρ_ (v) .New proposed path is concatenation of old path with new node. 20: vt−1 = last element of ρ 21: B̂ = B̂ ∪ {(ρ0 , ppath + log pc + log calc_prob_action(v, Mρ , vt−1 , Fremove ))} 22: end for 23: end for 24: Bt = pick_topK_actions(B̂) 25: Fremove = Fremove + 1 mod 2. .If on add step change to remove and vice versa. 26: end for 27: 28:P̂ = sort_on_prob(P̂) Output: Valid completed paths and their respective probabilities, sorted by the latter, P̂ 13
D Further example of actions proposed by our model Figure 5 shows the model’s predictions for the mechanism of how two molecules will react. O O N CH3 O CH3 Br H3 C Mg Figure 5: Predicted mechanism of our model on reactant molecules. Green arrow shows preferred mechanism, whereas pink shows the model’s second preferred choice. Here, the first-choice prediction is incorrect, but chemically reasonable, as the Weinreb amide is typically used together in reactions with Magnesium species. The second-choice prediction is correct. a) O SH S O + O O b) NO2 NO2 Cl + O O c) O Cl O O + + O O O Figure 6: Additional typical selectivity examples: Here, the expected product is shown on the right. The blue arrows indicate the top ranked paths from our model, the red arrows indicate other possibly competing but incorrect steps, which the model does not predict to be of high probability. In all cases, our model predicted the correct products. In b) and c), our model correctly recovers the regioselectivity expected in electrophilic aromatic substitutions. 14
You can also read