Predicting Electron Paths

Page created by Megan Singh

Science

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Predicting Electron Paths

                                                             John Bradshaw                     Matt J. Kusner                  Brooks Paige
arXiv:1805.10970v1 [physics.chem-ph] 23 May 2018

                                                         University of Cambridge             Alan Turing Institute          Alan Turing Institute
                                                       Max Planck Institute, Tübingen       University of Warwick          University of Cambridge
                                                           jab255@cam.ac.uk                mkusner@turing.ac.uk            bpaige@turing.ac.uk

                                                                  Marwin H. S. Segler                       José Miguel Hernández-Lobato
                                                                     BenevolentAI                               University of Cambridge
                                                             marwin.segler@benevolent.ai                         Alan Turing Institute
                                                                                                                 jmh233@cam.ac.uk

                                                                                                Abstract

                                                            Chemical reactions can be described as the stepwise redistribution of electrons in
                                                            molecules. As such, reactions are often depicted using “arrow-pushing” diagrams
                                                            which show this movement as a sequence of arrows. We propose an electron path
                                                            prediction model (E LECTRO) to learn these sequences directly from raw reaction
                                                            data. Instead of predicting product molecules directly from reactant molecules
                                                            in one shot, learning a model of electron movement has the benefits of (a) being
                                                            easy for chemists to interpret, (b) incorporating constraints of chemistry, such as
                                                            balanced atom counts before and after the reaction, and (c) naturally encoding
                                                            the sparsity of chemical reactions, which usually involve changes in only a small
                                                            number of atoms in the reactants. We design a method to extract approximate
                                                            reaction paths from any dataset of atom-mapped reaction SMILES strings. Our
                                                            model achieves state-of-the-art results on a subset of the UPSTO reaction dataset.
                                                            Furthermore, we show that our model recovers a basic knowledge of chemistry
                                                            without being explicitly trained to do so.

                                                   1    Introduction

                                                   The ability to reliably predict the products of chemical reactions is of tremendous importance for
                                                   areas as diverse as health care, renewable energy, and construction, providing molecules which serve
                                                   as medicines, energy capturing devices, and nanomaterials.
                                                   Theoretically, all chemical reactions can be described by the stepwise rearrangement of electrons
                                                   in molecules [6]. This sequence of bond-making and breaking is known as the reaction mechanism.
                                                   Understanding the reaction mechanism is crucial because it not only determines the products (formed
                                                   at the last step of the mechanism), but it also provides insight into why the products are formed on
                                                   an atomistic level. Mechanisms can be treated at different levels of abstraction. On the lowest level,
                                                   quantum-mechanical simulations of the electronic structure can be performed, which is prohibitively
                                                   computationally expensive for most systems of interest. On the other end, chemical reactions can be
                                                   treated as rules that “rewrite” reactant molecules to products, which abstracts away the individual
                                                   electron redistribution steps into a single, global transformation step. To combine the advantages
                                                   of both approaches, chemists use a tremendously powerful intermediate conceptual model, which
                                                   simplifies the stepwise electron shifts using sequences of arrows which indicate the path of electrons
                                                   throughout molecular graphs [6].

                                                   Preprint. Work in progress.

product prediction mechanism prediction
target 2
target target
3
1
+ +
reactant 1 reactant 2 reagent product 1 product 2 reactant 1 reactant 2 reagent product 1 product 2

Figure 1: (Left) The reaction product prediction problem: Given the reactants and reagents, predict
the structure of the product. (Right) The reaction mechanism prediction problem: Given the reactants
and reagents, predict how the reaction occurred to form the products.

Recently, there have been a number of machine learning models proposed for directly predicting the
products of chemical reactions [2, 7, 20, 21, 22, 24], largely using graph-based or machine translation
models. The task of reaction product prediction is shown on the left-hand side of Figure 1.
In this paper we propose a machine learning model to predict the reaction mechanism, as shown on
the right-hand side of Figure 1, for a particularly important subset of organic reactions. We argue that
not only is our model more interpretable than product prediction models but it is easier to encode
in it the constraints imposed by chemistry. Proposed approaches to predicting reaction mechanisms
have been based on combining hand-coded heuristics and quantum mechanics [1, 11, 17, 18, 23, 26],
rather than machine learning. We call our model E LECTRO, as it directly predicts the path of electrons
through molecules (i.e., the reaction mechanism). To train the model we devise a general technique
to obtain approximate reaction mechanisms purely from data about the reactants and products. This
allows one to train our a model on large, unannotated reaction datasets such as USPTO [16]. We
demonstrate that not only does our model achieve state-of-the-art results, surprisingly it also learns
chemical properties it was not explicitly trained on.

2 Background

In Figure 1 we show the two challenges we tackle in this paper. On the left we show product
prediction, where the goal is to predict the reaction products, given a set of reactants and reagents.
However, for this task we do not care how the reactants react. This job of finding the how is the target
of reaction mechanism prediction. Before describing how our model predicts mechanisms, we use
this section to detail the types of reactions we consider in this paper, and their properties.

Molecules and Chemical reactions. Organic molecules (those involving predominantly carbon)
can be modeled as a graph structure, where each node is an atom and each edge is a covalent bond.
These covalent bonds represent the fact that one or more pairs of electrons are shared between the
atoms that the bond connects.
Just as electrons describe the current structure of molecules, they also describe how molecules react
with other molecules to produce new ones. All chemical reactions involve the stepwise movement
of electrons along the atoms in a set of reactant molecules. This movement causes the formation
and breaking of chemical bonds that changes the reactants into a new set of product molecules [5].
Reaction mechanisms can be classified by the topology of their "electron-pushing arrows". Here, the
class of reactions with linear electron flow topology is by far the most important, followed by those
with cyclic topology [5].
In this work, we will only consider reactions with linear topology that involve the movement of pairs
of electrons.

Reactions as single electron paths. If reactions fall into this class, then a chemical reaction can
be modeled as pairs of electrons moving in a single path through the reactant atoms. Further, this
electron path will alternately remove existing bonds in molecules, and form new ones. We show
this alternating structure in the right hand part of Figure 1. The reaction formally starts by taking
the pair of electrons between the Li and C atoms and moving them to the C atom (step 1); this is a
remove bond step. Next comes an add step where electrons are moved from the C atom to form a
bond between the two reactant molecules (step 2). Then a pair of electrons are removed between the

C and O atoms and moved to the O atom, giving rise to the products (step 3). Predicting the final
product is thus a byproduct of predicting this series of electron steps.
However, there are a number of benefits of predicting electron paths over predicting the outcomes of
reactions directly (as is done in previous work [7, 20]):

• Easy to interpret: If the model makes a mistake, it is easy to see where, and possibly why,
it goes wrong by comparing the steps of the path with the correct steps.
• Sparse: Reactions often only affect between 3 and 7 atoms out of anywhere from 10-50
reactant atoms. Modeling the reaction as a path allows us to exploit this sparsity.
• Chemically consistent: Learning a path allows us to easily incorporate chemical constraints,
such as the alternating removal and addition of bonds, among others.
• Generalizable: As reaction paths follow similar trends in different molecules, our model
naturally generalizes to unseen molecules.

The only other work we are aware of to use machine learning to predict reaction mechanisms are
[3, 8, 9, 10]. All of these model a chemical reaction as an interaction between atoms which function as
electron donors and those which function as electron acceptors. They predict the reaction mechanism
via two independent models: one that identifies these likely electron sources and sinks, and another
that ranks all combinations of them. However, this combining and then ranking of electron sources
and sinks can be a slow process, as many plausible reactions need to be considered (the number of
subgraphs of n reacting atoms, where single bonds are either added or removed is 2n(n−1)/2 ). These
models also have only so far been successfully trained on small hand-curated datasets.

3 Model

In this section we define a probabilistic model that describes the movement of electrons that define
linear topology reactions. We represent a set of molecules as a set of graphs M, with atoms A
as vertices and bonds B as edges; each connected component of the graph defines an individual
molecule. We can associate an ordering over all the atoms in all the molecules in the set using an
atom map number: an integer label assigned to each non-hydrogen atom in both the reactants and the
products which both permits easy matching between atoms before and after the reaction, and gives us
a consistent way to index particular atoms. Each atom v ∈ A includes a set of features, such as its
atom type (e.g. carbon, oxygen, . . . ); the full list of input atom features can be found in Table 3 of
the appendix. Molecules input into the model are first put in a Kekulé form, a process which makes
explicit the location of single and double bonds in aromatic structures; each bond b ∈ B is either a
single, double, or triple bond.
Given an initial set of reactant molecules M0 and a set of reagent molecules Mr , our model
defines a conditional distribution over a sequence of atoms (which we also refer to as actions)
P0:T = (a0 , a1 , . . . , aT ), which fully characterizes the electron path. This electron path in turn
deterministically defines both a final product MT +1 , denoting the outcome of the reaction, as well
as a sequence of intermediate products Mt , for t = 1, . . . , T , which correspond to the state of the
graph after the first t steps in the subsequence P0:t = (a0 , . . . , at ) are applied to the initial M0 . We
also define a stopping sequence St = (s0 , . . . , sT +1 ) which indicates if the reaction should stop (i.e,
st = 1 if the reaction should stop and is 0 otherwise).
We propose to learn a distribution pθ (P0:T | M0 , Mr ) over electron movements. We first detail the
generative process that specifies pθ , before describing how to train the model’s parameters, θ.

3.1 Generative process

First note that since our reactions are a single path of electrons through the reactants then, at any
point, the next step in the path depends only on (i) the intermediate molecule formed by the action
path up to that point, (ii) the previous action taken (indicating where the free pair of electrons are)
and (iii) the point of time through the path, indicating whether we are on an add or remove bond step.
We make the simplifying assumption that the stop probability and the actions after the initial action

pstart                                premove pstop
                                                                                                                                   Yes
                                                                                                     previously
                                                          selected
                                                                                                      selected                     No
 1    reactant 1           reactant 2   2     probability of selection               3   masking bond to remove                4
        electron pair
                                   padd pstop              bond to
                                                           remove
                                                                               premove pstop
                                            Yes                                                Yes

                                            No                                                 No
 5           bond to add                6             7                                    8         9            product 1   product 2

                                                  0                                            1
                                                                     probability

Figure 2: This figure shows the sequence of actions in transforming the reactants in box 1 to the
products in box 9. The sequence of actions will result in a sequence of pairs of atoms, between
which bonds will alternately be removed and created, creating a series of intermediate products. At
each step the model sees the current intermediate product graph (shown in the boxes) as well as the
previous action, if applicable, shown by the grey circle. It uses this to decide on the next action.
We represent the characteristic probabilities the model may have over these next actions as colored
circles over each atom. Some actions are disallowed on certain steps, for instance you cannot remove
a bond that does not exist; these blocked actions are shown as red crosses.

a0 do not depend on the reagents. This leads to a parameterized model with dependency structure:
         pθ (P0:T | M0 , Mr ) = pθ (s00 | M0 )pθ (a0 | M0 , Mr )                                                                     (1)
                                  "T                                         #
                                    Y
                                              0
                                ×        pθ (st | Mt )pθ (at | Mt , at−1 , t) pθ (sT +1 | MT +1 ),
                                              t=1

where we have defined pθ (s0t | Mt ) ≡ 1 − pθ (st | Mt ) to be the probability of continuing a reaction
given the current molecule set Mt . The other terms include pθ (a0 | M0 , Mr ), the probability of the
initial state a0 given the reactants and reagents; the conditional probability pθ (at | Mt , at−1 , t) of
next state at given the intermediate products Mt for t > 0; and the probability pθ (st | Mt ) that the
reaction terminates with final product Mt .
It is possible to stop prior to selecting a first atom a0 , indicating that no reaction would take place.
However, we restrict our model to not stop at step t = 1, as it is necessary to pick up a complete electron
pair. Given any particular selected atom at which extends the reaction path, we can deterministically
update the previous molecular graph Mt to produce the next set of (intermediate) products Mt+1 .
Given our reaction assumptions, then, as stated earlier, there are two types of electron movements
that alternate: (i) movement that removes an existing bond, and (ii) movement that adds a new bond.
We define atoms with free electrons as having a self-bond. Thus, all reactions start by first selecting
an atom, removing a bond (between two different atoms, or a self-bond), and then alternately adding
and removing bonds; we can determine whether a particular step is an add step or a remove step by
inspecting t. Note that M1 = M0 , as the initial action of selecting a0 does not form or remove any
bonds. Figure 2 presents a simple example reaction which demonstrates all the critical features of the
model; the subfigures show the sequence of intermediate products and the distributions over actions.
Each of the conditional probabilities in Eq. (1) are parameterized by neural networks. At each step
of the electron path, the network takes the current intermediate graphs, the previous action, and the
reagents if relevant, and computes a probability distribution over next possible actions (i.e., selecting
a particular atom, or stopping). The structure of these networks is described in the following section.

3.2    Computing atom and molecule features

We are left now with defining the functional form of our conditional distributions for continu-
ing pθ (s0t | Mt ), picking the initial action pθ (a0 | M0 , Mr ), and picking subsequent actions
pθ (at | Mt , at−1 , t). However, before describing these modules we need to describe how we com-

                                                                      4

pute node embeddings and graph embeddings, as these are essential to each. Full architectural details
(e.g. number of layers and hidden units) and training settings are deferred to the appendix.
Node embeddings are representations of all the atoms in all the molecules present in Mt . We denote
them by the matrix HMt ⊆ R|A|×d . Each row contains a d-dimensional embedding of an atom.
A natural ordering for the rows are the atom-map numbers assigned to each atom. We define the
function gA to map a set of graphs representing each molecule to these node embeddings. In general
gA could be any graph model that uses the graph structure of Mt to get graph-isomorphic node
features, usually via message-passing techniques [4]; we choose to use gated graph neural network
(GGNN) message functions [13].
It is also useful to be able to calculate graph embeddings hg , which are vectors that represent groups
of nodes belonging to one or more graphs; i.e. an entire molecule or set of molecules. We define the
function that maps node features belonging to each atom to their graph embedding by r. These are
similar to the readout functions used for regressing on graphs detailed in [4, Eq. 3] and the graph
embeddings described in Li et al. [14, §B.1]. Specifically, r consists of three functions, fgate , fup and
fdown , which could be any multi-layer perceptron (MLP) but in practice we find that linear functions
suffice. They are used to form the graph embedding, hg , as
                                                                                    !
                                                X
                    hg = r(HMt ) = fdown           σ (fgate (HMt ,v )) fup (HMt ,v ) .                  (2)
                                               v∈A0

Where σ is a sigmoid function. We can break this equation down into two stages. In stage (i), similar
to Li et al. [14, §B.1], we form an embedding of one or more molecules (with vertices A0 and with
A0 ⊂ A) by performing a gated sum over the node features. In this manner the function fgate is used
to decide how much that node should contribute towards the embedding, and fup projects the node
embedding up to a higher dimensional space; following Li et al. [14, §B.1], we choose this to be
double the dimension of the node features. Having formed this embedding of the graphs, we project
this down to a lower dimensional space in stage (ii), which is done by the function fdown .

3.3   Computing probabilities over actions

Having described how we compute node and graph embeddings, we are now ready to define each
of our parameterized distributions over actions. The simplest of these is pθ (s0t | Mt ), which is
the probability of continuing given the set of intermediate products at time t. This probability is
computed from a graph embedding via the function rstop , which projects down to a single dimension.
We then map this to the [0, 1] interval via the sigmoid function σ which yields the overall expression
                                    pθ (s0t | Mt ) = σ(rstop (HMt )).                                   (3)
Each of the three parameterized conditional probability distributions for the start, add and remove
steps have similar forms, each defining a probability vector over actions. The transition distribution
pθ (at | Mt , at−1 , t) which selects the next atom in the sequence P can be split into two distributions
depending on the parity of t: the remove bond step distribution premove θ    (at | Mt , at−1 ) is used when
t is odd, and the add bond step distribution paddθ (at | M  t , at−1 ) is used when t is even.
These three modules each have the same overall functional form
                                                    s = f (HMt , c),                                    (4)
                                     pθ (at | · · · ) ∝ β softmax(s)                                    (5)
where f is one of the networks fstart , fadd , or fremove ; c is a context vector, and β is a binary mask.
Each of the three actions has a different context and mask. The add step padd      θ (at | Mt , at−1 ) and
remove step premove
               θ     (at | Mt , at−1 ), have as context the node embedding of the atom selected at the
previous step, cat−1 = HMt ,at−1 . For the initial step, this context vector creagent is an embedding
of all the reagents present, computed by a graph embedding function rreagent . When computing the
output probabilities, we use the binary vector β to mask out specific actions known to be impossible.
The value of this differs for the start, add and remove steps; for the start step any action can be picked,
so β is the all-ones vector. For the remove step, βremove masks out (i.e. is set to zero for) any bonds
that do not currently exist and thus cannot be removed (noting though that self-bonds are permitted
in the first remove step). For the add step, βadd only masks out the previous action, preventing the
model from stalling in the same state for multiple time-steps.

                                                      5

Training We can learn the parameters θ of all the parameterized functions, by maximizing the
log likelihood of a full path log pθ (P0:T | M0 , Mr ). This is evaluated by using a known electron
path a?t and intermediate products M?t extracted from training data, rather than on simulated values.
This allows us to train on all stages of the reaction at once, given electron path data. We train our
models using Adam [12] and an initial learning rate of 10−4 , with minibatches of size one, where
minibatches often consist of multiple intermediate graphs.

Prediction Once trained, we can use our model to sample chemically-valid paths given an input
set of reactants M0 and reagents Mr , simply by simulating from the conditional distributions until
sampling a stop value st . We instead would like to find a ranked list of the top-K predicted paths, and
do so using a modified beam search, in which we roll out a beam of width K until a maximum path
length T max , while recording all paths which have terminated. This search procedure is described in
detail in Algorithm 1 in the Appendix.

4 Reaction Mechanism Identification
Our dataset for evaluating our model is a collection of chemical reactions extracted from the US
patent database [15]. We take as our starting point the 479,035 reactions, along with the training,
validation, and testing splits which were used by Jin et al. [7], referred to as the USPTO dataset.
This data consists of, per reaction, a group of bond changes and reaction SMILES strings [25]. The
bond changes indicate pairs of atoms which are connected differently in the reactants and products.
The SMILES strings encode the molecules present in a text based format. Before we can apply our
method, we perform two data preprocessing tasks described in the subsections below (using the
open-source chemo-informatics software RDKit [19]). These steps automatically extract a subset of
data appropriate for training our model of electron movement during a reaction.

4.1 Reactant and reagent separation

Typically, reaction SMILES strings are split into three parts — reactants, reagents, and products.
The reactant molecules are those which are consumed during the course of the chemical reaction to
form the product, while the reagents are any additional molecules which provide context under which
the reaction occurs (for example, catalysts), but do not explicitly take part in the reaction itself; we
see this in the example in Figure 1.
Unfortunately, the USPTO dataset as extracted does not differentiate between reagents and reactants.
We elect to preprocess the entire USPTO dataset by separating out the reagents from the reactants
using the process outlined in Schwaller et al. [20], where we classify as a reagent any molecule for
which either (i) none of its constituent atoms appear in the product, or (ii) the molecule appears in the
product SMILES completely unchanged from the pre-reaction SMILES. This allows us to properly
model molecules which are included in the dataset but do not materially contribute to the reaction.

4.2 Identifying reactions with linear electron topology

To train our model, it is necessary to extract a ground-truth representation of the electron paths from
the SMILES strings and bond changes. Furthermore, not every reaction in the USPTO dataset has a
linear electron topology; such reactions (for example, multi-step reactions and cycloadditions) will
not have a single unique path through the atoms which describes the movement of the electrons.
The first step is to look at the bond changes present in a reaction. Each atom on the ends of the path
will be involved in exactly one bond change; the atoms in the middle will be involved in two. We can
then line up bond change pairs so that neighboring pairs have one atom in common, with this ordering
forming a path. For instance, given the pairs "11-13, 14-10, 10-13" we form the unordered path
"14-10, 10-13, 13-11". If we are unable to form such a path, for instance due to two paths being
present as a result of multiple reaction stages, then we discard the reaction.
For training our model we want to find the ordering of our path, so that we know in which direction
the electrons flow. To do this we examine the changes of the properties of the atoms at the two ends
of our path. In particular, we look at changes in charge and attached implicit hydrogen counts. The
gain of negative charge (or analogously the gain of hydrogen as H+ ions without changing charge)

Table 1: Results when using E LECTRO Accuracies (%)
for reaction mechanism prediction. Here
we count a prediction as correct if the Model Name Top-1 Top-2 Top-3 Top-5
atom mapped action sequences predicted E LECTRO -L ITE 70.3 82.8 87.7 92.2
by our model match exactly those ex- E LECTRO 77.8 89.2 92.4 94.7
tracted from the USPTO dataset.

indicates that electrons have arrived at this atom, implying that this is the end of the path; vice-versa
for the start of the path. However, sometimes the difference is not available in the USPTO data, as
unfortunately only major products are recorded, and so details of what happens to some of the reactant
molecules’ atoms may be missing. In these cases we fall back to using an element’s electronegativity
to estimate the direction of our path, with more electronegative atoms attracting electrons towards
them and so being at the end of the path.
The next step of filtering checks that the path alternates between add steps (+1) and remove steps (-1).
This is done by analyzing and comparing the bond changes on the path in the reactant and product
molecules. Reactions that involve greater than one change (for instance going from no bond between
two atoms in the reactants to a double bond between the two in the products) can indicate multi-step
reactions with identical paths, and so are discarded. Finally, as a last sanity check, we use RDKit
to produce all the intermediate and final products induced by our path acting on the reactants, to
confirm that the final product that is produced by our extracted electron path is consistent with the
major product SMILES in the USPTO dataset.
The end result of this is extracted reaction paths for those entries in the USPTO dataset which
correspond to reactions of linear topology. This comprises 73% of the dataset, containing 349,898
total reactions, of which 29,360 form the held-out test set.

5 Experiments and Evaluation

In our experiments we consider two variants of our model: the first model E LECTRO is exactly as
defined in Section 3, including all reagent information; a second version we call E LECTRO -L ITE
ignores the reagents when selecting the initial action (i.e. no context vector creagent is input into
fstart ), allowing us to gauge the importance of reagents in determining what reaction occurs. We now
evaluate our model on the reaction mechanism prediction and reaction product prediction problems.

5.1 Reaction Mechanism Prediction

For mechanism prediction we are interested in ensuring we obtain the exact sequence of electron
actions correctly. For instance, when forming a bond between two pairs of atoms we want to know
which one of the atoms donated the electron pair needed to form the bond, even if the end result is the
same. The representation of the reaction mechanism produced by our model is a sequence of atoms,
detailing the path taken by the electrons in a series of alternating steps in which bonds are broken and
formed; using the atom mapping from the USPTO dataset this takes the form of a series of integers.
The most straightforward approach then to evaluate our accuracy at predicting reaction mechanisms
is to check whether the sequence of integers extracted from the raw data as described earlier is an
exact match with the sequence of integers output by E LECTRO; the top-1, top-2, top-3, and top-5
accuracies evaluated in this manner are reported in Table 1.

5.2 Reaction Product Prediction

Reaction mechanism prediction is useful for ensuring that we formed the correct product in the
correct way. However, this can underestimate the model’s actual predictive accuracy: although a
single atom mapping is provided as part of the USPTO dataset, in general atom mappings are not
unique (e.g., if a molecule contains symmetries).
Here this would manifest as multiple different sequences of integers which correspond to chemically-
identical electron paths. The first figure in the appendix shows an example of a reaction with
symmetries, where different electron paths produce the same product.

Table 2: Results when using E LECTRO for Accuracies (%)
reaction product prediction, following the
product matching procedure in Section 5.2. Model Name Top-1 Top-2 Top-3 Top-5
The WLDN accuracy is computed using E LECTRO -L ITE 78.2 87.7 91.5 94.4
their evaluation code and pretrained model E LECTRO 87.0 92.6 94.5 95.9
outputs on our test set. We were unable
to evaluate the Seq2Seq model on our test WLDN [7] 84.0 89.2 91.1 92.3
set, instead quoting their reported numbers Seq2Seq [20] 80.3? 84.7? 86.2? 87.5?
with an asterisk.
1st Choice
Br 2nd Choice
Cl Br
3rd Choice
OH

Cl B

OH
HO CH3
I

Figure 3: (Left) Nucleophilic substitutions SN 2-reactions, (right) Suzuki-coupling (please note that
in the “real” mechanism of the Suzuki coupling, the reaction would proceed via oxidative insertion,
transmetallation and reductive elimination at a Palladium catalyst. As these mechanistic details
are not contained in training data, we treat Palladium implicitly as a reagent). In both cases, our
model has correctly picked up the trend that halides lower in the period table usually react preferably
(I > Br > Cl).

Recent machine learning approaches to reaction product prediction [7, 20] have evaluated whether
the major product reported in the test dataset matches predicted candidate products generated by their
system, independent of mechanism. In our case, the top-5 accuracy for a particular reaction may
include multiple different electron paths that ultimately yield the same product molecule.
Identifying whether two product molecules are chemically the same is equivalent to solving a graph
isomorphism over the atoms and bond types, comparing the output of our system to the product
molecule. To perform this comparison, we take the sequence of edits on the reactants graph defined
by an electron path, apply these edits to define a product graph, and then define a deterministic
mapping from the edited graph to a canonical string representation. This is done by mapping the
molecule to Kekulé form, then applying the sequence of edits to the reactants graph, setting explicit
charges or hydrogen counts on the first and last atom in the electron path in order to satisfy valence
constraints. We then strip all atom map numbers from the graph. If this graph corresponds to a valid
product molecule, we can use RDKit to express the molecule in a canonical SMILES string format
(predicted electron paths which yield chemically infeasible products are considered failures). We can
then evaluate whether a predicted electron path matches the ground truth by a string comparison.
To use our model to produce a ranked list of predicted products, we compute the canonicalized
product SMILES for each of the predictions found by beam search over electron paths, removing
duplicates along the way. These product-level accuracies are reported in Table 2. For product
prediction we compare with the state-of-the-art graph-based method [7]; we use their evaluation code
and pre-trained model1 , re-evaluated on our extracted test set. We also compare against the Seq2Seq
model proposed by [20]; unfortunately, however, we were unable to evaluate it on our extracted test
set, and instead quote their reported performance on a different USPTO test set for reference. Overall,
E LECTRO outperforms all other approaches on this metric, with 87% top-1 accuracy and 95.9%
top-5 accuracy. Omitting the reagents in E LECTRO degrades top-1 accuracy slightly, but maintains a
high top-3 and top-5 accuracy, suggesting that reagent information is necessary to provide context in
disambiguating plausible reaction paths.
If the ultimate desired goal is to predict the product molecule rather than the reaction mechanism,
a benefit of our approach is the predicted electron paths can then serve as an explanation. In this
manner, when showing predicted products, we can list, alongside the maximum likelihood path, any
other candidate paths that result in the same product.

1
https://github.com/wengong-jin/nips17-rexgen

5.3 Qualitative Analysis

Complex molecules often feature several potentially reactive functional groups, which compete for
reaction partners. To predict the selectivity, that is which functional group will predominantly react
in the presence of other groups, students of chemistry learn heuristics and trends, which have been
established over the course of three centuries of experimental observation. To qualitatively study
whether the model has learned such trends from data we queried the model with several typical text
book examples from the chemical curriculum (see Figure 3 and the appendix). We found that the
model predicts most examples correctly. In the few incorrect cases, interpreting the model’s output
reveals that the model made chemically plausible predictions.

6 Discussion
In this paper we proposed E LECTRO, a model for predicting electron paths. These electron paths, or
reaction mechanisms provide a detailed description of how two reactants react together. Our model (i)
produces output that is easy for chemists to interpret, and (ii) is able to exploit the sparsity involved
in chemical reactions, in which often only small numbers of atoms in the reactants interact. As a
byproduct of predicting reaction mechanisms we are also able to perform reaction product prediction,
and achieve state-of-the-art performance on this task.

Acknowledgements. We would like to thank Jennifer Wei, Dennis Sheberla, and David Duvenaud
for their very helpful discussions. BP and MK are supported by The Alan Turing Institute under the
EPSRC grant EP/N510129/1. JB acknowledges support from an EPSRC studentship.

References
[1] Maike Bergeler, Gregor N Simm, Jonny Proppe, and Markus Reiher. Heuristics-guided explo-
ration of reaction mechanisms. Journal of chemical theory and computation, 11(12):5712–5722,
2015.
[2] Connor W Coley, Regina Barzilay, Tommi S Jaakkola, William H Green, and Klavs F Jensen.
Prediction of organic reaction outcomes using machine learning. ACS central science, 3(5):
434–443, 2017.
[3] David Fooshee, Aaron Mood, Eugene Gutman, Mohammadamin Tavakoli, Gregor Urban,
Frances Liu, Nancy Huynh, David Van Vranken, and Pierre Baldi. Deep learning for chemical
reaction prediction. Molecular Systems Design & Engineering, 2018.
[4] Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural
message passing for quantum chemistry. In ICML, 2017.
[5] Rainer Herges. Coarctate transition states: The discovery of a reaction principle. Journal of
chemical information and computer sciences, 34(1):91–102, 1994.
[6] Rainer Herges. Organizing principle of complex reactions and theory of coarctate transition
states. Angewandte Chemie International Edition in English, 33(3):255–276, 1994.
[7] Wengong Jin, Connor Coley, Regina Barzilay, and Tommi Jaakkola. Predicting organic reaction
outcomes with Weisfeiler-Lehman network. In NIPS, pages 2604–2613, 2017.
[8] Matthew A Kayala and Pierre Baldi. Reactionpredictor: Prediction of complex chemical
reactions at the mechanistic level using machine learning. J. Chem. Inf. Mod., 52(10):2526–
2540, 2012.
[9] Matthew A. Kayala and Pierre F. Baldi. A machine learning approach to predict chemical
reactions. In NIPS, 2011.
[10] Matthew A Kayala, Chloé-Agathe Azencott, Jonathan H Chen, and Pierre Baldi. Learning to
predict chemical reactions. J. Chem. Inf. Mod., 51(9):2209–2222, 2011.
[11] Yeonjoon Kim, Jin Woo Kim, Zeehyo Kim, and Woo Youn Kim. Efficient prediction of reaction
paths through molecular graph and reaction network analysis. Chemical Science, 2018.
[12] Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[13] Yujia Li, Richard Zemel, Marc Brockschmidt, and Daniel Tarlow. Gated graph sequence neural
networks. In ICLR, 2016.

[14] Yujia Li, Oriol Vinyals, Chris Dyer, Razvan Pascanu, and Peter Battaglia. Learning deep
     generative models of graphs. arXiv preprint arXiv:1803.03324, 2018.
[15] Daniel Lowe. Chemical reactions from US patents (1976-Sep2016). 6 2017. doi:
     10.6084/m9.figshare.5104873.v1. URL https://figshare.com/articles/Chemical_
     reactions_from_US_patents_1976-Sep2016_/5104873.
[16] Daniel Mark Lowe. Extraction of chemical structures and reactions from the literature. PhD
     thesis, University of Cambridge, 2012.
[17] Surajit Nandi, Suzanne R McAnanama-Brereton, Mark P Waller, and Anakuthil Anoop. A tabu-
     search based strategy for modeling molecular aggregates and binary reactions. Computational
     and Theoretical Chemistry, 1111:69–81, 2017.
[18] Dmitrij Rappoport, Cooper J Galvin, Dmitry Yu Zubarev, and Alán Aspuru-Guzik. Complex
     chemical reaction networks from heuristics-aided quantum chemistry. Journal of chemical
     theory and computation, 10(3):897–907, 2014.
[19] RDKit, online. RDKit: Open-source cheminformatics. http://www.rdkit.org. [Online;
     accessed 01-February-2018].
[20] Philippe Schwaller, Theophile Gaudin, David Lanyi, Costas Bekas, and Teodoro Laino. "
     found in translation": Predicting outcome of complex organic chemistry reactions using neural
     sequence-to-sequence models. arXiv preprint arXiv:1711.04810, 2017.
[21] Marwin HS Segler and Mark P Waller. Neural-symbolic machine learning for retrosynthesis
     and reaction prediction. Chem. Eur. J., 23(25):5966–5971, 2017.
[22] Marwin HS Segler, Mike Preuss, and Mark P Waller. Planning chemical syntheses with deep
     neural networks and symbolic ai. Nature, 555(7698):604, 2018.
[23] Gregor N Simm and Markus Reiher. Context-driven exploration of complex chemical reaction
     networks. Journal of chemical theory and computation, 13(12):6108–6119, 2017.
[24] Jennifer N Wei, David Duvenaud, and Alán Aspuru-Guzik. Neural networks for the prediction
     of organic chemistry reactions. ACS central science, 2(10):725–732, 2016.
[25] David Weininger. SMILES, a chemical language and information system. 1. Introduction to
     methodology and encoding rules. Journal of chemical information and computer sciences, 28
     (1):31–36, 1988.
[26] Paul M Zimmerman. Automated discovery of chemically reasonable elementary reaction steps.
     Journal of computational chemistry, 34(16):1385–1392, 2013.

                                               10

A         Example of symmetry affecting evaluation of electron paths
In the main text we described the challenges of how to evaluate our model, as different electron paths
can form the same products, for instance due to symmetry. Figure 4 is an example of this.
                                                                                                                                                                                                                            5           8
                                                                                                                                                                                                          H3 C9                        CH 3
                                                                                                                                                                                                                       4
                                                                                                                                   1                                                                                              6
                                                                                                                                   Cl

          10                                                                                                                                                                                                      3N              7
                                                                                                                                       2
     15                                                                                                                                                                                                                      2
                    11                                                                                                         7
    HN                                        20        22             24        26                                                        N3
                              H 3 C 18                  O                        CH 3
                         +               O                                  O               +        Cl
                                                                                                      17
                                                                                                                +              6            4
                                                                                                                                                        +    Na
                                                                                                                                                              16
                                                                                                                                                                       +   H2 O
                                                                                                                                                                              27                                  13
                                                                                                                                                                                                                            N12
                                                                                                                                                                                                                                  11
                NH                                 21         23                                                                                  9
    14         12                        19                                 25
                                                                                                                    H3 C                         CH 3
          13                                                                                                               8       5
                                                                                                                                                                                                                  14              10
                                                                                                                                                                                                                            N
                                                                                                                                                                                                                            H15

                                                                            (a) Reaction as defined by USPTO SMILES

                                                                                    1                                                                                                             1
                                                                                   Cl                                                                                                             Cl

                         10                                                             2                                                               10                                            2
                15                                                                                                                               15                                           7
                                    11
                                                                        7                       N3                                                                11                                        N3
               HN                                                                                                                               HN
                                                                        6                                                                                                                     6
                                                                                                 4                                                                                                            4
                                  NH                                                                        9                                                 NH                                                            9
               14                                                                                                                          14
                                12                          H3 C                                           CH 3                                              12                    H3 C                                    CH 3
                                                                   8                5                                                                                                     8       5
                         13                                                                                                                             13

                                              (b) Possible action sequences that all result in same major product.
Figure 4: This example shows how symmetry can affect the evaluation of electron paths. In this
example, although one electron path is given in the USPTO dataset, the initial N that reacts could be
either 15 or 12, with no difference in the final product. This is why judging purely based on electron
path accuracy can sometimes be misleading.

B         More training details
In this section we go through more specific model architecture details omitted from the main text.

B.1        Model architectures

In this section we provide further details of our model architectures.
Section 3 of the main paper discusses our model. In particular we are interested in computing three
conditional probability terms: (1) pθ (a0 | M0 , Mr ), the probability of the initial state a0 given the
reactants and reagents; (2) the conditional probability pθ (at | Mt , at−1 , t) of next state at given the
intermediate products Mt for t > 0; and (3) the probability pθ (st | Mt ) that the reaction terminates
with final product Mt .
Each of these is parametrized by NNs. We can split up the components of these NNs into a series of
modules, all introduced in the main text: gA , rstop , rreagent , fadd , fremove and fstart . In this section we
shall go through each of these in turn.
The function gA computes node embeddings, HMt , which are used as input to all the other modules.
For this we use Gated Graph Neural Networks (GGNN) [13, 4]. We use 4 propagation steps. The
atom features we feed in are detailed in Table 3. These are calculated using RDKit. In total there are
101 features and we maintain this dimensionality in the hidden layers during the propagation steps of
the GGNN. Three edge labels are defined: single bonds, double bonds and triple bonds. RDKit is
used to Kekulize the reactant molecules.
As mentioned in Section 3 of the main paper both rstop , rreagent consist of three linear functions.
For both, the function fgate is used to decide how much each node should contribute towards the
embedding and so projects down to a scalar value. Again for both, fup projects the node embedding up
to a higher dimensional space, which we choose to be 202 dimensions. This is double the dimension
of the node features, and similar to the approach taken by Li et al. [14, §B.1]. Finally, fdown differs
between the two modules, as for rstop it projects down to one dimension (to later go through a sigmoid
function and compute a stop probability), whereas for rreagent , fdown projects to a dimensionality of
100 to form the reagent embedding.

                                                                                                                           11

Table 3: Atom features
Feature Description
Atom type 72 possible elements in total, one hot
Degree One hot (0, 1, 2, 3, 4, 5, 6, 7, 10)
Explicit Valence One hot (0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 14)
Hybridization One hot (SP, SP2, SP3, Other)
H count integer
Electronegativity float
Atomic number integer
Part of an aromatic ring boolean

The modules for fadd and fremove , that operate on each node to produce a action logit, are both NNs
consisting of one hidden layer of 100 units. Concatenated onto the node features going into these
networks are the node features belonging to the previous atom on the path.
The final function, fstart , is represented by an NN with hidden layers of 100 units. When conditioning
on reagents (ie for E LECTRO) the reagent embeddings calculated by freagent are concatenated onto
the node embeddings and we use two hidden layers for our NN. When ignoring reagents (ie for
E LECTRO -L ITE) we use one hidden layer for this network. In total E LECTRO has approximately
250,000 parameters and E LECTRO -L ITE has approximately 190,000.

B.2 Training

We train everything using Adam [12] and an initial learning rate of 0.0001, which we decay after
5 and 9 epochs by a factor of 0.1. We train for a total of 10 epochs. For training we use reaction
minibatch sizes of one, although these can consist of multiple intermediate graphs.

C Prediction using our model
At predict time, as discussed in the main text, we use beam search to find high probable chemically-
valid paths from our model. Further details are given in Algorithm 1.

Algorithm 1 Predicting electron paths at test time.
Input: Molecule M0 (consisting of atoms A), reagents Mr , beam width K, time steps T max
 1: P̂ = {(∅, log(1 − calc_prob_continue(M0 )))} .This set will store all completed paths.
 2: Fremove = 1 .Remove flag
 3:
 4:   B̂ = ∅. .This set will store all possible open paths. Cleared at start of each timestep.
 5:   for all v ∈ A do
 6:      ρ = (v)
 7:      ppath = log calc_prob_continue(M0 ) + log calc_prob_initial(v, M0 , Mr )
 8:      B̂ = B̂ ∪ {(ρ, ppath )}
 9:   end for
10:   B0 = pick_topK_actions(B̂) .We filter down to the top K most promising actions.
11:
12:   for t in (1, . . . , T max ) do
13:     B̂ = ∅
14:     for all (ρ, ppath ) ∈ Bt−1 do
15:         Mρ = calc_intermediate_mol(M0 , ρ)
16:         pc = calc_prob_continue(Mρ )
17:         P̂ = P̂ ∪ {(ρ, ppath + log(1 − pc ))}
18:         for all v ∈ A do
19:            ρ0 = ρ_ (v) .New proposed path is concatenation of old path with new node.
20:            vt−1 = last element of ρ
21:            B̂ = B̂ ∪ {(ρ0 , ppath + log pc + log calc_prob_action(v, Mρ , vt−1 , Fremove ))}
22:         end for
23:     end for
24:     Bt = pick_topK_actions(B̂)
25:     Fremove = Fremove + 1 mod 2. .If on add step change to remove and vice versa.
26:   end for
27:
28:P̂ = sort_on_prob(P̂)
Output: Valid completed paths and their respective probabilities, sorted by the latter, P̂

                                                   13

D    Further example of actions proposed by our model
Figure 5 shows the model’s predictions for the mechanism of how two molecules will react.
                                                         O

                                                                       O
                                                              N                    CH3
                                 O

                                                                 CH3

                                                                   Br
                                          H3 C               Mg

Figure 5: Predicted mechanism of our model on reactant molecules. Green arrow shows preferred
mechanism, whereas pink shows the model’s second preferred choice. Here, the first-choice prediction
is incorrect, but chemically reasonable, as the Weinreb amide is typically used together in reactions
with Magnesium species. The second-choice prediction is correct.

               a)

                                          O
                    SH                                        S        O
                         +                    O
                                                                           O

               b)                                                  NO2
                                              NO2
                                 Cl
                                      +
                             O
                                                             O

               c)
                                              O
                                 Cl                          O                               O
                                      +                                        +
                             O
                                                             O                           O

Figure 6: Additional typical selectivity examples: Here, the expected product is shown on the right.
The blue arrows indicate the top ranked paths from our model, the red arrows indicate other possibly
competing but incorrect steps, which the model does not predict to be of high probability. In all
cases, our model predicted the correct products. In b) and c), our model correctly recovers the
regioselectivity expected in electrophilic aromatic substitutions.

                                                    14

You can also read