Noise-Induced Barren Plateaus in Variational Quantum Algorithms
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Noise-Induced Barren Plateaus in Variational Quantum Algorithms Samson Wang,1, 2 Enrico Fontana,1, 3, 4 M. Cerezo,1, 5 Kunal Sharma,1, 6 Akira Sone,1, 5 Lukasz Cincio,1 and Patrick J. Coles1 1 Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA 2 Imperial College London, London, UK 3 University of Strathclyde, Glasgow, UK 4 National Physical Laboratory, Teddington, UK 5 Center for Nonlinear Studies, Los Alamos National Laboratory, Los Alamos, NM, USA 6 Hearne Institute for Theoretical Physics and Department of Physics and Astronomy, Louisiana State University, Baton Rouge, LA USA Variational Quantum Algorithms (VQAs) may be a path to quantum advantage on Noisy Intermediate-Scale Quantum (NISQ) computers. A natural question is whether noise on NISQ devices places fundamental limitations on VQA performance. We rigorously prove a serious limita- arXiv:2007.14384v3 [quant-ph] 8 Feb 2021 tion for noisy VQAs, in that the noise causes the training landscape to have a barren plateau (i.e., vanishing gradient). Specifically, for the local Pauli noise considered, we prove that the gradient vanishes exponentially in the number of qubits n if the depth of the ansatz grows linearly with n. These noise-induced barren plateaus (NIBPs) are conceptually different from noise-free barren plateaus, which are linked to random parameter initialization. Our result is formulated for a generic ansatz that includes as special cases the Quantum Alternating Operator Ansatz and the Unitary Coupled Cluster Ansatz, among others. For the former, our numerical heuristics demonstrate the NIBP phenomenon for a realistic hardware noise model. I. Introduction VQAs may avoid the exponential scaling that otherwise would result from the exponential precision requirements One of the great unanswered technological questions of navigating through a barren plateau. is whether Noisy Intermediate Scale Quantum (NISQ) However, these works do not consider quantum hard- computers will yield a quantum advantage for tasks of ware noise, and very little is known about the scalability practical interest [1]. At the heart of this discussion are of VQAs in the presence of noise. One of the main sell- Variational Quantum Algorithms (VQAs), which are be- ing points of VQAs is noise mitigation, and indeed VQAs lieved to be the best hope for near-term quantum ad- have shown evidence of noise resilience in the sense that vantage [2–4]. Such algorithms leverage classical opti- the global minimum of the landscape may be unaffected mizers to train the parameters in a quantum circuit, by noise [6, 23]. While some analysis has been done [44– while employing a quantum device to efficiently estimate 46], an important open question, which has not yet been an application-specific cost function or its gradient. By addressed, is how noise affects the asymptotic scaling of keeping the quantum circuit depth relatively short, VQAs VQAs. More specifically, one can ask how noise affects mitigate hardware noise and may enable near-term appli- the training process. Intuitively, incoherent noise is ex- cations including electronic structure [5–8], dynamics [9– pected to reduce the magnitude of the gradient and hence 12], optimization [13–16], linear systems [17, 18], metrol- hinder trainability, and preliminary numerical evidence ogy [19, 20], factoring [21], compiling [22–24], and oth- of this has been seen [47, 48], although the scaling of this ers [25–30]. effect has not been studied. The main open question for VQAs is their scalability to In this work, we analytically study the scaling of gra- large problem sizes. While performing numerical heuris- dient for VQAs as a function of n, the circuit depth L, tics for small or intermediate problem sizes is the norm and a noise parameter q. We consider a general class of for VQAs, deriving analytical scaling results is rare for local noise models that includes depolarizing noise and this field. Noteworthy exceptions are some recent studies certain kinds of Pauli noise. Furthermore, we investi- of the scaling of the gradient in VQAs with the number of gate a general, abstract ansatz that allows us to encom- qubits n [31–39]. For example, it was proven that the gra- pass many of the important ansatzes in the literature, dient vanishes exponentially in n for randomly initialized, hence allowing us to make a general statement about deep Hardware Efficient ansatzes [31, 32] and dissipative VQAs. This includes the Quantum Alternating Operator quantum neural networks [33], and also for shallow depth Ansatz (QAOA) which is used for solving combinatorial with global cost functions [34]. This vanishing gradient optimization problems [13–16] and the Unitary Coupled phenomenon is now referred to as barren plateaus in the Cluster (UCC) Ansatz which is used in the Variational training landscape. Fortunately, investigations into bar- Quantum Eigensolver (VQE) to solve chemistry problems ren plateaus have spawned several promising strategies to [49–51]. This is also applicable for the Hardware Efficient avoid them, including local cost functions [34, 40], param- Ansatz and the Hamiltonian Variational Ansatz (HVA) eter correlation [37], pre-training [41], and layer-by-layer which are employed for various applications [52–56]. Our training [42, 43]. Such strategies give hope that perhaps results also generalize to settings that allow for multiple
2 NIBP issue. In addition, we provide numerical heuristics that illustrate our main result for MaxCut optimization with the QAOA, showing that NIBPs significantly im- pact this application. II. Results A. General Framework In this work we analyze a general class of parameter- ized ansatzes U (θ) that can be expressed as a product of L unitaries sequentially applied by layers U (θ) = UL (θ L ) · · · U2 (θ 2 ) · U1 (θ 1 ) . (1) FIG. 1. Schematic diagram of the Noise-Induced Bar- ren Plateau (NIBP) phenomenon. For various appli- Here θ = {θ l }L l=1 is a set of vectors of continuous pa- cations such as chemistry and optimization, increasing the rameters that are optimized to minimize a cost function problem size often requires one to increase the depth L of the C that can be expressed as the expectation value of an variational ansatz. We show that, in the presence of local operator O: noise, the gradient vanishes exponentially in L and hence ex- ponentially in the number of qubits n when L scales linearly C = Tr[OU (θ)ρU † (θ)] . (2) in n. This can be seen in the plots on the right, which show As shown in Fig. 2, ρ is an n-qubit input state. Without the cost function landscapes for a simple variational problem loss of generality we assume that each Ul (θ l ) is given by with local noise. Y Ul (θ l ) = e−iθlm Hlm Wlm , (3) m input states or training data, as in machine learning ap- plications, often called quantum neural networks [57–61]. where Hlm are Hermitian operators, θ l = {θlm } are Our main result (Theorem 1) is an upper bound on the continuous parameters, and Wlm denote unparametrized magnitude of the gradient that decays exponentially with gates. We expand Hlm and O in the Pauli basis as L, namely as 2−κ with κ = L log2 (q). Hence we find that X i X the gradient vanishes exponentially in the circuit depth. Hlm = η lm · σ n = ηlm σni , O = ω · σ n = ω i σni , i i Moreover, it is typical to consider L scaling as poly(n) (4) (e.g., in the UCC Ansatz [51]), for which our main result where σni ∈ {11, X, Y, Z}⊗n are Pauli strings, and η lm and implies an exponential decay of the gradient in n. We ω are real-valued vectors that specify the terms present refer to this as a Noise-Induced Barren Plateau (NIBP). in the expansion. Defining Nlm = |η lm | and NO = |ω| We remark that NIBPs can be viewed as concomitant to as the number of non-zero elements, i.e., the number of the cost landscape concentrating around the value of the terms in the summations in Eq. (4), we say that Hlm and cost for the maximally mixed state, and we make this O admit an efficient Pauli decomposition if Nlm , NO ∈ precise in Lemma 1. See Fig. 1 for a schematic diagram O(poly(n)), respectively. of the NIBP phenomenon. We now briefly discuss how the QAOA, UCC, and To be clear, any variational algorithm with a NIBP will Hardware Efficient ansatzes fit into this general frame- have exponential scaling. In this sense, NIBPs destroy work. We refer the reader to the Methods section for quantum speedup, as the standard goal of quantum algo- additional details. In QAOA one sequentially alternates rithms is to avoid the typical exponential scaling of clas- the action of two unitaries as sical algorithms. NIBPs are conceptually distinct from the noise-free barren plateaus of Refs. [31–36]. Indeed, U (γ, β) = e−iβp HM e−iγp HP · · · e−iβ1 HM e−iγ1 HP , (5) strategies to avoid noise-free barren plateaus [34, 37, 40– 43] do not appear to solve the NIBPs issue. where HP and HM are the so-called problem and mixer Hamiltonian, respectively. We define NP (NM ) as the The obvious strategy to address NIBPs is to reduce number of terms in the Pauli decomposition of HP (HM ). circuit complexity, or more precisely, to reduce the circuit On the other hand, Hardware Efficient ansatzes natu- depth. Hence, our work provides quantitative guidance rally fit into Eqs. (1)–(3) as they are usually composed for how small L needs to be to potentially avoid NIBPs. of fixed gates (e.g, controlled NOTs), and parametrized In what follows, we present our general framework fol- gates (e.g., single qubit rotations). Finally, as detailed in lowed by our main result. We also present two extensions the Methods, the UCC ansatz can be expressed as of our main result, one involving correlated ansatz pa- P i Y Y i rameters and one allowing for measurement noise. The U (θ) = Ulm (θlm ) = eiθlm i µlm σn , (6) latter indicates that global cost functions exacerbate the lm lm
3 In this case, one is primarily concerned with trainabil- ity, and hence the gradient is a key quantity of inter- est. These applications motivate our main result in The- orem 1, which bounds the magnitude of the gradient. We remark that trainability is of course also important for VQE, and hence Theorem 1 is also of interest for this application. With this motivation in mind, we now present our main results. We first present our bound on the cost function, since one can view this as a phenomenon that naturally accompanies our main theorem. Namely, in the following lemma, we show that the noisy cost function concentrates around the corresponding value for the maximally mixed state. FIG. 2. Setting for our analysis. An n-qubit input state ρ is sent through a variational ansatz U (θ) composed of L Lemma 1 (Concentration of the cost function). Con- unitary layers Ul (θ l ) sequentially acting according to Eq. (1). sider an L-layered ansatz of the form in Eq. (1). Suppose Here, Ul denotes the quantum channel that implements the that local Pauli noise of the form of Eq. (7) with noise unitary Ul (θ l ). The parameters in the ansatz θ = {θ l }Ll=1 are strength q acts before and after each layer as in Fig. 2. trained to minimize a cost function that is expressed as the Then, for a cost function C e of the form in Eq. (8), the expectation value of an operator O as in Eq. (2). We consider following bound holds a noise model where local Pauli noise channels Nj act on each qubit j before and after each unitary. e − 1 Tr[O] 6 G(n) ρ − 1 C , (9) 2n 2n 1 where µilm ∈ {0, ±1}, and where θlm are the coupled where P Nlm = |µlm | as cluster amplitudes. Moreover, we denote G(n) = NO kωk∞ q L+1 . (10) b the number of non-zero elements in i µilm σni . As shown in Fig. 2, we consider a noise model where Here k · k∞ is the infinity norm, k · k1 is the trace norm, local Pauli noise channels Nj act on each qubit j before ω is defined in Eq. (4), and NO = |ω| is the number of and after each unitary Ul (θ l ). The action of Nj on a non-zero elements in the Pauli decomposition of O. local Pauli operator σ ∈ {X, Y, Z} can be expressed as This lemma implies the cost landscape exponentially concentrates on the value Tr[O]/2n for large n, whenever Nj (σ) = qσ σ , (7) the number of layers L scales linearly with the number of where −1 < qX , qY , qZ < 1. Here, we character- qubits. While this lemma has important applications on ize the noise strength with a single parameter q = its own, particularly for VQE, it also provides intuition max{|qX |, |qY |, |qZ |}. Let Ul denote the channel that im- for the NIBP phenomenon, which we now state. plements the unitary Ul (θ l ) and let N = N1 ⊗ · · · ⊗ Nn Let ∂lm C e = ∂ C/∂θ e lm denote the partial derivative of denote the n-qubit noise channel. Then, the noisy cost the noisy cost function with respect to the m-th param- function is given by eter that appears in the l-th layer of the ansatz, as in Eq. (3). For our main technical result, we upper bound |∂lm C| e as a function of L and n. e = Tr O N ◦ UL ◦ · · · ◦ N ◦ U1 ◦ N (ρ) . C (8) Theorem 1 (Upper bound on the partial derivative). Consider an L-layered ansatz as defined in Eq. (1). Let B. General Analytical Results θlm denote the trainable parameter corresponding to the Hamiltonian Hlm in the unitary Ul (θ l ) appearing in the There are some VQAs, such as the VQE [5] for chem- ansatz. Suppose that local Pauli noise of the form in istry and other physical systems, where it is important Eq. (7) with noise parameter q acts before and after each to accurately characterize the value of the cost function layer as in Fig. 2. Then the following bound holds for the itself. We provide an important result below in Lemma 1 partial derivative of the noisy cost function that quantitatively bounds the cost function itself, and we envision that this bound will be especially useful in |∂lm C| e 6 F (n), (11) the context of VQE. On the other hand, there are other VQAs, such as those for optimization [13–16], compil- where √ ing [22–24], and linear systems [17, 18], where the key F (n) = 8 ln 2 NO kHlm k∞ kωk∞ n1/2 q L+1 , (12) goal is to learn the optimal parameters and the precise value of the cost function is either not important or can and ω is defined in Eq. (4), with number of non-zero be computed classically after learning the parameters. elements NO .
4 Let us now consider the asymptotic scaling of the func- derivative of the noisy cost with respect to θst is bounded tion F (n) in Eq. (12). Under standard assumptions such as as that O in Eq. (4) admits an efficient Pauli decomposi- √ e 6 8 ln 2 gNO kHlm k∞ kωk∞ n1/2 q L+1 , |∂st C| (16) tion and that Hlm has bounded eigenvalues, we now state that F (n) decays exponentially in n, if L grows linearly at all points in the cost landscape. in n. Remark 1 is especially important in the context of the Corollary 1 (Noise-induced barren plateaus). Let QAOA and the UCC ansatz, as discussed below. We i Nlm , NO ∈ O(poly(n)) and let ηlm , ω j ∈ O(poly(n)) for note that, in the general case, a unitary of the form of all i, j. Then the upper bound F (n) in Eq. (12) vanishes Eq. (3) cannot be implemented as a single gate on a phys- exponentially in n as ical device. In practice one needs to compile the unitary into a sequence of native gates. Moreover, Hamiltoni- F (n) ∈ O(2−αn ) , (13) ans with non-commuting terms are usually approximated with techniques such as Trotterization. This compiliation for some positive constant α if we have overhead potentially leads to a sequence of gates that L ∈ Ω(n) . (14) grows with n. Remark 1 enables us to account for such scenarios, and we elaborate on its relevance to specific The asymptotic scaling in Eq. (13) is independent of applications in the next subsection. l and m, i.e., the scaling is blind to the layer, or the Finally, we present an extension of our main result to parameter within the layer, for which the derivative is the case of measurement noise. Consider a model of mea- taken. This corollary implies that when Eq. (14) holds, surement noise where each local measurement indepen- i.e. L grows at least linearly in n, the partial derivative dently has some bit-flip probability given by (1 − qM )/2, |∂lm C| e exponentially vanishes in n across the entire cost which we assume to be symmetric with respect to the 0 landscape. In other words, one observes a Noise-Induced and 1 outcomes. This leads to an additional reduction Barren Plateau (NIBP). of our bounds on the cost function and its gradient that We note that Eq. (14) is satisfied for all q < 1. That depends on the locality of the observable O. is, NIBPs occur regardless of the noise strength, it only Proposition 1 (Measurement noise). Consider expand- changes the severity of the exponential scaling. ing the observable O as a sum of Pauli strings, as in In addition, Corollary 1 implies that NIBPs are con- Eq. (4). Let w denote the minimum weight of these ceptually different from noise-free barren plateaus. First, strings, where the weight is defined as the number of non- NIBPs are independent of the parameter initialization identity elements for a given string. In addition to the strategy or the locality of the cost function. Second, noise process considered in Fig. 2, suppose there is also NIBPs exhibit exponential decay of the gradient itself; measurement noise consisting of a tensor product of lo- not just of the variance of the gradient, which is the cal bit-flip channels with bit-flip probability (1 − qM )/2. hallmark of noise-free barren plateaus. Noise-free barren Then we have plateaus allow the global minimum to sit inside deep, nar- row valley in the landscape [34], whereas NIBPs flatten e − 1 Tr[O] 6 qM C w 1 G(n) ρ − n (17) the entire landscape. 2n 2 1 One of the strategies to avoid the noise-free barren and plateaus is to correlate parameters, i.e., to make a subset of the parameters equal to each other [37]. We generalize w |∂lm C| e 6 qM F (n) (18) Theorem 1 in the following remark to accommodate such a setting, consequently showing that such correlated or where G(n) and F (n) are defined in Lemma 1 and The- degenerate parameters do not help in avoiding NIBPs. orem 1, respectively. In this setting, the result we obtain in Eq. (16) below is Proposition 1 goes beyond the noise model considered essentially identical to that in Eq. (12) except with an in Theorem 1. It shows that in the presence of measure- additional factor quantifying the amount of degeneracy. ment noise there is an additional contribution from the Remark 1 (Degenerate parameters). Consider the locality of the measurement operator. It is interesting to ansatz defined in Eqs. (1) and (3). Suppose there is a draw a parallel between Proposition 1 and noise-free bar- subset Gst of the set {θlm } in this ansatz such that Gst ren plateaus, which have been shown to be cost-function consists of g parameters that are degenerate: dependent and in particular depend on the locality of the observable O [34]. The bounds in Proposition 1 similarly depend on the locality of O. For example, when w = n, Gst = θlm | θlm = θst . (15) w i.e., global observables, the factor qM will hasten the ex- Here, θst denotes the parameter in Gst for which ponential decay. On the other hand, when w = 1, i.e., lo- Nlm kη lm k∞ takes the largest value in the set. (θst can cal observables, the scaling is unaltered by measurement also be thought of as a reference parameter to which all noise. In this sense, a global observable exacerbates the other parameters are set equal in value.) Then the partial NIBP issue by making the decay more rapid with n.
5 C. Application-Specific Analytical Results Moreover, weak growth of p with n combined with com- pilation overhead could still result in an NIBP. Here we investigate the implications of our results from Finally, we note that above we have assumed the con- Section II B for two applications: optimization and chem- tribution of kP dominates that of kM . However, it is istry. In particular, we derive explicit conditions for possible that for choice of more exotic mixers [16], kM NIBPs for these applications. These conditions are de- also needs to be carefully considered to avoid NIBPs. rived in the setting where Trotterization is used, but Corollary 3 (Example: UCC). Let H denote a molec- other compilation strategies incur similar asymptotic be- ular Hamiltonian of a system of Me electrons. Consider havior. We begin with the QAOA for optimization and the UCC ansatz as defined in Eq. (6). If local Pauli noise then discuss the UCC ansatz for chemistry. Finally, of the form in Eq. (7) with noise parameter q acts before we make a remark about the Hamiltonian Variational and after every Ulm (θlm ) in Eq. (6), then we have Ansatz (HVA), as well as remark that our results also ap- ply to a generalized cost function that can employ train- √ |∂θlm C| e 6 8 ln 2 Nblm NH kωk∞ n1/2 q L+1 , (21) ing data. Corollary 2 (Example: QAOA). Consider the QAOA for any coupled cluster amplitude θlm , and where O = H with 2p trainable parameters, as defined in Eq. (5). Sup- in Eq. (2). pose that the implementation of unitaries corresponding to the problem Hamiltonian HP and the mixer Hamilto- Corollary 3 allows us to make general statements about nian HM require kP - and kM -depth circuits, respectively. the trainability of UCC ansatz. We present the details for If local Pauli noise of the form in Eq. (7) with noise pa- the standard UCC ansatz with single and double excita- rameter q acts before and after each layer of native gates, tions from occupied to virtual orbitals [49, 68] (see Meth- then we have ods for more details). Let Mo denote the total number of √ spin orbitals. Then at least n = Mo qubits are required |∂βl C| e 6 8 ln 2 gl,P NP kHP k∞ kωk∞ n1/2 q (kP +kM )p+1 , to simulate such a system and the number of variational (19) parameters grows as Ω(n2 Me2 ) [62, 69]. To implement the √ 1/2 (k +k )p+1 |∂γl C| e 6 8 ln 2 gl,M NP kHM k∞ kωk∞ n q P M , UCC ansatz on a quantum computer, the excitation op- (20) erators are first mapped to Pauli operators using Jordan- Wigner or Bravyi-Kitaev mappings [70, 71]. Then, using for any choice of parameters βl , γl , and where O = HP first-order Trotterization and employing SWAP networks in Eq. (2). Here gl,P and gl,M are the respective number [62], the UCC ansatz can be implemented in Ω(n2 Me ) of native gates parameterized by βl and γl according to depth, while assuming 1-D connectivity of qubits [62]. the compilation. Hence for the UCC ansatz, approximated by single- and Corollary 2 follows from Remark 1 and it has interest- double-excitation operators, the upper bound in Eq. (21) ing implications for the trainability of the QAOA. From (asymptotically) vanishes exponentially in n. Eqs. (19) and (20), NIBPs are guaranteed if pkP scales To target strongly correlated states for molecular linearly in n. This can manifest itself in a number of Hamiltonians, one can employ a UCC ansatz that in- ways, which we explain below. cludes additional, generalized excitations [55, 72]. A First, we look at the depth kP required to implement Ω(n3 ) depth circuit is required to implement the first- one application of the problem unitary. Graph prob- order Trotterized form of this ansatz [62]. Hence NIBPs lems containing vertices of extensive degree such as the become more prominent for generalized UCC ansatzes. Sherrington-Kirkpatrick model inherently require Ω(n) Finally, we remark that a sparse version of the UCC depth circuits to implement [54]. On the other hand, ansatz can be implemented in Ω(n) depth [62]. NIBPs generic problems mapped to hardware topologies also still would occur for such ansatzes. have the potential to incur Ω(n) depth or greater in com- Additionally, we can make the following remark about pilation cost. For instance, implementation of MaxCut the Hamiltonian Variational Ansatz (HVA). As argued in and k-SAT using SWAP networks on circuits with 1-D [55, 73, 74], the HVA has the potential to be an effective connectivity requires depth Ω(n) and Ω(nk−1 ) respec- ansatz for quantum many-body problems. tively [15, 62]. Such mappings with the aforementioned Remark 2 (Example: HVA). The HVA be thought of compiling overhead for k > 2 are guaranteed to encounter as a generalization of the QAOA to more than two non- NIBPs even for a fixed number of rounds p. commuting Hamiltonians. It is remarked in Ref. [56] that Second, it appears that p values that grow at least for problems of interest the number of rounds p scales lightly with n may be needed for quantum advantage in linearly in n. Thus, considering this growth of p and also certain optimization problems (for example, [63–66]). In the potential growth of the compiled unitaries with n, the addition, there are problems employing the QAOA that HVA has the potential to encounter NIBPs, by the same explicitly require p scaling as poly(n) [21, 67]. Thus, arguments made above for the QAOA (e.g., Corollary 2). without even considering the compilation overhead for the problem unitary, these QAOA problems may run into Remark 3 (Quantum Machine Learning). Our results NIBPs particularly when aiming for quantum advantage. can be extended to generalized cost functions of the form
6 † P Ctrain = i Tr[Oi U (θ)ρi U (θ)] where {Oi } is a set of operators each of the form (4) and {ρi } is a set of states. This can encapsulate certain quantum machine learning settings [57–61] that employ training data {ρi }. As an example of an instance where NIBPs can occur, in one study [61] an ansatz model has been proposed that requires at least linear circuit depth in n. D. QAOA Heuristics To illustrate the NIBP phenomenon beyond the con- ditions assumed in our analytical results, we numerically implement the QAOA to solve MaxCut combinatorial op- timization problems. We employ a realistic noise model obtained from gate-set tomography on the IBM Ourense FIG. 3. QAOA heuristics in the presence of real- superconducting qubit device. In the Methods section istic hardware noise: increasing number of rounds we provide additional details on the noise model and the for fixed problem size. (a) The approximation ratio av- optimization method employed. eraged over 100 random graphs of 5 nodes is plotted ver- Let us first recall that a MaxCut problem is specified sus number of rounds p. The black, green, and red curves by a graph G = (V, E) of nodes V and edges E. The respectively correspond to noise-free training, noisy train- goal is to partition the nodes of G into two sets which ing with noise-free final cost evaluation, and noisy train- ing with noisy final cost evaluation. The performance of maximize the number of edges connecting nodes between noise-free training increases with p, similar to the results in sets. Here, the QAOA problem Hamiltonian is given by Ref. [15]. The green curve shows that the training process 1 X itself is hindered by noise, with the performance decreasing HP = − Cij (11 − Zi Zj ) , (22) steadily with p for p > 4. The dotted blue lines correspond 2 ij∈E to known lower and upper bounds on classical performance in polynomial time: respectively the performance guarantee where Zi are local Pauli operators on qubit (node) i, of the Goemans-Williamson algorithm [76] and the boundary Cij = 1 if the nodes are connected and Cij = 0 otherwise. of known NP-hardness [77, 78]. (b) The deviation of the cost We analyze performance in two settings. First, we fix from Tr[HP ]/2n (averaged over graphs and parameter values) the problem size at n = 5 nodes (qubits) and vary the is plotted versus p. As p increases, this deviation decays ap- number of rounds p (Fig. 3). Second, we fix the number proximately exponentially with p (linear on the log scale). (c) of rounds of QAOA at p = 4 and vary the problem size The absolute value of the largest partial derivative, averaged by increasing the number of nodes (Fig. 4). over graphs and parameter values, is plotted versus p. The In order to quantify performance for a given n and p, partial derivatives decay approximately exponentially with p, we randomly generate 100 graphs according to the showing evidence of Noise-Induced Barren Plateaus (NIBPs). Erdős–Rényi model [75], such that each graph G is chosen uniformly at random from the set of all graphs of n nodes. For each graph we run 10 instances of the parameter opti- via noisy training. Note that evaluating the cost in a mization, and we select the run that achieves the smallest noise-free setting has practical meaning, since the clas- energy. At each optimization step the cost is estimated sicality of the Hamiltonian allows one to compute the with 1000 shots. Performance is quantified by the aver- cost on a (noise-free) classical computer, after training age approximation ratio when training the QAOA in the the parameters. For p > 4 this approximation ratio de- presence and absence of noise. The approximation ratio creases, meaning that as p becomes larger it becomes is defined as the lowest energy obtained via optimizing increasingly hard to find a minimizing direction to navi- divided by the exact ground state energy of HP . gate through the cost function landscape. Moreover, the In our first setting we observe in Fig. 3(a) that when effect of NIPBs is evident in Fig. 3(c) where we depict training in the absence of noise, the approximation ratio the average absolute value of the largest cost function increases with p. However, when training in the pres- partial derivative (i.e., maxlm |∂lm C|). e This plot shows ence of noise the performance decreases for p > 2. This an exponential decay of the partial derivative with p in result is in accordance with Lemma 1, as the cost func- accordance with Theorem 1. tion value concentrates around Tr[HP ]/2n as p increases. Finally, in Fig. 3(a) we contextualize our results This concentration phenomenon can also be seen clearly with previously known two-sided bounds on classical in Fig. 3(b), where in fact we see evidence of exponential polynomial-time performance. The lower bound cor- decay of cost value with p. responds to the performance guarantee of the classical In addition, we can see the effect of NIBPs as Fig. 3(a) Goemans-Williamson algorithm [76], whilst the upper also depicts the value of the approximation ratio com- bound is at the value 16/17 which is the approximation puted without noise by utilizing the parameters obtained ratio beyond which Max-Cut is known to be NP-hard
7 scalability, and there is even less known about the im- pact of noise on their scaling. Our work represents a breakthrough in understanding the effect of local noise on VQA scalability. We rigorously prove two important and closely related phenomena: the exponential concentra- tion of the cost function in Lemma 1 and the exponential vanishing of the gradient in Theorem 1. We refer to the latter as a Noise-Induced Barren Plateau (NIBP). Like noise-free barren plateaus, NIBPs require the precision and hence the algorithmic complexity to scale exponen- tially with the problem size. Thus, avoiding NIBPs is necessary for a VQA to have any hope of exponential quantum speedup. On the other hand, NIBPs are conceptually different from noise-free barren plateaus [31–36]. The latter are FIG. 4. QAOA heuristics in the presence of realistic due to random parameter initialization and hence can be hardware noise: increasing problem size for a fixed addressed by pre-training, correlating parameters, and number of rounds. The approximation ratio averaged over other strategies [34, 37, 40–43]. In contrast, NIBPs hold 60 random graphs of increasing number of nodes n and fixed for every point on the cost function landscape. Hence, number of rounds p = 4 is plotted. The black, green, and red pre-training and other similar strategies do not avoid curves respectively correspond to noise-free training, noisy NIBPs, and we explicitly demonstrate this for the pa- training with noise-free final cost evaluation, and noisy train- rameter correlation strategy in Remark 1. At the mo- ing with noisy final cost evaluation. (a) For a problem size of 8 ment, the only strategies we are aware of for avoiding nodes or larger, the noisily-trained approximation ratio falls below the performance guarantee of the classical Goemans- NIBPs are: (1) reducing the hardware noise level, or (2) Williamson algorithm. (b) The depth of the circuit (red improving the design of variational ansatzes such that curve) scales linearly with the number of qubits, confirming their circuit depth scales more weakly with n. Our work we are in a regime where we would expect to observe Noise- provides quantitative guidance for how to develop these Induced Barren Plateaus. strategies. An elegant feature of our work is its generality, as our results apply to a wide range of VQAs and ansatzes. [77, 78]. This includes the two most popular ansatzes, QAOA In our second setting we find complementary results. for optimization and UCC for chemistry, which Corol- In Fig. 4(a) we observe that at a problem size of 8 qubits laries 2 and 3 treat respectively. In recent times QAQA, or larger, 4 rounds of QAOA trained on the noisy circuit UCC, and other physically motivated ansatzes have be falls short of the performance guarantees of the classi- touted as the potential solution to trainability issues due cal Goemans-Williamson algorithm. As we increase the to (noise-free) barren plateaus, while Hardware Efficient number of qubits, we also observe this increases the depth ansatzes, which minimize circuit depth, have been re- of the circuit linearly (Fig. 4(b)), thus confirming we are garded as problematic. Our work swings the pendulum in in a regime of NIBPs. the other direction: any additional circuit depth that an Our numerical results show that training the QAOA ansatz incorporates (regardless of whether it is physically in the presence of a realistic noise model significantly motivated) will hurt trainability and potentially lead to affects the performance. The concentration of cost and a NIBP. This suggests that Hardware Efficient ansatzes the NIBP phenomenon are both also clearly visible in our are in fact worth exploring further, provided one has an data. Even though we observe performance for n = 5 appropriate strategy to avoid noise-free barren plateaus. and p = 4 that is NP-hard to achieve classically, any This claim is supported by recent state-of-the-art imple- possible advantage would be lost for large problem sizes mentations for optimization [54] and chemistry [53] using or circuit depth due to bad scaling. Hence, noise appears such ansatzes. to be a crucial factor to account for when attempting to We believe our work has particular relevance to opti- understand the performance of QAOA. mization. For combinatorial optimization problems, such as MaxCut on 3-regular graphs, the compilation of a sin- gle instance of the problem unitary e−iγHP can require an III. Discussion Ω(n)-depth circuit [54]. Therefore, for a constant number of rounds p of the QAOA, the circuit depth grows at least The success of NISQ computing largely depends on the linearly with n. From Theorem 1, it follows that NIBPs scalability of Variational Quantum Algorithms (VQAs), can occur for practical QAOA problems, even for con- which are widely viewed as the best hope for near-term stant number of rounds. Furthermore, even neglecting quantum advantage for various applications. Only a the aforementioned linear compilation overhead, NIBPs small handful of works have analytically studied VQA are guaranteed (asymptotically) if p grows in n. Such
8 growth has been shown to be necessary in certain in- stances of MaxCut [63] as well as for other optimization problems [21, 67], and hence NIBPs are especially rele- vant in these cases. While it is well known that decoherence ultimately lim- its the depth of quantum circuits in the NISQ era, there was an interesting open question (prior to our work) as to whether one could still train the parameters of a vari- ational ansatz in the high decoherence limit. This ques- tion was especially important for VQAs for optimization, compiling, and linear systems, which are applications that do not require accurate estimation of cost func- tions on the quantum computer. Our work essentially provides a negative answer to this question. Naturally, important future work will involve extending our results to more general (e.g., non-unital) noise models, and nu- merically testing the tightness of our bounds. Moreover, FIG. 5. Special cases of our general ansatz. (a) QAOA our work emphasizes the importance of short-depth vari- problem unitary e−iγHP for the ring-of-disagrees MaxCut problem, with Hamiltonian HP = 21 j Zj Zj+1 . (b) Hard- P ational ansatzes. Hence a crucial research direction for the success of VQAs will be the development of methods ware Efficient ansatz composed of CNOTs and single qubit to reduce ansatz depth. rotations around the y-axis Ry (θ). (c) Unitary for the expo- nential e−iθY1 Z2 Z3 X4 . This type of circuit is a representative component of the UCC ansatz. IV. Methods 2. Hardware Efficient Ansatz A. Special Cases of Our Ansatz The goal of the Hardware Efficient ansatz is to reduce Here we discuss how the the QAOA, the Hardware the gate overhead (and hence the circuit depth) which Efficient ansatz, and the UCC ansatz fit into the general arises when implementing a general unitary as in (3). framework of Section II A. Hence, when employing a specific quantum hardware the parametrized gates e−iθlm Hlm and the unparametrized gates Wlm are taken from a gate alphabet composed of native gates to that hardware. Figure 5(b) shows an 1. Quantum Alternating Operator Ansatz example of a Hardware Efficient ansatz where the gate alphabet is composed of rotations around the y axis and of CNOTs. The QAOA can be understood as a discretized adi- abatic transformation where the goal is to prepare the ground state of a given Hamiltonian HP . The order p of the Trotterization determines the solution precision 3. Unitary Coupled Cluster Ansatz and the circuit depth. Given an initial state |si, usually the linear superposition of all elements of the computa- tional basis |si = |+i⊗n , the ansatz corresponds to the This ansatz is employed to estimate the ground state sequential application of two unitaries UP (γl ) = e−iγl HP energy of the molecular Hamiltonian. In the second and UM (βl ) = e−iβl HM . These alternating unitaries are quantization, and within the Born-Oppenheimer approx- usually known as the problem and mixer unitary, respec- imation, the molecular Hamiltonian of aPsystem of Me † tively. Here γ = {γk }L L l=1 and β = {βk }l=1 are vectors electrons can be expressed as: H = pq hpq ap aq + of variational parameters which determine how long each 1 † † † P 2 pqrs hpqrs ap aq ar as , where {ap } ({aq }) are Fermionic unitary is applied and which must be optimized to mini- creation (annihilation) operators. Here, hpq and hpqrs mize the cost function C, defined as the expectation value respectively correspond to the so-called one- and two- electron integrals [49, 68]. The ground state energy of C = hγ, β|HP |γ, βi = Tr[HP |γ, βihγ, β|] , (23) H can be estimated with the VQE algorithm by prepar- ing a reference state, normally taken to be the Hartree- where |γ, βi = U (γ, β)|si is the QAOA variational state, Fock (HF) mean-field state |ψ0 i, and acting on it with a and where U (γ, β) is given by (5). In Fig. 5(a) we depict parametrized UCC ansatz. the circuit description of a QAOA ansatz for a specific The action of a UCC ansatz with single (T1 ) and double Hamiltonian where kP = 6. (T2 ) excitations is given by |ψi = exp(T − T † )|ψ0 i, where
9 T = T1 + T2 , and where Let us now present two lemmas that reflect these two parts of the proof. The action of the noise in ti,j a†a a†b aj ai . X X a,b T1 = tai a†a ai , T2 = (24) (7) on the operator Λ is to map the elements of λ as i∈occ i,j∈occ N x(i) y(i) z(i) a∈vir a,b∈vir λi −−→ λ0i = qX qY qZ λi where x(i), y(i), and z(i) respectively denote the number of X, Y , and Z oper- Here the i and j indices range over “occupied” orbitals ators in the i-th Pauli string. Recall the definition whereas the a and b indices range over “virtual” or- q = max{|qX |, |qY |, |qZ |}. Since x(i) + y(i) + z(i) > 1, bitals [49, 68]. The coefficients tai and ta,b i,j are called the inequality |λ0 | 6 q|λ| always holds. We use this re- coupled cluster amplitudes. For simplicity, we denote lationship, along with Weyl’s inequality and the unitary these amplitudes {tai , ta,b i,j } as {θlm }. Similarly, by denot- invariance of Schatten norms to show that for an operator ing the excitation operators {a†a ai , a†a a†b aj ai } as {τlm }, of the form (25) we have the UCC P ansatz can be written in a compact form as W k (Λ) ∞ 6 λ0 + q k λ 1 (26) † U (θ) = e lm θlm (τlm −τlm ) . In order to implement U (θ) one maps the fermionic operators to spin operators by where W k is a channel composed of k unitaries inter- means of the Jordan-Wigner or the Bravyi-Kitaev trans- leaved with noise channels of the form (7). The second † lemma we present is a bound on the relative entropy by P i i [70, 71], which allows us to write (τlm −τlm ) = formations Müller-Hermes et al. [79] which states that i i µlm σn . Then, from a first-order Trotterization we obtain (6). Here, µilm ∈ {0, ±1}. In Fig. 5(c) we depict D W(ρ) 11⊗n /2n 6 q 2k D ρ 11⊗n /2n (27) the circuit description of a representative component of where we recall that D ρ 11⊗n /2n itself is always upper the UCC ansatz. bounded by n for any n-qubit quantum state ρ. Now that we have the main tools we present a sketch B. Proof of Theorem 1 of the proof. In order to analyze the partial derivative of the cost function ∂lm C e = Tr [O ∂lm ρL ] we first note that Here we outline the proof for our main result on Noise- the output state ρL can be expressed as Induced Barren Plateaus. We refer the reader to the Sup- ρL = (Wa ◦ Wb ) (ρ0 ) = Wa (ρ̄l ) , (28) plementary Information for additional details. We note that Lemma 1 and Remark 1 follow from similar steps where ρ0 is the input state and and their proofs are also detailed in the Supplementary + Wa = N ◦ UL ◦ · · · ◦ Ul+1 ◦ N ◦ Ulm , (29) Information. Moreover, we remark that Corollaries 1, 2, − Wb = Ulm ◦ N ◦ Ul−1 ◦ · · · ◦ N ◦ U1 ◦ N , (30) and 3 follow in a straightforward manner from a direct application of Theorem 1 and Remark 1. ± where Ulm are channels that implement the unitaries Throughout our calculations we find it useful to use − Ulm = s6m e−iθls Hls and Ulm Q + = s>m e−iθls Hls such Q the expansion of operators in the Pauli tensor product + − that Ul = Ulm · Ulm . For simplicity of notation here we basis. Given an n-qubit Hermitian operator Λ, one can have omitted the parameter dependence on the concate- always consider the decomposition nation of channels. Additionally, we have introduced the Λ = λ0 11⊗n + λ · σ n , (25) notation ρ̄l = Wb (ρ0 ) and it is straightforward to show n that where λ0 ∈ R and λ ∈ R4 −1 . Note that here we redefine ∂lm ρ̄l = −i[Hlm , ρ̄l ] . (31) the vector of Pauli strings σ n as a vector of length 4n − 1 which excludes 11⊗n . Using the tracial matrix Hölder’s inequality [80], we Central to our proof is to understand how operators can write are mapped by concatenations of unitary transformations e = Tr Wa† (O) ∂lm ρ̄l ∂lm C (32) and noise channels. We do this through two lenses. First, † given an operator Λ we investigate how various `p -norms 6 Wa (O) ∞ ∂lm ρ̄l 1 , (33) of λ are related at different points in the evolution. Such quantities are well suited to study in our setting as we where Wa† is the adjoint map of Wa . The two terms can use the transfer matrix formalism in the Pauli ba- in the product can then be bounded with the above sis, that is, to represent a channel N with the matrix two techniques. Using (26) we find Wa† (O) ∞ 6 (TN )ij = 21n Tr σni N (σnj ) . Indeed, we see that the noise q L−l+1 NO ω ∞ for the first term. We bound the sec- model in (7) has a diagonal Pauli transfer matrix, which ond term by using (31), a bound on Schatten norms motivates this choice of attack. The second quantity we of commutators [81], quantum Pinsker’s √ inequality [82], use is the relative entropy D ρ 11⊗n /2n between a state and (27) to obtain ∂lm ρ̄l 1 6 8 ln 2 Hlm ∞ n1/2 q l . ρ and the maximally mixed state. This is also useful Putting the two parts together we obtain to study due to the strong data processing inequality in √ e 6 8 ln 2 NO kHlm k∞ kωk∞ n1/2 q L+1 , ∂lm C (34) Ref. [79] which quantifies how noise maps ρ closer to the maximally mixed state. completing the proof.
10 w C. Proof of Proposition 1 with qM for each term in the sum. This gives an ex- tra locality-dependent factor in the bound on the partial Here we sketch the proof of Proposition 1, with addi- derivative: tional details being presented in the Supplementary In- w |∂lm C| e 6 qM F (n). (40) formation. We model measurement noise as a tensor product of An analogous reasoning leads to the following result independent local classical bit-flip channels, which math- for the concentration of the cost function: ematically corresponds to modifying the local POVM el- ements P0 = |0ih0| and P1 = |1ih1| as follows: e − 1 Tr O 6 q w G(n). C (41) M 2n 1 + qM 1 − qM P0 = |0ih0| → Pe0 = |0ih0| + |1ih1| (35) 2 2 1 − qM 1 + qM D. Details of Numerical Implementations P1 = |1ih1| → Pe1 = |0ih0| + |1ih1| . (36) 2 2 The noise model employed in our numerical simula- In turn, it follows that one can also model this measure- tions was obtained by performing one- and two-qubit ment noise as a tensor product of local depolarizing chan- gate-set tomography [83, 84] on the five-qubit IBM Q nels with depolarizing probability 1 > (1 − qM )/2 > 0, Ourense superconducting qubit device. The process ma- which we indicate by NM . The channel is applied di- trices for each gate native to the device’s alphabet, and the state preparation and measurement noise are de- P i to thei measurement operator such that NM (O) = rectly i ω NM (σn ) = ω e · σ n . Here ω e is a vector of coeffi- scribed in Ref. [85, Apendix B]. In addition, the opti- cients ω w(i) e i = qM ω i , where w(i) = x(i) + y(i) + z(i) is mization for the MaxCut problems was performed using the weight of the Pauli string. Here we recall that we an optimizer based on the Nelder-Mead simplex method. have respectively defined x(i), y(i), z(i) as the number of Pauli operators X, Y , and Z in the i-th Pauli string. V. ACKNOWLEDGEMENTS Let us first focus on the partial derivative of the cost. In the presence of measurement noise we then have We thank Daniel Stilck França for helpful discussions and for pointing us to Ref. [79]. Research presented in e = 1 Tr (e h i ∂lm C ω · σ n )(g (L) · σ n ) (37) this article was supported by the Laboratory Directed 2n Research and Development program of Los Alamos Na- =ωe · g (L) . (38) tional Laboratory under project number 20190065DR. SW and EF acknowledge support from the U.S. Depart- Which means that |∂lm C| e = |eω · g (L) |. We then examine ment of Energy (DOE) through a quantum computing the inner product in an element-wise fashion: program sponsored by the LANL Information Science & Technology Institute. MC and AS were also sup- X (L) X w(i) (L) ported by the Center for Nonlinear Studies at LANL. ω · g (L) | 6 |e |e ωi ||gi | 6 qM |ωi ||gi | . (39) PJC also acknowledges support from the LANL ASC Be- i i yond Moore’s Law project. LC and PJC were also sup- ported by the U.S. Department of Energy (DOE), Of- Therefore, defining w = mini w(i) as the minimum fice of Science, Office of Advanced Scientific Computing weight of the Pauli strings in the decomposition of O, Research, under the Quantum Computing Applications w(i) w w(i) we have that qM 6 qM , and hence we can replace qM Team (QCAT) program. [1] J. Preskill, “Quantum computing in the NISQ era and [4] Kishor Bharti, Alba Cervera-Lierta, Thi Ha Kyaw, beyond,” Quantum 2, 79 (2018). Tobias Haug, Sumner Alperin-Lea, Abhinav Anand, [2] M. Cerezo, Andrew Arrasmith, Ryan Babbush, Simon C Matthias Degroote, Hermanni Heimonen, Jakob S. Benjamin, Suguru Endo, Keisuke Fujii, Jarrod R Mc- Kottmann, Tim Menke, Wai-Keong Mok, Sukin Sim, Clean, Kosuke Mitarai, Xiao Yuan, Lukasz Cincio, and Leong-Chuan Kwek, and Alán Aspuru-Guzik, “Noisy Patrick J Coles, “Variational quantum algorithms,” arXiv intermediate-scale quantum (nisq) algorithms,” arXiv preprint arXiv:2012.09265 (2020). preprint arXiv:2101.08448 (2021). [3] Suguru Endo, Zhenyu Cai, Simon C Benjamin, and Xiao [5] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Yuan, “Hybrid quantum-classical algorithms and quan- Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien, tum error mitigation,” arXiv preprint arXiv:2011.01382 “A variational eigenvalue solver on a photonic quantum (2020). processor,” Nature Communications 5, 4213 (2014).
11 [6] Jarrod R McClean, Jonathan Romero, Ryan Babbush, compiling,” New Journal of Physics (2020). and Alán Aspuru-Guzik, “The theory of variational [24] T. Jones and S. C Benjamin, “Quantum compila- hybrid quantum-classical algorithms,” New Journal of tion and circuit optimisation via energy dissipation,” Physics 18, 023023 (2016). arXiv:1811.03147 [quant-ph]. [7] Bela Bauer, Dave Wecker, Andrew J Millis, Matthew B [25] A. Arrasmith, L. Cincio, A. T. Sornborger, W. H. Zurek, Hastings, and Matthias Troyer, “Hybrid quantum- and P. J. Coles, “Variational consistent histories as a hy- classical approach to correlated materials,” Physical Re- brid algorithm for quantum foundations,” Nature com- view X 6, 031045 (2016). munications 10, 3438 (2019). [8] Tyson Jones, Suguru Endo, Sam McArdle, Xiao Yuan, [26] Marco Cerezo, Alexander Poremba, Lukasz Cincio, and and Simon C Benjamin, “Variational quantum algorithms Patrick J Coles, “Variational quantum fidelity estima- for discovering hamiltonian spectra,” Physical Review A tion,” Quantum 4, 248 (2020). 99, 062304 (2019). [27] M Cerezo, Kunal Sharma, Andrew Arrasmith, and [9] Ying Li and Simon C Benjamin, “Efficient variational Patrick J Coles, “Variational quantum state eigensolver,” quantum simulator incorporating active error minimiza- arXiv preprint arXiv:2004.01372 (2020). tion,” Physical Review X 7, 021050 (2017). [28] Ryan LaRose, Arkin Tikku, Étude O’Neel-Judy, Lukasz [10] Cristina Cirstoiu, Zoe Holmes, Joseph Iosue, Lukasz Cin- Cincio, and Patrick J Coles, “Variational quantum cio, Patrick J Coles, and Andrew Sornborger, “Vari- state diagonalization,” npj Quantum Information 5, 1– ational fast forwarding for quantum simulation beyond 10 (2019). the coherence time,” npj Quantum Information 6, 1–10 [29] Guillaume Verdon, Jacob Marks, Sasha Nanda, Stefan (2020). Leichenauer, and Jack Hidary, “Quantum hamiltonian- [11] K. Heya, K. M. Nakanishi, K. Mitarai, and based models and the variational quantum thermalizer K. Fujii, “Subspace variational quantum simulator,” algorithm,” arXiv preprint arXiv:1910.02071 (2019). arXiv:1904.08566 [quant-ph]. [30] Peter D Johnson, Jonathan Romero, Jonathan Olson, [12] Xiao Yuan, Suguru Endo, Qi Zhao, Ying Li, and Si- Yudong Cao, and Alán Aspuru-Guzik, “QVECTOR: an mon C Benjamin, “Theory of variational quantum simu- algorithm for device-tailored quantum error correction,” lation,” Quantum 3, 191 (2019). arXiv:1711.02249 (2017). [13] E. Farhi, J. Goldstone, and S. Gutmann, “A quantum [31] Jarrod R McClean, Sergio Boixo, Vadim N Smelyanskiy, approximate optimization algorithm,” arXiv:1411.4028 Ryan Babbush, and Hartmut Neven, “Barren plateaus [quant-ph]. in quantum neural network training landscapes,” Nature [14] Z. Wang, S. Hadfield, Z. Jiang, and E. G. Rieffel, “Quan- communications 9, 4812 (2018). tum approximate optimization algorithm for MaxCut: A [32] Zoe Holmes, Kunal Sharma, M. Cerezo, and fermionic view,” Phys. Rev. A 97, 022304 (2018). Patrick J Coles, “Connecting ansatz expressiblity to gra- [15] G. E. Crooks, “Performance of the quantum approximate dient magnitudes and barren plateaus,” arXiv preprint optimization algorithm on the maximum cut problem,” arXiv:2101.02138 (2021). arXiv:1811.08419 [quant-ph]. [33] Kunal Sharma, M Cerezo, Lukasz Cincio, and [16] Stuart Hadfield, Zhihui Wang, Bryan O’Gorman, Patrick J Coles, “Trainability of dissipative perceptron- Eleanor G Rieffel, Davide Venturelli, and Rupak Biswas, based quantum neural networks,” arXiv preprint “From the quantum approximate optimization algorithm arXiv:2005.12458 (2020). to a quantum alternating operator ansatz,” Algorithms [34] M Cerezo, Akira Sone, Tyler Volkoff, Lukasz Cincio, 12, 34 (2019). and Patrick J Coles, “Cost-function-dependent barren [17] Carlos Bravo-Prieto, Ryan LaRose, M. Cerezo, Yigit plateaus in shallow quantum neural networks,” arXiv Subasi, Lukasz Cincio, and Patrick J. Coles, “Variational preprint arXiv:2001.00550 (2020). quantum linear solver: A hybrid algorithm for linear sys- [35] Carlos Ortiz Marrero, Mária Kieferová, and Nathan tems,” arXiv:1909.05820 (2019). Wiebe, “Entanglement induced barren plateaus,” arXiv [18] X. Xu, J. Sun, S. Endo, Y. Li, S. C. Benjamin, and preprint arXiv:2010.15968 (2020). X. Yuan, “Variational algorithms for linear algebra,” [36] Taylor L Patti, Khadijeh Najafi, Xun Gao, and Su- arXiv:1909.03898 [quant-ph]. sanne F Yelin, “Entanglement devised barren plateau [19] Bálint Koczor, Suguru Endo, Tyson Jones, Yuichiro Mat- mitigation,” arXiv preprint arXiv:2012.12658 (2020). suzaki, and Simon C Benjamin, “Variational-state quan- [37] Tyler Volkoff and Patrick J Coles, “Large gradients via tum metrology,” New Journal of Physics (2020). correlation in random parameterized quantum circuits,” [20] Johannes Jakob Meyer, Johannes Borregaard, and Quantum Science and Technology (2021). Jens Eisert, “A variational toolbox for quantum multi- [38] M. Cerezo and Patrick J Coles, “Impact of barren parameter estimation,” arXiv preprint arXiv:2006.06303 plateaus on the hessian and higher order derivatives,” (2020). arXiv preprint arXiv:2008.07454 (2020). [21] Eric Anschuetz, Jonathan Olson, Alán Aspuru-Guzik, [39] Andrew Arrasmith, M. Cerezo, Piotr Czarnik, Lukasz and Yudong Cao, “Variational quantum factoring,” Cincio, and Patrick J Coles, “Effect of barren in Quantum Technology and Optimization Problems plateaus on gradient-free optimization,” arXiv preprint (Springer International Publishing, Cham, 2019) pp. 74– arXiv:2011.12245 (2020). 85. [40] Alexey Uvarov and Jacob Biamonte, “On barren plateaus [22] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sorn- and cost function locality in variational quantum algo- borger, and P. J. Coles, “Quantum-assisted quantum rithms,” arXiv preprint arXiv:2011.10530 (2020). compiling,” Quantum 3, 140 (2019). [41] Guillaume Verdon, Michael Broughton, Jarrod R Mc- [23] Kunal Sharma, Sumeet Khatri, Marco Cerezo, and Clean, Kevin J Sung, Ryan Babbush, Zhang Jiang, Hart- Patrick Coles, “Noise resilience of variational quantum mut Neven, and Masoud Mohseni, “Learning to learn
12 with quantum neural networks via classical neural net- formation Processing 13, 2567–2586 (2014). works,” arXiv preprint arXiv:1907.05415 (2019). [58] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, [42] Edward Grant, Leonard Wossnig, Mateusz Ostaszewski, “An introduction to quantum machine learning,” Con- and Marcello Benedetti, “An initialization strategy for temporary Physics 56, 172–185 (2015). addressing barren plateaus in parametrized quantum cir- [59] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick cuits,” Quantum 3, 214 (2019). Rebentrost, Nathan Wiebe, and Seth Lloyd, “Quantum [43] Andrea Skolik, Jarrod R McClean, Masoud Mohseni, machine learning,” Nature 549, 195–202 (2017). Patrick van der Smagt, and Martin Leib, “Layerwise [60] Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, To- learning for quantum neural networks,” arXiv preprint bias J Osborne, Robert Salzmann, Daniel Scheiermann, arXiv:2006.14904 (2020). and Ramona Wolf, “Training deep quantum neural net- [44] Cheng Xue, Zhao-Yun Chen, Yu-Chun Wu, and Guo- works,” Nature Communications 11, 1–6 (2020). Ping Guo, “Effects of quantum noise on quantum [61] Amira Abbas, David Sutter, Christa Zoufal, Aurélien approximate optimization algorithm,” arXiv preprint Lucchi, Alessio Figalli, and Stefan Woerner, “The arXiv:1909.02196 (2019). power of quantum neural networks,” arXiv preprint [45] Jeffrey Marshall, Filip Wudarski, Stuart Hadfield, and arXiv:2011.00027 (2020). Tad Hogg, “Characterizing local noise in QAOA circuits,” [62] Bryan O’Gorman, William J Huggins, Eleanor G Ri- IOP SciNotes 1, 025208 (2020). effel, and K Birgitta Whaley, “Generalized swap net- [46] Laura Gentini, Alessandro Cuccoli, Stefano Piran- works for near-term quantum computing,” arXiv preprint dola, Paola Verrucchi, and Leonardo Banchi, “Noise- arXiv:1905.05118 (2019). resilient variational hybrid quantum-classical optimiza- [63] Sergey Bravyi, Alexander Kliesch, Robert Koenig, and tion,” Physical Review A 102, 052414 (2020). Eugene Tang, “Obstacles to state preparation and vari- [47] Jonas M Kübler, Andrew Arrasmith, Lukasz Cin- ational optimization from symmetry protection,” arXiv cio, and Patrick J Coles, “An adaptive optimizer for preprint arXiv:1910.08980 (2019). measurement-frugal variational algorithms,” Quantum 4, [64] Zhihui Wang, Stuart Hadfield, Zhang Jiang, and 263 (2020). Eleanor G. Rieffel, “Quantum approximate optimization [48] Andrew Arrasmith, Lukasz Cincio, Rolando D Somma, algorithm for maxcut: A fermionic view,” Phys. Rev. A and Patrick J Coles, “Operator sampling for shot-frugal 97, 022304 (2018). optimization in variational algorithms,” arXiv preprint [65] Matthew B Hastings, “Classical and quantum bounded arXiv:2004.06252 (2020). depth approximation algorithms,” arXiv preprint [49] Yudong Cao, Jonathan Romero, Jonathan P Olson, arXiv:1905.07047 (2019). Matthias Degroote, Peter D Johnson, Mária Kieferová, [66] Zhang Jiang, Eleanor G Rieffel, and Zhihui Wang, “Near- Ian D Kivlichan, Tim Menke, Borja Peropadre, Nico- optimal quantum circuit for grover’s unstructured search las PD Sawaya, et al., “Quantum chemistry in the age using a transverse field,” Physical Review A 95, 062317 of quantum computing,” Chemical reviews 119, 10856– (2017). 10915 (2019). [67] V. Akshay, H. Philathong, M. E. S. Morales, and J. D. [50] Rodney J Bartlett and Monika Musiał, “Coupled-cluster Biamonte, “Reachability deficits in quantum approxi- theory in quantum chemistry,” Reviews of Modern mate optimization,” Phys. Rev. Lett. 124, 090504 (2020). Physics 79, 291 (2007). [68] Sam McArdle, Suguru Endo, Alan Aspuru-Guzik, Si- [51] Joonho Lee, William J Huggins, Martin Head-Gordon, mon C Benjamin, and Xiao Yuan, “Quantum computa- and K Birgitta Whaley, “Generalized unitary coupled tional chemistry,” Reviews of Modern Physics 92, 015003 cluster wave functions for quantum computation,” Jour- (2020). nal of chemical theory and computation 15, 311–324 [69] Jonathan Romero, Ryan Babbush, Jarrod R McClean, (2018). Cornelius Hempel, Peter J Love, and Alán Aspuru- [52] A. Kandala, A. Mezzacapo, K. Temme, M. Takita, Guzik, “Strategies for quantum computing molecular en- M. Brink, J. M. Chow, and J. M. Gambetta, ergies using the unitary coupled cluster ansatz,” Quan- “Hardware-efficient variational quantum eigensolver for tum Science and Technology 4, 014008 (2018). small molecules and quantum magnets,” Nature 549, 242 [70] Gerardo Ortiz, James E Gubernatis, Emanuel Knill, and (2017). Raymond Laflamme, “Quantum algorithms for fermionic [53] Frank Arute et al., “Hartree-fock on a superconduct- simulations,” Physical Review A 64, 022319 (2001). ing qubit quantum computer,” Science 369, 1084–1089 [71] Sergey B Bravyi and Alexei Yu Kitaev, “Fermionic (2020). quantum computation,” Annals of Physics 298, 210–226 [54] Frank Arute et al., “Quantum approximate optimization (2002). of non-planar graph problems on a planar superconduct- [72] Marcel Nooijen, “Can the eigenstates of a many-body ing processor,” arXiv preprint arXiv:2004.04197 (2020). hamiltonian be represented exactly using a general two- [55] Dave Wecker, Matthew B Hastings, and Matthias body cluster expansion?” Physical review letters 84, 2108 Troyer, “Progress towards practical quantum variational (2000). algorithms,” Physical Review A 92, 042303 (2015). [73] Wen Wei Ho and Timothy H Hsieh, “Efficient variational [56] Roeland Wiersema, Cunlu Zhou, Yvette de Sereville, simulation of non-trivial quantum states,” SciPost Phys Juan Felipe Carrasquilla, Yong Baek Kim, and Henry 6, 029 (2019). Yuen, “Exploring entanglement and optimization within [74] Chris Cade, Lana Mineh, Ashley Montanaro, and Stasja the hamiltonian variational ansatz,” PRX Quantum 1, Stanisic, “Strategies for solving the fermi-hubbard model 020319 (2020). on near-term quantum computers,” Physical Review B [57] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione, 102, 235122 (2020). “The quest for a quantum neural network,” Quantum In-
You can also read