Photonic Matrix Computing: From Fundamentals to Applications
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
nanomaterials Review Photonic Matrix Computing: From Fundamentals to Applications Junwei Cheng, Hailong Zhou and Jianji Dong * Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan 430074, China; jwcheng@hust.edu.cn (J.C.); hailongzhou@hust.edu.cn (H.Z.) * Correspondence: jjdong@hust.edu.cn Abstract: In emerging artificial intelligence applications, massive matrix operations require high computing speed and energy efficiency. Optical computing can realize high-speed parallel infor- mation processing with ultra-low energy consumption on photonic integrated platforms or in free space, which can well meet these domain-specific demands. In this review, we firstly introduce the principles of photonic matrix computing implemented by three mainstream schemes, and then review the research progress of optical neural networks (ONNs) based on photonic matrix computing. In addition, we discuss the advantages of optical computing architectures over electronic processors as well as current challenges of optical computing and highlight some promising prospects for the future development. Keywords: photonic matrix computing; photonic accelerators; artificial intelligence; optical neural networks; photonic integrated platform; diffractive planes Citation: Cheng, J.; Zhou, H.; Dong, 1. Introduction J. Photonic Matrix Computing: From With the proliferation of artificial intelligence and the next-generation communica- Fundamentals to Applications. tion technology, the growing demand for high-performance computing has driven the Nanomaterials 2021, 11, 1683. https:// development of custom hardware to accelerate this specific category of computing. How- doi.org/10.3390/nano11071683 ever, processors based on electronic hardware have hit the bottleneck of unsustainable performance growth as the exponential scaling of electronic transistors reaches the physical Academic Editors: Carlos Alberto limit revealed by Moore’s law [1]. Photonic processors compute with photons instead of Alonso-Ramos and Linjie Zhou electrons, and therefore optical computing can dramatically accelerate computing speed by overcoming the inherent limitations of electronics. Unlike electrical circuit technologies, Received: 29 May 2021 Accepted: 24 June 2021 photonic circuits have some extraordinary properties such as ultra-wide bandwidth, high Published: 26 June 2021 frequency, and low energy consumption. Furthermore, light has several dimensions such as wavelength, polarization, and spatial mode to enable parallel data processing, result- Publisher’s Note: MDPI stays neutral ing in remarkable acceleration against a conventional von Neumann computer, which with regard to jurisdictional claims in makes the optical computing approach a viable and competitive candidate for artificial published maps and institutional affil- intelligence accelerators. iations. The matrix–vector multiplication (MVM) operation is one of the fundamental math- ematical operations widely used in large-scale neuromorphic optoelectronic computing. The weighted interconnections between adjacent photonic neurons in ONNs can be math- ematically represented by a matrix whose entries are the weight values, and each entry Copyright: © 2021 by the authors. vector is multiplied by the input signal of a particular synapse [2], which matches the math- Licensee MDPI, Basel, Switzerland. ematical nature of MVM. To some extent, neuromorphic engineering is an attempt to move This article is an open access article computational processes of artificial intelligence algorithms to specific hardware, enabling distributed under the terms and functions that are difficult to realize with conventional computing hardware. For example, conditions of the Creative Commons linear optical elements can calculate convolutions, Fourier transforms, random projections, Attribution (CC BY) license (https:// and many other operations through light–matter interaction or light propagation [3–6]. creativecommons.org/licenses/by/ There has been rapid progress in research on photonic matrix computing, and different 4.0/). photonic devices have been successfully used to implement matrix operations. Optical Nanomaterials 2021, 11, 1683. https://doi.org/10.3390/nano11071683 https://www.mdpi.com/journal/nanomaterials
Nanomaterials 2021, 11, 1683 2 of 15 Nanomaterials 2021, 11, x FOR PEER REVIEW 2 of 15 modulator There hasarrays, such progress been rapid as electro-optic modulated in research on photonicdirect-driven LED arraysand matrix computing, anddiffer- acousto- entoptic Bragg photonic devices, devices havecanbeen perform matrix used successfully calculations to implementat a much faster matrix rate than operations. existing Optical electronic devices [7–9]. The optical implementation of convolutional modulator arrays, such as electro-optic modulated direct-driven LED arrays and acousto- neural networks with optic fast operation Bragg devices, can speed and high perform energy matrix efficiencyat calculations is aappealing owing much faster ratetothan its outstanding existing capability of feature extraction [10]. In particular, convolutional electronic devices [7–9]. The optical implementation of convolutional neural networks processing based on MVM, which is a computationally intensive operation in electronics, occupies with fast operation speed and high energy efficiency is appealing owing to its outstanding over 80% of the total processing time in convolutional neural networks [11], therefore computational acceleration capability of feature extraction [10]. In particular, convolutional processing based on for convolutional neural networks can be achieved by matching hardware and MVM MVM, which is a computationally intensive operation in electronics, occupies over 80% operations. The MVM operation can be mathematically described as of the total processing time in convolutional neural networks [11], therefore computa- tional acceleration for convolutional wneural networks . . . can be achieved x1 by matching hard- 11 w12 w1N ware and MVM operations. TheMVM operation can be mathematically described as w21 w22 … . . . w 2N x2 Y = WX = ... (1) . . . … . . . . . . . . . = = …w …w … .…. . w…NN xN (1) N1 N2 … where X isXthe input vector, W is Wtheismatrix, and and = Y, =, …[y, , y , .is the output T vector. where is the input vector, the matrix, 1 2 . . , y N ] is the output Optical MVMs can be implemented by three mainstream vector. Optical MVMs can be implemented by three mainstream optical methods, optical methods, as shown in as Figure 1, including the multiple plane light conversion (MPLC) shown in Figure 1, including the multiple plane light conversion (MPLC) method, the method, the wavelength division multiplexing wavelength (WDM) method, division multiplexing (WDM) andmethod, the Mach–Zehnder interferometer and the Mach–Zehnder (MZI) interferometer method. (MZI) method. Figure Figure 1. Three 1. Three categories categories of optical of optical MVM, MVM, including including (a)(a) MPLC-MVM MPLC-MVM method, method, (b)(b) WDM-MVM WDM-MVM method, and (c) MZI-MVM method, and (c) MZI-MVM method.method. In this In this review, review, we firstly we firstly survey survey recent recent researches researches and progress and progress in optical in optical MVMsMVMs and and photonic photonic artificial artificial intelligence intelligence hardware. hardware. Afterwe After that, that, we describe describe the basic the basic principles principles of of optical optical MPLC-MVM, MPLC-MVM, WDM-MVM, WDM-MVM, and MZI-MVM and MZI-MVM methods. methods. Finally, Finally, we discuss we discuss the ad-the advantages vantages of optical of optical computing computing architectures architectures overover electronic electronic processors processors as well as well as as current current challenges of optical computing, and provide perspectives for further improvement of of challenges of optical computing, and provide perspectives for further improvement optical optical computing computing architectures. architectures. 2. MPLC 2. MPLC Matrix Matrix Core Core Unlike Unlike integrated integrated schemes schemes suchsuch as as microring microring andand MZI MZI matrix matrix cores, cores, thethe MPLC MPLC matrix matrix core builds computing capabilities directly above an optical field propagating core builds computing capabilities directly above an optical field propagating in free in free space. Among these three methods, MPLC was the first to be implemented in optical comput- space. Among these three methods, MPLC was the first to be implemented in optical com- ing [12], and the initially programmable MVM was finished with spatial optical elements [9]. puting [12], and the initially programmable MVM was finished with spatial optical ele- The MPLC matrix core is the only one that can currently support super-large-scale matrix ments [9]. The MPLC matrix core is the only one that can currently support super-large- operation, which makes it valuable in pulse shaping [13], mode processing [14–16], and scale matrix operation, which makes it valuable in pulse shaping [13], mode processing machine learning [5,17–19]. [14–16], and machine learning [5,17–19]. The principle of the MPLC method is shown in Figure 2, and the architecture of the MPLC matrix core consists of a series of planes encoded with amplitude and phase infor-
Nanomaterials 2021, 11, x FOR PEER REVIEW 3 of 15 Nanomaterials 2021, 11, 1683 3 of 15 The principle of the MPLC method is shown in Figure 2, and the architecture of the MPLC matrix core consists of a series of planes encoded with amplitude and phase infor- mation,and mation, andreflective reflectivemirrors mirrorsare are only only used used to to change change thethe direction direction of light of light propagation propagation to to reduce the spatial volume of the system. By configuring the parameters reduce the spatial volume of the system. By configuring the parameters of these planes, of these planes, thelight the lightbeam beamirradiated irradiatedon onthe theplane planesurface surfacecan canbe bemodulated modulatedtotochange changeamplitude amplitudeandand phase.The phase. Theincident incidentlight lightdiffracts diffractsin infree freespace spaceand andthen thenincident incidenttotothe thefirst firstplane, plane,after after that,ititdiffracts that, diffractsininfree freespace spaceand andthen thenincident incidentinto intothe thesecond secondplane, plane,and andsosoon. on.After After passing through all planes, the final optical field will be output and then be passing through all planes, the final optical field will be output and then be detected by thedetected by the photodetector photodetector array. array. Figure2.2.The Figure Theprinciple principleofofthe theMPLC MPLCmatrix matrixcore. core.The TheMPLC MPLCmatrix matrixcore coreisiscomposed composedofofaaseries seriesofof diffractive diffractiveplanes planesencoded encodedwith amplitude with amplitudeand phase and information. phase By By information. configuring the the configuring parameters parametersof of these these diffractive diffractive planes, planes, the beam the light light beam irradiated irradiated on theon thesurface plane plane surface can be modulated can be modulated to changeto change amplitude amplitude and phase. and phase. Thetransmission The transmissionmatrices matricesMM i (i = 1, 2, …, N, N i (i = 1, 2, . . . , N ++ 1) are fixed fixed owing owingtotothe thepropagation propagation distance in free space being fixed. The transmission matrix of each plane is aadiagonal distance in free space being fixed. The transmission matrix of each plane is diagonal matrix, denoted as A (i = 1, matrix, denoted as Ai (i = 1, 2, . . . , i 2, …, N). Therefore, the transmission matrix T of Therefore, the transmission matrix T of this MPLC this MPLC systemcan system canbebeexpressed expressedas as T = MN+1AN...M2A1M1. (2) The arrangement of pixelsTin= each MN+1plane AN . . relates . M2 A1to M1the . matrix dimensions. For exam- (2) ple, if we assume that the pixels of each plane are p × q, then the dimensions of the matrices M i, A The arrangement of pixels in each plane relates to the matrix dimensions. For example, i, and T are all pq × pq. From Equation (2), the multiplane transformation can be math- ifematically we assume that the pixels expressed as the of eachproduct cross plane are of pa × q, then series the dimensions of fixed matrices andof the matrices configurable M i , Ai , and diagonal matrices. pq × pq. From T are allTheoretically, Equation arbitrary (2), the matrix multiplane operation can betransformation implemented can be as long mathematically expressed as the cross product of a series of fixed matrices and configurable as there are enough planes, and the number of planes required is approximately equal to diagonal matrices. Theoretically, arbitrary matrix operation can be implemented as long as 2-fold the number of input modes, but in fact, only a few planes are needed to approxi- there are enough planes, and the number of planes required is approximately equal to 2- mately achieve the function of target transmission matrix [20]. fold the number of input modes, but in fact, only a few planes are needed to approximately In practice, the transmission matrix T in Equation (2) is difficult to analyze or measure achieve the function of target transmission matrix [20]. experimentally. In general, parameters of each phase plane can be obtained by iterating In practice, the transmission matrix T in Equation (2) is difficult to analyze or measure according to the input and target output. One commonly used method is the wavefront experimentally. In general, parameters of each phase plane can be obtained by iterating matching method [21], as shown in Figure 3, where the input optical field is ( , ), and according to the input and target output. One commonly used method is the wavefront after forward propagation, the distribution of the optical field in front of the mth phase matching method [21], as shown in Figure 3, where the input optical field is φ0 ( x, y), and plane is ( , ). On the other hand, ( , ), the distribution of the optical field after after forward propagation, the distribution of the optical field in front of the mth phase the mth phase plane, can be obtained by the backward propagation of the output optical plane is φm ( x, y). On the other hand, Ψm ( x, y), the distribution of the optical field after the fieldphase mth plane, ( , ). can Hence the theoretical be obtained by the phase backward distribution propagation of the of mth phase plane the output opticalshould field be the phase difference between the two fields, i.e., ( , ) and ( , ): Ψ N +1 ( x, y). Hence the theoretical phase distribution of∗ the mth phase plane should be the ( , ) = ( , ) ( , ) (3) phase difference between the two fields, i.e., φm ( x, y) and Ψm ( x, y) : where means the function to get the phase angle. If there are multiple mode pairs of ∗ input and output, then the phase Φm ( x,value y) = Λ of[Ψthe m (plane x, y)φmshould ( x, y)]be its weighted phase value: (3) where Λ means the function ( , to ) get=the phase ( , ) If∗ ,there , angle. ( , )are multiple mode pairs(4) of input and output, then the phase value of the plane should be its weighted phase value: " # Φm ( x, y) = Λ ∑ Ψm,j (x, y)φm,j ∗ ( x, y) (4) j
Nanomaterials 2021,11, Nanomaterials2021, 2021, 11,x1683 11, xFOR FORPEER PEERREVIEW REVIEW 44ofof1515 wherej jjrepresents where where represents representsthethe j th set the setof set ofofinput–output input–outputmode input–output modepairs. mode pairs.By pairs. Bythe By themeans the meansof means ofiterating of iteratingback iterating back back and and forth forth to to update update the the parameters parameters ofof phase phase planes planes repeatedly, repeatedly, the the algorithm algorithm and forth to update the parameters of phase planes repeatedly, the algorithm will eventu- will will even- even- tually tually converge. converge. ally converge. Thekey The The keyadvantage key advantageof advantage ofofthe thewavefront the wavefrontmatching wavefront matchingmethod matching methodisisisthat method thatthe that theiterative the iterative iterative speed speed is very speedisisvery fast, veryfast, and fast,and andthethe whole thewhole plane wholeplane is updated planeisisupdated each updatedeach iterative eachiterative process, iterativeprocess, process,thusthus the thusthe conver- theconver- conver- gence genceisis gence exponential.Generally, isexponential. exponential. Generally, Generally,itit only itonly needs onlyneeds fewerthan needsfewer fewer than20 than 20iterations 20 iterationsback iterations backand back andforth and forthtoto forth to achieve achieve convergence. convergence. achieve convergence. Figure Figure3.3. Figure Schematic 3.Schematic diagram diagramofof Schematicdiagram the ofthe wavefront thewavefront matchingmethod. wavefrontmatching matching method. method. Here,ininorder Here, ordertotovisually visuallydemonstrate demonstratethe theworking workingmechanism mechanismofofthe theMPLC MPLCmatrix matrix coreand core andverify verifyits itscapability capabilitytotoimplement implementlarge-scale large-scaleoptical opticalcomputing, computing,we wetheoretically theoretically demonstrate demonstrate aaaspecific demonstrate specificexample specific example example ofnumeric ofof numeric numeric holographic holographic holographic coding coding coding based basedbased on on theon theMPLC the MPLC MPLC scheme scheme scheme (Figure (Figure(Figure 4). We designed 4). We designed 4). We designed a two-digit a two-digit a two-digit seven-segment seven-segment seven-segment display displaydisplay with 14with with 14 segments 14 segments segments in total,inin re- total, total, respectively spectively powered respectively powered by14 by 14 different powered by 14different different Laguerre-Gaussian Laguerre-Gaussian (LG)The (LG) modes. Laguerre-Gaussian (LG) modes. Thetwo-digit two-digit modes. The two-digit numbers numbers range from numbers range range tofrom 00 from 99 0000toto can be9999 canbe bedisplayed displayed can displayed andswitched and switched and switched by setting bybythe setting thecombination combination combination setting the of input ofof inputmodes, modes, input asmodes, shown asasinshown Figure shown inin4b. Figure4b. Figure 4b. Figure4.4. Figure Figure 4.The The numeric Thenumeric holographic numericholographic coding holographiccoding basedon codingbased based onthe on the MPLC theMPLC matrix MPLCmatrix core. matrixcore. (a) core.(a) This (a)This figure Thisfigure summarizes figuresummarizes summarizesthethe input-to-output theinput-to-output input-to-output mapping mapping relationships between the input LG modes and the output segments of numeric display. mapping relationships between the input LG modes and the output segments of numeric display. (b) of relationships between the input LG modes and the output segments of numeric display. (b) (b) Results Results ofthe theoptical- Results optical- of the powerednumeric powered numericdisplay displayfrom from0000toto99. 99. optical-powered numeric display from 00 to 99. How How Howto toefficiently to efficientlydeal efficiently dealwith deal with with the the the increasing increasing increasing scale scale scale ofneural of of neural neural network network network computing computing computing re- re- remains mains mains aa significant significant problem problem toto be be solved. solved. Benefiting Benefiting from from the the parallelism parallelism a significant problem to be solved. Benefiting from the parallelism and minimal latency of and and minimal minimal latency latency opticalofofoptical optical systems systems insystems ininfree free space, free space,the thespace, MPLC theMPLC schemeMPLC scheme scheme has hasthe has the ability theimplement to abilitytotoimplement ability implement large-scale large-scale large-scale matrix matrix operation, operation, which which makes makes itit the the potential potential candidate candidate for for matrix operation, which makes it the potential candidate for large-scale neural networks. large-scale large-scale neural neural networks. networks. In In 2018, 2018, Lin Lin etet al. al. introduced introduced the the diffraction diffraction ONNs ONNs framework framework In 2018, Lin et al. introduced the diffraction ONNs framework used for all-optical ma- used used for for all-op- all-op- tical tical chinemachine machine learning, learning, learning, i.e., D2 NN,i.e.,D i.e., D2NN, and 2NN, and experimentally demonstrated the image classifica- and experimentally experimentally demonstrated demonstrated the imagethe classification image classifica-with tion tion with with Modified Modified National National Institute Institute ofof Standards Standards and and Technology Technology Modified National Institute of Standards and Technology (MNIST) handwritten (MNIST) (MNIST) handwritten handwritten digits
Nanomaterials2021, Nanomaterials 2021,11, 11,1683 x FOR PEER REVIEW 5 5ofof1515 digits and and Fashion-MNIST Fashion-MNIST datasetsdatasets [5]. The[5]. The optical optical D2 NN D 2NN architecture shows great poten- architecture shows great potential for tial for machine machine learninglearning applications, applications, and it would and itbe would more be more complete complete if it included if it included an optical an optical nonlinear activation function [22]. The image information nonlinear activation function [22]. The image information is encoded in the amplitude is encoded in the ampli- tude or phaseor phase channels channels of theofinput the input optical optical fields, fields, and wave and wave propagation propagation in free in free space space can can be be mathematically mathematically described described by Kirchhoff’s by Kirchhoff’s diffraction diffraction integral, integral, which which amountsamounts to a con- to a convolu- volution tion operation operation of theofoptical the optical field field with with a trained a trained kernel. kernel. In this In work, this work, the training the training pro- process cess of ONNs is still completed by an electronic computer of ONNs is still completed by an electronic computer to update parameters, and each to update parameters, and each diffractivelayer diffractive layerisisfabricated fabricated by by 3D3D printing printing technology. technology. In 2021, Rahman et et al. al. applied applieda apruning pruningalgorithm algorithmtotofurther furtherimprove improvethe theimage image classification classification accuracy accuracy of of DD 2 2 NNs NNs[23]. On [23]. thethe On image image classification classification of the of theCIFAR-10 CIFAR-10 dataset released dataset released by Canadian by Canadian Institute For For Institute Ad- vanced Research, Advanced Research, whose whosetest testimages imagesare aremore more complicated complicated than MNIST MNIST and and Fashion- Fashion- MNIST,the MNIST, theDD 2 NN 2NN architecture combined with the pruning algorithm provides an infer- architecture combined with the pruning algorithm provides an inference ence improvement improvement of more of more than 16%thancompared 16% compared to thetoaverage the average performance. performance. In neuromor- In neuromorphic phic optoelectronic optoelectronic computing,computing, the functionality the functionality of input ofnodes input andnodes and neurons output output neurons is imple-is mented with programmable implemented with programmable diffractive opticaloptical diffractive devices, such assuch devices, the digital micromirror as the digital micro- device mirror(DMD) device and (DMD) spatial andlight spatialmodulator (SLM). The light modulator (SLM).DMD Theprovides a high optical DMD provides a high con-opti- trast for encoding cal contrast information. for encoding It allowsIt encoding information. allows encodingthe binarized data into the binarized theinto data amplitude the am- of coherent plitude optical fields, of coherent opticalwhere fields, awherephasea SLMphasesubsequently SLM subsequently modulates their phase modulates dis- their phase tribution to realize the diffractive modulation. Zhou et al. distribution to realize the diffractive modulation. Zhou et al. proposed an optoelectronic proposed an optoelectronic fused fusedcomputing computingarchitecture architecturewith withaareconfigurable reconfigurablediffractive diffractiveprocessing processingunit, unit,which whichcan can support supportdifferent different neural networks, networks,and andachieved achievedexcellent excellentexperimental experimental accuracies accuracies forfor im- image age and andvideo videorecognition recognitionover overbenchmark benchmarkdata data sets sets [19]. [19]. ItIt can can bebe seen seen that thatthe theMPLC MPLC scheme schemeisisofofgreat greatpotential potentialtotonarrownarrowthe thegapgapandandsurpass surpassininclassification classificationperformance performance between optical computing architecture and state-of-the-art between optical computing architecture and state-of-the-art electronic computers. electronic computers. Similar Similartotothe theMPLC MPLCmethodmethoddemonstrated demonstratedininfree freespace, space,the theMPLC MPLCmatrixmatrixcore corecan can also also be realized on a chip, whose structure is shown in Figure 5. The integratedMPLC be realized on a chip, whose structure is shown in Figure 5. The integrated MPLC matrix matrixcorecoreisiscomposed composedofofmultiple multiplelayers, layers,including includingalternately alternatelyarranged arrangedtunable tunablelayers layers and unitary diffractive layers, to implement multichannel and unitary diffractive layers, to implement multichannel transformation described by transformation described bya aunitary unitarymatrix. matrix.The The tunable layer can be achieved by independent tunable layer can be achieved by independent phase shifters or time phase shifters or time delay units, corresponding a unitary transfer matrix with delay units, corresponding a unitary transfer matrix with diagonal form. The unitary dif- diagonal form. The unitary diffractive layers fractive layers describe describe thethe interaction interaction between between channels, channels, which which cancanbebe implemented implemented by by multimode interference (MMI) couplers or a region of multimode interference (MMI) couplers or a region of coupled waveguides; therefore, coupled waveguides; therefore, the the transfer matrices of unitary diffractive layers are static because the structure of MMI transfer matrices of unitary diffractive layers are static because the structure of MMI cou- couplers and coupled waveguides cannot be changed once fabricated. plers and coupled waveguides cannot be changed once fabricated. Figure 5. The structure of the integrated MPLC matrix core. Figure 5. The structure of the integrated MPLC matrix core.
Nanomaterials 2021, 11, 1683 6 of 15 In 2017, Tang et al. proposed a novel integrated architecture consisting of cascaded MMI couplers and phase shifter arrays [24] to realize an n × n unitary transfer matrix. The n × n unitary transfer matrix can be decomposed as M stages of unitary matrices with only diagonal elements (i.e., the transfer matrices of phase shifter arrays) and another M − 1 stage of unitary matrices (i.e., the transfer matrices of MMI) when the path-dependent coupling loss is not considered. After that, Saygin et al. made an in-depth analysis of this integrated MPLC architecture, in which the flexibility and robustness of this architecture have been thoroughly discussed. The transfer matrices of the static MMI blocks can be randomly chosen from a continuous class of unitary matrices without sacrificing the quality of approximation for the target unitary transformation, making this scheme insensitive to errors [25]. The integrated MPLC matrix core provides an alternative viable solution to decompose large unitary matrices into small ones with high flexibility and robustness, and thus optical diffractive neural networks are possible to build on a chip based on this method. 3. Microring Matrix Core The MPLC scheme of photonic matrix computing was thoroughly discussed in Section 2, from which we find that the predominant advantage of MPLC schemes is the capability to implement a large-scale matrix operation at present. However, one major drawback cannot be ignored—the MPLC scheme is usually limited by bulky optical instru- ments, and hence they are difficult to highly integrate on a chip. The microring has a very compact structure and its radius can be as small as a few microns [26], which means the footprint of photonic devices can be greatly reduced, and thus the integration density can be competitive. The microring has been widely used in on- chip WDM systems [27–29], filtering systems [30–32], etc. In addition, a microring array can also be used in the operations of incoherent matrix computation since each microring can independently configure the transmission coefficient of a wavelength channel. Therefore, the microring matrix core is well suited to implement WDM-MVM operation. The implementations of matrix computation enabled by a microring array are shown in Figure 6. The N × N-sized microring array in the right region of Figure 6 corresponds to an N × N-sized matrix M, alternatively called a microring matrix core, and the input signal X can be represented by a vector with a length of N. The input signal X can be generated off-chip or on-chip. If it is generated off-chip, the microring array in the left region of Figure 6, i.e., the microring front module, only serves as a wavelength division multiplexer for combining multiple input wavelength channels. If it is generated on-chip, the microring front module is used to modulate the input vector X to different wavelengths. After that, the input signal is divided into multiple beams with equal power after passing through the beam splitter, and each beam is sent to a different row of the microring matrix core in the right region of Figure 6. Each row of the microring matrix core can independently configure the transmission coefficient of each wavelength channel, and the total power is finally detected from each output port by photodetectors. The add-drop microring structure [33] is widely applied in on-chip optical computing owing to the capability of difference processing. Since the power value is non-negative, early work only utilized the through port, then the transmission matrix and the output vector are non-negative, thus the matrix operation is limited in the non-negative number domain. However, fundamental mathematical operations such as matrix–vector multipli- cation and matrix–matrix multiplication are usually performed in the real number domain in practice. In order to extend the matrix operation to the full real number domain, the final results need to be obtained via the differential processing between the power values of the drop port and the through port; in this way, the transmission matrix and final output vector are both able to contain negative domain. The drop transmission coefficient of the microring situated at the ith row and the jth column of the microring matrix core is represented by mij , and the through transmission coefficient is therefore represented by 1 − mij without considering the loss of the microring.
n × n unitary transfer matrix can be decomposed as M stages of unitary matrices with only diagonal elements (i.e., the transfer matrices of phase shifter arrays) and another M − 1 stage of unitary matrices (i.e., the transfer matrices of MMI) when the path-dependent coupling loss is not considered. After that, Saygin et al. made an in-depth analysis of this integrated MPLC architecture, in which the flexibility and robustness of this architecture Nanomaterials 2021, 11, 1683 7 of 15 have been thoroughly discussed. The transfer matrices of the static MMI blocks can be randomly chosen from a continuous class of unitary matrices without sacrificing the qual- ity of approximation for the target unitary transformation, making this scheme insensitive to errors Each [25]. The microring integrated situated at theMPLC matrix core same column merely provides an alternative modulates the opticalviable solution signal with ato decompose specific large unitary wavelength, hence the matrices outputinto of eachsmall row ones with high contains flexibility the power of N and robustness, different wave- and thus lengths, and optical diffractive neural N is numerically equal tonetworks the number are of possible microrings to build on a chip positioned based in the same onrow. this method. Based on the microring matrix core model consisting of add-drop microring structure, we assume that the input vector is X = [ x1 , x2 , . . . , x N ] T , thus the output power of drop ports 3. Microring and Matrix through ports Corerow can be calculated by Equations (5) and (6), respectively. of each The MPLC scheme of photonic matrix computing was thoroughly discussed in Sec- N tion 2, from which we find that the predominant yi = ∑ mij x j advantage of MPLC schemes is the(5) ca- pability to implement a large-scale matrix j=1 operation at present. However, one major drawback cannot be ignored—the MPLC scheme is usually limited by bulky optical in- struments, and hence they are difficult to N highly integrate on a chip. The microring has a very compact yi = ∑ (1 − mij )and structure x j its radius can be as small as a (6) few j =1 microns [26], which means the footprint of photonic devices can be greatly reduced, and thus the the integration output vector of dropdensity portscan Y1 be= [competitive. y1 , y2 , . . . , y NThe ] T can microring has been widely be mathematically usedasin described on-chip WDM systems [27–29], filtering systems [30–32], etc. In addition, a microring ar- ray can also be used in the operations m11 m12 of incoherent . . . m1N matrix computation x1 since each mi- croring can independently m21 m22 configure . . . m2N coefficient the transmission x2 ofa wavelength channel. Y1 = (7) Therefore, the microringmatrix . . . core. .is. well .suited .. . .to. implement . . . WDM-MVM operation. The implementations ofmmatrix N1 mcomputation N2 . . . menabledNN by ax N microring array are shown in Figure 6. The N × N-sized microring array in the right region of Figure 6 corresponds Similarly, the output vector of through ports Y2 can be written as to an N × N-sized matrix M, alternatively called a microring matrix core, and the input signal X can be represented 1 − m11by a1 − vector m12 with. .a. length 1 − mof N. The inputsignal X can be x1 1N generated off-chipor on-chip.1 − m If1it−ismgenerated . . . off-chip, 1 − m the microring x2 array in the left 21 22 2N region of FigureY2 =6, i.e., the. . microring (8) . . front module, only . . . serves as a wavelength division . . ... . . . multiplexer for combining 1−m multiple input wavelength . . . 1 − channels. If itxis generated on-chip, N1 1 − m N2 m NN N the microring front module is used to modulate the input vector X to different wave- lengths. then Afteroutput the final that, the inputY signal matrix can beiscalculated divided into by the multiple beamsprocessing differential with equalaspower after passing through the beam splitter, and each beam is sent to a different row of the mi- 1 − 2m − 2m 1 − 2m croring matrix core in the right 11 1 of region Figure 12 . . . row 6. Each of 1N the microring x1 matrix core can independently 1 −the 21 1 − 2m22 coefficient 2mtransmission . . . 1of−each2m2N x2 channel, Y = Y2 − Y1 configure wavelength and = (9) the total power is finally detected . . . from each . . . output. .port . by .photodetectors. .. ... 1 − 2m N1 1 − 2m N2 . . . 1 − 2m NN xN Figure 6. The scheme of WDM-MVM. The microring front module can be positioned off-chip or on-chip, which is designed to serve as a wavelength division multiplexer (off-chip) or to modulate the input vector X to different wavelengths (on-chip). The microring matrix core is an N × N-sized microring array corresponding to an N × N-sized matrix. The microring matrix core was used for on-chip photonic matrix operation early in 2013, when Yang et al. proposed a matrix–vector multiplier based on a microring array and experimentally demonstrated matrix multiplication and weighted interconnection [34]. A “broadcast-and-weight” scheme has been proposed [35,36] and demonstrated [37] to implement large-scale reconfigurable optical interconnections and the integrated optical
Nanomaterials 2021, 11, 1683 8 of 15 neural Nanomaterials 2021, 11, x FOR PEER REVIEW network enabled by microring resonators on a silicon photonic chip utilizing 8 of 15 WDM technique. In this scheme, each microring plays a role as a tunable filter and only oper- ates at a specific wavelength. Optical signals are modulated in parallel by an array of microrings to implement[38]. The WDM-MVM large-scale reconfigurable scheme optical provides a viable interconnections solution and to achieve the integrated opti-orders of cal neural network magnitude enabled byinmicroring improvements resonators onspeed both computational a silicon andphotonic energy chip utilizing against consumption WDM technique. existing In this based architectures scheme,on each microring electronic plays aIn devices. role as a tunable addition, the filter and only microring matrix core operates can also be used in the linear computation part of the neural networksarray at a specific wavelength. Optical signals are modulated in parallel by an of to achieve the microrings [38]. The WDM-MVM scheme provides a viable solution to achieve optical acceleration of the ONNs. In 2019, Feldmann et al. presented an all-optical spiking orders of magnitude improvements in both computational speed and energy consumption against neurosynaptic network successfully realizing pattern recognition directly in the optical existing architectures based on electronic devices. In addition, the microring matrix core domain, in which a microring integrated with phase-change material (PCM) cell is able can also be used in the linear computation part of the neural networks to achieve the op- to control tical whether acceleration of thetoONNs. generate an output In 2019, Feldmann spike et al.pulse by switching presented an all-optical thespiking PCM states to change the optical resonance condition of the microring and its neurosynaptic network successfully realizing pattern recognition directly in the opticalpropagation loss [39]. A scalableinoptical domain, which aneural network microring architecture integrated is implemented with phase-change material using (PCM) thecell WDM is able to technique. Feldmann control et al. whether improved to generate their architecture an output spike pulse byand demonstrated switching the PCM an integrated states to changephotonic hardware the accelerator optical resonance utilizingofoptical condition frequency the microring and combs and the loss its propagation WDM technique [39]. A scalableto achieve optical parallelneural network architecture convolutional processingis[40]. implemented Recently,using Xu ettheal.WDM technique. proposed Feldmann an optical convolutional et al. improved neural network their architecturebased architecture and demonstrated on WDM toanaccelerate integratedcomputing photonic hardware speed by ac- utilizing celerator utilizing broad optical optical frequency bandwidth [41], whichcombscanand bethe usedWDM technique for various to achieve parallel convolutional operations to convolutional processing realize handwritten [40].recognition. digits Recently, Xu et al. proposed an optical convolutional neural network architecture based on WDM to accelerate computing speed by utilizing broad optical 4. MZIbandwidth [41], which can be used for various convolutional operations to realize Matrix Core handwritten digits recognition. As one of the basic photonic devices, MZI has been widely used in optical modula- 4. MZI Matrixoptical tors [42,43], Core communication [44,45], and optical computing [46]. MZI is a natural minimum matrix operation As one of the basic photonicunit, and MZI devices, can has be fabricated been widelyon a silicon used platform in optical to implement modulators the minimum matrix multiplication. The photonic matrix network [42,43], optical communication [44,45], and optical computing [46]. MZI is a natural min- built by MZIs can be imum matrix operation unit, and can be fabricated on a silicon platform to implement theReck et al. extended to arbitrary matrix multiplication without fundamental loss. In 1994, firstly proposed minimum a general algorithm, matrix multiplication. the triangular The photonic decomposition matrix network built by algorithm, MZIs can befor ex-the design of an experimental tended realization to arbitrary matrix multiplication × n unitary of any nwithout matrix [47]. fundamental In 1994, loss. In this case, Reckunitary et al. matrix firstly proposed transforms canabegeneral algorithm, achieved by thethe triangular decomposition architecture consisting of beamalgorithm, for the splitters, de- shifters, phase sign and of an experimental mirrors arranged realization according of toany n × n rules. specific unitary matrix [47]. In this case, unitary matrixThetransforms structurecanofbeMZI achieved by the in is shown architecture Figure 7a. consisting MZI is of beam splitters, composed phase of two multimode shifters, and mirrors arranged according to specific rules. interference couplers and two interference arms [48]. The phase shifts on the internal and The structure of MZI is shown in Figure 7a. MZI is composed of two multimode in- external phase shifters of MZI can be expressed as θ n , αn , and βn , respectively, moreover, terference couplers and two interference arms [48]. The phase shifts on the internal and these phase shifters can be easily configured and controlled by thermally tuning the heaters. external phase shifters of MZI can be expressed as θn, αn, and βn, respectively, moreover, these × 2 unitary The 2phase transformation shifters can matrixand be easily configured of controlled a single MZI can be expressed by thermally as a standard tuning the heat- SU(2) rotation matrix: ers. The 2 × 2 unitary transformation matrix of a single MZI can be expressed as a standard SU(2) rotation matrix: iα iθ iαn eiθn + 1 1 − n e n −1 e 1) ie 1 ==R(n) = ( ( 1) U MZIn= ( ) 2 ieiβ1) n eiθn + 1 iβ n iθn (10) (10) 2 ( (1 − e ) 1 − e Figure 7. The principle of the MZI matrix core. (a) The structure of one MZI used in an MZI matrix core. (b) The whole MZI mesh consists of MZI matrix cores, and MZIs can be equivalent to a reconfigurable black box. (c) A typical example of a 4 × 4 network structure enabled by an MZI matrix core.
Nanomaterials 2021, 11, Nanomaterials 2021, 11, 1683 x FOR PEER REVIEW 99 of of 15 15 Figure 7. The principle Arbitrary of the MZI n × n unitary matrix core. (a) transformation The structure matrix SU(N) can of one MZI used in an be theoretically MZI matrix decomposed core. (b) The whole MZI mesh consists of MZI matrix cores, and MZIs can be into the product of a series of SU(2) rotation submatrices, and therefore the wholeequivalent to a recon- MZI figurable black box. (c) A typical example of a 4 × 4 network structure enabled mesh can be equivalent to a reconfigurable black box as shown in Figure 7b to perform by an MZI matrix core. any unitary transformation we desired. A typical example of a 4 × 4 network structure enabled by an MZI matrix core is given in Figure 7c, which is composed of six MZIs, and Arbitrary n × n unitary transformation matrix SU(N) can be theoretically decomposed the matrix transformation relations of each port can be obtained by cutting the plane along into the product of a series of SU(2) rotation submatrices, and therefore the whole MZI the dashed lines. The detailed process can be mathematically described as mesh can be equivalent to a reconfigurable black box as shown in Figure 7b to perform any unitary transformation 1 we desired. A typical example of a 4 × 4 network structure enabled by an MZI U2 = R1,1 U1 = matrix 1core is given U1 Figure 7c, which is composed of six MZIs, and in the matrix transformation relations R (1) of each port can be obtained by cutting the plane along the dashed lines. The detailed process can be 1 mathematically described as 1 U3 = R2,1 R2,21U2 = 1 R (2) U2 (11) = , = 1 R (3) 1 (1) 1 1 R (4) 1 1 U = R R3,2 R3,3 U3 = 1 R (5) 1 U3 =4 , 3,1, = 1 (2) (11) R (6) 1 1 (3) 1 1 1 (4) Consequently, the 4 × 4 unitary transformation matrix SU(4) can be expressed as = , , , = 1 (5) 1 (6) 1 1 SU (4) = R R R R R R 3,1 3,2 3,3 2,1 2,2 1,1 (12) Consequently, the 4 × 4 unitary transformation matrix SU(4) can be expressed as A similar principle can also be applied to any SU(N) matrix; in this way, an arbi- (4) = , , , , , , (12) trary n × n unitary matrix can always be decomposed into the product of n (n − 1)/2 A similar principle rotation submatrices can also be applied to any SU(N) matrix; in this way, an arbitrary n × n unitary matrix can always be decomposed into the product of n (n − 1)/2 rotation submatrices SU ( N ) = R N −1,1 R N −1,2 . . . R N −1,N −1 . . . R3,1 R3,2 R3,3 R2,1 R2,2 R1,1 (13) ( ) = , , … , … , , , , , , (13) According to Equation (13), we can configure the MZI mesh to mimic mimic the the correspond- correspond- ing unitary matrix. Furthermore, when it comes comes to general nn × to aa general × n matrix, which is not not limited in in unitary unitarymatrix, matrix,we weknowknowthat a general that a generalcomplex-valued complex-valued matrix M can matrix be decom- M can be de- posed as Mas= UΣV † utilizing singular valuevalue decomposition (SVD)(SVD) [49], where U is anUm is×anm composed M = UΣV † utilizing singular decomposition [49], where unitary m × m unitary Σ is an m matrix, matrix, Σ× is nanrectangular diagonal diagonal m × n rectangular matrix, and V † is and matrix, the complex V† is theconjugate complex of the n × of conjugate n unitary the n × nmatrix unitaryV [50]. matrixThus, an arbitrary V [50]. Thus, an complex-valued matrix network arbitrary complex-valued matrix can be decomposed into a unitary MZI mesh, an array of tunable optical network can be decomposed into a unitary MZI mesh, an array of tunable optical attenu- attenuators, and another unitary MZI mesh (Figure 8), which can implement U, Σ, ators, and another unitary MZI mesh (Figure 8), which can implement U, Σ, and V , re-and V † , respectively † by tuning phase spectively shifters by tuning phase and attenuators shifters to changetothe and attenuators transmission change coefficient the transmission of each coefficient signal of eachchannel. signal channel. Figure8.8. MZI Figure MZI mesh mesh can canbe bedesigned designedto tomimic mimicarbitrary arbitrarynn× × n complex-valued matrix utilizing SVD. The above designs are all based on the triangular decomposition algorithm algorithm shown shown inin Figure 9a. Figure 9a.However, However,the the triangular triangular structure structure has ahas a large large footprint footprint andcompact and is not is not compact enough for highly enough forintegrated applications. highly integrated In 2016,In applications. Clements et al. optimized 2016, Clements the design et al. optimized on the the design basis on theofbasis the triangular decomposition of the triangular algorithmalgorithm decomposition and proposedand aproposed rectangular decomposition a rectangular de- scheme (Figure composition 9b) [51]. scheme The principles (Figure of these 9b) [51]. The two schemes principles of these are twosimilar schemes forare they are both similar for based onboth they are rotation basedsubmatrices on rotationdecomposition, but the rectangular submatrices decomposition, scheme but the is morescheme rectangular compact is and moreneater compactthanand triangular scheme. neater than triangular scheme.
Nanomaterials 2021, 11, x FOR PEER REVIEW 10 of 15 Nanomaterials 2021, 11, 1683 10 of 15 The MZI matrix core has already shown its great potential for accelerating the linear The MZI matrix core has already shown its great potential for accelerating the linear computation part of ONNs and offered the promise to overcome the bottlenecks of state- computation part of ONNs and offered the promise to overcome the bottlenecks of state- of-the-art electronics. In 2017, Shen et al. proposed and experimentally demonstrated a of-the-art electronics. In 2017, Shen et al. proposed and experimentally demonstrated a coherent optical computing architecture enabled by a cascaded programmable MZI mesh coherent optical computing architecture enabled by a cascaded programmable MZI mesh utilizing SVD [50]. This design is capable of remarkably accelerating computing speed utilizing SVD [50]. This design is capable of remarkably accelerating computing speed and and improving power efficiency using coherent light, which makes the MZI matrix core improving power efficiency using coherent light, which makes the MZI matrix core one one of the most significant building blocks of ONNs and optical computing acceleration. of the most significant building blocks of ONNs and optical computing acceleration. The The following year, Hughes et al. thoroughly discussed the training of ONNs by back- following year, Hughes et al. thoroughly discussed the training of ONNs by backpropa- propagation and gradient measurement [52], and this work provided a path toward effec- gation and gradient measurement [52], and this work provided a path toward effectively tively implementing implementing on-chip on-chip training training and optimizing and optimizing reconfigurable reconfigurable integrated integrated optical optical platforms. platforms. By applying on-chip training on the integrated optical platform, By applying on-chip training on the integrated optical platform, Zhou et al. proposed Zhou et al.and proposed and experimentally demonstrated an all-in-one silicon photonic experimentally demonstrated an all-in-one silicon photonic polarization processor [53,54], polarization processor [53,54], a universal a universal matrix computing matrix chipcomputing [55], and a chip [55], and a self-configuring self-configuring program- programmable signal proces- mable sor [56], etc. Beyond the applications in machine learning, these works may alsoworks signal processor [56], etc. Beyond the applications in machine learning, these broaden may thealso broaden access the access to intelligent to intelligent optical optical information information processing. processing. In parallel, some Inrecent parallel, some in progress recent progress in network structures has been reported, including hexagonal network structures has been reported, including hexagonal MZI mesh for various filter op- MZI mesh for tical various filter switch optical signal switch signal processing [57–61]processing [57–61] and microwave and a programmable a programmable microwave photonic chip based photonic chip based on on a quadrilateral MZIa quadrilateral mesh [62], etc. MZI mesh These [62], further works etc. These works enrich thefurther matrix enrich the computation matrix computation functionalities andfunctionalities versatility of theandMZI versatility of theNevertheless, matrix core. MZI matrix core. Nevertheless, key issues such as the keylarge issues such asofthe footprint thelarge MZI footprint structure of the MZI (usually overstructure 10,000 µm(usually over 10,000 μm 2 per interferometer 2 per unit) and interferometer unit) and extra energy consumption of thermo-optic extra energy consumption of thermo-optic modulation (approximately 10 mW per heatermodulation (approx- imately of MZI)10 mWlimit per the heater of MZI) application of alimit the application large-scale programmable of a large-scale programmable optical neural network to a optical neural certain network to a certain extent. extent. Figure 9. Two decomposition schemes of the MZI matrix core. (a) Triangular [47] and (b) rectangular Figure 9. Two decomposition schemes of the MZI matrix core. (a) Triangular [47] and (b) rectangular decomposition decomposition schemes schemes [51].[51]. (c)typical (c) A A typical MZIMZI matrix matrix corecore based based on triangular on triangular decomposition. decomposition. Nano-opto-electro-mechanical Nano-opto-electro-mechanical systems systems (NOEMS) (NOEMS) are are structuresdesigned structures designedtotomaxim- maximize both opto-mechanical and electro-mechanical interaction at the nanoscale [63]. ize both opto-mechanical and electro-mechanical interaction at the nanoscale [63]. Com- Compared to devices pared based to devices on thermo-optical based phase on thermo-optical shifters, phase NOEMS-based shifters, NOEMS-based devices can work devices without can work static power dissipation because mechanical displacements require extra energy only for without static power dissipation because mechanical displacements require extra energy switching to a different state. Experimental demonstrations of NOEMS-based devices on only for switching to a different state. Experimental demonstrations of NOEMS-based de- silicon have been reported using in-plane motion of directional couplers [64], microring [65], vices on silicon have been reported using in-plane motion of directional couplers [64], and MZI [66], demonstrating the great potential of NOEMS for static and microsecond-scale microring [65], and MZI [66], demonstrating the great potential of NOEMS for static and reconfiguration of integrated photonic circuits and quantum photonic networks. However, microsecond-scale reconfiguration of integrated photonic circuits and quantum photonic NOEMS require higher-precision lithography than conventional devices to ensure the networks. However, NOEMS require higher-precision lithography than conventional de- resolution and alignment accuracy of nanophotonic structures, and mechanical systems vices to ensure the resolution and alignment accuracy of nanophotonic structures, and generally need to be packaged to avoid the impact of the environment. mechanical systems generally need to be packaged to avoid the impact of the environ- ment. 5. Discussion and Outlook The MVM operations enabled by photonics have a remarkably higher speed and 5. Discussion and Outlook lower energy consumption compared to those of their electronic counter parts, which provides a feasible acceleration solution for the applications of artificial intelligence. On the one hand, on-chip integrated photonic circuits are an ideal platform for artificial
Nanomaterials 2021, 11, 1683 11 of 15 neural networks owing to their high compactness and great potential for competitive integration density. In addition, fast electro-optical modulators and efficient nonlinear optical components built on the LiNbO3 -on-insulator (LNOI) platform are compatible with silicon photonic circuits [67–69] and provide a promising alternative approach to realize all-optical neural networks on one chip. On the other hand, the MPLC method based on holography can achieve an ultra-large size of MVM operations due to the capability of high parallel processing in free space, and a high model complexity with millions of neurons has already been achieved with the architecture enabled by the MPLC matrix core. The optical AI accelerator provides a hardware platform, which is completely different from the conventional electronic architecture, used to support several universal neural network algorithms to match specific artificial intelligence applications including image recognition, human action recognition, and Google PageRank, etc. Table 1 summarizes the comparison of different recently demonstrated photonic AI accelerators with well-known analog and digital electronic hardware. The performance parameters of photonic architectures were obtained by theoretically extrapolating from the experimental performance index of a handful of photonic processing units. In complemen- tary metal–oxide–semiconductor (CMOS), MVM operations are typically implemented by systolic arrays [70] or single instruction multiple data (SIMD) units [71]. Due to the properties of electronic components, performing simple operations requires a large number of transistors to work together and an extra scheduler program to coordinate the data movement involved in weights, while MVM operations can be easily implemented by fun- damental photonic components such as microring, MZI, and diffractive plane. Therefore, the rate of photonic computing is several orders of magnitude faster than electrons and consumes much less power. Table 1. Comparison of different photonic AI accelerators with well-known analog and digital electronic hardware. Computing Density Precision Technology Energy/MAC Latency (TMACs/s/mm2 ) (bits) MPLC with a reconfigurable diffractive processing unit [19] - 0.82 fJ/MAC - 8 Broadcast-and-weight based on WDM [72] 50 2.1 fJ/MAC
You can also read