Integer Boltzmann Machines for Future Generation Digital Annealing Units - AQC 2021
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Integer Boltzmann Machines for Future Generation Digital Annealing Units Mohammad Bagherbeik1, Kouichi Kanda2, Hirotaka Tamura3, Ali Sheikholeslami1 ¹Dept. of Electrical & Computer Engineering, University of Toronto, Canada ²Fujitsu Limited, Kawasaki, Japan ³DXR Laboratories, Kawasaki, Japan Adiabatic Quantum Computing Conference 2021 Integer Boltzmann Machines for Future DAUs 1
Outline Motivation and Background ◼ Quadratic Assignment Problems ◼ Local Search Algorithms Integer Boltzmann Machine Parallel Tempering Solver Design ◼ Algorithm and SIMD ◼ Load Balancing Benchmark Results Conclusion Integer Boltzmann Machines for Future DAUs 2
Integer Assignment Problems Example 1: Travelling Salesman Class of NP-hard combinatorial optimization problems with integer variables ◼ Quadratic Assignment Problem (QAP) [1] Simple formulation but difficult to solve even with modern CPU and GPU hardware Start & Visting Travel Itinerary Locations (Min. Total Distance) Example 2: FPGA Block Placement a b c d a b Goal Create an integer Boltzmann Machine variant to e d accelerate search for QAP place f c e f Presentation covers formulation for symmetric, CLB Netlist FPGA Placement bias-less QAPs but concepts are directly (Min. Route Cost) transferrable to asymmetric + biased problems Integer Boltzmann Machines for Future DAUs 3
QAP Formulation ϕ X 1 x1,1 x1,2 x1,3 x1,4 L1 ϕ1 = 2 0 1 0 0 x2,1 x2,2 x2,3 x2,4 4 2 L2 ϕ2 = 3 0 0 1 0 x3,1 x3,2 x3,3 x3,4 L3 solve ϕ3 = 4 0 0 0 1 3 x4,1 x4,2 x4,3 x4,4 L4 ϕ4 = 1 1 0 0 0 Facilities & Flows Locations & Distances Assignment Permutation (Integer and Binary) Given facilities and locations Find a permutation = ∈ , where is set of all possible permutations ◼ assigns each facility , to a unique location such that the total cost of the assignment (1) is minimized: 1, = 1 = , , (1) , = ቊ , = xi,j ∈ 0,1 × (2) 0, ℎ =1 =1 Integer Boltzmann Machines for Future DAUs 4
Local Search 1. Propose new states as perturbations of the current state ◼ Exchange locations of 2 facilities 2. Calculate difference in cost function (Δ ) between previous and proposed state 3. Decide whether to accept proposed state based on some criterion ◼ Stochastic Approach (e.g. Simulated Annealing): Use a varying temperature variable ( ) that is used to inject randomness into acceptance of moves via Metropolis Hastings acceptance probability generated using (3) Proposal Acceptance Rate ( ) → ∞, → 100%, all proposals are accepted regardless of Δ value → 0, → 0%, search turns greedy Δ = min{1, exp − } (3) Integer Boltzmann Machines for Future DAUs 5
Local Search: QAP Exchange Exchange locations of two facilities , x1,1 x1,2 x1,3 x1,4 ϕ1 = 2 0 1 0 0 ϕ2 = 3 x2,1 x2,2 x2,3 x2,4 Change in cost calculated via (4) 0 0 1 0 x3,1 x3,2 x3,3 x3,4 ϕ3 = 4 0 x4,1 0 x4,2 0 x4,3 1 x4,4 Implemented in software via simple ϕ4 = 1 1 0 0 0 loop(s) in ( ) time x1,1 x1,2 x1,3 x1,4 ϕ1 = 1 1 0 0 0 x2,1 x2,2 x2,3 x2,4 Δ = 2( , − , )( , − , ) (4) ϕ2 = 3 0 0 1 0 =1, ≠ , x3,1 x3,2 x3,3 x3,4 ϕ3 = 4 0 0 0 1 x4,1 x4,2 x4,3 x4,4 ϕ4 = 2 0 1 0 0 Integer Boltzmann Machines for Future DAUs 6
Outline Motivation and Background ◼ Quadratic Assignment Problems ◼ Local Search Algorithms Integer Boltzmann Machine Parallel Tempering Solver Design ◼ Algorithm and SIMD ◼ Load Balancing Benchmark Results Conclusion Integer Boltzmann Machines for Future DAUs 7
Boltzmann Machines 1D – neurons 2D – × neurons x1,1 x1,2 x1,3 x1,n x5 T rand() x4 x6 x2,1 x2,2 x2,3 x2,n hi update x3 wi,j xi x3,1 x3,2 x3,3 x3,n S xi x2 xN bi x1 xn,1 xn,2 xn,3 xn,n State 1 1 Cost/ = − , − (5) = − , ; , , , − , , (7) 2 2 Energy =1 =1 =1 =1 =1 =1 =1 =1 =1 Neuron Local ℎ ( ) = , + (6) ℎ , ( ) = , ; , , + , (8) Fields =1 =1 =1 Integer Boltzmann Machines for Future DAUs 8
BM vs Integer BM for QAP [2,3] Data Storage Searching the State-Space: Exchange 2 Facilities (a b) n locations S1 S2 S3 S4 S5 S3 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 S2 S4 n facilities 0 1 0 0 State 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 2 Cost(X) 0 0 1 0 n binary bits 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 BM 0 0 0 1 64kbits at n=256 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 1 S1 W matrix: Fully Connected Bits Infeasible States S5 1.6G floats at n=256 4 Sequential Spin-Flips Search State-Space ϕ State Cost(ϕ) Int BM 1 F, D Matrices 1 3 n integers 2 2 x 64k floats 2 2 S1 3 2,048bits 3 1 at n=256 S5 n at n=256 n S1 1 Exchange Operation S5 n Search State-Space 2D BM requires penalty terms to enforce one-hot constraints on rows/columns of Using an Integer BM with intrinsic exchange moves (4-bit-flip equivalent) simplifies problem formulation and speeds up search, significantly reduces memory reqs. Integer Boltzmann Machines for Future DAUs 9
Int BM QAP Formulation Notes Formulation # • Remove conditions on the summation Δ = 2( , − , )( , − , ) (4) loop iterator =1, ≠ , • Introduces extra terms within the sum • Compensate via additional terms Δ = 2( , − , )( , − , ) (9) =1 +4 , • Create vector of flow row differences Δ = ,∗ − ,∗ (10) • Create vector of permutation ordered Δ = ,∗ − ,∗ ⊺ (11) distance row differences • Reformulate using a dot product Δ = 2Δ ∙ Δ + 4 , (12) Integer Boltzmann Machines for Future DAUs 10
Integer BM: Local Fields Notes Formulation # • Generate cache values at start of = ⊺ = ℎ , ∈ ℝ × (13) search in ( 3 ) time X L1 L2 L3 L4 H L1 L2 L3 L4 x1,1 x1,2 x1,3 x1,4 • Elements in correspond to bits 0 1 0 0 h1,1 h1,2 h1,3 h1,4 in x2,1 x2,2 x2,3 x2,4 h2,1 h2,2 h2,1 h2,4 • Element ℎ , is cost of assigning a 0 0 1 0 x3,1 x3,2 x3,3 x3,4 facility to location 0 0 0 1 h3,1 h3,2 h3,3 h3,4 x4,1 x4,2 x4,3 x4,4 h4,1 h4,2 h4,3 h4,4 1 0 0 0 • Reformulate dot product as sum Δ ∙ Δ = ℎ , + ℎ , − ℎ , − ℎ , (14) of 4 elements taking (1) time Δ = 2Δ ∙ Δ + 4 , Δ = 2(ℎ , + ℎ , − ℎ , − ℎ , ) + 4 , (15) Integer Boltzmann Machines for Future DAUs 11
Int BM: Local Field Updates H (ΔF) ΔD Update local fields 1 row at a time using + floating point fused-multiply-adds h1,1 h1,2 h1,3 h1,n = δf1 δd1 δd2 δd3 δdn × ◼ ( 2 ) h2,1 h2,2 h2,1 h2,n + = δf2 δd1 δd2 δd3 δdn × Can skip rows corresponding to 0 h3,1 h3,2 h3,3 h3,n 0 δd1 δd2 δd3 δdn elements in Δ + Significant speed-ups if is sparse or hn,1 hn,2 hn,3 hn,n = δfn δd1 δd2 δd3 δdn × has a particular pattern that leads to sparse Δ vectors ⊺ += Δ ⊗ Δ (16) Integer Boltzmann Machines for Future DAUs 12
Outline Motivation and Background ◼ Quadratic Assignment Problems ◼ Local Search Algorithms Integer Boltzmann Machine Parallel Tempering Solver Design ◼ Algorithm and SIMD ◼ Load Balancing Benchmark Results Conclusion Integer Boltzmann Machines for Future DAUs 13
Parallel Tempering Uses M unique search-states/replicas with unique temperatures, = ( ) ∈ Tmax global min found Replicas perform Local Search at their Cost assigned temperature ◼ Can swap temperatures with other replicas with probability calculated via (17) Iterations (time) ◼ Allows for an escape mechanism from local minima T Scaled Cost 1 1 = 1, exp − − +1 (17) Tmin +1 State Integer Boltzmann Machines for Future DAUs 14
Parallel Tempering Solver System Experimental setup: ◼ 64-Core AMD 3990X CPU with AVX2 QAP Replica C Cmin H support ϕ ϕmin Data ◼ Coded in C++ and compiled with GCC-9 n F D T0 I Float32 used to store , , Replica Replica . .. Replica ◼ Allows for Fused-Multiply-Add instructions 1 2 32 C T Each engine uses 32 replicas in combination SLS SLS . .. SLS with a PT controller PT . .. ◼ Each replica is run on a dedicated CPU core core 1 core 2 core 32 ◼ 2 engines are run in parallel to utilize all 32 Replica PT Engine 64 cores ◼ Runs until a target cost is found or a 5- minute time-out limit is reached Integer Boltzmann Machines for Future DAUs 15
Load Balancing Threads no load balancing load balanced ReplicaT32 ReplicaT32 ... ... ReplicaT24 ReplicaT24 ... ... ReplicaT17 ReplicaT17 ... ... ReplicaT1 ReplicaT1 time time idle ΔC calc. (I) ΔC calc. (Ibal) update Varying proposal acceptance rates of replicas at different temperatures causes thread imbalance when using Parallel Tempering Track average time-per-iteration at each temperature and scale iterations at each temperature to try to equalize thread run-times Integer Boltzmann Machines for Future DAUs 16
Outline Motivation and Background ◼ Quadratic Assignment Problems ◼ Local Search Algorithms Integer Boltzmann Machine Parallel Tempering Solver Design ◼ Algorithm and SIMD ◼ Load Balancing Benchmark Results Conclusion Integer Boltzmann Machines for Future DAUs 17
QAP Benchmark Results Avg. Time-to-Solution (TtS) across 100 independent, randomly seeded, runs with 99% confidence interval for 19 difficult QAP instances ◼ Instances from [4,5,6] ParEOTS [7]: Parallel Extremal Optimization + Tabu Search ◼ Best performing published QAP solver ◼ 128 cores on CPU cluster 8 x 16–core AMD 6376 CPU ◼ TtS reported as average across 10 runs with 5-minute time-out 57x faster than ParEOTS on average ◼ Max speed-up of 353x Integer Boltzmann Machines for Future DAUs 18
Conclusion Extended BMs to support for integer variables with one-hot constraints ◼ Remove need for penalty tuning on user side Added support for weight-generation using submatrices for problem topologies such as QAP ◼ Larger problem support due to lower memory requirements Integer BM based CPU solution displays performance that is orders of magnitude faster than competing solvers across difficult QAP instances Future Work Accelerate search relative to high-performance multi-core CPU system through dedicated hardware Integer Boltzmann Machines for Future DAUs 19
Acknowledgements The authors would like to thank Fujitsu Ltd. and Fujitsu Consulting (Canada) Inc. for providing financial support and technical expertise on this research and to CMC Microsystems for access to CAD tools used in this research
Thank You! Looking forward to your questions at the live session
References [1] Lawler, E.L.: The quadratic assignment problem. Manage. Sci. 9(4), 586–599 (1963). [2] M. Bagherbeik, P. Ashtari, S. F. Mousavi, K. Kanda, H. Tamura, and A. Sheikholeslami, “A permutational boltzmann machine with parallel tempering for solving combinatorial optimization problems,” in International Conference on Parallel Problem Solving from Nature. Springer, 2020, pp. 317–331 [3] M. Bagherbeik and A. Sheikholeslami, “Caching and vectorization schemes to accelerate local search algorithms for assignment problems,” in IEEE Congress on Evolutionary Computation (to appear), 2021. [4] Burkard, R.E., Karisch, S.E., Rendl, F.: QAPLIP-a quadratic assignment problem library. J. Global Optim. 10(4), 391–403 (1997). [5] Palubeckis, G.: An algorithm for construction of test cases for the quadratic assignment problem. Informatica Lith. Acad. Sci. 11, 281–296 (2000) [6] Drezner, Z., Hahn, P.M., Taillard, E.D.: Recent advances for the quadratic assign- ´ ment problem with special emphasis on instances that are difficult for metaheuristic methods. Ann. Oper. Res. 139(1), 65–94 (2005). [7] Munera, Danny, Daniel Diaz, and Salvador Abreu. "Hybridization as cooperative parallelism for the quadratic assignment problem." International Workshop on Hybrid Metaheuristics. Springer, Cham, 2016. Integer Boltzmann Machines for Future DAUs 22
You can also read