The MUMPS Solver: academic needs and industrial expectations - MUMPS group
The MUMPS Solver: academic needs and industrial expectations Chiara Puglisi (Inria-Grenoble (LIP-ENS Lyon)) MUMPS group, CERFACS, CNRS, ENS-Lyon, INRIA, INPT, Université Bordeaux 1 Séminaire Aristote - HPC-Desk — ONERA, France, May 20th, 2014
Outline
• Academic needs: a research platform for sparse direct solvers
• Industrial expectations: MUMPS solver, a software platform
• Concluding remarks: research and software perspectives

Academic needs: a research platform
Solution of sparse systems Ax = b (e.g., finite elements). Often the most expensive part in numerical simulation codes.
[Figure: Code Aster, Carter]
Sparse direct methods to solve Ax = b:
• Factor A in the form $LU$, $LDL^T$ or $LL^T$
• Solve the triangular systems Ly = b, then Ux = y
3D example in earth science: acoustic wave propagation, 27-point finite difference grid.
Current goal [Seiscope project]: LU on complete earth, $n = N^3 = 1000^3$.
Extrapolation on a 1000 × 1000 × 1000 grid: 55 exaflops, 200 TBytes for factors, 40 TBytes for active memory!

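The two steps above (factor, then two triangular solves) can be sketched on a tiny dense matrix. This is only an illustration of the principle, assuming a matrix that needs no pivoting; it is not how MUMPS works internally, which is sparse, multifrontal and uses numerical pivoting. All function names are hypothetical.

```python
def lu_factor(a):
    """Doolittle LU without pivoting: returns (L, U) with unit-diagonal L."""
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [row[:] for row in a]  # work on a copy of A
    for k in range(n):
        if U[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; a real solver would pivot")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U


def forward_solve(L, b):
    """Solve L y = b for unit lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append(b[i] - sum(L[i][j] * y[j] for j in range(i)))
    return y


def backward_solve(U, y):
    """Solve U x = y for upper-triangular U."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


def solve(a, b):
    """Solve A x = b via A = LU, then L y = b, then U x = y."""
    L, U = lu_factor(a)
    return backward_solve(U, forward_solve(L, b))


# Small tridiagonal example; b is chosen so that the exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = solve(A, b)
```

Once L and U are available, further right-hand sides only cost the two triangular solves, which is one reason direct methods are attractive when the factorization can be afforded.
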
Sparse direct solution: main research issues
[Figures: frequency-domain seismic modeling, Helmholtz equations, SEISCOPE project (velocity model with axes Dip (km), Cross (km), Depth (km); color scale 3000 to 6000 m/s); Code Aster, EDF pump, nuclear backup circuit]
Extrapolation on a 1000 × 1000 × 1000 grid: 55 exaflops, 200 TBytes for factors, 40 TBytes for active memory!
Main algorithmic issues
• Parallel algorithmic issues: synchronization avoidance, mapping irregular data structures, scheduling.
• Performance scalability: time, but also memory per processor, when increasing the number of processors (and the problem size).
• Numerical issues: numerical accuracy, hybrid iterative-direct solvers, application-specific solvers (elliptic PDEs).

Robust memory-aware mappings
Context
[Figure: several nodes, each with its own factors, active memory and disk]
• Memory per node or core is decreasing
• Active memory is not naturally scalable and is difficult to estimate
Algorithmic work
• Design mapping algorithms that enforce some memory constraints and provide better memory estimates.
• Active memory size dominates total memory in parallel. Example: share of active storage on the AUDI matrix: 11% on 1 processor, 59% on 256 processors.

Robust memory-aware mappings (problem)
Metric: active memory efficiency
$e(p) = \frac{S_{seq}}{p \times S_{max}(p)}$
with $S_{seq}$ the sequential memory and $S_{max}(p)$ the maximum memory used on $p$ processors.
We would like $e(p) \approx 1$, i.e. $S_{seq}/p$ on each processor.
Common mappings/schedulings → poor memory efficiency:
• Standard proportional mapping: $\lim_{p \to \infty} e(p) = 0$ on regular problems.
• With more sophisticated relaxed proportional mapping, typical efficiency $e(p)$ is still between 0.10 and 0.40. (Memory estimates are unreliable.)

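The metric is straightforward to evaluate once the sequential memory and the per-processor peak are known. A minimal sketch, with a hypothetical function name and made-up input values (they are not measurements from the talk):

```python
def memory_efficiency(s_seq, s_max, p):
    """Active memory efficiency e(p) = S_seq / (p * S_max(p)).

    s_seq : active memory of the sequential run
    s_max : largest active memory used by any of the p processes
    Both must be expressed in the same unit.
    """
    if p <= 0 or s_max <= 0:
        raise ValueError("p and s_max must be positive")
    return s_seq / (p * s_max)


# Hypothetical figures: 6400 MB sequentially, 64 processes.
ideal_share = 6400.0 / 64                             # S_seq / p = 100 MB each
e_perfect = memory_efficiency(6400.0, ideal_share, 64)  # every process holds S_seq / p
e_poor = memory_efficiency(6400.0, 400.0, 64)           # peak is 4x the ideal share
```

An efficiency of 0.25, as in the second call, falls inside the 0.10 to 0.40 range quoted for relaxed proportional mapping.
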
Robust memory-aware mappings (results)
• Reduce memory ↔ serialize some branches in the elimination tree
⇒ Reliable estimation and better memory use with memory-aware mappings compared with the default version (MUMPS 4.10.0).
Illustration with matrix PANCAKE 2 (3D electromagnetism, Cedrat (Flux) and Padova Univ.), 64 MPI processes:

                                   MUMPS 4.10.0   Memory-aware mappings
Objective max MB/core              n/a            400        200
Time (seconds)                     418            591        684
Active workspace (avg MB/core)     539.4          234.7      180.0
Active workspace (max MB/core)     900.3          356.2      181.5

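The trade-off in the PANCAKE 2 figures above can be made explicit with a little arithmetic. The sketch below only re-derives ratios from the numbers on the slide; the dictionary keys and the three derived quantities (slowdown, peak reduction, avg/max balance) are illustrative choices, not metrics defined in the talk.

```python
# Values copied from the slide (64 MPI processes, matrix PANCAKE 2).
rows = {
    "MUMPS 4.10.0": {"time_s": 418.0, "avg_mb": 539.4, "max_mb": 900.3},
    "memory-aware, objective 400": {"time_s": 591.0, "avg_mb": 234.7, "max_mb": 356.2},
    "memory-aware, objective 200": {"time_s": 684.0, "avg_mb": 180.0, "max_mb": 181.5},
}

baseline = rows["MUMPS 4.10.0"]
summary = {}
for name, r in rows.items():
    summary[name] = {
        # run time relative to the default mapping
        "slowdown": r["time_s"] / baseline["time_s"],
        # how much the per-core peak shrinks relative to the default mapping
        "peak_reduction": baseline["max_mb"] / r["max_mb"],
        # avg/max close to 1 means the active workspace is evenly spread
        "balance": r["avg_mb"] / r["max_mb"],
    }

for name, s in summary.items():
    print(f"{name}: slowdown x{s['slowdown']:.2f}, "
          f"peak reduced x{s['peak_reduction']:.2f}, balance {s['balance']:.2f}")
```

With the 200 MB/core objective, the peak active workspace drops by roughly a factor of five for about 1.6 times the run time, and in both memory-aware runs the measured maximum stays below the requested objective.
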
Application-specific solvers: BLR solver
Block Low-Rank approximations to improve sparse multifrontal solvers
Low-rank approximations (elliptic PDEs)
• memory compression and flop reduction
• accuracy controlled by a numerical parameter (→ can also be used as a preconditioner)
Main features of the Block Low-Rank (BLR) format
• Algebraic solver; flat and simple format
• Compatibility with numerical pivoting
⇒ Many representations: recursive $\mathcal{H}$, $\mathcal{H}^2$ [Bebendorf, Börm, Hackbusch, Grasedyck, ...], HSS/SSS [Chandrasekaran, Dewilde, Gu, Li, Xia, ...], flat block low-rank (BLR), ...

Block Low-Rank multifrontal solver
⇒ Elimination tree
⇒ Singular value decomposition (SVD) of each block $B$: $B = X_1 S_1 Y_1 + X_2 S_2 Y_2$
⇒ Truncation at rank $k(\varepsilon)$: keep $X_1 S_1 Y_1$ and drop $E = X_2 S_2 Y_2$, with $\|E\|_2 = \|X_2 S_2 Y_2\|_2 = \sigma_{k+1} \le \varepsilon$
→ Block Low-Rank Solver (BLR), PhD INP-EDF, 2013, C. Weisbecker

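The truncation rule can be checked numerically on a single dense block. The following is a minimal NumPy sketch, not the BLR implementation: the function name `compress_block`, the test kernel (interactions between two well-separated point sets, which is numerically low-rank) and the threshold value are all assumptions made for illustration.

```python
import numpy as np


def compress_block(B, eps):
    """Truncated SVD of B: keep the singular values larger than eps.

    Returns (X1, S1, Y1, sigma) where B is approximated by X1 @ diag(S1) @ Y1
    and sigma holds all singular values of B in decreasing order.
    """
    X, sigma, Yt = np.linalg.svd(B, full_matrices=False)
    k = int(np.sum(sigma > eps))  # rank k(eps)
    return X[:, :k], sigma[:k], Yt[:k, :], sigma


# Test block: 1/|x - y| for two well-separated sets of points.
x = np.linspace(0.0, 1.0, 40)
y = np.linspace(3.0, 4.0, 40)
B = 1.0 / np.abs(x[:, None] - y[None, :])

eps = 1e-8
X1, S1, Y1, sigma = compress_block(B, eps)
k = len(S1)

B_tilde = (X1 * S1) @ Y1               # low-rank approximation X1 S1 Y1
err = np.linalg.norm(B - B_tilde, 2)   # spectral norm of the dropped part E
sigma_next = sigma[k] if k < len(sigma) else 0.0

# Storage: k * (m + n) entries for the two factors instead of m * n for B.
stored = k * (B.shape[0] + B.shape[1])
full = B.shape[0] * B.shape[1]
print(f"rank {k}, error {err:.2e}, storage {stored}/{full} entries")
```

The computed error matches the first discarded singular value, which is the identity $\|E\|_2 = \sigma_{k+1}$ stated on the slide; tightening or loosening `eps` trades accuracy against rank, hence against memory and flops.
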
Application to frequency-domain seismic modeling
[Figures: four panels of the velocity model, each with axes Dip (km), Cross (km) and Depth (km)]

ε           Frequency   Ops      Memory |L|   Memory |CB|
$10^{-5}$   2 Hz        41.8%    61.8%        32.3%
            4 Hz        27.4%    50.0%        24.4%
            8 Hz        21.8%    41.6%        23.9%
$10^{-4}$   2 Hz        32.9%    53.4%        23.9%
            4 Hz        20.0%    42.2%        21.7%
            8 Hz        15.2%    28.9%        19.4%

%: percentage of the standard (full-rank) sparse solver

Industrial expectations: a software platform
Technology transfer
• From research prototyping during PhD theses to robust and portable software. Examples:
◦ Memory-aware: PhDs of E. Agullo (LIP-ENS, 2008) and F.-H. Rouet (INPT-IRIT, 2012);
◦ Block Low-Rank: PhD of C. Weisbecker (INPT-IRIT with EDF support, 2013).
Software issues and interaction with users
• Code development: develop and combine complex features
• Software engineering: analysis/experimentation/validation tools, maintenance (also essential for research developments!)
• Users: expect support, training and adaptation/developments, but also offer research collaborations, software validation and financial support.

MUMPS solver software platform
General context
• Initially funded by a European project (1996-1999), 12 partners from 5 countries
• Publicly available since 1999 at http://graal.ens-lyon.fr/MUMPS and http://mumps.enseeiht.fr
• Co-developed in Toulouse, Lyon-Grenoble and Bordeaux by CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux
• Latest release: MUMPS 4.10.0, May 2011, ≈ 250 000 lines of C and Fortran code
Competitive and original software package used worldwide
• Integrated within commercial and open-source packages (e.g., Samcef from Samtech, Actran from Free Field Technologies, Code Aster from EDF, PAM-Crash from ESI, IPOPT, PETSc, Trilinos, Debian packages, ...).

Software requests
[Figure: world map of software requests since Dec. 2002 (8839 requests)]

Software requests
The number of requests per day has increased steadily throughout the evolution of the software.
[Figure: bar chart of requests per day for MUMPS releases 4.3, 4.5, 4.6, 4.7, 4.8, 4.9 and 4.10; bar labels range from 1.3 to 4.02 requests per day]
The latest version (4.10.0) is downloaded more than 1000 times per year.

MUMPS Team (May 2014)
Permanent members:
• Patrick Amestoy (INPT-IRIT, Toulouse)
• Jean-Yves L’Excellent (INRIA-LIP, Lyon)
• Abdou Guermouche (LABRI, Bordeaux)
• Bora Uçar (CNRS-LIP, Lyon)
• Alfredo Buttari (CNRS-IRIT, Toulouse)
Engineers:
• Guillaume Joslin (Université Paul Sabatier, Toulouse)
• Chiara Puglisi (INRIA, Grenoble)
Part time on MUMPS:
• Maurice Brémond (INRIA, Grenoble)
PhD students:
• Mohamed Sid-Lakhdar (ENS-Lyon)
• Florent Lopez (UPS, Toulouse)

2000-2013: Research through PhDs
PhD students connected to the project:
• F. Lopez, UPS
• W. Sid-Lakhdar, ENS Lyon
• C. Weisbecker, INPT-EDF
• F.-H. Rouet, INPT
• M. Slavova, CERFACS
• E. Agullo, ENS Lyon
• S. Pralet, CERFACS
• A. Guermouche, ENS Lyon
• C. Voemel, CERFACS
[Figure: timeline of the theses, 2000 to 2014]
Some research themes: preprocessing and orderings, numerical pivoting and accuracy, numerical features, memory usage and task scheduling, shared-memory parallelism.

Relations with our users
Exchanges with users
• Direct contacts by email
• MUMPS Users mailing list
MUMPS Users Days
1. October 24th, 2006, Lyon, France
2. April 15th - 16th, 2010, Toulouse, France
3. May 29th - 30th, 2013, EDF, Clamart, France
Objectives of these workshops:
• Present some facets of the algorithmic, numerical and software work in the context of the MUMPS project/solver
• Share experience
• Identify users' expectations (software evolution, new features)
• Discuss future research tracks and the future of MUMPS

Research perspectives
Scientific hurdles and related research areas
• Computation driven by memory: memory-aware algorithms
• Controlled accuracy to improve complexity: BLR solver
• Multicore and asynchronous communications: a key issue for time and memory scalability; algorithms and communication schemes need to be revisited.
Performance projection and target (3D Helmholtz; $n = 10^9$; 1.4 PFlops computer, 2000 nodes, 32 cores/node)
(Still much research and software work needed to reach this target!)

                MUMPS 4.10.0      Research target
Time            $10^7$ seconds    $10^4$ seconds
Factors         8 GB/core         3 GB/core
Workspace       50 GB/core        2 GB/core

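A back-of-the-envelope check helps put the target in perspective. The sketch below assumes that the 55 exaflop operation count and the 200 TBytes of factors quoted earlier for the 1000 × 1000 × 1000 grid describe the same $n = 10^9$ problem, that the machine runs at its full 1.4 PFlops peak, and that 1 TB = 1000 GB; none of these assumptions is stated in the talk, and the variable names are illustrative.

```python
nodes = 2000
cores_per_node = 32
peak_flops_per_s = 1.4e15   # 1.4 PFlops
total_ops = 55e18           # 55 exaflop, assumed to apply to n = 10^9

cores = nodes * cores_per_node

# Lower bound on factorization time if the machine ran at peak throughout.
time_at_peak_s = total_ops / peak_flops_per_s

# Aggregate memory implied by the per-core figures of the target table.
target_factors_tb = 3 * cores / 1000
target_workspace_tb = 2 * cores / 1000
current_factors_tb = 8 * cores / 1000
current_workspace_tb = 50 * cores / 1000

print(f"{cores} cores, about {time_at_peak_s:.0f} s at peak")
print(f"target:  {target_factors_tb:.0f} TB factors, {target_workspace_tb:.0f} TB workspace")
print(f"4.10.0:  {current_factors_tb:.0f} TB factors, {current_workspace_tb:.0f} TB workspace")
```

Under these assumptions the target time is of the same order of magnitude as the time needed at peak speed, and the target of 3 GB/core for factors corresponds to an aggregate close to the 200 TBytes extrapolated for the factors.
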
Software agreement
Software agreement signed by the owners of the software: CERFACS, CNRS, ENS Lyon, INPT, Inria, Univ. Bordeaux 1.
Key features
• All institutions have recognized and confirmed their will to freely distribute MUMPS releases
• A technical committee supervises technical/scientific decisions
• Conditions of use for the development version are defined
• Conditions of transfer toward the next public version are defined
• License for public versions: CeCILL-C (LGPL-compatible)

Sustainability of MUMPS software and research platform
Objectives
• Stabilize engineering work and expertise with long-term positions
• Ensure software quality and faster transfer of research work
MUMPS Consortium
• Type: group of users
• Objective: support engineering work
• Services: beta releases of future/new functionalities, annual meeting to share experience, wish list to influence development priorities, training cycles, ...
Ongoing work ... takes more time than one could have expected
