How to run GENESIS on Fugaku - C. Kobayashi, RIKEN Center for Computational Science
Contents
• Purpose of this meeting
• Get source code of GENESIS 2.0beta
• Install & Run
• Get good performance
• Troubleshooting
Purpose of this meeting
• This talk covers the usage of GENESIS on Fugaku.
• Basic usage of GENESIS is out of scope.
• For basic usage, please check Usage/Tutorial & Samples on the GENESIS website and the manual. https://www.r-ccs.riken.jp/labs/cbrt/
What is GENESIS 2.0 beta?
Jung et al., J. Comp. Chem., https://doi.org/10.1002/jcc.26450
GENESIS 2.0beta has the following features:
1. Selection of the suitable nonbond kernel code for the architecture & # of MPI processes
2. MD integrator with a large time step (multiple time step, MTS) (Jung et al., submitted)
Available functions = SPDYN & all-atom FF (CHARMM/AMBER FF)
• MD, Minimization
• Simulations with replicas: REMD, REUS, gREST, GaMD, String method
Unavailable functions in SPDYN
• Leapfrog integrator
• Langevin thermostat
• FEP (please use the 1.5.1 code)
How to get GENESIS 2.0 beta code (1)
1. Click the ‘Download’ tab on the GENESIS site.
2. Go to the GENESIS 2.0beta page on the GENESIS site.
3. Go to the GitHub site: https://github.com/genesis-release-r-ccs/genesis-2.0
How to get GENESIS 2.0 beta code (2)
A. Download the code on Fugaku directly (recommended).
On the GitHub page, click the button that copies the repository URL, then execute the following on Fugaku:
% git clone https://github.com/genesis-release-r-ccs/genesis-2.0.git
Attention! It is strictly forbidden to put your private key on Fugaku. Do not use git via ssh on Fugaku, i.e. do not run:
% git clone git@github.com:genesis-release-r-ccs/genesis-2.0.git
GENESIS 2.0beta source tree
In the top directory:
• Manual and brief user guide (PDF files)
• Source code of GENESIS 2.0beta (only lib and spdyn)
• Compile test
• …
• Quick guide for installation (ASCII text file)
• …
• File for generating configure
Compile GENESIS on Fugaku
If you got the source code with the ‘git’ command:
% git clone https://github.com/genesis-release-r-ccs/genesis-2.0.git
% cd genesis-2.0
If you got the zip file from GitHub:
% unzip genesis-2.0-master.zip
% cd genesis-2.0-master
Compile:
% autoreconf
% ./configure --enable-single --host=Fugaku
% make
% make install
Instead of --enable-single, you can choose one of the following options: --enable-mixed, --enable-double (default). Please check doc/GENESIS-2.0.pdf (section 2.2.3).
Compile test of GENESIS on Fugaku
% cd ../tests/regression_test
(make job script)
% pjsub regression.sh
regression.sh (the options in the script are explained on the next pages):
#!/bin/sh
#PJM -L "rscgrp=eap-small"
#PJM -L "rscunit=rscunit_ft01"
#PJM -L "node=2"
#PJM --mpi "proc=8"
#PJM -L "elapse=00:30:00"
#PJM -j
#PJM -S
module switch lang/tcsds-1.2.28a
export OMP_NUM_THREADS=12
export PLE_MPI_STD_EMPTYFILE=off
bindir=PLEASE_INSERT_GENEIS_PATH
python2 ./test.py "mpiexec ${bindir}/spdyn " fugaku > regression.log
Job script of GENESIS
An example of a job script:
#!/bin/bash
#PJM -L "rscgrp=eap-small"
#PJM -L "rscunit=rscunit_ft01"
#PJM -L "node=16"
#PJM --mpi "proc=128"
#PJM -L "elapse=20:00"
#PJM -j
#PJM -S
pdir=PLEASE_INSERT_GENEIS_PATH
module switch lang/tcsds-1.2.28a
export OMP_NUM_THREADS=6
export PLE_MPI_STD_EMPTYFILE=off
mpiexec -stdout run_fep1.out $pdir/spdyn input/run_fep1.inp
Explanation of the options:
• rscgrp: resource group (can be changed) (required)
• rscunit: resource unit (can be changed) (required)
• node: # of nodes (required)
• proc: # of MPI processes (required)
• elapse: elapsed time limit (required)
• -S: output stat information (optional, but strongly recommended)
• module switch: setting of the development environment (can be changed)
• PLE_MPI_STD_EMPTYFILE=off: disables empty stdout files for each process
• -stdout: stdout file name (the default is (jobscript).(job ID).out.(0 (stdout) or 1 (stderr)).(process ID))
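One way to sanity-check the resource lines of a job script is to confirm that the MPI processes per node multiplied by OMP_NUM_THREADS fill a node exactly. The following is a minimal Python sketch, not part of GENESIS or the Fugaku tools; the function name is hypothetical, and the figure of 48 cores per node assumes the layout of 4 CMGs with 12 cores each that is mentioned elsewhere in this talk.

```python
# Assumption: one node offers 4 CMGs x 12 cores = 48 compute cores.
CORES_PER_NODE = 4 * 12


def cores_used_per_node(nodes, mpi_procs, omp_threads):
    """Return how many cores each node is asked to run.

    nodes, mpi_procs and omp_threads correspond to "node=", "proc="
    and OMP_NUM_THREADS in the job script.
    """
    if mpi_procs % nodes != 0:
        raise ValueError("MPI processes do not divide evenly over the nodes")
    procs_per_node = mpi_procs // nodes
    return procs_per_node * omp_threads


# Values from the example job script: node=16, proc=128, OMP_NUM_THREADS=6.
used = cores_used_per_node(16, 128, 6)
print(used, "of", CORES_PER_NODE, "cores per node requested")
```

With the values of the example script, 8 processes per node with 6 threads each request all 48 cores; the regression test script (node=2, proc=8, OMP_NUM_THREADS=12) does the same.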
Useful commands on Fugaku
• How to confirm your currently loaded environment
% module list
• How to confirm the available environments
% module avail
(Details in “Use and job execution”, Section 4.6)
• How to check the CPU time of your group
% accountj -h -E -g group_name
Please check the [SUBTHEME_PERIDO] section.
• How to check disk usage (you & group)
% accountd
(Details in “support tools user guide”, Sections 3.1.7 & 3.1.8)
Section 3 Get good performance
How to get good performance on Fugaku
The following conditions must be met:
1. Use suitable parameter sets. (You can check them on the GENESIS benchmark pages.)
2. Run the calculation properly (# of MPI processes/OMP threads, compiler version, ...).
At this moment, the development environments are also under construction. To check whether a calculation is proper, benchmarking is important!
How to get a benchmark
• Run simulations with different # of nodes, # of MPI processes, and # of OMP threads, and check the scalability.
• 5000 ~ 10000 steps are enough in most cases.
• In GENESIS, please check the “dynamics” time instead of “total time”. It is the time for the main MD loop. Please use it to estimate your total simulation time.
[STEP6] Deallocate Arrays
Output_Time> Averaged timer profile (Min, Max)
  total time      =     104.674
    setup         =      24.235
    dynamics      =      80.439
      energy      =      62.515
      integrator  =       8.233
      pairlist    =       3.957 (       3.454,       4.242)
  energy
    bond          =       0.120 (       0.003,       0.315)
    angle         =       0.361 (       0.017,       0.844)
    dihedral      =       1.177 (       0.047,       2.678)
    nonbond       =      50.372 (      47.778,      51.623)
      pme real    =      41.143 (      36.091,      43.540)
      pme recip   =       9.215 (       8.017,      11.668)
(skip)
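To turn a benchmark log into a run-time estimate, the “dynamics” value can be read from the timer profile and scaled linearly with the number of steps. The sketch below is one possible way to do this in Python; the helper names are hypothetical, the sample text is the start of the profile shown on this slide, and the step counts (10000 benchmark steps, 1000000 production steps) are made-up values for illustration.

```python
import re

# First lines of the timer profile from this slide.
SAMPLE_LOG = """\
Output_Time> Averaged timer profile (Min, Max)
  total time      =     104.674
    setup         =      24.235
    dynamics      =      80.439
      energy      =      62.515
      integrator  =       8.233
      pairlist    =       3.957 (       3.454,       4.242)
"""

# "<name> = <seconds>", where the name may contain spaces (e.g. "total time").
TIMER_LINE = re.compile(r"^\s*([A-Za-z][A-Za-z_\- ]*?)\s*=\s*([0-9]+\.[0-9]+)")


def parse_timers(log_text):
    """Map each timer name to its averaged time in seconds.

    If a name appears more than once, the first occurrence is kept.
    """
    timers = {}
    for line in log_text.splitlines():
        match = TIMER_LINE.match(line)
        if match:
            timers.setdefault(match.group(1), float(match.group(2)))
    return timers


def estimate_dynamics_seconds(dynamics_sec, benchmark_steps, target_steps):
    """Scale the benchmark "dynamics" time linearly to target_steps."""
    return dynamics_sec / benchmark_steps * target_steps


timers = parse_timers(SAMPLE_LOG)
# Hypothetical step counts: 10000 benchmark steps, 1000000 production steps.
estimate = estimate_dynamics_seconds(timers["dynamics"], 10000, 1000000)
print(f"estimated dynamics time: {estimate:.1f} s")
```

The estimate ignores setup time and any output that is written less often than the benchmark length, so it is best read as a lower bound.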
Why is my simulation so slow?
Please check the following points before consulting someone.
1. Check the parameters in your control file and your simulation conditions.
• Please check the parameters and performance on the benchmark site.
• Did you compile GENESIS in the ‘recommended’ way? (Please do not set FCFLAGS or CCFLAGS yourself.)
2. Run benchmarks with different sets of MPI/OMP cores.
• Each Fugaku node has 4 CMGs (core memory groups) with 12 cores each. → More than 12 OMP threads is not effective.
• The suitable MPI/OMP ratio differs with the simulation size and the # of nodes. (In general, a smaller # of OMP threads is preferred with a smaller # of nodes.)
3. Find which part is the bottleneck.
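The MPI/OMP sets worth benchmarking can be listed mechanically. The Python sketch below is an illustration under two assumptions that go beyond the slide: all 48 cores of a node (4 CMGs x 12 cores) are used, and each MPI process stays inside one CMG, so the thread count must divide 12. The function name is hypothetical.

```python
CORES_PER_CMG = 12
CMGS_PER_NODE = 4


def mpi_omp_candidates(nodes):
    """List (total MPI processes, OMP threads) pairs that fill every node.

    Assumption: a process never spans two CMGs, so only thread counts
    that divide the 12 cores of a CMG are considered.
    """
    cores_per_node = CORES_PER_CMG * CMGS_PER_NODE
    candidates = []
    for omp_threads in range(1, CORES_PER_CMG + 1):
        if CORES_PER_CMG % omp_threads == 0:
            procs_per_node = cores_per_node // omp_threads
            candidates.append((nodes * procs_per_node, omp_threads))
    return candidates


for mpi_procs, omp_threads in mpi_omp_candidates(16):
    print(f'proc={mpi_procs:4d}  OMP_NUM_THREADS={omp_threads}')
```

For 16 nodes this gives six sets, from 768 processes with 1 thread to 64 processes with 12 threads, including the proc=128, OMP_NUM_THREADS=6 set used in the example job script.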
How to find the bottleneck
[STEP6] Deallocate Arrays
Output_Time> Averaged timer profile (Min, Max)
  total time      =     104.674
    setup         =      24.235
    dynamics      =      80.439
      energy      =      62.515
      integrator  =       8.233
      pairlist    =       3.957 (       3.454,       4.242)
  energy
    bond          =       0.120 (       0.003,       0.315)
    angle         =       0.361 (       0.017,       0.844)
    dihedral      =       1.177 (       0.047,       2.678)
    nonbond       =      50.372 (      47.778,      51.623)
      pme real    =      41.143 (      36.091,      43.540)
      pme recip   =       9.215 (       8.017,      11.668)
    solvation     =       0.000 (       0.000,       0.000)
      polar       =       0.000 (       0.000,       0.000)
      non-polar   =       0.000 (       0.000,       0.000)
    restraint     =       0.000 (       0.000,       0.000)
    qmmm          =       0.000 (       0.000,       0.000)
  integrator
    constraint    =       1.884 (       1.613,       2.082)
    update        =       2.317 (       2.180,       2.454)
    comm_coord    =       0.994 (       0.697,       1.534)
    comm_force    =       2.955 (       1.450,       4.636)
    comm_migrate  =       0.115 (       0.070,       0.168)
Check points (the values in parentheses are the min and max over processes):
1. Is “energy” the bottleneck?
• Yes: Is it Real (pme real) or Recip (pme recip)? Are the differences between processes large or small?
• No: Is it constraint or communication (comm_coord, comm_force, comm_migrate)? Are the differences between processes large or small?
2. Do the bottlenecks differ between sets of MPI/OMP cores?
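The check points on this slide can be walked through with a few lines of Python. The sketch below uses the timer values from the profile on this slide; the function names and the 50% threshold for calling “energy” the bottleneck are assumptions made for illustration, not rules from GENESIS.

```python
# Averaged times in seconds, taken from the timer profile on this slide.
AVG = {
    "dynamics": 80.439,
    "energy": 62.515,
    "pme real": 41.143,
    "pme recip": 9.215,
    "constraint": 1.884,
    "comm_coord": 0.994,
    "comm_force": 2.955,
    "comm_migrate": 0.115,
}

# (min, max) over processes for the same profile.
MIN_MAX = {
    "pme real": (36.091, 43.540),
    "pme recip": (8.017, 11.668),
    "constraint": (1.613, 2.082),
}


def find_bottleneck(energy_threshold=0.5):
    """Follow the check points: energy first, then its largest part.

    energy_threshold is a hypothetical cut-off for the share of
    "dynamics" time spent in "energy".
    """
    energy_share = AVG["energy"] / AVG["dynamics"]
    if energy_share >= energy_threshold:
        part = max(["pme real", "pme recip"], key=lambda name: AVG[name])
    else:
        communication = AVG["comm_coord"] + AVG["comm_force"] + AVG["comm_migrate"]
        part = "constraint" if AVG["constraint"] > communication else "communication"
    return energy_share, part


def max_min_ratio(name):
    """Ratio of the slowest to the fastest process for one timer."""
    fastest, slowest = MIN_MAX[name]
    return slowest / fastest


share, part = find_bottleneck()
print(f"energy share of dynamics: {share:.1%}, largest part: {part}")
print(f"max/min over processes for {part}: {max_min_ratio(part):.2f}")
```

For this profile, “energy” takes roughly three quarters of the dynamics time and the real-space PME part dominates it. Repeating the same analysis for each MPI/OMP set shows whether the bottleneck moves.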
Other check points
• Please run the benchmark of a few sets two or three times.
• If the differences in execution time between processes (the min and max values in the timer profile) are too large (> 3 times), the machine may have hardware/network trouble.
• If the simulation is too slow although the times in the log file are not slow, the HDD or the network may have trouble.
• Performance drops when you set a small rstout_period (< 1000).
• If Real >> Recip, “respa (elec_long_period > 1)” may not be efficient.
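The rstout_period check can be automated before a job is submitted. The Python sketch below searches a control file for the option and compares it with the threshold of 1000 from this slide. The helper names and the sample control text are hypothetical, and treating a value of 0 as "restart output disabled" is an assumption.

```python
import re

# Hypothetical fragment of a control file, for illustration only.
SAMPLE_CONTROL = """\
[DYNAMICS]
rstout_period    = 500
elec_long_period = 2
"""


def read_int_option(control_text, key):
    """Return the integer value of "key = value", or None if absent."""
    pattern = rf"^\s*{re.escape(key)}\s*=\s*(\d+)"
    match = re.search(pattern, control_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def rstout_warning(control_text, threshold=1000):
    """Return a warning string if rstout_period is small, else None.

    Assumption: a period of 0 means that no restart file is written.
    """
    period = read_int_option(control_text, "rstout_period")
    if period is not None and 0 < period < threshold:
        return f"rstout_period = {period} is below {threshold}; expect a performance drop"
    return None


print(rstout_warning(SAMPLE_CONTROL))
print("elec_long_period =", read_int_option(SAMPLE_CONTROL, "elec_long_period"))
```

The same read_int_option helper can report elec_long_period, which is worth comparing against the Real and Recip times in the timer profile.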
Section 4 Troubleshooting
When you run into trouble with GENESIS
Please check the following points before consulting the developers.
1. Please read your log files carefully and find out which part of the calculation/compilation failed.
A) Compile: configure log (‘config.log’) and compiler messages
B) Calculation: outputs, script.$(jobid).out, and script.$(jobid).stats
2. Check the parameters in your control file and the simulation conditions related to your error logs.
3. You do not need to read the source code, though.
Other check points (1)
• Did you check the GENESIS website & manual carefully? Your parameters and/or usage may not be allowed.
• Do you use recent source code? Your problem might be fixed in the recent version.
• Do you use Fugaku properly? You can check the usage on the portal site. (English documents are available.)
• Did you select the proper development environment and binary? In many cases, an old binary does not work in newer environments. The Fugaku administrators suggest re-compiling your code when the environment is updated.
Other check points (2)
• Is the memory usage less than 28 GiB? A Fugaku node has 32 GiB of memory; however, only ~28 GiB can be used in a calculation. Please check your memory usage (MAX MEMORY SIZE (USE)) in the ‘stats’ file.
• Did the job exit within the calculation time written as “elapse=” in the script?
• Did you set the correct shape of nodes ("node=lxnxm", l, n, m = numbers) for the current development environment (in particular, when using multiple replicas)? The node shape is changed frequently. Please check the current node shape on the portal site.
• Please try the job again. Fugaku is also under development. A job sometimes fails for an unknown reason.
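The memory and elapsed-time checks are simple comparisons, sketched below in Python. The function names are hypothetical. The 28 GiB limit is the figure from this slide, and the elapse format is assumed to be [[hour:]minute:]second, which matches the "20:00" and "00:30:00" values used in the job scripts of this talk. How to extract MAX MEMORY SIZE (USE) from the ‘stats’ file is left out, since its exact layout is not shown here.

```python
USABLE_MEMORY_GIB = 28  # usable part of the 32 GiB, according to this slide


def fits_in_memory(max_memory_gib):
    """True if the reported maximum memory use stays under the usable limit."""
    return max_memory_gib < USABLE_MEMORY_GIB


def elapse_to_seconds(limit):
    """Convert an "elapse=" value to seconds.

    Assumption: the format is [[hour:]minute:]second.
    """
    seconds = 0
    for field in limit.split(":"):
        seconds = seconds * 60 + int(field)
    return seconds


def fits_in_elapse(estimated_seconds, limit):
    """True if the estimated run time is shorter than the elapse limit."""
    return estimated_seconds < elapse_to_seconds(limit)


print(elapse_to_seconds("20:00"))     # limit of the example job script
print(fits_in_elapse(1500, "20:00"))  # hypothetical 1500 s estimate
print(fits_in_memory(30.5))           # hypothetical 30.5 GiB usage
```

An estimate that only just fits is risky, because setup time and file output add to the time of the main MD loop.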
How to contact us
About GENESIS: GENESIS forum (we have two BBS rooms: English & Japanese)
About Fugaku (for users):
• HPCI: helpdesk_at_hpci-office.jp
• Others: r-ccs-ungi-support_at_riken.jp
Questions in the user briefing (held every month) are welcome.