MUCOSIM WS 2020/2021 (SEE ALSO UNIVIS) - RRZE MOODLE
MuCoSim Seminar WS 2020/21

Time & place
• Monday 4pm – 5:30pm
• Zoom meetings until an in-person seminar is allowed
• Updates: Moodle

Requirements
• 1+1 talks
• 1 written report

What you get
• 5 ECTS credits
• Invaluable insights :-)

Mission
Optimization & Parallelization on all modern compute architectures
→ Benchmarking & Performance Measurement
→ Understand interaction between code & hardware
This is ours!
→ Performance modelling: Roofline model & ECM model
→ Performance tools:
  • likwid: lightweight performance tools (https://github.com/RRZE-HPC/likwid)
  • kerncraft: Loop Kernel Analysis & Performance Modeling Toolkit (https://github.com/RRZE-HPC/kerncraft)
MuCoSim SS 2020

What to do in the seminar: two groups of projects

Performance Measurement, Analysis and Optimization
• Get familiar with some code (C/C++/Fortran)
• Carefully measure and report (performance) numbers for (various) modern compute device(s)
• Implement (small) code modifications and measure their impact
• Build a (simple) performance model if necessary/possible

Performance Tools (likwid, kerncraft, OSACA)
• Analyse and/or extend the feature set of the tools
• Compare with other tools

What we expect
• Basic knowledge of C, C++, or Fortran
• Basic knowledge of Linux shell usage incl. editing
• Basic knowledge of OpenMP and/or MPI parallelization (some projects)
• Basic knowledge of Python or some other capable scripting language
• Nice, but not strictly required: PTfS lecture (summer term)
• You need to actively participate in two hands-on sessions where you learn
  • how to access and use our machines,
  • how to compile and run a code,
  • how to use our benchmarking and analysis tool likwid

Coupled oscillators as a model for parallel execution (Georg Hager)
• Synchronization phenomena with coupled oscillators are an intensely studied subject
  $\dot{\theta}_i = \omega_i + \frac{K}{N} \sum_{j=1}^{N} \sin(\theta_j - \theta_i)$
  https://en.wikipedia.org/wiki/Kuramoto_model
• Parallel, communicating processes can be modeled as coupled oscillators
  • Compute-communicate phases are like oscillation
  • Communication acts as coupling
MuCoSim WS 2020/2021 16.11.20

Coupled oscillators as a model for parallel execution cont'd (Georg Hager)
• Synchronization and desynchronization play important roles in parallel computing
• Task: Simulate a (modified) Kuramoto model and adjust parameters to mimic parallel execution of coupled processes

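As a starting point for the task, the standard (unmodified) Kuramoto model from the linked Wikipedia article can be integrated with a few lines of Python. This is only a minimal sketch under stated assumptions: explicit Euler time stepping, all-to-all coupling, and Gaussian-distributed natural frequencies. The function names, parameter values, and the choice of integrator are illustrative and not part of the seminar material.

```python
import math
import random


def kuramoto_step(theta, omega, coupling, dt):
    """Advance all phases by one explicit Euler step of the Kuramoto model."""
    n = len(theta)
    new_theta = []
    for i in range(n):
        # All-to-all coupling term: sum over sin(theta_j - theta_i).
        interaction = sum(math.sin(theta[j] - theta[i]) for j in range(n))
        new_theta.append(theta[i] + dt * (omega[i] + coupling / n * interaction))
    return new_theta


def order_parameter(theta):
    """Return r in [0, 1]; r close to 1 means the oscillators are in phase."""
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)


def simulate(n=20, coupling=2.0, dt=0.01, steps=2000, seed=0):
    """Run the model from random initial phases and return the final r."""
    rng = random.Random(seed)
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    # Assumed frequency distribution: mean 1.0, standard deviation 0.1.
    omega = [rng.gauss(1.0, 0.1) for _ in range(n)]
    for _ in range(steps):
        theta = kuramoto_step(theta, omega, coupling, dt)
    return order_parameter(theta)


if __name__ == "__main__":
    for k in (0.0, 0.5, 2.0):
        print(f"K = {k}: r = {simulate(coupling=k):.3f}")
```

Comparing the order parameter for weak and strong coupling shows the transition between desynchronized and synchronized behaviour. A modified model for parallel programs would presumably replace the all-to-all sine coupling with a term that reflects the actual communication pattern.
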
Analyze dense matrix-vector multiplication (Thomas Gruber)
• Dense MVM is a common operation in HPC
• Often part of HPC courses
Task:
• Establish simple performance model(s) for dMVM
• Perform hardware measurements using LIKWID on different CPUs
• Compare results with model and make refinements
• Propose optimizations for naïve algorithm

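To make the modelling step concrete, here is a hedged sketch of the naïve kernel together with a Roofline-style upper bound. It is written in Python for readability only; the actual project code would be C, C++, or Fortran. The traffic assumption (each double-precision matrix element is loaded once from memory, while the vectors stay in cache or registers) and the machine numbers are illustrative assumptions, not measured values.

```python
def dmvm(matrix, x):
    """Naive dense matrix-vector product y = A x (A given as a list of rows)."""
    y = []
    for row in matrix:
        acc = 0.0
        for a_rc, x_c in zip(row, x):
            acc += a_rc * x_c  # 1 multiply + 1 add = 2 flops per matrix element
        y.append(acc)
    return y


def roofline_limit(peak_flops, mem_bandwidth, bytes_per_flop):
    """Roofline bound in flop/s: min(P_peak, b_S / B_c)."""
    return min(peak_flops, mem_bandwidth / bytes_per_flop)


# Assumed code balance: 8 bytes (one double of A) per 2 flops = 4 byte/flop.
CODE_BALANCE = 8.0 / 2.0

if __name__ == "__main__":
    print(dmvm([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))
    # Hypothetical machine: 100 Gflop/s peak, 40 GB/s memory bandwidth.
    print(roofline_limit(1.0e11, 4.0e10, CODE_BALANCE))
```

Under these assumptions the kernel is memory bound, so the model predicts performance proportional to the memory bandwidth. Measured deviations (for example when the matrix fits into cache) are where the refinements asked for in the task come in.
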
Analyze branch prediction systems of modern architectures (Thomas Gruber)
• Common codes contain a lot of conditions → branches
• CPUs try to predict the outcome to speculatively execute code sections
• CPUs provide measurement facilities for branching
Task:
• Investigate in how much detail branching can be analyzed
• How do mispredictions limit code execution (stalls, pipeline drains, …)?

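For intuition about why some branch patterns are cheap and others expensive, the following sketch simulates a textbook 2-bit saturating counter. This is an assumption made purely for illustration: the predictors in modern CPUs are far more elaborate and largely undocumented, which is part of what the project is meant to explore with hardware counters.

```python
def mispredictions(outcomes, state=2):
    """Count mispredictions of one 2-bit saturating counter.

    States 0..3; the branch is predicted taken when state >= 2.
    `outcomes` is a sequence of booleans (True = branch taken).
    """
    misses = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            misses += 1
        # Move the counter towards the observed outcome, saturating at 0 and 3.
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses


if __name__ == "__main__":
    loop_branch = ([True] * 9 + [False]) * 10  # loop with 10 iterations, run 10 times
    alternating = [True, False] * 50           # data-dependent, alternating branch
    print("loop branch:", mispredictions(loop_branch), "of", len(loop_branch))
    print("alternating:", mispredictions(alternating), "of", len(alternating))
```

In this toy model a regular loop branch is mispredicted only at loop exit, whereas an alternating branch is mispredicted on every second execution.
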
HPCG (Christie L. Alappat)
HPCG is a supercomputing benchmark (https://www.hpcg-benchmark.org) used to rank the world's most powerful supercomputers. The benchmark solves a linear system of equations using a multigrid-preconditioned conjugate-gradient (CG) algorithm.
• Experiment with the SpMV kernel and do a layer condition analysis of the HPCG matrix in CRS format. (Code given)
• Understand the SymGS kernel and its dependency problem, and implement level scheduling to parallelise the code. (Code for level scheduling will be given.)
• Now try to vectorize the SymGS code with the level scheduling scheme.
• If successful, run the entire analysis on the world's most powerful CPU (A64FX).

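The two kernels mentioned above can be sketched as follows. This is not the code handed out in the seminar: it is a minimal Python illustration of SpMV on a matrix in CRS format and of computing levels for the forward sweep of a Gauss-Seidel iteration, using a small hypothetical 4x4 matrix (5-point stencil on a 2x2 grid) instead of the HPCG matrix.

```python
def spmv_crs(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x with A stored in CRS format."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def forward_sweep_levels(row_ptr, col_idx):
    """Assign each row a level for the forward Gauss-Seidel sweep.

    A row depends on every earlier row j < i with a nonzero a_ij, so
    level[i] = 1 + max(level[j]). Rows in the same level are independent
    and can be processed in parallel.
    """
    n_rows = len(row_ptr) - 1
    level = [0] * n_rows
    for row in range(n_rows):
        for k in range(row_ptr[row], row_ptr[row + 1]):
            col = col_idx[k]
            if col < row:
                level[row] = max(level[row], level[col] + 1)
    return level


# Hypothetical example: 5-point stencil on a 2x2 grid (diagonal 4, off-diagonal -1).
ROW_PTR = [0, 3, 6, 9, 12]
COL_IDX = [0, 1, 2, 0, 1, 3, 0, 2, 3, 1, 2, 3]
VALUES = [4.0, -1.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0, -1.0, -1.0, -1.0, 4.0]

if __name__ == "__main__":
    print(spmv_crs(ROW_PTR, COL_IDX, VALUES, [1.0] * 4))
    print(forward_sweep_levels(ROW_PTR, COL_IDX))
```

In the example, rows 1 and 2 end up in the same level because both depend only on row 0. The backward sweep of SymGS would need the analogous computation with the dependency direction reversed.
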
Study the caching behaviour of Intel YASK code (Christie L. Alappat)
YASK is a stencil DSL framework developed by Intel. YaskSite is an in-house library built on top of YASK to support performance modeling of YASK-generated stencils. The topic concerns analyzing YaskSite's performance model using the pycachesim cache simulator.
Pic source: https://software.intel.com/en-us/articles/eight-optimizations-for-3-dimensional-finite-difference-3dfd-code-with-an-isotropic-iso
• Understand the pycachesim and YASK interfaces and couple them.
• Test the deviations from the analytical predictions for different star-shaped stencils, especially long-range ones.
• If they deviate, identify which component is missing in the analytical model.
• Study the impact of spatial and temporal blocking.

Porting an MD force kernel to CUDA (Jan Eitzinger)
• Target code: MD-Bench (in-house Mini-App)
• Sequential C re-implementation of Mantevo Mini-MD
• Fewer than 1000 lines of code
Task
• Port the force calculation kernel to GPU using CUDA
• Optional: Profile and analyse performance
• Optional: Optimize the performance
https://github.com/RRZE-HPC/MD-Bench

Performance Tools
Analyze MiniApps and Kernels with Intel Advisor (Georg Hager)
Intel Advisor provides insights into hardware utilization of applications and advice for code optimization, including a roofline analysis (wow!)
1. Get familiar with the tool
2. Analyze several (existing) kernels and applications
3. Compare Intel results with existing performance models and knowledge about bottlenecks

Adding and testing PAPI to likwid-bench (Thomas Gruber)
• PAPI provides an abstraction layer for various measurement facilities (e.g. hardware performance counters)
• likwid-bench is a micro-benchmarking suite with assembly kernels
Task:
• Add PAPI calls to likwid-bench for common measurement groups (L2, L3, FLOPS_DP, FLOPS_SP, …)
• Compare measurements of PAPI with LIKWID measurements

OSACA for Raspberry Pi 4 (Julian Hammer)
• Create an OSACA in-core execution model and validate it for the ARM Cortex-A72 architecture
Software and techniques involved:
• assembly, OSACA, asmbench, ibench, Python

Validation Fuzzer for OSACA with asmbench (Julian Hammer)
• Create random benchmarks with fuzzing techniques using the asmbench tool
• Compare results with IACA, OSACA and LLVM-MCA
Software and techniques involved:
• Python, fuzzing, llvm, llvm-ir, assembly, git

Kernel Explorer (Julian Hammer)
• Build a website where Kerncraft can be used in the browser
• Think compiler explorer (godbolt.org), but with Kerncraft
Software and techniques involved:
• Python, $webframework (e.g., django, flask), JS, HTML, CSS, Docker

Performance Analysis with Paraver (Ayesha Afzal)
Paraver: offline trace analysis tool (timelines, 2D/3D tables, statistics)
Dimemas: message passing simulator
Extrae: instrumentation
Tasks
• First talk: Getting familiar with Paraver and tool exploration with simpler test cases
  • Downloads: sources / binaries, Linux / Windows / Mac
  • Documentation: training guides, tutorial slides
• Second talk: Analysis of composite distributed applications with tool-provided features
  • Analyzing variability: time, IPC, instructions, cache miss ratio, …
  • Trace manipulation: filtering, cutting, …
  • Play around with latency and bandwidth parameters: network sensitivity, ideal machine, …
  • Through clustering: identify structure, track scalability, …
  • …
Required skills
• Basic knowledge of C/C++ and code parallelization with MPI
Provided material
• MPI-parallelized benchmarks and algorithms (e.g., spMvM (irregular matrices), Jacobi (regular), ray tracer (load imbalances), etc.)
Parallel efficiency = LB eff * Comm eff
Parallel efficiency refinement: LB * μLB * Tr

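The first efficiency formula can be illustrated with a small calculation. The sketch below assumes the commonly used definitions in which load balance efficiency is the average useful compute time divided by the maximum, and communication efficiency is the maximum useful compute time divided by the total runtime. These definitions, the function name, and the sample timings are assumptions for illustration and are not taken from the slide; in practice the tools derive the times from traces.

```python
def parallel_efficiency(compute_times, runtime):
    """Return (LB eff, Comm eff, parallel efficiency) for one run.

    compute_times: useful computation time of each process in seconds
    runtime:       total wall-clock runtime of the parallel run in seconds
    """
    average = sum(compute_times) / len(compute_times)
    longest = max(compute_times)
    lb_eff = average / longest    # 1.0 means perfectly balanced work
    comm_eff = longest / runtime  # 1.0 means no time lost outside computation
    return lb_eff, comm_eff, lb_eff * comm_eff


if __name__ == "__main__":
    # Hypothetical timings for four MPI processes.
    lb, comm, par = parallel_efficiency([8.0, 6.0, 4.0, 6.0], runtime=10.0)
    print(f"LB eff = {lb:.2f}, Comm eff = {comm:.2f}, parallel eff = {par:.2f}")
```

With these definitions the product reduces to the average useful compute time divided by the runtime, which makes a quick sanity check possible.
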
Open Talks from last semester

Student (Tutor)               Topic (state)
Michael Holzmann (Hager)      Stencils on Tsubasa (2nd talk pending)
Maniranam (Dominik Ernst)     Modern Languages (1. Talk)
Maniranam (Dominik Ernst)     Modern Languages (2. Talk)
Matthias König (TG)           Dense matrix transpose (2. Talk)
Ravi Chandra (TG)             Threading models (1. Talk, 30.11.)
Ravi Chandra (TG)             Threading models (2. Talk)

Time schedule

Date          Topic
23.11.2020    Kerncraft on RasPi4 (T. Auerochs, 2. talk)
27.11.2020    Mandatory Intro Hands-On (Part 1)
30.11.2020    Threading models in modern programming languages (R. Chandra, 1. talk)
07.12.2020    Mandatory Intro Hands-On (Part 2)