ADVANCED COMPUTER ARCHITECTURES - Polimi
088949 – ADVANCED COMPUTER ARCHITECTURES
Academic Year 2018/2019 – Second Semester
http://home.deib.polimi.it/silvano/aca-milano.htm
Prof. Cristina Silvano, email: cristina.silvano@polimi.it
Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Politecnico di Milano
Goals of the ACA course
• Provide an overview of the most recent and advanced computer architectures
• Introduce the basic microarchitectural mechanisms found in modern microprocessor architectures
• Provide the reasoning behind the adoption of advanced computer architectures
Cristina Silvano – Politecnico di Milano
Advanced Computer Architectures: Supercomputers
The first supercomputer to reach Petascale performance (10^15 FLOPS) was IBM Roadrunner, installed in 2008 at Los Alamos National Laboratory (New Mexico). Research on supercomputing is now pushing towards Exascale (10^18 FLOPS, i.e., a billion billion operations per second), expected to be reached in 2023.
How to measure performance: FLOPS, Floating Point Operations per Second
Name          FLOPS
zettaFLOPS    10^21
exaFLOPS      10^18
petaFLOPS     10^15
teraFLOPS     10^12
gigaFLOPS     10^9
megaFLOPS     10^6
kiloFLOPS     10^3
FLOPS         1
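Converting a raw FLOPS figure into the prefixes listed above is simple arithmetic on powers of 10^3. The Python sketch below is only an illustration; the function name `format_flops` is hypothetical and not part of any standard tool.

```python
# SI prefixes used for FLOPS, largest first (each step is a factor of 10^3).
PREFIXES = [
    (1e21, "zettaFLOPS"),
    (1e18, "exaFLOPS"),
    (1e15, "petaFLOPS"),
    (1e12, "teraFLOPS"),
    (1e9, "gigaFLOPS"),
    (1e6, "megaFLOPS"),
    (1e3, "kiloFLOPS"),
]


def format_flops(flops: float) -> str:
    """Express a raw FLOPS value with the largest prefix that keeps it >= 1."""
    for scale, name in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:.1f} {name}"
    return f"{flops:.1f} FLOPS"


print(format_flops(143.5e15))  # 143.5 petaFLOPS
```

For example, Summit's 143.5 × 10^15 FLOPS Linpack result prints as `143.5 petaFLOPS`.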
TOP500 List
• The TOP500 list ranks the world's most powerful supercomputers.
• The LINPACK Benchmark (introduced by Jack Dongarra) is used to measure a system's floating-point computing power.
• LINPACK measures how fast a computer solves a dense n-by-n system of linear equations Ax = b, a common task in engineering.
www.top500.org
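To make the LINPACK idea concrete, here is a minimal pure-Python sketch, an illustration under stated assumptions rather than the actual benchmark: it solves a random dense Ax = b by Gaussian elimination with partial pivoting, then divides the conventional LINPACK operation count, 2n^3/3 + 2n^2, by the elapsed time. The names `solve` and `linpack_style_run` are hypothetical. Official TOP500 runs use the optimized HPL code with tuned BLAS libraries, so the GFLOPS printed here only show how the metric is computed, not what the hardware can achieve.

```python
import random
import time


def solve(a, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    a = [row[:] for row in a]  # work on copies, leave the inputs untouched
    b = b[:]
    for k in range(n):
        # Partial pivoting: swap in the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_style_run(n, seed=0):
    """Time one dense n-by-n solve; return (GFLOPS, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve(a, b)
    elapsed = max(time.perf_counter() - start, 1e-12)
    # Conventional LINPACK operation count for the factorization plus solve.
    flops = 2.0 * n**3 / 3.0 + 2.0 * n**2
    # Sanity check: how well does the computed x satisfy A x = b?
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return flops / elapsed / 1e9, residual


if __name__ == "__main__":
    gflops, residual = linpack_style_run(100)
    print(f"{gflops:.3f} GFLOPS, max residual {residual:.2e}")
```

The residual check mirrors the spirit of the benchmark: a fast answer only counts if it actually solves the system.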
The US Dept. of Energy announced the Summit and Sierra supercomputers
Top500 ranking of the world’s most powerful supercomputers (Nov. 2018)
1. Summit, the IBM-built supercomputer at the Dept. of Energy’s Oak Ridge National Lab, reached 143.5 PetaFLOPS Linpack performance with 2,397,824 cores: IBM Power9 22C processors at 3.1 GHz, NVIDIA Volta GV100 GPUs, Mellanox dual-rail EDR InfiniBand network.
2. Sierra, the IBM-built supercomputer at DOE’s Lawrence Livermore National Lab, reached 94.6 PetaFLOPS with 1,572,480 cores. Sierra is quite similar to Summit, with IBM Power9 22C processors at 3.1 GHz, NVIDIA Volta GV100 GPUs and a Mellanox dual-rail EDR InfiniBand network.
3. Sunway TaihuLight, developed by China’s National Research Center of Parallel Computer and installed at the National Supercomputing Center in Wuxi, reached 93 PetaFLOPS with 10,649,600 cores. It dropped to No. 3 after leading the list for the previous 2 years.
www.top500.org
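A quick derived figure from this list is the average Linpack throughput per core. The sketch below uses only the Rmax values and core counts quoted above; it gives roughly 60 GFLOPS per core for the GPU-accelerated Summit and Sierra against under 9 GFLOPS per core for Sunway TaihuLight. Treat it as a rough indicator only: the reported core counts lump together very different kinds of cores, so they are not strictly comparable across systems. The helper name `gflops_per_core` is hypothetical.

```python
# Rmax (FLOPS) and total cores as quoted for the Nov. 2018 TOP500 list.
SYSTEMS = {
    "Summit": (143.5e15, 2_397_824),
    "Sierra": (94.6e15, 1_572_480),
    "Sunway TaihuLight": (93e15, 10_649_600),
}


def gflops_per_core(name):
    """Average Linpack throughput per reported core, in GFLOPS."""
    rmax, cores = SYSTEMS[name]
    return rmax / cores / 1e9


for name in SYSTEMS:
    print(f"{name:18s} {gflops_per_core(name):6.1f} GFLOPS/core")
```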
Top500 ranking: the most powerful Italian public supercomputer (Nov. 2018)
No. 19 in the Top500 and No. 4 in Europe: Marconi Intel Xeon Phi, with 10.38 PetaFLOPS (Linpack performance) and 18.8 PetaFLOPS (peak performance) on 348,000 cores.
Site: Casalecchio di Reno, Bologna (Italy)
Marconi is Cineca's Tier-0 system, co-designed by Cineca and Lenovo on the Lenovo NeXtScale platform and the Intel® Xeon Phi™ product family, alongside Intel® Xeon® processors and the Intel Omni-Path interconnect.
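The gap between Linpack and peak performance is usually expressed as Linpack efficiency, Rmax/Rpeak. A minimal sketch with the Marconi figures above (the helper name `linpack_efficiency` is hypothetical) gives about 55%, i.e., roughly half of the theoretical peak is sustained on the benchmark.

```python
def linpack_efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually sustained on Linpack."""
    if rpeak <= 0:
        raise ValueError("peak performance must be positive")
    return rmax / rpeak


# Marconi figures from the slide, both in PetaFLOPS.
eff = linpack_efficiency(10.38, 18.8)
print(f"Marconi Linpack efficiency: {eff:.1%}")
```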
Energy efficiency underlies all markets
Energy efficiency is of paramount importance for all application markets (automotive, consumer, mobile, healthcare and beyond) and for target systems ranging from sensors, cyber-physical systems and embedded systems up to servers and HPC systems.
The Green500 List
The Green500 list ranks the top 500 supercomputers in the world by energy efficiency (performance per watt), for sustainable supercomputing. The inaugural Green500 list was announced in Nov. 2007.
Currently, the top positions of the Green500 are occupied by heterogeneous computing systems (host processor + coprocessor): the top three positions in June 2018 are taken by supercomputers installed in Japan and based on the ZettaScaler-2.2 architecture with PEZY-SC2 accelerators, while the other top 10 systems use NVIDIA GPUs.
Heterogeneous computing is becoming the dominant trend for reaching the target of a 20 MW Exascale supercomputer.
www.top500.org/green500/
Exascale Supercomputers
The greenest supercomputer in the Green500 (June 2018) is Shoubu system B, with 18.4 GigaFLOPS/W during its 858 TeraFLOPS Linpack run (ranked No. 359 in the Top500).
IBM Summit, equipped with NVIDIA Volta GV100 GPUs, was ranked No. 1 in the Top500 (June 2018) with 122.3 PetaFLOPS Linpack and No. 5 in the Green500 with 13.8 GigaFLOPS/W.
To reach a 20 MW Exascale supercomputer, projected for 2021, current supercomputers need about a 4x improvement in energy efficiency, from around 13 towards 50 GFLOPS/W.
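The 20 MW target and the 50 GFLOPS/W figure are linked by a single division: efficiency = performance / power. The sketch below (hypothetical helper names, figures taken from the slide) recovers 50 GFLOPS/W for 10^18 FLOPS in 20 MW, and shows that at Summit's 13.8 GFLOPS/W an Exascale machine would draw roughly 72 MW.

```python
def power_megawatts(flops, gflops_per_watt):
    """Power (MW) needed to sustain `flops` at a given energy efficiency."""
    return flops / (gflops_per_watt * 1e9) / 1e6


def required_gflops_per_watt(flops, budget_megawatts):
    """Energy efficiency (GFLOPS/W) needed to sustain `flops` within a power budget."""
    return flops / (budget_megawatts * 1e6) / 1e9


EXAFLOPS = 1e18
print(required_gflops_per_watt(EXAFLOPS, 20))  # 50.0
for eff in (13.8, 18.4):
    print(f"{eff} GFLOPS/W -> {power_megawatts(EXAFLOPS, eff):.1f} MW for 1 exaFLOPS")
```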
Applications driving the demand for more computing performance: climate, astrophysics, biology, business analytics
Performance Trend (figure; source: Jack Dongarra, U. of Tennessee, Oak Ridge National Lab, U. of Manchester)
Performance of HPC over the last years from the Top500 (figure; source: Jack Dongarra, U. of Tennessee, Oak Ridge National Lab, U. of Manchester)
Advanced Computer Architectures: Intel® Core™ i7-3770T Processor
# of Cores                     4
# of Threads                   8
Clock Speed                    2.5 GHz
Max Turbo Frequency            3.7 GHz
Intel® Smart Cache             8 MB
Instruction Set                64-bit
Instruction Set Extensions     SSE4.1/4.2, AVX
Embedded Options Available     No
Lithography                    22 nm
Max TDP                        45 W
Recomm. Customer Price         TRAY: $294.00
Max Memory Size                32 GB
Memory Types                   DDR3-1333/1600
# of Memory Channels           2
Max Memory Bandwidth           25.6 GB/s
Die: 160 mm² at 22 nm, 1.40 billion transistors.
Next generations: Broadwell, Skylake, Kaby Lake at 14 nm (2014); Cannonlake at 10 nm (2H 2017); Ice Lake at 10 nm (2018).
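The 25.6 GB/s maximum memory bandwidth in the specification above follows from the memory configuration: peak bandwidth = channels × transfer rate × bytes per transfer. The sketch assumes the standard 64-bit (8-byte) DDR3 channel width, which the slide does not state; the function name is hypothetical.

```python
def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 10^9 bytes).

    bytes_per_transfer=8 assumes a standard 64-bit DDR3 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


print(peak_dram_bandwidth_gbs(2, 1600))  # DDR3-1600, 2 channels: 25.6
print(peak_dram_bandwidth_gbs(2, 1333))  # DDR3-1333, 2 channels
```

With the slower DDR3-1333 option the same formula gives about 21.3 GB/s, which is why the table quotes the maximum only.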
NVIDIA Fermi GPU
NVIDIA Kepler GPU
Kepler GK110 Architecture
• 7.1B transistors
• 15 SMX units (2880 cores)
• >1 TFLOP FP64
• 1.5 MB L2 cache
• 384-bit GDDR5
• PCI Express Gen3
NVIDIA Tesla P100 with Pascal GP100 GPU
NVIDIA Tesla GP100 (Pascal) with 3584 FP32 CUDA cores/GPU and 1792 FP64 CUDA cores/GPU, for a total of 15.3 billion transistors in 16 nm FinFET process technology.
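The theoretical peak of a GPU like the P100 can be estimated as number of units × clock frequency × floating-point operations per unit per cycle. This sketch assumes one fused multiply-add (counted as 2 FLOPs) per CUDA core per cycle and a boost clock of about 1.48 GHz; neither value appears on the slide, so treat the result (roughly 10.6 TFLOPS FP32 and 5.3 TFLOPS FP64) as an estimate. The helper name `peak_flops` is hypothetical.

```python
def peak_flops(units, clock_hz, flops_per_unit_per_cycle=2):
    """Theoretical peak = execution units x clock x FLOPs per unit per cycle.

    The default of 2 counts one fused multiply-add (FMA) as two operations.
    """
    return units * clock_hz * flops_per_unit_per_cycle


# Assumption: ~1.48 GHz boost clock (not stated on the slide).
BOOST_CLOCK_HZ = 1.48e9
fp32 = peak_flops(3584, BOOST_CLOCK_HZ)
fp64 = peak_flops(1792, BOOST_CLOCK_HZ)
print(f"FP32 peak ~{fp32 / 1e12:.1f} TFLOPS, FP64 peak ~{fp64 / 1e12:.1f} TFLOPS")
```

The 2:1 ratio between FP32 and FP64 peak comes directly from the 3584 vs. 1792 unit counts on the slide.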
Advanced Computer Architectures: Smart Phones
iPhone 6s / iPhone 6s Plus: 4.7-inch / 5.5-inch Retina HD display with 3D Touch, 12MP camera, 5MP video camera, A9 64-bit chip, M9 coprocessor, iOS 10, 32GB / 128GB
iPhone 7 / iPhone 7 Plus: 4.7-inch / 5.5-inch Retina HD display with 3D Touch, new 12MP camera (iPhone 7 Plus: new 12MP camera ++), 7MP video camera, A10 Fusion 64-bit chip, M10 coprocessor, waterproof, stereo audio, iOS 10, 32GB / 128GB / 256GB
Apple A9 System-on-Chip
Apple A9 is a 64-bit ARM-based SoC introduced in Sept. 2015 for the iPhone 6S and iPhone 6S Plus. Apple states that it delivers 70% more CPU performance and 90% more graphics performance than its predecessor, the A8. It is one of the most powerful mobile chips on the market today, along with the Samsung Exynos 8890 and the Qualcomm Snapdragon 820. The A9 features the Apple-designed 64-bit 1.85 GHz ARMv8-A dual-core CPU, called Twister, and an integrated PowerVR Series 7XT GT7600 six-core GPU. The A9 is manufactured by two companies: Samsung, in a 14 nm FinFET process, and TSMC, in a 16 nm FinFET process. The A9 has 2 GB of LPDDR4 RAM included in the package. Apple also introduced a variant of the A9, the A9X, in the iPad Pro, with the M9 motion coprocessor embedded in it.
Apple A10 Fusion
Apple A10 Fusion is a 64-bit ARM-based SoC designed by Apple and introduced in Sept. 2016 for the iPhone 7 and iPhone 7 Plus. Apple states that it delivers 40% more CPU performance and 50% more graphics performance than its predecessor, the A9. The A10, with a die area of 125 mm² and 3.3 billion transistors (including GPU and cache), features two Apple-designed 64-bit 2.34 GHz ARMv8-A cores, called Hurricane, and two energy-efficient 64-bit cores, codenamed Zephyr (similar to ARM big.LITTLE technology). The A10 integrates a newly designed PowerVR Series 7XT GT7600 six-core GPU. The A10 is manufactured in TSMC's 16 nm FinFET process.
Squeezing of computing cores (figure: an ARM9 core of 1.4 mm² at 65 nm in 2005, shrinking at 45 nm (2007), 32 nm (2009), 22 nm (2011) and 14 nm (2013); source: STMicroelectronics)
… entering the multi/many-core era (figure: ARM9 core scaling from 1.4 mm² at 65 nm in 2005 through 45, 32, 22 and 14 nm in 2007-2013; source: STMicroelectronics)
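Under ideal scaling, every linear dimension of a layout shrinks with the feature size, so core area shrinks with the square of the node ratio. The sketch below applies that rule to the 1.4 mm² ARM9 core at 65 nm. It is an idealization (an assumption, not the figure's actual data): node names stopped tracking physical dimensions precisely in recent generations, so real cores shrink less. Each ideal step, for example 65 nm to 45 nm with a density gain of about 2.1x, roughly matches the "~2x every 2 years" density trend. The function name is hypothetical.

```python
def ideal_scaled_area(area_mm2, from_nm, to_nm):
    """Ideal shrink: linear dimensions scale with the feature size,
    so area scales with its square."""
    return area_mm2 * (to_nm / from_nm) ** 2


# Starting point from the slide: ARM9 core, 1.4 mm^2 at 65 nm (2005).
for year, node_nm in [(2007, 45), (2009, 32), (2011, 22), (2013, 14)]:
    area = ideal_scaled_area(1.4, 65, node_nm)
    print(f"{year}: {node_nm} nm -> ~{area:.3f} mm^2 "
          f"({1.4 / area:.1f}x cores in the original 1.4 mm^2)")
```

The second column shows why shrinking cores leads to the multi/many-core era: the same silicon area can host more and more copies of the core.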
What are the barriers to further scaling?
Transistor density increases ~2x every 2 years, but we face:
• Frequency wall
• Power wall
• Utilisation wall
… the end of Dennard scaling … increasing power densities … entering the dark silicon era
The dark silicon problem
The power wall and the utilisation wall are the main barriers to efficient scaling in the multi/many-core era.
Dark silicon: the fraction of the die that cannot be used (powered on at the same time) because of the power budget.
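A first-order model makes the utilisation wall concrete. Dynamic power is roughly P ∝ N·C·V²·f. With a linear scaling factor s per generation, the transistor count N grows by s², the capacitance C per transistor shrinks by 1/s and the frequency f grows by s. Under Dennard scaling the supply voltage V also drops by 1/s, so total chip power stays constant; once V stops scaling, full-chip power grows by s² per generation, and under a fixed power budget only a shrinking fraction of the die can be active at once. The sketch assumes s = 1.4 and continued frequency scaling; both are modeling assumptions, not data from the slides, and the function names are hypothetical.

```python
def relative_chip_power(generations, s=1.4, voltage_scales=True):
    """Full-chip dynamic power P ~ N * C * V^2 * f, relative to generation 0.

    Per generation (feature size divided by s):
      transistor count N x s^2, capacitance C x 1/s, frequency f x s,
      supply voltage V x 1/s under Dennard scaling, or constant after it ends.
    """
    n = s ** (2 * generations)
    c = s ** (-generations)
    f = s ** generations
    v = s ** (-generations) if voltage_scales else 1.0
    return n * c * v**2 * f


def usable_fraction(generations, s=1.4, voltage_scales=True):
    """Share of the die that can switch at full speed within the
    generation-0 power budget; the rest is dark silicon."""
    return min(1.0, 1.0 / relative_chip_power(generations, s, voltage_scales))


for g in range(5):
    print(f"gen {g}: Dennard {usable_fraction(g):.2f}, "
          f"post-Dennard {usable_fraction(g, voltage_scales=False):.2f}")
```

In this model the usable fraction roughly halves each post-Dennard generation (1/s² ≈ 0.51), which is the intuition behind "entering the dark silicon era".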
ACA COURSE INFORMATION
Contact Information
Office hours for students: Monday 14.00-15.00 at DEIB, Via Ponzio 34/5, first floor, internal phone number: 3692 (please send an email to get an appointment).
Main contact: students can contact Prof. Cristina Silvano by e-mail (cristina.silvano@polimi.it), indicating in the subject line: ACA COURSE Milano, Your_Surname, Your_Name, Your_POLIMI_ID_NUMBER
ACA Teaching Assistants
Prof. Giovanni Agosta, e-mail: giovanni.agosta@polimi.it
Prof. Gerardo Pelosi, e-mail: gerardo.pelosi@polimi.it
ACA Teaching Material
Additional information, slides and papers are available through Beep and on the course webpage: http://home.deib.polimi.it/silvano/aca-milano.htm
Reference book: "Computer Architecture: A Quantitative Approach", John Hennessy, David Patterson, Morgan Kaufmann, Fourth Edition / Fifth Edition