ADVANCED COMPUTER ARCHITECTURES - Polimi

088949 – ADVANCED COMPUTER ARCHITECTURES
             Academic Year 2017/2018 – Second Semester
    http://home.deib.polimi.it/silvano/aca-milano.htm

                    Prof. Cristina Silvano
               email: cristina.silvano@polimi.it
  Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB)
                        Politecnico di Milano
Goals of the ACA course

• Provide an overview of the most recent and advanced
  computer architectures

• Introduce the basic microarchitectural mechanisms
  found in modern microprocessor architectures

• Provide the reasoning behind the adoption of advanced
  computer architectures

Cristina Silvano – Politecnico di Milano   -2-
ADVANCED COMPUTER ARCHITECTURES: AN OVERVIEW
Advanced Computer Architectures: Supercomputers
• The first supercomputer to reach Petascale performance
  (10^15 Flops) was IBM Roadrunner, installed in 2008 at
  Los Alamos National Laboratory (New Mexico).
• Research on supercomputing is pushing towards the
  Exascale (10^18 Flops, i.e., a billion billion floating-point
  operations per second), expected to be reached in 2023.

How to measure performance:
                FLOPS, Floating Point Operations per Second

                                           Name         FLOPS
                                           zettaFLOPS   10^21
                                           exaFLOPS     10^18
                                           petaFLOPS    10^15
                                           teraFLOPS    10^12
                                           gigaFLOPS    10^9
                                           megaFLOPS    10^6
                                           kiloFLOPS    10^3
                                           FLOPS        1
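As a small illustration of the prefixes in the table, the sketch below expresses a raw FLOPS value with the largest decimal prefix it reaches. The function name `format_flops` is hypothetical, not part of any library.

```python
# Decimal (SI) prefixes for FLOPS, largest first, as in the table above
PREFIXES = [
    (1e21, "zettaFLOPS"),
    (1e18, "exaFLOPS"),
    (1e15, "petaFLOPS"),
    (1e12, "teraFLOPS"),
    (1e9, "gigaFLOPS"),
    (1e6, "megaFLOPS"),
    (1e3, "kiloFLOPS"),
]


def format_flops(flops: float, digits: int = 2) -> str:
    """Express a FLOPS value with the largest prefix whose scale it reaches."""
    for scale, name in PREFIXES:
        if flops >= scale:
            return f"{flops / scale:.{digits}f} {name}"
    return f"{flops:.{digits}f} FLOPS"


print(format_flops(93.01e15))  # 93.01 petaFLOPS
print(format_flops(1e18))      # 1.00 exaFLOPS
```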

TOP500 List

• The TOP500 list ranks the world's 500 most powerful
  supercomputers.
• The LINPACK Benchmark (introduced by Jack Dongarra) is
  used to measure a system's floating-point computing
  power.
• LINPACK measures how fast a computer solves a dense n-by-n
  system of linear equations Ax = b, a common task in
  engineering.

                                         www.top500.org
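The LINPACK idea can be sketched in a few lines of Python, assuming the classic LINPACK operation count of 2/3·n^3 + 2·n^2 floating-point operations for solving Ax = b by Gaussian elimination with partial pivoting. This is only an illustration: the real TOP500 runs use the highly optimized, distributed HPL code on very large n, and the helper names `solve_dense`, `linpack_flops` and `run_linpack_like` are hypothetical.

```python
import random
import time


def solve_dense(a, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting (inputs are copied)."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: move the row with the largest |a[i][k]| into row k
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the pivot
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def linpack_flops(n):
    """Classic LINPACK operation count for an n-by-n solve."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def run_linpack_like(n=100, seed=0):
    """Time one random dense solve; return (FLOPS rate, max residual |Ax - b|)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return linpack_flops(n) / elapsed, residual


rate, residual = run_linpack_like(100)
print(f"~{rate / 1e6:.1f} MFLOPS, max residual {residual:.1e}")
```

Pure Python runs orders of magnitude below a tuned BLAS-based solver; the point is how the reported rate is derived (fixed operation count divided by measured time), and why a small residual is checked before the result counts.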
Top500 ranking of the world's most
                powerful supercomputers (Nov. 2017)

   No. 1 Sunway TaihuLight reaches 93.01 PetaFlops (Linpack performance)
   and 125.43 PetaFlops peak performance, with 15.37 MW power dissipation.
   Site: National Supercomputing Center in Wuxi (China)

   No. 2 Tianhe-2 (Milky-Way-2) reaches 33.86 PetaFlops (Linpack
   performance) and 54.9 PetaFlops peak performance, with 17.8 MW power
   dissipation. Site: National Super Computer Center in Guangzhou (China)
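Two figures of merit hide behind these numbers: LINPACK efficiency (Rmax/Rpeak) and energy efficiency (sustained Flops per watt). Both are simple ratios; the minimal sketch below uses only the Nov. 2017 values quoted on this slide. Note that PetaFlops/MW is numerically equal to GigaFlops/W, which recovers the ~6 GigaFlops/W figure quoted for Sunway TaihuLight in the Green500 discussion. The function names are illustrative only.

```python
# Nov. 2017 figures from the slide: (Rmax PFLOPS, Rpeak PFLOPS, power MW)
SYSTEMS = {
    "Sunway TaihuLight": (93.01, 125.43, 15.37),
    "Tianhe-2": (33.86, 54.9, 17.8),
}


def linpack_efficiency(rmax_pf, rpeak_pf):
    """Fraction of theoretical peak actually sustained on LINPACK."""
    return rmax_pf / rpeak_pf


def gflops_per_watt(rmax_pf, power_mw):
    """Sustained LINPACK performance per watt, in GFLOPS/W."""
    return (rmax_pf * 1e15) / (power_mw * 1e6) / 1e9


for name, (rmax, rpeak, mw) in SYSTEMS.items():
    print(f"{name}: {linpack_efficiency(rmax, rpeak):.1%} of peak, "
          f"{gflops_per_watt(rmax, mw):.2f} GFLOPS/W")
# Sunway TaihuLight: 74.2% of peak, 6.05 GFLOPS/W
# Tianhe-2: 61.7% of peak, 1.90 GFLOPS/W
```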

Top500 ranking: the most powerful Italian
                    supercomputer (Nov. 2017)
• No. 14 in the Top500 and No. 2 in Europe: Marconi Intel Xeon Phi, with
  7.47 PetaFlops (Linpack performance) and 15.37 PetaFlops (peak
  performance) on 314,384 cores. Site: Casalecchio di Reno, Bologna (Italy)

  Marconi is Cineca's Tier-0 system, co-designed by Cineca and Lenovo.
  It is based on the Lenovo NeXtScale platform and the Intel® Xeon Phi™
  product family, alongside Intel® Xeon® processors and the Intel
  Omni-Path interconnect.

No. 5 TITAN – Cray XK7, Opteron 2.2GHz, NVIDIA K20X

Exascale Supercomputers

• To build an Exascale supercomputer within a 20 MW power budget,
  projected for 2023, supercomputers must improve their energy
  efficiency towards a goal of 50 GigaFlops/W.
• The No. 1 system, Sunway TaihuLight, delivers 6 GigaFlops/W, which
  ranks it only 20th in the Green500 list, the ranking of
  supercomputers by energy efficiency.
• Today the greenest supercomputer in the Green500, installed in
  Japan, achieves 17 GigaFlops/W.
• The top positions in the Green500 are all occupied by heterogeneous
  systems (based on accelerator/co-processor technology) equipped
  with Intel Xeon processors and NVIDIA Tesla P100 and NVIDIA Volta
  GV100 GPUs to further accelerate the computation.
• This dominance is expected to continue in the coming years, as
  systems push towards the target of a 20 MW Exascale supercomputer.
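The 50 GigaFlops/W goal follows directly from dividing 10^18 Flops by the 20 MW budget. The sketch below, with hypothetical helper names, also shows how much power an Exascale machine would draw at the 6 and 17 GigaFlops/W efficiencies quoted above.

```python
EXAFLOPS = 1e18        # target sustained performance, FLOPS
POWER_BUDGET_W = 20e6  # 20 MW power budget


def required_gflops_per_watt(target_flops, budget_w):
    """Energy efficiency needed to hit target_flops within budget_w."""
    return target_flops / budget_w / 1e9


def power_needed_mw(target_flops, gflops_per_watt):
    """Power (MW) drawn at target_flops for a given efficiency in GFLOPS/W."""
    return target_flops / (gflops_per_watt * 1e9) / 1e6


print(required_gflops_per_watt(EXAFLOPS, POWER_BUDGET_W))  # 50.0
for eff in (6, 17):
    print(f"{eff} GFLOPS/W -> {power_needed_mw(EXAFLOPS, eff):.0f} MW for 1 exaFLOPS")
# 6 GFLOPS/W -> 167 MW for 1 exaFLOPS
# 17 GFLOPS/W -> 59 MW for 1 exaFLOPS
```

In other words, even the greenest system of Nov. 2017 would need roughly three times the 20 MW budget to sustain one exaFLOPS.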

US Dept. of Energy Announced Summit and
                Sierra Supercomputers

Applications driving the demand for more
                computing performance:
climate, astrophysics, biology, business analytics

Performance Trend

Source: Jack Dongarra, U. of Tennessee, Oak Ridge National Lab, U. of Manchester
Performance of HPC systems over the last years,
            from the Top500

Source: Jack Dongarra, U. of Tennessee, Oak Ridge National Lab, U. of Manchester
Advanced Computer Architectures:
                 Intel® Core™ i7-3770T Processor

    # of Cores                    4
    # of Threads                  8
    Clock Speed                   2.5 GHz
    Max Turbo Frequency           3.7 GHz
    Intel® Smart Cache            8 MB
    Instruction Set               64-bit
    Instruction Set Extensions    SSE4.1/4.2, AVX
    Embedded Options Available    No
    Lithography                   22 nm
    Max TDP                       45 W
    Recomm. Customer Price        TRAY: $294.00
    Max Memory Size               32 GB
    Memory Types                  DDR3-1333/1600
    # of Memory Channels          2
    Max Memory Bandwidth          25.6 GB/s

160 mm² die @ 22 nm, 1.40 billion transistors.
Next generations: Broadwell, Skylake, Kaby Lake at 14 nm (2014);
Cannonlake at 10 nm (2H 2017); Ice Lake at 10 nm (2018)
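The 25.6 GB/s maximum memory bandwidth can be recovered from the memory configuration in the table, assuming the standard 64-bit (8-byte) data bus per DDR3 channel and 1600 MT/s (million transfers per second) for DDR3-1600. A minimal sketch; the function name is hypothetical:

```python
def peak_bandwidth_gbs(channels, bus_width_bits, transfer_rate_mts):
    """Peak DRAM bandwidth in decimal GB/s: channels x bytes/transfer x transfers/s."""
    bytes_per_transfer = bus_width_bits / 8
    return channels * bytes_per_transfer * transfer_rate_mts * 1e6 / 1e9


# Assumption: 64-bit data bus per DDR3 channel; DDR3-1600 runs at 1600 MT/s
print(f"{peak_bandwidth_gbs(2, 64, 1600):.1f} GB/s")  # 25.6 GB/s
print(f"{peak_bandwidth_gbs(2, 64, 1333):.1f} GB/s")  # 21.3 GB/s with DDR3-1333
```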
NVIDIA Fermi GPU

NVIDIA Kepler GPU

  Kepler GK110 Architecture
  • 7.1B Transistors
  • 15 SMX units (2880 cores)
  • >1TFLOP FP64
  • 1.5MB L2 Cache
  • 384-bit GDDR5
  • PCI Express Gen3
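Theoretical peak throughput is commonly estimated as (execution units) × (FLOPs per unit per cycle) × (clock frequency). The sketch first divides the slide's 2880 cores over 15 SMX units, then applies the formula to the Tesla K20X used in TITAN under assumptions that are NOT on the slide: 14 of the 15 SMX units enabled, 64 FP64 units per SMX, 2 FLOPs per fused multiply-add, and a 732 MHz base clock.

```python
def peak_flops(units, flops_per_unit_per_cycle, clock_hz):
    """Theoretical peak = execution units x FLOPs per unit per cycle x clock."""
    return units * flops_per_unit_per_cycle * clock_hz


# Full GK110 from the slide: 2880 cores over 15 SMX units
print(2880 // 15, "cores per SMX")  # 192 cores per SMX

# Assumed Tesla K20X configuration (not on the slide)
SMX_ENABLED = 14
FP64_UNITS_PER_SMX = 64
FLOPS_PER_FMA = 2      # a fused multiply-add counts as 2 FLOPs
CLOCK_HZ = 732e6

dp = peak_flops(SMX_ENABLED * FP64_UNITS_PER_SMX, FLOPS_PER_FMA, CLOCK_HZ)
print(f"{dp / 1e12:.2f} TFLOPS FP64")  # 1.31 TFLOPS FP64
```

The result is consistent with the ">1 TFLOP FP64" bullet; sustained LINPACK performance is always lower than this peak.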

NVIDIA Tesla P100 with Pascal GP100 GPU

The Green500 list ranks the top 500 supercomputers in the
world by energy efficiency for sustainable supercomputing
Advanced Computer Architectures:
            Smart Phones

               iPhone 6s           iPhone 6s Plus      iPhone 7                iPhone 7 Plus
Display        4.7-inch                                4.7-inch                5.5-inch display
Rear camera    12MP camera         12MP camera         New 12MP camera         New 12MP camera ++
Front camera   5MP                 5MP                 7MP                     7MP
Screen         Retina HD display   Retina HD display   Retina HD display       Retina HD display
               with 3D Touch       with 3D Touch       with 3D Touch           with 3D Touch
Chip           A9 chip 64-bit      A9 chip 64-bit      A10 Fusion chip 64-bit  A10 Fusion chip 64-bit
Coprocessor    M9                  M9                  M10                     M10
Other                                                  Waterproof,             Waterproof,
                                                       stereo audio            stereo audio
OS             iOS 10              iOS 10              iOS 10                  iOS 10
Storage        32GB 128GB          32GB 128GB          32GB 128GB 256GB        32GB 128GB 256GB
Apple A8 System-on-Chip

• The Apple A8 is a 64-bit ARM-based SoC introduced in September 2014
  for the iPhone 6 and iPhone 6 Plus.
• Apple states that it delivers 25% more CPU performance and 50% more
  graphics performance, with 50% of the power, compared to its
  predecessor, the A7.
• The A8 features the second generation of the Apple-designed 64-bit
  1.4 GHz ARMv8-A dual-core CPU, called Cyclone Gen 2, and an integrated
  PowerVR Series 6XT GX6450 quad-core GPU.
• The A8 is manufactured on a 20 nm process by TSMC, which replaced
  Samsung as the manufacturer of Apple's mobile device processors. It
  contains 2 billion transistors and has 1 GB of LPDDR3 RAM included in
  the package.
• On October 16, 2014, Apple introduced a variant of the A8, the A8X, in
  the iPad Air 2, with improved graphics and CPU performance thanks to
  one extra CPU core and a higher clock frequency.

Apple A9 System-on-Chip

• The Apple A9 is a 64-bit ARM-based SoC introduced in September 2015
  for the iPhone 6S and iPhone 6S Plus.
• Apple states that it delivers 70% more CPU performance and 90% more
  graphics performance compared to its predecessor, the A8.
• It is one of the most powerful mobile chips on the market today,
  along with the Samsung Exynos 8890 and the Qualcomm Snapdragon 820.
• The A9 features the Apple-designed 64-bit 1.85 GHz ARMv8-A dual-core
  CPU, called Twister, and an integrated PowerVR Series 7XT GT7600
  six-core GPU.
• The A9 is manufactured by two foundries: Samsung, on a 14 nm FinFET
  process, and TSMC, on a 16 nm FinFET process.
• The A9 has 2 GB of LPDDR4 RAM included in the package.
• Apple introduced a variant of the A9, the A9X, in the iPad Pro, with
  the M9 motion coprocessor embedded in it.

Apple A10 Fusion

• Apple A10 Fusion is a 64-bit ARM-based SoC designed by Apple and
  introduced in September 2016 for the iPhone 7 and iPhone 7 Plus.
• Apple states that it delivers 40% more CPU performance and 50% more
  graphics performance compared to its predecessor, the A9.
• The A10, with a die area of 125 mm² and 3.3 billion transistors
  (including GPU and cache), features two Apple-designed 64-bit
  2.34 GHz ARMv8-A high-performance cores, called Hurricane, and two
  energy-efficient 64-bit cores, codenamed Zephyr (similar to the ARM
  big.LITTLE technology).
• The A10 integrates a newly designed PowerVR Series 7XT GT7600 six-core
  GPU.
• The A10 is manufactured by TSMC on a 16 nm FinFET process.
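The benefit of pairing fast Hurricane cores with efficient Zephyr cores is that a task with slack can run on the slower core and still use less energy, since energy = power × time. The toy policy below picks the lowest-energy core that still meets a deadline. It is NOT Apple's or ARM's actual scheduler, and all power and speed numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    power_w: float  # hypothetical average active power, watts
    speed: float    # hypothetical throughput, work units per second


def pick_core(cores, work, deadline_s):
    """Return (core, energy_J): the lowest-energy core that meets the deadline.

    Falls back to the fastest core if none can meet it.
    """
    feasible = []
    for core in cores:
        t = work / core.speed
        if t <= deadline_s:
            feasible.append((core.power_w * t, core))
    if not feasible:
        fastest = max(cores, key=lambda c: c.speed)
        return fastest, fastest.power_w * work / fastest.speed
    energy, core = min(feasible, key=lambda e: e[0])
    return core, energy


BIG = Core("Hurricane (big)", power_w=2.0, speed=3.0)
LITTLE = Core("Zephyr (LITTLE)", power_w=0.3, speed=1.0)

for deadline in (5.0, 2.0):
    core, energy = pick_core([BIG, LITTLE], work=3.0, deadline_s=deadline)
    print(f"deadline {deadline} s -> {core.name}, {energy:.2f} J")
# deadline 5.0 s -> Zephyr (LITTLE), 0.90 J
# deadline 2.0 s -> Hurricane (big), 2.00 J
```

With a relaxed deadline the LITTLE core finishes the same work with less than half the energy; only a tight deadline justifies waking the big core.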

Energy efficiency underlies all markets

• Energy efficiency is of paramount importance
  for all application markets (automotive,
  consumer, mobile, healthcare and beyond) and
  for target systems ranging from sensors,
  cyber-physical systems and embedded systems up to
  servers and HPC systems.
Squeezing of computing cores
[Figure: an ARM9 core of 1.4 mm² at 65 nm (2005) shrinking through the
45 nm (2007), 32 nm (2009), 22 nm (2011) and 14 nm (2013) technology
nodes. Source: ARM9, STMicroelectronics]
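Under ideal scaling, core area shrinks with the square of the feature size, which is what lets more cores fit into the same silicon. A minimal sketch, assuming perfect area scaling from the 1.4 mm² ARM9 at 65 nm; real layouts shrink less than this, so the core counts are upper bounds. The function name is hypothetical.

```python
def ideal_area(area_mm2, old_node_nm, new_node_nm):
    """Ideal shrink: area scales with the square of the feature size."""
    return area_mm2 * (new_node_nm / old_node_nm) ** 2


ARM9_AREA_65NM = 1.4  # mm^2 at 65 nm (2005)
for year, node in [(2005, 65), (2007, 45), (2009, 32), (2011, 22), (2013, 14)]:
    area = ideal_area(ARM9_AREA_65NM, 65, node)
    fit = ARM9_AREA_65NM / area  # how many shrunk cores fit in the original footprint
    print(f"{year}  {node:2d} nm: {area:.3f} mm^2, ~{fit:.0f} cores in 1.4 mm^2")
```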
… entering the multi/many-core era
[Figure: same technology nodes, from 65 nm (2005) to 14 nm (2013).
Source: ARM9, STMicroelectronics]
What are the barriers to further scaling?

• Transistor density
  increases ~2x
  every 2 years

• Frequency wall

• Power wall

• Utilisation wall

                         … the end of Dennard scaling
                             … increasing power densities
                           … entering the dark silicon era
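The end of Dennard scaling can be made concrete with the dynamic power relation P = α·C·V²·f. The sketch compares power density after one node shrink by a linear factor s (about √2, i.e., ~2x transistor density) when supply voltage still scales (Dennard) versus when it stays flat, with and without frequency scaling. It is a first-order model that ignores leakage; the function name is hypothetical.

```python
import math


def power_density_ratio(s, voltage_scales, frequency_scales):
    """Relative power density after one node shrink by linear factor s.

    Dynamic power per transistor: P = alpha * C * V^2 * f (alpha unchanged).
    Capacitance C scales as 1/s; transistor density grows as s^2.
    """
    c = 1 / s
    v = 1 / s if voltage_scales else 1.0
    f = s if frequency_scales else 1.0
    power_per_transistor = c * v ** 2 * f
    return power_per_transistor * s ** 2


S = math.sqrt(2)  # ~1.4x linear shrink -> ~2x transistor density
print(power_density_ratio(S, True, True))    # Dennard scaling: stays ~1.0
print(power_density_ratio(S, False, True))   # V flat, f still rising: ~2.0
print(power_density_ratio(S, False, False))  # V and f flat: ~1.41
```

Even with frequency frozen (the frequency wall), power density still grows by about s per node once voltage stops scaling, which is the root of the power wall.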
The dark silicon problem

• The power wall and the utilisation wall represent the main
  barriers to efficient scaling in the multi/many-core era.

  Dark silicon: the fraction of the die that cannot be powered
  on at the same time, at full performance, because of the
  power budget.
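A first-order estimate of dark silicon divides the chip power budget by the power all cores would draw at full speed. The numbers below are hypothetical: an 80 W budget and 2.5 W per core, with the core count doubling each generation while per-core power stays constant (no more Dennard voltage scaling).

```python
def active_fraction(n_cores, core_power_w, budget_w):
    """Fraction of cores that can run at full power within the chip power budget."""
    return min(1.0, budget_w / (n_cores * core_power_w))


BUDGET_W = 80.0     # hypothetical chip power budget
CORE_POWER_W = 2.5  # hypothetical full-speed power per core, constant across generations

for generation, cores in enumerate([32, 64, 128, 256]):
    on = active_fraction(cores, CORE_POWER_W, BUDGET_W)
    print(f"gen {generation}: {cores:3d} cores, {on:6.1%} lit, {1 - on:6.1%} dark")
```

Each doubling of the core count halves the fraction that can be lit at once, so the dark fraction of the die keeps growing.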
ACA COURSE INFORMATION

Contact Information

• Office hours for students:
  Monday 14.00-15.00 at DEIB, Via Ponzio 34/5, first floor.
  Internal phone number: 3692 (please send an e-mail to make an
  appointment).
• Main contact:
  Students can contact Prof. Cristina Silvano by
  e-mail (cristina.silvano@polimi.it), using the subject:
  ACA COURSE Milano, Your_Surname,
  Your_Name, Your_POLIMI_ID_NUMBER

ACA Teaching Assistants

• Prof. Giovanni Agosta
  e-mail (giovanni.agosta@polimi.it)

• Prof. Gerardo Pelosi
  e-mail (gerardo.pelosi@polimi.it)

ACA Course Info

• Teaching activity: the course is worth 5 CFU and is
  organized into 30 hours of lectures and 20 hours of
  written/tool-based exercises that put into practice the
  concepts presented during the lectures.
• Prerequisites: basic concepts of logic design and
  computer architecture.

ACA Final Exam
• FINAL EXAM:
  The final exam consists of a written exam.
  Each written exam has 6 questions worth a max. of 32 points in
  total: max. 16 points for the exercise part (3 questions) and
  max. 16 points for the theory part (3 questions).
  Students may ask the instructor for an OPTIONAL project. The
  project must be completed before each written exam session (firm
  deadline). The project is worth up to 12 additional points, which
  are added to the written exam score only if the written exam score
  is sufficient (>= 18 points).

ACA Teaching Material

• Additional information, slides and papers are available
  through Beep and on the course webpage:
  http://home.deib.polimi.it/silvano/aca-milano.htm
  If you are using Mozilla Firefox as your web browser, please use
  the Save As option to save the PDF slides on your laptop before
  viewing or printing them, so that they display and print correctly.
• Reference book: "Computer Architecture: A Quantitative
  Approach", John Hennessy, David Patterson, Morgan
  Kaufmann, Fourth Edition / Fifth Edition

ACA Course

• The ACA course is offered in English.
• Teaching materials (slides/papers/textbook) are
  available in English.
• The final exam can be taken in English.
• Teaching support is available in English and Italian.
• Students whose surnames begin with M-Z must follow the parallel
  ACA course held by Prof. Donatella Sciuto. The objectives and
  program of the two courses are aligned, and the text of the final
  written exam is the same.
