Computer Architecture, Lecture 01 - Introduction - Pengju Ren, Institute of Artificial Intelligence and Robotics

Computer Architecture
Lecture 01 - Introduction
Pengju Ren
Institute of Artificial Intelligence and Robotics
Xi'an Jiaotong University
http://gr.xjtu.edu.cn/web/pengjuren

Course Administration
Instructor: Pengju Ren
TA: Gelin Fu (Ph.D. candidate)
Lectures: two 100-minute lectures a week
Textbook: Computer Architecture: A Quantitative Approach, 6th Edition (2019)
Prerequisite: Digital System Structure and Design

Preface
"The most beautiful thing we can experience is the mysterious. It is the source of all true art and science."
--- Albert Einstein, What I Believe, 1930

What is Computer Architecture?
In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to implement information-processing applications efficiently using available manufacturing technologies.
(Diagram: the gap between Application and Physics is too large to bridge in one step.)

What is Computer Architecture?
The gap is bridged by a stack of abstraction layers:
Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Register-Transfer Level (RTL)
Gates
Circuits
Devices
Physics

What is Computer Architecture?
(Same abstraction stack; this course focuses on the middle layers: Instruction Set Architecture (ISA), Microarchitecture, and Register-Transfer Level (RTL).)

What is Computer Architecture?
Application requirements: suggest how to improve architecture; provide revenue to fund development.
Architecture provides feedback to guide application and technology research directions.
Technology constraints: restrict what can be done efficiently; new technologies make new architectures possible.

Computing Devices Now
(Figure.)
Modern computing is as much about enhancing capabilities as data processing!

Architecture Continually Changing
Applications suggest how to improve technology and provide revenue to fund development; improved technologies make new applications possible.
The cost of software development makes compatibility a major force in the market.

Single-Thread (Sequential) Processor Performance
(Figure [Hennessy & Patterson, 2017]: single-thread performance over time, with the RISC era and the shift to multicore/manycore marked.)

Moore's Law Scaling with Cores
(Figure.)

Global Semiconductor Market
The global semiconductor market is estimated at $450 billion USD in revenue for 2020. Products using these semiconductors represent global revenues of $2 trillion USD, or around 3.5% of global gross domestic product (GDP). (ISSCC 2021, Feb.)

Advanced Tech Nodes Continue to Provide Value
Steady progress in two-dimensional transistor scaling and a variety of device-enhancement techniques have sustained energy-efficiency improvements and device-density gains from one technology generation to the next. (ISSCC 2021, Feb.)

Upheaval in Computer Design
• For most of the last 50 years, Moore's Law ruled
– Technology scaling allowed continual performance/energy improvements without changing the software model
• In the last decade, technology scaling slowed/stopped
– Dennard (voltage) scaling is over (supply voltage ~fixed)
– Moore's Law (cost/transistor) over?
– No competitive replacement for CMOS anytime soon
– Energy efficiency constrains everything
• No "free lunch" for software developers; they must consider:
– Parallel systems
– Heterogeneous systems

Today's Dominant Target Systems
• Mobile (smartphone/tablet)
– >1 billion sold/year as systems-on-a-chip (SoCs)
– Market dominated by ARM-ISA-compatible general-purpose processors
– Plus a sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)
• Warehouse-Scale Computers (WSCs)
– 100,000s of cores per warehouse
– Market dominated by x86-compatible server chips
– Dedicated apps, plus cloud hosting of virtual machines
– Now seeing increasing use of GPUs, FPGAs, and custom hardware to accelerate workloads
• Embedded computing
– Wired/wireless network infrastructure, printers
– Consumer TV/music/games/automotive/camera/MP3
– Internet of Things!

Course Content
• Instruction-Level Parallelism
– Superscalar
– Very Long Instruction Word (VLIW)
• Advanced Memory and Caches
• Data-Level Parallelism
– Vector machines
– GPUs
• Thread-Level Parallelism
– Multithreading
– Multiprocessor/Multicore/Manycore
• Warehouse-Scale Computers (Request-Level Parallelism)
• Domain-Specific Architectures (DNN accelerators)
(Figure: Intel Nehalem processor, Core i7.)

Architecture vs. Microarchitecture
"Architecture"/Instruction Set Architecture:
• Programmer-visible state (memory and registers)
• Operations (instructions and how they work)
• Execution semantics (interrupts)
• Input/Output
• Data types/sizes
Microarchitecture/Organization:
• Tradeoffs in how to implement the ISA for some metric (speed, energy, cost)
• Examples: pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths

Same Architecture, Different Microarchitecture
(Figure.)

Different Architecture, Different Microarchitecture
(Figure.)

Where do Operands Come from and Where do Results Go?
(Figure: a processor with an ALU connected to memory.)

Where do Operands Come from and Where do Results Go?
(Figure: four machine models: stack, accumulator, register-memory, and register-register processors, each an ALU connected to memory.)

Where do Operands Come from and Where do Results Go?
Number of explicitly named operands per instruction:
Stack: 0; Accumulator: 1; Register-Memory: 2 or 3; Register-Register: 2 or 3

Stack-Based Instruction Set Architecture (ISA)
• Burroughs B5000 (1960)
• Burroughs B6700
• HP 3000
• ICL 2900
• Symbolics 3600
• Inmos Transputer
Modern:
• Forth machines
• Java Virtual Machine
• Intel x87 floating-point unit

Evaluation of Expressions
(A sequence of eight figure slides stepping through the evaluation of an expression on a stack machine: operands are pushed, and each operator is applied to the top elements of the stack.)

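The figures are not reproduced here, but the idea translates directly to code. Below is a minimal C sketch (mine, not from the slides) of a stack machine evaluating a*b + c in postfix form; note that, matching the operand-count table above, no instruction names its operands explicitly.

    #include <stdio.h>

    /* Stack-machine sketch: operands are pushed; each operator pops its
     * inputs and pushes its result, so instructions name 0 operands. */
    static double stack[64];
    static int top = 0;                  /* index of the next free slot */

    static void   push(double v) { stack[top++] = v; }
    static double pop(void)      { return stack[--top]; }

    int main(void) {
        /* Evaluate a*b + c with a=2, b=3, c=4 (postfix: a b * c +). */
        push(2.0);
        push(3.0);
        push(pop() * pop());             /* '*' consumes b and a */
        push(4.0);
        { double r = pop(), l = pop(); push(l + r); }   /* '+' */
        printf("a*b + c = %g\n", pop()); /* prints 10 */
        return 0;
    }
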
Hardware Organization of the Stack
• The stack is part of the processor state, so the stack must be bounded and small: roughly the number of registers, not the size of main memory.
• Conceptually the stack is unbounded, so a part of the stack is included in the processor state and the rest is kept in main memory.

Stack Operations/Implicit Memory References
• Suppose the top 2 elements of the stack are kept in registers and the rest are kept in memory. Then each push operation costs 1 memory reference, and each pop operation costs 1 memory reference.
• Better performance comes from keeping the top N elements in registers, so that memory references are made only when the register stack overflows or underflows.

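As a rough illustration of the overflow/underflow mechanism, here is a hypothetical C sketch (N, regs, and mem are invented for the example): the top N entries live in a small "register" array, and memory is touched only when a push finds the registers full or a pop finds them empty.

    #include <stdio.h>

    #define N 4                 /* top-of-stack entries kept in "registers" */

    static int regs[N];         /* models the hardware register stack */
    static int nregs = 0;       /* how many register slots are in use */
    static int mem[1024];       /* models the in-memory part of the stack */
    static int nmem = 0;

    static void push(int v) {
        if (nregs == N) {       /* overflow: spill oldest entry to memory */
            mem[nmem++] = regs[0];
            for (int i = 0; i < N - 1; i++) regs[i] = regs[i + 1];
            nregs--;
        }
        regs[nregs++] = v;
    }

    static int pop(void) {
        if (nregs == 0)         /* underflow: refill from memory */
            regs[nregs++] = mem[--nmem];
        return regs[--nregs];
    }

    int main(void) {
        for (int i = 1; i <= 6; i++) push(i);  /* two entries spill */
        for (int i = 0; i < 6; i++) printf("%d ", pop());  /* 6 5 4 3 2 1 */
        printf("\n");
        return 0;
    }
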
Stack Size and Memory References
(Figures: memory traffic during the expression evaluation; this example requires four stores and fetches.)

Where do Operands Come from and Where do Results Go?
Code for C = A + B under each machine model:
Stack:       Push A; Push B; Add; Pop C
Accumulator: Load A; Add B; Store C
Reg-Mem:     Load R1, A; Add R3, R1, B; Store R3, C
Reg-Reg:     Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C

Classes of Instructions
• Data transfer: LD, ST, MFC1, MTC1, MFC0, MTC0
• ALU: ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
• Control flow: BEQZ, JR, JAL, TRAP, ERET
• Floating point: ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
• Multimedia (SIMD): ADD.PS, SUB.PS, MUL.PS, C.LT.PS
• String: REP MOVSB (x86)

Addressing Modes: How to Get Operands from Memory (MIPS)
(Table of addressing modes; ** marks modes that may not actually access memory.)

ISA Encoding
(Figure.)

Case Study: x86 (IA-32) Instruction Encoding
(Figure.)

RISC-V Instruction Encoding (1)
(Figure.)

RISC-V Instruction Encoding (2)
• New open-source, license-free ISA spec
• Supported by a growing shared software ecosystem
• Appropriate for all levels of computing systems, from microcontrollers to supercomputers
• 32-bit, 64-bit, and 128-bit variants (we're using 32-bit in class; the textbook uses 64-bit)
http://www-inst.eecs.berkeley.edu/~cs61c/fa18/img/riscvcard.pdf

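Since the encoding figures are not reproduced, here is a small C sketch of the RV32I R-type format as given in the public RISC-V spec: funct7[31:25], rs2[24:20], rs1[19:15], funct3[14:12], rd[11:7], opcode[6:0]. The example word 0x002081b3 encodes add x3, x1, x2.

    #include <stdio.h>

    /* RV32I R-type fields (per the RISC-V spec):
     * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
    static void decode_rtype(unsigned insn) {
        unsigned opcode =  insn        & 0x7f;
        unsigned rd     = (insn >> 7)  & 0x1f;
        unsigned funct3 = (insn >> 12) & 0x07;
        unsigned rs1    = (insn >> 15) & 0x1f;
        unsigned rs2    = (insn >> 20) & 0x1f;
        unsigned funct7 = (insn >> 25) & 0x7f;
        printf("opcode=0x%02x rd=x%u funct3=%u rs1=x%u rs2=x%u funct7=0x%02x\n",
               opcode, rd, funct3, rs1, rs2, funct7);
    }

    int main(void) {
        decode_rtype(0x002081b3u);   /* add x3, x1, x2 */
        return 0;
    }

The fixed field positions are the point: rs1, rs2, and rd sit in the same bits in every format, which simplifies decode hardware.
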
Real-World Instruction Sets
(Figure.)

Why the Diversity in ISAs?
Application influenced ISA:
• Instructions for applications: DSP instructions
• Compiler technology has improved: SPARC register windows are no longer needed; compilers can register-allocate effectively
Technology influenced ISA:
• Storage is expensive, so tight encoding is important
• Reduced Instruction Set Computer: remove instructions until the whole computer fits on a die
• Multicore/Manycore: transistors are not turning into sequential performance

Recap
• ISA vs. Microarchitecture
• ISA characteristics:
– Machine models
– Encoding
– Data types
– Instructions
– Addressing modes
(The abstraction-stack diagram from earlier is repeated alongside.)

And in Conclusion ...
• Computer architecture >> ISAs and RTL
• Computer architecture is about the interaction of hardware and software, and the design of appropriate abstraction layers
• Computer architecture is shaped by technology and applications; history provides lessons for the future
• Computer science is at the crossroads from sequential to parallel computing; salvation requires innovation in many fields, including computer architecture
• Read Chapter 1 & Appendix A (6th Edition) for next time!

Next Lecture: RISC-V ISA, Datapath & Control (ISA and Microarchitecture)

Acknowledgements
Some slides contain material developed and copyrighted by:
• Arvind (MIT)
• Krste Asanovic (MIT/UCB)
• Joel Emer (Intel/MIT)
• James Hoe (CMU)
• David Patterson (UCB)
• David Wentzlaff (Princeton University)
MIT material is derived from course 6.823; UCB material is derived from courses CS252 and CS 61C.

Idealized Uniprocessor Model
• The processor names variables: integers, floats, pointers, arrays, structures, etc. These are really words, e.g., 64-bit doubles, 32-bit ints, bytes, etc.
• The processor performs operations on those variables: arithmetic, logical operations, etc. It only performs these operations on values in registers.
• The processor controls the order, as specified by the program: branches (if), loops (while), function calls, etc.
• Idealized cost: each operation has roughly the same cost (add, multiply, etc.).

Six Great Ideas in Computer Architecture

New-School Machine Architecture
Software and hardware harness parallelism at every level, from the warehouse-scale computer down through smartphone, computer, core, memory (cache), instruction units, functional units, main memory, and logic gates, to achieve high performance:
• Parallel requests, assigned to a computer (e.g., search "Cats")
• Parallel threads, assigned to a core (e.g., lookup, ads)
• Parallel instructions: >1 instruction at one time (e.g., 5 pipelined instructions)
• Parallel data: >1 data item at one time (e.g., an add of 4 pairs of words: A0+B0, A1+B1, A2+B2, A3+B3; see the sketch after this list)
• Hardware descriptions: all gates working in parallel at the same time

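To make the "parallel data" row concrete, a fixed-count loop like the following (an illustrative sketch, not from the slides) is exactly what a SIMD unit or a vectorizing compiler can execute as one 4-wide add:

    #include <stdio.h>

    /* "Parallel data": one operation applied to several items at once.
     * A vectorizing compiler can turn this loop into a single 4-wide
     * add, computing A0+B0 ... A3+B3 simultaneously. */
    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
        for (int i = 0; i < 4; i++)
            c[i] = a[i] + b[i];
        for (int i = 0; i < 4; i++)
            printf("%d ", c[i]);    /* 11 22 33 44 */
        printf("\n");
        return 0;
    }
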
Great Idea #1: Abstraction (Levels of Representation/Interpretation)
High-level language program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
↓ Compiler
Assembly language program (e.g., RISC-V):
    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)
↓ Assembler
Machine language program (RISC-V):
    1000 1101 1110 0010 0000 0000 0000 0000
    1000 1110 0001 0000 0000 0000 0000 0100
    1010 1110 0001 0010 0000 0000 0000 0000
    1010 1101 1110 0010 0000 0000 0000 0100
↓ Machine interpretation
Hardware architecture description (e.g., block diagrams)
↓ Architecture implementation
Logic circuit description (circuit schematic diagrams)
Anything can be represented as a number, i.e., data or instructions.

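For reference, here is the slide's fragment as a complete, compilable C program (the swap function name and the driver in main are added for illustration); a compiler lowers the three assignments to the two loads and two stores shown above.

    #include <stdio.h>

    /* The slide's fragment in runnable form: exchange v[k] and v[k+1].
     * The three assignments become two loads (lw) and two stores (sw). */
    static void swap(int v[], int k) {
        int temp = v[k];
        v[k]     = v[k + 1];
        v[k + 1] = temp;
    }

    int main(void) {
        int v[2] = {3, 7};
        swap(v, 0);
        printf("%d %d\n", v[0], v[1]);   /* prints 7 3 */
        return 0;
    }
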
Great Idea #2: Moore's Law (Technology-Driven Development)
(Figure.)

Great Idea #3: Principle of Locality/Memory Hierarchy
(Figure.)

Great Idea #4: Parallelism
(Figure.)

Amdahl's Law
(Photo: Gene Amdahl, computer pioneer.)

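The figure on this slide is not reproduced; the standard statement of Amdahl's Law, added here for completeness, is that if a fraction f of execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f/s), bounded above by 1/(1 - f). A small C illustration (the 90% fraction is chosen for the example):

    #include <stdio.h>

    /* Amdahl's Law: overall speedup when a fraction f of execution time
     * is sped up by a factor s:  speedup = 1 / ((1 - f) + f / s).
     * The serial fraction (1 - f) caps the speedup at 1 / (1 - f). */
    static double amdahl(double f, double s) {
        return 1.0 / ((1.0 - f) + f / s);
    }

    int main(void) {
        /* Parallelize 90% of a program across ever more cores: */
        for (int cores = 1; cores <= 1024; cores *= 4)
            printf("%4d cores -> speedup %.2f\n", cores, amdahl(0.90, cores));
        return 0;               /* saturates near 1/(1 - 0.9) = 10x */
    }
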
Great Idea #5: Performance Measurement and Improvement
• Match the application to the underlying hardware to exploit:
– Locality
– Parallelism
– Special hardware features, like specialized instructions (e.g., matrix manipulation)
• Latency:
– How long to set the problem up
– How much faster it executes once it gets going
– It is all about time to finish (see the timing sketch after this list)

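A minimal POSIX timing sketch, not from the slides (the workload loop is a stand-in): read a monotonic clock before and after the region of interest to measure "time to finish".

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec t0, t1;
        volatile double acc = 0.0;             /* volatile: keep the work */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 1; i <= 10000000L; i++)  /* stand-in workload */
            acc += 1.0 / (double)i;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("work took %.3f s (acc=%f)\n", secs, acc);
        return 0;
    }
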
Great Idea #6: Dependability via Redundancy
• Redundancy, so that a failing piece doesn't make the whole system fail
• Example: three units each compute 1+1; two answer 2 and one faulty unit answers 1 (FAIL!), yet a 2-of-3 vote still produces the correct result (see the sketch below)
• Increasing transistor density reduces the cost of redundancy

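The slide's 2-of-3 vote in miniature, as a hypothetical C sketch (the faulty unit is simulated):

    #include <stdio.h>

    /* Triple modular redundancy: run the computation three times and
     * take the majority, masking any single faulty result. */
    static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;  /* a agrees with someone */
        return b;                        /* otherwise b and c agree
                                            (assuming one fault at most) */
    }

    int main(void) {
        int good1 = 1 + 1;   /* healthy unit */
        int good2 = 1 + 1;   /* healthy unit */
        int bad   = 1;       /* faulty unit: "1 + 1 = 1" as in the slide */
        printf("voted result: %d\n", vote(good1, good2, bad));  /* 2 */
        return 0;
    }
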
Great Idea #6: Dependability via Redundancy
• Applies to everything from datacenters to storage to memory to instructors:
– Redundant datacenters, so that one datacenter can be lost but the Internet service stays online
– Redundant disks, so that one disk can be lost without losing data (Redundant Arrays of Independent Disks/RAID)
– Redundant memory bits, so that one bit can be lost without losing data (Error-Correcting Code/ECC memory; a toy sketch follows)

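Real ECC DIMMs use wider single-error-correct/double-error-detect codes, but the principle is the classic Hamming code. A toy Hamming(7,4) sketch (mine, for illustration): 3 parity bits protect 4 data bits, and the parity-check "syndrome" points directly at any single flipped bit so it can be corrected.

    #include <stdio.h>
    #include <stdint.h>

    static int bit(uint8_t w, int i) { return (w >> i) & 1; }

    /* Encode 4 data bits into a 7-bit codeword stored at bit positions
     * 1..7; parity bits sit at positions 1, 2, and 4. */
    static uint8_t encode(uint8_t d) {
        uint8_t c = 0;
        c |= bit(d,0) << 3; c |= bit(d,1) << 5;      /* data at 3,5,6,7 */
        c |= bit(d,2) << 6; c |= bit(d,3) << 7;
        c |= (bit(c,3) ^ bit(c,5) ^ bit(c,7)) << 1;  /* p1 */
        c |= (bit(c,3) ^ bit(c,6) ^ bit(c,7)) << 2;  /* p2 */
        c |= (bit(c,5) ^ bit(c,6) ^ bit(c,7)) << 4;  /* p4 */
        return c;
    }

    /* The syndrome s is exactly the position of a single-bit error. */
    static uint8_t correct(uint8_t c) {
        int s = (bit(c,1) ^ bit(c,3) ^ bit(c,5) ^ bit(c,7))
              | (bit(c,2) ^ bit(c,3) ^ bit(c,6) ^ bit(c,7)) << 1
              | (bit(c,4) ^ bit(c,5) ^ bit(c,6) ^ bit(c,7)) << 2;
        if (s) c ^= 1 << s;     /* flip the erroneous bit back */
        return c;
    }

    int main(void) {
        uint8_t cw  = encode(0xB);       /* data bits 1011 */
        uint8_t bad = cw ^ (1 << 5);     /* flip one stored bit */
        printf("sent %#x, got %#x, fixed %#x\n",
               (unsigned)cw, (unsigned)bad, (unsigned)correct(bad));
        return 0;
    }
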