Computer Architecture, Lecture 01 - Introduction - Pengju Ren, Institute of Artificial Intelligence and Robotics

Computer Architecture
Lecture 01 - Introduction
Pengju Ren
Institute of Artificial Intelligence and Robotics
Xi'an Jiaotong University
http://gr.xjtu.edu.cn/web/pengjuren

Course Administration
Instructor: Pengju Ren
TA: Gelin Fu (Ph.D. candidate)
Lectures: two 100-minute lectures a week
Textbook: Computer Architecture: A Quantitative Approach, 6th Edition (2019)
Prerequisite: Digital System Structure and Design

Preface
"The most beautiful thing we can experience is the mysterious. It is the source of all true art and science."
--- Albert Einstein, What I Believe, 1930

What is Computer Architecture?
In its broadest definition, computer architecture is the design of the abstraction/implementation layers that allow us to implement information-processing applications efficiently using available manufacturing technologies.
(Diagram: the gap between Application and Physics is too large to bridge in one step.)

What is Computer Architecture?
The gap is bridged by a stack of abstraction layers:
Application
Algorithm
Programming Language
Operating System/Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Register-Transfer Level (RTL)
Gates
Circuits
Devices
Physics

What is Computer Architecture?
(Same abstraction stack; this course focuses on the middle layers: Instruction Set Architecture (ISA), Microarchitecture, and Register-Transfer Level (RTL).)

What is Computer Architecture?
Application requirements: suggest how to improve architecture; provide revenue to fund development.
Architecture provides feedback to guide application and technology research directions.
Technology constraints: restrict what can be done efficiently; new technologies make new architectures possible.

Computing Devices Now
(Figure.)
Modern computing is as much about enhancing capabilities as data processing!

Architecture Continually Changing
Applications suggest how to improve technology and provide revenue to fund development; improved technologies make new applications possible.
The cost of software development makes compatibility a major force in the market.

Single-Thread (Sequential) Processor Performance
(Figure [Hennessy & Patterson, 2017]: single-thread performance over time, with the RISC era and the shift to multicore/manycore marked.)

Moore's Law Scaling with Cores
(Figure.)

Global Semiconductor Market
The global semiconductor market is estimated at $450 billion USD in revenue for 2020. Products using these semiconductors represent global revenues of $2 trillion USD, or around 3.5% of global gross domestic product (GDP). (ISSCC 2021, Feb.)

Advanced Tech Nodes Continue to Provide Value
Steady progress in two-dimensional transistor scaling and a variety of device-enhancement techniques have sustained energy-efficiency improvements and device-density gains from one technology generation to the next. (ISSCC 2021, Feb.)

Upheaval in Computer Design
• For most of the last 50 years, Moore's Law ruled
– Technology scaling allowed continual performance/energy improvements without changing the software model
• In the last decade, technology scaling slowed/stopped
– Dennard (voltage) scaling is over (supply voltage ~fixed)
– Moore's Law (cost/transistor) over?
– No competitive replacement for CMOS anytime soon
– Energy efficiency constrains everything
• No "free lunch" for software developers; they must consider:
– Parallel systems
– Heterogeneous systems

Today's Dominant Target Systems
• Mobile (smartphone/tablet)
– >1 billion sold/year as systems-on-a-chip (SoCs)
– Market dominated by ARM-ISA-compatible general-purpose processors
– Plus a sea of custom accelerators (radio, image, video, graphics, audio, motion, location, security, etc.)
• Warehouse-Scale Computers (WSCs)
– 100,000s of cores per warehouse
– Market dominated by x86-compatible server chips
– Dedicated apps, plus cloud hosting of virtual machines
– Now seeing increasing use of GPUs, FPGAs, and custom hardware to accelerate workloads
• Embedded computing
– Wired/wireless network infrastructure, printers
– Consumer TV/music/games/automotive/camera/MP3
– Internet of Things!

Course Content
• Instruction-Level Parallelism
– Superscalar
– Very Long Instruction Word (VLIW)
• Advanced Memory and Caches
• Data-Level Parallelism
– Vector machines
– GPUs
• Thread-Level Parallelism
– Multithreading
– Multiprocessor/Multicore/Manycore
• Warehouse-Scale Computers (Request-Level Parallelism)
• Domain-Specific Architectures (DNN accelerators)
(Figure: Intel Nehalem processor, Core i7.)

Architecture vs. Microarchitecture
"Architecture"/Instruction Set Architecture:
• Programmer-visible state (memory and registers)
• Operations (instructions and how they work)
• Execution semantics (interrupts)
• Input/Output
• Data types/sizes
Microarchitecture/Organization:
• Tradeoffs in how to implement the ISA for some metric (speed, energy, cost)
• Examples: pipeline depth, number of pipelines, cache size, silicon area, peak power, execution ordering, bus widths, ALU widths

Same Architecture, Different Microarchitecture
(Figure.)

Different Architecture, Different Microarchitecture
(Figure.)

Where do Operands Come from and Where do Results Go?
(Figure: a processor with an ALU connected to memory.)

Where do Operands Come from and Where do Results Go?
(Figure: four machine models: stack, accumulator, register-memory, and register-register processors, each an ALU connected to memory.)

Where do Operands Come from and Where do Results Go?
Number of explicitly named operands per instruction:
Stack: 0; Accumulator: 1; Register-Memory: 2 or 3; Register-Register: 2 or 3

Stack-Based Instruction Set Architecture (ISA)
• Burroughs B5000 (1960)
• Burroughs B6700
• HP 3000
• ICL 2900
• Symbolics 3600
• Inmos Transputer
Modern:
• Forth machines
• Java Virtual Machine
• Intel x87 floating-point unit

Evaluation of Expressions
(A sequence of eight figure slides stepping through the evaluation of an expression on a stack machine: operands are pushed, and each operator is applied to the top elements of the stack.)

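The figures are not reproduced here, but the idea translates directly to code. Below is a minimal C sketch (mine, not from the slides) of a stack machine evaluating a*b + c in postfix form; note that, matching the operand-count table above, no instruction names its operands explicitly.

    #include <stdio.h>

    /* Stack-machine sketch: operands are pushed; each operator pops its
     * inputs and pushes its result, so instructions name 0 operands. */
    static double stack[64];
    static int top = 0;                  /* index of the next free slot */

    static void   push(double v) { stack[top++] = v; }
    static double pop(void)      { return stack[--top]; }

    int main(void) {
        /* Evaluate a*b + c with a=2, b=3, c=4 (postfix: a b * c +). */
        push(2.0);
        push(3.0);
        push(pop() * pop());             /* '*' consumes b and a */
        push(4.0);
        { double r = pop(), l = pop(); push(l + r); }   /* '+' */
        printf("a*b + c = %g\n", pop()); /* prints 10 */
        return 0;
    }
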
Hardware Organization of the Stack
• The stack is part of the processor state, so the stack must be bounded and small: roughly the number of registers, not the size of main memory.
• Conceptually the stack is unbounded, so a part of the stack is included in the processor state and the rest is kept in main memory.

Stack Operations/Implicit Memory References
• Suppose the top 2 elements of the stack are kept in registers and the rest are kept in memory. Then each push operation costs 1 memory reference, and each pop operation costs 1 memory reference.
• Better performance comes from keeping the top N elements in registers, so that memory references are made only when the register stack overflows or underflows.

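As a rough illustration of the overflow/underflow mechanism, here is a hypothetical C sketch (N, regs, and mem are invented for the example): the top N entries live in a small "register" array, and memory is touched only when a push finds the registers full or a pop finds them empty.

    #include <stdio.h>

    #define N 4                 /* top-of-stack entries kept in "registers" */

    static int regs[N];         /* models the hardware register stack */
    static int nregs = 0;       /* how many register slots are in use */
    static int mem[1024];       /* models the in-memory part of the stack */
    static int nmem = 0;

    static void push(int v) {
        if (nregs == N) {       /* overflow: spill oldest entry to memory */
            mem[nmem++] = regs[0];
            for (int i = 0; i < N - 1; i++) regs[i] = regs[i + 1];
            nregs--;
        }
        regs[nregs++] = v;
    }

    static int pop(void) {
        if (nregs == 0)         /* underflow: refill from memory */
            regs[nregs++] = mem[--nmem];
        return regs[--nregs];
    }

    int main(void) {
        for (int i = 1; i <= 6; i++) push(i);  /* two entries spill */
        for (int i = 0; i < 6; i++) printf("%d ", pop());  /* 6 5 4 3 2 1 */
        printf("\n");
        return 0;
    }
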
Stack Size and Memory References
(Figures: memory traffic during the expression evaluation; this example requires four stores and fetches.)

Where do Operands Come from and Where do Results Go?
Code for C = A + B under each machine model:
Stack:       Push A; Push B; Add; Pop C
Accumulator: Load A; Add B; Store C
Reg-Mem:     Load R1, A; Add R3, R1, B; Store R3, C
Reg-Reg:     Load R1, A; Load R2, B; Add R3, R1, R2; Store R3, C

Classes of Instructions
• Data transfer: LD, ST, MFC1, MTC1, MFC0, MTC0
• ALU: ADD, SUB, AND, OR, XOR, MUL, DIV, SLT, LUI
• Control flow: BEQZ, JR, JAL, TRAP, ERET
• Floating point: ADD.D, SUB.S, MUL.D, C.LT.D, CVT.S.W
• Multimedia (SIMD): ADD.PS, SUB.PS, MUL.PS, C.LT.PS
• String: REP MOVSB (x86)

Addressing Modes: How to Get Operands from Memory (MIPS)
(Table of addressing modes; ** marks modes that may not actually access memory.)

ISA Encoding
(Figure.)

Case Study: x86 (IA-32) Instruction Encoding
(Figure.)

RISC-V Instruction Encoding (1)
(Figure.)

RISC-V Instruction Encoding (2)
• New open-source, license-free ISA spec
• Supported by a growing shared software ecosystem
• Appropriate for all levels of computing systems, from microcontrollers to supercomputers
• 32-bit, 64-bit, and 128-bit variants (we're using 32-bit in class; the textbook uses 64-bit)
http://www-inst.eecs.berkeley.edu/~cs61c/fa18/img/riscvcard.pdf

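Since the encoding figures are not reproduced, here is a small C sketch of the RV32I R-type format as given in the public RISC-V spec: funct7[31:25], rs2[24:20], rs1[19:15], funct3[14:12], rd[11:7], opcode[6:0]. The example word 0x002081b3 encodes add x3, x1, x2.

    #include <stdio.h>

    /* RV32I R-type fields (per the RISC-V spec):
     * funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] */
    static void decode_rtype(unsigned insn) {
        unsigned opcode =  insn        & 0x7f;
        unsigned rd     = (insn >> 7)  & 0x1f;
        unsigned funct3 = (insn >> 12) & 0x07;
        unsigned rs1    = (insn >> 15) & 0x1f;
        unsigned rs2    = (insn >> 20) & 0x1f;
        unsigned funct7 = (insn >> 25) & 0x7f;
        printf("opcode=0x%02x rd=x%u funct3=%u rs1=x%u rs2=x%u funct7=0x%02x\n",
               opcode, rd, funct3, rs1, rs2, funct7);
    }

    int main(void) {
        decode_rtype(0x002081b3u);   /* add x3, x1, x2 */
        return 0;
    }

The fixed field positions are the point: rs1, rs2, and rd sit in the same bits in every format, which simplifies decode hardware.
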
Real-World Instruction Sets
(Figure.)

Why the Diversity in ISAs?
Application influenced ISA:
• Instructions for applications: DSP instructions
• Compiler technology has improved: SPARC register windows are no longer needed; compilers can register-allocate effectively
Technology influenced ISA:
• Storage is expensive, so tight encoding is important
• Reduced Instruction Set Computer: remove instructions until the whole computer fits on a die
• Multicore/Manycore: transistors are not turning into sequential performance

Recap
• ISA vs. Microarchitecture
• ISA characteristics:
– Machine models
– Encoding
– Data types
– Instructions
– Addressing modes
(The abstraction-stack diagram from earlier is repeated alongside.)

And in Conclusion ...
• Computer architecture >> ISAs and RTL
• Computer architecture is about the interaction of hardware and software, and the design of appropriate abstraction layers
• Computer architecture is shaped by technology and applications; history provides lessons for the future
• Computer science is at the crossroads from sequential to parallel computing; salvation requires innovation in many fields, including computer architecture
• Read Chapter 1 & Appendix A (6th Edition) for next time!

Next Lecture: RISC-V ISA, Datapath & Control (ISA and Microarchitecture)

Acknowledgements
Some slides contain material developed and copyrighted by:
• Arvind (MIT)
• Krste Asanovic (MIT/UCB)
• Joel Emer (Intel/MIT)
• James Hoe (CMU)
• David Patterson (UCB)
• David Wentzlaff (Princeton University)
MIT material is derived from course 6.823; UCB material is derived from courses CS252 and CS 61C.

Idealized Uniprocessor Model
• The processor names variables: integers, floats, pointers, arrays, structures, etc. These are really words, e.g., 64-bit doubles, 32-bit ints, bytes, etc.
• The processor performs operations on those variables: arithmetic, logical operations, etc. It only performs these operations on values in registers.
• The processor controls the order, as specified by the program: branches (if), loops (while), function calls, etc.
• Idealized cost: each operation has roughly the same cost (add, multiply, etc.).

Six Great Ideas in Computer Architecture

New-School Machine Architecture
Software and hardware harness parallelism at every level, from the warehouse-scale computer down through smartphone, computer, core, memory (cache), instruction units, functional units, main memory, and logic gates, to achieve high performance:
• Parallel requests, assigned to a computer (e.g., search "Cats")
• Parallel threads, assigned to a core (e.g., lookup, ads)
• Parallel instructions: >1 instruction at one time (e.g., 5 pipelined instructions)
• Parallel data: >1 data item at one time (e.g., an add of 4 pairs of words: A0+B0, A1+B1, A2+B2, A3+B3; see the sketch after this list)
• Hardware descriptions: all gates working in parallel at the same time

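To make the "parallel data" row concrete, a fixed-count loop like the following (an illustrative sketch, not from the slides) is exactly what a SIMD unit or a vectorizing compiler can execute as one 4-wide add:

    #include <stdio.h>

    /* "Parallel data": one operation applied to several items at once.
     * A vectorizing compiler can turn this loop into a single 4-wide
     * add, computing A0+B0 ... A3+B3 simultaneously. */
    int main(void) {
        int a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, c[4];
        for (int i = 0; i < 4; i++)
            c[i] = a[i] + b[i];
        for (int i = 0; i < 4; i++)
            printf("%d ", c[i]);    /* 11 22 33 44 */
        printf("\n");
        return 0;
    }
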
Great Idea #1: Abstraction (Levels of Representation/Interpretation)
High-level language program (e.g., C):
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
↓ Compiler
Assembly language program (e.g., RISC-V):
    lw  $t0, 0($2)
    lw  $t1, 4($2)
    sw  $t1, 0($2)
    sw  $t0, 4($2)
↓ Assembler
Machine language program (RISC-V):
    1000 1101 1110 0010 0000 0000 0000 0000
    1000 1110 0001 0000 0000 0000 0000 0100
    1010 1110 0001 0010 0000 0000 0000 0000
    1010 1101 1110 0010 0000 0000 0000 0100
↓ Machine interpretation
Hardware architecture description (e.g., block diagrams)
↓ Architecture implementation
Logic circuit description (circuit schematic diagrams)
Anything can be represented as a number, i.e., data or instructions.

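For reference, here is the slide's fragment as a complete, compilable C program (the swap function name and the driver in main are added for illustration); a compiler lowers the three assignments to the two loads and two stores shown above.

    #include <stdio.h>

    /* The slide's fragment in runnable form: exchange v[k] and v[k+1].
     * The three assignments become two loads (lw) and two stores (sw). */
    static void swap(int v[], int k) {
        int temp = v[k];
        v[k]     = v[k + 1];
        v[k + 1] = temp;
    }

    int main(void) {
        int v[2] = {3, 7};
        swap(v, 0);
        printf("%d %d\n", v[0], v[1]);   /* prints 7 3 */
        return 0;
    }
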
Great Idea #2: Moore's Law (Technology-Driven Development)
(Figure.)

Great Idea #3: Principle of Locality/Memory Hierarchy
(Figure.)

Great Idea #4: Parallelism
(Figure.)

Amdahl's Law
(Photo: Gene Amdahl, computer pioneer.)

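The figure on this slide is not reproduced; the standard statement of Amdahl's Law, added here for completeness, is that if a fraction f of execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f/s), bounded above by 1/(1 - f). A small C illustration (the 90% fraction is chosen for the example):

    #include <stdio.h>

    /* Amdahl's Law: overall speedup when a fraction f of execution time
     * is sped up by a factor s:  speedup = 1 / ((1 - f) + f / s).
     * The serial fraction (1 - f) caps the speedup at 1 / (1 - f). */
    static double amdahl(double f, double s) {
        return 1.0 / ((1.0 - f) + f / s);
    }

    int main(void) {
        /* Parallelize 90% of a program across ever more cores: */
        for (int cores = 1; cores <= 1024; cores *= 4)
            printf("%4d cores -> speedup %.2f\n", cores, amdahl(0.90, cores));
        return 0;               /* saturates near 1/(1 - 0.9) = 10x */
    }
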
Great Idea #5: Performance Measurement and Improvement
• Match the application to the underlying hardware to exploit:
– Locality
– Parallelism
– Special hardware features, like specialized instructions (e.g., matrix manipulation)
• Latency:
– How long to set the problem up
– How much faster it executes once it gets going
– It is all about time to finish (see the timing sketch after this list)

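A minimal POSIX timing sketch, not from the slides (the workload loop is a stand-in): read a monotonic clock before and after the region of interest to measure "time to finish".

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        struct timespec t0, t1;
        volatile double acc = 0.0;             /* volatile: keep the work */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 1; i <= 10000000L; i++)  /* stand-in workload */
            acc += 1.0 / (double)i;
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec)
                    + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("work took %.3f s (acc=%f)\n", secs, acc);
        return 0;
    }
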
Great Idea #6: Dependability via Redundancy
• Redundancy, so that a failing piece doesn't make the whole system fail
• Example: three units each compute 1+1; two answer 2 and one faulty unit answers 1 (FAIL!), yet a 2-of-3 vote still produces the correct result (see the sketch below)
• Increasing transistor density reduces the cost of redundancy

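The slide's 2-of-3 vote in miniature, as a hypothetical C sketch (the faulty unit is simulated):

    #include <stdio.h>

    /* Triple modular redundancy: run the computation three times and
     * take the majority, masking any single faulty result. */
    static int vote(int a, int b, int c) {
        if (a == b || a == c) return a;  /* a agrees with someone */
        return b;                        /* otherwise b and c agree
                                            (assuming one fault at most) */
    }

    int main(void) {
        int good1 = 1 + 1;   /* healthy unit */
        int good2 = 1 + 1;   /* healthy unit */
        int bad   = 1;       /* faulty unit: "1 + 1 = 1" as in the slide */
        printf("voted result: %d\n", vote(good1, good2, bad));  /* 2 */
        return 0;
    }
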
Great Idea #6: Dependability via Redundancy
• Applies to everything from datacenters to storage to memory to instructors:
– Redundant datacenters, so that one datacenter can be lost but the Internet service stays online
– Redundant disks, so that one disk can be lost without losing data (Redundant Arrays of Independent Disks/RAID)
– Redundant memory bits, so that one bit can be lost without losing data (Error-Correcting Code/ECC memory; a toy sketch follows)

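Real ECC DIMMs use wider single-error-correct/double-error-detect codes, but the principle is the classic Hamming code. A toy Hamming(7,4) sketch (mine, for illustration): 3 parity bits protect 4 data bits, and the parity-check "syndrome" points directly at any single flipped bit so it can be corrected.

    #include <stdio.h>
    #include <stdint.h>

    static int bit(uint8_t w, int i) { return (w >> i) & 1; }

    /* Encode 4 data bits into a 7-bit codeword stored at bit positions
     * 1..7; parity bits sit at positions 1, 2, and 4. */
    static uint8_t encode(uint8_t d) {
        uint8_t c = 0;
        c |= bit(d,0) << 3; c |= bit(d,1) << 5;      /* data at 3,5,6,7 */
        c |= bit(d,2) << 6; c |= bit(d,3) << 7;
        c |= (bit(c,3) ^ bit(c,5) ^ bit(c,7)) << 1;  /* p1 */
        c |= (bit(c,3) ^ bit(c,6) ^ bit(c,7)) << 2;  /* p2 */
        c |= (bit(c,5) ^ bit(c,6) ^ bit(c,7)) << 4;  /* p4 */
        return c;
    }

    /* The syndrome s is exactly the position of a single-bit error. */
    static uint8_t correct(uint8_t c) {
        int s = (bit(c,1) ^ bit(c,3) ^ bit(c,5) ^ bit(c,7))
              | (bit(c,2) ^ bit(c,3) ^ bit(c,6) ^ bit(c,7)) << 1
              | (bit(c,4) ^ bit(c,5) ^ bit(c,6) ^ bit(c,7)) << 2;
        if (s) c ^= 1 << s;     /* flip the erroneous bit back */
        return c;
    }

    int main(void) {
        uint8_t cw  = encode(0xB);       /* data bits 1011 */
        uint8_t bad = cw ^ (1 << 5);     /* flip one stored bit */
        printf("sent %#x, got %#x, fixed %#x\n",
               (unsigned)cw, (unsigned)bad, (unsigned)correct(bad));
        return 0;
    }
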