Azul's Experiences with Hardware Transactional Memory - A Lock-Free Hash Table 2009 Transaction Memory Workshop

Page created by Ivan Sanders

Government & Politics

English

Like
Share
Embed
Fullscreen
Slides
Download HTML
Download PDF
Abuse

←

→

Page content transcription

If your browser does not render page correctly, please read the page content below

Azul's Experiences with Hardware Transactional Memory - A Lock-Free Hash Table 2009 Transaction Memory Workshop

2009 Transaction Memory Workshop

         A Lock-Free Hash Table

Azul's Experiences with
Hardware Transactional Memory
Dr. Cliff Click
Chief JVM Architect & Distinguished Engineer
blogs.azulsystems.com/cliff
Azul Systems
January 31, 2009

Azul Systems
                                                                www.azulsystems.com

    •    Designs our own chips (fab'ed by TSMC)
    •    Builds our own systems
    •    Targeted for running business Java
    •    Large core count - 54 cores per die
            ─ Up to 16 die are cache-coherent
            ─ Very weak memory model meets Java spec w/fences

    • “UMA” - Flat medium memory speeds
            ─ Business Java is irregular computation
            ─ Have supercomputer-level bandwidth

    • Modest per-cpu caches
            ─ 54*(16K+16K) = 1.728Meg fast L1 cache
            ─ 6*2M = 12M L2 cache
            ─ Groups of 9 CPUs share L2
2
    |   ©2008 Azul Systems, Inc.

Azul Systems
                                                                www.azulsystems.com

    • Cores are classic in-order 64-bit 3-address RISCs
    • Each core can sustain 2 cache-missing ops
            ─ Plus each L2 can sustain 24 prefetches
            ─ 2300+ outstanding memory references at any time

    • Some special ops for Java
            ─ Read & Write barriers for GC
            ─ Array addressing and range checks
            ─ Fast virtual calls

    • But core clock rate not real high
    • So task-level parallelism is the name of the game

3
    |   ©2008 Azul Systems, Inc.

The Bottleneck is not the Platform
                                                                www.azulsystems.com

    • JVM scales linear to 864 CPUs
            ─ Have bandwidth to feed them all as well

    • Lite micro-kernel OS
            ─ Easily supports >100K runnable threads

    • Heaps >500Gig
            ─ Sustained allocation rates >40Gig/sec

                     How do we enable users to write programs
                     with hundreds of runnable threads?

4
    |   ©2008 Azul Systems, Inc.

The Bottleneck is not the Platform
                                                                              www.azulsystems.com

    • “Big Thread” programs tend to fall into 2 main camps
            ─ Parallel data “science” (or really “financial modeling”) apps
            ─ Web-tier app-server thread-pool + worklist apps

    • Data-parallel apps tend to scale nice
            ─ After a (short) round of tweaking
            ─ Although JDK concurrency libs often an issue at > 64 cpus.

    • Web-tier apps are more common
            ─ And scale less well
            ─ Internal locking of shared structures
            ─ e.g. legacy uses of Hashtable

    • Frequently see

Legacy Locking is an Issue
                                                           www.azulsystems.com

    • Lots of Old Java out there
    • 3rd-party apps being linked in
            ─ Source not available (company defunct?)

    • Intended to run on 1 cpu (and maybe now 2 cpu) machines
    • But lots of the locking is useless
            ─      Data contention is rare
            ─      At least on the users' data
            ─      Lock contention is far more common
            ─      e.g. Hashtable
            ─      But also shared large sync'd HashMaps
            ─      Lots of other Container classes

6
    |   ©2008 Azul Systems, Inc.

No “atomic” keyword
                                                                    www.azulsystems.com

    • No desire to enter the “language wars”
    • Costumers want old code to “just run faster”
            ─ Dusty-deck acceleration

    • So Azul built TM support to accelerate Java locks
            ─ Speed up “synchronized” keyword
            ─ No “atomic” keyword

    • Uncontended locks are by far the most common
            ─ And uncontended locks already fast
            ─ Azul's CAS and Fences can “hit in cache”

    • Contended locks already fast as can be
            ─ Lite microkernel OS; very fast context switch times

7
    |   ©2008 Azul Systems, Inc.

Hardware TM Support
                                                                         www.azulsystems.com

    • Built in our first chips
    • Leverage L1
            ─ Extra bits per cache-line
            ─ “Speculatively-read”, “speculatively-written”

    • No changes to L2 at all
            ─ No other CPU is aware that this CPU is attempting a XTN

    • SPECULATE instruction
            ─ Flips CPU speculate mode; starts tracking reads & writes

    • ABORT instruction, or abort on eviction from L1
            ─ Mark spec-lines as “invalid”

    • COMMIT instruction
            ─ Clear spec-bits

8
    |   ©2008 Azul Systems, Inc.

Hardware TM Support
                                                               www.azulsystems.com

    • No hardware register rollback
            ─ Software responsible for all recovery on Abort
            ─ Possible because only accelerating Java locks

    • Limited by size of L1
    • Limited by associativity of L1
            ─ And L2, since shared inclusive L2

    • No graceful fallback on Abort
            ─ No attempt at STM support

    • No abort on TLB miss or function call or ...
    • Either an XTN fits in cache or it doesn't
    • Software heuristics determine when to use the HTM

9
    |   ©2008 Azul Systems, Inc.

Software TM Support
                                                            www.azulsystems.com

     • Just “synchronized” keywords
     • HotSpot “thin locks” for uncontended locks
             ─ Just use the fast CAS, no HTM

     • Contended locks attempt HTM
             ─ Success-time/fail-time ratios kept
             ─ After some initial attempts
                ─ If ratio is bad, switch to OS lock
             ─ Periodically re-measure success/fail ratio

10
     |   ©2008 Azul Systems, Inc.

Experiences
                                                                     www.azulsystems.com

     • Some apps get 2x speedup (e.g. Trade6)
     • Most get

XTN Size
                                                            www.azulsystems.com

     • Limited by size & associativity of L1 & L2
     • But this appears to be generous:
     • XTN's of many thousands of instructions happen
             ─ Which include 100's of cache-hitting loads

     • Most interesting XTNs fail for conflict
          and not capacity.

12
     |   ©2008 Azul Systems, Inc.

Issues
                                                                         www.azulsystems.com

     • Users don't write “TM friendly” code
     • Neither do library writers:
             ─      e.g. “modcount” - bumped per mod to Hashtable
             ─      Large shared Hashtable, most updates are unrelated
             ─      Update itself works well in the HTM
             ─      But updates to shared “modcount” blow out HTM
             ─      “modcount” is mostly unused & useless field update
     • Same issue with many performance counters
             ─ Locked writes to a shared variable kills TM

     • Many times a small rewrite makes HTM possible
             ─ But blows the “dusty deck” goal

13
     |   ©2008 Azul Systems, Inc.

Issues
                                                              www.azulsystems.com

     • Hard part is to get customer past “dusty deck” thinking
     • Once a code rewrite is “on the table”
             ─ Customer goes whole-hog
             ─ And rewrites w/fine-grained locking

     •    Generally only need to “crack” a few locks
     •    Generally fairly easy, once exact locks are known
     •    Code then scales on all platforms, not just Azul
     •    Locks have known performance issues
             ─ Better than unknown TM rollback/retry issues

14
     |   ©2008 Azul Systems, Inc.

Hardware Abort is “Too Good”
                                                                           www.azulsystems.com

     • Hardware abort rolls back ALL memory writes
             ─ No “breadcrumbs” on failure

     • Hard to record fail reason!
             ─ Must pass fail code out in a reserved register
             ─ Even on success path

     • First CPUs did not report hardware failure
             ─ Cannot tell capacity issues from associativity issues from...

     • Failure reason is required for a robust heuristic

15
     |   ©2008 Azul Systems, Inc.

Summary
                                                                      www.azulsystems.com

     • Modest gains
     • Rewrites of “dumb” code could help a lot
     • Azul did some of this for common JDK pieces
             ─ Just too much of it Out There
     • Future work:
             ─ Robust customer friendly tool for finding XTN
               “unfriendly” code
             ─ Let customer solve the “why does HTM fail?”

                                http://blogs.azulsystems.com/cliff/

16
     |   ©2008 Azul Systems, Inc.

#1 Platform for
     Business Critical Java™

   WWW.AZULSYSTEMS.COM

Thank You

You can also read

2021 Educational Courses Catalog - STULZ.COM

BAE Systems Naval Ships - Centre for ...

Thales Overview & Cybersecurity on Critical Infrastructure Protection - April 2018 A.Toaiari

ENABLING DYNAMIC ANALYSIS OF LEGACY EMBEDDED SYSTEMS IN FULL EMULATED ENVIRONMENT - TA-LUN YEN TXONE IOT/ICS SECURITY RESEARCH LABS (TREND MICRO)

FUJITSU PRIMEHPC FX700 COMPILER AND PROFILING TOOLS - VION CORPORATION

KeyView Export SDK IDOL - Release Notes - Micro Focus

YELLOWPAGES.COM: Behind the Curtain - John Straw AT&T Interactive

Exterior Merchandising Solutions for Wendy's R - 2020 Product & Price Guide 1-800-386-9767 - Voxpop Marketing Systems

Learn to Program with Ruby - RailsBridge Triangle 2015

ABC3 and Quality Assurance - Dr Karin Denton Director of Cancer screening QA (SW)

Unboxing the Library: Introduction to APIs - Laura Wrubel @liblaura Library of Congress, Feb 21, 2018 - Library of ...

Insulation & Air Barrier - Viridiant

Proven Professional Certification - How to Schedule and Take an Exam - Dell EMC Education Services

Embedded Ruby Controlling Hardware with Simple Software

TMS Naming Scheme (325-v5) - wsdot

Monaco guide FINE DINING / 2020 GRAND PRIX EDITION - CATALANOSHIPPINGSERVICES +37793508686 - Catalano ...

DUSIM: O-DU EMULATION SOLUTION - FOR O-CU TESTING OVER F1 INTERFACE - KEYSIGHT

School Holiday Workshops - September / October 2021 Spring Holidays - Available for all children at two locations: Wesley College Glen Waverley ...

IMPLEMENTING THE EUROPEAN STUDENT CARD - Welcome to session 07.17 Room 3E, Level 0, Messukeskus Expo and Convention Centre, 26/09/2019 ...

Defiance Tools Launches "Tools To Navigate Life" at 2018 ACE Hardware Fall Show - Shopify

SAFE - Building Code Guidelines Residential Swimming Pools & Spas - KEEP YOUR POOL - Gloucester County, VA

GUIDE FOR INTERNATIONAL AND EXCHANGE STUDENTS - Polo ...

Instructional Guide For the - European Certificate of Inspection (COI) - Quality Assurance International

MHSSC DRESS CODE (for implementation from 2021)

BEYOND JAVASCRIPT FRAMEWORKS: WRITING RELIABLE WEB APPS WITH - ERIK WENDEL JFOKUS 2018

M3 Cloud Edition, Our Way of Working - Infoteam

Legion Youth Remembrance Contest - INFORMATION FOR TEACHERS - School District 69

Reading: Using Senses in the Winter - Tahoe Institute for ...

EN Milk frother Translation of the original manual - SMF 1010BK - FAST ČR

Tutorial on Parallel Debugging Victor Eijkhout TACC HPC Training 2021

Letter Sound Quiz Long O Words Level 1 - Boom Cards Digital task cards for interactive learning

Symbian OS Presented by: Daniel O'Grady Michael O'Sullivan

GOOGLE ANDROID OPERATING SYSTEM - JOHNNIE SPAIGHT 0543292 JAMES MCHUGH 0052833

Emissions Impacts of CCAM - From the Use to a Life-Cycle Approach Margarida C. Coelho - Dept. Mechanical Engineering, University of Aveiro ...

DNA Testing February 16, 2018

BIAS19 PROMOTIONAL PACKAGE - Spectacle Aérien International de ...

Fast Facts: Acne Second edition - Karger Publishers

Online Safety Keeping young people safe on the Internet

INTEGRITY COMMISSIONER & INCOMING COUNCILLORS INDUCTION 2020 - INTEGRITY COMMISSIONER, DR NIKOLA STEPANOV PHD (MELB.) - CRIME AND CORRUPTION ...

The limbic system Introduction to Systems Neuroscience Nov. 19, 2019 - Daniel C. Kiper

Optimize Productivity by Pruning for Maximum Light - California ...

Central Bank of Ireland Confirms Fast-Track Process for SFDR Updates - Matheson

MEDIA KIT 2021 THOROUGHBRED RACING IN QUEENSLAND - HOME - The Magazine Publishing Company

The Erne Rivers Trust Growing a cross-border rivers trust - John A. Spence - The ...

Develop your own Database 2018/2019 - Week 1 - Woche

Electric scooter-sharing moves into the fast lane - Tech Xplore

Draft Strategy for Legal Services Regulation and Draft Business Plan 2021-22 - Consultation response by: The Professional Paralegal Register (PPR) ...

Can BJP sustain the bounce from Pulwama and Balakot?

Postal Advocacy 2021/22 - Context - Citizens Advice Scotland

Community makes Rust an easy choice for npm - here - Feb 25, 2019 - Rust Programming Language