AIBench Training, Subsets, and Their Rankings
Fei Tang, ICT, Chinese Academy of Sciences
AIBench Tutorial at ISCA 2021
Executive Summary
- A lack of understanding of learning dynamics raises serious AI benchmarking challenges
- AIBench Training: methodology, workload characterization, two subsets (for repeatable performance ranking and for workload characterization), and rankings
- https://www.benchcouncil.org/aibench
Learning Dynamics Are Not Understood
- Training is a high-dimensional, non-convex optimization problem
  - A slight change leads to a different optimization path
  - Parameter tuning depends heavily on experience
(Picture from http://www.dashangu.com/postimg_13493485.html)
Prohibitive Cost Challenge
- Running an entire training session is mandatory
- A complete training session takes several weeks on a small-scale system
  - Simulators with 10x to 1,000x slowdowns exacerbate the challenge
- A microbenchmark like HPL-AI cannot model the learning dynamics of deep learning [1]
[1] HPL-AI Mixed-Precision Benchmark — HPL-AI 0.0.2 documentation. https://icl.bitbucket.io/hpl-ai/
Conflicting-requirement Challenge
- Earlier-stage evaluations of a new architecture or system (micro benchmarks) need:
  - Affordability
  - Portability
  - Simplicity
- Later-stage evaluations or purchasing of off-the-shelf systems (component or scenario benchmarks) need:
  - Comprehensiveness/representativeness
  - Reality and overall system performance
Short Shelf-life Challenge
- AI models evolve and change faster than AI benchmarks
  - It takes about one year to walk through benchmark design, implementation, community adoption, and large-scale testing
- Synthetic benchmarks like ParaDNN [1] can traverse many networks, but they cannot model learning dynamics
[1] Wang, Yu Emma, Gu-Yeon Wei, and David Brooks. "A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms."
Scalability Challenge
- An AI task's problem scale is often fixed
- HPL-AI [1] is scalable, but it cannot model learning dynamics because it does not consider model quality
(Picture from the HPC AI500 Ranking, Image Classification [2])
[1] HPL-AI Mixed-Precision Benchmark — HPL-AI 0.0.2 documentation. https://icl.bitbucket.io/hpl-ai/
[2] HPC AI500 Ranking. https://www.benchcouncil.org/ranking.html
Repeatability Challenge
- A benchmark mandates being repeatable, while training deep neural networks is stochastic
- Factors of randomness: model initialization, data shuffling, data augmentation, dropout, etc.
[Figure: run-to-run variation of AIBench Training workloads, ranging from 0% up to roughly 40% across tasks]
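To make the point concrete, below is a minimal, hypothetical sketch (not part of AIBench) of how the randomness sources listed above can be pinned in a PyTorch training script. Even with every seed fixed, some GPU kernels remain nondeterministic, which is exactly why run-to-run variation is a benchmarking challenge.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin the common sources of training randomness listed above (illustrative only)."""
    random.seed(seed)                      # Python-level data shuffling and augmentation choices
    np.random.seed(seed)                   # NumPy-based data augmentation
    torch.manual_seed(seed)                # model initialization and dropout masks
    torch.cuda.manual_seed_all(seed)       # GPU-side random number generators
    torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable autotuning that can vary run-to-run


seed_everything(42)
```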
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Related Work
- Benchmarks that use time-to-accuracy as the main metric
- Benchmarks that model the critical paths of a real-world application scenario
- Systematic AI benchmarking projects
- Synthetic AI benchmarks (e.g., ParaDNN)
- Micro benchmarks that use mixed-precision LU decomposition to achieve upper-bound FLOPS performance (e.g., HPL-AI)
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Methodology
- Perform a detailed survey of the critical domain of Internet services, including search engines, social networks, and e-commerce
- Include as many representative benchmarks as possible
- Propose a repeatable performance ranking subset and a workload characterization subset, and keep the subsets to a minimum
- Consider the full benchmarks, their subsets, and micro benchmarks all indispensable
Typical Internet Service Applications (with 17 industry partners)
- Representative AI tasks among search engines, social networks, and e-commerce
AIBench Training Workloads
Image Classification
- Classify an image into multiple categories
  - Dataset: ImageNet 2012, one of the world's largest image databases, containing more than 14 million images with a data size of more than 100 GB
  - Model: ResNet-50, a milestone model that demonstrates AI's ability to classify images, exceeding human-level performance
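As an illustration only, the snippet below sketches how this workload could be set up with torchvision; the dataset path, batch size, and hyperparameters are placeholders and do not reproduce the AIBench reference implementation.

```python
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard ImageNet-style preprocessing; "/data/imagenet/train" is a placeholder path.
preprocess = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = torchvision.datasets.ImageFolder("/data/imagenet/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)

model = torchvision.models.resnet50(num_classes=1000).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

for images, labels in train_loader:          # one training pass; accuracy tracking omitted
    optimizer.zero_grad()
    loss = criterion(model(images.to(device)), labels.to(device))
    loss.backward()
    optimizer.step()
```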
Image Generation
- Learn the distribution of images in order to generate new images
  - Dataset: LSUN, about 1 million labeled images, divided into 10 scene categories and 20 object categories
  - Model: WGAN, one of the best-known GAN-based models, which uses adversarial training to solve the image generation problem
Text Translation
- Convert text from one language to another
  - Dataset: WMT English-German, with 4.5 million sentence pairs
  - Model: Transformer, the classic model for text translation and the basis for the subsequent BERT model
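For orientation, here is a toy encoder-decoder sketch built on PyTorch's nn.Transformer; the vocabulary sizes, sequence lengths, and omission of positional encoding are all simplifying assumptions, not the benchmark's reference model.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, D_MODEL = 32000, 32000, 512    # toy vocabulary sizes and model width

src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)
tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL)
transformer = nn.Transformer(d_model=D_MODEL, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6,
                             batch_first=True)        # positional encoding omitted for brevity
generator = nn.Linear(D_MODEL, TGT_VOCAB)

src = torch.randint(0, SRC_VOCAB, (2, 10))            # (batch, source length)
tgt = torch.randint(0, TGT_VOCAB, (2, 9))             # (batch, target length)
tgt_mask = transformer.generate_square_subsequent_mask(tgt.size(1))  # causal decoder mask

out = transformer(src_embed(src), tgt_embed(tgt), tgt_mask=tgt_mask)
logits = generator(out)                               # (batch, target length, vocabulary)
print(logits.shape)
```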
Image-to-Text
- Generate descriptive text for a given image
  - A combination of computer vision and natural language processing
  - Dataset: MS COCO 2014, with 82,783 training samples, 40,504 validation samples, and 40,775 test samples (20 GB+)
  - Model: Neural Image Caption, a combination of a CNN and an RNN
Image-to-Image
- Convert an image from one representation to another
  - E.g., change of seasons, change of object species
  - Dataset: Cityscapes, street-view data from more than 50 cities (300 MB)
  - Model: CycleGAN, a widely used GAN-based model with two generators and two discriminators
Speech Recognition
- Recognize voice messages and transcribe them into text
  - Dataset: LibriSpeech, 1,000+ hours of speech, one of the most representative audio datasets
  - Model: DeepSpeech2, a milestone model in speech recognition
Face Embedding
- Verify a face by learning an embedding into Euclidean space; the embedding can then be used for face recognition
  - Dataset: VGGFace2, 36 GB of training data and 1.9 GB of test data
  - Model: FaceNet, a representative model based on the GoogLeNet-style Inception architecture
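A minimal sketch of the FaceNet-style training objective, i.e., the triplet loss that pulls embeddings of the same identity together in Euclidean space; the tensors below are random stand-ins for real face embeddings.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Triplet loss on L2-normalized embeddings (illustrative sketch)."""
    anchor, positive, negative = (F.normalize(x, dim=1) for x in (anchor, positive, negative))
    pos_dist = (anchor - positive).pow(2).sum(dim=1)   # squared distance, same identity
    neg_dist = (anchor - negative).pow(2).sum(dim=1)   # squared distance, different identity
    return F.relu(pos_dist - neg_dist + margin).mean()


# Random 128-dimensional embeddings standing in for a batch of four face crops.
print(triplet_loss(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)))
```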
Object Detection
- Find objects of certain target classes with precise localization in a given image
  - Dataset: VOC2007, 9,963 images containing 24,640 labeled objects
  - Model: Faster R-CNN, a classic model for object detection and the cornerstone of many other models such as Mask R-CNN
Recommendation
- Personalized recommendations based on collaborative filtering
  - Dataset: MovieLens, a real-world movie-ratings dataset
  - Model: Neural Collaborative Filtering (NCF), a fundamental algorithm for recommendation
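The sketch below shows the core idea of neural collaborative filtering (user and item embeddings fed to an MLP that scores an interaction); the layer sizes and MovieLens-like dimensions are illustrative assumptions, not AIBench's configuration.

```python
import torch
import torch.nn as nn


class NCF(nn.Module):
    """Minimal neural collaborative filtering model (MLP branch only, for illustration)."""

    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=1)
        return torch.sigmoid(self.mlp(x)).squeeze(1)   # predicted interaction probability


model = NCF(n_users=6040, n_items=3706)                # MovieLens-1M-like sizes, as an example
print(model(torch.tensor([0, 1]), torch.tensor([10, 20])))
```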
Video Prediction
- Predict future video frames by learning from previous frames
  - Dataset: Robot Pushing dataset, behavior data from 59,000 robot-pushing sequences (100 GB+)
  - Model: Motion-Focused Predictive model, which predicts how to transform the last image into the next image
Image Compression
- Reduce redundant information in image data to store and transfer it in a more efficient format
  - Dataset: ImageNet 2012 (100 GB+), one of the world's largest image databases, containing more than 14 million images
  - Model: an RNN-based model
3D Object Reconstruction
- Capture the shape and appearance of a real object; a core technology in fields such as computer graphics and virtual reality
  - Dataset: ShapeNet, containing about 51,300 3D models across 55 common object categories
  - Model: a convolutional encoder-decoder network combining an image encoder, a volume decoder, and a perspective transformer
Text Summarization
- Generate summaries for given text
  - Dataset: Gigaword, about 10 million documents with over 4 billion words
  - Model: a sequence-to-sequence model consisting of an off-the-shelf attentional encoder-decoder RNN
Spatial Transformer
- Spatial transformations of images, such as rotation and stretching
  - Dataset: MNIST, containing 60,000 training images and 10,000 test images
  - Model: Spatial Transformer Network (STN), which includes a localisation network, a grid generator, and a sampler
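Because the slide names the three STN components, a compact sketch may help: the localisation network regresses an affine transform, F.affine_grid acts as the grid generator, and F.grid_sample is the differentiable sampler. The layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """Minimal STN for 28x28 MNIST-sized images: localisation net, grid generator, sampler."""

    def __init__(self):
        super().__init__()
        # Localisation network regresses the 6 parameters of a 2x3 affine matrix.
        self.loc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 6))
        # Start from the identity transform so training begins with "no warp".
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                          # predicted affine parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)  # grid generator
        return F.grid_sample(x, grid, align_corners=False)          # differentiable sampler


print(SpatialTransformer()(torch.randn(8, 1, 28, 28)).shape)
```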
Neural Architecture Search
- Automatically design neural networks
  - Dataset: PTB (Penn Treebank), containing 2,499 stories selected from a three-year Wall Street Journal collection of 98,732 stories for syntactic annotation
  - Model: ENAS, which finds efficient neural networks via reinforcement learning
[Figure: NAS loop — a search strategy samples network architectures from the search space, and an evaluation strategy evaluates their performance]
Advertising
- Display the most relevant ads to customers
  - Dataset: Kaggle Display Advertising Challenge dataset
  - Model: Deep Learning Recommendation Model (DLRM)
Natural Language Processing (NLP)
- Train a language model that serves many tasks, such as translation and question answering
  - Dataset: Wikipedia
  - Model: BERT
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Representativeness and Comprehensiveness
- Diverse behaviors for workload characterization
  - Algorithm behavior: model architectures, parameters, optimizers, and loss functions
  - System behavior: evaluation time cost, variation, convergence rate, and number of hot functions
  - Micro-architecture behavior: computation pattern, memory access pattern, and I/O pattern
Representativeness and Comprehensiveness
- Coverage of diverse network architectures (CNN, ResNet, LSTM, GRU, Attention, etc.)
  - Text processing (7): Text-to-Text, Text Summarization, Learning-to-Rank, Recommendation, Neural Architecture Search, Advertising, and NLP
  - Image processing (8): Image Classification, Image Generation, Image-to-Text, Image-to-Image, Face Embedding, Object Detection, Image Compression, and Spatial Transformer
  - Audio processing (1): Speech Recognition
  - Video processing (1): Video Prediction
  - 3D data processing (2): 3D Face Recognition and 3D Object Reconstruction
AIBench Training vs. MLPerf Training
[Figure: comparison of AIBench against MLPerf from the perspectives of model complexity, computational cost, and convergence rate]
Micro-architectural Characteristics
- Distinct computation and memory access behaviors
  - AIBench has wider coverage than MLPerf
[Figure: MLPerf (blue) vs. AIBench (red) across GPU metrics]
  1. achieved_occupancy: warp utilization rate
  2. ipc_efficiency: IPC efficiency
  3. gld_efficiency: global memory load efficiency
  4. gst_efficiency: global memory store efficiency
  5. dram_utilization: DRAM utilization
AIBench Training (v1.1) vs. MLPerf Training (v0.7)
- Concurrent work
- AIBench Training has wider coverage:
  - Tasks
  - Datasets
  - Diverse characteristics: algorithm, system, and micro-architecture
Runtime Breakdown of the AIBench Benchmarks
Hotspot Functions
- AIBench Training covers more hotspot functions than MLPerf Training, making it more suitable for simulator-based research
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Repeatable Performance Ranking (RPR) Subset
- Reflects diverse model complexity, computational cost, and convergence rate
- Low run-to-run variation
- Widely accepted evaluation metrics
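One simple way to quantify "low run-to-run variation" is the coefficient of variation over repeated runs; the measurements below are made-up numbers used purely to show the calculation.

```python
import statistics


def run_to_run_variation(measurements):
    """Coefficient of variation (stdev / mean) of a repeated measurement such as time-to-accuracy."""
    return statistics.stdev(measurements) / statistics.mean(measurements)


# Hypothetical time-to-accuracy results (in hours) from five repeated training runs.
runs = [11.8, 12.1, 12.4, 11.9, 12.2]
print(f"run-to-run variation: {run_to_run_variation(runs):.2%}")
```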
Workload Characterization (WC) Subset
- The minimum set of workloads with the most representative system and micro-architectural characteristics
Two Subsets
- RPR subset: Image Classification, Object Detection, and Learning-to-Rank
- WC subset: Spatial Transformer, Image-to-Text, and Speech-to-Text
[Figure: result of K-means clustering using micro-architecture characteristics]
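To show the shape of such an analysis, here is a small K-means sketch over per-workload micro-architecture profiles; the feature values and the two-cluster choice are invented for illustration and are not the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-workload GPU profiles:
# [achieved occupancy, IPC efficiency, gld efficiency, gst efficiency, DRAM utilization]
workloads = ["Image Classification", "Object Detection", "Spatial Transformer", "Image-to-Text"]
features = np.array([
    [0.62, 0.55, 0.71, 0.68, 0.40],
    [0.58, 0.52, 0.69, 0.66, 0.38],
    [0.31, 0.20, 0.45, 0.41, 0.12],
    [0.44, 0.35, 0.58, 0.52, 0.25],
])

X = StandardScaler().fit_transform(features)           # put all metrics on a comparable scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for name, cluster in zip(workloads, labels):
    print(f"{name}: cluster {cluster}")
```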
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Performance Ranking
- We use the AIBench RPR subset to rank the performance of GPUs and TPUs
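Rankings of this kind are typically driven by a time-to-quality measurement; the sketch below shows one way such a metric could be computed, with stand-in callbacks rather than the benchmark's actual harness.

```python
import time


def time_to_accuracy(train_one_epoch, evaluate, target, max_epochs=90):
    """Wall-clock seconds until the validation metric first reaches the target (illustrative)."""
    start = time.time()
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        if evaluate() >= target:
            return time.time() - start
    return float("inf")   # target quality never reached within the epoch budget


# Toy usage with stand-in callbacks; a real run would plug in the workload's training loop.
val_accuracy = iter([0.60, 0.70, 0.76])
print(time_to_accuracy(lambda epoch: time.sleep(0.01), lambda: next(val_accuracy), target=0.759))
```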
Insights
- TPUs have a significant performance advantage on Image Classification, but they lack generality and do not support many models (such as Faster R-CNN and Learning-to-Rank) because they support a limited set of TensorFlow operations [1]
[1] Available TensorFlow Ops | Cloud TPU. Google Cloud. https://cloud.google.com/tpu/docs/tensorflow-ops
Insights
- PyTorch is poorly optimized for TPUs because it cannot load data directly from Google Cloud Storage onto the TPU the way TensorFlow does
- Data loading is a bottleneck for the image classification task [1]
[1] [Question] Loading from Google Cloud Storage · Issue #1544 · pytorch/xla. GitHub.
Summary
- Five AI benchmarking challenges: prohibitive cost, conflicting requirements, short shelf-life, scalability, and repeatability
- AIBench Training: methodology, workload characterization, two subsets (for repeatable performance ranking and for workload characterization), and rankings
Thank You!