AIBench Training, Subsets, and Their Rankings
Fei Tang, ICT, Chinese Academy of Sciences
AIBench Tutorial at ISCA 2021
Executive Summary
- A lack of understanding of learning dynamics raises serious AI benchmarking challenges
- AIBench Training: methodology, workload characterization, two subsets (for repeatable performance ranking and for workload characterization), and rankings
- https://www.benchcouncil.org/aibench
Learning Dynamics Are Not Understood
- Training is a high-dimensional, non-convex optimization problem
  - A slight change leads to a different optimization path
  - Parameter tuning depends heavily on experience
(Picture from http://www.dashangu.com/postimg_13493485.html)
Prohibitive Cost Challenge
- Running an entire training session is mandatory
- A complete training session takes several weeks on a small-scale system
  - Simulators with 10x to 1,000x slowdowns exacerbate the challenge
- A microbenchmark like HPL-AI cannot model the learning dynamics of deep learning [1]
[1] HPL-AI Mixed-Precision Benchmark — HPL-AI 0.0.2 documentation. https://icl.bitbucket.io/hpl-ai/
Conflicting-requirement Challenge
- Earlier-stage evaluations of a new architecture or system (micro benchmarks) need:
  - Affordability
  - Portability
  - Simplicity
- Later-stage evaluations or purchasing of off-the-shelf systems (component or scenario benchmarks) need:
  - Comprehensiveness/representativeness
  - Reality and overall system performance
Short Shelf-life Challenge
- AI models evolve and change faster than AI benchmarks
  - It takes about one year to walk through benchmark design, implementation, community adoption, and large-scale testing
- Synthetic benchmarks like ParaDNN [1] can traverse many networks, but they cannot model learning dynamics
[1] Wang, Yu Emma, Gu-Yeon Wei, and David Brooks. "A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms."
Scalability Challenge
- An AI task's problem scale is often fixed
- HPL-AI [1] is scalable, but it cannot model learning dynamics because it does not consider model quality
(Picture from the HPC AI500 Ranking, Image Classification [2])
[1] HPL-AI Mixed-Precision Benchmark — HPL-AI 0.0.2 documentation. https://icl.bitbucket.io/hpl-ai/
[2] HPC AI500 Ranking. https://www.benchcouncil.org/ranking.html
Repeatability Challenge
- A benchmark mandates being repeatable, while training deep neural networks is stochastic
- Factors of randomness: model initialization, data shuffling, data augmentation, dropout, etc.
[Figure: run-to-run variation of AIBench Training workloads, ranging from 0% up to roughly 40% across tasks]
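To make the point concrete, below is a minimal, hypothetical sketch (not part of AIBench) of how the randomness sources listed above can be pinned in a PyTorch training script. Even with every seed fixed, some GPU kernels remain nondeterministic, which is exactly why run-to-run variation is a benchmarking challenge.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin the common sources of training randomness listed above (illustrative only)."""
    random.seed(seed)                      # Python-level data shuffling and augmentation choices
    np.random.seed(seed)                   # NumPy-based data augmentation
    torch.manual_seed(seed)                # model initialization and dropout masks
    torch.cuda.manual_seed_all(seed)       # GPU-side random number generators
    torch.backends.cudnn.deterministic = True   # prefer deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False      # disable autotuning that can vary run-to-run


seed_everything(42)
```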
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Related Work
- Benchmarks that use time-to-accuracy as the main metric
- Benchmarks that model the critical paths of a real-world application scenario
- Systematic AI benchmarking projects
- Synthetic AI benchmarks (e.g., ParaDNN)
- Micro benchmarks that use mixed-precision LU decomposition to achieve upper-bound FLOPS performance (e.g., HPL-AI)
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Methodology
- Perform a detailed survey of the critical domain of Internet services, including search engines, social networks, and e-commerce
- Include as many representative benchmarks as possible
- Propose a repeatable performance ranking subset and a workload characterization subset, and keep the subsets to a minimum
- Consider the full benchmarks, their subsets, and micro benchmarks all indispensable
Typical Internet Service Applications (with 17 industry partners)
- Representative AI tasks among search engines, social networks, and e-commerce
AIBench Training Workloads
Image Classification
- Classify an image into multiple categories
  - Dataset: ImageNet 2012, one of the world's largest image databases, containing more than 14 million images with a data size of more than 100 GB
  - Model: ResNet-50, a milestone model that demonstrates AI's ability to classify images, exceeding human-level performance
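As an illustration only, the snippet below sketches how this workload could be set up with torchvision; the dataset path, batch size, and hyperparameters are placeholders and do not reproduce the AIBench reference implementation.

```python
import torch
import torchvision
from torch.utils.data import DataLoader
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Standard ImageNet-style preprocessing; "/data/imagenet/train" is a placeholder path.
preprocess = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = torchvision.datasets.ImageFolder("/data/imagenet/train", transform=preprocess)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True, num_workers=8)

model = torchvision.models.resnet50(num_classes=1000).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

for images, labels in train_loader:          # one training pass; accuracy tracking omitted
    optimizer.zero_grad()
    loss = criterion(model(images.to(device)), labels.to(device))
    loss.backward()
    optimizer.step()
```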
Image Generation
- Learn the distribution of images in order to generate new images
  - Dataset: LSUN, about 1 million labeled images, divided into 10 scene categories and 20 object categories
  - Model: WGAN, one of the best-known GAN-based models, which uses adversarial training to solve the image generation problem
Text Translation
- Convert text from one language to another
  - Dataset: WMT English-German, with 4.5 million sentence pairs
  - Model: Transformer, the classic model for text translation and the basis for the subsequent BERT model
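For orientation, here is a toy encoder-decoder sketch built on PyTorch's nn.Transformer; the vocabulary sizes, sequence lengths, and omission of positional encoding are all simplifying assumptions, not the benchmark's reference model.

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, D_MODEL = 32000, 32000, 512    # toy vocabulary sizes and model width

src_embed = nn.Embedding(SRC_VOCAB, D_MODEL)
tgt_embed = nn.Embedding(TGT_VOCAB, D_MODEL)
transformer = nn.Transformer(d_model=D_MODEL, nhead=8,
                             num_encoder_layers=6, num_decoder_layers=6,
                             batch_first=True)        # positional encoding omitted for brevity
generator = nn.Linear(D_MODEL, TGT_VOCAB)

src = torch.randint(0, SRC_VOCAB, (2, 10))            # (batch, source length)
tgt = torch.randint(0, TGT_VOCAB, (2, 9))             # (batch, target length)
tgt_mask = transformer.generate_square_subsequent_mask(tgt.size(1))  # causal decoder mask

out = transformer(src_embed(src), tgt_embed(tgt), tgt_mask=tgt_mask)
logits = generator(out)                               # (batch, target length, vocabulary)
print(logits.shape)
```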
Image-to-Text
- Generate descriptive text for a given image
  - A combination of computer vision and natural language processing
  - Dataset: MS COCO 2014, with 82,783 training samples, 40,504 validation samples, and 40,775 test samples (20 GB+)
  - Model: Neural Image Caption, a combination of a CNN and an RNN
Image-to-Image
- Convert an image from one representation to another
  - E.g., change of seasons, change of object species
  - Dataset: Cityscapes, street-view data from more than 50 cities (300 MB)
  - Model: CycleGAN, a widely used GAN-based model with two generators and two discriminators
Speech Recognition
- Recognize voice messages and transcribe them into text
  - Dataset: LibriSpeech, 1,000+ hours of speech, one of the most representative audio datasets
  - Model: DeepSpeech2, a milestone model in speech recognition
Face Embedding
- Verify a face by learning an embedding into Euclidean space; the embedding can then be used for face recognition
  - Dataset: VGGFace2, 36 GB of training data and 1.9 GB of test data
  - Model: FaceNet, a representative model based on the GoogLeNet-style Inception architecture
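A minimal sketch of the FaceNet-style training objective, i.e., the triplet loss that pulls embeddings of the same identity together in Euclidean space; the tensors below are random stand-ins for real face embeddings.

```python
import torch
import torch.nn.functional as F


def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Triplet loss on L2-normalized embeddings (illustrative sketch)."""
    anchor, positive, negative = (F.normalize(x, dim=1) for x in (anchor, positive, negative))
    pos_dist = (anchor - positive).pow(2).sum(dim=1)   # squared distance, same identity
    neg_dist = (anchor - negative).pow(2).sum(dim=1)   # squared distance, different identity
    return F.relu(pos_dist - neg_dist + margin).mean()


# Random 128-dimensional embeddings standing in for a batch of four face crops.
print(triplet_loss(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)))
```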
Object Detection
- Find objects of certain target classes with precise localization in a given image
  - Dataset: VOC2007, 9,963 images containing 24,640 labeled objects
  - Model: Faster R-CNN, a classic model for object detection and the cornerstone of many other models such as Mask R-CNN
Recommendation
- Personalized recommendations based on collaborative filtering
  - Dataset: MovieLens, a real-world movie-ratings dataset
  - Model: Neural Collaborative Filtering (NCF), a fundamental algorithm for recommendation
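The sketch below shows the core idea of neural collaborative filtering (user and item embeddings fed to an MLP that scores an interaction); the layer sizes and MovieLens-like dimensions are illustrative assumptions, not AIBench's configuration.

```python
import torch
import torch.nn as nn


class NCF(nn.Module):
    """Minimal neural collaborative filtering model (MLP branch only, for illustration)."""

    def __init__(self, n_users: int, n_items: int, dim: int = 32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=1)
        return torch.sigmoid(self.mlp(x)).squeeze(1)   # predicted interaction probability


model = NCF(n_users=6040, n_items=3706)                # MovieLens-1M-like sizes, as an example
print(model(torch.tensor([0, 1]), torch.tensor([10, 20])))
```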
Video Prediction
- Predict future video frames by learning from previous frames
  - Dataset: Robot Pushing dataset, behavior data from 59,000 robot-pushing sequences (100 GB+)
  - Model: Motion-Focused Predictive model, which predicts how to transform the last image into the next image
Image Compression
- Reduce redundant information in image data to store and transfer it in a more efficient format
  - Dataset: ImageNet 2012 (100 GB+), one of the world's largest image databases, containing more than 14 million images
  - Model: an RNN-based model
3D Object Reconstruction
- Capture the shape and appearance of a real object; a core technology in fields such as computer graphics and virtual reality
  - Dataset: ShapeNet, containing about 51,300 3D models across 55 common object categories
  - Model: a convolutional encoder-decoder network combining an image encoder, a volume decoder, and a perspective transformer
Text Summarization
- Generate summaries for given text
  - Dataset: Gigaword, about 10 million documents with over 4 billion words
  - Model: a sequence-to-sequence model consisting of an off-the-shelf attentional encoder-decoder RNN
Spatial Transformer
- Spatial transformations of images, such as rotation and stretching
  - Dataset: MNIST, containing 60,000 training images and 10,000 test images
  - Model: Spatial Transformer Network (STN), which includes a localisation network, a grid generator, and a sampler
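Because the slide names the three STN components, a compact sketch may help: the localisation network regresses an affine transform, F.affine_grid acts as the grid generator, and F.grid_sample is the differentiable sampler. The layer sizes are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTransformer(nn.Module):
    """Minimal STN for 28x28 MNIST-sized images: localisation net, grid generator, sampler."""

    def __init__(self):
        super().__init__()
        # Localisation network regresses the 6 parameters of a 2x3 affine matrix.
        self.loc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 6))
        # Start from the identity transform so training begins with "no warp".
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                          # predicted affine parameters
        grid = F.affine_grid(theta, x.size(), align_corners=False)  # grid generator
        return F.grid_sample(x, grid, align_corners=False)          # differentiable sampler


print(SpatialTransformer()(torch.randn(8, 1, 28, 28)).shape)
```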
Neural Architecture Search
- Automatically design neural networks
  - Dataset: PTB (Penn Treebank), containing 2,499 stories selected from a three-year Wall Street Journal collection of 98,732 stories for syntactic annotation
  - Model: ENAS, which finds efficient neural networks via reinforcement learning
[Figure: NAS loop — a search strategy samples network architectures from the search space, and an evaluation strategy evaluates their performance]
Advertising
- Display the most relevant ads to customers
  - Dataset: Kaggle Display Advertising Challenge dataset
  - Model: Deep Learning Recommendation Model (DLRM)
Natural Language Processing (NLP)
- Train a language model that serves many tasks, such as translation and question answering
  - Dataset: Wikipedia
  - Model: BERT
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Representativeness and Comprehensiveness
- Diverse behaviors for workload characterization
  - Algorithm behavior: model architectures, parameters, optimizers, and loss functions
  - System behavior: evaluation time cost, variation, convergence rate, and number of hot functions
  - Micro-architecture behavior: computation pattern, memory access pattern, and I/O pattern
Representativeness and Comprehensiveness
- Coverage of diverse network architectures (CNN, ResNet, LSTM, GRU, Attention, etc.)
  - Text processing (7): Text-to-Text, Text Summarization, Learning-to-Rank, Recommendation, Neural Architecture Search, Advertising, and NLP
  - Image processing (8): Image Classification, Image Generation, Image-to-Text, Image-to-Image, Face Embedding, Object Detection, Image Compression, and Spatial Transformer
  - Audio processing (1): Speech Recognition
  - Video processing (1): Video Prediction
  - 3D data processing (2): 3D Face Recognition and 3D Object Reconstruction
AIBench Training vs. MLPerf Training
[Figure: comparison of AIBench against MLPerf from the perspectives of model complexity, computational cost, and convergence rate]
Micro-architectural Characteristics
- Distinct computation and memory access behaviors
  - AIBench has wider coverage than MLPerf
[Figure: MLPerf (blue) vs. AIBench (red) across GPU metrics]
  1. achieved_occupancy: warp utilization rate
  2. ipc_efficiency: IPC efficiency
  3. gld_efficiency: global memory load efficiency
  4. gst_efficiency: global memory store efficiency
  5. dram_utilization: DRAM utilization
AIBench Training (v1.1) vs. MLPerf Training (v0.7)
- Concurrent work
- AIBench Training has wider coverage:
  - Tasks
  - Datasets
  - Diverse characteristics: algorithm, system, and micro-architecture
Runtime Breakdown of the AIBench Benchmarks
Hotspot Functions
- AIBench Training covers more hotspot functions than MLPerf Training, making it more suitable for simulator-based research
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Repeatable Performance Ranking (RPR) Subset
- Reflects diverse model complexity, computational cost, and convergence rate
- Low run-to-run variation
- Widely accepted evaluation metrics
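One simple way to quantify "low run-to-run variation" is the coefficient of variation over repeated runs; the measurements below are made-up numbers used purely to show the calculation.

```python
import statistics


def run_to_run_variation(measurements):
    """Coefficient of variation (stdev / mean) of a repeated measurement such as time-to-accuracy."""
    return statistics.stdev(measurements) / statistics.mean(measurements)


# Hypothetical time-to-accuracy results (in hours) from five repeated training runs.
runs = [11.8, 12.1, 12.4, 11.9, 12.2]
print(f"run-to-run variation: {run_to_run_variation(runs):.2%}")
```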
Workload Characterization (WC) Subset
- The minimum set of workloads with the most representative system and micro-architectural characteristics
Two Subsets
- RPR subset: Image Classification, Object Detection, and Learning-to-Rank
- WC subset: Spatial Transformer, Image-to-Text, and Speech-to-Text
[Figure: result of K-means clustering using micro-architecture characteristics]
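To show the shape of such an analysis, here is a small K-means sketch over per-workload micro-architecture profiles; the feature values and the two-cluster choice are invented for illustration and are not the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-workload GPU profiles:
# [achieved occupancy, IPC efficiency, gld efficiency, gst efficiency, DRAM utilization]
workloads = ["Image Classification", "Object Detection", "Spatial Transformer", "Image-to-Text"]
features = np.array([
    [0.62, 0.55, 0.71, 0.68, 0.40],
    [0.58, 0.52, 0.69, 0.66, 0.38],
    [0.31, 0.20, 0.45, 0.41, 0.12],
    [0.44, 0.35, 0.58, 0.52, 0.25],
])

X = StandardScaler().fit_transform(features)           # put all metrics on a comparable scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

for name, cluster in zip(workloads, labels):
    print(f"{name}: cluster {cluster}")
```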
Outline
- Challenges
- Related work
- AIBench Training
  - Methodology
  - Workload characterization
  - Subsets for repeatable performance ranking and workload characterization
  - Rankings
Performance Ranking
- We use the AIBench RPR subset to rank the performance of GPUs and TPUs
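Rankings of this kind are typically driven by a time-to-quality measurement; the sketch below shows one way such a metric could be computed, with stand-in callbacks rather than the benchmark's actual harness.

```python
import time


def time_to_accuracy(train_one_epoch, evaluate, target, max_epochs=90):
    """Wall-clock seconds until the validation metric first reaches the target (illustrative)."""
    start = time.time()
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        if evaluate() >= target:
            return time.time() - start
    return float("inf")   # target quality never reached within the epoch budget


# Toy usage with stand-in callbacks; a real run would plug in the workload's training loop.
val_accuracy = iter([0.60, 0.70, 0.76])
print(time_to_accuracy(lambda epoch: time.sleep(0.01), lambda: next(val_accuracy), target=0.759))
```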
Insights
- TPUs have a significant performance advantage on Image Classification, but they lack generality and do not support many models (such as Faster R-CNN and Learning-to-Rank) because they support a limited set of TensorFlow operations [1]
[1] Available TensorFlow Ops | Cloud TPU. Google Cloud. https://cloud.google.com/tpu/docs/tensorflow-ops
Insights
- PyTorch is poorly optimized for TPUs because it cannot load data directly from Google Cloud Storage onto the TPU the way TensorFlow does
- Data loading is a bottleneck for the image classification task [1]
[1] [Question] Loading from Google Cloud Storage · Issue #1544 · pytorch/xla. GitHub.
Summary
- Five AI benchmarking challenges: prohibitive cost, conflicting requirements, short shelf-life, scalability, and repeatability
- AIBench Training: methodology, workload characterization, two subsets (for repeatable performance ranking and for workload characterization), and rankings
Thank You!