REVOLUTIONIZING RETAIL WITH ARTIFICIAL INTELLIGENCE - Eric Thorsen, Global Retail Business Development - GTC On-Demand
REVOLUTIONIZING RETAIL WITH ARTIFICIAL INTELLIGENCE Eric Thorsen, Global Retail Business Development
CHALLENGES FACING CONSUMER INDUSTRIES
Demographic Changes
• Millennials outnumber Baby Boomers
• “Digital Natives” demand changing experience
Digital Competition
• Emergence of new digital shopping experiences
• Emergence of device proxies
Consumer Demand
• Specific
• Impatient
• Particular
Omnichannel Constraints
• Mobile
• Web
• Stores
AI ONLINE & IN THE STORE: Shelf analysis, consumer advice • AR/VR consumer interaction • Targeted recommendations
RECOMMENDATION ENGINES ON GPU CLOUD: Song recommendations • Video recommendations • Targeted recommendations
AI IN SUPPLY CHAIN: Warehouse optimization • Dynamic supply chain real-time re-routing • Collaborative planning and replenishment
AI AT CORPORATE HQ: Single view of consumer • Demand signal analysis • Ad spend optimization • Predictive analytics
GPU-ACCELERATED ECOSYSTEM
Value chain stages: PLAN • BUY (BUILD) • MOVE • SELL • SERVICE
Use cases (flattened from the slide's five columns): Assortment Planning Procurement Inventory & Route Recommendation Logic Reverse Logistics Optimization CPFR Vendor Management Magic Mirror Returns Management Telemetry Seasonal Promotions Quality Inspection Clienteling Call Center Autonomous Vehicles Optimization Product Design Manufacturing and Drones Path to Purchase Automation Upsell / Cross Sell Open to Buy Demand Driven Supply Frictionless Commerce Network
Pro Viz: Collaborative Design / Shelf Optimization • AR/VR Customer Experience
Deep Learning, Analytics: Asst Planning / Forecast & Replenishment NN • Consumer Engagement / Recommendation Engine NN. Both: CSP, NGC, DGX (Training), TRT (Inference)
Deep Learning, Video: Quality Inspection • Loss Prevention, Shopper Tracking, Robotics, Frictionless Commerce
HPC: GPU Accelerated Applications: Space Planning, Optimization, SAP Leonardo, SAP HANA GRA • Accelerated Analytics: Kinetica, MapD, Graphistry, H2O
GRID: Windows 10 Acceleration / Knowledge worker enablement
GPUs PROVIDE BETTER DATA CENTER TCO: 1/6th the cost, 1/20th the power, 4 racks in a box. 160 CPU servers (65,000 Watts) vs. 1 NVIDIA HGX with 8 Tesla V100 GPUs (3,000 Watts)
RISE OF GPU-ENABLED COMPUTING APPLICATIONS
[Figure: performance on a log scale versus year, 1980 to 2020. Single-threaded performance grew 1.5X per year, then slowed to 1.1X per year. GPU-computing performance grows 1.5X per year, reaching 1000X by 2025. Layers labeled on the plot: algorithms, systems, CUDA, architecture.]
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
NVIDIA DEEP LEARNING EVERYWHERE, EVERY PLATFORM CLOUD Everywhere TESLA Servers in every shape and size DGX-1 AI Supercomputing Optimized Deep Learning Software TITAN X PC Development NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. 11
PERFORMANCE FROM THE DATA CENTER: Graphics accelerated virtual desktops and applications. All devices have graphics. Virtual machines also need a GPU.
NVIDIA GPUs EVERYWHERE: 120+ servers from more than 30 system vendors. Industry standard servers • Hyper-converged infrastructure • Public cloud offerings • Blade servers
Inception Partners – AI Startups in Retail 14
RESOURCES: GTC & DLI
GPU Technology Conference 2018 - http://www.gputechconf.com/
• Retail Breakfast to share best practices and lessons learned
• Selective Retail Business tracks highlighting AI success
• Deep dive hands-on sessions to experience AI
• Customer stories showing success using AI, ML, and DL
DLI WORKSHOPS - https://www.nvidia.com/en-us/deep-learning-ai/education/
RETAIL CUSTOMER STORIES October 2017
USPS delivers more than 150 billion pieces of mail each year, a logistics operation that is at a scale second to none. After experiencing increased delays and instances of fraud, USPS needed a different approach to data analytics. Using Kinetica’s GPU-accelerated solution, USPS achieved near-immediate analysis of data from over 213,000 scanning devices at post offices and processing facilities around the country. Last year, USPS delivered 154 billion pieces of mail, while driving 70 million fewer miles, saving 7 million gallons of fuel and preventing 70,000 tons of carbon emissions. 18
RETAIL INVENTORY MANAGEMENT
Safety Stock
• Optimization of safety stock for each store/item
• Home-grown algorithm ported from CPU cluster to GPU
• Time required dropped from hundreds of days on a single CPU node to a few hours on a single GPU (x4) node
• Speedup of approximately 700x
Time Forecasting models
• Hundreds of millions of store/item combinations forecast weekly
• Multiple models used to forecast, including Holt-Winters, ARIMA and GLM
• NVIDIA provided a GPU version of Holt-Winters, specified and integrated by the customer
• Comparative tests of 8 million store/items showed a reduction in time from 15 minutes (across approx. 38 servers) to 24 seconds on 1 GPU (x4) node
• The GPU version could allow a daily forecast because of its speed and scaling abilities
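The forecasting figures on this slide can be checked with a little arithmetic. The sketch below uses the quoted test numbers (8 million store/items, 15 minutes, 24 seconds). The 300 million total and the linear-scaling extrapolation are assumptions added for illustration only, since the slide says just "hundreds of millions".

```python
# Figures quoted on the slide for the comparative Holt-Winters test.
items_tested = 8_000_000      # store/item combinations in the test
cpu_seconds = 15 * 60         # 15 minutes across approx. 38 servers
gpu_seconds = 24              # 24 seconds on 1 GPU (x4) node

speedup = cpu_seconds / gpu_seconds
gpu_items_per_second = items_tested / gpu_seconds

# ASSUMPTION: 300 million stands in for "hundreds of millions", and
# run time is assumed to scale linearly with the number of combinations.
assumed_total_items = 300_000_000
full_run_minutes = assumed_total_items / gpu_items_per_second / 60

print(f"wall-clock speedup: {speedup:.1f}x")
print(f"GPU throughput: {gpu_items_per_second:,.0f} items/s")
print(f"estimated full run: {full_run_minutes:.0f} minutes")
```

Under these assumptions a full run fits in minutes rather than hours, which is consistent with the slide's remark that a daily forecast becomes feasible.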
AI IMPROVES THE CUSTOMER EXPERIENCE
AI is dramatically changing the online shopping experience with tangible improvements for retailers and consumers. In 2016, British online grocery giant Ocado improved customer service with its AI-enhanced contact center, and it is applying machine learning and NVIDIA GPUs to develop humanoid robotics to assist maintenance technicians, and advanced computer vision for image classification and recognition to replace barcode systems. Computer vision will expedite the picking process and better ensure orders are filled correctly so customers receive exactly what they ordered.
AI-DRIVEN SMART SHOPPING
According to Forrester, e-commerce was a $390B market in 2016 and is expected to double by 2024. E-commerce company Jet.com (acquired by Walmart) partners with multitudes of suppliers with different offerings at different prices. Jet uses GPU-accelerated AI to drive its smart cart solution, which fulfills orders at the lowest prices through the smart bundling of supplier offers. The platform finds the ideal merchant and warehouse combination to lower the total order cost. The bigger the shopping cart, the greater the savings that can be generated.
DISCOVER MORE WITH DEEP LEARNING
Online shopping can be convenient, but searching through multiple websites can be arduous and time-consuming. Pinterest makes it easy for users to quickly discover things they love. Automatic object detection lets users search for products within a Pin’s image, and Shop the Look lets users buy items seen in fashion and home décor Pins. Scientists on Pinterest’s visual search team use GPU-accelerated deep learning to teach their system to recognize image features using a dataset of billions of Pins and compute similarity scores to identify the best matches. One visual search study reports a 50% improvement in user engagement and traffic.
AI TOOL LETS YOU APPLY BEFORE YOU BUY
Testing different types of makeup can take hours and be a frustrating experience. ModiFace is using GPUs and facial modeling technology to help consumers explore and select the ideal products. ModiFace developed the ‘Sephora Virtual Artist’, an online tool that allows consumers to virtually experiment with new makeup without having to leave their computer screen. With technology for skin analysis and facial visualization, ModiFace and its AI features have introduced a more efficient way to style oneself.
AI PERSONALIZES SKIN CARE
Using the wrong skincare products can be a major cause of customer dissatisfaction, so Olay is arming women with the knowledge they need to make informed product purchase decisions. Its Olay Skin Advisor is a GPU-accelerated AI tool that works on any mobile device. Users provide a selfie and information about age, skin issues, skin type and product preferences, and the tool advises how to improve trouble areas using a daily regimen of recommended Olay products. After four weeks, 94% of women who tried the skin advisor continued to use the products it recommended.
REINVENTING RETAIL BY COMBINING ART AND AI
In fashion, styles change quickly, but the fundamental customer experience (brick-and-mortar stores and traditional online shopping sites) hasn’t changed much in the past decade. Stitch Fix broke that mold with a fashion styling service that combines the art of personal styling with data analytics insights powered by GPU-accelerated deep learning. Stitch Fix’s 50+ style recommendation algorithms match clothing and accessories to clients based on their unique style preferences. Most recently, Stitch Fix changed the game again with a deep learning image recognition system that locates fashion items for clients based on their shared Pinterest boards.
REDEFINING CYBERSECURITY WITH AI
We depend on a safe cyberspace for just about every aspect of our lives. Cyber attacks can be devastating, and in today’s world mutations have become the rule, not the exception. Cylance leverages GPU-driven deep learning to predict and prevent malicious code execution by identifying indicators of an attack. CylancePROTECT immediately prevented the execution of the May 2017 WannaCry attack on 100% of its customers’ endpoints.
AI TOOL BOOSTS CUSTOMER SERVICE
KLM’s 235 social media service agents engage in 15K conversations a week, 24/7. To contend with the overwhelming volume of messages, KLM uses GPU-accelerated deep learning to predict the best response to an incoming message and shows it to a contact center agent for approval or personalization before sending it to the customer. The resulting time savings for KLM service agents means they can focus on customers with more pressing needs and handle a greater volume of questions while still maintaining a high degree of customer satisfaction.
A NEW WAVE OF AI BUSINESS APPLICATIONS
Many brands rely on sponsoring televised events, yet impact is difficult to track. Manual tracking takes up to six weeks to measure ROI and even longer to adjust expenditures. SAP Brand Impact, powered by NVIDIA deep learning, measures brand attributes in near real-time with superhuman accuracy thanks to deep neural networks trained on NVIDIA DGX-1 and TensorRT to provide video inference analysis. Results are immediate, accurate and auditable, and delivered in a day.
A NEW WAVE OF AI BUSINESS APPLICATIONS
• Brand impact measurement on televised events in real time vs. 6 weeks
• Immediate, accurate and auditable, delivered in a day
• Brand Impact, Service Ticketing, Invoice-to-Record applications
BETTER DATA, SMARTER BUILDINGS
According to the EPA, 62% of the U.S.'s electricity is consumed by the commercial and industrial segments. But how much of that consumption is inefficient? Verdigris is on a mission to help businesses eliminate wasteful energy spend with their Smart Building optimization solutions. Verdigris is harnessing the power of data and GPU-driven deep learning to continually audit and analyze electronic signatures of individual devices to learn what's "normal" and identify patterns and instances of energy waste. And with real-time monitoring and alerts, response teams can react to solve problems immediately.
CONSUMER MONITORING
Standard traffic counters measure traffic into and out of the store. Computer vision offers enhancements by providing:
• Unique identity detection: integrated into loyalty program where appropriate or available
• Age / ethnicity segmentation: detect age groups, including children and seniors
• Shopping behavior tracking: groups, couples, individuals
• Traffic patterns: identify Path to Purchase, hot zones, cold zones, dwell points. Helps retailers make decisions on item placement or promotions
• Integration into app-based recommendation logic: ability to launch targeted promotions based on proximity, past purchase, and consumer profile
Multiple camera signals can be stitched together to detect patterns within the store. Exterior cameras can determine shopper density based on parking. Origin tracking can identify external traffic sources and/or co-marketing opportunities.
Warehouse / Distribution Center Optimization
Warehouses and DCs are not built for consumer traffic and can pose health and safety challenges for workers. During peak season, shelves fill up and working space is reduced to a minimum, making it harder for humans to navigate safely and accurately measure inventory. IFM uses NVIDIA Jetson technology mounted on a drone to autonomously monitor inventory positions in the DC or warehouse. As an Inception partner, IFM is closely aligned with NVIDIA and is poised to deliver incredible impact on retail and supply chain business processes.
YouTube Link: https://youtu.be/AMDiR61f86Y
Shelf Scanning Robotics
Store associates are representatives of the brand and the face of the retail organization. It makes sense to reduce the time they spend performing tasks that are not consumer-facing. Performing inventory counts, replacing misplaced items, or scanning for out-of-stock situations are examples of basic, repetitive, and non-impactful operations for store associates. Fellow Robots has created a solution to scan shelves, monitor misplaced items, and act as a wayfinder kiosk for consumers. This allows associates to interact with the shopping public, improving consumer satisfaction and raising revenue through larger shopping baskets. As an Inception partner, Fellow is closely aligned with NVIDIA and is poised to deliver incredible impact on retail business processes.
YouTube Link: https://youtu.be/l7NPmJP462M
TRAFFIC PATTERNS LOSS PREVENTION
Using existing cameras, a retailer can install highly effective computer vision algorithms to detect shopper traffic patterns and prevent loss. In the US, LP is a $48B problem impacting all retailers. At the same time, investment in LP staff is flat or shrinking. The average cost of a shoplifting incident is doubling to $798, and 30% of inventory shrinkage is an inside job. Computer vision can identify theft, shrinkage, and shoplifting incidents. This new technology can reinvigorate efforts to tackle a longstanding problem for retail.
COLLABORATIVE DESIGN: Photorealistic Models • Interactive Physics • Design Flow Integration • Collaboration
ART OF THE POSSIBLE The State of AI in Retail Paul Hendricks Solutions Architect phendricks@nvidia.com
INTRODUCTION
• Paul Hendricks is a Solutions Architect at NVIDIA, helping enterprise customers with their deep learning and AI initiatives.
• Paul's background is primarily in retail, and he has spent the past 5 years working with many Fortune 500 retail companies to implement data science and AI solutions.
• Prior to joining NVIDIA, Paul worked at Victoria’s Secret as a Data Scientist, building models to understand customer propensity to purchase and how to optimize assortment in stores.
• Currently, Paul's research at NVIDIA focuses on using deep learning in intelligent video analytics and recommendation systems.
Intelligent Video Analytics 40
Object Detection Problem Background • Data: Images • Goal: Identify objects in an image, and output bounding boxes around the objects and their classes 41
Frictionless Checkout https://www.standardcognition.com/ 42
Localizing Algorithms
Sliding windows
• If one of the windows only has half of the dog, the activation may not be strong enough
• Using small windows and small strides will be very computationally intensive
Fully convolutional neural network
• Since convolutions are basically sliding windows, we can try replacing the fully connected layers with convolutional layers
• Bounding boxes generated are not very accurate
Region proposals
• Selects blob-like structures and proposes these as the regions to be passed into a CNN
• This concept is similar to sliding window
Single shot detection
• This algorithm predicts the coordinates of the bounding boxes as well as the class of the objects
• Fast since the model looks at the image once (e.g., YOLO)
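To make the sliding-window cost concrete, here is a minimal sketch that only counts window positions. The image size and the window/stride pairs are illustrative assumptions, and `count_windows` is a hypothetical helper, not part of any library. In a naive pipeline each position would need its own classifier pass.

```python
def count_windows(image_w, image_h, window, stride):
    """Number of square window positions that fit fully inside the image."""
    if window > image_w or window > image_h:
        return 0
    across = (image_w - window) // stride + 1
    down = (image_h - window) // stride + 1
    return across * down

# ASSUMPTION: a 640x480 image and these window/stride pairs are illustrative.
coarse = count_windows(640, 480, window=128, stride=64)
fine = count_windows(640, 480, window=32, stride=8)

print(f"large windows, large stride: {coarse} positions")
print(f"small windows, small stride: {fine} positions")
print(f"ratio: {fine / coarse:.0f}x more classifier passes")
```

Shrinking the window and stride multiplies the number of positions by roughly two orders of magnitude here, which is the compute problem the other approaches on the slide try to avoid.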
Getting Started
DLI Courses
• Object Detection with DIGITS - https://nvlabs.qwiklab.com/focuses/4125
Papers
• Fully Convolutional Networks for Semantic Segmentation - https://arxiv.org/pdf/1605.06211.pdf
• Rich feature hierarchies for accurate object detection and semantic segmentation - https://arxiv.org/pdf/1311.2524.pdf
• Fast R-CNN - https://arxiv.org/pdf/1504.08083.pdf
• Faster R-CNN: Towards real-time object detection with region proposal networks - https://arxiv.org/pdf/1506.01497.pdf
• YOLO9000: Better, Faster, Stronger - https://arxiv.org/pdf/1612.08242.pdf
Libraries
• https://github.com/pjreddie/darknet
• https://github.com/tensorflow/models/tree/master/research/object_detection
Datasets
• COCO - http://cocodataset.org/
• ImageNet - https://www.kaggle.com/c/imagenet-object-detection-challenge
Anomaly Detection 48
Anomaly Detection Problem Background • Data: Image, sensor data (time series), text data • Goal: Detect if the data being generated is anomalous 49
UNSUPERVISED LEARNING: Anomaly detection using deep learning
Deep Autoencoder Network (Input: $X$, Output: $\tilde{X}$)
• Input layer: size of data vector
• Bottleneck layer: summarized representation (‘embedding’)
• Output layer: same dimensionality as input
• Reconstruction error $X - \tilde{X}$: high errors indicate potential anomaly
UNSUPERVISED LEARNING: DL anomaly detection in time series
• Time series signals
• Split into sliding windows $w_1, w_2, w_3, \ldots, w_N$
• Normalization and preprocessing
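The windowing and normalization steps can be sketched in a few lines. This is a minimal example under stated assumptions: the slide does not say which normalization is used, so per-window z-scoring is an assumption, and the function names and toy signal are hypothetical.

```python
from statistics import mean, pstdev

def sliding_windows(signal, size, step=1):
    """Split a 1-D signal into overlapping windows w_1 ... w_N."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def normalize(window):
    """ASSUMPTION: z-score each window (zero mean, unit variance)."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return [0.0 for _ in window]  # constant window: nothing to scale
    return [(x - mu) / sigma for x in window]

# Toy signal for illustration only.
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
windows = [normalize(w) for w in sliding_windows(signal, size=4, step=1)]
print(len(windows), "windows of length", len(windows[0]))
```

Each normalized window would then be fed to the autoencoder as one input vector.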
UNSUPERVISED LEARNING: Detecting anomalies via reconstruction error
• Reconstruction error (RE) as a proxy to outliers
• Whenever RE is high, consider it a red flag
• Threshold can be set using statistical bounds
[Figure panels: Input; Output (Reconstruction); Reconstruction vs Input]
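A minimal sketch of the thresholding idea follows. It does not train an autoencoder: the reconstruction is taken as a given input that a trained model would supply. The choice of mean squared error, the "mean plus k standard deviations" bound, and the sample error values are all assumptions for illustration.

```python
from statistics import mean, stdev

def reconstruction_error(x, x_hat):
    """Mean squared error between an input window and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def fit_threshold(normal_errors, k=3.0):
    """ASSUMPTION: mean + k standard deviations as one possible statistical bound."""
    return mean(normal_errors) + k * stdev(normal_errors)

def is_anomaly(x, x_hat, threshold):
    """Flag the window when its reconstruction error exceeds the threshold."""
    return reconstruction_error(x, x_hat) > threshold

# Hypothetical errors measured on data known to be normal.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(normal_errors)

print(is_anomaly([1.0, 1.0], [1.0, 1.0], threshold))  # perfect reconstruction
print(is_anomaly([1.0, 1.0], [0.0, 1.0], threshold))  # poor reconstruction
```

The value of `k` trades false alarms against missed anomalies and would need tuning on real data.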
Getting Started
DLI Courses
• Introduction to Autoencoders
• Anomaly Detection with Variational Autoencoders - https://nvlabs.qwiklab.com/focuses/8362
Papers & Books
• Autoencoders - https://papers.nips.cc/paper/798-autoencoders-minimum-description-length-and-helmholtz-free-energy.pdf
• Deep Learning, Chapter 13 - http://a.co/1vbPNXr
• Hands on Machine Learning with Scikit-Learn & TensorFlow, Chapter 13 - http://a.co/aImsrRT
Datasets
• Fashion MNIST - https://github.com/zalandoresearch/fashion-mnist
• Deep Fashion - http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
• UT Zappos 50k - http://vision.cs.utexas.edu/projects/finegrained/utzap50k/
Recommendation Systems 58
RECOMMENDATION SYSTEMS
Problem Background
• Data: Matrix R (rows are users, columns are items, cell values are ratings)
• Goal: Compute missing values in R; the top N unseen items are good recommendation candidates
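The "top N unseen items" step can be sketched as follows. The predicted matrix `R_hat` stands in for the output of whatever model fills in the missing values. Its numbers, the use of `None` for missing ratings, and the helper name `top_n_unseen` are assumptions made for this example.

```python
def top_n_unseen(observed, predicted, user, n):
    """Return the n item indices with the highest predicted rating among
    items the user has not rated (observed value is None)."""
    candidates = [
        (predicted[user][item], item)
        for item, rating in enumerate(observed[user])
        if rating is None
    ]
    # Highest predicted rating first; ties broken by item index.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in candidates[:n]]

# R: rows are users, columns are items, None marks a missing rating.
R = [
    [5, None, 3, None],
    [None, 4, None, 2],
]
# Hypothetical model output with every cell filled in.
R_hat = [
    [4.8, 3.9, 3.1, 4.5],
    [2.2, 4.1, 3.7, 1.9],
]

print(top_n_unseen(R, R_hat, user=0, n=2))
print(top_n_unseen(R, R_hat, user=1, n=1))
```

Items the user has already rated are excluded even when their predicted score is high, since they are not new recommendations.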
MANY APPLICATIONS FROM SIMILAR PROBLEMS: Using autoencoders to generate recommendations
[Figure: example user ratings, values 2 to 5]
https://github.com/NVIDIA/DeepRecommender/
Getting Started
DLI Courses
• Deep Autoencoders for Recommender Systems
Papers
• AutoRec: Autoencoders meet collaborative filtering - http://users.cecs.anu.edu.au/~u5098633/papers/www15.pdf
• Training deep autoencoders for collaborative filtering - https://arxiv.org/pdf/1708.01715.pdf
Libraries
• https://github.com/NVIDIA/DeepRecommender
• https://github.com/geffy/tffm
• https://github.com/apache/incubator-mxnet/tree/master/example/recommenders
Datasets
• Netflix - https://netflixprize.com/
• MovieLens - https://grouplens.org/datasets/movielens/
• UC Irvine Online Retail Dataset - http://archive.ics.uci.edu/ml/datasets/online+retail
NVIDIA Tools 64
TESLA V100 32GB: WORLD’S MOST ADVANCED DATA CENTER GPU, NOW WITH 2X THE MEMORY
• 5,120 CUDA cores
• 640 NEW Tensor cores
• 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
• 20MB SM RF | 16MB Cache
• 32GB HBM2 @ 900GB/s | 300GB/s NVLink
FASTER RESULTS ON COMPLEX DL AND HPC: Up to 50% Faster Results With 2x The Memory
[Figure: V100 16GB vs. V100 32GB]
• FASTER RESULTS: 1.5X faster language translation, Neural Machine Translation (NMT), 0.8 step/sec vs. 1.2 step/sec • 1.5X faster calculations, 3D FFT 1k x 1k x 1k, 2.5TF vs. 3.8TF
• HIGHER ACCURACY: 40% lower error rate, VGG-16 (16 layers) vs. RN-152 (152 layers)
• HIGHER RESOLUTION: 4X higher resolution, GAN image-to-image generation, 512x512 res images vs. 1024x1024 res images. Unsupervised image translation: input winter photo, AI converts it to summer
GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) | Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7 | NMT is GNMT-like and run with TensorFlow NGC Container 18.01 (Batch Size = 128 (for 16GB) and 256 (for 32GB)) | FFT is with cufftbench 1k x 1k x 1k and comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V) | R-CNN for object detection at 1080P with Caffe: V100 16GB uses VGG16, V100 32GB uses Resnet-152 | V100 16GB and V100 32GB with FP32
NEW TENSOR CORE BUILT FOR AI: Delivering 120 TFLOPS of DL Performance
• TENSOR CORE MATRIX DATA OPTIMIZATION: Dense Matrix of Tensor Compute
• TENSOR-OP CONVERSION: FP32 to Tensor Op Data for Frameworks
• VOLTA TENSOR CORE: 4x4 matrix processing array, D[FP32] = A[FP16] * B[FP16] + C[FP32], Optimized For Deep Learning
• VOLTA-OPTIMIZED cuDNN
• ALL MAJOR FRAMEWORKS
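The mixed-precision operation D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated on the CPU to show where rounding happens. This is only a numerical sketch: the helper names are hypothetical, precision is emulated with Python's `struct` half and single formats, and the accumulation order and rounding details of real hardware are not modeled.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def tensor_op(a, b, c):
    """Emulate D = A * B + C with FP16 inputs and FP32 accumulation.

    ASSUMPTION: rounding to FP32 after every add is a simplification.
    """
    n = len(a)
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                acc = to_fp32(acc + a16[i][k] * b16[k][j])
            row.append(acc)
        d.append(row)
    return d

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j + 1) for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]

d = tensor_op(identity, b, zeros)
print(d[0])           # first row of B, unchanged by the identity product
print(to_fp16(0.1))   # 0.1 is not exactly representable in FP16
```

Small integers survive the FP16 conversion exactly, while a value such as 0.1 is rounded on input. That is why the inputs are FP16 but the accumulator is kept in FP32.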
NVIDIA DGX AI Supercomputer-in-a-Box: 960 TFLOPS | 8x Tesla V100 16GB | NVLink Hybrid Cube Mesh | 2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U, 3200W
INTRODUCING NVIDIA DGX-2: THE WORLD’S FIRST 2 PETAFLOPS SYSTEM
THE WORLD’S MOST POWERFUL AI SYSTEM FOR THE MOST COMPLEX AI CHALLENGES
• DGX-2 is the newest addition to the DGX family, powered by DGX software
• Deliver accelerated AI-at-scale deployment and simplified operations
• Step up to DGX-2 for unrestricted model parallelism and faster time-to-solution
10X PERFORMANCE GAIN IN LESS THAN A YEAR
[Figure: time to train FAIRSEQ (PyTorch). DGX-1V (DGX-1, Sep ’17): 15 days. DGX-2 (Q3 ’18): 1.5 days.]
Software improvements across the stack including NCCL, cuDNN, etc.
NVSWITCH: WORLD’S HIGHEST BANDWIDTH ON-NODE SWITCH
• 7.2 Terabits/sec or 900 GB/sec
• 18 NVLINK ports | 50GB/s per port bi-directional
• Fully-connected crossbar
• 2 billion transistors | 47.5mm x 47.5mm package
NVSWITCH ENABLES THE WORLD’S LARGEST GPU
• 16 Tesla V100 32GB connected by new NVSwitch
• 2 petaFLOPS of DL compute
• Unified 512GB HBM2 GPU memory space
• 300GB/sec every GPU-to-GPU
• 2.4TB/sec of total cross-section bandwidth
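The NVSwitch and DGX-2 figures quoted on these two slides are mutually consistent, as the short check below shows. The cross-section calculation assumes the 2.4TB/sec figure is obtained by splitting the 16 GPUs into two halves of 8, each GPU contributing its 300GB/sec. That derivation is an assumption for illustration, not something the slides state.

```python
# Per-switch figures from the NVSwitch slide.
ports_per_switch = 18
gb_per_sec_per_port = 50
switch_gb_per_sec = ports_per_switch * gb_per_sec_per_port
switch_terabits_per_sec = switch_gb_per_sec * 8 / 1000  # bytes to bits

# System figures from the DGX-2 slide.
gpus = 16
hbm2_gb_per_gpu = 32
gpu_to_gpu_gb_per_sec = 300
unified_memory_gb = gpus * hbm2_gb_per_gpu

# ASSUMPTION: cross-section bandwidth = half the GPUs times per-GPU bandwidth.
cross_section_gb_per_sec = (gpus // 2) * gpu_to_gpu_gb_per_sec

print(f"switch: {switch_gb_per_sec} GB/s = {switch_terabits_per_sec} Tb/s")
print(f"unified memory: {unified_memory_gb} GB")
print(f"cross-section: {cross_section_gb_per_sec / 1000} TB/s")
```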
CHALLENGES WITH DEEP LEARNING
• Current DIY deep learning environments are complex and time consuming to build, test and maintain
• Requires high level of expertise to manage driver, library, framework dependencies
• Development of frameworks by the community is moving very fast
Software stack shown: Open Source Frameworks • NVIDIA Libraries • NVIDIA Docker • NVIDIA Driver • NVIDIA GPU
NVIDIA GPU CLOUD: Deep Learning Everywhere, For Everyone
• Innovate in minutes, not weeks: removes all the DIY complexity of deep learning software integration
• Always up to date: monthly updates by NVIDIA to ensure maximum performance
• Deep learning across platforms: containers run locally on DGX Systems and TITAN PCs, or on cloud service provider GPU instances
NVIDIA GPU Cloud integrates GPU-optimized deep learning frameworks, runtimes, libraries, and OS into a ready-to-run container, available at no charge.
DEEP LEARNING ACROSS PLATFORMS: NVIDIA Volta or NVIDIA Pascal-powered TITAN GPU • NVIDIA DGX-1 and DGX Station • Amazon EC2 P3 instances with NVIDIA Volta
KUBERNETES on NVIDIA GPUs: Container Orchestration for DL Training & Inference
• Scale-up Thousands of GPUs Instantly
• Self-healing Cluster Orchestration
• GPU Optimized Out-of-the-Box
• Powered by NVIDIA Container Runtime
• Included with Enterprise Support on DGX
• Available end of April 2018
Platforms: AWS-EC2 | GCP | Azure | DGX
Components shown: Kubernetes • NVIDIA Container Runtime • NVIDIA GPU Cloud • NVIDIA GPUs
TENSORRT DEPLOYMENT WORKFLOW
Step 1: Optimize trained model. Trained Neural Network → Import Model → TensorRT Optimizer → Serialize Engine → Optimized Plans (Plan 1, Plan 2, Plan 3)
Step 2: Deploy optimized plans with runtime. Optimized Plans (Plan 1, Plan 2, Plan 3) → De-serialize Engine → TensorRT Runtime Engine → Deploy Runtime → Data center, Automotive, Embedded
TensorRT INTEGRATED WITH TensorFlow: Delivers 8x Faster Inference with TensorFlow + TRT
[Figure: ResNet-50 on TensorFlow, images/sec @ 7ms latency. CPU (FP32): 11*. V100 (TensorFlow, FP32): 325. V100 Tensor Cores (TensorFlow + TensorRT): 2,657.]
Available in TensorFlow 1.7 - https://github.com/tensorflow
* Best CPU latency measured at 83 ms. CPU: Skylake Gold 6140, 2.5GHz, Ubuntu 16.04; 18 CPU threads. Volta V100 SXM; CUDA (384.111; v9.0.176); Batch size: CPU=1, TF_GPU=2, TF-TRT=16 w/ latency=6ms
NVIDIA TensorRT 4 RC NOW AVAILABLE
• RNN and MLP Layers: Maximize RNN and MLP throughput. Speed up speech, audio and recommender app inference performance through new layers and optimizations
• ONNX Import: Optimize and deploy ONNX models. Easily import and accelerate inference for ONNX frameworks (PyTorch, Caffe 2, CNTK, MxNet and Chainer)
• NVIDIA DRIVE Support: Support for NVIDIA DRIVE Xavier. Deploy optimized deep learning inference models on NVIDIA DRIVE Xavier
[Figure: Recommendation Engine, 45X speedup with TensorRT vs. CPU]
Free download to members of NVIDIA Developer Program: developer.nvidia.com/tensorrt
NVIDIA DIGITS: Interactive Deep Learning GPU Training System
• Interactive deep learning training application for engineers and data scientists
• Simplify deep neural network training with an interactive interface to train and validate, and visualize results
• Built-in workflows for image classification, object detection and image segmentation
• Improve model accuracy with pre-trained models from the DIGITS Model Store
• Faster time to solution with multi-GPU acceleration
developer.nvidia.com/digits
DIGITS DEEP LEARNING WORKFLOWS
• IMAGE CLASSIFICATION: Classify images into classes or categories (e.g., 98% Dog, 2% Cat). Object of interest could be anywhere in the image
• OBJECT DETECTION: Find instances of objects in an image. Objects are identified with bounding boxes
• IMAGE SEGMENTATION: Partition image into multiple regions. Regions are classified at the pixel level
WHAT’S NEW IN DIGITS 6?
• TENSORFLOW SUPPORT: Train TensorFlow models interactively with DIGITS
• NEW PRE-TRAINED MODELS: Image Classification: VGG-16, ResNet50. Object Detection: DetectNet