REVOLUTIONIZING RETAIL WITH ARTIFICIAL INTELLIGENCE - Eric Thorsen, Global Retail Business Development - GTC On-Demand

Page created by Felix Robles
 
REVOLUTIONIZING RETAIL
WITH ARTIFICIAL INTELLIGENCE
Eric Thorsen, Global Retail Business Development
CHALLENGES FACING CONSUMER INDUSTRIES

Demographic Changes
 • Millennials outnumber Baby Boomers
 • “Digital Natives” demand changing experience

Digital Competition
 • Emergence of new digital shopping experiences
 • Emergence of device proxies

Consumer Demand
 • Specific
 • Impatient
 • Particular

Omnichannel Constraints
 • Mobile
 • Web
 • Stores

AI FOR RETAIL
STORE OPERATIONS | SUPPLY CHAIN | CORPORATE HEADQUARTERS

AI ONLINE & IN THE STORE
SHELF ANALYSIS, CONSUMER ADVICE | AR/VR CONSUMER INTERACTION | TARGETED RECOMMENDATIONS

RECOMMENDATION ENGINES ON GPU CLOUD
SONG RECOMMENDATIONS | VIDEO RECOMMENDATIONS | TARGETED RECOMMENDATIONS

AI IN SUPPLY CHAIN
WAREHOUSE OPTIMIZATION | DYNAMIC SUPPLY CHAIN REAL-TIME RE-ROUTING | COLLABORATIVE PLANNING AND REPLENISHMENT

AI AT CORPORATE HQ
SINGLE VIEW OF CONSUMER | DEMAND SIGNAL ANALYSIS | AD SPEND OPTIMIZATION | PREDICTIVE ANALYTICS

GPU-ACCELERATED ECOSYSTEM
PLAN                  BUY (BUILD)                MOVE                             SELL                    SERVICE
Assortment Planning   Procurement                Inventory & Route Optimization   Recommendation Logic    Reverse Logistics
CPFR                  Vendor Management          Telemetry                        Magic Mirror            Returns Management
Seasonal Promotions   Quality Inspection         Autonomous Vehicles and Drones   Clienteling             Call Center Optimization
Product Design        Manufacturing Automation   Demand Driven Supply Network     Path to Purchase        Upsell / Cross Sell
Open to Buy                                                                       Frictionless Commerce

Pro Viz:          Collaborative Design / Shelf Optimization | AR/VR Customer Experience
Deep Learning:    Asst Planning / Forecast & Replenishment NN | Consumer Engagement / Recommendation Engine NN
                  (each: CSP, NGC, DGX for Training; TRT for Inference)
Video Analytics:  Quality Inspection | Loss Prevention, Shopper Tracking, Robotics, Frictionless Commerce
HPC / GRA:        GPU Accelerated Applications: Space Planning, Optimization, SAP Leonardo, SAP HANA
                  Accelerated Analytics: Kinetica, MapD, Graphistry, H2O
GRID:             Windows 10 Acceleration / Knowledge worker enablement
GPUs PROVIDE BETTER DATA CENTER TCO
               1/6th the cost, 1/20th the power, 4 racks in a box

160 CPU Servers: 65,000 Watts
1 NVIDIA HGX with 8 Tesla V100 GPUs: 3,000 Watts
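
The power claim can be sanity-checked from the two wattage figures on this slide. The snippet below is only a back-of-the-envelope sketch; the variable names are illustrative, and the cost claim cannot be checked this way because no prices are given.

```python
# Figures quoted on the slide.
cpu_servers = 160
cpu_watts = 65_000   # total draw of the CPU cluster
gpu_watts = 3_000    # draw of one HGX system with 8 Tesla V100 GPUs

# How many times more power the CPU cluster draws than the GPU system.
power_ratio = cpu_watts / gpu_watts

# Implied average draw of a single CPU server.
watts_per_cpu_server = cpu_watts / cpu_servers

print(f"CPU cluster draws {power_ratio:.1f}x the power of the GPU system")
print(f"Implied draw per CPU server: {watts_per_cpu_server:.2f} W")
```

A ratio a little above 20 is consistent with the "1/20th the power" headline.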
RISE OF GPU-ENABLED COMPUTING

[Chart: performance (log scale, 10^2 to 10^7) vs. year, 1980 to 2020]
GPU-computing perf: 1.5X per year, 1000X by 2025
Single-threaded perf: 1.5X per year, then 1.1X per year
Stack labels: APPLICATIONS, ALGORITHMS, SYSTEMS, CUDA, ARCHITECTURE
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
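
The compounding behind the quoted growth rates can be worked out in a few lines. This is a sketch only: the chart text does not state a base year, so the snippet simply asks how many years of 1.5X annual growth are needed to reach 1000X, and what 1.1X annual growth delivers over the same span. All names are illustrative.

```python
import math

gpu_rate = 1.5    # GPU-computing growth per year, as quoted
cpu_rate = 1.1    # recent single-threaded growth per year, as quoted
target = 1000     # the "1000X" headline figure

# Solve gpu_rate ** years == target for years.
years_to_target = math.log(target) / math.log(gpu_rate)

# Gain from the slower rate over the same number of years.
cpu_gain_same_period = cpu_rate ** years_to_target

print(f"Years of 1.5X growth to reach 1000X: {years_to_target:.1f}")
print(f"Gain at 1.1X per year over that span: {cpu_gain_same_period:.1f}X")
```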

NVIDIA DEEP LEARNING EVERYWHERE,
EVERY PLATFORM

TITAN X: PC Development
DGX-1: AI Supercomputing, Optimized Deep Learning Software
TESLA: Servers in every shape and size
CLOUD: Everywhere

NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
PERFORMANCE FROM THE DATA CENTER
        Graphics accelerated virtual desktops and applications

All devices have graphics                  Virtual machines also need a GPU

NVIDIA GPUs EVERYWHERE
                          120+ Servers from more than 30 system vendors

Industry Standard Servers | Hyper-Converged Infrastructure | Public Cloud offerings | Blade Servers

Inception Partners – AI Startups in Retail

RESOURCES: GTC & DLI
         GPU Technology Conference 2018
         http://www.gputechconf.com/

         Retail Breakfast to share best practices and lessons
         learned
         Selective Retail Business tracks highlighting AI success
         Deep dive hands-on sessions to experience AI
         Customer stories showing success using AI, ML, and DL

         DLI WORKSHOPS
         https://www.nvidia.com/en-us/deep-learning-ai/education/

RETAIL CUSTOMER STORIES
October 2017
USPS delivers more than 150 billion pieces of
mail each year, a logistics operation that is at a
scale second to none.

After experiencing increased delays and
instances of fraud, USPS needed a different
approach to data analytics. Using Kinetica’s
GPU-accelerated solution, USPS achieved
near-immediate analysis of data from over
213,000 scanning devices at post offices and
processing facilities around the country.

Last year, USPS delivered 154 billion pieces of
mail, while driving 70 million fewer miles,
saving 7 million gallons of fuel and preventing
70,000 tons of carbon emissions.

RETAIL INVENTORY
MANAGEMENT
Safety Stock
Optimization of safety stock for each store/item
Homegrown algorithm ported from a CPU cluster to GPU
Time required dropped from hundreds of days on a single CPU
node to a few hours on a single GPU (x4) node.

Speedup of approximately 700x

Time Series Forecasting Models
Hundreds of millions of store/item combinations forecast weekly.
Multiple models are used to forecast, including Holt-Winters,
ARIMA and GLM.

NVIDIA provided a Holt-Winters GPU version, specified and
integrated by the customer. Comparative tests of 8 million
store/items showed a reduction in time from 15 minutes (across
approx. 38 servers) to 24 seconds on 1 GPU (x4) node.

The GPU version could allow a daily forecast because of its speed and
scaling abilities.
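
For readers unfamiliar with the forecasting method named above, here is a minimal CPU-only sketch of additive Holt-Winters smoothing for a single store/item series. It is not the customer's or NVIDIA's GPU implementation; the function name, the initialization scheme and the smoothing parameters are assumptions chosen for illustration.

```python
def holt_winters_additive(series, season_len, alpha, beta, gamma, horizon):
    """Additive Holt-Winters; returns `horizon` forecasts past the end of `series`."""
    if len(series) < 2 * season_len:
        raise ValueError("need at least two full seasons of history")

    first = series[:season_len]
    second = series[season_len:2 * season_len]

    # Simple initialization (an assumption): level from the first season,
    # trend from the average per-step change between the first two seasons.
    level = sum(first) / season_len
    trend = (sum(second) - sum(first)) / season_len ** 2
    seasonals = [x - level for x in first]

    for t, y in enumerate(series):
        s = seasonals[t % season_len]
        prev_level = level
        # Update level, trend and seasonal component in turn.
        level = alpha * (y - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season_len] = gamma * (y - level) + (1 - gamma) * s

    n = len(series)
    return [
        level + h * trend + seasonals[(n + h - 1) % season_len]
        for h in range(1, horizon + 1)
    ]


# Hypothetical weekly unit sales with a repeating 4-week pattern.
weekly_units = [10, 20, 30, 40] * 3
forecast = holt_winters_additive(weekly_units, season_len=4,
                                 alpha=0.5, beta=0.5, gamma=0.5, horizon=4)
print(forecast)
```

Each store/item series is forecast independently of the others, which is why this kind of workload spreads well across many GPU threads.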
AI IMPROVES
THE CUSTOMER
EXPERIENCE
AI is dramatically changing the online shopping
experience with tangible improvements to retailers
and consumers. In 2016, British online grocery giant
Ocado improved customer service with its AI-
enhanced contact center, and is applying machine
learning and NVIDIA GPUs to develop humanoid
robotics to assist maintenance technicians, and
advanced computer vision for image classification
and recognition to replace barcode systems.
Computer vision will expedite the picking process
and better ensure orders are filled correctly so
customers receive exactly what they ordered.

AI-DRIVEN
SMART SHOPPING
According to Forrester, e-commerce was a
$390B market in 2016 and is expected to
double by 2024. E-commerce company
Jet.com (acquired by Walmart) partners
with multitudes of suppliers with different
offerings at different prices. Jet uses GPU-
accelerated AI to drive its smart cart
solution that fulfills orders at the lowest
prices through the smart bundling of
supplier offers. The platform finds the ideal
merchant and warehouse combination to
lower the total order cost. The bigger the
shopping cart, the greater the savings that
can be generated.

DISCOVER MORE
WITH DEEP
LEARNING
Online shopping can be convenient but
searching through multiple websites can be
arduous and time-consuming. Pinterest
makes it easy for users to quickly discover
things they love. Automatic object detection
lets users search for products within a Pin’s
image, and Shop the Look lets users buy
items seen in fashion and home décor Pins.
Scientists on Pinterest’s visual search team
use GPU-accelerated deep learning to teach
their system to recognize image features
using a dataset of billions of Pins and
compute similarity scores to identify the best
matches. One visual search study reports a
50% improvement in user engagement and
traffic.

AI TOOL LETS YOU
APPLY BEFORE
YOU BUY
Testing different types of makeup can take hours
and be a frustrating experience. ModiFace is using
GPUs and facial modeling technology to help
consumers explore and select the ideal products.
ModiFace developed the ‘Sephora Virtual Artist’, an
online tool that allows consumers to virtually
experiment with new makeup without having to
leave their computer screen. With technology on
skin analysis and facial visualization, ModiFace and
its AI features have introduced a more efficient way
to style oneself.

AI PERSONALIZES
SKIN CARE
Using the wrong skincare products can be a major
cause of customer dissatisfaction so Olay is arming
women with the knowledge they need to make
informed product purchase decisions. Its Olay Skin
Advisor is a GPU-accelerated AI tool that works on any
mobile device — users provide a selfie, information
about age, skin issues, skin type and product
preferences, and the tool advises how to improve
trouble areas using a daily regime of recommended
Olay products. After four weeks 94% of women who
tried the skin advisor continued to use the products it
recommended.

REINVENTING RETAIL
BY COMBINING
ART AND AI
In fashion, styles change quickly but the fundamental
customer experience —brick-and-mortar stores and
traditional online shopping sites— hasn’t changed much in
the past decade. Stitch Fix broke that mold with a fashion
styling service that combines the art of personal styling with
data analytics insights powered by GPU-accelerated deep
learning. Stitch Fix’s 50+ style recommendation algorithms
match clothing and accessories to clients based on their
unique style preferences. Most recently, Stitch Fix changed
the game again with a deep learning image recognition
system that locates fashion items for clients based on their
shared Pinterest boards.

REDEFINING
CYBERSECURITY
WITH AI
We depend on a safe cyberspace for just about every
aspect of our lives. Cyber attacks can be devastating,
and in today’s world mutations have become the rule
not the exception. Cylance leverages GPU-driven deep
learning to predict and prevent malicious code
execution by identifying indicators of an attack.
CylancePROTECT immediately prevented the
execution of the May 2017 WannaCry attack on 100%
of its customers’ endpoints.

AI TOOL BOOSTS
CUSTOMER SERVICE
KLM’s 235 social media service agents engage in 15K
conversations a week, 24/7. To contend with the
overwhelming volume of messages, KLM uses GPU-
accelerated deep learning to predict the best response
to an incoming message and shows it to a contact center
agent for approval or personalization before sending it to
the customer. The resulting time savings for KLM service
agents means they can focus on customers with more
pressing needs and handle a greater volume of questions
while still maintaining a high degree of customer
satisfaction.

A NEW WAVE OF AI
BUSINESS
APPLICATIONS
Many brands rely on sponsoring televised events, yet
impact is difficult to track. Manual tracking takes up to
six weeks to measure ROI and even longer to adjust
expenditures. SAP Brand Impact, powered by NVIDIA
deep learning, measures brand attributes in near real-
time with superhuman accuracy thanks to deep neural
networks trained on NVIDIA DGX-1 and TensorRT to
provide video inference analysis. Results are immediate,
accurate and auditable, and delivered in a day.

A NEW WAVE OF AI
BUSINESS
APPLICATIONS
Brand impact measurement on televised events in real
time vs 6 weeks.

Immediate, accurate and auditable, delivered in a day

Brand Impact, Service Ticketing, Invoice-to-Record
applications

BETTER DATA,
SMARTER BUILDINGS
According to the EPA, 62% of the U.S.'s electricity
is consumed by the commercial and industrial
segments. But how much of that consumption is
inefficient? Verdigris is on a mission to help
businesses eliminate wasteful energy spend with
their Smart Building optimization solutions.
Verdigris is harnessing the power of data and GPU-
driven deep learning to continually audit and
analyze electronic signatures of individual devices
to learn what's "normal" and identify patterns and
instances of energy waste. And with real-time
monitoring and alerts, response teams can react
to solve problems immediately.

CONSUMER
MONITORING
Standard Traffic counters measure traffic into and out
of the store. Computer Vision offers enhancements by
providing:
• Unique Identity Detection – integrated into loyalty
   program where appropriate or available
• Age / Ethnicity segmentation. Detect age groups,
   including children, seniors.
• Shopping behavior tracking. Groups, couples,
   individuals
• Traffic patterns. Identify Path to Purchase, hot
   zones, cold zones, dwell points. Helps retailers
   make decisions on item placement or promotions
• Can integrate into app-based recommendation
   logic. Ability to launch targeted promotions based
   on proximity, past purchase, and consumer profile

Multiple camera signals can be stitched together to
detect patterns within the store.

Exterior cameras can determine shopper density based
on parking. Origin tracking can identify external
traffic sources, and/or co-marketing opportunities

Warehouse /
Distribution Center
Optimization
Warehouses and DCs are not built for consumer traffic
and can pose health and safety challenges for workers.
During peak season, shelves fill up and working space is
reduced to a minimum, making it harder for humans to
navigate safely and accurately measure inventory.

IFM uses NVIDIA Jetson technology mounted on a drone
to autonomously monitor inventory positions in the DC or
Warehouse.

As an Inception partner, IFM is closely aligned with
NVIDIA and is poised to deliver incredible impact on
retail and supply chain business processes

 YouTube Link:
 https://youtu.be/AMDiR61f86Y
Shelf Scanning
Robotics
Store Associates are representatives of the brand, and
the face of the retail organization. It makes sense to
reduce the time spent performing tasks that are not
consumer-facing.

Performing inventory counts, replacing misplaced items,
or scanning for out-of-stock situations are examples of
basic, repetitive, and non-impactful operations for store
associates.

Fellow Robots has created a solution to scan shelves,
monitor misplaced items, and act as a wayfinder kiosk
for consumers. This allows associates to interact with the
shopping public, improving consumer satisfaction and
raising revenue through larger shopping baskets.

As an Inception partner, Fellow is closely aligned with
NVIDIA and is poised to deliver incredible impact on
retail business processes

 YouTube Link:
 https://youtu.be/l7NPmJP462M
TRAFFIC PATTERNS
LOSS PREVENTION
Using existing cameras, a retailer can install highly
effective computer vision algorithms to detect shopper
traffic patterns and prevent loss.
In the US, LP is a $48B problem impacting all retailers. At
the same time, investment in LP staff is flat or shrinking.
While the average cost of a shoplifting incident is doubling to
$798, 30% of inventory shrinkage is an inside job.
Computer vision can identify theft, shrinkage, and
shoplifting incidents.
This new technology can bring a fresh approach to a
longstanding problem for retail.

COLLABORATIVE
DESIGN
Photorealistic Models

Interactive Physics

Design Flow Integration

Collaboration

ART OF THE POSSIBLE
    The State of AI in Retail

Paul Hendricks
Solutions Architect
phendricks@nvidia.com
INTRODUCTION
• Paul Hendricks is a Solutions Architect at NVIDIA, helping
  enterprise customers with their deep learning and AI
  initiatives

• Paul's background is primarily in retail, and he has spent the
  past 5 years working with many Fortune 500
  retail companies to implement data science and AI
  solutions.

• Prior to joining NVIDIA, Paul worked at Victoria’s Secret as a
  Data Scientist building models to understand customer
  propensity to purchase and how to optimize assortment in
  stores.

• Currently, Paul's research at NVIDIA focuses on using deep
  learning in intelligent video analytics and recommendation
  systems.
Intelligent Video Analytics

Object Detection
                                      Problem Background
• Data: Images

• Goal: Identify objects in an image, and output bounding boxes around the objects and their classes

Frictionless Checkout

   https://www.standardcognition.com/

Localizing Algorithms
Sliding windows
  •   If one of the windows only has half of the dog, the
      activation may not be strong enough
  •   Using small windows and small strides will be very
      computationally intensive
Fully convolutional neural network
  •   Since convolutions are basically sliding windows,
      we can try replacing the fully connected layers
      with convolutional layers
  •   Bounding boxes generated are not very accurate
Region proposals
  •   Selects blob-like structures and proposes these as
      the regions to be passed into a CNN
  •   This concept is similar to sliding window
Single shot detection
  •   This algorithm predicts the coordinates of the
      bounding boxes as well as the class of the objects
  •   Fast because the model looks at the image only once (e.g. YOLO)
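
To make the cost of the sliding-window approach concrete, the sketch below enumerates window positions for a hypothetical 640x480 image. The function name, image size, window sizes and strides are assumptions for illustration; each window would need its own classifier pass.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield (x, y, w, h) boxes for a square window moved across the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


# Large window, large stride: few classifier passes, coarse localization.
coarse = sum(1 for _ in sliding_windows(640, 480, win=128, stride=64))

# Small window, small stride: finer localization, far more passes.
fine = sum(1 for _ in sliding_windows(640, 480, win=32, stride=8))

print(coarse, fine)
```

The jump in window count between the two settings is the computational burden that region proposals and single shot detectors are designed to avoid.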
Getting Started
DLI Courses
  •   Object Detection with DIGITS - https://nvlabs.qwiklab.com/focuses/4125
Papers
  •   Fully convolutional networks for semantic segmentation - https://arxiv.org/pdf/1605.06211.pdf
  •   Rich feature hierarchies for accurate object detection and semantic segmentation - https://arxiv.org/pdf/1311.2524.pdf
  •   Fast R-CNN - https://arxiv.org/pdf/1504.08083.pdf
  •   Faster R-CNN: Towards real-time object detection with region proposal networks - https://arxiv.org/pdf/1506.01497.pdf
  •   YOLO9000: Better, Faster, Stronger - https://arxiv.org/pdf/1612.08242.pdf
Libraries
  •   https://github.com/pjreddie/darknet
  •   https://github.com/tensorflow/models/tree/master/research/object_detection
Datasets
  •   COCO - http://cocodataset.org/
  •   ImageNet - https://www.kaggle.com/c/imagenet-object-detection-challenge

Anomaly Detection

Anomaly Detection
                                       Problem Background
• Data: Image, sensor data (time series), text data

• Goal: Detect if the data being generated is anomalous

UNSUPERVISED LEARNING
Anomaly detection using deep learning

Deep Autoencoder Network
    Input layer (Input: X)
         Size of data vector
    Bottleneck layer
         Summarized representation
             ▪ ‘embedding’
    Output layer (Output: X̃)
         Same dimensionality as input
    Reconstruction error (X − X̃)
         High errors indicate potential anomaly
UNSUPERVISED LEARNING
        DL anomaly detection in time series

    Time Series Signals
            Split into sliding windows w1, w2, w3, …, wN
              Normalization and preprocessing
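
The windowing and normalization steps can be sketched as follows. The slide does not say which normalization is used, so per-window z-scoring is an assumption here, as are the function names and the sample signal.

```python
from statistics import mean, pstdev


def split_windows(signal, width, step=1):
    """Cut a 1-D signal into overlapping windows of fixed width."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def zscore(window):
    """Scale one window to zero mean and unit variance (flat windows map to zeros)."""
    mu = mean(window)
    sigma = pstdev(window)
    if sigma == 0:
        return [0.0 for _ in window]
    return [(v - mu) / sigma for v in window]


# Hypothetical sensor readings.
signal = [1, 2, 3, 4, 5, 6]
windows = split_windows(signal, width=3)
normalized = [zscore(w) for w in windows]
print(len(windows), normalized[0])
```

Each normalized window then becomes one input vector for the autoencoder.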

UNSUPERVISED LEARNING
Detecting anomalies via reconstruction error
[Figure panels: Input | Output (Reconstruction) | Reconstruction vs Input]

Reconstruction error (RE) as a proxy for outliers
    Whenever RE is high, consider it a red flag
         Threshold can be set using statistical bounds
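
One way to set the threshold from statistical bounds is sketched below. It assumes mean squared error as the reconstruction error and a "mean plus k standard deviations" rule fitted on errors from known-normal data; the slide names neither, so both are illustrative choices, as are the function names and sample values.

```python
from statistics import mean, stdev


def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k standard deviations of errors seen on normal data."""
    return mean(normal_errors) + k * stdev(normal_errors)


def flag_anomalies(errors, threshold):
    """Return the indices whose reconstruction error exceeds the threshold."""
    return [i for i, e in enumerate(errors) if e > threshold]


# Hypothetical errors measured on data believed to be normal.
normal_errors = [0.10, 0.12, 0.09, 0.11, 0.10]
threshold = fit_threshold(normal_errors)

# Hypothetical errors on new windows; the second one is reconstructed badly.
new_errors = [0.11, 0.50, 0.10]
print(flag_anomalies(new_errors, threshold))
```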

Getting Started
DLI Courses
  •   Introduction to Autoencoders
  •   Anomaly Detection with Variational Autoencoders - https://nvlabs.qwiklab.com/focuses/8362
Papers & Books
  • Autoencoders - https://papers.nips.cc/paper/798-autoencoders-minimum-description-length-and-helmholtz-free-energy.pdf
  • Deep Learning, Chapter 13 - http://a.co/1vbPNXr
  • Hands on Machine Learning with Scikit-Learn & TensorFlow, Chapter 13 - http://a.co/aImsrRT
Datasets
  • Fashion MNIST - https://github.com/zalandoresearch/fashion-mnist
  • Deep Fashion - http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html
  • UT Zappos 50k - http://vision.cs.utexas.edu/projects/finegrained/utzap50k/

Recommendation Systems

RECOMMENDATION SYSTEMS
                                       Problem Background
• Data: Matrix R [ rows are users, columns are items, cell values are ratings ]

• Goal: Compute missing Values in R – top N unseen items are good recommendation candidates

           R                                             X
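
The problem statement above can be illustrated with a small sketch that fills in the missing cells of R and returns the top-N unseen items for a user. Note the swap: this uses plain matrix factorization trained by stochastic gradient descent as a simple stand-in, not the autoencoder approach covered in this section. The ratings matrix, function names and hyperparameters are all hypothetical.

```python
import random


def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates R[u][i]."""
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    P = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 1) for _ in range(k)] for _ in range(n_items)]
    # Only observed ratings contribute to training; None marks a missing cell.
    observed = [(u, i, R[u][i]) for u in range(n_users)
                for i in range(n_items) if R[u][i] is not None]
    for _ in range(steps):
        for u, i, r in observed:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def top_n_unseen(R, P, Q, user, n):
    """Rank the items the user has not rated by predicted rating."""
    scores = {i: sum(p * q for p, q in zip(P[user], Q[i]))
              for i in range(len(Q)) if R[user][i] is None}
    return sorted(scores, key=scores.get, reverse=True)[:n]


# Hypothetical ratings: rows are users, columns are items, None is unrated.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [None, 1, 5, 4],
]
P, Q = factorize(R)
print(top_n_unseen(R, P, Q, user=1, n=2))
```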

MANY APPLICATIONS FROM SIMILAR PROBLEMS
    Using autoencoders to generate recommendations

[Figure: sparse user-item ratings matrix (ratings 2 to 5) with missing entries]

            https://github.com/NVIDIA/DeepRecommender/
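
A minimal sketch of how an autoencoder can be applied to a sparse ratings row: encode the row, decode it back to a score per item, and measure the error only on items the user actually rated. This is not the DeepRecommender code; the single hidden layer, the tanh activation, the random untrained weights, and the convention that 0 marks a missing rating are all assumptions made for illustration.

```python
import math
import random


def forward(ratings, w_enc, w_dec):
    """Encode one user's ratings row into a small hidden code, then decode it
    back into a predicted score for every item."""
    hidden = [math.tanh(sum(w * r for w, r in zip(row, ratings))) for row in w_enc]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_dec]


def masked_mse(predicted, ratings):
    """Mean squared error over rated items only (0 marks a missing rating)."""
    errors = [(p - r) ** 2 for p, r in zip(predicted, ratings) if r != 0]
    return sum(errors) / len(errors) if errors else 0.0


rng = random.Random(0)
n_items, n_hidden = 6, 3
# Untrained, randomly initialised weights: hypothetical values for illustration.
w_enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_items)] for _ in range(n_hidden)]
w_dec = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_items)]

user_row = [4, 0, 4, 3, 5, 0]            # hypothetical user; 0 = not rated
predicted = forward(user_row, w_enc, w_dec)
print(masked_mse(predicted, user_row))   # the quantity training would minimise
```

After training, the decoder's scores for the unrated positions serve as the predicted ratings from which recommendations are drawn.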

Getting Started
DLI Courses
  •   Deep Autoencoders for Recommender Systems
Papers
  •   AutoRec: Autoencoders Meet Collaborative Filtering - http://users.cecs.anu.edu.au/~u5098633/papers/www15.pdf
  •   Training Deep AutoEncoders for Collaborative Filtering - https://arxiv.org/pdf/1708.01715.pdf
Libraries
  •   https://github.com/NVIDIA/DeepRecommender
  •   https://github.com/geffy/tffm
  •   https://github.com/apache/incubator-mxnet/tree/master/example/recommenders
Datasets
  •   Netflix - https://netflixprize.com/
  •   MovieLens - https://grouplens.org/datasets/movielens/
  •   UC Irvine Online Retail Dataset - http://archive.ics.uci.edu/ml/datasets/online+retail

NVIDIA Tools

TESLA V100 32GB
WORLD’S MOST ADVANCED DATA CENTER GPU
NOW WITH 2X THE MEMORY

5,120 CUDA cores
640 NEW Tensor cores
7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
20MB SM RF | 16MB Cache
32GB HBM2 @ 900GB/s | 300GB/s NVLink
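
The peak throughput figures above can be roughly reproduced from the core counts. This is a back-of-the-envelope sketch, not an official derivation: the boost clock of about 1.53 GHz, the 64 fused multiply-adds per Tensor Core per clock, and the half-rate FP64 are assumptions that do not appear on the slide.

```python
# Core counts quoted on the slide.
CUDA_CORES = 5120
TENSOR_CORES = 640

# Assumptions (not stated on the slide).
BOOST_CLOCK_HZ = 1.53e9                 # assumed boost clock
FMA_PER_TENSOR_CORE_PER_CLOCK = 64      # assumed: one 4x4x4 matrix FMA per clock
FLOPS_PER_FMA = 2                       # one multiply plus one add

fp32_tflops = CUDA_CORES * FLOPS_PER_FMA * BOOST_CLOCK_HZ / 1e12
tensor_tflops = (TENSOR_CORES * FMA_PER_TENSOR_CORE_PER_CLOCK
                 * FLOPS_PER_FMA * BOOST_CLOCK_HZ / 1e12)
fp64_tflops = fp32_tflops / 2           # assumed: FP64 at half the FP32 rate

print(round(fp64_tflops, 1), round(fp32_tflops, 1), round(tensor_tflops))  # 7.8 15.7 125
```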

FASTER RESULTS ON COMPLEX DL AND HPC
Up to 50% Faster Results With 2x The Memory

[Chart: V100 16GB vs. V100 32GB]
  • FASTER RESULTS: 1.5X faster language translation, Neural Machine Translation (NMT): 0.8 step/sec vs. 1.2 step/sec
  • FASTER RESULTS: 1.5X faster calculations, 3D FFT 1k x 1k x 1k: 2.5TF vs. 3.8TF
  • HIGHER ACCURACY: 40% lower error rate, VGG-16 (16 layers) vs. RN-152 (152 layers)
  • HIGHER RESOLUTION: 4X higher resolution, GAN Image to ImageGen: 512x512 res images vs. 1024x1024 res images
  • Unsupervised Image Translation: input winter photo, AI converts it to summer

Dual E5-2698v4 server, 512GB DDR4, Ubuntu 16.04, CUDA9, cuDNN7 | NMT is GNMT-like and run with TensorFlow NGC Container 18.01 (Batch Size = 128 for 16GB and 256 for 32GB) | FFT is with cufftbench 1k x 1k x 1k and comparing 2 V100 16GB (DGX1V) vs. 2 V100 32GB (DGX1V) | R-CNN for object detection at 1080P with Caffe | V100 16GB uses VGG16 | V100 32GB uses Resnet-152 | GAN by NVRESEARCH (https://arxiv.org/pdf/1703.00848.pdf) | V100 16GB and V100 32GB with FP32
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
NEW TENSOR CORE BUILT FOR AI
Delivering 120 TFLOPS of DL Performance

  • TENSOR CORE MATRIX DATA OPTIMIZATION: Dense Matrix of Tensor Compute
  • TENSOR-OP CONVERSION: FP32 to Tensor Op Data for Frameworks
  • VOLTA TENSOR CORE: 4x4 matrix processing array, D[FP32] = A[FP16] * B[FP16] + C[FP32], Optimized For Deep Learning
  • VOLTA-OPTIMIZED cuDNN
  • ALL MAJOR FRAMEWORKS
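
The mixed-precision operation D[FP32] = A[FP16] * B[FP16] + C[FP32] can be emulated in software to see where rounding happens. This is only a sketch: it rounds inputs to FP16 and the running sum to FP32 with Python's `struct` module, and the exact rounding order inside the hardware is an assumption. The helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c):
    """Compute D = A * B + C on 4x4 matrices with FP16 inputs and FP32 accumulation."""
    n = 4
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])
            for k in range(n):
                # Inputs are rounded to FP16; the sum is kept in FP32.
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            d[i][j] = acc
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [[0.0] * 4 for _ in range(4)]
print(tensor_core_mma(identity, identity, zeros) == identity)  # True
print(to_fp16(0.1))  # slightly below 0.1 after FP16 rounding
```

Keeping the accumulator in FP32 limits the error that would build up if the partial sums were also rounded to FP16.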
NVIDIA DGX
             AI Supercomputer-in-a-Box

    960 TFLOPS | 8x Tesla V100 16GB | NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U — 3200W

INTRODUCING NVIDIA DGX-2
THE WORLD’S FIRST 2 PETAFLOPS SYSTEM

THE WORLD’S MOST POWERFUL AI SYSTEM FOR THE MOST COMPLEX AI CHALLENGES

  • DGX-2 is the newest addition to the DGX family, powered by DGX software
  • Delivers accelerated AI-at-scale deployment and simplified operations
  • Step up to DGX-2 for unrestricted model parallelism and faster time-to-solution
10X PERFORMANCE GAIN IN LESS THAN A YEAR
PyTorch Stack: Time to Train FAIRSEQ

  • DGX-1V (DGX-1, Sep ’17): 15 days
  • DGX-2 (Q3 ’18): 1.5 days

Software improvements across the stack including NCCL, cuDNN, etc.
NVSWITCH
WORLD’S HIGHEST BANDWIDTH ON-NODE SWITCH

7.2 Terabits/sec or 900 GB/sec
18 NVLINK ports | 50GB/s per port bi-directional
Fully-connected crossbar
2 billion transistors | 47.5mm x 47.5mm package

NVSWITCH
ENABLES THE WORLD’S LARGEST GPU

16 Tesla V100 32GB Connected by New NVSwitch
2 petaFLOPS of DL Compute
Unified 512GB HBM2 GPU Memory Space
300GB/sec Every GPU-to-GPU
2.4TB/sec of Total Cross-section Bandwidth
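
The headline NVSwitch and DGX-2 numbers are consistent with simple multiplication of the per-port and per-GPU figures. The sketch below checks that arithmetic; the reading of cross-section bandwidth as half the GPUs sending to the other half at full NVLink rate is an assumption, and the 125 Tensor TFLOPS per GPU is taken from the Tesla V100 32GB slide.

```python
# Figures quoted on the NVSwitch slides.
NVLINK_PORTS = 18
GB_PER_SEC_PER_PORT = 50            # bi-directional
GPUS = 16
HBM2_GB_PER_GPU = 32
NVLINK_GB_PER_SEC_PER_GPU = 300
TENSOR_TFLOPS_PER_GPU = 125         # from the Tesla V100 32GB slide

switch_gb_per_sec = NVLINK_PORTS * GB_PER_SEC_PER_PORT    # 900 GB/sec
switch_tbit_per_sec = switch_gb_per_sec * 8 / 1000         # 7.2 Terabits/sec
unified_memory_gb = GPUS * HBM2_GB_PER_GPU                 # 512 GB
dl_petaflops = GPUS * TENSOR_TFLOPS_PER_GPU / 1000         # 2.0 petaFLOPS

# Assumption: half of the GPUs talk to the other half at full NVLink rate.
cross_section_tb_per_sec = (GPUS // 2) * NVLINK_GB_PER_SEC_PER_GPU / 1000  # 2.4 TB/sec

print(switch_gb_per_sec, switch_tbit_per_sec, unified_memory_gb,
      dl_petaflops, cross_section_tb_per_sec)
```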

CHALLENGES WITH DEEP LEARNING

  • Current DIY deep learning environments are complex and time consuming to build, test and maintain
  • Requires high level of expertise to manage driver, library, framework dependencies
  • Development of frameworks by the community is moving very fast

[Figure: software stack, top to bottom: Open Source Frameworks, NVIDIA Libraries, NVIDIA Docker, NVIDIA Driver, NVIDIA GPU]
NVIDIA GPU CLOUD
Deep Learning Everywhere, For Everyone

NVIDIA GPU Cloud integrates GPU-optimized deep learning frameworks, runtimes, libraries, and OS into a ready-to-run container, available at no charge

  • Innovate in minutes, not weeks: Removes all the DIY complexity of deep learning software integration
  • Always up to date: Monthly updates by NVIDIA to ensure maximum performance
  • Deep learning across platforms: Containers run locally on DGX Systems and TITAN PCs, or on cloud service provider GPU instances
DEEP LEARNING ACROSS PLATFORMS

  • NVIDIA Volta or NVIDIA Pascal-powered TITAN GPU
  • NVIDIA DGX-1 and DGX Station
  • Amazon EC2 P3 instances with NVIDIA Volta
Container Orchestration for DL Training & Inference

KUBERNETES on NVIDIA GPUs
  • Scale-up Thousands of GPUs Instantly
  • Self-healing Cluster Orchestration
  • GPU Optimized Out-of-the-Box
  • Powered by NVIDIA Container Runtime
  • Included with Enterprise Support on DGX
  • Available end of April 2018

[Figure: stack diagram, top to bottom: AWS-EC2 | GCP | Azure | DGX; KUBERNETES; NVIDIA CONTAINER RUNTIME and NVIDIA GPU CLOUD; NVIDIA GPUs]
TENSORRT DEPLOYMENT WORKFLOW

Step 1: Optimize trained model
  Trained Neural Network -> Import Model -> TensorRT Optimizer -> Serialize Engine -> Optimized Plans (Plan 1, Plan 2, Plan 3)

Step 2: Deploy optimized plans with runtime
  Optimized Plans (Plan 1, Plan 2, Plan 3) -> De-serialize Engine -> TensorRT Runtime Engine -> Deploy Runtime -> Data center, Automotive, Embedded
TensorRT INTEGRATED WITH TensorFlow
Delivers 8x Faster Inference with TensorFlow + TRT

ResNet-50 on TensorFlow, Images/sec @ 7ms Latency
  • CPU (FP32): 11 *
  • V100 (TensorFlow, FP32): 325
  • V100 Tensor Cores (TensorFlow + TensorRT): 2,657

* Best CPU latency measured at 83 ms

Available in TensorFlow 1.7 - https://github.com/tensorflow

CPU: Skylake Gold 6140, 2.5GHz, Ubuntu 16.04; 18 CPU threads. Volta V100 SXM; CUDA (384.111; v9.0.176); Batch size: CPU=1, TF_GPU=2, TF-TRT=16 w/ latency=6ms
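
The chart's throughput figures line up with the batch sizes and latencies in the footnote. The sketch below assumes one batch completes per latency interval with no overlap between batches, which is a simplification; the function name is hypothetical.

```python
def images_per_sec(batch_size, latency_ms):
    """Throughput if one batch of `batch_size` images completes every `latency_ms`."""
    return batch_size / (latency_ms / 1000.0)


cpu = images_per_sec(1, 83)         # about 12 images/sec; the chart shows 11
tf_trt = images_per_sec(16, 6)      # about 2,667 images/sec; the chart shows 2,657
tf_gpu_latency_ms = 2 / 325 * 1000  # latency implied by batch size 2 at 325 images/sec
speedup = 2657 / 325                # TensorFlow + TensorRT vs. TensorFlow FP32 on V100

print(round(cpu), round(tf_trt), round(tf_gpu_latency_ms, 1), round(speedup, 1))
```

The ratio of the two V100 bars is what the slide rounds to "8x".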
NVIDIA TensorRT 4 RC NOW AVAILABLE

  • RNN and MLP Layers: Maximize RNN and MLP Throughput. Speed up speech, audio and recommender app inference performance through new layers and optimizations. [Chart: Recommendation Engine, 45X Speedup, TensorRT vs. CPU]
  • ONNX Import: Optimize and Deploy ONNX Models. Easily import and accelerate inference for ONNX frameworks (PyTorch, Caffe 2, CNTK, MxNet and Chainer)
  • NVIDIA DRIVE Support: Support for NVIDIA DRIVE Xavier. Deploy optimized deep learning inference models on NVIDIA DRIVE Xavier

Free download to members of NVIDIA Developer Program: developer.nvidia.com/tensorrt
NVIDIA DIGITS
Interactive Deep Learning GPU Training System

Interactive deep learning training application for
engineers and data scientists
Simplify deep neural network training with an interactive
interface to train and validate, and visualize results

Built-in workflows for image classification, object detection
and image segmentation

Improve model accuracy with pre-trained models from the
DIGITS Model Store

Faster time to solution with multi-GPU acceleration

developer.nvidia.com/digits
DIGITS DEEP LEARNING WORKFLOWS

  • IMAGE CLASSIFICATION: Classify images into classes or categories. Object of interest could be anywhere in the image. (Example output: 98% Dog, 2% Cat)
  • OBJECT DETECTION: Find instances of objects in an image. Objects are identified with bounding boxes.
  • IMAGE SEGMENTATION: Partition image into multiple regions. Regions are classified at the pixel level.
WHAT’S NEW IN DIGITS 6?

  • TENSORFLOW SUPPORT: Train TensorFlow Models Interactively with DIGITS
  • NEW PRE-TRAINED MODELS: Image Classification: VGG-16, ResNet50; Object Detection: DetectNet