Accelerated Device Placement Optimization with Contrastive Learning
Hao Lan, Li Chen, Baochun Li
University of Toronto
hao.lan@mail.utoronto.ca
What is device placement?
Background
‣ Problem: Large DNNs cannot fit into a single GPU (memory limitation).
‣ Solution: Model parallelism: partition a DNN across multiple GPUs, as in the sketch below.
[Figure: a deep neural network partitioned across the computation resources of one machine (CPU, GPU1, GPU2, GPU3, GPU4)]
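To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of model parallelism: the network's layers are pinned to different GPUs and the activations hop between devices. The layer sizes and the two-GPU split are illustrative only, not the placement used in this work.

```python
import torch
import torch.nn as nn

# Minimal model-parallel sketch: the two halves of the network live on
# different GPUs, so a model too large for one GPU can still be trained.
class TwoGPUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(1024, 4096).to('cuda:0')  # first partition
        self.part2 = nn.Linear(4096, 10).to('cuda:1')    # second partition

    def forward(self, x):
        x = torch.relu(self.part1(x.to('cuda:0')))
        return self.part2(x.to('cuda:1'))  # move activations across GPUs
```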
Device placement with deep reinforcement learning
Objective: to find the best assignment of operations to devices that minimizes training time.
Mirhoseini et al. (Google Inc.), "Device placement optimization with reinforcement learning," in Proc. ICML 2017.
[Figure: RL-based placement of Inception-V3. Devices are denoted by colors, where the transparent color represents an operation on a CPU and each other unique color represents a different GPU. The RL-based placement achieves a 19.7% improvement in running time over the expert-designed placement.]
Device placement with deep reinforcement learning
[Figure: the RL loop. The agent's policy network observes the neural network's computational graph and outputs a placement action over the machine's computation resources (CPU, GPU1, GPU2, GPU3, GPU4); the environment executes the placement and returns a reward.]
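The reward in this loop is just measured wall-clock time. Below is a hedged sketch of how it could be collected, where `model_runner` is a hypothetical wrapper that applies a placement and runs one training step of the DNN on the real machine.

```python
import time

def measure_reward(model_runner, placement, trials=5):
    """Run the placed graph a few times on the real machine and use the
    negative mean per-step runtime as the reward (slower placement,
    lower reward). `model_runner` is a hypothetical helper object."""
    model_runner.apply_placement(placement)
    runtimes = []
    for _ in range(trials):
        start = time.perf_counter()
        model_runner.run_step()  # one training step of the placed DNN
        runtimes.append(time.perf_counter() - start)
    return -sum(runtimes) / len(runtimes)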
Agent design
[Figure: two existing agent designs. Grouper-Placer: a Grouper merges node features into group embeddings, and a Placer maps each group to a device. Encoder-Placer: an Encoder maps node features to node embeddings, and a Placer maps each node to a device.]
Challenges
We notice that Google's ICML 2017 approach uses about 100 workers and 12 to 27 hours to find the best placement for its workloads. This cost is mainly caused by two factors:
‣ The REINFORCE algorithm is sample-inefficient: it requires a large number of samples to train the agent.
‣ In device placement, the environment is a real-world machine with multiple devices (CPU and GPUs). Collecting rewards from a real-world environment is very slow, especially when the workload is a large DNN.
Solution
‣ For the first challenge, we replace the REINFORCE algorithm with the PPO algorithm to improve sample efficiency (see the sketch after this list); however, PPO still needs a substantial number of samples to train the agent.
‣ For the second challenge, one solution is to use a simulated environment instead of a real one (as proposed by Placeto). In that case, we need to build a model for every single environment, so the method is no longer model-free.
‣ Is there a way to train the agent with very few samples, or even without any?
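As a reference, here is a minimal sketch of the standard PPO clipped surrogate loss, not necessarily the exact objective used in our agent. Because the probability ratio lets each sampled placement be reused for several gradient steps, PPO needs fewer environment samples than one-shot REINFORCE updates.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped PPO surrogate. logp_* are log-probabilities of the sampled
    placements under the new and old policies; advantages are the
    baseline-subtracted runtime rewards."""
    ratio = torch.exp(logp_new - logp_old)            # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return -torch.min(unclipped, clipped).mean()      # negate to minimize
```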
Our framework
‣ To address these challenges, we propose our DRL-based framework, Mars: a graph encoder pre-trained by contrastive learning (Deep Graph Infomax, ICLR 2019), followed by a light-weight segment-level sequence-to-sequence placer.
[Figure: overview of Mars. Node features (op_type, input_shape, output_shape) and the adjacency matrix of the neural network's computational graph are encoded by GCNs into node representations; a sequence-to-sequence LSTM placer with a softmax output maps the node-representation sequence to a placement policy.]
Contrastive Learning
[Figure: the computational graph and a corrupted copy of it are fed through the graph neural network, producing a node representation h⃗ for the real graph, h̃⃗ for the corrupted graph, and a graph summary s⃗, which are trained against an unsupervised contrastive loss ℒ(·).]
We use contrastive learning to pre-train the graph encoder without any labeled data. After pre-training, the graph encoder can encode each operation in the computational graph into a node representation h⃗, which represents the operation in a vector space.
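Below is a minimal sketch of DGI-style contrastive pre-training, assuming a PyTorch `encoder(X, A)` that maps node features and an adjacency matrix to node embeddings. The corruption function, readout, and bilinear discriminator follow the DGI paper; all names here are illustrative, not our exact implementation.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (node embedding, graph summary) pairs via a bilinear form."""
    def __init__(self, dim):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)

    def forward(self, h, s):
        # Broadcast the summary s to every node, then score each pair.
        return self.bilinear(h, s.expand_as(h)).squeeze(-1)

def dgi_step(encoder, disc, optimizer, X, A):
    # Positive samples: embeddings of the real computational graph.
    h_real = encoder(X, A)
    # Negative samples: corrupt the graph by shuffling node features.
    X_fake = X[torch.randperm(X.size(0))]
    h_fake = encoder(X_fake, A)
    # Graph summary: sigmoid of the mean node embedding (the readout).
    s = torch.sigmoid(h_real.mean(dim=0, keepdim=True))
    # Binary cross-entropy: real pairs -> 1, corrupted pairs -> 0.
    logits = torch.cat([disc(h_real, s), disc(h_fake, s)])
    labels = torch.cat([torch.ones(h_real.size(0)),
                        torch.zeros(h_fake.size(0))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```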
Contrastive Learning
[Figure: the parameters of the contrastively pre-trained graph encoder are transferred into the DRL agent, which maps node features through the graph encoder to node representations, and through the placer to a placement.]
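In code, this transfer is just a state-dict copy, sketched below with the names from the pre-training sketch above. Whether to freeze the encoder or fine-tune it during RL training is an illustrative choice here, not something the slides specify.

```python
# Copy the contrastively pre-trained weights into the agent's encoder.
agent.encoder.load_state_dict(encoder.state_dict())

# Optionally freeze the encoder so RL training only updates the
# light-weight placer (illustrative choice).
for param in agent.encoder.parameters():
    param.requires_grad = False
```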
Light-weight Placer
‣ For the node classification task, DGI adds a logistic regression layer as a classifier after the pre-trained graph encoder and trains it with labeled data. To reduce the amount of labeled data required for training, the added classifier should be simple and easy to train.
‣ Following this idea, we use a light-weight placer in our agent design: a segment-level sequence-to-sequence neural network.
[Figure: the DRL agent maps node features through the pre-trained graph encoder to node embeddings, and through the placer to a placement.]
Light-weight Placer
‣ A segment-level sequence-to-sequence placer is designed to avoid placing an extremely long operation sequence all at once.
‣ The light-weight placer can better utilize the pre-trained parameters of the graph encoder.
[Figure: the node embeddings {h⃗1, h⃗2, ..., h⃗n} are split into segments of length s ({h⃗1 ... h⃗s}, {h⃗s+1 ... h⃗2s}, ..., {h⃗ks+1 ... h⃗n}); a BiLSTM encodes each segment, and an LSTM decoder emits the placements p1 ... pn segment by segment.]
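A simplified, non-autoregressive sketch of such a segment-level placer in PyTorch follows; a real placer would condition each decision on the previous placements, and the dimensions and plain linear output head here are illustrative.

```python
import torch
import torch.nn as nn

class SegmentPlacer(nn.Module):
    """Segment-level seq-to-seq placer (sketch): a BiLSTM encodes one
    segment of node embeddings at a time, and an LSTM decoder assigns
    a device to every node in that segment."""
    def __init__(self, emb_dim, hidden_dim, num_devices, segment_len):
        super().__init__()
        self.segment_len = segment_len
        self.encoder = nn.LSTM(emb_dim, hidden_dim, bidirectional=True,
                               batch_first=True)
        self.decoder = nn.LSTM(2 * hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_devices)

    def forward(self, node_embeddings):  # (n, emb_dim)
        logits = []
        for start in range(0, node_embeddings.size(0), self.segment_len):
            seg = node_embeddings[start:start + self.segment_len]
            enc_out, _ = self.encoder(seg.unsqueeze(0))   # (1, s, 2*hidden)
            dec_out, _ = self.decoder(enc_out)            # (1, s, hidden)
            logits.append(self.head(dec_out.squeeze(0)))  # (s, num_devices)
        # One device distribution per operation in the graph.
        return torch.cat(logits, dim=0)                   # (n, num_devices)
```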
Experimental Results
‣ From the experimental results, our approach outperforms all state-of-the-art baselines.

Per-step runtime (in seconds) of placements found by different approaches:

Models        Human Experts  GPU Only  Grouper-Placer  Encoder-Placer  Mars    Mars (no pre-training)
Inception-V3  0.071          0.071     0.067           0.067           0.067   0.067
GNMT-4        1.661          OOM       1.418           1.437           1.379   1.396
BERT          OOM            OOM       12.661          11.737          9.214   11.363
Experimental Results ‣ With unsupervised pre-training, we achieved better performance while reducing the agent training time by 13.2% on average.
Thank you