Pioneering and Democratizing Scalable HPC+AI

                                 Nick Nystrom                   Paola Buitrago
                                 Interim Director, PSC          Director, AI & Big Data, PSC
                                 nystrom@psc.edu                paola@psc.edu

                                              2019 Stanford Conference · Stanford · February 15, 2019

© 2019 Pittsburgh Supercomputing Center
Outline

                Motivation & Vision
    Realizing the Vision: Bridges and Bridges-AI
               Exemplars of Success
                      Summary

2
What is PSC?
National service provider for research and discovery
 • Bridges, Anton 2, Brain Image Library, Open Compass, XSEDE, Olympus

Research institution advancing knowledge through converged HPC, AI, and Big Data
 • ~30 active funded projects

Education and training
 • Lead national & local workshops
 • Support courses at CMU and elsewhere
 • Teaching, thesis committees, interns

Active member in the CMU and Pitt communities
 • Research collaborations
 • Colocation for lower cost and greater capability

Networking and security
 • Networking & security service provider
 • Research networking

Advise and support industry
 • Training, access to advanced resources, collaborative research

PSC is a joint effort of Carnegie Mellon University and the University of Pittsburgh: 33 years of leadership in HPC, HPDA, and computational science; 21 HPC systems, 10 of which were the first or unique; pioneering the convergence of AI + HPC + data.
3
Research Needs Converged HPC, AI, and Data

Examples span from structured, regular, homogeneous data to unstructured, irregular, heterogeneous data:
 • Pan-STARRS telescope (http://pan-starrs.ifa.hawaii.edu/public/)
 • BlueTides astrophysics simulation (http://bluetides-project.org/)
 • Genome sequencers (Wikipedia Commons)
 • Social networks and the Internet
 • Video (Wikipedia Commons)
 • Wearable sensors (F. De Roose et al., https://techxplore.com/news/2016-12-smart-contact-lens-discussed-electron.html)
 • Detecting cancer (https://research.googleblog.com/2017/03/assisting-pathologists-in-detecting.html)
 • The Human BioMolecular Atlas Program (https://commonfund.nih.gov/hubmap)
 • Collections (Horniman museum: http://www.horniman.ac.uk/get_involved/blog/bioblitz-insects-reviewed)
 • Legacy documents (Wikipedia Commons)
 • Environmental sensors: water temperature profiles from tagged hooded seals (http://www.arctic.noaa.gov/report11/biodiv_whales_walrus.html)
 • Library of Congress stacks (https://www.flickr.com/photos/danlem2001/6922113091/)
4
Enabling the Creation of Knowledge
Objectives
Enable data-intensive applications & workflows:
 • Deliver HPC Software as a Service (Science Gateways)
 • Deliver Big Data as a Service (BDaaS)
 • Provide scalable deep learning, machine learning, and graph analytics
 • Support very large in-memory databases
 • Facilitate data assimilation from instruments and the Internet
Scale beyond the laptop and to interdisciplinary, collaborative teams.

Common Goal
Enable the creation of knowledge:
 • Democratize HPC, Big Data, and AI
 • Enable research areas that have not previously used HPC
 • Advance traditional fields through machine learning and data analytics
 • Couple applications in novel ways

5
The Rapid Growth of AI

    From: Artificial Intelligence Index: 2018 Annual Report (Stanford University, 2018)

6
Outline

                Motivation & Vision
    Realizing the Vision: Bridges and Bridges-AI
               Exemplars of Success
                      Summary

7
Bridges
• Available at no cost for open research and courses, and by arrangement to industry
• Easier access for CMU and Pitt faculty through the Pittsburgh Research Computing Initiative
• 29,036 Intel Xeon CPU cores
• 216 NVIDIA GPUs: 64 K80, 64 P100, 88 V100
• 17 PB storage (10 PB persistent, 7.3 PB local)
• 277 TB memory (RAM), up to 12 TB per node
• 44M core-hours, 173k GPU-AI-hours, 442k GPU-hours, and 343k TB-hours allocated quarterly
• Serving ~1,850 projects and ~7,500 users at 393 institutions, spanning 119 fields of study
• Bridges-AI: NVIDIA DGX-2 Enterprise AI system + 9 HPE 8-Volta Apollo 6500 Gen10 servers: a total of 88 V100 GPUs

Bridges converges HPC, AI, and Big Data to empower new research communities, bring desktop convenience to advanced computing, expand remote access, and help researchers work more intuitively.
•   Funded by NSF award #OAC-1445606 ($20.9M), Bridges emphasizes usability, flexibility, and interactivity
•   Popular programming languages and applications: Python, Jupyter, R, MATLAB, Java, Spark, Hadoop, …
•   856 compute nodes containing Intel Xeon CPUs, with 128 GB (800 nodes), 3 TB (42), or 12 TB (4) of RAM each
•   216 NVIDIA Tesla GPUs: 64 K80, 64 P100, and (new) 88 V100, configured to balance capability & capacity
•   Dedicated nodes for persistent databases, gateways, and distributed services
•   The world’s first deployment of the Intel Omni-Path Architecture fabric
8
Acquisition and operation of Bridges are made possible by the National Science Foundation through award #OAC-1445606 ($20.9M): Bridges: From Communities and Data to Workflows and Insight.

Hewlett Packard Enterprise delivered Bridges, and is now delivering Bridges GPU-AI.
    All trademarks, service marks, trade names, trade dress, product names, and logos appearing herein are the property of their respective owners.

9
Bridges Makes Advanced Computing Easy

     Make HPC accessible to all research communities
     Converge HPC, AI, and Big Data
     Support the widest range of science with an extremely rich
     computing environment
     • 3 tiers of memory: 12 TB, 3 TB, and 128 GB
     • Powerful, flexible CPUs and GPUs
• Familiar, easy-to-use user environment, with elements not available in traditional supercomputers:
   – Interactivity
   – Popular languages and frameworks: Python, Anaconda, R, MATLAB, Java, Spark, Hadoop
   – AI frameworks: TensorFlow, Caffe2, PyTorch, etc.
   – Containers (e.g., NGC) and virtual machines (VMs)
   – Databases
   – Gateways and distributed (web) services
   – Large collection of applications and libraries

10
Conceptual Architecture

The Intel Omni-Path Architecture fabric interconnects all node types with the parallel file system, and outward to users, XSEDE, campuses, and instruments:
 • Web server, database, data transfer, login, and management nodes
 • ESM nodes: 4 nodes, 12 TB RAM each
 • LSM nodes: 42 nodes, 3 TB RAM each
 • RSM nodes: 800 nodes, 128 GB RAM each, 48 with GPUs
 • Bridges-AI (introduced in operations year 3): NVIDIA DGX-2 (16 V100 GPUs) and 9× HPE Apollo 6500 (8 V100 GPUs each)

11
Purpose-built Intel® Omni-Path Architecture topology for data-intensive HPC:
 • 6 “core” Intel® OPA edge switches: fully interconnected, 2 links per switch; 20 “leaf” Intel® OPA edge switches; robust paths to parallel storage
 • 20 Storage Building Blocks, implementing the parallel Pylon storage system (10 PB usable) for project & community datasets
 • 4 HPE Integrity Superdome X (12 TB) compute nodes, each with 2 gateway nodes (representative AI use: large-memory Java & Python)
 • 42 HPE ProLiant DL580 (3 TB) compute nodes
 • 12 HPE ProLiant DL380 database nodes and 6 HPE ProLiant DL360 web server nodes (user interfaces for AIaaS, BDaaS)
 • 748 HPE Apollo 2000 (128 GB) compute nodes (ML, inferencing, DL development, Spark, HPC AI such as Libratus; simulation, including AI-enabled)
 • 16 HPE Apollo 2000 (128 GB) GPU nodes with 2 NVIDIA Tesla K80 GPUs each
 • 32 HPE Apollo 2000 (128 GB) GPU nodes with 2 NVIDIA Tesla P100 GPUs each (deep learning; distributed training, Spark, etc.)
 • Bridges-AI, for maximum-scale deep learning: NVIDIA DGX-2 and 9 HPE Apollo 6500 Gen10 nodes, totaling 88 NVIDIA Tesla V100 GPUs
 • 4 MDS nodes, 2 front-end nodes, 2 boot nodes, and 8 management nodes

Bridges Virtual Tour: https://psc.edu/bvt
12
Accessing Bridges: No Cost for Research & Education and
                            Cost-Recovery Rates for Corporate Use
The following annual allocations are renewable and extendable, also at no cost for research and education.

                   Open Research                            Industry
                   Startup      Research      Education     PSC Corporate Program
  Cost             No charge    No charge     No charge     Cost recovery rates
  CPU-hours        50k          Up to ~10⁷    Up to ~10⁶    Up to ~18M
  GPU-hours        2500         Up to ~10⁵    Up to ~10⁴    Up to ~180k
  GPU-AI hours     1500         Up to ~10⁵    Up to ~10⁴    Up to ~69k
  TB-hours         1000         Up to ~10⁴    Up to ~10⁴    Up to ~137k
  Developer        Yes          Yes           (Yes)         Yes
  Accepted         Any time     Quarterly     Any time      Any time
  Awarded          ~1-2 days    Quarterly     ~1-3 days     ASAP

13
Interactivity

     Interactivity is the feature most frequently
     requested by nontraditional HPC communities.
      – Interactivity provides immediate
        feedback for doing exploratory
        data analytics and testing hypotheses.
      – Bridges offers interactivity through a combination of shared,
        dedicated, and persistent resources to maximize availability
        while accommodating diverse needs.
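As a concrete illustration (not Bridges-specific code), the tight explore-inspect-refine loop that interactivity enables might look like this in a Python/Jupyter session; the dataset and column names are hypothetical:

    # Hypothetical exploratory session: each statement gives immediate feedback.
    import pandas as pd

    df = pd.read_csv("observations.csv")          # hypothetical dataset
    print(df.describe())                          # quick statistical summary
    outliers = df[df["value"] > df["value"].quantile(0.99)]
    print(outliers.head())                        # inspect, refine hypothesis, repeat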

14
High-Productivity Programming

     Supporting languages that communities already use is vital for them to
     apply HPC to their research questions. This applies to both traditional and
     nontraditional HPC communities.
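For instance, a minimal PySpark job of the kind such communities already write runs on Bridges’ Spark support; this is an illustrative sketch, with the input path and session settings as assumptions:

    # Minimal word count in PySpark (illustrative; the path is hypothetical).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("wordcount-sketch").getOrCreate()
    lines = spark.read.text("data/corpus.txt").rdd.map(lambda r: r[0])
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda w: (w, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.toDF(["word", "n"]).orderBy("n", ascending=False).show(10)
    spark.stop()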

15
Gateways and Tools for Building Them
     Gateways provide easy-to-use access to Bridges’ HPC and data resources, allowing users to
     launch jobs, orchestrate complex workflows, and manage data from their browsers.
      – Provide “HPC Software-as-a-Service”
      – Extensive use of VMs, databases, and distributed services

                   Galaxy (PSU, Johns Hopkins)   The Causal Web (Pitt, CMU)       Neuroscience Gateway (SDSC)
                    https://galaxyproject.org/   http://www.ccd.pitt.edu/tools/

16
Databases and Distributed/Web Services

     Dedicated database nodes power persistent relational
     and NoSQL databases
      – Support data management and data-driven workflows
 – SSDs for high IOPS; HDDs for high capacity

     Dedicated web server nodes
      – Enable distributed, service-oriented architectures
      – High-bandwidth connections to XSEDE and the Internet
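A sketch of the database-backed workflow pattern these nodes support; sqlite3 stands in here so the example is self-contained, whereas Bridges’ dedicated nodes run full relational and NoSQL servers:

    # Record results as they are produced, then query them later: the core
    # of a data-driven workflow. Table and values are illustrative.
    import sqlite3

    con = sqlite3.connect("experiments.db")
    con.execute("CREATE TABLE IF NOT EXISTS runs (id INTEGER PRIMARY KEY, loss REAL)")
    con.executemany("INSERT INTO runs (loss) VALUES (?)", [(0.91,), (0.42,), (0.17,)])
    con.commit()
    print("best run:", con.execute("SELECT id, MIN(loss) FROM runs").fetchone())
    con.close()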

17
Bridges-AI: Overview
• 1 NVIDIA DGX-2, tightly coupling 16 NVIDIA Tesla V100 (Volta) GPUs at 2.4 TB/s bisection bandwidth, to provide maximum capability for the most demanding AI challenges
• 9 Hewlett Packard Enterprise Apollo 6500 Gen10 servers, each with 8 NVIDIA Tesla V100 GPUs connected by NVLink 2.0, to balance great AI capability and capacity
• Bridges-AI is integrated with Bridges and allocated through XSEDE as resource “Bridges GPU-AI”, analogous to Bridges GPU, RM, LM, and Pylon
• Bridges-AI adds 9.9 Pf/s of mixed-precision tensor, 1.24 Pf/s of fp32, and 0.62 Pf/s of fp64 performance (Bridges totals: 9.9 Pf/s tensor, 3.93 Pf/s fp32, 1.97 Pf/s fp64)
• The $1.786M supplement includes additional staffing to support solutions and scaling
• Deployment: Bridges-AI deployed on time; PSC ran an Early User Program from November to December 2018, and production operations began January 1, 2019

Volta introduces Tensor Cores to accelerate neural networks, yielding extremely high peak performance for appropriate applications. Bridges-AI provides massive aggregate performance: 9.9 Pf/s mixed-precision tensor, 251 Tf/s 32-bit, 125 Tf/s 64-bit.

18
The Heart of Bridges-AI: NVIDIA Volta

New Streaming Multiprocessor (SM) architecture, introducing Tensor Cores, independent thread scheduling, a combined L1 data cache and shared memory unit, and 50% higher energy efficiency than Pascal.

Tensor Cores accelerate deep learning training and inference, providing up to 12× (training) and 6× (inference) higher peak flops than the P100 GPUs currently available in XSEDE.

NVLink 2.0 delivers 300 GB/s total bandwidth per GV100, nearly 2× higher than P100.

HBM2 bandwidth and capacity increase: 900 GB/s and up to 32 GB.

Enhanced Unified Memory and Address Translation Services improve the accuracy of memory page migration by providing new access counters.

Cooperative Groups and new Cooperative Launch APIs expand the programming model to allow organizing groups of communicating threads.

Volta-optimized software includes new versions of frameworks and libraries optimized to take advantage of the Volta architecture: TensorFlow, Caffe2, MXNet, CNTK, cuDNN, cuBLAS, TensorRT, etc.

[Photo: NVIDIA Tesla V100 SXM2 module with Volta GV100 GPU]

Training ResNet-50 with ImageNet:
 • V100: 1075 images/s [a]
 • P100: 219 images/s [b]
 • K80: 52 images/s [b]
a. https://devblogs.nvidia.com/tensor-core-ai-performance-milestones/
b. https://www.tensorflow.org/performance/benchmarks
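As a minimal sketch of the operation Tensor Cores accelerate: an FP16 matrix multiply in PyTorch (one of the Volta-optimized frameworks above) is eligible for Tensor Core execution on a V100; the sizes here are arbitrary:

    # FP16 GEMM: dispatched to Tensor Cores by cuBLAS on Volta when the
    # dimensions and data types permit.
    import torch

    assert torch.cuda.is_available()
    a = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    b = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
    c = a @ b
    print(c.float().mean().item())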

19
Balancing AI Capability & Capacity: HPE Apollo 6500

Bridges-AI adds 9 HPE Apollo 6500 Gen10 servers. Each HPE Apollo 6500 couples 8 NVIDIA Tesla V100 SXM2 GPUs:
 • 40,960 CUDA cores and 5,120 tensor cores
 • Performance: 1 Pf/s mixed-precision tensor, 125 Tf/s 32b, 64 Tf/s 64b
 • Memory: 128 GB HBM2, 7.2 TB/s aggregate memory bandwidth
 • 2 × Intel Xeon Gold 6148 CPUs (20c, 2.4–3.7 GHz, 27.5 MB L3, 3 UPI links) and 192 GB of DDR4-2666 RAM
 • 4 × 2 TB NVMe SSDs for user and system data
 • 1 × Intel Omni-Path host channel adapter
 • Hybrid cube-mesh topology connecting the 8 V100 GPUs and 2 Xeon CPUs, using NVLink 2.0 between the GPUs and PCIe3 to the CPUs
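One way to observe this topology from software (a sketch, not PSC-specific tooling): PyTorch can report which GPU pairs have direct peer access over NVLink or PCIe:

    # Probe pairwise peer access among the visible GPUs.
    import torch

    n = torch.cuda.device_count()
    for i in range(n):
        for j in range(n):
            if i != j and torch.cuda.can_device_access_peer(i, j):
                print(f"GPU {i} -> GPU {j}: direct peer access")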

20
Maximum DL Capability: NVIDIA DGX-2

The NVIDIA DGX-2 couples 16 NVIDIA Tesla V100 SXM2 GPUs:
 • 81,920 CUDA cores and 10,240 tensor cores
 • Performance: 2 Pf/s mixed-precision tensor, 251 Tf/s 32b, 125 Tf/s 64b
 • Memory: 512 GB HBM2, 14.4 TB/s aggregate memory bandwidth
 • 2 × Intel Xeon Platinum 8168 CPUs (24c, 2.7–3.7 GHz, 33 MB L3, 3 UPI links) and 1.5 TB of DDR4-2666 RAM
 • 2 × 960 GB NVMe SSDs hosting the Ubuntu Linux OS
 • 8 × 3.84 TB NVMe SSDs (aggregate ~30 TB)
 • 8 × Mellanox ConnectX adapters for EDR InfiniBand & 100 Gb/s Ethernet
 • NVSwitch tightly couples the 16 V100 GPUs for capability & scaling:
    – Each of the 12 NVSwitch chips is an 18×18-port, fully connected crossbar
    – 50 GB/s/port and 900 GB/s/chip bidirectional bandwidth
    – 2.4 TB/s system bisection bandwidth
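A back-of-envelope check of the quoted bisection figure: each GV100 drives 6 NVLink 2.0 links at 50 GB/s, i.e. 300 GB/s per GPU, and a worst-case cut separates 8 GPUs from the other 8:

    % Bisection bandwidth of the 16-GPU NVSwitch fabric
    \[
      8 \times 6 \times 50\,\mathrm{GB/s} = 2.4\,\mathrm{TB/s}
    \]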

21
Deep Learning Frameworks on Bridges

22
Containers
Containers enable reproducible, cloud-interoperable workflows and simplify deployment of applications and frameworks, providing interoperability with clouds and other resources.
 – PSC is a key partner of the Critical Assessment of Metagenome Interpretation (CAMI) project for reproducible evaluation of metagenomics tools
 – CAMI and the DOE Joint Genome Institute defined the bioboxes standard for Docker containers encapsulating bioinformatics tools
Docker images can be converted to Singularity images and run on Bridges (see the sketch below)
 – Certain vetted Docker containers are also supported
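A hedged sketch of that Docker-to-Singularity path, wrapped in Python; the image name is illustrative and the exact Singularity command line varies by version, so treat this as the shape of the workflow rather than a Bridges-specific recipe:

    # Build a Singularity image from a Docker source, then run a command in it.
    import subprocess

    subprocess.run(["singularity", "build", "ubuntu.sif",
                    "docker://ubuntu:18.04"], check=True)   # hypothetical image
    subprocess.run(["singularity", "exec", "ubuntu.sif",
                    "cat", "/etc/os-release"], check=True)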

23
Community Datasets
Some datasets are unique; others have local caching for efficiency and to drive interdisciplinary research.
• Hosting a mature corpus of data and data tools for an open science community
 – Accessible by multiple users and multiple groups
 – Provision of reusable data management tools
 – Facilitate collaboration
 – Offload data management
• Interoperable with HPC capabilities
 – High-speed data transfer
 – High-performance compute capabilities
• Support copies and maintenance; guarantee integrity
• Data resource not subject to project limitations

24
The Expanding Ecosystem of Bridges

• Human BioMolecular Atlas (HuBMAP): 10s of PB, hybrid on-prem data/AI/HPDA + Cloud
• Big Data for Better Health: 2.2 PB
• Brain Image Library: 10 PB
• Campus Clusters: dedicated resources + cloud use of Bridges
25
Big Data for Better Health (BD4BH)

     Implementing, applying, and evaluating machine
     learning methods for predicting patient outcomes of
     breast and lung cancer
     University of Pittsburgh Department of Biomedical
     Informatics (Gregory Cooper), CMU Machine Learning
     (Ziv Bar-Joseph) and Computational Biology (Robert
     Murphy), and PSC (Nick Nystrom, Alex Ropelewski)
     Dedicated 2.2 PB file system (/pghbio) attached to Bridges
     for long-term data management & collaboration
     Big Data research training opportunities: summer
     program for Lincoln University students

26
The Brain Image Library                                   brainimagelibrary.org

     Confocal Fluorescence Microscopy:
     multispectral, subcellular resolution, highly quantitative
     Will contain whole-brain volumetric images of mouse, rat, and
     other mammals, targeted experiments highlighting connectivity
     between cells, spatial transcriptomic data, and metadata
     describing essential information about the experiments.
     Supported by the National Institute of Mental Health of the
     NIH under award number R24MH114793 ($5M).
     Alex Ropelewski (PSC), Marcel Bruchez (CMU Biology),
     Simon Watkins (Pitt Cell Biology & Center for Biologic Imaging)
     Integrated with Bridges to support additional advanced analytics and
     development of AI/ML techniques.

          A. M. Watson et al., Ribbon scanning confocal for high-speed high-resolution volume
          imaging of brain. PLoS ONE 12 (2017) doi: https://doi.org/10.1371/journal.pone.0180486.

27
Human Biomolecular Atlas Program (HuBMAP)
                                                                                          Hybrid on-prem data/AI/HPDA + Cloud

                                           “The Human BioMolecular Atlas Program (HuBMAP) aims
                                           to facilitate research on single cells within tissues by
                                           supporting data generation and technology development
                                           to explore the relationship between cellular organization
                                           and function, as well as variability in normal tissue
                                           organization at the level of individual cells.” —NIH

     The PSC+Pitt team was awarded development of the Infrastructure Component (IC) for the
     HuBMAP HIVE (Integration, Visualization & Engagement)
 – To receive data from Tissue Mapping Centers at Florida (lymphatic system), Caltech (endothelium), Vanderbilt, Stanford, and UCSD (kidney, urinary tract, and lung)
 – Supporting Tools Components at CMU and Harvard
 – Supporting Mapping Components at Indiana University Bloomington and New York Genome Center
 – Interfacing with the Collaboration Component at U. of South Dakota
 – Supporting Transformative Technology Development centers at Caltech (single-cell transcriptomics), Stanford (genomic imaging), Purdue (sub-cellular mass spec), and Harvard (proteomics)

28
Outline

                 Motivation & Vision
     Realizing the Vision: Bridges and Bridges-AI
                Exemplars of Success
                       Summary

29
AI for Strategic Reasoning
                   Tuomas Sandholm and Noam Brown, Carnegie Mellon University

     An AI for making decisions with imperfect information:
     Beating Top Pros in Heads-Up No-Limit Texas Hold’em Poker

Imperfect-information games require different algorithms, but apply to important classes of real-world problems:
 – Medical treatment planning
 – Negotiation
 – Strategic pricing
 – Auctions
 – Military allocation problems

Heads-up no-limit Texas hold’em is the main benchmark for games with imperfect information:
 – 10¹⁶¹ situations
 – Libratus was the first program to beat top humans
 – Beat 4 top pros playing 120,000 hands over 20 days
 – Libratus won decisively: 99.98% statistical significance

Libratus improved upon previous best algorithms by incorporating real-time improvements in its strategy.

[Photo: Prof. Tuomas Sandholm watching one of the world’s best players compete against Libratus.]
30
AI for Strategic Reasoning
                   Tuomas Sandholm and Noam Brown, Carnegie Mellon University

“The best AI’s ability to do strategic reasoning with imperfect information has now surpassed that of the best humans.”
—Professor Tuomas Sandholm, Carnegie Mellon University

1. N. Brown, T. Sandholm, Safe and Nested Subgame Solving for Imperfect-Information Games, in NIPS 2017, I. Guyon et al., Eds. (Curran Associates, Inc., Long Beach, California, 2017), pp. 689-699. (Awarded Best Paper at NIPS 2017)
2. N. Brown, T. Sandholm, Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science (2017) doi: 10.1126/science.aao1733. (Companion paper in Science)

     Bridges enabled this breakthrough through 19 million core-hours of computing and 2.6 PB of data
     in the knowledge base that Libratus generated.
     Libratus, under the Chinese name Lengpudashi, or “cold poker master”, also won a 36,000-hand exhibition in
     China in April 2017 against a team of six strong Chinese poker players. Further demonstrated at IJCAI 17
     (Melbourne, August 2017) and NIPS 2017 (Long Beach, December 2017).

31
Impact on the National Interest

Prof. Sandholm launched two startups based on Libratus’s algorithms: Strategic Machine Inc. and Strategy Robot.
                                  In August 2018, Strategy Robot
                                  received a 2-year contract for up to
                                  $10M from the Pentagon’s Defense
                                  Innovation Unit.

                 https://www.wired.com/story/poker-playing-robot-goes-to-pentagon/
32
Materials Discovery for Energy Applications
                                Chris Wolverton, Northwestern University
                                                                                     AI-Driven HPC

Materials Discovery Through Data-Driven Structural Search and Heusler Nanostructures

Discovery of high-pressure compounds
 – Materials discovery using density functional theory and the minima-hopping structure prediction method
 – Discovery of FeBi₂, the first iron-bismuth compound
 – Discovery of two superconducting compounds in the Cu-Bi system, CuBi and Cu₁₁Bi₇

Discovery of a new form of TiO₂
 – Employed machine learning to explore new TiO₂ polymorphs
 – Identified a new TiO₂ hexagonal nanosheet (HNS)
 – The HNS has a tunable band gap and could be used for photocatalytic water splitting and H₂ production

33
Severe Thunderstorm Prediction with Big Visual Data
                                                 James Z. Wang et al., Penn State

Applying machine learning to detect severe-storm-causing clouds
 – Leveraging NOAA’s vast historical archive of satellite imagery, radar data, and weather report data to train statistical models, including deep neural networks, on Bridges’ CPUs and GPUs
 – Achieved high accuracy in detection of cloud patterns
 – Developed fundamental statistical methods for data analysis
 – Increasing the prediction lead time using deep models and GPUs

[Figures: detection of severe-storm-causing comma-shaped clouds from satellite images; detection and categorization of bow echoes from weather radar data.]

1. Zheng et al., Detecting Comma-shaped Clouds for Severe Weather Forecasting using Shape and Motion, IEEE Transactions on Geoscience and Remote Sensing, under 2nd-round review, 2018.
2. J. Ye, P. Wu, J. Z. Wang, J. Li, Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support. IEEE Transactions on Signal Processing 65, 2317-2332 (2017) doi: 10.1109/TSP.2017.2659647.

34
Fermilab Using Bridges to Prep for CMS @ HL-LHC
     The High-Luminosity Large Hadron Collider (HL-LHC) will increase
     luminosity by 10×, resulting in ~1EB of data.
     The Compact Muon Solenoid (CMS) experiment will allow study of
     the Standard Model, extra dimensions, and dark matter.
     Fermilab is now using Bridges to integrate HPC into their workflow,
     in preparation for HL-LHC coming online in 2026.

[Figures: CMS detector (CERN, https://home.cern/science/experiments/cms); event display of a heavy-ion collision registered at the CMS detector on Nov. 8, 2018 (image: Thomas McCauley), from https://cms.cern/news/2018-heavy-ion-collision-run-has-started; estimated CPU resources required for CMS into the HL-LHC era, using the current computing model with parameters projected out for the next 12 years, from A Roadmap for HEP Software and Computing R&D for the 2020s, HEP Software Foundation.]

                            Learn more: https://www.psc.edu/news-publications/2930-psc-supplies-computation-to-large-hadron-collider-group

35
Unsupervised Deep Learning Reveals Prognostically Relevant Subtypes of Glioblastoma
                            Jonathan D. Young, Chunhui Cai, and Xinghua Lu, Univ. of Pittsburgh

     Showed that a deep learning model can be trained to
     represent biologically and clinically meaningful abstractions
     of cancer gene expression data
     Data: The Cancer Genome Atlas (1.2 PB)
     Hypotheses: Hierarchical structures emerging from deep
     learning on gene expression data relate to the cellular signal
     system, and the first hidden layer represents signals related
     to transcription factor activation. [1]
 – Model selection indicates ~1,300 units in the first hidden layer, consistent with ~1,400 human transcription factors.
 – Consensus clustering on the third hidden layer led to discovery of clusters of glioblastoma multiforme with differential survival.

“One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input.”
—Jonathan D. Young, Chunhui Cai, and Xinghua Lu

J. D. Young, C. Cai, X. Lu, Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma. BMC Bioinformatics 18, 381 (2017) doi: 10.1186/s12859-017-1798-2.

36
Modeling of Imaging and Genetics using a Deep Graphical Model
                         Kayhan Batmanghelich, University of Pittsburgh

     Causal Generative Domain Adaptation Networks
      – A deep learning model trained with image data from
        one hospital (“domain”) may fail to produce reliable
        predictions in a different hospital where the data
        distribution is different
      – A generative domain adaptation network (G-DAN),
        implemented using PyTorch, is able to understand
        distribution changes and generate new domains
 – Incorporating causal structure into the model – a causal G-DAN (CG-DAN) – can reduce its complexity and accordingly improve the transfer efficiency

                             M. Gong, K. Zhang, B. Huang, C. Glymour, D. Tao, and K. Batmanghelich,
                             “Causal Generative Domain Adaptation Networks,” arXiv:1804.04333, 2018,
                             http://arxiv.org/abs/1804.04333.

37
Multimodal Automatic Speech Recognition (ASR)
                            Florian Metze (CMU) et al.

     2017 Jelinek Summer Workshop on Speech and Language Technology (JSALT)

38
Deep Learning for Text-Based Prediction in Finance
                        Bryan Routledge and Vitaliy Merso, Carnegie Mellon University

     Studying firm and investment fund financial disclosure
     using Deep Learning Natural Language Processing
     models
 – Results presented at the Doctoral Consortium at the Text as Data 2018 conference
 – An early version linking the text of earnings announcements to market reactions was presented at the SEC Doctoral Symposium 2018

“Given the large sizes of our corpora (hundreds of millions of words) and the computational requirements of the modern Deep Learning models, our work would be impossible without the support from Bridges.”
—Bryan Routledge, CMU

                     Many words used by investment funds in letters to their
                     shareholders are highly context-dependent. For example, the
                     word “subprime” can be either a very strong signal of a letter
                     describing a booming market or a very weak one, depending
                     on what other words appear around it.

39
Exploring and Generating Data with Generative Adversarial Networks
                            Giulia Fanti, Zinan Lin, Carnegie Mellon University

     Privacy-preserving dataset generation
      – Fanti & Lin’s recent research aims to understand
        fundamentally how Generative Adversarial Networks
        (GANs) internally represent complex data structures and
        to harness these observations to use GANs for privacy-
        preserving dataset generation
      – GANs are a new class of data-driven, neural network
        based generative models that excel in high dimensions.

This work has led to two papers accepted to NIPS 2018:
 – “The power of two samples in generative adversarial networks” proposes “packing”, a principled approach to improving the quality of generated images (see the sketch below)
 – “Robustness of conditional GANs to noisy labels” earned a Spotlight Award at NIPS 2018, proposing a novel, theoretically sound, and practical GAN architecture that consistently improves upon baseline approaches to learning conditional generators where the labels are corrupted by random noise

[Figure: CelebA samples generated from DCGAN (upper) and PacDCGAN2 (lower) show PacDCGAN2 generates more diverse and sharper images.]

1. Z. Lin, A. Khetan, G. Fanti, and S. Oh, “PacGAN: The power of two samples in generative adversarial networks,” arXiv:1712.04086, 2017.
2. K. Thekumparampil, A. Khetan, Z. Lin, and S. Oh, “Robustness of conditional GANs to noisy labels,” forthcoming in NIPS 2018, 2018 (Spotlight Award).
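A minimal sketch of the packing idea, under the illustrative assumption that m samples are shown to the discriminator jointly by concatenating them along the channel axis; shapes and names are ours, not the paper’s:

    # "Pack" a batch of images so the discriminator judges m samples at once,
    # which helps it detect mode collapse.
    import torch

    def pack(batch, m=2):
        # batch: (N, C, H, W) with N divisible by m -> (N//m, m*C, H, W)
        n, c, h, w = batch.shape
        return batch.view(n // m, m * c, h, w)

    x = torch.randn(64, 3, 32, 32)   # stand-in for real or generated images
    print(pack(x).shape)             # torch.Size([32, 6, 32, 32])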

40
Towards a Deeper Understanding of Generative Image Models in Vision
                                           Ying Nian Wu, UCLA

      Learning interpretable latent representations:
      a deformable generator model disentangles
      appearance and geometric information into
      two independent latent vectors
       – The appearance generator produces the
         appearance information, including color,
         illumination, identity or category, of an image
       – The geometric generator produces                  Each dimension of the appearance latent vector encodes appearance
         displacement of the coordinates of each pixel
                                                           information such as color, illumination, and gender. In the fist line,
                                                           from left to right, the color of background varies from black to white,
         and performs geometric warping, such as           and the gender changes from a woman to a man. In the second line,
         stretching and rotation, on the appearance        the moustache of the man becomes thicker when the corresponding
                                                           dimension of Z approaches zero, and the hair of the woman becomes
         generator to obtain the final synthesized         denser when the corresponding dimension of Z increases. In the third
         image.                                            line, from left to right, the skin color changes from dark to white. In
                                                           the fourth line, from left to right, the illumination lighting changes
      The model can learn both representations             from the left-side of the face to the right-side of the face.

      from image data in an unsupervised manner.

41
Towards Real-time Video Object Detection Using Adaptive Scaling
     Ting-Wu (Rudy) Chin, Ruizhou Ding, and Diana Marculescu, Carnegie Mellon University

Exploiting Resolution to Tune Accuracy and Speed
 – The AdaScale project exploits image resolution “as a knob” to improve the accuracy and speed of deep neural network-based object detection systems (see the sketch below).
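A sketch of that knob with a placeholder detector; AdaScale itself predicts the best scale per image, whereas here the scale is just a fixed parameter:

    # Downscale a frame before detection: lower scale -> faster inference,
    # usually at some cost in accuracy. The detector is a stand-in.
    import torch
    import torch.nn.functional as F

    def detect_at_scale(frame, detector, scale=0.5):
        small = F.interpolate(frame, scale_factor=scale, mode="bilinear",
                              align_corners=False)
        return detector(small)

    frame = torch.randn(1, 3, 1080, 1920)            # one video frame
    out = detect_at_scale(frame, lambda t: t.shape, scale=0.5)
    print(out)                                       # (1, 3, 540, 960)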

[Figures: qualitative detection results without and with AdaScale; the performance of AdaScale on various baselines.]

1. T.-W. Chin, R. Ding, and D. Marculescu, “AdaScale: Towards Real-Time Video Object Detection Using Adaptive Scaling,” in SysML 2019, 2019 [Online]. Available: https://www.sysml.cc/papers.html#

42
Mapping Energy Infrastructure Using Deep Learning and Large Remote Sensing Datasets
                                 Jordan Malof, Duke University

Extracting high-quality information about energy systems from overhead imagery with deep learning
 – Precise locations of buildings (energy consumption)
 – Small-scale solar arrays (energy generation)
 – Improved speed and performance by expanding the receptive field of neural networks only during label inference

[Figures: satellite image with building mappings; aerial photograph with solar mappings; performance (higher is better) and computation time (lower is better) versus increasing receptive field size (in pixels).]

B. Huang et al., “Large-scale semantic classification: outcome of the first year of Inria aerial image labeling benchmark,” in IEEE International Geoscience and Remote Sensing Symposium – IGARSS 2018, 2018. https://hal.inria.fr/hal-01767807

43
Understanding Public Space Use in Market Square
                       Javier Argota Sánchez-Vaquerizo, Carnegie Mellon University

The Project in Figures
 –   4 cameras
 –   5 weeks of data collection (Aug 24 to Sep 28, 2018)
 –   3,200 hours of video processed
 –   250 million detections
 –   12 categories: pedestrians, trolleys, seats, tables, sun umbrellas, tents, cars, pickups, vans, trucks, bikes, motorcycles

Motivations
 –   Public safety
 –   Pedestrian flow and crowd management
 –   Effects on vehicular traffic
 –   Venue and event impact assessment

Technology Capabilities
 –   Number of people, vehicles, and objects detected
 –   Segmentation
 –   Location, trajectory, speed
 –   Prediction
 –   Anonymity from scratch

Insights
 –   Effect of weather (rain) on attendance
 –   Uneven distribution of pedestrians in the space
 –   Positive impact of events and venues on attendance
 –   Short duration of visits

44
45
[Detection visualization legend: pedestrians, trolleys, seats, tables, sun umbrellas, tents, cars, pickups, vans, trucks, bikes, motorcycles]

46
Fast and Accurate Object Detection in High-Resolution Video Using GPUs
                    Vic Ruzicka and Franz Franchetti, Carnegie Mellon University

       Object detection in computer vision traditionally works
       with relatively low-resolution images. However, the
       resolution of recording devices is increasing, requiring
       new methods for processing high-resolution data.
Ruzicka & Franchetti’s attention pipeline method uses two-staged evaluation of each image or video frame, under rough and refined resolution, to limit the total number of necessary evaluations.

[Figure: example of a crowded 4K video frame annotated with Ruzicka & Franchetti’s method.]

Both stages use the fast object detection model YOLO v2.
Their distributed-GPU code maintains high accuracy while reaching performance of 3-6 fps on 4K video and 2 fps on 8K video. This outperforms the individual baseline approaches, while allowing the user to set the trade-off between accuracy and performance.
       Best Paper Finalist at IEEE High Performance Extreme Computing Conference (HPEC) 2018
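A schematic sketch of the two-stage idea with a dummy detector standing in for YOLO v2; box coordinates and the scale factor are illustrative:

    # Coarse pass on a downscaled frame proposes regions; only those regions
    # are re-examined at full resolution.
    import torch
    import torch.nn.functional as F

    def two_stage(frame, detect, scale=0.25):
        rough = F.interpolate(frame, scale_factor=scale, mode="bilinear",
                              align_corners=False)
        results = []
        for (x0, y0, x1, y1) in detect(rough):       # candidate boxes
            crop = frame[..., int(y0 / scale):int(y1 / scale),
                              int(x0 / scale):int(x1 / scale)]
            results.extend(detect(crop))             # refined detection
        return results

    dummy = lambda img: [(8, 8, 120, 120)]           # stand-in detector
    frame = torch.randn(1, 3, 2160, 3840)            # a 4K frame
    print(len(two_stage(frame, dummy)))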

47
Fast and Accurate Object Detection in High-Resolution Video Using GPUs
                Vic Růžička and Franz Franchetti, Carnegie Mellon University

48
Distributed Learning for Large-Scale Multi-Robot Path Planning in Complex Environments
                                     Guillaume Sartoretti, Carnegie Mellon University

      Multi-agent path finding (MAPF)
        – An essential component of many large-scale, real-world robot
          deployments, from aerial swarms to warehouse automation.
        – Most state-of-the-art MAPF algorithms still rely on centralized
          planning, scaling poorly past a few hundred agents.
 – Such planning approaches are maladapted to real-world deployments, where noise and uncertainty often require paths to be recomputed online, which is impossible when planning times are in seconds to minutes.

[Figure: example problem where 100 simulated robots (white dots) must compute individual, collision-free paths in a large factory-like environment. Reproduced from [1].]

Pathfinding via Reinforcement + Imitation Learning
 – Using Bridges-GPU, Sartoretti trained and tested PRIMAL, a novel framework for MAPF that combines reinforcement and imitation learning to teach fully decentralized policies, where agents reactively plan paths online in a partially observable world while exhibiting implicit coordination.
 – In low-obstacle-density environments, PRIMAL outperforms state-of-the-art MAPF planners in certain cases, even though those planners have access to the whole state of the system. The team also deployed PRIMAL on physical and simulated robots in a factory mockup scenario, showing how robots can benefit from this online, local-information-based, decentralized MAPF approach.

1. G. Sartoretti et al., “PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning,” 2018. http://arxiv.org/abs/1809.03531.

49
AIDR 2019: Artificial Intelligence for Data Discovery and Reuse

Topics:
 • Automation in data discovery
 • Automation in data curation and generation
 • Measuring and improving data quality
 • Integrating datasets and enabling interoperability
 • Biomedical data discovery and reuse
 • Data privacy, security, and algorithmic bias
 • The future of scientific data and how we work together

Keynotes:
 • Tom M. Mitchell, Interim Dean and E. Fredkin University Professor, School of Computer Science, Carnegie Mellon University
 • Glen de Vries, President and Co-founder, Medidata Solutions

Invited speakers:
 • Robert F. Murphy, Ray and Stephanie Lane Professor, Head of Computational Biology, School of Computer Science, Carnegie Mellon University
 • Natasha Noy, Staff Scientist, Google AI

Deadline for Abstracts: February 22
https://events.library.cmu.edu/aidr2019/
50
2018 HPCwire Awards

51
Outline

                 Motivation & Vision
     Realizing the Vision: Bridges and Bridges-AI
                Exemplars of Success
                       Summary

52
Summary

PSC’s approach to scalable, converged HPC+AI is enabling breakthroughs across an extremely broad range of research areas.
These resources – Bridges, including Bridges-AI – are available at no charge for research and education.
 – Bridges-AI builds on Bridges’ strength in converged HPC, AI, and Big Data to provide a unique platform for AI and AI-enabled simulation.

     To request a free research/education allocation, visit:
                          https://psc.edu/about-bridges/apply

53
Thank you!

     Questions?

54