Pioneering and Democratizing Scalable HPC+AI
Nick Nystrom, Interim Director, PSC (nystrom@psc.edu)
Paola Buitrago, Director, AI & Big Data, PSC (paola@psc.edu)
2019 Stanford Conference · Stanford · February 15, 2019
© 2019 Pittsburgh Supercomputing Center
Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
What is PSC?
PSC is a joint effort of Carnegie Mellon University and the University of Pittsburgh: a research institution advancing knowledge through converged HPC, AI, and Big Data, with 33 years of leadership in HPC, HPDA, and computational science; 21 HPC systems, 10 of which were the first of their kind or unique; pioneering the convergence of AI + HPC + data.
National service provider for research and discovery
• Bridges, Anton 2, Brain Image Library, Open Compass, XSEDE, Olympus
• ~30 active funded projects
Active member in the CMU and Pitt communities
• Research collaborations
• Colocation for lower cost and greater capability
Networking and security
• Networking & security service provider
• Research networking
Education and training
• Lead national & local workshops
• Support courses at CMU and elsewhere
• Teaching, thesis committees, interns
Advise and support industry
• Training, access to advanced resources, collaborative research
Research Needs Converged HPC, AI, and Data
[Image collage] Data sources range from structured, regular, and homogeneous (the BlueTides astrophysics simulation, http://bluetides-project.org/; genome sequencers) to unstructured, irregular, and heterogeneous (the Pan-STARRS telescope, http://pan-starrs.ifa.hawaii.edu/public/; social networks and the Internet; video; wearable sensors; detecting cancer; the Human BioMolecular Atlas Program, https://commonfund.nih.gov/hubmap; museum collections; legacy documents; environmental sensors such as water-temperature profiles from tagged hooded seals).
Enabling the Creation of Knowledge
Objectives: enable data-intensive applications & workflows
• Deliver HPC Software as a Service (Science Gateways)
• Deliver Big Data as a Service (BDaaS)
• Provide scalable deep learning, machine learning, and graph analytics
• Support very large in-memory databases
• Facilitate data assimilation from instruments and the Internet
Common goal: enable the creation of knowledge
• Democratize HPC, Big Data, and AI
• Enable research areas that have not previously used HPC
• Advance previously traditional fields through machine learning and data analytics
• Couple applications in novel ways
• Scale beyond the laptop and to interdisciplinary, collaborative teams
The Rapid Growth of AI
From: Artificial Intelligence Index: 2018 Annual Report (Stanford University, 2018)
Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
Bridges converges HPC, AI, and Big Data to empower new research communities, bring desktop convenience to advanced computing, expand remote access, and help researchers work more intuitively.
• Funded by NSF award #OAC-1445606 ($20.9M); emphasizes usability, flexibility, and interactivity
• Available at no charge for open research and coursework and by arrangement to industry; easier access for CMU and Pitt faculty through the Pittsburgh Research Computing Initiative
• Popular programming languages and applications: Python, Jupyter, R, MATLAB, Java, Spark, Hadoop, …
• 856 compute nodes with Intel Xeon CPUs: 800 with 128 GB, 42 with 3 TB, and 4 with 12 TB of RAM (29,036 CPU cores; 277 TB total RAM, up to 12 TB per node)
• 216 NVIDIA Tesla GPUs: 64 K80, 64 P100, and (new) 88 V100, configured to balance capability & capacity
• 17 PB storage (10 PB persistent, 7.3 PB local)
• Dedicated nodes for persistent databases, gateways, and distributed services
• The world's first deployment of the Intel Omni-Path Architecture fabric
• 44M core-hours, 173k GPU-AI hours, 442k GPU-hours, and 343k TB-hours allocated quarterly
• Serving ~1,850 projects and ~7,500 users at 393 institutions, spanning 119 fields of study
• Bridges-AI: NVIDIA DGX-2 Enterprise AI system + 9 HPE Apollo 6500 Gen10 servers with 8 Voltas each, for a total of 88 V100 GPUs
Acquisition and operation of Bridges are made possible by the National Science Foundation through award #OAC-1445606 ($20.9M), "Bridges: From Communities and Data to Workflows and Insight", which delivered Bridges and is now delivering Bridges GPU-AI.
All trademarks, service marks, trade names, trade dress, product names, and logos appearing herein are the property of their respective owners.
Bridges Makes Advanced Computing Easy
• Make HPC accessible to all research communities
• Converge HPC, AI, and Big Data
• Support the widest range of science with an extremely rich computing environment
– 3 tiers of memory: 12 TB, 3 TB, and 128 GB
– Powerful, flexible CPUs and GPUs
• Familiar, easy-to-use user environment, with elements not available in traditional supercomputers:
– Interactivity
– Popular languages and frameworks: Python, Anaconda, R, MATLAB, Java, Spark, Hadoop
– AI frameworks: TensorFlow, Caffe2, PyTorch, etc.
– Containers (e.g., NGC) and virtual machines (VMs)
– Databases
– Gateways and distributed (web) services
– Large collection of applications and libraries
Conceptual Architecture
[Diagram] Users, XSEDE, campuses, and instruments connect through login, web server, database, and data transfer nodes to the Intel Omni-Path Architecture fabric, which links the parallel file system, management nodes, and the compute tiers: RSM nodes (128 GB RAM; 800 nodes, 48 with GPUs), LSM nodes (3 TB RAM; 42 nodes), ESM nodes (12 TB RAM; 4 nodes), and Bridges-AI, introduced in Operations Year 3: one NVIDIA DGX-2 (16 V100 GPUs) and 9 HPE Apollo 6500 servers (8 V100 GPUs each).
Bridges Hardware Detail (Bridges Virtual Tour: https://psc.edu/bvt)
Storage: 20 Storage Building Blocks implementing the Pylon parallel storage system (10 PB usable), hosting project & community datasets.
Compute:
– 4 HPE Integrity Superdome X (12 TB) compute nodes, for large-memory Java & Python
– 42 HPE ProLiant DL580 (3 TB) compute nodes
– 748 HPE Apollo 2000 (128 GB) compute nodes, for simulation (including AI-enabled)
– 32 HPE Apollo 2000 (128 GB) GPU nodes with 2 NVIDIA Tesla P100 GPUs each, for deep learning
– 16 HPE Apollo 2000 (128 GB) GPU nodes with 2 NVIDIA Tesla K80 GPUs each, for ML, inferencing, DL development, Spark, and HPC AI (Libratus)
– Bridges-AI: NVIDIA DGX-2 and 9 HPE Apollo 6500 Gen10 nodes (88 NVIDIA Tesla V100 GPUs total), for maximum-scale deep learning
Service nodes: 12 HPE ProLiant DL380 database nodes; 6 HPE ProLiant DL360 web server nodes (user interfaces for AIaaS, BDaaS); 2 gateway, 4 MDS, 2 front-end, 2 boot, and 8 management nodes.
Fabric: purpose-built Intel Omni-Path Architecture topology for data-intensive HPC, supporting distributed training, Spark, etc. — 6 "core" OPA edge switches (fully interconnected, 2 links per switch) and 20 "leaf" OPA edge switches, giving robust paths to parallel storage.
Accessing Bridges: No Cost for Research & Education, and Cost-Recovery Rates for Corporate Use
The following annual allocations are renewable and extendable, also at no cost for research and education.

                Startup      Open Research   Education     Industry (PSC Corporate Program)
Cost            No charge    No charge       No charge     Cost-recovery rates
CPU-hours       50k          Up to ~10^7     Up to ~10^6   Up to ~18M
GPU-hours       2,500        Up to ~10^5     Up to ~10^4   Up to ~180k
GPU-AI hours    1,500        Up to ~10^5     Up to ~10^4   Up to ~69k
TB-hours        1,000        Up to ~10^4     Up to ~10^4   Up to ~137k
Developer       Yes          Yes             (Yes)         Yes
Accepted        Any time     Quarterly       Any time      Any time
Awarded         ~1-2 days    Quarterly       ~1-3 days     ASAP
Interactivity
Interactivity is the feature most frequently requested by nontraditional HPC communities.
– Interactivity provides immediate feedback for exploratory data analytics and hypothesis testing.
– Bridges offers interactivity through a combination of shared, dedicated, and persistent resources to maximize availability while accommodating diverse needs.
High-Productivity Programming
Supporting the languages that communities already use is vital for applying HPC to their research questions. This applies to both traditional and nontraditional HPC communities.
Gateways and Tools for Building Them
Gateways provide easy-to-use access to Bridges' HPC and data resources, allowing users to launch jobs, orchestrate complex workflows, and manage data from their browsers.
– Provide "HPC Software-as-a-Service"
– Extensive use of VMs, databases, and distributed services
Examples: Galaxy (PSU, Johns Hopkins; https://galaxyproject.org/), The Causal Web (Pitt, CMU; http://www.ccd.pitt.edu/tools/), Neuroscience Gateway (SDSC)
Databases and Distributed/Web Services
Dedicated database nodes power persistent relational and NoSQL databases
– Support data management and data-driven workflows
– SSDs for high IOPS; HDDs for high capacity
Dedicated web server nodes
– Enable distributed, service-oriented architectures
– High-bandwidth connections to XSEDE and the Internet
Bridges-AI: Overview
Volta introduces Tensor Cores to accelerate neural networks, yielding extremely high peak performance for appropriate applications.
• 1 NVIDIA DGX-2: tightly couples 16 NVIDIA Tesla V100 (Volta) GPUs at 2.4 TB/s bisection bandwidth, providing maximum capability for the most demanding AI challenges
• 9 Hewlett Packard Enterprise Apollo 6500 Gen10 servers, each with 8 NVIDIA Tesla V100 GPUs connected by NVLink 2.0, balancing great AI capability and capacity
• Bridges-AI adds massive aggregate performance: 9.9 Pf/s of mixed-precision tensor, 1.24 Pf/s of fp32, and 0.62 Pf/s of fp64 (Bridges totals: 9.9 Pf/s tensor, 3.93 Pf/s fp32, 1.97 Pf/s fp64)
• Bridges-AI is integrated with Bridges and allocated through XSEDE as resource "Bridges GPU-AI", analogous to Bridges GPU, RM, LM, and Pylon
• The $1.786M supplement includes additional staffing to support solutions and scaling
• Deployment: Bridges-AI deployed on time; PSC ran an Early User Program in November-December 2018, and production operations began January 1, 2019
The Heart of Bridges-AI: NVIDIA Volta (Tesla V100 SXM2 module with Volta GV100 GPU)
• New Streaming Multiprocessor (SM) architecture, introducing Tensor Cores, independent thread scheduling, a combined L1 data cache and shared memory unit, and 50% higher energy efficiency than Pascal
• Tensor Cores accelerate deep learning training and inference, providing up to 12× and 6× higher peak flops, respectively, than the P100 GPUs currently available in XSEDE
• NVLink 2.0 delivers 300 GB/s total bandwidth per GV100, nearly 2× higher than P100
• HBM2 bandwidth and capacity increases: 900 GB/s and up to 32 GB
• Enhanced Unified Memory and Address Translation Services improve accuracy of memory page migration by providing new access counters
• Cooperative Groups and new Cooperative Launch APIs expand the programming model to allow organizing groups of communicating threads
• Volta-optimized software includes new versions of frameworks and libraries tuned for the Volta architecture: TensorFlow, Caffe2, MXNet, CNTK, cuDNN, cuBLAS, TensorRT, etc.
Training ResNet-50 with ImageNet: V100: 1075 images/s [a]; P100: 219 images/s [b]; K80: 52 images/s [b]
a. https://devblogs.nvidia.com/tensor-core-ai-performance-milestones/
b. https://www.tensorflow.org/performance/benchmarks
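The measured ResNet-50 throughputs above imply concrete generational speedups; the quick arithmetic below is illustrative only (real speedups vary with model, batch size, and precision):

```python
# Throughput figures quoted above for training ResNet-50 on ImageNet.
throughput_images_per_s = {"V100": 1075, "P100": 219, "K80": 52}

def speedup(new_gpu: str, old_gpu: str) -> float:
    """Measured throughput ratio between two GPU generations."""
    return throughput_images_per_s[new_gpu] / throughput_images_per_s[old_gpu]

print(round(speedup("V100", "P100"), 1))  # 4.9
print(round(speedup("V100", "K80"), 1))   # 20.7
```

Note that the measured V100-over-P100 training speedup (~4.9×) is smaller than the 12× peak-flops ratio: end-to-end training is limited by more than raw Tensor Core throughput.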
Balancing AI Capability & Capacity: HPE Apollo 6500
Bridges-AI adds 9 HPE Apollo 6500 Gen10 servers. Each HPE Apollo 6500 couples 8 NVIDIA Tesla V100 SXM2 GPUs:
– 40,960 CUDA cores and 5,120 tensor cores
– Performance: 1 Pf/s mixed-precision tensor, 125 Tf/s 32b, 64 Tf/s 64b
– Memory: 128 GB HBM2, 7.2 TB/s aggregate memory bandwidth
– 2 × Intel Xeon Gold 6148 CPUs (20c, 2.4-3.7 GHz, 27.5 MB L3, 3 UPI links) and 192 GB of DDR4-2666 RAM
– 4 × 2 TB NVMe SSDs for user and system data
– 1 × Intel Omni-Path host channel adapter
– Hybrid cube-mesh topology connecting the 8 V100 GPUs and 2 Xeon CPUs, using NVLink 2.0 between the GPUs and PCIe3 to the CPUs
Maximum DL Capability: NVIDIA DGX-2
Couples 16 NVIDIA Tesla V100 SXM2 GPUs:
– 81,920 CUDA cores and 10,240 tensor cores
– Performance: 2 Pf/s mixed-precision tensor, 251 Tf/s 32b, 125 Tf/s 64b
– Memory: 512 GB HBM2, 14.4 TB/s aggregate memory bandwidth
– 2 × Intel Xeon Platinum 8168 CPUs (24c, 2.7-3.7 GHz, 33 MB L3, 3 UPI links) and 1.5 TB of DDR4-2666 RAM
– 2 × 960 GB NVMe SSDs hosting the Ubuntu Linux OS
– 8 × 3.84 TB NVMe SSDs (aggregate ~30 TB)
– 8 × Mellanox ConnectX adapters for EDR InfiniBand & 100 Gb/s Ethernet
The NVSwitch tightly couples the 16 V100 GPUs for capability & scaling:
– Each of the 12 NVSwitch chips is an 18×18-port, fully connected crossbar
– 50 GB/s/port and 900 GB/s/chip bidirectional bandwidths
– 2.4 TB/s system bisection bandwidth
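Several of the headline DGX-2 numbers follow directly from the per-GPU V100 figures quoted earlier (300 GB/s NVLink, 900 GB/s and 32 GB HBM2); a small sanity-check sketch:

```python
# Deriving DGX-2 aggregate figures from the per-GPU V100 specs quoted above.
N_GPUS = 16
NVLINK_GBS = 300   # GB/s of NVLink 2.0 bandwidth per V100
HBM2_GBS = 900     # GB/s of HBM2 bandwidth per V100
HBM2_GB = 32       # GB of HBM2 capacity per V100

# Bisection: cut the system into two halves of 8 GPUs; with the NVSwitch
# fabric, each GPU can drive its full NVLink bandwidth across the cut.
bisection_tb_s = (N_GPUS // 2) * NVLINK_GBS / 1000
aggregate_hbm2_tb_s = N_GPUS * HBM2_GBS / 1000
total_hbm2_gb = N_GPUS * HBM2_GB

print(bisection_tb_s)       # 2.4 TB/s, matching the quoted bisection bandwidth
print(aggregate_hbm2_tb_s)  # 14.4 TB/s aggregate memory bandwidth
print(total_hbm2_gb)        # 512 GB HBM2
```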
Deep Learning Frameworks on Bridges
Containers
Containers enable reproducible, cloud-interoperable workflows and simplify deployment of applications and frameworks.
– Interoperability with clouds and other resources
– PSC is a key partner of the Critical Assessment of Metagenome Interpretation (CAMI) project for reproducible evaluation of metagenomics tools
– CAMI and the DOE Joint Genome Institute defined the bioboxes standard for Docker containers encapsulating bioinformatics tools
Docker images can be converted to Singularity images and run on Bridges; certain vetted Docker containers are also supported.
Community Datasets
• Hosting mature corpora of data and data tools for an open science community, some unique, others locally cached for efficiency and to drive interdisciplinary research
– Accessible by multiple users and multiple groups
– Provision of reusable data management tools
– Facilitate collaboration
– Offload data management
• Interoperable with HPC capabilities
– High-speed data transfer
– High-performance compute capabilities
• Support for copies and maintenance, with guaranteed integrity
• Data resources not subject to project limitations
The Expanding Ecosystem of Bridges
Hybrid on-prem data/AI/HPDA + cloud: the Human BioMolecular Atlas (10s of PB plus cloud use of Bridges), Big Data for Better Health (2.2 PB of dedicated resources), the Brain Image Library (10 PB), and campus clusters.
Big Data for Better Health (BD4BH)
Implementing, applying, and evaluating machine learning methods for predicting patient outcomes of breast and lung cancer.
• University of Pittsburgh Department of Biomedical Informatics (Gregory Cooper), CMU Machine Learning (Ziv Bar-Joseph) and Computational Biology (Robert Murphy), and PSC (Nick Nystrom, Alex Ropelewski)
• Dedicated 2.2 PB file system (/pghbio) attached to Bridges for long-term data management & collaboration
• Big Data research training opportunities: summer program for Lincoln University students
The Brain Image Library (brainimagelibrary.org)
Confocal fluorescence microscopy: multispectral, subcellular resolution, highly quantitative.
Will contain whole-brain volumetric images of mouse, rat, and other mammals; targeted experiments highlighting connectivity between cells; spatial transcriptomic data; and metadata describing essential information about the experiments.
Supported by the National Institute of Mental Health of the NIH under award number R24MH114793 ($5M). Alex Ropelewski (PSC), Marcel Bruchez (CMU Biology), Simon Watkins (Pitt Cell Biology & Center for Biologic Imaging).
Integrated with Bridges to support additional advanced analytics and development of AI/ML techniques.
A. M. Watson et al., Ribbon scanning confocal for high-speed high-resolution volume imaging of brain. PLoS ONE 12 (2017) doi: https://doi.org/10.1371/journal.pone.0180486.
Human BioMolecular Atlas Program (HuBMAP): Hybrid On-Prem Data/AI/HPDA + Cloud
"The Human BioMolecular Atlas Program (HuBMAP) aims to facilitate research on single cells within tissues by supporting data generation and technology development to explore the relationship between cellular organization and function, as well as variability in normal tissue organization at the level of individual cells." —NIH
The PSC+Pitt team was awarded development of the Infrastructure Component (IC) for the HuBMAP HIVE (Integration, Visualization & Engagement):
– Receiving data from Tissue Mapping Centers at Florida (lymphatic system), CalTech (endothelium), and Vanderbilt, Stanford, and UCSD (kidney, urinary tract, and lung)
– Supporting Tools Components at CMU and Harvard
– Supporting Mapping Components at Indiana University Bloomington and the New York Genome Center
– Interfacing with the Collaboration Component at the U. of South Dakota
– Supporting Transformative Technology Development centers at CalTech (single-cell transcriptomics), Stanford (genomic imaging), Purdue (sub-cellular mass spec), and Harvard (proteomics)
Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
AI for Strategic Reasoning
Tuomas Sandholm and Noam Brown, Carnegie Mellon University
An AI for making decisions with imperfect information: beating top pros in heads-up no-limit Texas hold'em poker.
Imperfect-information games require different algorithms, but apply to important classes of real-world problems:
– Medical treatment planning
– Negotiation
– Strategic pricing
– Auctions
– Military allocation problems
Heads-up no-limit Texas hold'em is the main benchmark for games with imperfect information: 10^161 situations.
Libratus improved upon previous best algorithms by incorporating real-time improvements in its strategy, and was the first program to beat top humans: it beat 4 top pros playing 120,000 hands over 20 days, winning decisively (99.98% statistical significance).
[Photo: Prof. Tuomas Sandholm watching one of the world's best players compete against Libratus.]
AI for Strategic Reasoning
Tuomas Sandholm and Noam Brown, Carnegie Mellon University
"The best AI's ability to do strategic reasoning with imperfect information has now surpassed that of the best humans." —Professor Tuomas Sandholm, Carnegie Mellon University
1. N. Brown, T. Sandholm, Safe and Nested Subgame Solving for Imperfect-Information Games, in NIPS 2017, I. Guyon et al., Eds. (Curran Associates, Inc., Long Beach, California, 2017), pp. 689-699. Awarded Best Paper at NIPS 2017.
2. N. Brown, T. Sandholm, Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science (2017) doi: 10.1126/science.aao1733. Companion paper in Science.
Bridges enabled this breakthrough through 19 million core-hours of computing and the 2.6 PB knowledge base that Libratus generated.
Libratus, under the Chinese name Lengpudashi, or "cold poker master", also won a 36,000-hand exhibition in China in April 2017 against a team of six strong Chinese poker players. Further demonstrated at IJCAI 17 (Melbourne, August 2017) and NIPS 2017 (Long Beach, December 2017).
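Libratus's actual algorithms (nested subgame solving on top of CFR-style equilibrium finding, per the papers above) are far beyond a slide, but the core primitive of that algorithm family, regret matching, fits in a few lines. The sketch below is a toy, not Libratus: a regret-matching learner facing a fixed, rock-heavy opponent in rock-paper-scissors quickly concentrates on the best response (paper).

```python
# Toy regret matching: the building block of CFR-style solvers like those
# behind Libratus. NOT Libratus's algorithm -- just the core update rule,
# shown against a fixed opponent in rock-paper-scissors.
ACTIONS = 3                                    # 0=rock, 1=paper, 2=scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff
OPPONENT = [0.5, 0.3, 0.2]                     # fixed, rock-heavy strategy

def strategy_from_regret(regret):
    """Play actions in proportion to their positive cumulative regret."""
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    return [p / total for p in positive] if total else [1 / ACTIONS] * ACTIONS

def train(iterations=1000):
    regret = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strat = strategy_from_regret(regret)
        ev = [sum(PAYOFF[a][b] * OPPONENT[b] for b in range(ACTIONS))
              for a in range(ACTIONS)]
        baseline = sum(s * e for s, e in zip(strat, ev))
        for a in range(ACTIONS):
            regret[a] += ev[a] - baseline      # regret for not playing a
            strategy_sum[a] += strat[a]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]   # time-averaged strategy

avg = train()
print(avg[1] > 0.99)  # True: the learner converges to paper, the best response
```

In self-play (both sides updating regrets), the average strategy of this family of updates converges to a Nash equilibrium, which is what CFR scales up to huge imperfect-information game trees.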
Impact on the National Interest
Prof. Sandholm launched two startups based on Libratus' algorithms: Strategic Machine Inc. and Strategy Robot. In August 2018, Strategy Robot received a 2-year contract for up to $10M from the Pentagon's Defense Innovation Unit.
https://www.wired.com/story/poker-playing-robot-goes-to-pentagon/
Materials Discovery for Energy Applications
Chris Wolverton, Northwestern University
AI-Driven HPC Materials Discovery Through Data-Driven Structural Search and Heusler Nanostructures
Discovery of high-pressure compounds
– Materials discovery using density functional theory and the minima-hopping structure prediction method
– Discovery of FeBi2, the first iron-bismuth compound
– Discovery of two superconducting compounds in the Cu-Bi system, CuBi and Cu11Bi7
Discovery of a new form of TiO2
– Employed machine learning to explore new TiO2 polymorphs
– Identified a new TiO2 hexagonal nanosheet (HNS)
– The HNS has a tunable band gap and could be used for photocatalytic water splitting and H2 production
Severe Thunderstorm Prediction with Big Visual Data
James Z. Wang et al., Penn State
Applying machine learning to detect severe-storm-causing clouds:
– Leveraging NOAA's vast historical archive of satellite imagery, radar data, and weather reports to train statistical models, including deep neural networks, on Bridges' CPUs and GPUs
– Achieved high accuracy in detecting cloud patterns, including comma-shaped clouds in satellite images and bow echoes in weather radar data
– Developed fundamental statistical methods for data analysis
– Increasing prediction lead time using deep models and GPUs
1. Zheng et al., Detecting Comma-shaped Clouds for Severe Weather Forecasting using Shape and Motion, IEEE Transactions on Geoscience and Remote Sensing, under 2nd-round review, 2018.
2. J. Ye, P. Wu, J. Z. Wang, J. Li, Fast Discrete Distribution Clustering Using Wasserstein Barycenter With Sparse Support. IEEE Transactions on Signal Processing 65, 2317-2332 (2017) doi: 10.1109/TSP.2017.2659647.
Fermilab Using Bridges to Prep for CMS @ HL-LHC
The High-Luminosity Large Hadron Collider (HL-LHC) will increase luminosity by 10×, resulting in ~1 EB of data. The Compact Muon Solenoid (CMS) experiment will allow study of the Standard Model, extra dimensions, and dark matter.
Fermilab is now using Bridges to integrate HPC into its workflow, in preparation for the HL-LHC coming online in 2026.
[Figures: the CMS detector (CERN, https://home.cern/science/experiments/cms); event display of a heavy-ion collision registered at the CMS detector on Nov. 8, 2018 (image: Thomas McCauley; from https://cms.cern/news/2018-heavy-ion-collision-run-has-started); estimated CPU resources required for CMS into the HL-LHC era, using the current computing model with parameters projected out for the next 12 years (from A Roadmap for HEP Software and Computing R&D for the 2020s, HEP Software Foundation).]
Learn more: https://www.psc.edu/news-publications/2930-psc-supplies-computation-to-large-hadron-collider-group
Unsupervised Deep Learning Reveals Prognostically Relevant Subtypes of Glioblastoma
Jonathan D. Young, Chunhui Cai, and Xinghua Lu, Univ. of Pittsburgh
Showed that a deep learning model can be trained to represent biologically and clinically meaningful abstractions of cancer gene expression data.
Data: The Cancer Genome Atlas (1.2 PB)
Hypotheses: hierarchical structures emerging from deep learning on gene expression data relate to the cellular signaling system, and the first hidden layer represents signals related to transcription factor activation.
– Model selection indicates ~1,300 units in the first hidden layer, consistent with ~1,400 human transcription factors.
– Consensus clustering on the third hidden layer led to discovery of clusters of glioblastoma multiforme with differential survival.
"One of these clusters contained all of the glioblastoma samples with G-CIMP, a known methylation phenotype driven by the IDH1 mutation and associated with favorable prognosis, suggesting that the hidden units in the 3rd hidden layer representations captured a methylation signal without explicitly using methylation data as input." —Jonathan D. Young, Chunhui Cai, and Xinghua Lu
J. D. Young, C. Cai, X. Lu, Unsupervised deep learning reveals prognostically relevant subtypes of glioblastoma. BMC Bioinformatics 18, 381 (2017) doi: 10.1186/s12859-017-1798-2.
Modeling of Imaging and Genetics using a Deep Graphical Model
Kayhan Batmanghelich, University of Pittsburgh
Causal Generative Domain Adaptation Networks
– A deep learning model trained with image data from one hospital ("domain") may fail to produce reliable predictions in a different hospital where the data distribution differs
– A generative domain adaptation network (G-DAN), implemented in PyTorch, learns distribution changes and can generate new domains
– Incorporating causal structure into the model, a causal G-DAN (CG-DAN), reduces model complexity and accordingly improves transfer efficiency
M. Gong, K. Zhang, B. Huang, C. Glymour, D. Tao, and K. Batmanghelich, "Causal Generative Domain Adaptation Networks," arXiv:1804.04333, 2018, http://arxiv.org/abs/1804.04333.
Multimodal Automatic Speech Recognition (ASR)
Florian Metze (CMU) et al.
2017 Jelinek Summer Workshop on Speech and Language Technology (JSALT)
Deep Learning for Text-Based Prediction in Finance
Bryan Routledge and Vitaliy Merso, Carnegie Mellon University
Studying firm and investment fund financial disclosures using deep learning natural language processing models
– Results presented at the Doctoral Consortium at the Text as Data 2018 conference
– An early version linking the text of earnings announcements to market reactions was presented at the SEC Doctoral Symposium 2018
"Given the large sizes of our corpora (hundreds of millions of words) and the computational requirements of the modern Deep Learning models, our work would be impossible without the support from Bridges." —Bryan Routledge, CMU
Many words used by investment funds in letters to their shareholders are highly context-dependent. For example, the word "subprime" can be either a very strong signal of a letter describing a booming market or a very weak one, depending on what other words appear around it.
Exploring and Generating Data with Generative Adversarial Networks
Giulia Fanti, Zinan Lin, Carnegie Mellon University
Privacy-preserving dataset generation
– Fanti & Lin's recent research aims to understand fundamentally how Generative Adversarial Networks (GANs) internally represent complex data structures and to harness these observations to use GANs for privacy-preserving dataset generation
– GANs are a new class of data-driven, neural network-based generative models that excel in high dimensions
This work has led to two papers accepted to NIPS 2018:
– "The power of two samples in generative adversarial networks" proposes "packing", a principled approach to improving the quality of generated images [1]
– "Robustness of conditional GANs to noisy labels" earned a Spotlight Award at NIPS 2018, proposing a novel, theoretically sound, and practical GAN architecture that consistently improves upon baseline approaches to learning conditional generators when the labels are corrupted by random noise [2]
[Figure: CelebA samples generated by DCGAN (upper) and PacDCGAN2 (lower) show that PacDCGAN2 generates more diverse and sharper images.]
1. Z. Lin, A. Khetan, G. Fanti, and S. Oh, "PacGAN: The power of two samples in generative adversarial networks," arXiv:1712.04086, 2017.
2. K. Thekumparampil, A. Khetan, Z. Lin, and S. Oh, "Robustness of conditional GANs to noisy labels," in NIPS 2018 (Spotlight Award).
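The "packing" idea admits a tiny sketch: instead of classifying one sample at a time, the packed discriminator sees m samples jointly, so a generator that keeps emitting the same few outputs is easy to catch. Everything below is hypothetical toy code, not the PacGAN implementation (which concatenates images along the channel axis):

```python
# Toy illustration of PacGAN-style "packing" (not the real implementation).
def pack(samples, m):
    """Group generator outputs into packed discriminator inputs of size m."""
    if len(samples) % m:
        raise ValueError("sample count must be divisible by m")
    return [tuple(samples[i:i + m]) for i in range(0, len(samples), m)]

def pack_diversity(packed_input):
    """Fraction of distinct samples in one packed input; repeats are a
    telltale sign of mode collapse, which packing lets the discriminator
    penalize directly."""
    return len(set(packed_input)) / len(packed_input)

collapsed = pack(["face_A", "face_A", "face_A", "face_A"], m=2)
healthy = pack(["face_A", "face_B", "face_C", "face_D"], m=2)
print([pack_diversity(p) for p in collapsed])  # [0.5, 0.5]
print([pack_diversity(p) for p in healthy])    # [1.0, 1.0]
```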
Towards a Deeper Understanding of Generative Image Models in Vision
Ying Nian Wu, UCLA
Learning interpretable latent representations: a deformable generator model disentangles appearance and geometric information into two independent latent vectors.
– The appearance generator produces the appearance information of an image, including color, illumination, identity, or category
– The geometric generator produces displacements of each pixel's coordinates and performs geometric warping, such as stretching and rotation, on the appearance generator's output to obtain the final synthesized image
The model can learn both representations from image data in an unsupervised manner.
[Figure: each dimension of the appearance latent vector encodes appearance information such as color, illumination, and gender. In the first row, from left to right, the background color varies from black to white, and the gender changes from a woman to a man. In the second row, the man's moustache becomes thicker as the corresponding dimension of Z approaches zero, and the woman's hair becomes denser as it increases. In the third row, from left to right, the skin color changes from dark to white. In the fourth row, from left to right, the illumination changes from the left side of the face to the right side.]
Towards Real-time Video Object Detection Using Adaptive Scaling
Ting-Wu (Rudy) Chin, Ruizhou Ding, and Diana Marculescu, Carnegie Mellon University
Exploiting resolution to tune accuracy and speed:
– The AdaScale project exploits image resolution "as a knob" to improve both the accuracy and the speed of deep neural network-based video object detection.
[Figures: qualitative detection results with and without AdaScale; AdaScale's performance on various baselines.]
1. T.-W. Chin, R. Ding, and D. Marculescu, "AdaScale: Towards Real-Time Video Object Detection Using Adaptive Scaling," in SysML 2019. https://www.sysml.cc/papers.html
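The underlying intuition admits a simple sketch: frames whose objects are large survive downscaling, while frames with small objects need full resolution. The scales, threshold, and heuristic below are invented for illustration; AdaScale itself learns to regress the best scale for the next frame from the current one:

```python
# Hypothetical sketch of the adaptive-scaling intuition (NOT AdaScale's
# learned scale regressor): pick the cheapest resolution at which the
# smallest object in the frame is still large enough to detect.
SCALES = [1080, 720, 480]        # candidate shorter-side resolutions
MIN_DETECTABLE_PX = 32           # assumed minimum object size for the detector

def choose_scale(smallest_object_px: float) -> int:
    """Return the coarsest (fastest) scale that keeps objects detectable."""
    for scale in sorted(SCALES):                 # cheapest first
        if smallest_object_px * scale / max(SCALES) >= MIN_DETECTABLE_PX:
            return scale
    return max(SCALES)                           # fall back to full resolution

print(choose_scale(400))  # 480: large objects, downscale aggressively
print(choose_scale(50))   # 720: mid-size objects
print(choose_scale(30))   # 1080: tiny objects, keep full resolution
```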
Mapping Energy Infrastructure Using Deep Learning and Large Remote Sensing Datasets
Jordan Malof, Duke University
Extracting high-quality information about energy systems from overhead imagery with deep learning:
– Precise locations of buildings (energy consumption)
– Small-scale solar arrays (energy generation)
– Improved speed and performance by expanding the receptive field of neural networks only during label inference
[Figures: satellite image with building mappings; aerial photograph with solar mappings; performance (higher is better) and computation time (lower is better) versus increasing receptive field size in pixels.]
B. Huang et al., "Large-scale semantic classification: outcome of the first year of Inria aerial image labeling benchmark," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS 2018), 2018. https://hal.inria.fr/hal-01767807
Understanding Public Space Use in Market Square
Javier Argota Sánchez-Vaquerizo, Carnegie Mellon University
The project in figures
– 4 cameras; 5 weeks of data collection (Aug 24 to Sep 28, 2018)
– 3,200 hours of video processed; 250 million detections
– 12 categories: pedestrians, trolleys, seats, tables, sun umbrellas, tents, cars, pickups, vans, trucks, bikes, motorcycles
Motivations
– Public safety
– Pedestrian flow and crowd management
– Effects on vehicular traffic
– Impact assessment for venues and events
Technology capabilities
– Number of people, vehicles, and objects detected
– Segmentation
– Location, trajectory, speed
– Prediction
– Anonymity from scratch
Insights
– Effect of weather (rain) on attendance
– Uneven distribution of pedestrians in the space
– Positive impact of events and venues on attendance
– Short duration of visits
Fast and Accurate Object Detection in High-Resolution Video Using GPUs
Vic Růžička and Franz Franchetti, Carnegie Mellon University
Object detection in computer vision traditionally works with relatively low-resolution images. However, the resolution of recording devices is increasing, requiring new methods for processing high-resolution data.
Růžička & Franchetti's attention pipeline method uses two-staged evaluation of each image or video frame, at rough and then refined resolution, to limit the total number of necessary evaluations. Both stages use the fast object detection model YOLO v2.
Their distributed-GPU code maintains high accuracy while reaching performance of 3-6 fps on 4K video and 2 fps on 8K video. This outperforms the individual baseline approaches while allowing the user to set the trade-off between accuracy and performance.
Best Paper Finalist at the IEEE High Performance Extreme Computing Conference (HPEC) 2018.
[Image: example of a crowded 4K video frame annotated with Růžička & Franchetti's method.]
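The two-stage idea can be sketched in a few lines: detect at coarse resolution first, then re-run the detector only on full-resolution crops around the coarse hits, deduplicating overlapping work. The crop size and clamping below are illustrative assumptions (the real pipeline uses YOLO v2 for both stages and merges results across crops):

```python
# Hypothetical sketch of the two-stage "attention pipeline" crop selection.
def stage2_crops(coarse_centers, frame_w, frame_h, crop=608):
    """Turn coarse detection centers into clamped full-resolution windows."""
    crops = set()                            # dedupe identical windows
    for cx, cy in coarse_centers:
        x = min(max(cx - crop // 2, 0), max(frame_w - crop, 0))
        y = min(max(cy - crop // 2, 0), max(frame_h - crop, 0))
        crops.add((x, y, crop, crop))
    return sorted(crops)

# 4K frame: two nearby coarse detections share one refined crop, so the
# expensive second stage runs twice instead of three times.
windows = stage2_crops([(100, 100), (120, 110), (3800, 2000)], 3840, 2160)
print(windows)  # [(0, 0, 608, 608), (3232, 1552, 608, 608)]
```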
Distributed Learning for Large-Scale Multi-Robot Path Planning in Complex Environments
Guillaume Sartoretti, Carnegie Mellon University
Multi-agent path finding (MAPF)
– An essential component of many large-scale, real-world robot deployments, from aerial swarms to warehouse automation
– Most state-of-the-art MAPF algorithms still rely on centralized planning, scaling poorly past a few hundred agents
– Such planning approaches are maladapted to real-world deployments, where noise and uncertainty often require paths to be recomputed online, which is impossible when planning times run from seconds to minutes
Pathfinding via reinforcement + imitation learning
– Using Bridges-GPU, Sartoretti trained and tested PRIMAL, a novel framework for MAPF that combines reinforcement and imitation learning to teach fully decentralized policies, in which agents reactively plan paths online in a partially observable world while exhibiting implicit coordination
– In low-obstacle-density environments, PRIMAL outperforms state-of-the-art MAPF planners in certain cases, even though those planners have access to the whole state of the system. The team also deployed PRIMAL on physical and simulated robots in a factory mockup scenario, showing how robots can benefit from an online, local-information-based, decentralized MAPF approach
[Figure: example problem where 100 simulated robots (white dots) must compute individual, collision-free paths in a large factory-like environment. Reproduced from [1].]
1. G. Sartoretti et al., "PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning," 2018. http://arxiv.org/abs/1809.03531.
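For contrast with PRIMAL's learned decentralized policies, the toy below shows the kind of search a centralized planner must coordinate jointly across all agents, which is exactly what scales poorly: a single-agent BFS shortest path on a grid (grid and coordinates are illustrative, not from the paper):

```python
from collections import deque

# Toy single-agent grid search: a centralized MAPF planner must solve a
# joint version of this for every agent at once; PRIMAL instead trains a
# decentralized policy each agent runs on its own partial observation.
def bfs_path_length(grid, start, goal):
    """grid: list of equal-length strings, '#' = obstacle.
    Returns the shortest path length in steps, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None

factory = ["....",
           ".##.",
           "...."]
print(bfs_path_length(factory, (0, 0), (2, 3)))  # 5
```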
AIDR 2019: Artificial Intelligence for Data Discovery and Reuse
Topics include: automation in data discovery; automation in data curation and generation; measuring and improving data quality; integrating datasets and enabling interoperability; biomedical data discovery and reuse; data privacy, security, and algorithmic bias; the future of scientific data and how we work together; and more.
Keynotes and invited speakers: Tom M. Mitchell (Interim Dean and E. Fredkin University Professor, School of Computer Science, Carnegie Mellon University); Glen de Vries (President and Co-founder, Medidata Solutions); Robert F. Murphy (Ray and Stephanie Lane Professor of Computational Biology, Head of Computational Biology, School of Computer Science, Carnegie Mellon University); Natasha Noy (Staff Scientist, Google AI)
Deadline for abstracts: February 22
https://events.library.cmu.edu/aidr2019/
2018 HPCwire Awards
Outline
Motivation & Vision
Realizing the Vision: Bridges and Bridges-AI
Exemplars of Success
Summary
Summary
PSC's approach to scalable, converged HPC+AI is enabling breakthroughs across an extremely broad range of research areas.
– Bridges, including Bridges-AI, is available at no charge for research and education.
– Bridges-AI builds on Bridges' strength in converged HPC, AI, and Big Data to provide a unique platform for AI and AI-enabled simulation.
To request a free research/education allocation, visit: https://psc.edu/about-bridges/apply
Thank you! Questions?