CDAC Scientific Cloud: On Demand Provisioning of Resources for Scientific Applications
Payal Saluja, Prahlada Rao B.B., Ankit Mittal and Rameez Ahmad
SSDH, Centre for Development of Advanced Computing
C-DAC Knowledge Park, Bangalore, Karnataka, India

Abstract - Scientific applications have special requirements: the availability of massive computational power for performing large scale experiments, and huge storage capacity to store outputs in the terabyte or petabyte range. A Scientific Cloud provides scientists with computational, storage and network resources, together with an inbuilt capability of utilizing the infrastructure. Scientific applications can be dynamically provisioned with cloud solutions that are tailored to the application needs. The Centre for Development of Advanced Computing (CDAC), under the Department of IT, is the pioneer in HPC in India with ~70 TF of compute power. The authors of this paper discuss the need for and benefits of a scientific cloud, and explain the model, architecture and components of the CDAC scientific cloud. CDAC HPC resources can be provisioned on demand to the scientific research community and released when they are no longer required. For Indian researchers and scientists, the CDAC scientific cloud model will provide convenient access to reliable, high performance clusters and storage, without the need to purchase and maintain sophisticated hardware.

Keywords: HPC, HPC as a Service, Map Reduce, Cloud Vault

1 Introduction

High Performance Computing (HPC) allows scientists and engineers to solve complex science, engineering and business problems using applications that require high bandwidth, low latency networking, and very high compute and storage capabilities. Scientists in the areas of high-energy physics [13], astronomy [14], climate modeling [15], chemoinformatics [16] and other scientific fields require massive computing power to run experiments and huge data centers to store data. Typically, scientists and engineers must wait in long queues to access shared clusters or acquire expensive hardware systems.

Cloud computing [17] is a model for on-demand access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, services, and software) that can be easily provisioned as and when needed. Cloud computing aggregates resources to gain efficient resource utilization and allows scientists to scale up to solve larger science problems. It also enables the system software to be configured as needed for individual application requirements. For research groups, cloud computing will provide convenient access to reliable, high performance clusters and storage, without the need to purchase and maintain sophisticated hardware. Pete Beckman, director of Argonne's Leadership Computing Facility, has said that "Cloud computing has the potential to accelerate discoveries and enhance collaborations in everything from providing optimized computing environments for scientific applications to analyzing data from climate research, while conserving energy and lowering operational costs". However, there are various challenges of HPC on demand [29], such as performance, power consumption and collaborative work environments.

In this paper, we present the concept of scientific clouds, HPC as a Service and its benefits to the scientific research community. The authors also propose a prototype for the CDAC scientific cloud that will provide the following offerings:

I. Infrastructure as a Service (IaaS) [18], by providing traditional MPI-enabled HPC clusters with a parallel file system such as GlusterFS [19], and by provisioning Hadoop [20] clusters with map reduce [21] supported by the Hadoop Distributed File System (HDFS) [20].

II. Storage as a Service (StaaS) [22], to provide petabytes of data storage to the scientific communities.

The rest of this paper is organized as follows: Section 2 describes the concept of HPC as a Service, the challenges of HPC on cloud and how cloud computing benefits the scientific community. Section 3 discusses other scientific cloud projects, their objectives and related work. Section 4 details the CDAC scientific cloud and its offerings. Section 5 presents the proposed model and architecture for the CDAC scientific cloud. Section 6 discusses the applications that will be enabled on the CDAC scientific cloud. Section 7 concludes and outlines the future plan of the work.
2 HPC as a Service on Cloud

Bringing HPC facilities to the cloud will provision scientists and researchers with a crucial set of resources and enable them to solve large-scale, data-intensive, advanced computation problems on research topics across the disciplinary spectrum. HPC as a Service is the on-demand provisioning of a high-performance, scalable HPC environment with high-density compute nodes and huge storage on high performance interconnects like Infiniband [4] and Myrinet [5]. HPC as a Service is provisioned to meet HPC application demands, whether one server (virtual machine) or a large cluster (virtual cluster). A virtual cluster is a collection of virtual machines configured to interact with each other as a traditional Linux cluster. A scientific cloud, or HPC as a Service, enables greater flexibility, eliminates the need for dedicated hardware resources per application, and would help researchers cope with exploding volumes of data that need to be analyzed to yield meaningful results. It also simplifies usage models and enables dynamic allocation per given task.

[6] described a demonstration of a low-order coupled atmosphere-ocean simulation running in parallel on an EC2 system. It highlights the significant way in which cloud computing could impact traditional HPC computing paradigms. The results show that the performance is below the level seen at dedicated supercomputer centers; however, it is comparable with low-cost cluster systems. It was also concluded that it is possible to envisage cloud systems more closely targeted to HPC applications that feature a specialized interconnect such as Myrinet or Infiniband.

Scientific cloud benefits to the scientists and research community:

• Dynamic provisioning of HPC clusters: Access to on-demand cloud resources enables automatic provisioning of additional resources from the HPC service to process peak application workloads, reducing the need to provision data center capacity according to peak demand. Hence, scientists will benefit from the ability to scale the computing infrastructure up and down according to the application requirements and the budget of users.

• Reduction in overall job execution time: Jobs will be scheduled using intelligent data aware job scheduling algorithms.

• Virtual ownership of resources: Virtual ownership of cloud resources will reduce uncertainty concerning access to those resources when you need to use them.

• Ease of deployment and access: The use of virtual machine images offers the ability to package the exact OS, libraries, patches, and application codes together for deployment. Scientists can have easy access to large distributed infrastructures and completely customize their execution environment, thus providing the perfect setup for their experiments.

Figure 1 depicts the layered architecture of the scientific cloud. The lowest layer of the stack is the physical resources (compute, storage and network) that will be connected through a high speed link. The first software layer above the physical hardware is the host operating system. Since the scientific cloud will be catering to HPC applications, the performance of such applications on this infrastructure is of prime importance. Hence, a Type 1 or bare-metal hypervisor should be preferred for virtualization; it runs directly on the host's hardware to control the hardware and to manage the guest operating systems.

[Figure 1: Scientific Cloud Architecture. Layers, from bottom to top: physical compute, network and storage resources over Ethernet and InfiniBand (10-20 Gbps interconnect); operating system/hypervisors; cloud middleware software stack (resource provisioning, scheduling, file system, monitoring); IaaS, PaaS and SaaS offerings (MPI and Map Reduce clusters, storage); user interfaces (APIs, web and mobile interfaces, portals, workflows and PSEs); scientific applications (bioinformatics, climate modeling). Cross-cutting concerns: security, SLA and policy management, accounting, metering and billing.]

A guest operating system runs at a level above the hypervisor. The hypervisor controls the host processor and resources, allocating what is needed to each operating system in turn and making sure that the guest operating systems (called virtual machines) cannot disrupt each other. The virtualized resources include the basic cloud computing services such as processing power, storage, and network. The cloud middleware software stack is the key component that handles resource provisioning and scheduling, volume management and system monitoring for all the higher-level components and services.
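The unit of provisioning at the IaaS layer of Figure 1 is the virtual cluster. To make that concrete, the short program below shows the kind of tightly coupled MPI job such a cluster is meant to host. It uses the mpi4py Python binding purely as an illustration (the paper does not prescribe a language or binding); because the virtual machines are configured to behave like ordinary Linux cluster nodes, the same code runs unchanged on a physical or a provisioned virtual cluster.

    from mpi4py import MPI          # MPI binding for Python; assumes an MPI library is installed

    comm = MPI.COMM_WORLD           # communicator spanning all processes of the (virtual) cluster
    rank = comm.Get_rank()          # this process's id
    size = comm.Get_size()          # total number of processes across the cluster nodes

    # Each rank computes a partial sum over its slice of the problem domain.
    local = sum(range(rank, 1000000, size))

    # Combine the partial results on rank 0, exactly as on a traditional cluster.
    total = comm.reduce(local, op=MPI.SUM, root=0)

    if rank == 0:
        print("processes:", size, "total:", total)

Launched with, for example, mpirun -np 64 python sum.py, the job would span the virtual machines allocated for the cluster.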
Cloud management is a crucial component, as it monitors and manages all the cloud resources at the physical and virtual level. The various management components that will be part of the scientific cloud are: resource inventory search; hardware monitoring and management; storage maps and reports; alerts and notifications with automated rectification; accounting and billing (to recover costs, and capacity planning to ensure that consumer demands will be met); and policy management and SLAs (Service Level Agreement management, to ensure that the terms of service agreed to by the provider and consumer are adhered to, and reporting for administrators).

3 Science Cloud Projects

The following are some of the science cloud projects that have been executed in the direction of achieving HPC as a Service:

3.1 Cumulus

Cumulus [2] is a project to build a scientific cloud for a data center. It is a storage cloud system that adapts existing storage implementations to provide efficient upload/download interfaces compatible with S3. It provides features such as quota support, fair sharing among clients, and an easy-to-use, easy-to-install approach to maintenance. The most important feature of Cumulus is its well-articulated back-end extensibility module, which allows storage providers to configure Cumulus with existing systems such as GPFS [9], PVFS [10], and HDFS [11], in order to provide the desired reliability, availability or performance trade-offs. Cumulus is part of the open source Nimbus toolkit [12]. Cumulus is implemented in the Python programming language as a REST service; the Cumulus API is a set of Python objects that are responsible for handling specific user requests.

3.2 OpenCirrus

Open Cirrus [3] is a testbed comprising a collection of federated datacenters for open-source systems and services research. It is designed to support research into the design, provisioning, and management of services at a global, multi-datacenter scale, and to encourage research into all aspects of service and datacenter management.

3.3 GridGain

GridGain [30] is Java based open source middleware for real time big data processing and analytics that scales up from one server to thousands of machines. It enables the development of compute and data intensive High Performance Distributed Applications. Applications developed with GridGain can scale up on any infrastructure, from a single Android device to a large cloud. GridGain provides two major functionalities: Compute Grids and In-Memory Data Grids.

3.4 StratusLab

StratusLab [31] is developing a complete, open-source cloud distribution that allows grid and non-grid resource centers to offer and to exploit an Infrastructure as a Service cloud. It essentially enhances the grid infrastructure with virtualization and cloud technologies, and is particularly focused on enhancing distributed computing infrastructures such as the European Grid Infrastructure (EGI).

Each of the above mentioned projects focuses either on provisioning data centers on cloud or compute power on cloud. Amazon Web Services alone provisions the various services required for a variety of HPC applications, such as Amazon Elastic Compute Cloud (EC2), Amazon Elastic MapReduce (EMR), and Amazon Simple Storage Service (S3) [32]. The CDAC Scientific Cloud is an effort to provide services like compute and storage for the HPC community, along with software technologies like map reduce, MPI and mobile applications, that will accelerate discoveries and enhance collaborations in science.
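As noted in Section 3.1, Cumulus keeps its upload/download interface compatible with S3, so existing S3 client libraries (the same code that targets Amazon S3 [32]) can be pointed at a Cumulus endpoint. The fragment below is a minimal sketch using the boto 2 S3Connection; the host, port, credentials and object names are placeholders, not details from any of these projects.

    import boto
    from boto.s3.connection import S3Connection, OrdinaryCallingFormat

    # Connect to an S3-compatible endpoint (a Cumulus service here; for Amazon S3 the host changes).
    conn = S3Connection("ACCESS_KEY", "SECRET_KEY",
                        host="cumulus.example.org", port=8888,   # placeholder endpoint
                        is_secure=False,
                        calling_format=OrdinaryCallingFormat())

    bucket = conn.create_bucket("climate-runs")          # container for experiment output
    key = bucket.new_key("run-001/output.nc")
    key.set_contents_from_filename("output.nc")          # upload a result file
    key.get_contents_to_filename("output_copy.nc")       # ...and fetch it back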
4 CDAC Scientific Cloud (CSC)

C-DAC [7] is the pioneer in HPC in India, and its HPC facilities on cloud can be linked by the 1 Gbps National Knowledge Network (NKN) [8], developed by NIC. The bandwidth offered by NKN will facilitate rapid transfer of data between geographically dispersed clouds and enable scientists to use available computing resources regardless of location. In addition, the CDAC Scientific Cloud will provide data storage resources that will be used to address the challenge of analyzing the massive amounts of data being produced by scientific applications and instruments. Storage as a service is of particular importance to scientific research, where the volume of data produced by one community can reach the scale of terabytes per day. CDAC will make the scientific cloud storage available to science communities by aggregating a set of storage servers, and will make use of advanced technologies to provide fast random access storage to support more data-intensive problems. The test bed will be a mix of virtual clusters and storage options: traditional HPC clusters, Hadoop clusters, distributed and global disk storage, and archival storage. The system provides both a high-bandwidth, low-latency InfiniBand network and a commodity Gigabit Ethernet network. This configuration is different from a typical cloud infrastructure but is more suitable for the needs of scientific applications.

Using CDAC Scientific Cloud instances, users can expedite their HPC workloads on elastic resources as needed. Users can choose from Cluster Compute or Cluster Hadoop instances within a full-bisection high bandwidth network for tightly-coupled and IO-intensive workloads, or scale out across thousands of cores for throughput-oriented applications.
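As an illustration of the data-intensive path, the pair of scripts below shows how a throughput-oriented analysis could be expressed for a Cluster Hadoop instance using Hadoop Streaming, which only requires programs that read records from standard input and write key/value pairs to standard output. The job (a per-station maximum over comma-separated climate records) is a hypothetical example, not an application named in this paper.

    #!/usr/bin/env python
    # mapper.py - emit "station<TAB>temperature" for every input record
    import sys

    for line in sys.stdin:
        station, date, temp = line.strip().split(",")[:3]
        print("%s\t%s" % (station, temp))

    #!/usr/bin/env python
    # reducer.py - Hadoop delivers the mapper output grouped by key; keep the maximum per station
    import sys

    current, best = None, None
    for line in sys.stdin:
        station, temp = line.strip().split("\t")
        temp = float(temp)
        if station != current:
            if current is not None:
                print("%s\t%s" % (current, best))
            current, best = station, temp
        else:
            best = max(best, temp)
    if current is not None:
        print("%s\t%s" % (current, best))

The two scripts would be submitted with the standard streaming jar, e.g. hadoop jar hadoop-streaming.jar -input /data/records -output /data/max -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (paths are placeholders).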
Such instances will let scientists focus on running their applications and crunching or analyzing the data generated by those applications, without having to worry about time-consuming set-up, management or tuning of the clusters or storage capacity upon which they sit. Users will be able to run HPC applications on these instances, including molecular modeling, genome sequencing and analysis, and numerical modeling, across many industries including biopharma, oil and gas, financial services and manufacturing. In addition, academic researchers will be able to perform research in physics, chemistry, biology, computer science, and materials science.

The following will be the supported features of CDAC High Performance Computing as a Service (HPCaaS):

• Dynamic provisioning of clusters: on-demand provisioning of MPI and map reduce clusters to support compute intensive and data intensive applications.

• On-demand dynamic provisioning of storage volumes: dynamic provisioning of clusters and storage will be handled by the Cloud Resource Broker (CRB), or cloud metascheduler.

• Security: simple, secure and quick access to HPC clusters.

• Provisioning of customized libraries, software, workflows, etc. on HPC clusters as per the application's requirements. Users will be provided ssh access, with an option of selecting specific MPI or compiler versions to satisfy the application requirements.

• Performance: to reduce the hypervisor overhead, a type-1 hypervisor will be used. The distributed locations will be connected with a 1 Gbps link, and within a site the nodes will be connected with an InfiniBand interconnect to reduce latencies. VM allocation to form a cluster will be done by the cloud scheduler based on nearness to storage nodes, to minimize data movement on the cloud.

The following are the services that will be provisioned on the CDAC Scientific Cloud.

4.1 Infrastructure as a Service (IaaS)

C-DAC has its HPC facilities at various CDAC locations like Bangalore, Pune, Chennai, and Hyderabad, with approximately 70 TF. Figure 2 depicts the prototype model for dynamic provisioning of the computational resources when requested by the user. Users will be able to access the CDAC scientific cloud services through the cloud portal. First time users will have to register with their required details, including the kind of applications they want to run on the cluster. Based on the type of application mentioned by the user, resources will be allocated by the cloud broker and the cluster instance will be created on the fly. The user will immediately be intimated online and through mail about their login credentials and the IP address for ssh access to the compute cluster. The allocation of the cluster and its nodes (master and worker nodes) will depend upon the CPU, memory and IO requirements of the application. Applications that need more data processing and less communication will be provided with the best suited map reduce cluster; applications that are more compute and memory intensive will be provisioned with MPI enabled clusters with a parallel IO facility.

[Figure 2: Infrastructure as a Service (IaaS). A user request from the cloud portal passes through a security module to the cloud resource broker and scheduler, which provisions a virtual cluster (MPI or Map Reduce) from physical compute resource pools running hypervisors, an image repository, and storage nodes exposing a parallel file system (GlusterFS), all connected over an InfiniBand/Ethernet interconnect; the user is given ssh access to the resulting cluster.]

4.2 Storage as a Service (StaaS)

Storage as a Service is the supply of data storage capacity over the Internet. In the context of the scientific cloud, StaaS provisions petabytes of data storage to the scientific communities. CDAC's Cloud Vault, based on the OpenStack Swift object storage software, will provide scientists and research partners with a convenient and affordable way to store, share, and archive data, including extremely large data sets. CDAC Cloud Vault is an object based storage system, and multiple interface methods make the Cloud Vault easy to use for the average user. It also provides a flexible, configurable, and expandable solution to meet the needs of more demanding applications. Files (also known as objects) are written to multiple physical storage arrays simultaneously, ensuring at least two verified copies exist on different servers at all times.
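Since Cloud Vault is built on OpenStack Swift, the usual Swift client libraries apply. The sketch below uses python-swiftclient to create a container and store and retrieve an object; the authentication URL, credentials and names are placeholders rather than details of the CDAC deployment.

    from swiftclient import client as swift   # python-swiftclient

    # Authenticate against the Swift/Cloud Vault endpoint (v1 auth shown; placeholders only).
    conn = swift.Connection(authurl="https://cloudvault.example.org/auth/v1.0",
                            user="project:researcher",
                            key="SECRET")

    conn.put_container("genome-runs")                      # create a container

    with open("reads.fastq", "rb") as fh:                  # upload a data set as an object
        conn.put_object("genome-runs", "sample-01/reads.fastq",
                        contents=fh, content_type="application/octet-stream")

    headers, body = conn.get_object("genome-runs", "sample-01/reads.fastq")
    print(headers.get("etag"), len(body))                  # verify the stored copy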
Figure 3 depicts the flow of Storage as a Service (StaaS). The user registers by providing the required details and the required amount of storage. After the user's request is validated and approved, the user is sent the access details of the storage through email. The various interfaces through which a user can access Cloud Vault are as follows:

4.2.1 Web Interface

The web interface will allow access to the Cloud Vault files through a browser. The user will be able to list and create containers, upload, download and delete files using this interface. There is no need to install any client to access the cloud files.

4.2.2 Desktop GUI Application

Cloud Vault files will be accessible using the open source desktop application Cyberduck, an FTP-like standalone GUI application for accessing files. It supports file/directory listing, upload, download, synchronization, editing, etc. Cyberduck is available for Mac and Windows systems.

[Figure 3: Storage as a Service (StaaS). Flow: (1) the user submits a registration request; (2) credentials are sent via email; (3) the user logs in with those credentials; (4-5) the user accesses CDAC Cloud Vault storage through the web interface, the desktop GUI application, or the mobile interface.]

4.2.3 Command Line

Command line access will allow access to the Cloud Vault files from the UNIX shell. The client scripts need to be installed on the user's machine or laptop.

4.2.4 Mobile Interface

Cloud Vault will also be accessible from mobile devices using a mobile application for the basic file operations like list, upload, download, and synchronize. There will also be a facility to auto-synchronize the user's mobile with his Cloud Vault files, so that he can keep his mobile backup on Cloud Vault.

4.2.5 APIs

Files of any size can be stored in the Cloud Vault, from small personal document collections to multi-terabyte backup sets routed directly to the cloud using the Rackspace or S3 API in applications.

5 CSC Architecture and Components

Figure 4 depicts the components of the CDAC scientific cloud. The various components are as follows:

5.1 Hypervisor

A hypervisor, also known as a virtual machine manager/monitor (VMM), is computer hardware platform virtualization software that allows several operating systems to share a single hardware host. The hypervisor controls the host processor and resources so that the guest systems/virtual machines are unable to disrupt each other. As virtualization adds overhead to cluster performance, we choose to use type-1 or bare-metal hypervisors for virtualization. Type-1 hypervisors run directly on the host's hardware to control the hardware and to manage guest operating systems. Some examples of type-1 hypervisors are Citrix XenServer [24], VMware ESX/ESXi [25], and the Microsoft Hyper-V hypervisor. The CDAC scientific cloud will be using the Xen hypervisor for the same.

5.2 Cloud middleware

Cloud middleware, or the cloud OS, is the software stack for provisioning large networks of virtual machines on demand. It also handles scalability and reliability of the resources provided to the users. There are various open source and commercial cloud middleware stacks available, such as Nimbus [12], OpenNebula [26], vCenter [27], and Eucalyptus [28].
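The CSC component diagram (Figure 4) includes OpenStack Nova as the compute middleware, so as an indication of what this layer is asked to do, the sketch below boots the virtual machines for a small cluster through Apache libcloud's OpenStack driver. The endpoint, credentials, image and flavour names are placeholders, and the use of libcloud is an illustrative assumption rather than part of the CSC design.

    from libcloud.compute.types import Provider
    from libcloud.compute.providers import get_driver

    # Connect to an OpenStack-based middleware endpoint (placeholder credentials and URL).
    OpenStack = get_driver(Provider.OPENSTACK)
    driver = OpenStack("researcher", "SECRET",
                       ex_force_auth_url="http://cloud.example.org:5000/v2.0/tokens",
                       ex_force_auth_version="2.0_password",
                       ex_tenant_name="hpc-project")

    image = [i for i in driver.list_images() if i.name == "centos-mpi"][0]   # pre-built cluster image
    size = [s for s in driver.list_sizes() if s.name == "m1.large"][0]       # flavour for each node

    # Boot one head node and three workers; the resource broker would later wire them into a cluster.
    nodes = [driver.create_node(name="vc-node-%d" % i, image=image, size=size)
             for i in range(4)]
    print([n.name for n in nodes])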
5.3 Cloud resource broker

The cloud resource broker and meta-scheduler is a common gateway that provisions access to HPC resources like compute clusters and storage on the cloud. It is an intelligent scheduler that will provision the best pool of available resources to the users by using policy based decisions. The various components that make up the cloud resource broker are as follows:

5.3.1 Resource Discovery

Resource discovery of the available resources, based on the kind of user application, which can be compute intensive, data intensive or memory intensive.

5.3.2 Policy based resource selection

Resource selection and provisioning will be done considering various aspects like load balancing, resource utilization and power awareness.

5.3.3 Data aware job scheduling

Data aware scheduling enables computation to be done nearest to the location of the data. In this case, the cloud resource broker will talk to the cloud file system components to find out the nearest storage nodes where the data resides.

[Figure 4: Components of the CDAC Scientific Cloud (CSC). The cloud portal feeds a cloud broker or meta-scheduler, which drives automated dynamic provisioning scripts (in-house development) on top of cloud middleware (OpenStack Nova for compute, OpenStack Swift for the Cloud Vault Storage as a Service) running over hypervisors on compute and storage nodes. Provisioned resources include Map Reduce virtual clusters (Hadoop data nodes over HDFS/GlusterFS storage) and MPI enabled virtual clusters with a Gluster mount over InfiniBand, all connected by an Ethernet/InfiniBand interconnect and overseen by a cloud management and monitoring tool; the stack combines open source tools and in-house components.]

5.4 Cloud Management and Monitoring

The cloud infrastructure monitoring and management tool is the control point for the virtual environment in the cloud. This tool will provide a single point of access for administrators to monitor and manage the resources of the cloud. The following features will be supported:

• Resource inventory search: an inventory including virtual machines, hosts, data stores, and networks at the administrator's fingertips from anywhere.

• Hardware monitoring and management.

• Storage maps and reports: provide storage usage, connectivity and configuration. Customizable topology views give visibility into the storage infrastructure and assist in the diagnosis and troubleshooting of storage issues.

• Alerts and notifications with automated rectification.

• Utilisation, performance and energy consumption trends.

• Accounting and billing (to recover costs, and capacity planning to ensure that consumer demands will be met), policy management and SLAs.

5.5 Cloud Portal

5.5.1 Portal for IaaS Provisioning and Problem Solving Environments (PSE)

The scientific cloud portal will be the access point for users requesting and accessing the on-demand HPC clusters. There will also be customized PSEs for the bioinformatics and climate modeling domains that will provide the complete environment and workflow for the domain specific applications.

5.5.2 Portal for Storage as a Service

The portal for Storage as a Service will be an access point for the cloud storage, through which a user can register and ask for the required amount of storage. The user will also be allowed to request expansion of the allocated storage on the fly.
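The broker behaviour described in Sections 5.3.1-5.3.3 can be summarised in a few lines. The sketch below is purely illustrative Python with invented data structures and node names; it is not part of the CSC implementation, but shows how resource discovery, a load-balancing policy and data-aware placement could combine into a single placement decision.

    # Hypothetical broker sketch: pick nodes for a cluster request.
    def select_nodes(request, nodes, data_rack):
        """request: {'profile': 'data'|'compute'|'memory', 'nodes': N}
        nodes: discovered resources with free cores/memory, load and rack id (5.3.1).
        data_rack: rack holding the input data, reported by the cloud file system (5.3.3)."""

        # Resource discovery: keep only nodes that fit the application profile.
        if request["profile"] == "memory":
            candidates = [n for n in nodes if n["free_mem_gb"] >= 64]
        else:
            candidates = [n for n in nodes if n["free_cores"] >= 8]

        # Data aware placement first (prefer the rack where the data resides),
        # then a policy-based tie-break: least loaded nodes first (5.3.2).
        candidates.sort(key=lambda n: (n["rack"] != data_rack, n["load"]))
        return candidates[:request["nodes"]]


    nodes = [
        {"name": "n1", "free_cores": 16, "free_mem_gb": 128, "load": 0.2, "rack": "r1"},
        {"name": "n2", "free_cores": 8,  "free_mem_gb": 32,  "load": 0.1, "rack": "r2"},
        {"name": "n3", "free_cores": 32, "free_mem_gb": 256, "load": 0.7, "rack": "r1"},
    ]
    # A data-intensive request for two nodes whose input lives on rack r1 selects n1 and n3.
    print([n["name"] for n in select_nodes({"profile": "data", "nodes": 2}, nodes, "r1")])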
6 Target Applications on CDAC Scientific Cloud

On-demand cloud computing can add a new dimension to HPC, in which virtualized resources can be sequestered, in a form customized to target a specific application requirement, at any point in time. [6] described the feasibility of running coupled atmosphere-ocean climate models on an EC2 computing cloud and found that the performance is below the level seen at dedicated clusters. However, cloud systems that feature a specialized interconnect such as Myrinet or Infiniband and support MPI or map reduce are more closely targeted to HPC applications. [23] states that the life sciences are very good candidates for map reduce on the cloud, including sequence assembly and the use of BLAST and similar algorithms for sequence alignment. On the other hand, partial differential equation solvers, particle dynamics and linear algebra require the full MPI model for a high performance parallel implementation on the cloud. The two application domains that have been identified as pilot applications for the CDAC scientific cloud are bioinformatics applications like BLAST, and climate modeling applications like the Seasonal Forecast Model (SFM). SFM is an atmospheric general circulation model used for predicting the Indian summer monsoon rainfall in advance of a season. It involves a single operation applied to multiple data sets, which makes it a suitable case for using map reduce in this particular application.

7 Conclusions and Future Plans

Scientific applications require the availability of massive compute and storage resources. Cloud computing can be of great help in the on-demand provisioning of HPC resources, and applications can scale up heavily using HPC as a Service on the cloud. However, the performance related challenges have to be addressed by fine tuning the cloud middleware stack and the software libraries. The proposed model of the CDAC scientific cloud is an attempt to address the requirements and challenges of HPC as a Service on the cloud. Currently, the test bed setup for the same is in progress; in future we plan to develop the cloud system software components such as the cloud resource broker and meta-scheduler, management and monitoring tools, the portal and PSEs.

8 References

[1] K. Keahey, R. Figueiredo, J. Fortes, T. Freeman, M. Tsugawa, "Science Clouds: Early Experiences in Cloud Computing for Scientific Applications", University of Chicago and University of Florida.
[2] John Bresnahan, David LaBissoniere, "Cumulus", http://www.nimbusproject.org/files/bresnahan_sciencecloud2011.pdf
[3] Roy Campbell, Indranil Gupta, et al., "Open Cirrus Cloud Computing Testbed: Federated Data Centers for Open Source Systems and Services Research".
[4] http://en.wikipedia.org/wiki/InfiniBand
[5] http://en.wikipedia.org/wiki/Myrinet
[6] Constantinos Evangelinos and Chris N. Hill, "Cloud Computing for Parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2", CCA-08, Chicago.
[7] www.cdac.in
[8] www.nkn.in
[9] http://www.darwinproject.mit.edu/wiki/images/2/2e/Gpfs_overview.pdf
[10] Philip H. Carns, Walter B. Ligon III, Robert B. Ross, Rajeev Thakur, "PVFS: A Parallel File System for Linux Clusters", in Proc. of the Extreme Linux Track: 4th Annual Linux Showcase and Conference, October 2000.
[11] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler, "The Hadoop Distributed File System", http://moodle.openfmi.net/file.php/331/lectures/lecture_4/The_Hadoop_Distributed_File_System.pdf
[12] The Nimbus Toolkit, www.nimbusproject.org
[13] http://www.nersc.gov/assets/HPC-Requirements-for-Science/Spentz.pdf
[14] http://www.stfc.ac.uk/resources/pdf/ctreport.pdf
[15] http://www.mmm.ucar.edu/events/indo_us/PDFs/0630_SKDash_HPC-USA-final.pdf
[16] http://www.daylight.com/cheminformatics/casestudies/infinity.html
[17] http://www.gartner.com/it-glossary/cloud-computing/
[18] http://searchcloudcomputing.techtarget.com/definition/Infrastructure-as-a-Service-IaaS
[19] http://www.gluster.org/about/
[20] Hadoop, http://en.wikipedia.org/wiki/Apache_Hadoop
[21] Map Reduce, http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
[22] http://searchstorage.techtarget.com/definition/Storage-as-a-Service-SaaS
[23] http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf
[24] Citrix XenServer, http://www.citrix.com/English/ps2/products/product.asp?contentID=683148
[25] VMware ESXi, http://www.vmware.com/files/pdf/VMware-ESX-and-VMware-ESXi-DS-EN.pdf
[26] OpenNebula, http://opennebula.org/
[27] vCenter, http://www.vmware.com/products/vcenter-server/overview.html
[28] Eucalyptus, http://www.eucalyptus.com/
[29] http://www.penguincomputing.com/files/whitepapers/PODWhitePaper.pdf
[30] http://www.gridgain.com/features/
[31] http://stratuslab.eu/doku.php/start
[32] http://aws.amazon.com/hpc-applications/