Analysis on cloud using Amazon Web Services - Harinder Singh Bawa California State University Fresno - CERN Indico

Analysis on cloud using Amazon Web Services

Harinder Singh Bawa
California State University Fresno
Future Analysis Systems, October 27, 2020
Why Amazon Web Services (AWS)? Capacity/Cost/Elasticity

Facility is not in steady state

What is the CSU AWS Project?
http://aws-csufresno-atlas.education/

Maintaining the hardware of the huge ATLAS computing grid and
replacing it every few years has been very expensive.
The CSU ATLAS group received $250k in AWS credits from the CSU
Chancellor’s Office (CO) to set up a prototype US ATLAS shared Tier 3
cluster on the cloud and explore cloud solutions for ATLAS. The last
$150k of AWS credits has to be used by 8/2022.
CSU Fresno ATLAS cloud team: H. Bawa, Y. Gao, V. Barden, A.
Orkusyan and Anand Saggu, with strong support from the CSU Fresno IT
team (Jay Fowler, John Wagenleitner and Geoff King).
Experts from US ATLAS (Kaushik De, Paolo Calafiura, Fernando Harald
Barreiro Megino, Johannes Elmsheuser, Mario Lassnig) are helping us
to integrate PanDA queues using Kubernetes services.
m5 - general purpose instance that provides a
balance of compute, memory, and network
resources
c5 - optimized for compute-intensive workloads
r5 - optimized for memory-intensive workloads

[Figure: cost ($0 to $14,000) versus utilization for an m2.xlarge running Linux in the US-East Region over a 3-year period. Curves: Heavy Utilization, Medium Utilization, Light Utilization, and On-Demand, with the break-even point marked.]

Utilization   Sweet Spot             Feature                       Savings over On-Demand
75%           Heavy Utilization RI   Lowest total cost;            Up to 71% (3-year)
                                     ideal for baseline servers
Amazon Storage Options

- Linux/Unix file permissions
https://aws.amazon.com/efs/sla/

Storage   1 GB / Month (Year)   100 GB / Month (Year)   1 TB / Month (Year)
S3        $0.023 ($0.276)       $2.30 ($27.60)          $23.00 ($276.00)
EFS       $0.30 ($3.60)         $30.00 ($360.00)        $300.00 ($3,600.00)
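
The yearly figures in the table follow directly from the per-GB monthly rates. The short Python sketch below reproduces them, assuming flat rates of $0.023 (S3) and $0.30 (EFS) per GB per month as listed, and treating 1 TB as 1000 GB, which is what the $23.00 entry implies. The function and variable names are illustrative only, and request or data-transfer charges are not modeled.

```python
# Flat per-GB monthly storage rates in USD, taken from the table above.
RATE_PER_GB_MONTH = {"S3": 0.023, "EFS": 0.30}


def storage_cost(service, size_gb, months=1):
    """Return the storage cost in USD for size_gb held for the given number of months."""
    return round(RATE_PER_GB_MONTH[service] * size_gb * months, 3)


if __name__ == "__main__":
    for service in ("S3", "EFS"):
        for size_gb in (1, 100, 1000):  # assumption: 1 TB counted as 1000 GB
            monthly = storage_cost(service, size_gb)
            yearly = storage_cost(service, size_gb, months=12)
            print(f"{service:4s} {size_gb:5d} GB  ${monthly:.3f}/month  ${yearly:.3f}/year")
```

At these rates EFS costs roughly 13 times as much as S3 for the same volume, so the choice mostly comes down to whether POSIX-style file access is needed.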
Virtual Private Cloud (VPC)

VPC (Pricing)
NAT (Network Address Translation) Gateway
   Ensures EC2 instances in the private subnet aren't accessible
   from the internet.
Bastion Host
   Primary access point to the VPC from the internet.
   Acts as a proxy to EC2 instances in the private subnet.

NAT Gateway Hourly Cost   Bastion Host Monthly Cost   VPC Monthly Cost   VPC Annual Cost
$0.045                    $8.50                       $41.44             $497.28
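
The monthly and annual VPC figures can be reproduced from the hourly NAT Gateway rate plus the bastion host cost. The sketch below assumes 732 billed hours per month (30.5 days); that hour count is inferred from the totals in the table, not stated on the slide, and NAT Gateway data-processing charges are ignored.

```python
NAT_GATEWAY_HOURLY = 0.045  # USD per hour, from the table above
BASTION_MONTHLY = 8.50      # USD per month, from the table above
HOURS_PER_MONTH = 732       # assumption: 30.5 days x 24 h, inferred from the totals


def vpc_monthly_cost(nat_hourly=NAT_GATEWAY_HOURLY,
                     bastion_monthly=BASTION_MONTHLY,
                     hours=HOURS_PER_MONTH):
    """Monthly VPC cost: NAT Gateway hours plus the bastion host."""
    return round(nat_hourly * hours + bastion_monthly, 2)


def vpc_annual_cost(**kwargs):
    """Annual VPC cost as twelve times the monthly cost."""
    return round(12 * vpc_monthly_cost(**kwargs), 2)


if __name__ == "__main__":
    print(f"Monthly: ${vpc_monthly_cost():.2f}")  # Monthly: $41.44
    print(f"Annual:  ${vpc_annual_cost():.2f}")   # Annual:  $497.28
```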
AMI (Amazon Machine Image)

Launching a compute instance for analysis
A custom AMI (centos7-atlas-YYYYMMDD.nn) was built on CentOS 7 with the CVMFS client set
up, and it is available in the My AMIs list when launching an instance.

Select the highest template version, then fill in the
instance details.

Choose optional disk space:

 An EBS volume can be selected when launching the instance.
 More volumes can be added as well.
 One can customize the VPC to get a default EBS volume.

That’s it: the EC2 instance is launched.

Set up Rucio
Set up PanDA
Set up ROOT

Accessing data from S3 storage through the CLI
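
The slide shows this step only as a screenshot. As a rough sketch of what it might look like when scripted, the Python snippet below builds and optionally runs the AWS CLI commands `aws s3 ls` and `aws s3 cp`. The bucket name, prefix, and file names are hypothetical placeholders, and the sketch assumes the AWS CLI is installed and credentials are already configured on the instance.

```python
import subprocess


def s3_ls_command(bucket, prefix=""):
    """Build the argument list for listing objects under s3://bucket/prefix."""
    return ["aws", "s3", "ls", f"s3://{bucket}/{prefix}"]


def s3_cp_command(bucket, key, dest):
    """Build the argument list for copying one object to a local path."""
    return ["aws", "s3", "cp", f"s3://{bucket}/{key}", dest]


def run(command):
    """Run a command and return its stdout (needs the AWS CLI and valid credentials)."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical bucket and object names, printed rather than executed.
    print(" ".join(s3_ls_command("my-atlas-bucket", "ntuples/")))
    print(" ".join(s3_cp_command("my-atlas-bucket", "ntuples/sample.root", "./sample.root")))
```

Passing the commands as argument lists rather than shell strings avoids quoting problems if object keys contain spaces.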

*Slide Credit: Fernando Megino
Screenshot on 26 Oct
EKS cluster details
●   EKS cluster with Spot auto scaling group
     ○   0.1 USD/hour bid for a t3.2xlarge VM (around a
         third of normal price)
     ○   Auto-scaled up to 20 VMs (max 160 vCPU)
     ○   Spot instances can last several days - survival
         increases with your bid
     ○   (Note that egress and VM disk prices are not
         reduced!)
● Storage is currently at SWT2-CPB while S3 is being
  set up
● Running simulation and analysis jobs
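
The cluster limits above imply a simple ceiling on capacity and on compute spending. The sketch below works it out, assuming 8 vCPUs per t3.2xlarge (consistent with 20 VMs giving 160 vCPU) and treating the 0.1 USD/hour bid as the maximum hourly price per VM. Egress and VM disk charges are excluded, as the slide notes they are not reduced. All names are illustrative.

```python
SPOT_BID_USD_PER_HOUR = 0.10  # maximum price per VM-hour, from the slide
MAX_VMS = 20                  # auto-scaling limit, from the slide
VCPUS_PER_VM = 8              # t3.2xlarge; consistent with 20 VMs -> 160 vCPU


def max_vcpus(vms=MAX_VMS, vcpus_per_vm=VCPUS_PER_VM):
    """Total vCPUs when the auto scaling group is fully scaled out."""
    return vms * vcpus_per_vm


def compute_cost_ceiling(hours, vms=MAX_VMS, bid=SPOT_BID_USD_PER_HOUR):
    """Upper bound on the Spot compute cost in USD, excluding egress and disk."""
    return round(vms * bid * hours, 2)


if __name__ == "__main__":
    print(f"Max vCPUs: {max_vcpus()}")
    print(f"Ceiling for 24 h at full scale: ${compute_cost_ceiling(24):.2f}")
```

The actual bill is normally lower than this ceiling, since Spot instances are charged at the current Spot price rather than at the bid.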

                                                           *Slide Credit: Fernando Megino
Stability and failure rate

[Figure annotations:]
  Increasing capacity
  Temporary DDM/storage issues
  Under investigation: seems related to the old systemd used in EKS images

                                                     *Slide Credit: Fernando Megino
Summary:
 CSU AWS on Cloud is ready and currently running with no
 issues. http://aws-csufresno-atlas.education/
 Production and analysis jobs are currently running fine.
 We welcome analysis groups to use our system!
 S3 integration is expected soon. In the meantime, we are using
 SWT2 as the RSE.
 We are in the process of gathering enough data to calculate cost
 details, and we need physics analyzers to use our AWS services
 rigorously.
 If you are interested, please email me at hbawa@csufresno.edu
 and I would be happy to open an account for you.

                             Comments and suggestions are welcome
Thank you

EXTRA

List of EC2 instances in the account

Advantages/Drawbacks of AWS

Basic AWS Infrastructure

EC2 Cost Comparison for 50 Nodes (Using AWS Calculator)

                                   On-Demand                   1 Year All Upfront Reserved
Instance Type  vCPU  Memory (GiB)  Monthly Cost   Annual Cost  Upfront Cost  Monthly Cost   Annual Cost
m5.large          2             8     $3,865.40    $46,384.80    $27,003.50       $100.00    $28,203.50
m5.xlarge         4            16     $7,730.25    $92,763.00    $53,860.50       $100.00    $55,060.50
m5.2xlarge        8            32    $15,338.32   $184,059.84   $107,115.00       $100.00   $108,315.00
m5.4xlarge       16            64    $30,376.63   $364,519.56   $212,430.00       $100.00   $213,630.00
c5.large          2             4     $3,422.10    $41,065.20    $23,633.00       $100.00    $24,833.00
c5.xlarge         4             8     $6,844.20    $82,130.40    $47,066.00       $100.00    $48,266.00
c5.2xlarge        8            16    $13,615.08   $163,380.96    $93,727.50       $100.00    $94,927.50
c5.4xlarge       16            32    $26,930.16   $323,161.92   $185,707.50       $100.00   $186,907.50
r5.large          2            16     $5,073.20    $60,878.40    $34,921.50       $100.00    $36,121.50
r5.xlarge         4            32    $10,145.85   $121,750.20    $69,643.00       $100.00    $70,843.00
r5.2xlarge        8            64    $20,037.76   $240,453.12   $138,090.00       $100.00   $139,290.00
r5.4xlarge       16           128    $39,775.51   $477,306.12   $274,188.00       $100.00   $275,388.00
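
To make the comparison concrete, the sketch below takes three rows from the table and computes the reserved annual total, the saving relative to on-demand, and a break-even utilization. The break-even figure assumes on-demand cost scales linearly with the fraction of the year the nodes are running, which is a simplification introduced here and not a figure from the AWS Calculator. Function names are illustrative.

```python
# (on-demand annual, reserved upfront, reserved monthly) in USD for 50 nodes,
# copied from the table above.
COSTS = {
    "m5.large": (46384.80, 27003.50, 100.00),
    "c5.large": (41065.20, 23633.00, 100.00),
    "r5.large": (60878.40, 34921.50, 100.00),
}


def reserved_annual(upfront, monthly):
    """Total first-year cost of a 1-year all-upfront reservation."""
    return upfront + 12 * monthly


def savings_fraction(on_demand_annual, upfront, monthly):
    """Fraction saved versus running on-demand for the whole year."""
    return 1 - reserved_annual(upfront, monthly) / on_demand_annual


def break_even_utilization(on_demand_annual, upfront, monthly):
    """Fraction of the year on-demand nodes must run before reserving is cheaper.

    Assumes on-demand cost is proportional to running time.
    """
    return reserved_annual(upfront, monthly) / on_demand_annual


if __name__ == "__main__":
    for name, (on_demand, upfront, monthly) in COSTS.items():
        saved = savings_fraction(on_demand, upfront, monthly)
        even = break_even_utilization(on_demand, upfront, monthly)
        print(f"{name:9s} saving {saved:.1%}  break-even utilization {even:.1%}")
```

For these three instance types the reservation saves about 40% over a full year of on-demand use, so it pays off only if the nodes are busy for roughly 60% of the year or more.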

Squid server
 The cache is used by the worker nodes, so it is placed in the private
 subnet(s).
 We use Frontier Squid, which is a patched version of Squid.
 For validation, we selected a t3.medium instance type. For production,
 an r5.large (optimized for memory) might be ideal.
 Concerns
 The EBS volume attached to the EC2 instance may need to be adjusted to
 accommodate network load when run at scale?
