BNL New Data Center - Status and Plans - Tony Wong HEPIX Spring 2018 - CERN Indico

BNL New Data Center –
Status and Plans
Tony Wong
HEPIX Spring 2018
Background

• Update on presentation at LBL in 2016
• Scientific Data & Computing Center (SDCC) activities:
   • NP, HEP, Energy and Intensity Frontier
   • HPC
• Original floor space (7,000 ft² and 1 MW) built in the
  1960s
   • Added 8,000 ft² and 1.3 MW by 2009
   • Added another ~500 ft² and 0.7 MW in 2016 for
     HPC clusters
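
The floor space and power figures above can be added up to give the current footprint. The following is a minimal sketch; the phase labels are illustrative, and the ~500 ft² figure is treated as exactly 500 for the arithmetic.

```python
# Floor space (ft^2) and power (MW) as listed on the slide.
# Labels are illustrative; "~500 ft^2" is taken as 500 for the sum.
phases = [
    ("1960s original", 7_000, 1.0),
    ("added by 2009", 8_000, 1.3),
    ("added in 2016 (HPC)", 500, 0.7),
]


def cumulative_capacity(phases):
    """Return running totals as (label, total_ft2, total_mw) per phase."""
    totals = []
    area, power = 0, 0.0
    for label, ft2, mw in phases:
        area += ft2
        power += mw
        # Round to suppress floating-point noise in the MW sum.
        totals.append((label, area, round(power, 2)))
    return totals


for label, area, power in cumulative_capacity(phases):
    print(f"{label:22s} {area:>7,d} ft^2  {power:.1f} MW")
```

Under these figures the existing data center totals roughly 15,500 ft² and 3.0 MW.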
SDCC Support Overview
• US ATLAS Experiment @ CERN LHC (HEP)
   •   Largest of 12 Tier 1 ATLAS computing centers worldwide
   •   23% (today) of worldwide ATLAS computing and data storage capabilities (and
       growing)
   •   Support future growth in computing and data storage resources due to the
       High-Luminosity LHC (HL-LHC) project upgrade
   •   Maintain 99% Service Availability Requirement (MoU)
• RHIC Experiment(s) (NP)
   •   RHIC Tier 0 Site
   •   Primary provider of computing capacity for the analysis and storage of data from
       the STAR, PHENIX, and sPHENIX experiments, supporting thousands of users
       worldwide
   •   eRHIC (Pending site decision)
• Additional BNL Scientific Computing Support
   •   Provide centralized computing resources to multiple/growing BNL experimental
       facilities (CSI, NSLS-II, CFN, LQCD, Belle II, LSST, DUNE)
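
To put the 99% service availability requirement in concrete terms, the sketch below converts an availability fraction into a downtime budget. It assumes availability is measured over a full calendar year of 8,760 hours and counts all downtime equally; the MoU may define the measurement window or scheduled-downtime exclusions differently.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year; an assumption for illustration


def allowed_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Maximum downtime (hours) compatible with an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


budget = allowed_downtime_hours(0.99)
print(f"{budget:.1f} hours per year ({budget / 24:.2f} days)")
```

Under these assumptions, 99% availability leaves a budget of about 87.6 hours (3.65 days) of downtime per year.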
Current Data
Center Is Full
Possible Solutions
• Four scenarios considered
   1. Do nothing
   2. Utilize existing BNL facilities
       a. Renovate current data center
       b. Re-purpose another building
   3. Build new facility
   4. Use cloud resources
• Conducted analysis to estimate cost of:
   • Infrastructure (power, cooling and space)
   • Computing (simulation and analysis)
   • Disk storage
Analysis Process

• DOE mandate to prioritize alternatives to building new
  data centers
• Current budgetary realities and program requirements
  have compelled the HENP community to evaluate
  off-site alternatives, independent of DOE mandate
• Commercial providers (Amazon, Google) offer
  increasingly price-competitive cloud services
• Organizations (e.g., OSG) have shown that harnessing
  the compute power of non-dedicated (HTC and HPC)
  resources is viable
• Must demonstrate option 2b (re-purpose another
  building) is the most cost-effective solution
Status of New Data Center
Proposal
• DOE CFR (Core Facility Revitalization) program
• Review process underway
    • CD-0 granted Fall 2015 (science case)
    • CD-1 granted Summer 2016 (alternative analysis)
    • CD-2 review Summer 2018 (design and cost estimate)
• Timeline
    • Hired A&E company after CD-1 to assist with design and cost estimate
    • Most realistic scenario indicates occupancy in 2021
    • Contingency plans for temporary space 2017-2021 to accommodate HPC
      growth and other HTC-centric programs (e.g., Belle II, LSST, DUNE)
         • Obtained a Data Center container (for free!) capable of housing 25
           racks and 0.5 MW of power in 2017
         • Renovate and put in use if needed – on standby for now
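
The container figures above imply an average power density per rack. The sketch below works it out, assuming the full 0.5 MW is available to the 25 racks as IT load; the slide does not say how much of that power goes to cooling or other overhead.

```python
def average_kw_per_rack(total_mw, racks):
    """Average power per rack in kW, assuming all power feeds the racks."""
    if racks <= 0:
        raise ValueError("racks must be positive")
    return total_mw * 1000.0 / racks


# Container figures from the slide: 0.5 MW shared by 25 racks.
print(f"{average_kw_per_rack(0.5, 25):.0f} kW per rack")
```

Under that assumption the container supports about 20 kW per rack on average.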
Data Center Container
Timeline Details
CFR Preliminary Schedule & Cost
• Construction timed with CERN’s LHC LS2 in mind
• Availability expected when HL-LHC program starts
• Total Project Cost on the order of $75M
Today
Capacity & Power
Considerations
Bldg. 725 and CFR
CFR Scope
Renovate & Revitalize Building 725
   •   IT Power (Computing Power)
        •   Day-one: Deliver 3.6 MW IT power w/ option to increase by 1.2 MW
        •   Provisions for additional 1.2 MW increments at future dates
   •   Cooling
        •   Matching cooling capability to support initial 3.6 MW IT power
        •   Cooling strategy updated to support high density deployment (utilize water
            cooled vs. air cooled racks)
   •   Back-up Capabilities
        •   UPS for IT and Mechanical Equipment, Generators, Chillers & Chilled Water
            back-up capabilities
        •   Day-one: Deliver 1.2 MW emergency back-up power w/ option to increase
   •   Growth & Expansion
        •   Long term growth supported within the balance of the Building 725 facility

       An incremental approach allows for future flexibility and “right-sized”
       deployment of equipment, and minimizes the risk of equipment
       underutilization.
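
The incremental IT power plan in the CFR scope (3.6 MW on day one, expandable in 1.2 MW steps) can be sketched as below. The function names and the example targets are hypothetical; only the 3.6 MW and 1.2 MW figures come from the slide.

```python
import math

DAY_ONE_MW = 3.6    # day-one IT power from the slide
INCREMENT_MW = 1.2  # size of each expansion step from the slide


def it_power_mw(increments):
    """IT power available after a given number of 1.2 MW increments."""
    if increments < 0:
        raise ValueError("increments must be non-negative")
    return round(DAY_ONE_MW + increments * INCREMENT_MW, 1)


def increments_needed(target_mw):
    """Smallest number of increments that meets a target IT load."""
    extra = max(0.0, target_mw - DAY_ONE_MW)
    # Round before ceil so floating-point noise does not add a spurious step.
    return math.ceil(round(extra / INCREMENT_MW, 9))


for target in (3.0, 4.8, 5.0):  # hypothetical target loads in MW
    n = increments_needed(target)
    print(f"target {target} MW -> {n} increment(s), {it_power_mw(n)} MW")
```

Sizing in steps like this means capacity is added only when the load calls for it, which is the "right-size" idea described above.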
A Possible Layout
Cooling Considerations
Current Discussions
• Data Center Infrastructure
     •   25,000 ft² (~2,320 m²) or more of usable space
     •   PUE of 1.2 to 1.4 (mandated by DOE)
     •   Raised floor vs. concrete floor
• Occupancy
     •   Separate spaces for high and low-power density equipment
     •   Synchronize migration of SDCC resources to new data center to minimize downtime
           •   Install new tape robots and disk storage in new data center to coincide with
               end-of-life of existing equipment
           •   Move compute racks from old to new data center
• Operations
     •   Scope of Data Center Infrastructure Management (DCIM) integration
     •   Role of data center operations team
     •   Migrate SDCC staff to Building 725 – sufficient office space for up to 50 people
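
PUE (power usage effectiveness) is the ratio of total facility power to IT power, so the PUE range above translates directly into a total power envelope. The sketch below applies it to the 3.6 MW day-one IT power from the CFR scope; pairing those two numbers is an illustrative assumption, not a figure from the slides. It also checks the ft² to m² conversion.

```python
SQFT_TO_SQM = 0.09290304  # exact: (0.3048 m)^2


def facility_power_mw(it_power_mw, pue):
    """Total facility power implied by an IT load and a PUE value."""
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_power_mw * pue


def sqft_to_sqm(area_ft2):
    """Convert an area from square feet to square metres."""
    return area_ft2 * SQFT_TO_SQM


IT_POWER_MW = 3.6  # day-one IT power; used here as an illustrative input
for pue in (1.2, 1.4):
    total = facility_power_mw(IT_POWER_MW, pue)
    overhead = total - IT_POWER_MW
    print(f"PUE {pue}: total {total:.2f} MW, overhead {overhead:.2f} MW")

print(f"25,000 ft^2 = {sqft_to_sqm(25_000):,.0f} m^2")
```

With these inputs, the facility would draw roughly 4.3 to 5.0 MW in total, of which about 0.7 to 1.4 MW is cooling and other non-IT overhead.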
Near-term developments
• Ahead of CD-2 review
   • Frequent interactions with A&E company to
     prepare baseline design + options
   • Estimate operational costs over lifespan (25+
     years) of new data center
• CD-2 decision expected in early Fall
• Provide another update 12-18 months from now