BNL New Data Center - Status and Plans - Tony Wong, HEPIX Spring 2018
Background
• Update on presentation at LBL in 2016
• Scientific Data & Computing Center (SDCC) activities:
  • NP, HEP, Energy and Intensity Frontier
  • HPC
• Original floor space (7,000 ft² and 1 MW) built in the 1960s
• Added 8,000 ft² and 1.3 MW by 2009
• Added another ~500 ft² and 0.7 MW in 2016 for HPC clusters
SDCC Support Overview
• US ATLAS Experiment @ CERN LHC (HEP)
  • Largest of 12 Tier 1 ATLAS computing centers worldwide
  • 23% (today) of worldwide ATLAS computing and data storage capabilities (and growing)
  • Support future growth in computing and data storage resources driven by the High-Luminosity LHC (HL-LHC) upgrade
  • Maintain 99% Service Availability requirement (MoU); see the sketch after this list
• RHIC Experiment(s) (NP)
  • RHIC Tier 0 site
  • Primary provider of computing capacity for the analysis and storage of data from the STAR, PHENIX, and sPHENIX experiments, supporting thousands of users worldwide
  • eRHIC (pending site decision)
• Additional BNL Scientific Computing Support
  • Provide centralized computing resources to multiple/growing BNL experimental facilities (CSI, NSLS-II, CFN, LQCD, Belle II, LSST, DUNE)
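A back-of-the-envelope illustration of what the 99% service-availability commitment implies; the 99% figure is from the MoU bullet above, while the monthly/yearly conversion is simple arithmetic, not MoU text:

    # Allowed downtime implied by a 99% service-availability target.
    # The 99% figure comes from the MoU mentioned above; the time
    # conversions below are plain arithmetic, not MoU wording.
    availability = 0.99
    hours_per_year = 365.25 * 24

    downtime_hours_year = (1 - availability) * hours_per_year
    downtime_hours_month = downtime_hours_year / 12

    print(f"Allowed downtime: {downtime_hours_year:.1f} h/year "
          f"(~{downtime_hours_month:.1f} h/month)")
    # -> roughly 87.7 h/year, about 7.3 h/month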
Possible Solutions
• Four scenarios considered:
  1. Do nothing
  2. Utilize existing BNL facilities
     a. Renovate current data center
     b. Re-purpose another building
  3. Build new facility
  4. Use cloud resources
• Conducted analysis to estimate cost of (a simplified sketch follows below):
  • Infrastructure (power, cooling and space)
  • Computing (simulation and analysis)
  • Disk storage
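The slides do not give the cost model itself; the sketch below is a hypothetical illustration of how the three cost components listed above might be totaled per scenario. All dollar figures and the scenario table are invented placeholders, not BNL estimates:

    # Hypothetical cost-comparison sketch (illustrative only -- the
    # dollar figures below are made-up placeholders, not BNL numbers).
    # Each scenario is scored on the three components named in the slide:
    # infrastructure (power, cooling, space), computing, and disk storage.

    scenarios = {
        "renovate current data center": {"infrastructure": 30.0, "computing": 25.0, "disk": 10.0},
        "re-purpose another building":  {"infrastructure": 20.0, "computing": 25.0, "disk": 10.0},
        "build new facility":           {"infrastructure": 75.0, "computing": 25.0, "disk": 10.0},
        "use cloud resources":          {"infrastructure":  0.0, "computing": 60.0, "disk": 40.0},
    }

    for name, costs in scenarios.items():
        total = sum(costs.values())          # total cost in $M over the study period
        print(f"{name:32s} -> ${total:6.1f}M")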
Analysis Process
• DOE mandate to prioritize alternatives to building new data centers
• Current budgetary realities and program requirements have compelled the HENP community to evaluate off-site alternatives, independent of the DOE mandate
• Commercial providers (Amazon, Google) offer increasingly price-competitive cloud services
• Organizations (e.g., OSG) have shown that harnessing the compute power of non-dedicated (HTC and HPC) resources is viable
• Must demonstrate that option 2b (re-purpose another building) is the most cost-effective solution
Status of New Data Center Proposal
• DOE CFR (Core Facility Revitalization) program
  • Review process underway
  • CD-0 granted Fall 2015 (science case)
  • CD-1 granted Summer 2016 (alternative analysis)
  • CD-2 review Summer 2018 (design and cost estimate)
• Timeline
  • Hired A&E company after CD-1 to assist with design and cost estimate
  • Most realistic scenario indicates occupancy in 2021
  • Contingency plans for temporary space 2017-2021 to accommodate HPC growth and any other HTC-centric programs (e.g., Belle II, LSST, DUNE)
  • Obtained a data center container (for free!) in 2017, capable of housing 25 racks and 0.5 MW of power
    • Renovate and put into use if needed; on standby for now
Timeline Details: CFR Preliminary Schedule & Cost
• Construction timed with CERN's LHC LS2 in mind
• Availability expected when the HL-LHC program starts
• Total Project Cost on the order of ~$75M
Bldg. 725 and CFR
CFR Scope: Renovate & Revitalize Building 725
• IT Power (Computing Power)
  • Day one: deliver 3.6 MW IT power with option to increase by 1.2 MW
  • Provisions for additional 1.2 MW increments at future dates (see the sketch after this list)
• Cooling
  • Matching cooling capability to support the initial 3.6 MW IT power
  • Cooling strategy updated to support high-density deployment (water-cooled vs. air-cooled racks)
• Back-up Capabilities
  • UPS for IT and mechanical equipment; generators, chillers and chilled-water back-up capabilities
  • Day one: deliver 1.2 MW emergency back-up power with option to increase
• Growth & Expansion
  • Long-term growth supported within the balance of the Building 725 facility
An incremental approach allows for future flexibility and "right-sized" deployment of equipment, and minimizes the risk of equipment underutilization.
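A minimal sketch of the incremental power build-out described above. The day-one 3.6 MW capacity and the 1.2 MW increment size come from the slide; the projected demand figures are invented for illustration:

    # Incremental IT-power build-out sketch. Day-one capacity (3.6 MW)
    # and the 1.2 MW increment size come from the slide; the projected
    # demand numbers below are invented placeholders.
    DAY_ONE_MW = 3.6
    INCREMENT_MW = 1.2

    projected_demand_mw = [2.5, 3.4, 4.2, 5.5, 6.1]   # hypothetical yearly demand

    capacity = DAY_ONE_MW
    for year, demand in enumerate(projected_demand_mw, start=1):
        # Add 1.2 MW increments only when demand would exceed installed capacity,
        # i.e. "right-size" the deployment instead of building everything day one.
        while demand > capacity:
            capacity += INCREMENT_MW
        print(f"Year {year}: demand {demand:.1f} MW, installed capacity {capacity:.1f} MW")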
A Possible Layout
Cooling Considerations
Current Discussions
• Data Center Infrastructure
  • 25,000 ft² (~2,320 m²) or more of usable space
  • PUE of 1.2 to 1.4 (mandated by DOE); a worked PUE example follows below
  • Raised floor vs. concrete floor
• Occupancy
  • Separate spaces for high- and low-power-density equipment
  • Synchronize migration of SDCC resources to the new data center to minimize downtime
  • Install new tape robots and disk storage in the new data center to coincide with end-of-life of existing equipment
  • Move compute racks from the old to the new data center
• Operations
  • Scope of Data Center Infrastructure Management (DCIM) integration
  • Role of the data center operations team
  • Migrate SDCC staff to Building 725, with sufficient office space for up to 50 people
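PUE (Power Usage Effectiveness) is the ratio of total facility power to IT power. A small worked example pairing the DOE-mandated PUE range with the CFR day-one IT load of 3.6 MW; the pairing itself is an illustration, not a statement from the slides:

    # PUE = total facility power / IT power, so total power = PUE * IT power.
    # The 1.2-1.4 PUE range and the 3.6 MW day-one IT load come from the
    # slides; combining them here is just an illustrative calculation.
    it_power_mw = 3.6

    for pue in (1.2, 1.4):
        total_mw = pue * it_power_mw          # total facility draw (IT + cooling, losses, ...)
        overhead_mw = total_mw - it_power_mw  # non-IT overhead (cooling, power distribution)
        print(f"PUE {pue}: total {total_mw:.2f} MW, overhead {overhead_mw:.2f} MW")
    # -> PUE 1.2: total 4.32 MW, overhead 0.72 MW
    #    PUE 1.4: total 5.04 MW, overhead 1.44 MW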
Near-Term Developments
• Ahead of CD-2 review:
  • Frequent interactions with the A&E company to prepare the baseline design plus options
  • Estimate operational costs over the lifespan (25+ years) of the new data center (a rough energy-cost sketch follows below)
• CD-2 decision expected in early Fall
• Provide another update 12-18 months from now
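The slides do not give the operating-cost model; the sketch below is a purely hypothetical illustration of one of its largest terms, the electricity bill over a 25-year lifespan, using the day-one IT load and PUE range from earlier slides plus an assumed load factor and electricity price:

    # Hypothetical 25-year electricity-cost sketch. The 3.6 MW IT load,
    # the 1.2-1.4 PUE range, and the 25-year lifespan come from the slides;
    # the load factor and the $/kWh price are invented assumptions.
    it_power_mw = 3.6
    pue = 1.3                 # midpoint of the 1.2-1.4 target range
    load_factor = 0.8         # assumed average utilization of the IT load
    price_per_kwh = 0.07      # assumed electricity price in $/kWh
    years = 25

    avg_power_kw = it_power_mw * 1000 * load_factor * pue
    energy_kwh = avg_power_kw * 24 * 365 * years
    cost_musd = energy_kwh * price_per_kwh / 1e6

    print(f"Rough 25-year electricity cost: ~${cost_musd:.0f}M")
    # -> roughly $57M with these made-up assumptions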