UKT0 – where are we?
UKT0 face-2-face meeting, 14-16 March 2018
Pete Clarke, University of Edinburgh / STFC-RAL
Outline
• STFC structure and STFC-supported computing
• UKT0 – what and why
• UKT0 – achievements, past and present
• UKT0 – next steps
Reminder of STFC structure (with apologies to STFC staff for the stripped-down version)
STFC has seven Directorates. The support directorates include Finance; Strategy, Policy & Comms; Business & Innovation; Corporate Services; External Networks; and HR. The two most relevant here are:
• National Laboratories Directorate – runs the National Facilities: Diamond Light Source, ISIS (neutrons), Central Laser Facility, RAL Space
• Programmes Directorate – supports programmes in HEP, Astronomy, Astro-Particle and Nuclear physics, plus CERN, ESA, EU-XFEL, ILL, ESS
Computing for STFC science
• Scientific Computing Department (SCD) – computing and data for the National Facilities
• Hartree Centre – HPC computing with an industry focus
• GridPP – HTC computing for the LHC (+ other HEP)
• DiRAC – HPC computing for theory, cosmology, nuclear
SCD is internal to STFC; GridPP & DiRAC are external (funded at Universities), except that the Tier-1 is run within SCD. Up until now these have been separate computing “facilities” (but with very good informal co-operation). SCD also runs JASMIN for NERC.
UKT0 – a community initiative
• An initiative to bring STFC computing interests together
• Formed bottom-up by the science communities and compute providers
• An association of peer interests (PPAN + National Facilities + friends):
  – Particle Physics: LHC + other PP experiments
  – DiRAC
  – National Facilities: Diamond Light Source, ISIS
  – Astro: LOFAR, LSST, EUCLID, SKA, ....
  – Astro-particle: LZ, Advanced LIGO
  – STFC Scientific Computing Department (SCD)
  – Nuclear
  – CCFE (Culham Fusion)
• It is not a project seeking to find users – it is users wanting to work together
• Why? Primarily because it is a common-sense thing to do:
  – Communities collaborate naturally and the science is all linked
  – Avoid duplication, share experience, pool resources, ...
• Make the case for investment in eInfrastructure commensurate with STFC investments in scientists, facilities, instruments and experiments
UKT0 data activities (diagram)
All data activities contain elements of: collection/generation, simulation, analysis/discovery, and modelling/fitting, using two forms of parallelism in data science: HTC and HPC. Users reach the UKT0 data activities through portals and an AAAI access ring. Scientific data sources – bespoke systems directly connected to experiments, observatories and National Facilities (Central Laser Facility, Diamond, ISIS, CCFE, LHC & HEP, observatories) – feed the four classes of activity below, provided by GridPP, DiRAC, the Hartree Centre, the Ada Lovelace Centre, STFC-SCD and ARCHER, in partnership with the public sector, industry and HEIs.
• High Throughput Computing (HTC) – processing/analysis of experiment/observation data, large non-parallel simulations, and data-intensive workflows
• High Performance Computing (HPC) – highly parallel simulations: systems that generate data from a set of starting assumptions/equations
• Data analysis & discovery – systems that analyse data to discover relationships, structures and meaning within, and between, datasets
• Data modelling & fitting – systems that use a combination of simulations, experimental data and statistical techniques to test theories and estimate parameters from data
Reviews support this
The STFC eInfrastructure strategy document says:
“All STFC programme areas anticipate an order of magnitude increase over the next five years in data volumes, with implications for requirements for computing hardware, storage and network bandwidth. Meeting this challenge is essential to ensure the UK can continue to produce world leading science, but in the current financial climate it is clear there will be funding limitations. Long-term planning will be critical, as will more efficient use of resources. The scientific disciplines supported by STFC should work more closely together to find ways of sharing computing infrastructure, with new projects encouraged to make use of existing expertise and infrastructure.”
“The data processing, analysis and event simulation requirements of particle physics, astronomy and nuclear physics (PPAN) researchers, Facilities users and others require advanced High Throughput Computing facilities. By co-operating across organisational and project boundaries to make UKT0 a reality it will be possible to ensure the consolidation of computing resources and provide HTC access in a cost effective manner.”
Change of funding agency structure in the UK will mandate this
Following a national review (the “Nurse Review”) it was decided to bring all UK Research Councils (= funding agencies) into a single organisation:
• UKRI = UK Research and Innovation – will be born in April 2018
• A UKRI-wide group has been working for some time towards a National eInfrastructure for Research
→ Currently we are making a case to our ministry (BEIS) for investment in UKRI eInfrastructure
So the direction of travel in the UK is:
– joined-up (shared) computing across STFC and then UKRI
– progress towards a National eInfrastructure for research
– a push towards the “Cloud” where applicable.
The need for eInfrastructure across STFC
Large scale Astronomy and Particle-Astro computing interests
• LSST – Data Access Centre; Data Challenge 2 in 2018
• Advanced LIGO – Run-3 with increased sensitivity in 2018
• Lux-Zeplin – Mock Data Challenge 1 in 2017; MDC2 in 2018
• EUCLID – simulation tests in 2018
SKA: HQ in Manchester; responsible for the Science Data Processor (P. Alexander). Developing a European Science Regional Centre (SRC) – Cambridge, Manchester & STFC involvement in AENEAS (H2020). WLCG-SKA meetings, CERN-SKA accord, CERN-SKA “Big Data” workshop 2018 @ Alan Turing Institute.
Diamond Data Rates
• Ever-rising data rates:
  – Early 2007: Diamond's first user; no detector faster than ~10 MB/sec
  – Early 2013: first 100 Hz Pilatus 6M system @ 600 MB/sec
  – 2015: latest detectors at 6000 MB/sec
  – Data rates doubling every ~7.5 months
• Tomography: dealing with high data volumes – 200 Gb/scan, ~5 TB/day (one experiment at DLS)
• MX: smaller files, but more experiments
• Data storage:
  – 2013: 1 PB
  – 2015: 4 PB, 1 billion files
  – 2017: 10 PB, 1.8 billion files
  – Cataloguing 12,000 files per minute
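As a quick consistency check on the quoted growth, a minimal Python sketch can back out the doubling time implied by the detector rates above. The 24-month gap between "early 2013" and "2015" is an assumption, since the slide gives only approximate dates:

```python
import math

# Approximate peak detector data rates quoted on the slide (MB/s).
rate_2013 = 600.0    # first 100 Hz Pilatus 6M system, early 2013
rate_2015 = 6000.0   # fastest detectors, 2015

# Assume ~24 months between the two quoted points (early 2013 -> 2015);
# this is an assumption - the slide only gives years.
elapsed_months = 24.0

# Doubling time T satisfies: rate_2015 = rate_2013 * 2**(elapsed_months / T)
doubling_months = elapsed_months / math.log2(rate_2015 / rate_2013)
print(f"Implied doubling time: {doubling_months:.1f} months")  # ~7.2, close to the quoted ~7.5
```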
What about ISIS?
• A more complex situation:
  – Data sets tend to be smaller, but can still reach 100 GB files (Excitations)
  – Lots of modelling and simulation to interpret the data (e.g. RMC)
  – Combining models, simulations and data analysis
  – Complex algorithms and tools; good visualisation
• The message is the same:
  – ISIS science is now DATA INTENSIVE
  – The users can't handle the data + algorithms
  – The science is being affected by the computing
Similar stories in CLF and elsewhere.
Particle Physics: LHC output increasing + new compute-large experiments (LHC luminosity increasing; DUNE experiment)
[Chart: CPU delivered, HS06-hours/month, Jan 2010 – Jan 2018, for ALICE, ATLAS, CMS and LHCb]
• Data: 230 PB, 550 M files
• New peak: ~210 M HS06-days/month, i.e. ~700 k cores continuous
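A rough worked check, assuming a typical ~10 HS06 per core (an assumed figure, not taken from the slide), shows the two peak numbers quoted above are consistent:

```python
# Rough check that ~210 M HS06-days/month corresponds to ~700k cores running continuously.
peak_hs06_days_per_month = 210e6   # from the slide
days_per_month = 30.4              # average month length
hs06_per_core = 10.0               # assumed typical benchmark score per core (not from the slide)

sustained_hs06 = peak_hs06_days_per_month / days_per_month   # HS06 delivered continuously
cores = sustained_hs06 / hs06_per_core
print(f"~{cores/1e3:.0f}k cores continuous")   # ~690k, consistent with the quoted ~700k
```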
HPC Science Drivers
• Capability calculations:
  – Galaxy formation – most realistic simulations to date
  – QCD – high-precision calculation of quark masses
• Data-intensive calculations:
  – Gravitational waves
  – Gaia modelling
  – Precision cosmology using Planck satellite data
What UKT0 has achieved … with just good will (i.e. no additional resource)
Spirit of cooperation is now established – trust is growing
Resource sharing
• DiRAC – sharing the RAL tape store
• Astronomy jobs run on GridPP
• Lux-Zeplin in production
• Recent aLIGO expansion to use RAL
• Fusion @ Culham Lab
ALC
• ALC launch (September 2016)
• ALC Steering Group in place
• Projects:
  – Data Analysis as a Service
  – ULTRA – high-throughput HPC platform for tomographic image analysis
  – Octopus – ICAT, Job Portal and associated infrastructure for CLF
Joint working
• Joint GridPP posts with SKA, LSST, LZ
• AENEAS for SKA
Planning capability
• Resource commitment made to LSST-DESC
• Intent to support the EUCLID June campaign (30,000 cores !!! – we will see)
Supporting STFC for BEIS
• Prepared the 2016 and 2017 RCUK-eI Group BEIS submissions
• Wrote the successful 2018 BEIS cases
What have we done with the first actual resource?
UKT0 – eInfrastructure Award
• Awarded £1.5M for STFC eInfrastructure in Dec 2017:
  – £1.2M for hardware for “under-supported” communities
  – £300k for digital assets for facilities users and federated OpenStack development
• To deliver additional CPU and storage resources aimed at STFC-supported communities with little or no compute resource
• To support the Ada Lovelace Centre (ALC)
• To deliver enhancements to OpenStack
• To deploy a high-performance database archive testbed
• To move STFC nearer towards a UK NeI
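As an illustration of the kind of multi-site OpenStack work mentioned above, the sketch below uses the standard openstacksdk client to query two clouds defined in a local clouds.yaml. The cloud names "site-a" and "site-b" are hypothetical placeholders; this is only a flavour of federated operation, not the actual UKT0 federation code:

```python
import openstack

# Hypothetical clouds.yaml entries for two federated OpenStack sites.
SITES = ["site-a", "site-b"]

def summarise_site(cloud_name: str) -> None:
    """Connect to one OpenStack cloud and list its servers."""
    conn = openstack.connect(cloud=cloud_name)
    servers = list(conn.compute.servers())
    print(f"{cloud_name}: {len(servers)} servers")
    for server in servers:
        print(f"  {server.name} ({server.status})")

if __name__ == "__main__":
    for site in SITES:
        summarise_site(site)
```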
• UKT0 has established:
  – A functioning iPMB, meeting bi-weekly
  – A functioning Technical Working Group, meeting bi-weekly
  – Draft organisational structures for managing future common funding → see Friday talk
• Hardware: 3000 cores and 3 PB of disk are deployed
• Digital staff effort deployed for ALC and StackHPC:
  – Data Movement Service & Virtual Machine Manager being constructed
  – OpenStack federation being constructed
• Successful formulation and submission of an STFC-specific BEIS business case for £10M for DiRAC and £16M for the rest of UKT0 over 4 years:
  – To “keep the lights on and not miss time-limited opportunities”
This was a perfect example of what UKT0 is able to do: mobilise the entire STFC computing community to work together in a coherent way.
What is in front of us
What's in front of us – Google search [screenshot]
• Appoint a Director of Brand and Logo
What is in front of us
• Make sure the 3000 cores and 3 PB of disk get used !!!
  – Solve the “operations with no resource” problem
• Take the next steps towards a truly shared eInfrastructure:
  – AAAI pilot
  – Global data movement and management?
  – Site monitoring
  – Security and security incident response
• Deliver to new activities with need (LSST DC2, EUCLID in June)
• Enact the proposed resource request, scrutiny and allocation process – see talk on Friday
• If the BEIS £16M injection is approved:
  – Have appropriate organisational structures in place → see talk on Friday
  – Prepare the case to manage these funds, to be reviewed by some STFC body
What is in front of us
• If the BEIS £16M is approved and UKT0 is successful in being selected to manage it, then this becomes a serious project:
  – To plan, deploy and manage hardware and digital infrastructure for STFC computing
• Get serious about resource (staff) for:
  – Operations
  – Research Software Engineers
• Pilot the use of hybrid commercial cloud
• This is an important year for UKRI infrastructures:
  – Continue the (already significant) effort going into the UKRI eInfrastructure Expert Group – and the STFC place in the NeI
Conclusion
• The need for more, and joined-up, eInfrastructure for STFC is well established
• The spirit of cooperation across STFC is established – and came from the bottom up through UKT0
• The bulb was planted in 2016 – and grew shoots in 2016/17 on good will and best efforts
• Maybe one of the most important shoots is trust?
• The shoots have turned into first flowers due to the 2017/18 award
• There is now every reason for each activity to stay motivated and keep putting effort into this – now we need the flower garden, and we can all work on it together.
Conclusion in pictures
Additional Material
UKT0 Ethos
Science domains remain “sovereign” where appropriate; share in common where it makes sense to do so.
• Each activity (e.g. LHC, SKA, LZ, EUCLID, ..., plus the Ada Lovelace Centre for Facilities users) retains its own VO management, reconstruction and analysis.
• Shared in common are the underlying services: federated HTC clusters, federated HPC clusters, tape archive, data storage, and public & commercial cloud access; data-management services (federated data management, job management); and operational services (monitoring, accounting, incident reporting, AAI, VO tools).
Proposed UKT0 Organisational Structures (diagram)
• STFC Oversight Committee – receives reports
• Strategic Advisory Board (SAB) – advises and receives reports
• Delivery Board (DB) – coordinates on strategy, policy, funding, ....
• Technical Working Group (TWG) – coordinates all technical matters
• Resource Review and Allocation Group (RRB)
• STFCeI resources underneath: SCD, GridPP, DiRAC, ALC, Hartree, another site, ....