Stephen Hawking 1942-2018

UKT0 – where are we?

Pete Clarke, University of Edinburgh
UKT0 face-to-face meeting, 14–16 March 2018, STFC-RAL
Outline

•   STFC structure and STFC supported computing
•   UKT0 – what and why
•   UKT0 – achievements past, present
•   UKT0 - next steps

Reminder of STFC structure
                                  (with apologies to STFC staff for this stripped-down version)

[Organisation chart] STFC has seven Directorates, including:
•   Finance, Corporate Services, HR, External Networks, Strategy/Policy/Comms, Business & Innovation
•   National Laboratories Directorate – runs the National Facilities:
       Diamond Light Source, ISIS (neutrons), Central Laser Facility, RAL Space, ...
•   Programmes Directorate – supports programmes in:
       HEP, Astronomy, Astro-Particle, Nuclear
•   plus CERN, ESA, EU-XFEL, ILL, ESS
Computing for STFC science

[Organisation chart] The main computing "facilities" in the STFC landscape:
•   Hartree Centre – industry-focused computing
•   Scientific Computing Department (SCD) – computing for the National Facilities
•   GridPP – HTC computing and data for the LHC (+ other HEP)
•   DiRAC – HPC computing for theory (cosmology, nuclear, ...)

SCD is internal to STFC; GridPP & DiRAC are external (funded at universities), except that the Tier-1 is run within SCD.
Up until now these have been separate computing "facilities" (but with very good informal co-operation).
SCD runs JASMIN for NERC.
And in any meeting such as this we should always
     acknowledge the vital relationship to JANET

UKT0 – a community initiative

•   Initiative to bring STFC computing interests together

•   Formed bottom up by the science communities and compute providers

•   Association of peer interests (PPAN + National Facilities + Friends)
     –   Particle Physics: LHC + other PP experiments
     –   DiRAC
     –   National Facilities: Diamond Light Source, ISIS
     –   Astro: LOFAR, LSST, EUCLID, SKA, ....
     –   Astro-particle: LZ, Advanced-LIGO
     –   STFC Scientific Computing Dept (SCD)
     –   Nuclear
     –   CCFE (Culham Fusion)

•   It is not a project seeking to find users - it is users wanting to work together

•   Why? Primarily because it's a common-sense thing to do
     –   Communities collaborate naturally and the science is all linked
     –   Avoid duplication, share experience, pool resources,…

•   Make case for investment in eInfrastructure commensurate with STFC investments in
    scientists, facilities, instruments and experiments
[Diagram: STFC UKT0 data activities]

Users access the UKT0 data activities through portals and an AAAI access ring.

Use of parallelism in data science:
•   HIGH THROUGHPUT COMPUTING (HTC) – processing/analysis of experiment/observation & large non-parallel simulations
       (GridPP, Ada Lovelace Centre, STFC-SCD)
•   HIGH PERFORMANCE COMPUTING (HPC) – highly parallel simulations & data-intensive workflows
       (DiRAC, Hartree Centre, STFC-SCD)
•   DATA ANALYSIS & DISCOVERY and DATA MODELLING & FITTING

Scientific data sources: LHC & HEP, observatories, Diamond, ISIS, Central Laser Facility, CCFE, GridPP, DiRAC, Hartree Centre, STFC-SCD
Partnerships: ARCHER, public sector, industry & HEIs

All data activities contain elements of: collection/generation, simulation, analysis/discovery, modelling/fitting.
•   Processing/analysis of experiment/observation & large non-parallel simulations: bespoke systems that are directly connected to experiments, observatories & National Facilities.
•   Highly parallel simulations & data-intensive workflows: systems which generate data from a set of starting assumptions/equations.
•   Data analysis & discovery: systems analyse data to discover relationships, structures and meaning within, and between, datasets.
•   Data modelling & fitting: systems use a combination of simulations, experimental data and statistical techniques to test theories and estimate parameters from data.
Reviews support this

•   STFC eInfrastructure strategy – the document says:

    "All STFC programme areas anticipate an order of magnitude increase over the next five years in data volumes, with implications for requirements for computing hardware, storage and network bandwidth. Meeting this challenge is essential to ensure the UK can continue to produce world leading science, but in the current financial climate it is clear there will be funding limitations. Long-term planning will be critical, as will more efficient use of resources.

    The scientific disciplines supported by STFC should work more closely together to find ways of sharing computing infrastructure, with new projects encouraged to make use of existing expertise and infrastructure."

    "The data processing, analysis and event simulation requirements of particle physics, astronomy and nuclear physics (PPAN) researchers, Facilities users and others require advanced High Throughput Computing facilities. By co-operating across organisational and project boundaries to make UKT0 a reality it will be possible to ensure the consolidation of computing resources and provide HTC access in a cost effective manner."
Change of funding agency structure in UK will mandate this

Following a national review (“Nurse Review”) it was decided to bring all UK Research
Councils (=Funding agencies) into a single organisation.

• UKRI = UK Research and Innovation:
    •   will be born in April 2018

• There has been a UKRI-wide group working for
  some time towards a National eInfrastructure
  for Research
    → Currently we are making a case to our ministry
      (BEIS) for investment in UKRI eInfrastructure

• So direction of travel in UK is:
    –   joined up (shared) computing across STFC and then UKRI
    –   progress towards a National eInfrastructure for research
    –   a push towards the “Cloud” where applicable.

The need for eInfrastructure across STFC
Large scale Astronomy and Particle-Astro computing interests

•   LSST: Data Access Centre; Data Challenge 2 in 2018
•   Advanced LIGO: Run-3 with increased sensitivity in 2018
•   Lux-Zeplin: Mock Data Challenge 1 in 2017; MDC2 in 2018
•   EUCLID: simulation tests in 2018

•   SKA: HQ in Manchester
     –   Responsible for the Science Data Processor (P. Alexander)
     –   Developing a European Science Regional Centre (SRC)
     –   Cambridge, Manchester & STFC involvement in AENEAS (H2020)

WLCG-SKA meetings, CERN-SKA accord,
CERN-SKA "Big Data" workshop 2018 @ Alan Turing Institute.
Diamond Data Rates

•   Ever rising data rates
     – Early 2007: Diamond first user.
        • No detector faster than ~10 MB/sec.
     – Early 2013:
        • First 100 Hz Pilatus 6M system @ 600 MB/sec
     – 2015: Latest detectors 6000 MB/sec

•   Doubling the data rates every 7.5 months

• Tomography: Dealing with high data volumes
   – 200 GB/scan,
   – ~5 TB/day (one experiment at DLS)
• MX: smaller files, but more experiments

•   Data storage
     –   2013: 1PB
     –   2015: 4PB, 1 billion files
     –   2017: 10PB,1.8 billion files
     –   Cataloguing 12,000 files per minute
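
As a rough sanity check of the quoted growth rate, a minimal sketch (assuming simple exponential growth, and using only the figures quoted above) reproduces the 2015 detector rate from the 2013 one:

```python
# Sketch only: exponential growth with the quoted doubling time of 7.5 months,
# applied to the detector data rates quoted above (600 MB/s in early 2013).

def projected_rate(start_rate_mb_s, months_elapsed, doubling_months=7.5):
    """Data rate after 'months_elapsed' months, doubling every 'doubling_months'."""
    return start_rate_mb_s * 2 ** (months_elapsed / doubling_months)

# Early 2013 to 2015 is roughly 24 months:
print(f"{projected_rate(600, 24):.0f} MB/s")  # ~5500 MB/s, consistent with the ~6000 MB/s quoted for 2015
```
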
What about ISIS?

•   A more complex situation
     –   Data sets tend to be smaller
           •   But can still get 100 GB files (Excitations)
     –   Lots of modelling and simulation to interpret the data (e.g. RMC)
           •   Combining models, simulations and data analysis
           •   Complex algorithms and tools
           •   Good visualisation

•   The message is the same:
     –   ISIS science is now: DATA INTENSIVE
     –   The users can't handle the data + algorithms
     –   The science is being affected by the computing

Similar stories in the CLF and elsewhere.
Particle Physics: LHC output increasing + new compute-large experiments

•   LHC luminosity increasing; DUNE experiment on the horizon
•   Data: 230 PB, 550M files
•   ~700 k cores continuous
•   New peak: ~210 M HS06-days/month

[Figure: CPU delivered per month (HS06-hours) by ALICE, ATLAS, CMS and LHCb, Jan 2010 – Jan 2018, rising to ~5 billion HS06-hours/month at the new peak]
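
As an aside, the two headline numbers above are mutually consistent; a one-line check using only the figures quoted on the slide:

```python
# Consistency check: ~210 M HS06-days delivered per month vs ~700k cores running continuously.
peak_hs06_days_per_month = 210e6
cores = 700e3
days_per_month = 30

hs06_per_core = peak_hs06_days_per_month / days_per_month / cores
print(f"~{hs06_per_core:.0f} HS06 per core")  # ~10 HS06/core, a plausible average per-core rating
```
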
HPC Science Drivers

• Capability calculations:
   - Galaxy formation - most realistic
     simulations to date
   - QCD - high precision calculation of
     quark masses

• Data Intensive calculations:
   - Gravitational waves
   - Gaia modelling
   - Precision cosmology using Planck
     satellite data

What UKT0 has achieved

   ….with just good will

(i.e. no additional resource)

Spirit of cooperation is now established – trust is growing

Resource sharing
•   DiRAC – sharing RAL tape store
•   Astronomy jobs run on GridPP
•   Lux-Zeplin in production
•   Recent aLIGO expansion to use RAL
•   Fusion @ Culham Lab

Joint working
•   Joint GridPP posts with SKA, LSST, LZ
•   AENEAS for SKA

Planning capability
•   Resource commitment made to LSST-DESC
•   Intent-to-support EUCLID June campaign (30,000 cores !!! – we will see)

ALC
•   ALC launch (September 2016)
•   ALC Steering Group in place
•   Projects:
     –   Data Analysis as a Service
     –   ULTRA – high-throughput HPC platform for tomographic image analysis
     –   Octopus – ICAT, Job Portal and associated infrastructure for CLF

Supporting STFC for BEIS
•   Prepared 2016 and 2017 RCUK-eI Group BEIS submission
•   Wrote the successful 2018 BEIS cases
What have we done with the first actual resource?
UKT0 – eInfrastructure Award

•   Awarded £1.5M for STFC eInfrastructure in Dec 2017
     –   £1.2M for hardware for “under-supported” communities
     –   £300k for digital assets for facilities users and federated OpenStack development

•   To deliver additional CPU and Storage resources aimed at STFC supported communities with little or
    no compute resource

•   To support the Ada Lovelace Centre

•   To deliver enhancements to OpenStack (see the federation sketch after this list)

•   To deploy a high performance database archive testbed

•   To move STFC nearer towards a UK NeI
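
On the OpenStack point above: purely as an illustration of what federated access could look like from the user side, here is a minimal sketch using the standard openstacksdk client. The cloud names ("stfc-scd", "cambridge") are hypothetical entries in a local clouds.yaml, not actual UKT0 endpoints; the real federation work concerns identity, images and interoperation between such sites.

```python
# Illustrative sketch only (hypothetical cloud names): list instances on two
# OpenStack clouds defined in clouds.yaml, as a user of a federated setup might.
import openstack

for cloud_name in ("stfc-scd", "cambridge"):
    conn = openstack.connect(cloud=cloud_name)   # credentials/endpoints come from clouds.yaml
    servers = list(conn.compute.servers())       # enumerate this cloud's instances
    print(f"{cloud_name}: {len(servers)} instances")
```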

•   UKT0 has established
     –   Functioning iPMB, meeting bi-weekly
     –   Functioning Technical Working Group, meeting bi-weekly
      –   Draft organisational structures for managing future common funding → see Friday talk

•   Hardware: 3000 Cores and 3 PB of disk are deployed

•   Digital staff effort deployed for ALC and StackHPC
     –   Data Movement Service & Virtual Machine Manager being constructed
     –   OpenStack federation being constructed

•   Successful formulation and submission of STFC specific BEIS business case for £10M
    for DiRAC and for £16M for the rest of UKT0 over 4 years
     –   To “keep the lights on and not miss time limited opportunities”

           This was a perfect example of what UKT0 is able to do:
        mobilise the entire community responsible for STFC computing
                     to work together in a coherent way
What is in front of us

What's in front of us

[Screenshot: Google search results]

•   Appoint a Director of Brand and Logo
What is in front of us

•   Make sure the 3000 Cores and 3 PB of disk get used !!!!!!!!!!
     –   Solve the "operations with no resource" problem

•   Take the next steps towards truly shared eInfrastructure
     –   AAAI Pilot
     –   Global Data Movement and Management ?
     –   Site monitoring
     –   Security and security incident response

•   Deliver to new activities with need (LSST DC2, EUCLID in June)

•   Enact proposed resource request, scrutiny and allocation process
     –   See talk on Friday

•   If the BEIS £16M injection is approved
     –   Have appropriate organisational structures in place → see talk on Friday
     –   Prepare a case to manage these funds, to be reviewed by some STFC body

What is in front of us

•   If the BEIS £16M is approved and UKT0 is successful in being selected to manage it –
    then this is a serious project.
     –   To plan, deploy and manage hardware and digital infrastructure for STFC computing

•   Get serious about resource (staff) for
     –   Operations
     –   Research Software Engineers

•   Pilot use of hybrid commercial cloud

•   This is an important year for UKRI infrastructures
     –   Continue (already significant) effort going into UKRI eInfrastructure Expert Group
     –   and the STFC place in the NeI

Conclusion

•   The need for more, and joined-up, eInfrastructure for STFC is well established

•   The spirit of cooperation across STFC is established – and came from the bottom up
    through UKT0

•   The bulb was planted in 2016 – and grew shoots in 2016/17 on good will and best
    efforts

•   Maybe one of the most important shoots is trust?

•   The shoots have turned into first flowers due to the 2017/18 award

•   There is now every reason to be motivated to continue to put effort into this from each
    activity – now we need the flower garden, and we can all work on it together
Conclusion in pictures

[Image]
Conclusion

[Image]
Additional Material

UKT0 Ethos

[Diagram]

Science Domains remain "sovereign" where appropriate:
•   Activity 1 (e.g. LHC, SKA, LZ, EUCLID, ...): VO management, reconstruction, data management, analysis
•   Ada Lovelace Centre (Facilities users)
•   Activity 3, ....

Share in common where it makes sense to do so:
•   Federated HTC clusters
•   Federated HPC clusters
•   Federated data storage
•   Tape archive
•   Public & commercial cloud access
•   Services: data management, job management, AAI, VO tools, ....
•   Services: monitoring, accounting, incident reporting, ....
Proposed UKT0 Organisational Structures

[Organisation chart]
•   Reports to: STFC Oversight Committee
•   Strategic Advisory Board (SAB) – advises and receives reports
•   Delivery Board (DB) – coordinates on strategy, policy, funding, ....
•   Technical Working Group (TWG) – coordinates all technical matters
•   Resource Review and Allocation Group (RRB)
•   STFCeI resources: SCD, GridPP, DiRAC, Another Site, ALC, Hartree, ....