O2/PDP Status Report
Predrag Buncic
Overview
• Part 1 – Work Package progress report
• Part 2 – Run 3 Computing Model
• Part 3 – Resource estimates for Run 3
Part 1 – Work Package progress reports, addressing:
• Software framework status
• Simulation
• Progress report and update on:
  – Performance of data compression: update on RAW data volume estimates
  – Performance of reconstruction: progress and outlook on using GPUs for reconstruction; update on CPU estimates
  – Data types and data management in Run 3: derived data types and relative size estimates
WP3 · Software process and tools
• 2018: 2200 commits, 80 authors
• The 2017 ALICE campaign to involve developers is working: 40% more developers, 50% more commits
• Currently 350 000 SLOCs
• 25% of the O2 project code is shared with FAIR
• Some O2 components migrated to FairMQ/ROOT for reusability
(Charts: authors, contributions, ALICE code shared with FAIR.)
WP4 – Data Processing Layer (DPL)
• Multi-process, concurrent, message-driven software framework
• The Data Processing Layer (DPL) is the common integration framework for ALICE data-processing needs
• O2 Monitoring and InfoLogger integration; Apache Arrow integration for ROOT upstreamed
• Connected work packages: RDataFrame (WP8), digitization example using DPL (WP12), TPC reconstruction example using DPL (WP13), DataSampling example using DPL (WP7)
• ALICE framework showcased at CHEP2018 Sofia
WP4 – Data Processing Layer (DPL)
• DPL version 1.0
  – Released in November as planned and agreed at the June O2 Technical Board.
  – Real-world demonstrators implemented with it for simulation (digitisation of multiple detectors), reconstruction (TPC chain, MID pre-clustering) and QC (Data Sampling). Synthetic examples for analysis also provided.
  – Extra: integration with O2 Monitoring and Logging. Support for multiple message serialisation techniques, most notably ROOT (for HEP interoperability) and Apache Arrow (for interoperability with the rest of the world). Facility to integrate custom (ALFA) components, the event display and custom GUIs.
• DPL version 2.0
  – June 2019. Focus on integration with O2 Control and Grid deployment. Improved support for simulation, reconstruction and analysis. Goal is the 2019 Data Challenge.
• DPL version 3.0
  – November 2019. Focus on feedback from the Data Challenge, performance optimisation and user friendliness.
(An illustrative workflow declaration is sketched below.)
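To give a feel for what a DPL workflow declaration looks like, the following is a minimal two-device sketch modelled on the public AliceO2 DPL examples. It is not taken from the ALICE repository: the header path, the "TST"/"DATA" identifiers and the exact signatures are assumptions and should be checked against the O2 framework version in use.

```cpp
// Hedged sketch of a two-device DPL workflow (producer -> consumer).
#include "Framework/runDataProcessing.h"  // provides main() for a DPL workflow executable

using namespace o2::framework;

WorkflowSpec defineDataProcessing(ConfigContext const&)
{
  return WorkflowSpec{
    // Producer device: publishes one integer message per processing cycle.
    DataProcessorSpec{
      "producer",
      Inputs{},
      {OutputSpec{"TST", "DATA"}},
      AlgorithmSpec{[](ProcessingContext& ctx) {
        ctx.outputs().make<int>(Output{"TST", "DATA"}) = 42;
      }}},
    // Consumer device: the framework creates and wires the message channel.
    DataProcessorSpec{
      "consumer",
      {InputSpec{"in", "TST", "DATA"}},
      Outputs{},
      AlgorithmSpec{[](ProcessingContext& ctx) {
        auto value = ctx.inputs().get<int>("in");
        (void)value;  // a real device would process or forward the data here
      }}}};
}
```

The key design point the sketch illustrates: the user only declares devices, their inputs and outputs; the framework turns the declaration into a set of communicating processes and handles the message transport.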
WP12 – Simulation: parallel high-performance simulation framework
• Geant4-based development of a scalable and asynchronous parallel simulation system based on independent actors and FairMQ messaging
• Supports parallelization of simulation for any Virtual MC engine
• Supports sub-event parallelism, making simulation jobs more fine-grained for improved scheduling and resource utilization
• Demonstrated strong-scaling speedup (24-core server) for workers collaborating on a few large Pb-Pb events
• Small memory footprint due to a particular "late-forking" technique (demonstrated with Geant4; sketched conceptually below)
• As a result, the wall time to treat a Pb-Pb event is reduced from O(h) to a few minutes, which gives access to opportunistic resources
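The sketch below illustrates the late-forking idea only at the level of the operating system: expensive read-only state is initialised once, then workers are forked so that it is shared copy-on-write while each worker handles one sub-event. It is a conceptual illustration, not the actual O2/FairMQ implementation.

```cpp
// Conceptual "late forking": initialise once, fork workers that share the
// read-only state copy-on-write, one sub-event per worker.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

struct DetectorGeometry { /* large, read-only after initialisation */ };

DetectorGeometry initGeometry() {
  // In the real system this is the costly Geant4/VMC initialisation.
  return DetectorGeometry{};
}

void simulateSubEvent(const DetectorGeometry&, int subEventId) {
  // Placeholder for transporting the particles of one sub-event.
  std::printf("worker %d simulating sub-event %d\n", getpid(), subEventId);
}

int main() {
  DetectorGeometry geo = initGeometry();   // done once, before forking
  const int nWorkers = 4;

  std::vector<pid_t> children;
  for (int i = 0; i < nWorkers; ++i) {
    pid_t pid = fork();                    // late fork: geometry pages are shared
    if (pid == 0) {                        // child: simulate one sub-event and exit
      simulateSubEvent(geo, i);
      _exit(0);
    }
    children.push_back(pid);
  }
  for (pid_t pid : children) waitpid(pid, nullptr, 0);  // parent collects the workers
  return 0;
}
```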
WP12 – Simulation
Milestone: simulation of all detectors in O2
• Constant progress within detector-specific developments
• Recent updates include:
  – ZDC development kick-started (code sprint), including a first implementation of the physics response; targeting the digitization phase but waiting for input from the read-out team
  – Refinement of the EMCAL digitization, with a first stable version
  – MUON geometry additions (MID)
  – Several improvements for ITS, TPC, etc.: QED background inclusion in ITS digitization, TPC distortion treatment in digitization
• Planning: TRD identified manpower to start focused developments in 12/2018
WP13 – Reconstruction on GPU (synchronous)
(Illustration of the Run 3 tracking problem: a 2 ms timeframe, 10% of the total.)
WP13 – GPU reconstruction (synchronous)
• Already running on GPU (speedup: NVIDIA 1080 vs a single CPU core @ 4.2 GHz):
  – TPC track finding (speedup: 40x)
  – TPC track fit (speedup: 35x)
  – ITS track finding (speedup: 10x, to be improved)
• Will definitely run on GPU (ongoing development):
  – TRD track following + fit (expected speedup: O(30x))
  – ITS track fit (expected speedup: O(30x))
  – TPC cluster transformation, TPC track merging, TPC compression (except the final entropy coding step)
• Ongoing studies: TPC dE/dx computation, TPC-ITS matching, global refit, entropy encoding, ITS V0 finder
• Summary table: TPC track finding 40x; TPC/ITS track fit >30x; ITS track finding >10x; TRD tracking/matching/refit under development
• 40x speedup factor on GPU vs the 28x used in the TDR estimates (O2 TDR, Table 6.3)
WP13 – CPU reconstruction (asynchronous)
Reconstruction performance improvements for O2: CPU time for an average Pb-Pb event on one CPU core: Run 2: 79 s → Run 3: 11 s (conservative estimate), i.e. 3x faster than the TDR estimates.
Speedup on CPU, Run 3 (O2) vs Run 2 (AliRoot):
• TPC tracking: 20x
• ITS tracking: 17x
• TPC ion tail + common mode correction: not needed in Run 3
• Rest (estimate): >3x
(Bar chart: CPU seconds per component for Run 2 and Run 3, including the other detectors.)
WP13 – Milestones and plans (CPU / GPU)
• TPC: tracking done / done (*); dE/dx Q1/2019; compression Q1/2019 / Q2/2019 (**)
• ITS: track finding done, extra passes Q2/2019 / done; track fitting done / Q2/2019; ITS-TPC matching done, afterburner Q2/2019 / Q3/2019; compression Q1/2019 / Q2/2019 (**)
• TRD: matching to ITS-TPC done / Q4/2018
• TOF: matching to ITS-TPC done (in validation) / Q2/2019
• EMCAL/PHOS: clustering Q2/2019 (old estimate, no update) / -
• MUON: MCH ? / -; MID done (in validation)
• MFT: standalone tracking Q1/2019 / -; matching to MCH depends on the MCH schedule
• FIT: T0+ reconstruction done (in validation) / -
(*) TPC reconstruction is operational: tracking runs as a DPL device, the other steps still need to be interfaced to DPL.
(**) Feasibility of entropy compression on GPU is under study.
Calibration:
• TPC Vdrift calibration: Q4/2018
• TPC distortion calibration with track residuals: Q4/2018
• TPC distortion calibration with digital currents: Q2/2019
• TOF channel calibration: Q1/2019
WP11 – Conditions database
• Metadata central server ✓
• REST API ✓
• Offloading calibration files to existing Grid SEs ✓
  – HTTP interface to EOS yet to be enabled; Xrootd protocol redirection until then
• Fits the use case of CCDB and QC queries well: 15 kHz INSERT, 12 kHz SELECT
• Meets the requirements for the CCDB and QC target rates (O2 TDR, Ch. 4.11)
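To make the REST access pattern concrete, here is a hedged sketch of fetching a calibration object over HTTP with libcurl. The URL, object path and timestamp convention below are hypothetical and only illustrate the idea of addressing an object by path and validity time; they are not the actual CCDB endpoint or API.

```cpp
// Hedged sketch: HTTP GET of a conditions object (hypothetical endpoint).
#include <curl/curl.h>
#include <cstdio>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* userp) {
  static_cast<std::string*>(userp)->append(data, size * nmemb);
  return size * nmemb;
}

int main() {
  curl_global_init(CURL_GLOBAL_DEFAULT);
  CURL* curl = curl_easy_init();
  if (!curl) return 1;

  std::string payload;
  // Hypothetical object path and validity timestamp (ms since epoch).
  const std::string url = "http://ccdb.example.cern.ch/TPC/Calib/VDrift/1541030400000";
  curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
  curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);  // follow redirects, e.g. to a Grid SE
  curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
  curl_easy_setopt(curl, CURLOPT_WRITEDATA, &payload);

  CURLcode rc = curl_easy_perform(curl);
  if (rc == CURLE_OK)
    std::printf("received %zu bytes\n", payload.size());
  curl_easy_cleanup(curl);
  curl_global_cleanup();
  return rc == CURLE_OK ? 0 : 1;
}
```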
Part 2 – Run 3 computing model
• Physics program and requirements
• Compression and data rates
• Data processing workflows and schedule
• Data management and policies
• Computing model parameters
• Resource requirements until the end of LS2
Physics Requirements
• Physics program: ALICE LoI, Section 1.4; O2 TDR, Chapter 2
• Requirements as number of collisions (Run 3 + Run 4):
  – Pb-Pb: 10^11 collisions
    • 10 nb^-1 at nominal solenoid magnetic field
    • 3 nb^-1 at reduced solenoid magnetic field
    • Yearly: 4 years with 2.3×10^10 collisions, 1 year with 1.1×10^10 collisions
  – pp: 1.6×10^11 collisions
    • Yearly: 6 years with 2.7×10^10 collisions
Expected number of recorded collisions
• Average minimum bias readout rate: 23 kHz
• Combined data-taking efficiency: 0.57 (0.48 assumed in the O2 TDR)
• Total number of collisions in one yearly HI period: 2.6×10^10 (2.3×10^10 in O2 TDR, Table 2.3); a worked estimate follows below
• O2 input data rate: 3.5 TB/s (includes continuous, unmodified TPC raw data)
• Data rate after baseline correction and zero suppression of the TPC data:
  – 22 MB per uncompressed event
  – up to 50 kHz × 22 MB = 1.1 TB/s (O2 TDR, Table 3.1)
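As a rough cross-check (not from the slides): assuming a heavy-ion period of about 24 days, the length quoted later for the Pb-Pb buffer test, the total follows from rate times efficiency times live time,

$$ N_{\text{coll}} \simeq R_{\text{MB}} \times \varepsilon \times T_{\text{HI}} = 23\,\text{kHz} \times 0.57 \times (24 \times 86400\,\text{s}) \approx 2.7 \times 10^{10}, $$

consistent with the quoted 2.6×10^10; the exact value depends on the assumed period length.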
O2 Data Rates (in GB/s) for Pb-Pb @ 50 kHz
• TDR: raw data 1081 GB/s (TPC 1000, ITS 40, TRD 20, others 21); after clusterization: 400 GB/s; followed by track-model compression, noise suppression, charge transformation, etc.
• Current status: raw data 3521 GB/s (TPC 3456, ITS 40, TRD 4, others 21); after clusterization: 570 GB/s; followed by the same compression steps
• TRD: 20 → 4 GB/s, because central collisions instead of minimum bias were used in the clusterizer for the TDR estimate
• ITS: 26 → 5 GB/s, because the noise of the ALPIDE chip is significantly better than assumed
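The compression chain sketched above ends with an entropy-coding step (also listed on the GPU-reconstruction slide). As a side illustration, and not the ALICE coder, the toy function below estimates the Shannon entropy bound that any lossless entropy coder can approach for a given byte block.

```cpp
// Toy estimate of the entropy-coding bound (ideal compressed size) of a byte buffer.
#include <array>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

double entropyBoundBytes(const std::vector<uint8_t>& data) {
  std::array<uint64_t, 256> counts{};          // histogram of byte values
  for (uint8_t b : data) ++counts[b];

  double bits = 0.0;
  for (uint64_t c : counts) {
    if (c == 0) continue;
    const double p = static_cast<double>(c) / data.size();
    bits += c * -std::log2(p);                 // -log2(p) bits per occurrence
  }
  return bits / 8.0;                           // ideal compressed size in bytes
}

int main() {
  std::vector<uint8_t> block(4096, 0x00);      // highly redundant example block
  for (size_t i = 0; i < block.size(); i += 16) block[i] = 0xFF;
  std::printf("ideal size: %.1f bytes (raw: %zu bytes)\n",
              entropyBoundBytes(block), block.size());
}
```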
WP1 · Data types (O2 TDR, Table 4.2)
• Transient: exists during the process lifetime
• Temporary: removed after a predefined lifetime
• Persistent: remains on custodial storage
WP1 · AOD content
WP1 · AOD size
• AOD data size is about 15% of the persistent data volume (pie chart: CTF 85%, AOD 15%)
Run 3 – Computing Model (O2 TDR, Ch. 4)
• O2 (1): calibration; RAW -> CTF -> ESD -> AOD reconstruction; compression
• T0/T1 (1..n): CTF archiving; calibration; CTF -> ESD -> AOD reconstruction
• T2/HPC (1..n): simulation, MC -> CTF -> ESD -> AOD
• AF (1..3): analysis, AOD -> HISTO, TREE
• Subject to fine tuning
• MC can be run as a backfill
Data Processing Timeline (Pb-Pb)
(Timeline, November of one year through December of the next: synchronous reconstruction and calibration during data taking, followed by asynchronous reconstruction passes interleaved with simulation campaigns.)
• Estimates based on the new and improved CPU- and GPU-based reconstruction and data compression performance
• Simulation requirements remain the same as in the TDR
Recap of the Run 3 computing model
• Assuming pp (2 weeks) and Pb-Pb (4 weeks) data taking in every year of Run 3.
• RAW data (CTF) compressed by the O2 facility and stored on its disk buffer.
• 1/3 of the CTFs exported, archived and processed at the T1s.
• 2/3 of the CTFs processed by O2 + T0 and archived at T0.
• One calibration and two reconstruction passes over the raw data each year.
• CTFs and ESDs removed from disk before a new data-taking period starts.
• Only AODs are kept on T0/T1 disk and archived to tape.
• Only one copy of a given data type (CTF, AOD) on disk and one on tape.
• 10% of the AODs sampled and sent to the Analysis Facility for quick analysis and cut tuning.
• Analysis of the full data sample across T0/T1s only upon Physics Board approval.
• Reprocessing of the first two years during LS3.
• Gradual removal of Run 2 data and cessation of Run 2 related processing.
• Growth of WLCG resources assumed to be limited to 20% per year.
Run 3 Processing sequence
• The 60 PB disk buffer size remains unchanged
(Diagram: Run 3 processing sequence.)
WP15 – Disk buffer
• Objective: design and deliver the O2-attached storage and assure its compatibility with distributed Grid storage and with existing local and distributed data management tools
• O2 TDR, Ch. 6.10, 6.10.3
• All current milestones completed:
  – #149: Requirements for the O2 event buffer
  – #150: Basic design of the event buffer with management software selection
  – #151: Document "The ALICE O2 data storage", prepared together with CERN IT, including cost estimate scenarios
  – #152: Design and development of tools for event buffer performance evaluation (common project with the US: LBNL and ORNL)
WP15 – Ongoing work
• #153: Design of an abridged evaluation process, testing in production (expected completion 31/12/2018)
  – Aligned with the Pb-Pb data buffer test
  – "10% validation" test for the O2 storage
  – Includes all data flows foreseen in the TDR, at scaled data rates
  – The evaluation will continue during the Pb-Pb run (24 days)
WP15 – Remaining work and decision to take
• #154: Buffer performance summary after the 2018 Pb-Pb data taking (due April 2019)
  – After at least half of the 2018 Pb-Pb data processing is completed
  – Full evaluation of the asynchronous CTF processing workflow
• #155: Decision on the disk buffer location from CERN IT (following an official request, due December 2018)
  – Has a direct bearing on installation and support
  – Possible effect on integration with CTA (the CASTOR replacement system)
Run 3 deletion/parking policy
(Diagram: Calibrate → Reconstruct → Re-reconstruct → Archive.)
• With the exception of raw data (CTF) and derived analysis data (AOD), all other intermediate data created at the various processing stages are transient (removed after a given processing step) or temporary (with a limited lifetime); CTFs and AODs are archived to tape.
• Given the limited size of the disk buffers at O2 and the Tier 1s, all CTF data collected in the previous year will have to be removed before a new data-taking period starts.
• All data not finally processed during this period will remain parked on tape until the next opportunity for re-processing arises: LS3.
Run 3 Processing workflows
• Initially we assumed that 100% of the AODs would be stored at the AFs (now 10%)
• AODs will be systematically archived at the T1s and recalled from tape on demand in a coordinated fashion
(Diagram: processing workflow, with the 10% AOD sample sent to the AF.)
Analysis Facility
(Diagram: analysis train with tasks T1–T6 reading shared input file(s) and producing output file(s).)
• Motivation
  – Analysis remains I/O bound in spite of attempts to make it more efficient by using the train approach (conceptual sketch below)
• Solution
  – Collect AODs on dedicated sites that are optimized for fast processing of large local datasets
  – Run organized analysis on local data as we do today on the Grid
  – Requires 20,000–30,000 cores and 5–10 PB of disk on a very performant file system
  – Such sites could be selected from among the existing T1s (or even T2s), but ideally this would be a purpose-built facility optimized for this workflow
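The sketch below shows only the idea behind the train approach the slide refers to: several analysis tasks ("wagons") share a single pass over the input data, so the I/O cost is paid once rather than once per analysis. Types and task contents are illustrative, not the ALICE framework API.

```cpp
// Conceptual analysis train: many analyses, one pass over the data.
#include <functional>
#include <iostream>
#include <vector>

struct Event { double pt = 0.0; };  // stand-in for a reconstructed AOD event

int main() {
  // Each wagon is an independent analysis acting on the same event.
  std::vector<std::function<void(const Event&)>> wagons = {
    [](const Event& e) { if (e.pt > 1.0) { /* fill a spectrum histogram */ } },
    [](const Event& e) { if (e.pt > 5.0) { /* select jet candidates */ } },
  };

  // Single pass over the (simulated) input data set.
  std::vector<Event> dataset = {{0.4}, {1.7}, {6.2}};
  for (const Event& event : dataset)
    for (auto& wagon : wagons)
      wagon(event);                 // every analysis sees the event once

  std::cout << "processed " << dataset.size() << " events for "
            << wagons.size() << " analyses in one pass\n";
}
```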
WP14 – Status of analysis framework
• Transition of Run 1/Run 2 analysis to ROOT 6
  – Modifications of the analysis train system: done
  – Modifications of the analysis tasks: ongoing
• Development of RDataFrame-based analysis
  – Development of an Apache Arrow API for RDataFrame (contributed to ROOT 6)
  – Prototype of a parallel analysis chain using the Apache Arrow layout, the O2 Data Processing Layer and RDataFrame: multiple data-decompressing devices and multiple analysis clients; presented at CHEP2018 (a minimal sketch of the Arrow/RDataFrame coupling is shown below)
• Prototype of the O2 Analysis Facility (O2 TDR, Ch. 4.6): successfully completed in November 2017
  – Tested with an analysis train and a Run 3 prototype analysis
• Design of the LEGO train system for Run 3: ongoing
  – Evolution of the single-process train into a multi-process parallel analysis framework
  – Each analysis represented as a FairMQ device
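A minimal sketch of feeding an in-memory Apache Arrow table to RDataFrame is shown below. It assumes a ROOT build with Arrow support; the header location of the Arrow data source and the single "pt" column are illustrative assumptions, not the ALICE analysis schema.

```cpp
// Hedged sketch: Arrow table -> RDataFrame selection.
#include <arrow/api.h>
#include <ROOT/RDataFrame.hxx>
#include <ROOT/RArrowDS.hxx>  // Arrow data source; header location may differ across ROOT versions
#include <iostream>
#include <memory>
#include <vector>

int main() {
  // Build a tiny in-memory Arrow table with one column of track pT values.
  arrow::DoubleBuilder builder;
  if (!builder.AppendValues({0.5, 1.2, 2.3, 0.9}).ok()) return 1;
  std::shared_ptr<arrow::Array> ptArray;
  if (!builder.Finish(&ptArray).ok()) return 1;
  auto schema = arrow::schema({arrow::field("pt", arrow::float64())});
  auto table  = arrow::Table::Make(schema, {ptArray});

  // Expose the Arrow table to RDataFrame and run a simple selection.
  auto rdf   = ROOT::RDF::MakeArrowDataFrame(table, {"pt"});
  auto count = rdf.Filter("pt > 1.0").Count();
  std::cout << "tracks with pt > 1 GeV/c: " << *count << "\n";
}
```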
Part 3 – Resource estimates for Run 3
• O2 (disk, CPU, GPU)
• T0 (disk, CPU, tape)
• T1 (disk, CPU, tape)
• T2 (disk, CPU)
• Analysis Facility (disk, CPU)
Resource estimate for Run 3: CoCoTime
• Multi-year resource simulation of an experiment's computing model:
  – CPU
  – Disk
  – Tape
  – Network
(A toy illustration of such a projection is sketched below.)
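CoCoTime itself is not shown here; as a toy illustration of the kind of compound-growth projection such a tool performs, the sketch below rolls an arbitrary starting capacity forward at the 20% yearly growth assumed in the flat-budget scenario.

```cpp
// Toy multi-year capacity projection under a fixed yearly growth rate.
#include <cstdio>

int main() {
  double capacity = 100.0;          // arbitrary starting capacity (e.g. PB or kHS06)
  const double growth = 0.20;       // 20% per year under a flat budget
  for (int year = 2019; year <= 2026; ++year) {
    std::printf("%d: %.1f\n", year, capacity);
    capacity *= 1.0 + growth;       // capacity affordable the next year for the same money
  }
}
```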
O2 Facility: Disk
• The 60 PB disk buffer is sufficient to hold one standard year of data taking (pp + Pb-Pb)
• In case of better-than-expected LHC performance we can accommodate up to +20% of data volume (but without redundancy and without a safety margin for operations)
• The RAW storage cost is within the O2 budget at today's price point
O2 Facility: CPU and GPU
• Thanks to the performance improvements of the GPU- and CPU-based reconstruction, we now estimate that:
  – 1500 modern GPUs will be sufficient for synchronous processing (vs 3000 in O2 TDR, Table 10.2)
  – 360 kHS06 of CPU capacity will be needed for asynchronous processing (vs 960 kHS06 in O2 TDR, Table 10.2)
T0
• Assuming that our resource requests until the start of Run 3 are fulfilled, we will fit within 20% yearly growth under the fixed-budget scenario for both CPU and disk.
• The total disk requirements shown in these plots do not include a 10% operational margin.
• The total CPU requirements do not account for the (in)efficiency of Grid jobs (the currently observed efficiency is 80%).
T1
• Assuming that our resource requests until the start of Run 3 are fulfilled, we will fit within 20% yearly growth under the fixed-budget scenario for both CPU and disk.
Tapes at T0/T1
• Since we will now be archiving CTFs and AODs on tape, our tape growth will be 45% during data-taking years at T0 (25% at the T1s) and 5% during LS3.
T2
• The T2s will be dedicated to simulation and, as described in the TDR, we plan to rebalance the disk/CPU ratio, favoring more aggressive CPU growth (30% per year) while keeping disk growth at 5% per year (see O2 TDR, Ch. 4.12.2)
• These estimates do not account for fast-simulation requirements, including event mixing and embedding (see O2 TDR, Ch. 8.2)
Analysis Facility
• Compared to the TDR, where we assumed that 100% of the AODs would be exported to the AF, we now assume that only a 10% sample of the AODs will be placed at the AF for prompt analysis (see O2 TDR, Ch. 4.6)
• Analysis of the full data sample will be run across the T1s in an organized way and upon approval by the Physics Board
Conclusions
• We have achieved a significant improvement in key performance figures that allows us to reduce the resource needs for Run 3:
  – Asynchronous reconstruction improved by 3x
  – Number of GPUs for synchronous reconstruction reduced by 2x
  – Persistent data (CTF, AOD) within 10% of the original estimates
  – Disk buffer size unchanged (60 PB)
• The projection of our computing resource needs presented here illustrates that, under the present assumptions, the computing and storage capacity of the O2 facility, in combination with the expected evolution of the Grid assuming a nominal growth of 20%, will be sufficient to process the ALICE Run 3 data
  – These estimates represent the minimal needs based on an optimized computing model, with no redundancy and minimal contingency
  – This is NOT our definitive resource request to WLCG but an indication that ALICE will not break the flat-funding scenario during Run 3
• The same computing model, resulting in similar requirements, will be applied to Run 4
Backup
Run 3 CPU shares
(Pie chart of the CPU share among O2, T0, T1, T2 and the AF, with slices of 46%, 20%, 17% and 17% for the four largest contributors. Bar chart: CPU in kHS06 per year, 2019–2026.)
Run 3 Disk shares
(Pie chart of the disk share by site: T0 28%, T1 28%, O2 20%, T2 15%, AF 9%. Bar chart: disk in PB per year for O2, T0, T1, T2 and AF, 2019–2026.)
Run 3 Tape shares
(Pie chart of the tape share by site: T0 65%, T1 35%. Bar chart: tape in PB per year for T0 and T1, 2019–2026.)