ATLAS High Level Trigger within the multi-threaded software framework AthenaMT
Rafal Bielski, on behalf of the ATLAS Collaboration
CERN, 1211 Geneva 23, Switzerland
E-mail: rafal.bielski@cern.ch

Abstract. Athena is the software framework used in the ATLAS experiment throughout the data processing path, from the software trigger system through offline event reconstruction to physics analysis. The shift from high-power single-core CPUs to multi-core systems in the computing market means that the throughput capabilities of the framework have become limited by the available memory per process. For Run 2 of the Large Hadron Collider (LHC), ATLAS has exploited a multi-process forking approach with the copy-on-write mechanism to reduce the memory use. To better match the increasing CPU core count and, therefore, the decreasing available memory per core, a multi-threaded framework, AthenaMT, has been designed and is now being implemented. The ATLAS High Level Trigger (HLT) system has been remodelled to fit the new framework and to rely on common solutions between online and offline software to a greater extent than in Run 2. We present the implementation of the new HLT system within the AthenaMT framework, which will be used in ATLAS data-taking during Run 3 (2021 onwards) of the LHC. We also report on interfacing the new framework to the current ATLAS Trigger and Data Acquisition (TDAQ) system, which aims to bring increased flexibility whilst needing minimal modifications to the current system. In addition, we show some details of the architectural choices made to run the HLT selection inside the ATLAS online data flow, such as the handling of the event loop, the returning of the trigger decision and the handling of errors.

1. AthenaMT motivation and design
Athena [1] is a software framework used in the ATLAS experiment [2] at all stages of the event data processing path, from simulation and trigger, through event reconstruction, to physics analysis. It is based on the inter-experiment framework Gaudi [3], designed in the early 2000s. At the time of the initial design, there was no strong motivation for concurrent event processing in experiments like ATLAS, and the architectural choices made for Gaudi and Athena assumed only a single-process and single-thread mode of operation. However, over the last decade, the computing market has transitioned towards many-core processors and the unit price of memory has plateaued. The combination of these two trends, presented in Figures 1 and 2, implies that high throughput of data processing in software requires concurrent execution and memory sharing.

Figure 1. CPU frequency and logical core count trends between 1972 and 2018, based on data collected by K. Rupp [4]. Since around 2005, higher processing throughput is achieved by increasing the number of cores rather than their clock frequency.

Figure 2. Memory price trends between 1972 and 2018, based on data collected by J.C. McCallum [5]. The orange line represents a fit to data from the years 1972–2009. A deviation from the logarithmic price decrease is observed since around 2012.

In Run 2 of the Large Hadron Collider (LHC) [6], the ATLAS experiment software already suffered from suboptimal use of resources, with the throughput of the simulation and reconstruction workflows limited by the available memory. A multi-process approach based on forking after initialisation (AthenaMP) was adopted as an intermediate solution to this problem. A reduction of the overall memory requirements was possible thanks to the copy-on-write mechanism. A similar approach has also been used in the ATLAS Trigger system.
At the same time, work started on redesigning the core framework of both Gaudi and Athena to support multi-threaded execution in a native, efficient and user-friendly manner. The effort aims to deliver production-ready software for Run 3 of the LHC, starting in spring 2021.

The multi-threaded Athena framework, AthenaMT, is based on Gaudi Hive [7], which provides a concurrent task scheduler based on Intel Threading Building Blocks (TBB) [8]. The framework supports both inter-event and intra-event parallelism by design and defines the algorithm execution order based on data dependencies declared explicitly as ReadHandles and WriteHandles. When all input dependencies of an algorithm are satisfied, the algorithm is pushed into a TBB queue which takes care of its execution. The combination of inter-event and intra-event parallelism is depicted in Figure 3, where four threads concurrently execute algorithms processing data from either the same or different events. More efficient memory usage can be achieved thanks to a large share of event-independent data.

Figure 3. Schematic flowchart describing concurrent event processing in AthenaMT. Each shape corresponds to a type of algorithm and each colour corresponds to data associated with one event. Data flow is represented by arrows.
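As an illustration of how these declared dependencies drive the scheduling, the sketch below shows a reconstruction-style algorithm declaring one input and one output through handle keys, so that the scheduler can derive the execution order without any explicit ordering of algorithms. The container types (CellContainer, ClusterContainer) and property names are hypothetical, and the interfaces shown are only an approximation of the actual Athena classes.

// Schematic sketch of an AthenaMT-style algorithm declaring its data
// dependencies. The scheduler derives the execution order from the
// ReadHandleKey (input) and WriteHandleKey (output) declarations.
// Container types and property names are placeholders; interfaces approximate.
#include <memory>
#include "AthenaBaseComps/AthReentrantAlgorithm.h"
#include "StoreGate/ReadHandleKey.h"
#include "StoreGate/WriteHandleKey.h"
#include "StoreGate/ReadHandle.h"
#include "StoreGate/WriteHandle.h"

class ClusterMakerAlg : public AthReentrantAlgorithm {
public:
  using AthReentrantAlgorithm::AthReentrantAlgorithm;

  StatusCode initialize() override {
    // Initialising the keys makes the dependencies visible to the scheduler
    ATH_CHECK(m_cellsKey.initialize());
    ATH_CHECK(m_clustersKey.initialize());
    return StatusCode::SUCCESS;
  }

  StatusCode execute(const EventContext& ctx) const override {
    // Read the input produced by an upstream algorithm for this event
    SG::ReadHandle<CellContainer> cells(m_cellsKey, ctx);
    // Record the output; algorithms depending on it become runnable
    SG::WriteHandle<ClusterContainer> clusters(m_clustersKey, ctx);
    ATH_CHECK(clusters.record(std::make_unique<ClusterContainer>()));
    // ... fill clusters from cells ...
    return StatusCode::SUCCESS;
  }

private:
  SG::ReadHandleKey<CellContainer> m_cellsKey{
      this, "CellsKey", "Cells", "Input cell collection (placeholder)"};
  SG::WriteHandleKey<ClusterContainer> m_clustersKey{
      this, "ClustersKey", "Clusters", "Output cluster collection (placeholder)"};
};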
2. High Level Trigger in AthenaMT
Taking advantage of the major changes in the Athena core software, a decision was made to considerably redesign the ATLAS High Level Trigger (HLT) framework, which is part of Athena. In Run 2, the HLT software used a dedicated top-level algorithm implementing its own sub-algorithm scheduling procedure. All trigger algorithms implemented an HLT-specific interface and any offline reconstruction algorithm required a wrapper to be used in the HLT. The Run-3 HLT framework will take full advantage of the Gaudi scheduler and no HLT-specific interfaces will be required.

The main functional requirements of the ATLAS HLT have been incorporated into the design of the Gaudi Hive framework. These include the processing of partial event data (regional reconstruction) and the early termination of an execution path if the event fails the trigger selection. Regional reconstruction is achieved with Event Views, which allow an algorithm to be executed in the same way on either partial or full event data. The Event Views are prepared by Input Maker algorithms scheduled before the reconstruction algorithms. Early termination is achieved with Filter algorithms. The scheduling of all algorithms in the HLT is assisted by a configuration-time Control Flow which defines sequences of Filter → Input Maker → Reconstruction → Hypothesis algorithms. If the hypothesis testing fails in one sequence, the filter step of the next sequence ensures the early termination of the given path.

The Control Flow creates a diagram including all possible execution paths at configuration time. The diagram is built from the list of all physics selections configured to be executed and does not change during runtime. However, each trigger “chain”, corresponding to one path through the diagram, can be individually disabled during runtime or executed only on a fraction of events. An example fragment of the Control Flow diagram is presented in Figure 4.

Figure 4. Example fragment of the HLT Control Flow diagram presenting several execution paths through sequences of Filter → Input Maker → Reconstruction → Hypothesis algorithms. Trigger chains correspond to different paths through the fixed control flow diagram. Filter algorithms run at the start of each step and implement the early rejection; Input Maker algorithms restrict the following reconstruction to a region of interest; Reconstruction algorithms process detector data to extract features; Hypothesis algorithms execute hypothesis testing (e.g. pT > 10 GeV) for all active chains. Data dependencies define how the algorithms are scheduled.
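The early-rejection logic of the step structure can be illustrated with the following self-contained sketch, in which a hypothesis stage tests each still-active chain against its threshold and the filter of the next step lets the path continue only if at least one chain remains active. The types, function names and chain names are purely illustrative and do not correspond to the actual HLT framework classes.

// Self-contained illustration of the per-step early rejection in the HLT
// control flow: the hypothesis marks which chains pass, the filter of the
// next step only lets the path continue if any chain is still active.
// Names are illustrative, not the real framework classes.
#include <iostream>
#include <string>
#include <vector>

struct Chain {
  std::string name;
  double ptThresholdGeV;  // e.g. a pT > 10 GeV hypothesis
  bool active = true;
};

// Hypothesis step: test each still-active chain against the reconstructed pT
void runHypoStep(std::vector<Chain>& chains, double recoPtGeV) {
  for (auto& chain : chains) {
    if (chain.active && recoPtGeV < chain.ptThresholdGeV) {
      chain.active = false;  // chain rejected at this step
    }
  }
}

// Filter step: the next reconstruction step is scheduled only if this returns true
bool runFilterStep(const std::vector<Chain>& chains) {
  for (const auto& chain : chains) {
    if (chain.active) return true;
  }
  return false;  // early termination of this execution path
}

int main() {
  std::vector<Chain> chains{{"HLT_mu10", 10.0}, {"HLT_mu20", 20.0}};
  runHypoStep(chains, 14.0);  // step-1 reconstruction measured pT = 14 GeV
  if (runFilterStep(chains)) {
    std::cout << "At least one chain active, run step 2\n";
  } else {
    std::cout << "All chains rejected, stop this path early\n";
  }
}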
3. HLT within the TDAQ infrastructure
The ATLAS Trigger and Data Acquisition (TDAQ) infrastructure, presented in Figure 5, consists of the detector readout electronics, a hardware-based Level-1 (L1) trigger, an HLT computing farm and data flow systems. Data from the muon detectors and calorimeters are analysed by the L1 system at the LHC bunch crossing rate of 40 MHz in order to select potentially interesting events and limit the downstream processing rate to a maximum of 100 kHz. In addition to the accept decision, L1 also produces Region-of-Interest (RoI) information, which seeds the regional reconstruction of events in the HLT and will serve as the input to Event View creation in the new software. The HLT computing farm consists of around 40 000 physical cores executing Athena to refine the trigger decision and reduce the output rate to around 1.5 kHz, which can be written to permanent storage.

Figure 5. Functional diagram of the ATLAS TDAQ system showing typical peak rates and bandwidths through each component in Run 2 [9].

The HLT software can be used both in online data processing and in offline reprocessing or simulation. Although the event processing sequence is the same in both cases, the online processing requires a dedicated layer of communication with the TDAQ system. This layer implements the TDAQ interfaces for input/output handling, online-specific error handling procedures and an additional time-out watcher thread which is not needed offline.

Each node in the HLT farm runs a set of applications presented in Figure 6. The main application responsible for event processing is the HLT Multi-Process Processing Unit (HLTMPPU), which loads an instance of Athena. After initialisation, the process is forked to achieve memory savings in a multi-process execution. The mother process only monitors the children and does not participate in event processing. Each child processes events by executing Athena methods and transferring the inputs and outputs to/from the Data Collection Manager (DCM), which communicates with the data flow infrastructure.

Figure 6. Structure of the HLT Processing Unit applications and the communication between them. The HLT Run Control (HLTRC) executes state transitions (e.g. start/stop run), the HLTMPPU mother process initialises, forks and monitors the children, each HLTMPPU child executes the Athena event loop, and the DCM provides event fragments and processes the HLT result.
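The forking scheme can be sketched with the minimal POSIX example below: the application initialises once, forks a number of children which share the initialised state through copy-on-write, and the mother process only waits for and monitors the children. This is a schematic sketch of the pattern rather than the actual HLTMPPU code; the number of children and the initialise/runEventLoop functions are placeholders.

// Minimal sketch of the fork-after-initialisation pattern used to share
// memory between worker processes via copy-on-write. Not the actual
// HLTMPPU implementation; initialise()/runEventLoop() are placeholders.
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <vector>

void initialise()   { /* load configuration, conditions, detector geometry */ }
void runEventLoop() { /* child: request, process and return events */ }

int main() {
  initialise();  // large read-only state allocated before forking

  const int nChildren = 4;  // placeholder; tuned to the available cores
  std::vector<pid_t> children;
  for (int i = 0; i < nChildren; ++i) {
    pid_t pid = fork();
    if (pid == 0) {  // child: shares the initialised memory via copy-on-write
      runEventLoop();
      _exit(0);
    }
    children.push_back(pid);
  }

  // Mother process: does not process events, only monitors the children
  for (pid_t pid : children) {
    int status = 0;
    waitpid(pid, &status, 0);
    std::printf("child %d finished with status %d\n", int(pid), status);
  }
  return 0;
}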
The design of the TDAQ infrastructure will not change in Run 3 and the HLT Processing Unit will consist of the same applications as in Run 2. However, the data flow between them will change considerably to make use of the new AthenaMT HLT software framework. The multi-process approach will remain in use, but the Athena instance within each HLTMPPU child will be able to process multiple events concurrently using multiple threads. This design provides flexibility for optimising the performance of the system.

In Run 2, the event processing loop was steered by the HLTMPPU and events were pushed into Athena sequentially. In Run 3, Athena will manage the event processing loop and will actively pull events from the DCM through the HLTMPPU when processing slots are available. The new interfaces between Athena, the HLTMPPU and the DCM are presented in Figure 7. The role of the HLTMPPU during event processing is reduced to calling doEventLoop in Athena, monitoring the process and forwarding data flow between Athena and the DCM. Athena requests new events with a getNext call, pulls event fragments during processing with collect and sends the result of the processing to the DCM with eventDone. The processing continues until the DCM signals that there are no more events available for processing.

Figure 7. Simplified call diagram presenting the interaction between the applications within an HLT Processing Unit according to the Run-3 interfaces.
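The pull-based interaction can be sketched as a simple loop. The call names below follow those quoted in the text (doEventLoop, getNext, collect, eventDone), but the types and signatures are illustrative assumptions and not the actual HLTMPPU or DCM interfaces; in the real system several events are in flight concurrently, one per processing slot.

// Schematic sketch of the Run-3 event loop: Athena pulls work from the DCM
// (via the HLTMPPU) whenever a processing slot is free, instead of having
// events pushed into it. Types and signatures are illustrative assumptions.
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct EventData { std::uint64_t eventId; std::vector<std::byte> fragments; };
struct HLTResult { std::uint64_t eventId; bool accepted; std::vector<std::byte> payload; };

// Assumed interface to the data-flow side (the DCM, reached through the HLTMPPU)
class IDataSource {
public:
  virtual ~IDataSource() = default;
  virtual std::optional<EventData> getNext() = 0;               // next event, empty when no more events
  virtual std::vector<std::byte> collect(std::uint64_t id) = 0; // pull further event fragments on demand
  virtual void eventDone(const HLTResult& result) = 0;          // return the HLT result for the event
};

HLTResult process(IDataSource& source, const EventData& event) {
  // Regional reconstruction would request additional detector data here
  auto moreFragments = source.collect(event.eventId);
  bool accepted = !moreFragments.empty();  // placeholder decision logic
  return {event.eventId, accepted, {}};
}

// Simplified, single-threaded stand-in for doEventLoop()
void doEventLoop(IDataSource& source) {
  while (auto event = source.getNext()) {          // loop until "no more events"
    source.eventDone(process(source, *event));     // send the result back to the DCM
  }
}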
4. Summary and outlook
Recent trends in the computing market and software design motivated a large redesign of the Athena framework to fundamentally support multi-threaded event execution. Alongside this adaptation, the ATLAS High Level Trigger is also being re-implemented to achieve closer integration with the multi-threaded core framework and with the offline processing software. The upgrade of both the offline and online Athena software is aimed at Run 3 of the LHC, starting in 2021. The fundamental steps of the design and implementation of the core framework, as well as of the interfaces to the TDAQ infrastructure, have already been completed, and the adaptation of the event processing algorithms and of the supporting and monitoring infrastructure is ongoing. The implementation is under continuous testing within the ATLAS offline software using an application emulating the online environment and serving events from data recorded in Run 2. Further tests are planned with the use of the ATLAS HLT farm, where the new software will be executed using cosmic events, random Level-1 triggers or preloaded Run-2 data.

In addition to the full implementation, the evaluation and optimisation of the performance of the new system is also required before the start of Run 3. This includes the determination of the optimal number of forks, threads and concurrently processed events in the online system, as well as large-scale tests with the full TDAQ infrastructure. New classes of software problems are anticipated with the multi-threaded execution, and measures to minimise their impact and to analyse them have to be defined.

References
[1] ATLAS Collaboration 2019 Athena [software] Release 22.0.1 https://doi.org/10.5281/zenodo.2641997
[2] ATLAS Collaboration 2008 The ATLAS Experiment at the CERN Large Hadron Collider JINST 3 S08003
[3] Barrand G et al. 2001 GAUDI – A software architecture and framework for building HEP data processing applications Comput. Phys. Commun. 140 45–55. See also Gaudi [software] Release v31r0 https://gitlab.cern.ch/gaudi/Gaudi/tags/v31r0
[4] Rupp K 2018 Microprocessor Trend Data https://github.com/karlrupp/microprocessor-trend-data commit 7bbd582ba1376015f6cf24498f46db62811a2919
[5] McCallum J C 2018 Memory Prices (1957-2018) http://jcmit.net/memoryprice.htm [accessed 2019-02-12]
[6] Evans L and Bryant P 2008 LHC Machine JINST 3 S08001
[7] Clemencic M, Funke D, Hegner B, Mato P, Piparo D and Shapoval I 2015 Gaudi components for concurrency: Concurrency for existing and future experiments J. Phys. Conf. Ser. 608 012021
[8] Reinders J 2007 Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism (O'Reilly Media). See also TBB [software] Release 2019 U1 https://github.com/intel/tbb/tree/2019_U1
[9] ATLAS Collaboration 2019 Public results page https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ApprovedPlotsDAQ [accessed 2019-02-12]