DataGrid WP1 Massimo Sgaravatto INFN Padova
WP1 (Grid Workload Management)
The objective of the first DataGrid workpackage, according to the project "Technical Annex", is:
  To define and implement a suitable architecture for distributed scheduling and resource management on a GRID environment.
This includes, in particular, the design and development of a grid scheduler, or Resource Broker, that is useful from the perspective of the DataGrid applications.
WP1 Achievements (after the first year)
  Analysis and evaluation of existing projects, software, and technologies
  Analysis of user requirements from the applications
  Definition and implementation of the first Workload Management System
  First available Resource Broker ("super-scheduler") able to take into account the data access requirements typical of the DataGrid applications
WP1 Components
  UI - User Interface: lightweight component for access to the workload management system
  RB - Resource Broker: core WP1 component, able to find the "best" resource matching the user requirements
  JSS - Job Submission Service: reliable job management operations
  II - Information Index: caching index to the information space, directly connected to the RB
  LB - Logging and Bookkeeping: repository for events occurring in the lifespan of a job
WP1 Components
UI (User Interface)
  Lightweight component for access to the workload management system
  Ability to submit a job to the DataGrid testbed from any user machine; the job is described in a Job Description Language (JDL) based on Condor ClassAds
UI commands
  dg-job-submit: submits a job to the Grid
  dg-job-get-output: retrieves the job output files (OutputSandbox)
  dg-job-list-match: returns the list of resources fulfilling the job requirements
  dg-job-cancel: cancels one or more submitted jobs
  dg-job-status: gets the job status
  dg-job-get-logging-info: gets the logging information
WP1 Components
RB (Resource Broker)
  Responsible for choosing the "best" resources to which jobs are submitted, based on the constraints specified in the JDL and on the characteristics and status of the resources (published in the Grid Information Service and the Replica Catalog)
  The strategy used for this first project release is to send the job to an appropriate CE (Computing Element):
    where the submitting user has proper authorization
    that matches the characteristics specified in the JDL (architecture, computing power, application environment, etc.)
    where the specified input data (and possibly the chosen output SE) are determined to be "close enough" by the appropriate resource administrators
  Matchmaking is performed using the Condor ClassAds library
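The three selection criteria above can be illustrated with a minimal Python sketch. It is not the Resource Broker's code and does not use the Condor ClassAds library. The classes `ComputingElement` and `Job`, their fields, and the function `select_ce` are hypothetical names introduced only for illustration. Ranking by free CPUs is an assumption borrowed from the `Rank = other.FreeCPUs` line of the JDL example in this deck.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    # Hypothetical, simplified view of what a CE publishes.
    name: str
    authorized_users: set
    architecture: str
    op_sys: str
    free_cpus: int
    # SEs that the site administrators declared "close" to this CE.
    close_ses: set = field(default_factory=set)


@dataclass
class Job:
    # Hypothetical, simplified view of a job description.
    user: str
    architecture: str
    op_sys: str
    # SEs holding a replica of the input data (empty if no input data).
    input_ses: set = field(default_factory=set)


def select_ce(job, ces):
    """Return the matching CE with the most free CPUs, or None."""
    candidates = [
        ce for ce in ces
        # 1. the submitting user is authorized on the CE
        if job.user in ce.authorized_users
        # 2. the CE matches the characteristics requested by the job
        and ce.architecture == job.architecture
        and ce.op_sys == job.op_sys
        # 3. the input data is on an SE declared close to the CE
        and (not job.input_ses or ce.close_ses & job.input_ses)
    ]
    if not candidates:
        return None
    # Assumed ranking: prefer the CE with the most free CPUs.
    return max(candidates, key=lambda ce: ce.free_cpus)
```

In this sketch a job without input data skips the closeness check; whether the real broker behaves that way is not stated in the slides.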
WP1 Components
JSS (Job Submission Service)
  Responsible for job management operations (issued when requested by the RB) and for keeping track of submitted jobs
  Wrapper of Condor-G
II (Information Index)
  First filter to the Grid Information Service
  Specific application of Globus GIIS
LB (Logging & Bookkeeping)
  Job status information
  "State machine" view of each job
  Push model
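The "state machine" view combined with a push model can be sketched as follows. This is an illustration only, not the LB service's interface: the state names, the transition table, and the `JobRecord` class are assumptions made for the example, since the slides do not list the actual job states.

```python
# Assumed, illustrative job states and allowed transitions.
TRANSITIONS = {
    "SUBMITTED": {"WAITING", "ABORTED"},
    "WAITING": {"READY", "ABORTED"},
    "READY": {"SCHEDULED", "ABORTED"},
    "SCHEDULED": {"RUNNING", "ABORTED"},
    "RUNNING": {"DONE", "ABORTED"},
    "DONE": {"CLEARED"},
    "CLEARED": set(),
    "ABORTED": set(),
}


class JobRecord:
    """Bookkeeping record for one job (hypothetical class)."""

    def __init__(self, job_id):
        self.job_id = job_id
        self.state = "SUBMITTED"
        self.events = []  # full history: (source, old_state, new_state)

    def push_event(self, source, new_state):
        """Called by a component (e.g. RB or JSS) to push a state change."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.job_id}: illegal transition {self.state} -> {new_state}"
            )
        self.events.append((source, self.state, new_state))
        self.state = new_state
```

With a push model the components report events as they happen, so a status query only needs to read `state` from the record rather than poll each component.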
WP1 deployment
[Deployment diagram; labels recovered from the slide]
  Submitting machine (UI): can submit to multiple RBs
  RB-JSS: "Community" RB or "Personal" RB
  LB server: one for each RB
  II: one for each RB
  RC
  CE, SE
  Queue of an LRMS (LSF, PBS)
Job submission scenario
dg-job-submit myjob.jdl
myjob.jdl:
  Executable = "$(CMS)/exe/sum.exe";
  InputData = "LF:testbed0-00019";
  ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
  DataAccessProtocol = "gridftp";
  InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
  OutputSandbox = {"sim.err", "test.out", "sim.log"};
  Requirements = other.Architecture == "INTEL" && other.OpSys == "LINUX Red Hat 6.2";
  Rank = other.FreeCPUs;
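For jobs generated by a script, a JDL file of this shape can be produced programmatically. The Python sketch below only mirrors the syntax visible in the example above (`Name = value;`, quoted strings, brace-delimited lists, unquoted expressions). The names `Expr`, `to_jdl_value`, and `render_jdl` are hypothetical helpers, not part of any WP1 API, and the sketch does not escape quotes inside string values.

```python
class Expr(str):
    """Marks a string as a raw expression, emitted without quotes
    (used for attributes such as Requirements and Rank)."""


def to_jdl_value(value):
    """Serialize one Python value using the syntax of the example above."""
    if isinstance(value, (list, tuple)):
        # Lists become {"a", "b", ...}
        return "{" + ", ".join(to_jdl_value(v) for v in value) + "}"
    if isinstance(value, Expr):
        # Expressions are written as-is
        return str(value)
    # Everything else is written as a quoted string (no escaping here)
    return '"' + str(value) + '"'


def render_jdl(attributes):
    """Render a dict of attributes as JDL text, one attribute per line."""
    return "\n".join(
        f"{name} = {to_jdl_value(value)};" for name, value in attributes.items()
    )


if __name__ == "__main__":
    print(render_jdl({
        "Executable": "$(CMS)/exe/sum.exe",
        "OutputSandbox": ["sim.err", "test.out", "sim.log"],
        "Rank": Expr("other.FreeCPUs"),
    }))
```

The resulting text can be written to a file and passed to `dg-job-submit` as in the scenario above.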
WP1 Y2 plans
  Support for automatic proxy renewal (1.2: March 2002)
    Interim (working!) solution by March 2002
    "Cleaner" solution later, when/if our GRAM patches (necessary to forward the "fresh" proxy to the jobmanager) are merged into the standard Globus distribution
  Provision of APIs for the applications (1.3: May 2002)
  Ability to submit MPI jobs (1.3: May 2002)
    Starting with MPI jobs within a single CE
WP1 Y2 plans
  Use of WP3 R-GMA for L&B services
    Tests to be done by March 2002
    Date for actual integration cannot be foreseen yet
  Support for interactive jobs (1.4: July 2002)
    Jobs running on a CE worker node where a channel to the submitting (UI) node is available for the standard streams (PROOF-like applications)
  Support for job dependencies (1.4: July 2002)
    Integration of Condor DAGMan
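The idea behind job dependencies is that a job is submitted only after the jobs it depends on have completed. The sketch below shows that ordering with Python's standard `graphlib` module (Python 3.9 or later). It is not DAGMan and does not reflect how WP1 planned to integrate it; the function `submission_order` and the job names are hypothetical.

```python
from graphlib import TopologicalSorter


def submission_order(dependencies):
    """Return one valid order in which the jobs can be run.

    `dependencies` maps each job name to the set of jobs that must
    finish before it starts. Raises graphlib.CycleError on a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical workflow: two simulations need the generator output,
    # and the merge step needs both simulations.
    workflow = {
        "sim1": {"gen"},
        "sim2": {"gen"},
        "merge": {"sim1", "sim2"},
    }
    print(submission_order(workflow))
```

Several orders can be valid for the same graph (here `sim1` and `sim2` are interchangeable), so a caller should rely only on the dependency constraints, not on a specific sequence.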
WP1 Y2 plans
  Grid accounting (2.0: September 2002)
    Economy-based model
  GUI (1.4: July 2002)
  Advance reservation APIs (September 2002)
    Collaboration with GARA efforts
  Support for job partitioning and "trivial" job checkpointing (2.0: September 2002)
  Integration of WP2 "query optimization" (based on network information and driving data replication)
Other info
  http://www.infn.it/workload-grid
  WP1 documentation: "WP1 – WMS Software - Administrator and User Guide"