eResearch at VUW: the eScience Consultant's Tale - Kevin M. Buckley, School of Engineering and Computer Science
School of Engineering and Computer Science eResearch at VUW: the eScience Consultant’s Tale Kevin M. Buckley School of Engineering and Computer Science Victoria University of Wellington New Zealand Kevin.Buckley@ecs.vuw.ac.nz
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale The Abstract In 2008, Victoria University of Wellington finally filled the post of eScience Consultant, although back then the post had been drawn up with the title of eResearch Programmer, not that either name really gives much insight into the range of facilitation that the role has provided. This tale will highlight recent eResearch activity at the capital city university, from the viewpoint of its eScience Consultant and in light of the experiences of the last six years. Outline: Cycle-stealing Grids; MWA Activity; Sci Fac HPC Facility; Loss of BeSTGRID HTML and PDF renditions of the slides should always be available here: http://www.ecs.vuw.ac.nz/~kevin/Conferences/NZeResSymp14
School of Engineering and Computer Science eRA09: Kevin Buckley: A Grid-Based Facility for Large-Scale Cross-Correlation of Continuous Seismic Data And finally ... who is supposed to do this "no-boundary" science? Domain scientists? Might need to brush up on their BPEL, SCUFL, WSDL, MPI and DRMAA Workflow/Grid computer scientists? Might need to read up on SEED, SAC, QuakeML Phew! Might still be a few jobs for people who can straddle the boundaries
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Couple of thoughts on where eResearch in NZ might be heading I nearly wasn't able to give this talk, because, on coming to submit my two-paragraph, 76-word, 468-byte, plain-text abstract, Submission.txt, I was informed (in red text!) "You have tried to upload a file type which is not permitted or file is too large. Please try again using a standard document file type" Fortunately, renaming Submission.txt to Submission.txt.doc sidestepped whatever standard-document-file-type checking there was. But really?! Similarly, I recently tried to contribute to the conversation around the likely state of eResearch in New Zealand in 2020, on the back of someone "not seemingly getting it", however, despite having a long-standing identity at the eresearch.org.nz web presence, as well as having an NZ-federated, Shibbolised, identity at my own institution, it's been suggested that I need yet another identity (via LinkedIn) to contribute!? Made me wonder if NZ eResearch, having already seemingly progressed way beyond a "plain text" future by 2014, would even need a Tuakiri in 2020? And if every "professional" in NZ was coerced onto social media, would NZ need any institutions?!
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Cycle-stealing Grids ECS Grid: 300-ish Eng and Comp Sci UNIX workstations - SGE SCS Grid: 900/600/49 VUW public lab Windows machines - Condor Both of these grids are operated by the School of Engineering and Computer Science, not that much "operating" is actually needed, given that they make use of existing resources. Early usage of the SCS Grid was UNIX/Cygwin, but a couple of more recent projects have been Windows-native: The Zoological Society of London's Colony program (genotype sibship and parentage) Compiled MATLAB simulations of historic stock pricing strategies Recently there have been two "threats" to the continued operation of the SCS Grid as a research resource that can be used without jumping through (too) many hoops: Virtual packaging Power-saving initiatives
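A quick way to gauge how much of either pool is actually free is just to ask the schedulers themselves; on a stock Condor or SGE install (nothing VUW-specific assumed here), something like the following, run on the respective submit hosts, does the job:

    # Condor (SCS Grid): pool-wide totals of Owner/Claimed/Unclaimed machines
    condor_status -total

    # SGE (ECS Grid): per-cluster-queue summary of used and available slots
    qstat -g c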
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Cycle-stealing Grids: Threats Virtual packaging supposedly removes the need to install software onto machines, however any software deployed that way becomes invisible to the grid. The solution to this has been to continue to use the virtual packaging "infrastructure" to install the packages the grid would use physically, below C:\Grid\, thereby installing them twice. VUW's central IT facilitator was recently asked to look at power saving across the "enterprise". The "solutions" considered didn't consider any effects on the SCS Grid, but identified a commercial product that would do "what was required". During a trial, a user of the SCS Grid became a "concerned" user of the SCS Grid and so, somewhat belatedly, a proper investigation as to how the free software scheduler and the commercial product play together was started. Let's take a look at what VUW's central IT facilitator found, once they started looking
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Cycle-stealing Grids vs Power Saving Condor could already do everything that's needed in terms of powering machines down Hardly a surprise - once/if you stop to think about it In order to schedule jobs into idle cycles across machines, a scheduler needs to know how resources are being used, and so, by running such a grid, you already know which machines are, and/or have been, idle and so could power them down instead of accepting jobs Some infrastructural changes are needed to allow the free software to power machines up Bad! The grid makes use of spare cycles within existing infrastructure: it can't dictate it Some infrastructural changes are needed to allow the commercial product to power machines up Good! VUW's IT purchasers also operate the infrastructure, so can just change it to fit. Better still? Wouldn't you know it, these are the same changes Condor would need! It's possible to save power as a by-product of deploying free cycle-stealing grid software - and VUW already was, before it started looking around for something to purchase
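For the curious, a rough sketch of the sort of condor_config settings involved - illustrative values only, not the SCS Grid's actual configuration:

    # On each execute node: check every 5 minutes and suspend to RAM once
    # the machine has sat Unclaimed for two hours; "NONE" means stay up.
    HIBERNATE_CHECK_INTERVAL = 300
    HIBERNATE = ifThenElse( (State == "Unclaimed") && \
                            (CurrentTime - EnteredCurrentActivity > 7200), \
                            "RAM", "NONE" )

    # On the central manager: run condor_rooster, which sends Wake-on-LAN
    # packets to hibernated machines when there is work for them.
    DAEMON_LIST = $(DAEMON_LIST) ROOSTER

The "infrastructural changes" needed in both the free and the commercial cases are, presumably, the same thing: letting those wake-up packets reach the lab subnets.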
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale MWA-related activity The Murchison Widefield Array (MWA) is one of the precursor projects to the Square Kilometre Array. VUW is very fortunate to have Melanie Johnston-Hollitt within the SKA project, as well as having her current research group, based within VUW's Science Faculty, working on data from the MWA programme, a key component of which, a 24-node IBM iDataPlex system, is Melanie's Melanie even "rescued for NZ" VUW's initial MWA-related hardware, after it ended up in WA. (Take away: You don't mess with Melanie!) Not content with just working on the MWA data, Melanie has also initiated the deployment of a New Zealand "data node", running the NGAS platform, mirroring data from WA, along with MIT and RRI in India, using hardware from SGI, who put MWA/SKA-related kit into iVEC. The data node currently comprises an SGI IS5000 "tray", housing 96TB. (MWA data slated for ~160TB) (Take away: SGI technical staff come highly recommended: their marketing department, and their support portal, may need some work!)
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale MWA-related activity 2 The Twin 2U chassis has given us two separate nodes: one inside VUW and one outside In order to avoid data bottlenecks at VUW's edge, REANNZ helped Melanie facilitate what's now, in effect, a 10GbE "Science DMZ", ie, avoiding VUW's centrally facilitated IT infrastructure (not currently able to offer 10GbE), which, in its 1GbE days, we at VUW and folk at UoA/CeR/NeSI had "maxed out" in some GridFTP testing. (Take away: Mellanox IB adaptor firmware can be flashed to 10GbE - thanks: toddc@sgi.com ) When I say "facilitate" above, I'm leaving a lot unsaid, not least some NZ eRes Symp 12 leftovers Despite its rather fortuitous birth, this 10GbE capability became, again somewhat serendipitously, extremely useful in this last year for Nevil Brownlee's group up at UoA, wanting a platform for network profiling research, although, when the two technical groups came to use the resources, we found that UoA's central IT people had taken away their old "research DMZ" capability, which stalled the proto-collaboration whilst the UoA end got back up to speed! (Take away: Don't ever let your central IT people anywhere near your research kit!)
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Science Faculty HPC Facility VUW's Science Faculty only got an HPC Facility after a lecturer who moved from Massey was able, because of the then-existing BeSTGRID community, to contact VUW's eScience Consultant to ask what resources VUW had and, on discovering that VUW had next to nothing bar a grid of desktop PCs running Windows, decided not to keep his head down at his new home but to ask for some HPC kit, leading to (Doppler effect studies using the sirens of the paramedics heading to/from VUW) the Dean of the Science Faculty showing some vision and seeing his Faculty obtain: 25 computers, 52 CPUs, 624 cores, 1920 GB RAM, IB interconnect; 6-off SGI C2112, 2x12-core AMD Opteron 6174, 64GB RAM (4-node units); 1-off SGI H2106, 4x12-core AMD Opteron 6174, 512GB RAM; RHEL5 OS hosting an SGE local resource manager (note: no vendor application stack) Since added: 2-off SGI C2112, 2x16-core AMD Opteron 6174, 64GB RAM (4-node units); 1-off SGI ISS3500, which houses around 30TB storage; OS upgrade to CentOS6 So, currently at: 784 cores with access to 64GB, plus the 0.5TB node.
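Day-to-day use of the facility looks much like any other SGE cluster; a minimal job script might look like this (the parallel environment name, limits and program are placeholders, not the facility's real settings):

    #!/bin/bash
    #$ -N my_job               # job name
    #$ -cwd                    # run in the directory the job was submitted from
    #$ -pe mpi 24              # ask for 24 slots in a PE assumed to be called "mpi"
    #$ -l h_rt=24:00:00        # wall-clock limit

    # SGE sets $NSLOTS to the number of slots actually granted
    mpirun -np $NSLOTS ./my_code

submitted with qsub my_job.sh and watched with qstat.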
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Science Faculty HPC Facility: 2 The main users of the facility have been computational chemistry research groups within VUW's School of Chemical and Physical Sciences (SCPS), one led by that lecturer who moved from Massey, Matthias Lein, and, more recently, one led by Nicola Gaston, who moved into SCPS from IRL but brought her MacDiarmid Institute PI funding with her. Their research centres on Gaussian and VASP, respectively, although Nicola has also been looking at deploying CRYSTAL but found the code's Italian authors operate a "code of silence" The large memory node, originally slated for use by VUW's School of Biological Sciences (SBS), gets only sporadic use by SBS researchers (the researcher driving its acquisition having left for ANU), for BLAST searches and a recently completed PhD project around protein-docking studies using RosettaLigand, albeit without a large memory footprint. In terms of "really testing the beast" we've had to rely on a School of Engineering and Computer Science project which used COMSOL to study far-field superlens effects, and which touched 360 GB. Large usage has also come from our Faculty of Architecture and Design, where a combination of GenOpt and EnergyPlus is being used for optimisation studies of the energy performance of buildings
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Science Faculty HPC Facility: 3 So, the Facility has seen quite a range of projects and disciplines, and here are some of the issues that people seem to have when using it Transition from PC to HPC From as simple as not really "getting" directories or submission scripts, to treating the shared resource as their own and writing job-scheduling "daemons" (read: script-kiddy self-spawning Python scripts - a job array, sketched below, does this properly) Knowledge transfer within research groups/communities New users in a research group look to the resource facilitator, not to their group, for initial help Equally, code authors are happy to share their code, but not to correspond about it So, unless you can match the development environment, you may not be able to run "open source" codes
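On the submission-script point, the scheduler already provides "run this N times" without anyone having to write self-resubmitting scripts; a sketch of an SGE array job (the directory layout and program names are made up for illustration):

    #!/bin/bash
    #$ -N sweep
    #$ -cwd
    #$ -t 1-1000               # one task per ID, 1..1000

    # SGE hands each task its own $SGE_TASK_ID, so each one can pick up
    # its own pre-prepared input directory and run independently
    cd run_$SGE_TASK_ID
    ./simulate INPUT.PAR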
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Two simple things people don't get Here's a directory structure that helps people run Condor Grid programs, and a schematic of a cross-correlation:

    +--Colony2
    |    +--- colony2s.exe  impi.dll  libguide40.dll  libiomp5md.dll  simu2.exe
    |
    +--RunSet01
         +---- submit.cmd
         +--logs
         +--0
         |   +-- INPUT3.PAR
         +--1
         |   +-- INPUT3.PAR
         +--2
         |   +-- INPUT3.PAR
         ...
         +--999
              +-- INPUT3.PAR

         A   B   C   D   E
    A        1   2   3   4
    B            5   6   7
    C                8   9
    D                   10
    E

These make use of %CONDOR_PROCESS% and initialdir = $(Process), and of $SGE_TASK_ID, respectively
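By way of illustration, here is the sort of thing RunSet01/submit.cmd could contain - a sketch only, with paths and transfer settings assumed rather than taken from the actual SCS Grid file:

    universe     = vanilla
    executable   = C:\Grid\Colony2\colony2s.exe
    initialdir   = $(Process)            # task N starts in directory N
    transfer_input_files = INPUT3.PAR    # found in each numbered directory
    should_transfer_files = YES
    when_to_transfer_output = ON_EXIT
    log          = ..\logs\colony.$(Cluster).$(Process).log
    queue 1000                           # one task per directory, 0..999

The initialdir = $(Process) line is the whole trick: Condor starts each of the 1000 tasks in its own numbered directory, next to that task's INPUT3.PAR, much as an SGE array job would use $SGE_TASK_ID to do the same.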
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Loss of BeSTGRID I've already touched upon the fact that a couple of folk from UoA/CeR/NeSI and from VUW had to be talking, over dinner up at eResearch Australasia 2013 in Brisbane, in order for some serendipitous collaboration around 10GbE networking between two sites down here in South Eastern Australasia to get off the ground. On Monday, I discovered that Landcare have been operating exactly the same piece of new hardware that I've recently deployed - but which of us knew? Yesterday it was suggested that SECS at VUW aren't teaching UG students anything about Grid/HPC concepts - we do I also learned that a VUW researcher, running codes at a NeSI site, and whose PI has students running the same codes on the facility I look after, has had that code profiled for them by NeSI - again, who knew? It's my belief that, "back in the day", I, and others, would not need to go to a once-a-year symposium to find out what users of eResearch at their institution were doing, or go over "the ditch" to hear about NZ activities.
School of Engineering and Computer Science eResearch at VUW: The eScience Consultant's Tale Loss of BeSTGRID 2 Yes, BeSTGRID was very much a "Best Efforts" institution; however, it gave New Zealand eResearch a level of collegiality that seems to have been lost now that it's gone. Once a month, interested parties from tertiary education and CRI-land would get together within a video conference and, whilst it was often the same people speaking, there was a conversation that got fed back into institutions, even if some of those institutions probably wished that the whole eResearch thing would go away and leave them to get on without any of this collaborative stuff Similarly, those interested parties were often the "go to" folk for institutional users who wanted to make use of the very national-level resources that their institutions were trying to ignore It might have been possible to draw a comparison between those informal lines of communication and the kind of eResearch support provision that we saw "across the ditch" There was a lot of information around joining in national-level collaborative efforts hosted on the collaboratively editable website technical.bestgrid.org
School of Engineering and Computer Science Colophon Slides for this talk have been created and delivered using MagicPoint http://member.wide.ad.jp/wg/mgp/ an X11-based presentation tool whose slide sources are plain text and which also provides for the creation of an HTML slide-set.