The newsletter of EPCC, the supercomputing centre at the University of Edinburgh

EPCC news, Issue 84, AUTUMN 2018

The Bayes Centre: EPCC’s new home at the heart of the Data Driven Innovation Programme and supercomputing in the UK
From our Director

Welcome to the Autumn 2018 edition of EPCC News. Over the summer EPCC has moved offices for the first time in its 28-year history.

While the process has not been without its challenges, we’ve now settled into our beautiful new home on the second floor of the £45m Bayes Centre in central Edinburgh. We’re already benefiting from this central location in terms of interaction with the School of Informatics and visitors to the University.

The Bayes Centre is one of the six new buildings funded through the Edinburgh & SE Scotland City Region deal which was signed by the Prime Minister and First Minister in August. As well as being a key organisation within the Bayes Centre, EPCC is also in charge of creating the World Class Data Infrastructure which supports the Data Driven Innovation Programme. Over the next 10 years we will invest £115 million in this infrastructure including an increase to 32MW of power at the ACF and a fourth computer room.

We have also successfully won a number of new projects which you can read about in this issue. The largest of these is the 5-year £15m Prosperity Partnership which EPCC leads with Rolls-Royce. This project will develop the world’s first high-fidelity simulation of a next-generation gas turbine engine in operation.

We are also looking forward to the delivery of our first large-scale Arm-based HPC system from the Catalyst UK project led by HPE. This 4,096-core system based on Cavium’s ThunderX2 Arm processor is a new departure for us. More diversity in processor technology is very important for the long-term future of the HPC industry worldwide, particularly as we approach the Exascale.

I hope you enjoy this issue!

Mark Parsons, EPCC Director
m.parsons@epcc.ed.ac.uk

Contents
3 EPCC’s new home: We’ve moved!
4 Prosperity Partnership: Next-generation engineering simulation
5 Catalyst UK programme: Major Arm-based HPC system
6 Industry roundup: Our industry partnerships
7 Data-driven innovation: What it means for business
8 Fortissimo review: Bringing HPC to business
9 VESTEC project: HPC for urgent decision making
10 ISO certification: Ensuring data security
11 Data research: Supporting national policy making
12 Humanities data: Cray Urika-GX’s data services
14 Our MSc programmes: New beginnings
15 Online learning at EPCC: Study with us anywhere
16 SpiNNaker: Neuromorphic HPC at EPCC
17 EPCC’s Computing Facility: Energy-efficient computing
18 EPiGRAM-HS project: Doing more with less
19 ARCHER eCSE: Funding research software
20 Cell biology image analysis: Improving usability
21 Women in HPC: We are now a local Chapter
   HPC-Europa3: User group meeting review
22 Research software engineers: Building a new community
23 Outreach: EPCC at New Scientist Live

EPCC at Supercomputing 2018
EPCC booth: 2800

EPCC will be participating in a wide range of activities at this year’s Supercomputing conference in Dallas, including the "Open Source Supercomputing" workshop, the "Sustaining Research Software" panel, and BoF sessions on "Software Engineering and Reuse in Computational Science and Engineering", "Multi-Level Memory and Storage for HPC and Data Analytics", and education, outreach and training discussions.

There will be presentations about different projects EPCC is involved in, including VESTEC (see p9) and INTERTWinE at the PRACE booth (2033).

At the EPCC booth in the exhibition hall you can find out about our latest work including our involvement in the Data Driven Innovation Programme. Partner booths include the European Exascale Project booth (237), where you can learn about DEEP-EST and SAGE2, and booth 1847 where the GEANT networking project is represented.

Full details of all our activities at SC’18 can be found on our website: http://bit.ly/2CPaihv

Adrian Jackson, EPCC
a.jackson@epcc.ed.ac.uk

www.epcc.ed.ac.uk
info@epcc.ed.ac.uk
+44 (0)131 650 5030
Twitter: @EPCCed

EPCC is a supercomputing centre based at The University of Edinburgh, which is a charitable body registered in Scotland with registration number SC005336.
Images of the Bayes Centre by Mark K Jackson and Mark Reynolds.

The Bayes Centre: EPCC’s new home

In August EPCC moved to the Bayes Centre, the University of Edinburgh’s new community of academic, research and commercial expertise located on its Central campus.

EPCC has left its original home on the Kings Buildings Campus to join the newly established Bayes Centre community, which will eventually house some 600 experts in data science and artificial intelligence from research and industry.

The move brings us into much closer contact with other data-based organisations such as The Data Lab, the Centre for Design Informatics, and the Alan Turing Institute. Our new open-plan study area also allows our MSc and PhD students to better integrate with EPCC’s staff (see p10).

The Bayes Centre is one of the University’s five innovation hubs, which are each designed to fuel growth in industry partnerships. World-leading technology, data science and enterprise teams will work together with corporate partners at the Centre to help shape the future using data-driven innovation. Education offerings for companies are also being developed here.

The Bayes will enable the University to build on its globally recognised strengths in data management, artificial intelligence, theoretical computer science, computational linguistics, systems architectures and bioinformatics.

As part of the University’s strategy to build multi-disciplinary partnerships with industry, the Bayes Centre will accommodate industrial tenants such as satellite-technology provider Orbital Micro Systems (OMS) (see page 6 for our collaboration with this company), and we look forward to the opportunities for knowledge exchange the Bayes will bring.

Finally, we are very much enjoying exploring the exciting lunch opportunities in our new neighbourhood!

"EPCC’s move to the Bayes Centre is a really important moment in its history. We’re at the heart of the Data Driven Innovation Programme and supercomputing in the UK, and this new home brings us closer to many new opportunities for collaboration in the region."
Mark Parsons, EPCC Director
New Prosperity Partnership to develop world first in high-fidelity engineering simulations

Rolls-Royce power gearbox. Image: Rolls-Royce plc

A consortium led by Rolls-Royce and EPCC was recently awarded a Prosperity Partnership worth £14.7m to develop the next generation of engineering simulation and modelling techniques, with the aim of developing the world’s first high-fidelity simulation of a complete gas-turbine engine during operation.

The five-year "Strategic Partnership in Computational Science for Advanced Simulation and Modelling of Virtual Systems" (ASiMoV for short) is embarking on an extremely challenging programme of research to enable this level of simulation. It will require breakthroughs at all levels (mathematics, algorithms, software, and security) and uniquely combines fundamental engineering and computational science research to address a challenge that is well beyond the capabilities of today’s state of the art.

The ultimate long-term goal of the research is to enable the "virtual certification" of aero engines, be they gas-turbine, hybrid or fully electric, by 2030. The journey to virtual certification requires a thorough evidential database to convince the certification authorities that the analysis can be trusted. For example, engine manufacturers working with the FAA have successfully replaced the large bird-strike certification test with analysis: it took around ten years to obtain FAA approval for virtual certification. Virtual certification is not a single well-defined milestone to be reached by 2030, however. The necessary simulation capability is essential, but so is the evidential basis for trusting the simulation.

Requirements for speed, fidelity and accuracy are well beyond current simulation and high performance computing capability. Where simulations can be carried out at all, they do not meet fidelity requirements and take weeks or even months to complete. True virtual certification simulations will therefore require new high-resolution physical models and full system simulations that drive us from today’s model sizes (with 10-100 million cells) towards models with trillions of cells. A result will be the need for techniques that can exploit future computing platforms and the unprecedented amounts of data they consume and produce, robustly, securely and affordably.

This is a transformational change requiring a transformational collaboration, and EPCC is delighted to be leading the Partnership.

The partners gathered for the Partnership’s kick-off meeting.

In addition to Edinburgh, the Prosperity Partnership includes four Universities (Bristol, Cambridge, Oxford and Warwick) as well as two SMEs, CFMS and Zenotech.

Michèle Weiland, EPCC
m.weiland@epcc.ed.ac.uk
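To put the jump in model size in perspective, the scale-up factor implied by the figures in this article can be worked out directly. This takes one trillion cells (10^12) as the lower end of "trillions", which is an assumption made for illustration:

```latex
\frac{10^{12}\ \text{cells}}{10^{8}\ \text{cells}} = 10^{4},
\qquad
\frac{10^{12}\ \text{cells}}{10^{7}\ \text{cells}} = 10^{5}
```

Even a trillion-cell model is therefore ten thousand to a hundred thousand times larger than today’s 10-100 million cell models.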
Catalyst UK programme brings Arm-based HPC system to EPCC!

Areas of particular interest to the initial phase of the Catalyst programme include weather and ocean modelling. © iStock.com/shayes17

Earlier this year, HPE announced the Catalyst UK programme: a collaboration with Arm, SUSE and three UK universities to deploy one of the largest Arm-based high performance computing (HPC) installations in the world. EPCC was chosen as the site for one of these systems; the other two are the Universities of Bristol and Leicester.

The HPE Apollo 70-based systems will each consist of 64 compute nodes with two 32-core Cavium ThunderX2 processors (ie 4096 cores in total), 128GB of memory composed of 16 DDR4 DIMMs, and Mellanox InfiniBand interconnects. They will be made available to both industry and academia, with the aim to build applications that drive economic growth and productivity as outlined in the UK government’s Industrial Strategy.

As part of the programme, EPCC will port the most heavily used ARCHER and Cirrus packages to the Catalyst system and make them available as modules, so that it’s easy for users to explore the new platform. Our initial focus will be on making as many applications available as possible. As documentation is a key component of driving adoption, we will create detailed build process documentation as part of our porting activities and contribute to the Arm HPC community on GitLab.

It is expected that the initial porting exercise will unearth issues (eg with compilers, libraries, or the network) and we will work closely with the vendors to resolve these. A handful of applications will be selected for in-depth optimisation. Areas of particular interest are engineering, computational chemistry, and weather and ocean modelling, and these will be given priority. However, it’s not an exclusive club: industry, high-performance data analytics and machine learning applications are also welcome.

The installation of our Catalyst system at EPCC’s Advanced Computing Facility is imminent, and we are eagerly looking forward to exploring the potential of this Arm-based HPC system!

Catalyst training
From 3–4 December, we will run a PRACE Advanced Training Centre course (free to attend for all) that will teach participants about the Catalyst system architecture and software stack, how to compile codes for Arm, as well as performance hints and tips.

Michèle Weiland, EPCC
m.weiland@epcc.ed.ac.uk

Hewlett Packard Enterprise: www.hpe.com
SUSE: www.suse.com
UK Industrial Strategy: www.gov.uk/government/topical-events/the-uks-industrial-strategy
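The headline core count follows directly from the node configuration. A quick check, reading the 128GB of memory as a per-node figure (our interpretation of the specification, not something the article states explicitly):

```latex
\begin{aligned}
64\ \text{nodes} \times 2\ \tfrac{\text{processors}}{\text{node}} \times 32\ \tfrac{\text{cores}}{\text{processor}} &= 4096\ \text{cores} \\
\frac{128\ \text{GB}}{16\ \text{DIMMs}} &= 8\ \text{GB per DIMM} \\
\frac{128\ \text{GB}}{64\ \text{cores per node}} &= 2\ \text{GB per core}
\end{aligned}
```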
Industry projects round up

EPCC is working with satellite technology provider Orbital Micro Systems in the International Centre for Earth Data which will improve forecasts for sectors including shipping. © iStock.com/wissanu01

2018 continues to be a busy year for EPCC in terms of industrial collaborations. Following our move to the new Bayes Centre, we look forward to further exciting industrial collaborations.

Paywizard
We recently completed a successful project with Paywizard, the pay-TV subscription, billing, and customer relationship management specialist, to drive the development of new AI-driven capabilities within its latest subscriber intelligence platform. Our data science specialists enhanced existing machine learning capabilities within the Paywizard platform and developed further predictive modelling capabilities. A resultant AI product, Paywizard Singula, went on to win a ‘Best of Show’ TV technology award at IBC, the industry’s largest European event.

Orbital Micro Systems
EPCC is working closely with satellite technology provider Orbital Micro Systems (OMS) as part of the newly formed International Centre for Earth Data (ICED), where a satellite-based system is being developed that will vastly improve monitoring and forecasting of extreme weather and natural disasters anywhere in the world. The programme will capture and analyse data from OMS’ planned constellation of 40 satellites – each about the size of a large shoebox. These missions will include the first launch of the recently announced UK spaceflight programme, using the planned spaceport on Scotland’s north coast.

The ICED will enable near real-time monitoring and improved forecasts for sectors such as insurance, agriculture, aviation, and shipping. EPCC is providing data analytics expertise in conjunction with the latest in high performance computing (HPC) research to manage such large data sets. HPC is crucial when dealing with the petabyte levels of data required for this type of satellite-based modelling. This multi-disciplinary project involving EPCC and the University’s Schools of Geosciences and Informatics is a great example of the projects that will be at the forefront of the Bayes Centre agenda.

HPC access programmes
EPCC’s ‘Accelerator’ and on-demand access programmes for industrial users of our world-class HPC infrastructure continue to thrive. We have ongoing collaborations with major players such as Shearwater (advanced ocean modelling) and Screen LASSE (simulation support for advanced semiconductor device manufacture). A collaboration with ENGYS has created a cost-effective, pay-per-use, simulation-as-a-service business model offering access to advanced CFD modelling based on an implementation of ENGYS’ HELYX product.

Sustainably
We have also just kicked off a project with the fast-growing fintech start-up, Sustainably. EPCC is providing software and data architecture expertise to support the company’s rapid growth plans as it looks to scale its offering across the open banking network.

DJ Alexander
In a similar vein, EPCC is supporting the Edinburgh-based property management company DJ Alexander, which will launch an innovative online platform in 2019. We are providing software engineering expertise to its growing development team.

Thomas Blyth, EPCC
t.blyth@epcc.ed.ac.uk

Contact our business team to find out more.
George Graham: g.graham@epcc.ed.ac.uk
Thomas Blyth: t.blyth@epcc.ed.ac.uk
Data-driven innovation for business

© iStock.com/enot-polosku

There is a lot of hype around big data and big computing for business, but it is undeniable that the influence of data-driven innovation will be profound.

The expertise and support available in Scotland has created a massive opportunity for our engineering and manufacturing sectors and, with the launch of the £500m Data-Driven Innovation (DDI) strand of the Edinburgh and South-East Scotland City Region Deal, this is an exciting time for exploring how technology can benefit business.

Industry can gain huge benefits by combining data science expertise, HPC hardware, and readily-available software and data analytics tools. HPC enables data scientists to manage, process and work with extremely large and complex datasets, which in turn allows businesses to develop new products and revenue streams.

The Bayes Centre in Edinburgh is home to a community of world-leading data science and artificial intelligence teams, including EPCC, and it is set to play a key role in delivering the DDI programme.

Central to the programme is a new facility for the secure hosting and analysis of huge and varied datasets. This £70m investment in the World-Class Data Infrastructure (WCDI) will be fundamental in positioning the City Region as data capital of Europe, acting as an enabler for many data science projects for industry, academia or both, and – by bringing together regional, national and international datasets – facilitating new products, services, and research.

The WCDI’s high-resiliency data and computing facilities will support work with complex, high-volume, real-time datasets from across the City Region and beyond. We are already seeing demand from a wide range of sectors including fintech and other financial services, space and satellite, data analytics, and tech start-ups. The establishment of this data hub and the production of new applications will in turn lead to new companies.

EPCC collaborates with companies of all sizes to tackle real-life problems or enhance business processes, and the direct results can include gaining a competitive advantage, reducing costs, or improving operational or research and development processes. Here at EPCC we see the WCDI as a unique opportunity for companies to adopt data-driven innovation. It will offer state-of-the-art data and compute infrastructure, supported by data analytics and modelling skills from across the University of Edinburgh and the wider region.

To understand the potential impact of data-driven innovation, consider the case of a manufacturing production line, running 24 hours a day: an unexpected breakdown will be extremely costly.

A modern production line generates a huge amount of data from fault-detecting sensors. If machine learning could predict faults before they occur, the number of times the line breaks down would be dramatically reduced, leading to massive savings. This kind of application, where computers make predictions based on meaningful patterns in data sets, will increase in importance as the amount of data grows.

Thomas Blyth and Mark Sawyer, EPCC
t.blyth@epcc.ed.ac.uk
m.sawyer@epcc.ed.ac.uk

Contact our business team to find out more.
George Graham: g.graham@epcc.ed.ac.uk
Thomas Blyth: t.blyth@epcc.ed.ac.uk
Fortissimo: a boost for European business

Swedish SME Koenigsegg develops and produces high-performance, limited-edition motor vehicles. Its use of HPC-based simulation for the production of a new car configuration led to a 30% saving in design costs, a reduction of 50% in wind tunnel and physical testing, a 60% saving in prototyping costs, and a 30% shortening of time to market.

December 2018 marks the end of the Fortissimo project. EPCC has been collaborating in the programme since 2013, and during this time 92 business experiments have been carried out with over 100 small European companies, many of which had no previous experience of using HPC.

The overall purpose of Fortissimo was to demonstrate that the benefits which the cloud has brought to enterprise computing can be replicated in engineering fields such as modelling and simulation, and also in high-performance data analytics. Cloud computing reduces an enterprise’s cost and complexity by removing the need for it to buy and operate its own computing equipment. Instead it can access services provided by third parties, allowing it to focus on its core expertise.

Fortissimo has shown that similar benefits can be obtained by offering high performance computing (HPC) in this way. Most small companies cannot afford the high entry cost of acquiring and operating an HPC system, even though it is a proven technology. By providing access to HPC systems as a service, Fortissimo has allowed companies to trial engineering techniques that were previously out of reach.

Access to resources is, however, only part of the solution. Another vital component is expertise, whether this is related directly to HPC or associated with the specific business in question. The Fortissimo approach was to bring together the necessary expertise in small partnerships that could then focus on a specific business problem. There was a stringent selection process for these experiments, with a series of competitive calls for proposals. We learned that there is great demand to trial HPC in this way, with the calls being oversubscribed almost ten-fold.

Fortissimo Marketplace
The Fortissimo project led directly to the creation of the Fortissimo Marketplace, which offers services developed in the experiments and by HPC centres and other providers. This Marketplace will enable companies to access advanced modelling, simulation and data-analytics services using a pay-per-use model, with the ultimate aim of improving the competitiveness of the European economy.

The Fortissimo experiments came from a diverse range of industry sectors. You can read about each experiment on the Fortissimo website: www.fortissimo-project.eu/success-stories

Mark Sawyer
m.sawyer@epcc.ed.ac.uk

Fortissimo Marketplace
www.fortissimo-project.eu/
VESTEC: saving the world one byte at a time

© iStock.com/JPhilipson

With jobs submitted to a batch system, supercomputing has traditionally been centred around an offline, non-interactive approach to running codes such as simulations. However, it is our belief that there is great potential in fusing HPC with real-time data for use as part of urgent decision-making processes in response to natural disasters and crises.

It is not just HPC that has benefited from phenomenal developments in hardware: our ability to physically collect data, for example via high-velocity sensors, has also undergone a revolution in recent years.

Until now, the role that HPC can play in complementing this and turning data streaming into valuable collateral has been overlooked. But this is not a simple job and entails much more than hooking up some data sources to HPC machines. Instead, to fuse HPC with real-time data, a large number of challenges need to be tackled, from the low-level software stack up to interactive visualisation tools.

The VESTEC project will tackle these challenges to make HPC more interactive and capable of processing raw data arriving in real time, so creating a tool for use in urgent decision-making. It is our hypothesis that combining HPC computational models with real-time data will significantly aid in urgent decision-making, ultimately saving lives and reducing economic loss.

VESTEC is focusing on three use cases: forest fires, the impact of space weather (specifically the disruption caused by geomagnetic storms), and mosquito-borne diseases. Not only do these areas entail risk to life, they also have a significant economic impact (estimated at billions of dollars a year). We hope that by combining traditional simulations with high-velocity sensor data, it will be possible to make correlations between simulations and observations such that much more precise and reliable predictions can be generated. These would feed into disaster recovery or even prevention.

At EPCC we are leading the work-package on interactive supercomputing. This is especially interesting for us as it combines our expertise in traditional HPC with that of data, to challenge some of the assumptions that the current generation of HPC machines are built upon. I think it is going to be fascinating to see how, over the next three years of the project, the technology and techniques that we will develop as part of VESTEC contribute to solving the challenges we have identified, and their resulting wider societal impact.

Image shows Wildfire Analyst, from one of the VESTEC partners, which is used in one of the three project use-cases. It models the spread of forest fires, and is used to assist in disaster planning and mitigation. The VESTEC project will develop infrastructure which enables this application to be fed with real-time sensor data and run in ensemble fashion on supercomputers. The end result will be a step change in capability for disaster recovery teams, where they can much more accurately advise fire fighting teams on the ground to contain the fire and ultimately save lives.

Meet us at SC’18
Our BoF at SC18, "HPC meets Real-Time Data: Interactive Supercomputing for Urgent Decision Making", will run between 12:15 and 13:15 on Thursday 15th November. If you are going to SC, it would be great to see you there!

VESTEC is funded by the EU’s Horizon2020 programme. It started in September 2018 and will run for three years. The project has eleven partners, each with a different area of expertise.

Nick Brown, EPCC
n.brown@epcc.ed.ac.uk
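One simple way to picture "making correlations between simulations and observations" is to weight each member of a simulation ensemble by how closely it agrees with incoming sensor readings, then combine the members’ forecasts using those weights. The sketch below does this with a Gaussian likelihood. The function names and the noise level `sigma` are hypothetical, and this illustrates the general idea only; it is not the approach VESTEC will adopt.

```python
import math
from typing import List, Sequence


def ensemble_weights(predictions: Sequence[Sequence[float]],
                     observations: Sequence[float],
                     sigma: float = 1.0) -> List[float]:
    """Weight each ensemble member by a Gaussian likelihood of the sensor
    observations, given that member's predicted values at the same sensors."""
    log_w = []
    for member in predictions:
        sq_err = sum((p - o) ** 2 for p, o in zip(member, observations))
        log_w.append(-sq_err / (2.0 * sigma ** 2))
    # Subtract the largest log-weight before exponentiating to avoid underflow
    peak = max(log_w)
    raw = [math.exp(lw - peak) for lw in log_w]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_forecast(forecasts: Sequence[float], weights: Sequence[float]) -> float:
    """Combine each member's forecast (e.g. a predicted burned area) using the weights."""
    return sum(f * w for f, w in zip(forecasts, weights))
```

For example, if three hypothetical fire-spread runs predict [1.0, 2.0], [1.5, 2.5] and [4.0, 6.0] at two sensors that actually read [1.1, 2.1], the third run receives a negligible weight, so its outlying forecast barely affects the combined result.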
Your data is secure with EPCC!

The UK Research Data Facility, hosted by EPCC. Image: Craig Manzi.

EPCC has recently been certified for the ISO 27001 Information Security standard for all the HPC and Data Services that we run, including ARCHER, Cirrus, the RDF, Farr National Safe Haven and Tesseract.

Here at EPCC we aim to be a leader in the secure hosting and management of huge and varied datasets to support data research. For example, we host and manage Safe Havens on behalf of the Farr Institute and Scottish Genome Partnership, with a Safe Haven for the Alan Turing Institute under development.

A Safe Haven is a secure environment in which data is linked and accessed. It provides a high-powered computing service, a secure analytic environment, a secure file transfer protocol for receipt of data, and a range of analytic software. Safe Havens allow data from electronic records to be used to support research when it is not practicable to obtain individual patient or subject consent, and protect patient or subject identity and privacy. Data from different sources can be linked to answer specific research questions, subject to the required information governance.

The University of Edinburgh is set to play a key role in the Edinburgh and South East Scotland City Region Deal, delivering the deal’s Data-Driven Innovation programme. Underpinning new data innovation hubs across the University will be an exciting new facility for the secure and trustworthy hosting and analysis of huge and varied datasets hosted and managed by EPCC. Key to the success of EPCC in providing data services is trust from its customers that it provides best practice in information security and data handling. ISO 27001 certification introduces a framework to deliver best practice and to demonstrate this achievement to our customer and user base.

ISO 27001 is a specification for an information security management system (ISMS). An ISMS is a framework of policies and procedures that includes all legal, physical and technical controls involved in an organisation’s information risk management processes.

EPCC has designed an ISMS to provide services and systems to meet the terms of the relevant contracts and agreements with respect to confidentiality, integrity, accessibility and availability. It has also been designed to meet the information security risk appetite of its stakeholders. With ISO 27001 and ISO 9001 Quality Management certifications, we are confident that we have the processes and information security framework which deliver best practice services to our customers and provide a mechanism to continually improve our services to meet customer and user requirements.

About ISO 27001
ISO 27001 requires management to:
• Systematically examine the organisation’s information security risks, taking account of the threats, vulnerabilities, and impacts;
• Design and implement a coherent and comprehensive suite of information security controls and/or other forms of risk treatment (such as risk avoidance or risk transfer) to address those risks that are deemed unacceptable; and
• Adopt an overarching management process to ensure that the information security controls continue to meet the organisation’s information security needs on an ongoing basis.

Anne Whiting, EPCC
a.whiting@epcc.ed.ac.uk
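ISO 27001 asks an organisation to examine its risks and treat those it deems unacceptable, but it does not prescribe a particular scoring method. A common convention, shown here purely as an illustration and not as a description of EPCC’s ISMS, scores each risk as likelihood multiplied by impact on small ordinal scales and selects for treatment anything above the agreed risk appetite. The 1-5 scales, the appetite value and the example risks are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        return self.likelihood * self.impact


def risks_to_treat(register: Iterable[Risk], appetite: int) -> List[Risk]:
    """Return risks whose score exceeds the risk appetite, highest score first."""
    above = [r for r in register if r.score > appetite]
    return sorted(above, key=lambda r: r.score, reverse=True)


# Hypothetical example register
register = [
    Risk("Unpatched login node", likelihood=4, impact=5),
    Risk("Office printer outage", likelihood=2, impact=2),
    Risk("Misdirected data transfer", likelihood=3, impact=4),
]
```

With an appetite of 10, the first and third risks (scores 20 and 12) would go forward to a treatment decision, such as a new control, risk avoidance or risk transfer, while the printer outage (score 4) would be accepted and recorded.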
Scottish Administrative Data Research Partnership

© iStock.com/Rawpixel

EPCC has received funding via the Economic and Social Research Council (ESRC) to continue its work with the Scottish Administrative Data Research Partnership (S-ADRP).

The aim of the partnership is to enable research that leads to policy decisions that will in turn help Scotland progress towards the vision outlined in the National Performance Framework. This framework helps to shape high-level research priorities for Scottish Government, including tackling poverty, providing quality jobs and fair work for all, and ensuring that we live in inclusive, empowered, resilient and safe communities. S-ADRP consists of a number of Strategic Impact Programmes (SIPs), each dealing with a research priority.

Underpinning all this is data. EPCC’s role in the partnership will be to help provide research-ready data from the numerous sources that will be needed to support policy-making decisions. These sources include health and educational records, police and judicial databases, census data and emergency services data. Of course, there are governance issues related to the use of this data, as it comes from many sources and contains sensitive information. The framework that will allow researchers to use this data must ensure that legal and ethical practices (such as the removal of data that enables identification of an individual) are followed.

The technical infrastructure that will support this is a Safe Haven operated by EPCC, which protects patient identity and privacy while allowing data from electronic records to be used to support research when it is not practicable to obtain individual patient consent.

The partners are currently investigating ways to link the data derived from various databases so that it can be used by researchers working in the SIPs. This linked data must provide the information needed by the researchers to derive useful findings, but must also preserve the privacy of individuals according to the governance policies.

In addition to this is the task of making the data ready for research. This includes ‘cleaning’ the data (for example identifying and dealing with records containing data that is clearly out of range) and, since the databases will be large, optimising them to support the types of query that the researchers will generate.

Mark Sawyer, EPCC
m.sawyer@epcc.ed.ac.uk

Scottish Administrative Data Research Partnership
https://adrn.ac.uk/about/network/scotland/
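The ‘cleaning’ step can be illustrated with a simple range check: each field gets a plausible minimum and maximum, and any record with a value outside those bounds is set aside for review rather than passed straight on to researchers. The field names, ranges and function names below are hypothetical and not drawn from any S-ADRP dataset; in practice the rules would come from the data owners and the governance framework.

```python
from typing import Dict, List, Mapping, Tuple

# Hypothetical plausibility ranges per field (illustrative only)
VALID_RANGES: Dict[str, Tuple[float, float]] = {
    "age": (0, 120),
    "height_cm": (30, 250),
}


def out_of_range_fields(record: Mapping[str, float],
                        ranges: Mapping[str, Tuple[float, float]] = VALID_RANGES) -> List[str]:
    """Return the names of fields whose values fall outside their plausible range.
    Missing fields are not treated as out of range here."""
    bad = []
    for field, (low, high) in ranges.items():
        value = record.get(field)
        if value is not None and not (low <= value <= high):
            bad.append(field)
    return bad


def partition_records(records, ranges=VALID_RANGES):
    """Split records into (clean, flagged); flagged entries carry the offending fields."""
    clean, flagged = [], []
    for record in records:
        bad = out_of_range_fields(record, ranges)
        if bad:
            flagged.append((record, bad))
        else:
            clean.append(record)
    return clean, flagged
```

For instance, a record with an age of 212 would be flagged with the reason "age", while a record that simply lacks a height would pass this particular check and could be handled by a separate completeness rule.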
Analysing humanities data using Cray Urika-GX

Plot of proportion of books mentioning 'cholera' against publication date.

In our role as members of the Research Engineering Group of the Alan Turing Institute, we have been working with Melissa Terras, University of Edinburgh's College of Arts, Humanities and Social Sciences (CAHSS), and Raquel Alegre, Research IT Services, University College London (UCL), to explore text analysis of humanities data.

Our collaboration aimed to run text analysis codes developed by UCL on data used by CAHSS. The goal was to exercise the data access, transfer and analysis services of the Turing Institute's deployment of a Cray Urika-GX system.

We used two data sets of interest to CAHSS and hosted within the University of Edinburgh's DataStore:
• British Library newspapers data: around 1 TB of digitised newspapers from the 18th–20th centuries.
• British Library books data: around 224 GB of compressed digitised books from the 16th–19th centuries.

Both are collections of XML documents. However, they have been organised differently and conform to different XML schemas, which affects how the data can be queried.

The Cray Urika-GX system is a high-performance analytics cluster with a pre-integrated stack of popular analytics packages, including Apache Spark, Apache Hadoop and Jupyter notebooks. These are complemented by frameworks to develop data analytics applications in Python, Scala, R and Java.

To access both data sets from within Urika, we mounted the DataStore directories into our home directories on Urika using SSHFS. We then copied the data into Urika's own Lustre file system. We did this because, unlike Urika's login nodes, Urika's compute nodes have no network access and so cannot access the DataStore via the mount points. Moving the data to Lustre also minimised the need for data movement and network transfer during analysis.

To exercise Urika's data analytics capabilities, we ran two text analysis codes, one for each collection, which were initially developed by UCL with the British Library.

UCL's code for analysing the newspapers data is written in Python and runs queries via the Apache Spark framework. A range of queries is supported, eg counting the number of articles per year, counting the frequencies of a given list of words, and finding expressions matching a pattern.

UCL's code for analysing the books data is also written in Python, but runs queries via mpi4py, a Python wrapper for the message-passing interface (MPI) for parallel programming. Work had, however, been started on migrating some of these queries to Spark. A range of queries is supported, eg counting the total number of pages across all books and counting the frequencies of a given list of words. This code is complemented by a set of Jupyter notebooks to visualise query results and to perform further analyses.

To run the codes within Urika, we needed to modify both so that they run without any dependence on UCL's local environment and instead access data located within Lustre. As a result, the modified newspapers code now allows the location of XML documents to be specified using either URLs or absolute file paths. The modified books code now runs its MPI-based queries via Urika's Apache Mesos resource manager.

For the books data, Melissa suggested that we try to reproduce the results from her Jisc Research Data Spring 2015 project at UCL. This project developed queries that search for the names of thirteen diseases (eg "cholera", "tuberculosis") and return the total number of occurrences of each name. The queries then normalise the results by the number of books, pages and words per year, to see how these occurrences change over time. Taken together, one can examine the extent to which references to specific diseases change in literature over time.

We compared the results of running the modified code on Urika with the original results. They were generally consistent, but with some anomalies. We identified these as arising from data missing from the books data set held within DataStore, and have reported this back to Melissa.

Urika is designed with the use of Spark in mind, and Spark is well suited to this form of text analysis. Migrating the mpi4py books queries to Spark would be a good area for future work. This could be combined with the newspapers code, which already uses Spark and can handle several XML schemas. The result would be a single code, with a common underlying data model, that could run queries across both the newspapers and books data.

Our updated codes, and documentation on how to run them, are publicly available on GitHub:
• Newspaper code: http://bit.ly/2EQrYvS
• Books code (Spark version): http://bit.ly/2qeCwL7
• Books code (MPI version): http://bit.ly/2ES63EJ
• Jupyter notebook for books: http://bit.ly/2JmcIoZ

Research Engineering Group of the Alan Turing Institute: www.turing.ac.uk/research/research-engineering

This work was funded by Scottish Enterprise as part of the Alan Turing Institute-Scottish Enterprise Data Engineering Programme.

Rosa Filgueira and Mike Jackson, EPCC
r.filgueira@epcc.ed.ac.uk
m.jackson@epcc.ed.ac.uk
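The disease query described above counts occurrences of each name, then normalises by the number of books, pages and words per year. It can be sketched in plain Python. This is not UCL's code, which runs via mpi4py or Spark over XML files. Here each book is assumed, hypothetically, to be a (publication year, list of page texts) pair already extracted from the XML. The sketch also computes the proportion of books mentioning each term, which is the quantity plotted in the cholera figure.

```python
"""Per-year disease-term counts, normalised by books, pages and words (sketch)."""
import re
from collections import defaultdict


def disease_trends(books, terms):
    """books: iterable of (year, [page_text, ...]); terms: names to search for.

    Returns {year: {term: {"count", "per_book", "per_page", "per_word", "book_fraction"}}}.
    """
    # Whole-word, case-insensitive match for each term.
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    n_books = defaultdict(int)
    n_pages = defaultdict(int)
    n_words = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))        # year -> term -> occurrences
    books_with = defaultdict(lambda: defaultdict(int))  # year -> term -> books mentioning it

    for year, pages in books:
        n_books[year] += 1
        n_pages[year] += len(pages)
        seen = set()
        for page in pages:
            n_words[year] += len(page.split())
            for term, pattern in patterns.items():
                found = len(pattern.findall(page))
                hits[year][term] += found
                if found:
                    seen.add(term)
        for term in seen:
            books_with[year][term] += 1

    return {
        year: {
            term: {
                "count": hits[year][term],
                "per_book": hits[year][term] / n_books[year],
                "per_page": hits[year][term] / max(n_pages[year], 1),
                "per_word": hits[year][term] / max(n_words[year], 1),
                "book_fraction": books_with[year][term] / n_books[year],
            }
            for term in terms
        }
        for year in sorted(n_books)
    }


if __name__ == "__main__":
    sample = [  # hypothetical toy data
        (1831, ["An outbreak of cholera was reported.", "Nothing else."]),
        (1831, ["A treatise on tuberculosis."]),
        (1849, ["Cholera and cholera again."]),
    ]
    for year, row in disease_trends(sample, ["cholera", "tuberculosis"]).items():
        print(year, row["cholera"]["book_fraction"], row["cholera"]["per_word"])
```

In a Spark or MPI version, the per-book loop would be distributed across workers and the per-year tallies combined in a reduction step. The normalisation itself stays the same.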
New beginnings

An update on our MSc programmes in High Performance Computing (HPC) and HPC with Data Science.

Left: Students of the Class of 2019 during Induction Week. Image: Mario Antonioletti.
Above: Part of the MSc students' new communal work area in the Bayes Centre. Image: Mark Reynolds.

Every new academic year is filled with new beginnings, and 2018/19 is certainly not short of them, with:
• 36 students commencing their studies on the MSc programmes
• EPCC's first year in its new home in the Bayes Centre in the University's Central Area
• Increased cooperation with the School of Informatics
• Four new courses in the MSc programmes and four new Personal Tutors
• A scholarship and industrial dissertation project with the Registers of Scotland to celebrate its 400th anniversary.

Welcoming the Class of 2019
The first cohort of EPCC students to call the Bayes Centre 'home' is our Class of 2019: 36 students of 16 nationalities, with backgrounds as diverse as biology, electrical engineering, economics, computer science and physics.

Induction Week was a new experience for students, who were familiarising themselves with the programmes and their new city. It was also new for staff, who were acclimatising to teaching in the University's Central Area. EPCC has always welcomed students into the heart of the centre's community. The design of our space in the Bayes Centre emphasises this: our MSc and PhD students have study space in the building and share facilities with staff.

Curriculum changes
As well as enjoying new facilities, the Class of 2019 will be our first cohort to take the programmes' newly streamlined Degree Programme Table and four new optional courses (see EPCC News 83 for details).

Staffing update
The increased size of the MSc programmes required a larger student support team. We are delighted to welcome Arno Proeme, Darren White, Weronika Filinger (an MSc in HPC alumna) and Jane Kennedy (a second-generation member of EPCC staff) as Personal Tutors.

Farewell to the class of 2018
The class of 2018 graduated in November, with 20 students completing the programme. We wish them well in their next endeavours, which range from PhDs within the University to careers with multinational companies. One of the class of 2018 has already become a Teaching Assistant on the programme. Two members of TeamEPCC at this year's ISC Student Cluster Competition have returned to talk to the current class about their exploits and to encourage those who might follow in their footsteps at ISC 2019.

Read about our MSc programmes, including our industry-based projects, on the EPCC website: www.epcc.ed.ac.uk/msc

Ben Morse, EPCC
b.morse@epcc.ed.ac.uk
Online learning at EPCC

Our online learning options allow you to study with us wherever you are in the world.

Supercomputing MOOC
In Summer 2017, EPCC, in collaboration with SURFsara in The Netherlands, launched the first Supercomputing MOOC on the FutureLearn platform. Massive open online courses (MOOCs) are free courses delivered over a relatively short period. Their short duration and their delivery through videos, articles and discussion forums mean they can be run even for a very large audience (hence 'massive'). The number of participants for the latest run of our Supercomputing MOOC was smaller than for previous iterations. Even so, over 900 people registered from 95 countries.

No programming is required or involved in the course. Instead, concepts such as message passing and shared memory are explained using simple analogies that are easier for non-technical learners to visualise and understand. For example, Amdahl's Law is explained by looking at the speed-up when travelling by Concorde rather than a jumbo jet over varying distances, with fixed times for check-in, travel to and from the airport, and so on.

In addition to parallel programming, our Supercomputing MOOC also covers HPC hardware, from the basics of computer hardware to examining a blade from HECToR, the UK's previous supercomputing service. The final week consolidates learning by going through several HPC case studies. In particular, the topic of Quantum Computing captured the imagination of this cohort!

Technical online courses
EPCC is also starting to deliver more technical courses online. These complement the long-running series of ARCHER Virtual Tutorials, usually held at 3pm on the second Wednesday of each month. We now run parallel programming courses using the Blackboard Collaborate webinar system, which delivers live audio and video and enables interaction with remote attendees via chat sessions.

In 2018 we ran both the MPI and OpenMP courses online, with attendees given access to EPCC's Tier-2 HPC system, Cirrus, for the practical sessions. Sessions are typically run over four consecutive Wednesday afternoons, with around two hours of lecture content per week. The format gives attendees time to attempt the hands-on practical exercises between sessions, and we spend time reviewing the previous exercises at the start of each week. Because the presenter's screen can easily be shared, we can run live coding sessions to explain key points from the exercises or to answer questions.

David Henty and Jane Kennedy, EPCC
d.henty@epcc.ed.ac.uk
j.kennedy@epcc.ed.ac.uk

Image: FutureLearn

Our free 5-week Supercomputing MOOC is designed for anyone interested in leading-edge computing technology. Registrations are open for the next run (start date to be confirmed). See: www.futurelearn.com/courses/supercomputing

Our MPI and OpenMP courses will run again in 2019. See the ARCHER training pages for information, and for details of upcoming face-to-face training events and virtual tutorials: www.archer.ac.uk/training/

Read about all our online offerings at: www.epcc.ed.ac.uk/online-learning
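The Concorde analogy maps directly onto Amdahl's Law, S(p) = 1 / ((1 − f) + f/p), where f is the fraction of the work that can be sped up and p is how much faster that part runs. In the sketch below, the fixed ground time (check-in and airport transfers) plays the role of the serial part of a program. The faster aircraft plays the role of adding processors to the parallel part. The cruising speeds and the three-hour ground time are rough, hypothetical figures chosen for illustration. The point is that however far you fly, the overall speed-up can never exceed the ratio of the cruising speeds.

```python
"""Concorde-vs-jumbo-jet analogy for Amdahl's Law (illustrative sketch)."""

# Rough, assumed figures for illustration only.
JUMBO_KMH = 900.0      # approximate cruise speed of a jumbo jet
CONCORDE_KMH = 2150.0  # approximate cruise speed of Concorde
FIXED_HOURS = 3.0      # hypothetical check-in + airport transfers, same for both


def journey_hours(distance_km: float, speed_kmh: float) -> float:
    """Door-to-door time: fixed ground time plus time in the air."""
    return FIXED_HOURS + distance_km / speed_kmh


def speedup(distance_km: float) -> float:
    """How many times faster the Concorde journey is, door to door."""
    return journey_hours(distance_km, JUMBO_KMH) / journey_hours(distance_km, CONCORDE_KMH)


def amdahl(parallel_fraction: float, p: float) -> float:
    """Amdahl's Law: overall speed-up when only a fraction of the work runs p times faster."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / p)


def as_amdahl(distance_km: float) -> float:
    """The same trip expressed in Amdahl's terms."""
    f = (distance_km / JUMBO_KMH) / journey_hours(distance_km, JUMBO_KMH)  # time in the air
    p = CONCORDE_KMH / JUMBO_KMH                                           # faster "processor"
    return amdahl(f, p)


if __name__ == "__main__":
    for d in (500, 5_000, 50_000):
        print(f"{d:>6} km: Concorde is {speedup(d):.2f}x faster door to door")
    print(f"upper limit: {CONCORDE_KMH / JUMBO_KMH:.2f}x")
```

`as_amdahl` gives exactly the same result as `speedup`. The fraction of the jumbo-jet journey spent in the air is the "parallel fraction" f. The ratio of cruising speeds is the "processor count" p.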
SpiNNaker arrives at EPCC

The SpiNNaker neuromorphic high-performance computing platform, which aims to run 1% of the human brain in real time, arrived at EPCC this year. SpiNNaker is a novel hardware platform because of its massive parallelism, multicast communication fabric and low-power design.

Alan Stokes and the SpiNNaker machine at EPCC in the Bayes Centre. Alan will be at the Bayes for three weeks each month.

The SpiNNaker communication fabric takes a very different approach from conventional HPC systems. It is designed to send small data packets very quickly to many places. The standard approach, by contrast, periodically sends large blocks of data to a few places. As HPC systems become increasingly multi-core, a platform such as SpiNNaker is in an interesting position to explore massively parallel designs.

Each SpiNNaker chip is benchmarked at 1W. The entire one-million-core machine uses approximately 140kW of power, including the cooling and the FPGAs that each board uses to create the torus-shaped communication fabric. This is an order of magnitude less power than HPC systems of the same processor capability. It is also a step towards future HPC systems, which need to reduce the power used to operate them in today's culture.

Applications that work efficiently on the SpiNNaker machine are likely to also work well when ported back to traditional HPC systems. This is because SpiNNaker forces the user to think about their problem in a distributed fashion, and about communication, from the get go. This all makes SpiNNaker a very interesting environment to work with.

SpiNNaker at the Bayes
The SpiNNaker machine directly available to users in the Bayes is a three-board toroid, which provides the user with up to 2304 Arm cores to work with, as shown in the image here. Any user who requires more resources can get access to the half million cores currently available at the University of Manchester. This machine will have grown to one million cores by the end of the year.

Current plans for the SpiNNaker platform at the Bayes include supporting research into executing MicroPython in a distributed fashion, as well as supporting the standard users in computational neuroscience. Please keep an eye out for Masters projects that will use the SpiNNaker platform for a multitude of uses.

Find out more
To learn about SpiNNaker or how to use the platform, speak to me directly, or contact the SpiNNaker team through the user mailing list:
https://groups.google.com/forum/#!forum/spinnakerusers
http://apt.cs.manchester.ac.uk/projects/SpiNNaker/
http://apt.cs.manchester.ac.uk/ftp/pub/apt/papers/LAP_IEEEDandT_07.pdf

Alan Stokes, School of Computing Science, University of Manchester
stokesa6@cs.man.ac.uk
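The core and power figures above can be checked with simple arithmetic. The sketch below assumes three figures not stated in the article. First, each SpiNNaker chip has 18 Arm cores. Second, 16 of those cores are available to applications, with one acting as a monitor and one held spare. Third, each board carries 48 chips. Under those assumptions, a three-board machine gives exactly the 2304 application cores quoted. The chips alone then account for less than half of the roughly 140kW drawn by the full machine.

```python
"""Back-of-envelope SpiNNaker arithmetic (sketch; chip and board figures are assumptions)."""
import math

CORES_PER_CHIP = 18      # assumption: Arm cores on each SpiNNaker chip
APP_CORES_PER_CHIP = 16  # assumption: one monitor core and one spare per chip
CHIPS_PER_BOARD = 48     # assumption: chips on each SpiNNaker board
WATTS_PER_CHIP = 1.0     # from the article: each chip benchmarked at about 1 W


def application_cores(boards: int) -> int:
    """Cores available to user code on a machine built from `boards` boards."""
    return boards * CHIPS_PER_BOARD * APP_CORES_PER_CHIP


def chips_needed(total_cores: int) -> int:
    """Chips required to provide `total_cores` physical cores."""
    return math.ceil(total_cores / CORES_PER_CHIP)


def chip_power_kw(total_cores: int) -> float:
    """Power drawn by the chips alone, excluding cooling, FPGAs and other overheads."""
    return chips_needed(total_cores) * WATTS_PER_CHIP / 1000.0


if __name__ == "__main__":
    print(application_cores(3))                # three-board toroid at the Bayes
    chips_kw = chip_power_kw(1_000_000)
    print(f"{chips_kw:.1f} kW for the chips")  # about 55.6 kW
    print(f"{chips_kw / 140:.0%} of the ~140 kW total quoted")
```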
On the frontline of energy-efficient computing

The Advanced Computing Facility (ACF) on the outskirts of Edinburgh is the high performance computing data centre of EPCC.

One of the ACF's roof-mounted dry air cooling towers.

Built in the 1970s and operated by EPCC since the turn of the millennium, the ACF site has had significant investment over the years. At present, there are three Computer Rooms, imaginatively called Computer Room 1 (CR1), Computer Room 2 (CR2) and Computer Room 3 (CR3). Each hosts specific HPC equipment and is supported by associated plant rooms, which provide dedicated power and cooling infrastructure for each room.

Several pieces of equipment and racks at our site are "traditionally" air cooled, but the majority of our equipment and speciality high-performance computers are water cooled. Yes. Water and electricity!

Thanks to the Scottish climate, for much of the year the ACF benefits from something called free cooling. The water that supports our cooling infrastructure is pumped to our roof-mounted dry air cooling towers and back again. As the water passes through the towers, the (cold!) outside air cools it. Only on extremely warm summer days is there no free cooling, and then we need to use our large-scale mechanical chillers to cool the water circuits.

Any data centre is an exercise in thermodynamics – a computer turns electrical energy into heat energy as it runs, which in turn has to be managed and cooled. By using water to help cool the machines, which is significantly more effective than air alone, the ACF is an extremely efficient data centre.

Alongside managing the data centre with our Data Centre Manager Calum Muir, I also have overall responsibility for the HPC Systems team. The team designs, builds, commissions, hosts and supports a number of HPC systems within the ACF, including the National Tier-1 and Tier-2 systems, ARCHER and Cirrus, to name a few. We have the experience and skills to help continue to deliver future systems and projects, and we relish working at the bleeding edge of technology and innovation.

As an example, the University of Edinburgh and EPCC will play a major role in the Edinburgh and South East Scotland City Region Deal by delivering the deal's Data-Driven Innovation Programme (as EPCC Director Mark Parsons explained in EPCC News 83). At the heart of this programme is the World-Class Data Infrastructure (WCDI). This involves developing and delivering a new, state-of-the-art computer room at the ACF to house the underpinning infrastructure for the City Region Deal. This will be called, you've guessed it, Computer Room 4 (CR4)!

Future editions of EPCC News will provide updates on this exciting project and on the development of our future masterplan.

Paul Clark, EPCC
p.clark@epcc.ed.ac.uk
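The claim that water is significantly more effective than air can be made concrete with the steady-state heat balance Q = ṁ·c_p·ΔT. Here Q is the heat load, ṁ the coolant mass flow, c_p its specific heat and ΔT its temperature rise. The sketch below uses textbook approximations for the properties of water and air. The 500 kW heat load and 10 K temperature rise are hypothetical, illustrative numbers, not ACF figures.

```python
"""Coolant flow needed to remove a heat load: Q = m_dot * c_p * dT (illustrative sketch)."""

# Approximate textbook properties near room temperature.
CP_WATER = 4.186    # kJ/(kg*K)
CP_AIR = 1.005      # kJ/(kg*K)
RHO_WATER = 1000.0  # kg/m^3
RHO_AIR = 1.2       # kg/m^3


def mass_flow_kg_s(heat_kw: float, cp: float, delta_t_k: float) -> float:
    """Coolant mass flow (kg/s) that absorbs heat_kw with a rise of delta_t_k kelvin."""
    return heat_kw / (cp * delta_t_k)


def volume_flow_m3_s(heat_kw: float, cp: float, rho: float, delta_t_k: float) -> float:
    """The same requirement expressed as a volumetric flow (m^3/s)."""
    return mass_flow_kg_s(heat_kw, cp, delta_t_k) / rho


if __name__ == "__main__":
    load_kw, rise_k = 500.0, 10.0  # hypothetical heat load and temperature rise
    water = volume_flow_m3_s(load_kw, CP_WATER, RHO_WATER, rise_k)
    air = volume_flow_m3_s(load_kw, CP_AIR, RHO_AIR, rise_k)
    print(f"water: {water * 1000:.1f} litres/s")
    print(f"air:   {air:.1f} m^3/s")
    print(f"air needs ~{air / water:.0f}x the volume of water")
```

The ratio depends only on the volumetric heat capacities (ρ·c_p) of the two fluids, which differ by a factor of several thousand. This is why water circuits can carry away far more heat than the same volume of air.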
Making complex machines easier to use efficiently

© iStock.com/matejmo

Supercomputers are getting more complex. Faster components would be impossible to cool but, by doing more with less, we can still solve bigger problems faster than ever before.

Hardware designers are offering innovative designs and novel hardware options. Examples include the new tensor unit in Volta GPUs, compute-in-network capabilities, and several new memory technologies – HBM, NVM, storage class memory and others. Future supercomputers will combine all of these specialised hardware components to create general-purpose computing resources. But how much of this complexity should be exposed to, and controlled by, programmers? What will that new functionality look like? How can we achieve both high application performance and efficient resource usage?

The vision of the EPiGRAM-HS project is to enable extreme-scale applications on heterogeneous hardware. This means figuring out how to use new hardware capabilities and how to combine different components to get the best result. Together we will look at the challenge from four directions: network, memory, compute and applications.

EPCC will focus on exploiting heterogeneity for high-performance communication, building on proven programming models. We will use the newly standardised interface for persistent collective operations in MPI to implement efficient high-level communication patterns. Here the aim is to hide as much of the hardware complexity as possible and instead give the user access to a high-level abstraction. The goal is to deliver high performance without requiring the user to know about, or deal with, the intimate details of each piece of novel hardware in each machine.

Other partners will investigate how MPI and GPI can be used directly on GPUs and FPGAs. In collaboration with other partners, EPCC will also help to define suitable abstractions for memory usage and to create a unified interface applicable to all memory and storage devices. The intention is that this will make programming easier, because code will be more portable, even between hardware devices and components with vastly different capabilities.

A further technical challenge in the project is how to integrate novel compute hardware such as FPGAs. This is partly a scheduling problem: on which compute device should each piece of code execute? It is also partly an API design choice: how should novel compute capability be exposed to programmers?

All of the work at the programming-model level and on prototype implementations will be validated using real applications. The partners have expertise in traditional HPC applications like Nek5000, iPIC3D and IFS, and also in data science applications like lung cancer detection using TensorFlow. The project will also push for changes to international standards to support heterogeneous systems.

The EPiGRAM-HS project is part of the EU's Horizon 2020 programme. The project partners are KTH, ETH, EPCC, Fraunhofer, Cray and ECMWF. To stay up to date with the project, please subscribe to our quarterly newsletter by visiting our website: https://epigram-hs.eu

Dan Holmes, EPCC
d.holmes@epcc.ed.ac.uk
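The scheduling question raised above asks on which compute device each piece of code should execute. A simple greedy heuristic illustrates it: place each task on whichever device would finish it earliest, given per-device runtime estimates. This is a generic list-scheduling sketch, not the approach of EPiGRAM-HS. The task names, devices and runtimes are all hypothetical. A device missing from a task's estimates is treated as unable to run it, for example code with no FPGA implementation.

```python
"""Greedy earliest-finish-time placement of tasks on heterogeneous devices (sketch)."""
from typing import Dict, List, Tuple

Plan = List[Tuple[str, str, float, float]]  # (task, device, start, finish)


def schedule(tasks: List[Tuple[str, Dict[str, float]]], devices: List[str]) -> Tuple[Plan, float]:
    """Assign independent tasks to devices; return the plan and the overall finish time.

    tasks: (name, {device: estimated_runtime}) pairs. Devices absent from a
    task's estimates are treated as unable to run that task.
    """
    free_at = {device: 0.0 for device in devices}  # when each device next becomes idle
    plan: Plan = []
    # Place tasks with the largest best-case runtime first, so big jobs are
    # not left until the end (a longest-processing-time style ordering).
    for name, costs in sorted(tasks, key=lambda task: -min(task[1].values())):
        candidates = [d for d in costs if d in free_at]
        if not candidates:
            raise ValueError(f"no available device can run {name!r}")
        best = min(candidates, key=lambda d: free_at[d] + costs[d])
        start = free_at[best]
        free_at[best] = start + costs[best]
        plan.append((name, best, start, free_at[best]))
    return plan, max(free_at.values())


if __name__ == "__main__":
    jobs = [  # hypothetical runtime estimates in seconds
        ("stencil", {"cpu": 8.0, "gpu": 2.0}),
        ("fft", {"cpu": 6.0, "gpu": 3.0, "fpga": 1.5}),
        ("io", {"cpu": 1.0}),
        ("reduce", {"cpu": 2.0, "gpu": 2.5}),
    ]
    plan, makespan = schedule(jobs, ["cpu", "gpu", "fpga"])
    for task, device, start, end in plan:
        print(f"{task:8s} -> {device:4s} [{start:.1f}, {end:.1f}]")
    print("all done at", makespan)
```

Real heterogeneous schedulers must also account for data movement between devices and for dependencies between tasks, which this sketch deliberately ignores.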
eCSE programme: funding software development for UK computational science

The embedded Computational Science and Engineering (eCSE) programme has allocated funding to the UK computational science community over a period of six years.

Hull University's VOX-FE is a bespoke bone modelling software tool for in silico experiments, such as testing bone growth under stressed conditions. The eCSE programme enabled EPCC to work with Richard Fagan at Hull to greatly improve VOX-FE's modelling capabilities on machines such as ARCHER.

Integral to ARCHER, the National HPC Service, there has been a series of regular eCSE Calls to fund software development activities. The last of the Calls has now closed and all funding has been allocated. Although a number of projects are still ongoing, this seems a good time to review the benefits of the programme and to see whether its aims have been met.

The programme aimed to:
• Enhance the quality, quantity and range of science produced on the ARCHER service through improved software.
• Develop the computational science skills base, and provide expert assistance embedded within research communities, across the UK.
• Provide an enhanced and sustainable set of HPC software for UK science.

To achieve this, we set out to provide a high-quality, fair and objective eCSE selection process, delivering maximum value to the community. Selection was made by a series of independent panel members, we had regular Calls, and the programme was not-for-profit, with all funds being spent on projects. In total, 46 institutions were involved across 100 eCSE projects.

A key outcome of the eCSE programme relates to people. We aim to develop the computational science skills base and provide expert assistance and high-quality RSE work embedded within research communities across the UK. As the map (right) shows, technical staff have been spread all across the UK.

The location of the technical members of staff on all eCSE projects, with darker colours representing a greater number of people.

Enhancing the quality, quantity and range of science produced on the ARCHER service is obviously core to the programme. Scientific output and impact will continue to be delivered throughout the lifetime of the ARCHER service and beyond. The full scientific benefit of the programme will therefore not be known and realised for some time.

However, one metric we can measure is the financial saving achieved by a number of projects. Many projects involved performance optimisation, resulting in reduced CPU usage and a related financial saving. This saving is re-invested, allowing scientists to achieve more science from the same resource allocation. Carrying out these measurements is tricky, but they help demonstrate the value of investment in software development and research software engineering. Overall, the eCSE programme cost around £6m and to date we have seen a reported benefit of around £21m, a return of more than three times the investment.

Embedded CSE webpages
For further information and eCSE project reports, see: www.archer.ac.uk/community/eCSE/

Lorna Smith, EPCC
l.smith@epcc.ed.ac.uk
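The return-on-investment figure follows directly from the reported numbers: £21m benefit for £6m spent is a ratio of 3.5. The saving from an individual optimisation project can be estimated in the same spirit. In the sketch below, the core-hour usage, speed-up and cost per core-hour are hypothetical placeholders, not eCSE or ARCHER figures. The key step is that a code running s times faster needs only 1/s of the core-hours for the same science.

```python
"""Back-of-envelope value of performance optimisation (illustrative sketch)."""


def return_on_investment(benefit_gbp: float, cost_gbp: float) -> float:
    """Benefit delivered per pound spent."""
    return benefit_gbp / cost_gbp


def core_hours_saved(core_hours: float, speedup: float) -> float:
    """Core-hours freed when the same work runs `speedup` times faster."""
    return core_hours * (1.0 - 1.0 / speedup)


def saving_gbp(core_hours: float, speedup: float, gbp_per_core_hour: float) -> float:
    """Financial value of the freed core-hours, available to re-invest in more science."""
    return core_hours_saved(core_hours, speedup) * gbp_per_core_hour


if __name__ == "__main__":
    print(f"programme ROI: {return_on_investment(21e6, 6e6):.1f}x")  # 3.5x
    # Hypothetical project: 10 million core-hours a year, 1.25x faster after
    # optimisation, valued at a made-up £0.02 per core-hour.
    print(f"£{saving_gbp(10e6, 1.25, 0.02):,.0f} per year freed up")
```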
Exploratory image analysis in cell biology

© iStock.com/the-lightwriter

PickCells is an image analysis platform developed by the Centre for Regenerative Medicine (CRM) at The University of Edinburgh. It combines generic image analysis algorithms, visualisation modules and data mining functionality within a stand-alone Java application.

PickCells provides a graphical environment within which biologists can study multidimensional biological images. They can also explore 3D spatial relationships between objects within complex biological systems such as stem cell niches, organoids and embryos.

Since January 2018, EPCC has been working with CRM on the development of the platform and its supporting resources. My EPCC colleagues Elena Breitmoser and Arno Proeme and I worked with CRM's Sally Lowell and Guillaume Blin to bring PickCells to a state suitable for wider promotion, with the intention of encouraging deeper community engagement by users and developers. We focused on two things: creating a website for users, developers and contributors, and providing consultancy on developing and supporting open source software.

PickCells is highly modular, and the documentation for each component is held within that component's source code repository. These repositories are hosted by Framagit, a deployment of GitLab for free open source software. We developed a website framework that uses the Hugo static website generator to render content. GitLab's continuous integration functionality, GitLab CI, triggers a rebuild of the website whenever the documentation changes in any of the source code repositories.

During our collaboration, Sally secured a Wellcome Trust "enrichment" award to support further collaboration on PickCells. This next phase of work, which began in September, seeks to improve the usability of PickCells for biologists who wish to extract information from their imaging data.

Our objectives are to:
• develop an in-application interactive assistant to help users get started with analyses using PickCells and filter information
• produce video tutorials
• undertake a usability evaluation of PickCells
• update PickCells and its supporting resources based upon the outcomes of this usability evaluation.

We look forward to reporting on our experiences in a future post.

The PickCells website is still in development, but please feel free to visit it at: https://pickcellslab.frama.io/docs/

Our work was funded by the Wellcome Trust Institutional Strategic Support Fund and The Software Sustainability Institute.

Mike Jackson, EPCC
m.jackson@epcc.ed.ac.uk
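In the documentation workflow described above, docs are kept alongside each component's source and the website is rebuilt whenever they change. Such a workflow usually needs a step that gathers the per-component documentation into one content tree before the static site generator runs. The Python sketch below shows one way that step might look. The repository layout (a `docs/` folder of Markdown files in each component) and the `content/<component>/` destination are assumptions for illustration, not the actual PickCells setup. A CI job would then run Hugo's `hugo` build over the site.

```python
"""Gather per-component documentation into one site content tree (sketch)."""
import shutil
from pathlib import Path
from typing import Dict, List, Union

PathLike = Union[str, Path]


def collect_docs(component_repos: Dict[str, PathLike], site_content: PathLike) -> List[str]:
    """Copy each component's docs/ folder to <site_content>/<component>/.

    component_repos maps a component name to a local checkout (hypothetical
    layout). Returns the Markdown files now in the content tree, relative to it.
    """
    site_content = Path(site_content)
    copied: List[str] = []
    for name, repo in component_repos.items():
        docs = Path(repo) / "docs"
        if not docs.is_dir():
            continue  # this component has no documentation yet
        target = site_content / name
        # dirs_exist_ok (Python 3.8+) lets repeated CI runs refresh an existing tree.
        shutil.copytree(docs, target, dirs_exist_ok=True)
        copied.extend(sorted(p.relative_to(site_content).as_posix()
                             for p in target.rglob("*.md")))
    return copied
```

Keeping the documentation in each component's own repository means it is versioned with the code it describes. The aggregation step is the only place that needs to know about every component.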