Data science and AI in the age of COVID-19 - Reflections on the response of the UK's data science and AI community to the COVID-19 pandemic - The ...
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Data science and AI in the age of COVID-19 Reflections on the response of the UK’s data science and AI community to the COVID-19 pandemic
Executive summary Contents This report summarises the findings of a science activity during the pandemic. The Foreword 4 series of workshops carried out by The Alan message was clear: better data would Turing Institute, the UK’s national institute enable a better response. Preface 5 for data science and artificial intelligence (AI), in late 2020 following the 'AI and data Third, issues of inequality and exclusion related to data science and AI arose during 1. Spotlighting contributions from the data science 7 science in the age of COVID-19' conference. and AI community The aim of the workshops was to capture the pandemic. These included concerns the successes and challenges experienced about inadequate representation of minority + The Turing's response to COVID-19 10 by the UK’s data science and AI community groups in data, and low engagement with during the COVID-19 pandemic, and the these groups, which could bias research 2. Data access and standardisation 12 community’s suggestions for how these and policy decisions. Workshop participants challenges might be tackled. also raised concerns about inequalities 3. Inequality and exclusion 14 in the ease with which researchers could Four key themes emerged from the access data, and about a lack of diversity 4. Communication 17 workshops. within the research community (and the potential impacts of this). 5. Conclusions 20 First, the community made many contributions to the UK’s response to Fourth, communication difficulties Appendix A. Workshop participants and Organising 22 the pandemic, via national organisations, surfaced. While there have been excellent Committee research institutes and the healthcare examples of science communication sector. Researchers responded to the crisis throughout the pandemic, participants Appendix B. Workshop themes and reports 24 with ingenuity and determination, and the highlighted the challenges of result was a range of new projects and communicating research findings and collaborations that informed the pandemic uncertainties to policy makers and the response and opened up new areas for public in a timely, accurate and clear future study. manner. Second, the single most consistent In this report, we outline the workshop message across the workshops was the participants’ reflections and suggestions importance – and at times lack – of robust relating to each of these themes, with the and timely data. Problems around data aim of enabling the data science and AI availability, access and standardisation community to respond better to the ongoing spanned the entire spectrum of data pandemic, and future emergencies. Edited by Inken von Borzyskowski Anjali Mazumder Assistant Professor of AI and Justice and Human Political Science, University Rights Theme Lead, The Alan College London Turing Institute Bilal Mateen Michael Wooldridge Clinical Data Science Fellow, Turing Fellow and Programme The Alan Turing Institute; Co-Director for Artificial Clinical Technology Lead, Intelligence, The Alan Turing Wellcome Trust Institute Workshop theme leads Mark Briers (The Alan Turing Institute), Tao Cheng (UCL), Spiros Denaxas (UCL), John Dennis (University of Exeter), Sabina Leonelli (University of Exeter), David Leslie (The Alan Turing Institute), Marion Mafham (University of Oxford), Ed Manley (University of Leeds), Karyn Morrissey (University of Exeter), Deepak Parashar (University of Warwick), Jim Weatherall (AstraZeneca)
Foreword Preface The COVID-19 pandemic has seen scientific and non-experts – contain valuable reflections At the end of 2019, a new highly infectious pharmaceutical interventions. The response research move into public discourse in and suggestions for how the data science virus, now known as severe acute respiratory was remarkable for its breadth of engagement unparalleled ways. Across the scientific and AI community might prepare for future syndrome coronavirus 2 (SARS-CoV-2), across disciplines, as demonstrated by spectrum, researchers have stepped up, the emergencies. Indeed, the Turing has already was identified as the underlying pathogen the range of backgrounds of our workshop data science and AI community included, to begun a large-scale project1 which aims to for a series of unexplained pneumonias participants and the diverse set of insights that work alongside clinicians, policy makers and boost societal, governmental and economic (subsequently termed coronavirus disease they generated. the government at the heart of the response, resilience to shocks such as this pandemic. 2019, or COVID-19), clustered in Wuhan, directly impacting on our daily lives. China. By 30 January 2020, COVID-19 had Goals, origins and structure of the report Ahead of the G7 summit in the UK in June become so prevalent that the World Health Data science and AI is an inherently 2021, the leading scientific bodies of the G7 Organisation (WHO) declared it a Public Health This report was commissioned by The Alan interdisciplinary community, and our activities nations (the ‘Science 7’) recently published a Emergency of International Concern – the Turing Institute with the aim of reflecting on at The Alan Turing Institute in response to the call for more ‘data readiness’ in preparation WHO’s highest level of alarm. As we finish the UK's data science and AI response to the pandemic reflect this. Our researchers have for future health emergencies.2 This is a this report in spring 2021, the disease itself pandemic. The Turing is the UK’s national developed algorithms to monitor pedestrian timely amplification of the message in the has claimed over three million lives globally, institute for data science and AI, and partners density and ensure social distancing on the Turing’s report about the need for increased with more than 170 million confirmed cases, with many of the UK’s leading universities and streets of London; combined NHS datasets to data access and sharing, at a time when the and many more affected by the impacts of research centres to advance the country’s help answer clinical questions about the effects pandemic continues to have catastrophic lockdowns and the unprecedented disruption capacity and competitiveness in these areas, of COVID-19; explored what makes people impacts around the world. to the global economy. The UK has now been with the overall mission of changing the world vulnerable to health-related misinformation; through two waves of the virus, with infections, for the better. and improved the accuracy of the NHS My thanks to the editors for initiating and delivering this report, and to all the workshop hospitalisations and fatalities in the second The Turing’s goals in undertaking this work COVID-19 app. (See page 10 for more details of wave exceeding those in the first. our response.) theme leads and participants. We hope it were twofold: will be a valued contribution to the ongoing While pandemics appear to have occurred This report has been edited by four researchers discussions about the data science and AI 1. To capture the initiatives and resources that throughout human history, the COVID-19 have been developed by the data science and with backgrounds spanning AI, data science, response, alongside notable reports from the pandemic is unique in one important respect. public policy, human rights and medicine. They Centre for Data Ethics and Innovation,3 the Ada AI community during the pandemic. It is the first pandemic to occur in the age of have synthesised the views of 96 attendees to Lovelace Institute,4 the Royal Society’s DELVE data science and AI: the first pandemic in a 2. To gather the experiences and insights of a series of workshops held at the end of 2020, initiative5 and the Royal Statistical Society,6 world of deep learning, ubiquitous computing, this community during the pandemic – what which aimed to provide a snapshot of the uses among others. smartphones, wearable technology and social worked well, what didn’t, and how we as of data science and AI during the pandemic, media. It is thus unsurprising that governments a community could respond better to this and what we as a community can learn from The Turing looks forward to continuing its role to convene and deliver activities and across the globe, including the UK’s, looked pandemic and future emergencies. the experience. to data to inform their responses and help reflections in response to this pandemic, and in The work has its origins in a one-day While this represents a small part of what preparation for other crises. navigate challenges. The goal was to limit the spread of the disease and its medical, social conference 'AI and data science in the age will surely be a larger reflection exercise to of COVID-19,'7 which was held virtually by the come, the central findings – issues of data and economic consequences. As such, the Adrian Smith UK government stated that its policies were Turing on 24 November 2020 and featured talks access and quality; inequality among both the from some of the leading voices in the UK’s research community and wider society; and Institute Director and Chief Executive “guided by the science”, and later that ending The Alan Turing Institute lockdowns depended on “data, not dates”. response to COVID-19. The free event attracted communication difficulties between experts over 1,700 registrants from 35 countries, from In response, many members of the UK’s data academia, industry, the public sector and the science and AI community stepped forward, general public. spearheading initiatives that they hoped would assist the domestic and international response. A series of themed, virtual workshops followed These initiatives came from individual in November and December 2020. The academics, university research groups, the invitation to participate in these workshops healthcare sector, national institutes and was widely circulated within the UK academic others. They involved not just experts on community, via social media and the Turing’s virology and epidemiological modelling, but network of partners and affiliates. The Turing 1 also researchers studying, for example, the used a lightweight reviewing process to select social and economic consequences of non- the participants, who are listed in Appendix A. 1 1 https://www.turing.ac.uk/research/research-projects/shocks-and-resilience 2 https://royalsociety.org/-/media/about-us/international/g-science-statements/G7-data-for-international-health-emer- gencies-31-03-2021.pdf 3 https://www.gov.uk/government/publications/covid-19-repository-and-public-attitudes-retrospective 4 https://www.adalovelaceinstitute.org/summary/learning-data-lessons 7 https://www.turing.ac.uk/events/ai-and-data-science-age-covid-19 5 https://rs-delve.github.io/reports/2020/11/24/data-readiness-lessons-from-an-emergency.html 6 https://rss.org.uk/statistics-data-and-covid 4 5
1. Spotlighting contributions from the We took the view that, if the report was to and analysis were far more widespread. One have any credibility, it would be essential to possible reason for this, suggested by the include the views of as diverse a community CDEI report, is that COVID-19 was such a new as possible. The scope of ‘data science and AI’ for the purposes of this report is therefore phenomenon that there was a lack of data with which to train AI algorithms. data science and AI community deliberately broad, with participants including ethicists, clinicians, mathematicians, policy We also note that while we as editors have A key goal of the workshops was to document several healthcare technology companies. advisors, and many more. done our best to report the suggestions the contributions from the UK’s data science OpenSAFELY provides secure access to of workshop participants as accurately as and AI community in responding to the a database of over 58 million NHS patient The eight workshop themes are listed in possible, in some cases we have applied pandemic. In this chapter, we summarise the records, allowing researchers to answer urgent Appendix B. These themes are themselves a light editing. This especially applies to the contributions highlighted by the workshop clinical and public health questions related reflection of input from over 20 experts, who suggestions around data access in Chapter participants, which demonstrate some positive to COVID-19. Moreover, for the first time it formed the Organising Committee for the 2, where we have taken care to put the aspects of the data science and AI response: separates the development of the analysis conference and workshops. All workshops participants’ aspirations for more accessible increased data sharing, collaboration across code from the actual data (which never leave were led by the Centre for Facilitation8 together and open data in the context of the complex disciplines, and the development of new the servers of the electronic health record with one or two workshop-specific theme legal and ethical obligations surrounding datasets, repositories and initiatives that service provider or NHS Digital), and in doing leads who were selected by the Organising access to sensitive national health, economic mobilised COVID-19 research. so guarantees fully open and reproducible Committee to supplement the facilitators’ and social information. Data access is code. This is a noteworthy development that expertise with domain knowledge. For each clearly an important issue which needs to We emphasise that these spotlights are occurred largely due to the permissive nature workshop, the facilitators and theme leads be addressed, and there are many steps to nothing more than that: spotlights. The of the regulatory environment necessitated by summarised the discussions in a report, enabling responsible data flows. workshops provided a snapshot of the the pandemic. which have been lightly edited and linked from community’s response, but there were many Appendix B. These workshop reports provided One final and important word. Our report other contributions that were not covered in Another new initiative was the COVID-19 the ‘raw data’ for this report. Given the range is preliminary in the sense that we hope the workshops. An overview of The Alan Turing Genomics UK Consortium (COG-UK),13 and complexity of the workshops, it has not it will serve as a starting point for a more Institute’s key COVID-19 projects can be found which has sequenced more than 490,000 been possible to include all of the participants’ comprehensive and systematic review of on page 10 of this report, and further examples SARS-CoV-2 virus genomes to date, providing views and suggestions, but we as editors have the uses of data science and AI during the of the community’s response can be found important information on viral transmission and tried to draw out the main themes. pandemic. The report was originally conceived in resources such as the CDEI’s COVID-19 the emergence of new variants. COG-UK was in summer 2020, when daily reported COVID-19 respository.10 Finally, we acknowledge that achieved through the coordination of several Although our remit includes both data science fatalities in the UK were down to single figures, there may be some bias in this chapter, with organisations (NHS, the UK’s public health and AI – reflecting the Turing’s role as the and there was some hope that the virus might participants more likely to mention projects agencies, the Wellcome Sanger Institute and national institute for these two subjects – the be largely contained in a single wave. Work that they are affiliated with. various academic institutions). Having those majority of the feedback we received from on this report continued during the November sequences available from the earliest stages of workshop participants related specifically 2020 to spring 2021 lockdowns when the Noteworthy datasets and databases the pandemic was key to pushing forward the to data science, and our report reflects this. disease was again prevalent. In this sense, development of vaccines. This chimes with the 'COVID-19 repository many (but not all) of the observations made The UK government’s action to issue statutory and public attitudes' report9 published by the here relate to the first wave of the pandemic in regulators such as NHS Digital with a Control Existing systems and databases from ISARIC Centre for Data Ethics and Innovation (CDEI) the UK, from January to summer 2020, prior to of Patient Information (COPI) notice11 to (International Severe Acute Respiratory and in March 2021, which found that “conventional the onset of the UK’s second wave. We hope make confidential patient data available to emerging Infection Consortium)14 and ICNARC data analysis has been at the heart of the that the report will be of value to healthcare researchers and policy makers for COVID- (Intensive Care National Audit & Research COVID-19 response, not AI”. There were professionals, data science and AI researchers, 19-specific purposes had far-reaching Centre)15 were repurposed and expanded for certainly innovative applications of AI during and policy/decision makers as we continue to consequences, and participants noted the the pandemic, which kept costs down and the pandemic, but the evidence suggests that manage the ongoing pandemic. positive effect this had on data sharing. enabled the data science community to act more traditional methods of data collection more quickly. Similarly, linked data from GPs For example, it paved the way for a new and NHS trusts were available to researchers statistical analytics platform called through the pre-existing Discovery East OpenSAFELY,12 a collaboration between the London16 platform. Participants also University of Oxford’s DataLab, the London highlighted the utility of aggregate data on School of Hygiene & Tropical Medicine, and critical care patients collected by CHESS 1 10 https://www.gov.uk/government/publications/covid-19-repository-and-public-attitudes-retrospective 1 11 https://digital.nhs.uk/coronavirus/coronavirus-covid-19-response-information-governance-hub/control-of-patient-in- formation-copi-notice 12 https://opensafely.org 13 https://www.cogconsortium.uk 8 https://www.centreforfacilitation.co.uk 14 https://isaric.org 9 https://www.gov.uk/government/publications/covid-19-repository-and-public-attitudes-retrospective 15 https://www.icnarc.org 16 https://www.eastlondonhcp.nhs.uk/ourplans/discovery-east-london-case-study.htm 6 7
(COVID-19 Hospitalisation in England Participants also cited the DECOVID31 project, Other initiatives Workshop participants also noted positive Surveillance System),17 which was adapted which aimed to create a detailed, frequently examples of public engagement from the from the UK Severe Influenza Surveillance updated database of anonymised patient The Royal Society was highly active during data science and AI community, such as the System by Public Health England. health data during the pandemic. The project the pandemic, using its convening power contributions of Professor Christina Pagel and was initiated during the early stages of the to support the UK’s pandemic response Professor Devi Sridhar, in helping to improve Participants noted that a major breakthrough pandemic, and involved researchers from The through two initiatives. First, RAMP (Rapid understanding of the pandemic and policy in the data science community’s response to Alan Turing Institute and four other founding Assistance in Modelling the Pandemic)34 interventions. Participants also highlighted the pandemic was the opening up (and sharing) institutes. The initial funding, diverted from an brought together individuals with modelling the work by the Ada Lovelace Institute as a of various mobility datasets18 by commercial existing Turing grant from the Engineering and expertise from a diverse range of disciplines good example of how to effectively engage the providers, such as Cuebiq,19 Google,20 and Physical Sciences Research Council (EPSRC), to support the COVID-19 pandemic modelling public during lockdown. In May and June 2020 Facebook’s Data for Good.21 These enabled covered the transfer and combination of data effort. For example, workshop participants (during the first UK lockdown) the Ada Lovelace the detailed spatial analysis and large-scale from two NHS trusts, plus the data analysis highlighted a collaboration between RAMP’s Institute worked with Traverse,39 Involve,40 simulation of viral transmission that was planning. The work continues with in-kind Urban Analytics team and Improbable (a UK and Bang the Table41 to conduct a rapid online important for generating evidence for the contributions from researchers around the technology company), which developed a deliberation with 28 members of the public to government and the public on the potential UK, who are analysing the data (now covering ‘micro-simulation’ model of the spread of explore attitudes to the use of COVID-19-related and actual impacts of lockdowns. Public 185,000 patients) to shed light on four key COVID-19 based on highly realistic synthetic technologies, such as contact tracing apps, in transport data from Transport for London and questions, including when to put critical data of people’s daily activity – i.e. where they exiting lockdown.42 The aim of the project was the Department for Transport were cited as COVID-19 patients onto a ventilator, and how go (home, shops, school, work) and for how to test this new engagement methodology, central to arguments about the effectiveness of patients with long-term health conditions are long. The project’s main outcome was a model which could be used when face-to-face public the first lockdown. The ability to monitor data affected by COVID-19. Ethicists in the Turing’s of Devon’s entire 800,000-person population, deliberations are not possible, and it resulted in real time through dashboards (e.g. i-sense public policy programme have been embedded which allowed researchers to compare the in a report on how to build trust in technologies COVID RED,22 Evergreen,23 GOV.UK24 and within the research teams from the start of the impact of different intervention scenarios at a developed in response to the pandemic.43 Public Health Scotland25) also helped the project to ensure algorithms are implemented local level.35 data science community to respond and report to the highest standards of transparency and Lastly, although not mentioned by workshop trends to the public. Second, DELVE (Data Evaluation and Learning participants, the editors would like to highlight bias mitigation. The first results from DECOVID for Viral Epidemics)36 is a multidisciplinary are expected later in 2021. The Trinity Challenge44 as a particularly Workshop participants praised the work of group of researchers convened by the Royal exciting initiative to come out of the pandemic. the Office for National Statistics (ONS) in Finally, participants noted regional data Society to contribute data-driven analysis to This coalition of organisations from business, supplying data during the pandemic, including management systems that had proved the UK’s pandemic response, providing input academia and the social sector is offering its work on deaths stratified by ethnicity,26 valuable. The SAIL Databank32 data linkage through SAGE, the government’s Scientific an award pool of £10m for ground-breaking, the social impacts of COVID-19,27 and how service was instrumental in redeploying Advisory Group for Emergencies. Outputs from data-driven solutions that will help countries people spent their time during lockdown.28 existing datasets to support pandemic efforts the group include reports on the efficacy of to better identify, respond to, and recover from There was also praise for the work of Health in Wales. And DataLoch33 is a repository of face masks in tackling COVID-1937 and the risks outbreaks of disease. Data Research UK (HDR UK), specifically all routine health and social care data for the associated with pupils returning to schools in its Innovation Gateway,29 which provides Edinburgh and South East Scotland Region, September 2020.38 information about the datasets and tools held which provides data to a range of researchers by members of the UK Health Data Research to address COVID-19-related questions. Alliance,30 facilitating the navigation of multiple resources relevant to COVID-19. ...some positive aspects of the data 1 science and AI response: increased data 1 sharing, collaboration across disciplines, and the development of new datasets, 17 https://www.england.nhs.uk/coronavirus/wp-content/uploads/sites/52/2020/03/phe-letter-to-trusts-re-daily-covid- 19-hospital-surveillance-11-march-2020.pdf 18 http://theodi.org/wp-content/uploads/2021/04/Data4COVID19_0329_v3.pdf 19 https://www.cuebiq.com 20 https://www.google.com/covid19/mobility repositories and initiatives that mobilised 21 https://dataforgood.fb.com/docs/covid19 22 https://covid.i-sense.org.uk COVID-19 research. 23 https://www.evergreen-life.co.uk/covid-19-heat-map 24 https://coronavirus.data.gov.uk 25 https://public.tableau.com/profile/phs.covid.19#!/vizhome/COVID-19DailyDashboard_15960160643010/Overview 26 https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/coronavirus- covid19relateddeathsbyethnicgroupenglandandwales/2march2020to15may2020 34 https://royalsociety.org/topics-policy/Health%20and%20wellbeing/ramp 27 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthandwellbeing/bulletins/coro- 35 https://www.improbable.io/blog/improbable-synthetic-environment-technology-accelerates-uk-pandemic-modelling navirusandthesocialimpactsongreatbritain/4december2020 36 https://rs-delve.github.io 28 https://www.ons.gov.uk/economy/nationalaccounts/satelliteaccounts/bulletins/coronavirusandhowpeoplespent- 37 https://royalsociety.org/news/2020/05/delve-group-publishes-evidence-paper-on-use-of-face-masks theirtimeunderrestrictions/28marchto26april2020 38 https://royalsociety.org/news/2020/07/delve-opening-schools-should-be-prioritised-report 29 https://www.healthdatagateway.org 39 http://traverse.org.uk 30 https://ukhealthdata.org 40 https://www.involve.org.uk 31 https://www.decovid.org 41 https://www.bangthetable.com 32 https://saildatabank.com 42 https://www.adalovelaceinstitute.org/project/rapid-online-deliberation-on-covid-19-technologies 33 https://www.ed.ac.uk/usher/dataloch 43 https://www.adalovelaceinstitute.org/report/confidence-in-crisis-building-public-trust-contact-tracing-app 44 https://thetrinitychallenge.org 8 9
The Turing’s response to COVID-19 Understanding vulnerability to health-related misinformation Since the beginning of the pandemic, The Alan Turing Institute has been In response to the growing problem of misinformation around COVID-19, the Turing’s working to tackle the spread and effects of COVID-19. Here are some of public policy programme launched a project, funded by The Health Foundation, to our key projects – visit our dedicated webpage for more info. understand who is most vulnerable. They found that people with lower numerical, health and digital literacy tend to fare worse at assessing health-related statements, which suggests that developing people’s literacies has the potential to make a big difference to their ability to identify misinformation. The researchers are now hoping to feed into government policy-making around measures to counter the problem. Helping London to navigate lockdown safely Estimating positive COVID-19 test counts As London locked down in spring 2020, a team in the Turing’s data-centric engineering programme began Project Odysseus to monitor levels of The Turing has partnered with the Royal Statistical Society to activity on the city streets. Working with the Greater London Authority provide modelling and machine learning expertise to the UK (GLA) and Transport for London (TfL), the researchers fed their algorithms government’s Joint Biosecurity Centre. One of the key outputs so with data from traffic cameras and sensors to provide anonymised, near- far is a statistical model that uses incoming COVID-19 test data real time estimates of pedestrian densities and distances. TfL used this tool to estimate (‘nowcast’) the total number of positive tests in local during the pandemic’s first wave to make numerous interventions to keep authorities. It can take up to five days for PCR tests to be processed people socially distanced, such as moving bus stops, widening pavements and reported, so these nowcasts will give authorities an earlier and closing parking bays. picture of the disease’s spread and aid decision-making. Combining data from NHS trusts Improving the accuracy of the NHS Initiated by the Turing and four other partners, the COVID-19 app DECOVID project has created a detailed database of anonymised patient health data. A major breakthrough Turing researchers have played an integral role in the has been the transfer and combination of data from development of the NHS COVID-19 app, providing technical two NHS trusts, covering 185,000 patients. The work advice to the Department of Health and Social Care. The continues with in-kind contributions from researchers researchers improved the algorithm behind the app’s Bluetooth around the UK, who are analysing the data to shed light Low Energy contract tracing technology, so that it more on four key clinical questions, including when to put accurately calculates the risk that the phone’s user has been critical COVID-19 patients onto a ventilator, and how in contact with a COVID-positive person. A statistical analysis patients with long-term health conditions are affected by published in Nature estimated that the app prevented around COVID-19. The first results are expected later in 2021. 600,000 COVID-19 cases in October-December 2020 alone, due to people self-isolating following contact with an infected person. Modelling the spread of COVID-19 in urban areas Building resilience against future crises The Turing led the urban analytics workstream of the Royal Society’s Rapid Assistance in Modelling the Pandemic (RAMP) initiative. Using highly realistic data A new two-year, multidisciplinary project at the Turing is seeking to better of people’s daily activities in towns and cities, the workstream developed a model understand how interconnected health, social and economic systems are that simulates COVID-19 transmission at an individual level. A demonstration affected by shocks such as the pandemic. The ‘Shocks and resilience’ model of Devon’s entire population allowed researchers to compare the impact of project will develop a coupled epidemiologic and socio-economic model different lockdown strategies. The team is now scaling up its model to a national of the spread and societal effects of COVID-19, as well as more generalised level, and is in dialogue with policy makers about using it to inform decision- models for other complex, socio-economic systems. The overall aim is to making in this pandemic and future health emergencies. produce data, methodologies and tools to enable policy makers to make better informed decisions, boosting the resilience of local and national governments against future shocks. 10
2. Data access and standardisation Suggestions techniques work robustly, and at scale, with real-world data, while also considering data Reflecting on these data challenges, governance principles and practices. participants made the following suggestions Despite the contributions highlighted in expertise to the national effort. However, there for the data science and AI community: – Automate data collection systems. Chapter 1, workshop participants articulated was a strong perception in the workshops that Data collection is resource-intensive, so a number of challenges that hindered the providing researchers with better data access – Aspire to a research culture in which automated data collection could reduce efforts of the data science and AI community and generating a more level playing field data are shared as openly as legal and the reliance on frontline staff to record to contribute to the national effort. The most regardless of affiliation and connections (see ethical obligations permit, with central data using manual systems. This aspiration prominent of these were around data access also Chapter 3) would likely have helped better repositories, or ‘data lakes’, for cleaned and needs to be set against the obvious need and standardisation – not having access to address the knowledge and policy challenges anonymised data (or weblinks to data) ready to ensure the rights of individuals and the desired data, or having data that were not during the pandemic. for analysis. Document which data exist (to communities to privacy and agency. suitably formatted or documented. avoid duplication of effort) and make them – Encourage more data sharing widely available. Data standardisation agreements. Develop more general data Data access – Data repositories should signpost to sharing agreements as opposed to those for Many participants also noted a lack of data open datasets and, for non-open data, narrow purposes/groups of users to reduce While some members of the data science and standardisation as a significant hurdle for data to the details of data sharing agreements barriers and delays. AI community had easy and rapid access to scientists during the pandemic. Different data and protocols for securely accessing relevant data, several participants commented standards and codification of metadata, and – Develop protocols for minimum the datasets. This can ensure the wider that many researchers were limited in their lack of dataset documentation, meant that data standardisation of data fields (including provision and availability of shared datasets contributions because access to data was were difficult to find, link and assess in terms of some baseline data to always be included) and equitable data access. often restricted, inconsistent or slow. missingness and biases, limiting the scope of to ease linking of datasets and comparable and confidence in analyses. – Investigate ways in which access to metadata. As part of this, consider following For example, some hospitalisation data sensitive data (e.g. from the NHS) may the ‘FAIR Guiding Principles for scientific were not available early in the pandemic, For example, ideally it would have been be enabled while respecting professional, data management and stewardship’:47 and geographically disaggregated data for possible to link datasets from different studies, ethical and legal obligations surrounding the Findability, Accessibility, Interoperability and local analysis and crafting of solutions (e.g. but many studies and consortia funded data. This will require developing new ways Reusability. local lockdowns) were also not available by UK Research and Innovation (UKRI, e.g. to allow researchers to securely access to all relevant academic groups. Further, ISARIC4C45 and PHOSP-COVID46) are relatively – Conduct a stress test of the new personal data, perhaps using differential local-level economic data on employment standalone. It would also have been beneficial standardisation specifications by using privacy or federated learning techniques. and production by industry sector were to standardise codes across datasets, including regular events (such as the annual influenza This in turn presents new research not accessible or existent (and possibly variables for vulnerable and underserved season) to prepare for the next pandemic. challenges around making these latter often both), making the economic impact of groups, ethnicity and socio-economic status. proposed lockdown measures on different Better data standardisation would have industries, such as the hospitality sector, enabled data scientists to more easily find, more difficult to assess. More broadly, understand, link and triangulate information systematic data on non-pharmaceutical from different datasets in order to better interventions (social distancing, mask wearing, answer policy-relevant questions and to allow lockdowns), and particularly compliance testing for individual-level or context effects on with such interventions, have not been made outcomes (e.g. healthcare, employment, public appropriately available, making it difficult information). As editors, we acknowledge that to measure the impact of these policies on this aspiration for linked and standardised behaviour. datasets will be difficult to achieve in the near- Consequently, insights from the data science term: different research communities have 1 Aspire to a research culture in which and AI community were not as informative and different standards, formal or informal, different modes of working, and often use similar data are shared as openly as legal robust as they could have been. It is unclear how much the quantity and quality of research terminology with quite distinct meanings. Nevertheless, the value of such datasets is and ethical obligations permit, with could have been increased if a larger share of 1the community had been able to contribute its clear, and much more could clearly be done. central repositories, or ‘data lakes’, for cleaned and anonymised data ready for analysis. 45 https://isaric4c.net 46 https://www.phosp.org 47 https://www.go-fair.org/fair-principles 12 13
3. Inequality and exclusion not have access to the internet or a computer, Suggestions or who have not engaged with initiatives due to historical distrust within their community. Reflecting on these challenges, participants made the following suggestions for the data The COVID-19 pandemic has brought societal level. For instance, participants cited problems Finally, participants commented that the science and AI community. We note that many inequality into sharp focus, with the disease around the standardisation of capture of socio- data science and AI community had been of the suggestions in Chapter 2 regarding data having a much greater impact on some groups economic status and ethnicity. insufficiently engaged with disadvantaged access and standardisation are also relevant than others. In England and Wales during the groups on what COVID-19-related research to the data privilege issues highlighted in this first wave of the pandemic, nearly all minority While participants acknowledged that a robust would be useful for them, with the Ada chapter. ethnic groups had higher mortality rates than and rigorous system exists for obtaining Lovelace Institute cited as one of the few the White ethnic population.48 There were epidemiological data via institutions such as – Prioritise understanding the impact of organisations talking to these groups. also marked socio-economic differences, with Public Health England, they noted that such COVID-19 on different ethnic and social mortality rates in the most deprived areas a system does not exist for wider economic, groups, and ensure that these insights mobility and socio-economic data, and that this Inequality and exclusion within the research around double those in the least deprived are considered in conjunction with other gap hampered socio-economic research in community areas.49 key risk factors (e.g. age, pre-existing the early weeks of the pandemic. Participants Many of the issues around data access health conditions, profession, care home Alongside these broader societal issues, also noted a lack of sufficiently granular, local- highlighted in Chapter 2 are also issues of residency). This would help promote problems of inequality and bias relating to data level data on transmission of COVID-19 across inequality, with some researchers having much inclusion of underserved groups by funders, science and AI have also been highlighted communities, which hid some of the social easier access than others to sufficient quality researchers designing studies, reviewers by the pandemic. There have been long-term inequalities associated with the initial phase of data. Workshop participants recognised that evaluating focus areas, and teams delivering concerns about the potential of these research the pandemic. researchers with access to certain people modelling projects. areas to marginalise certain groups, but the or institutes were more privileged, leading to rise during the pandemic of new datasets, data – Address deficiencies in data sources. For “data haves and data have-nots”. For instance, example, the physical, mental health and capture/analysis methods and data-driven "The COVID-19 pandemic has those who had access to members of SAGE economic consequences of the pandemic technologies such as contact tracing apps, or who had established relationships with plus ongoing discussions around immunity brought societal inequality agencies in health and other relevant areas are likely to be severe on some groups, and and vaccine passports, has meant that these capturing any missing or incomplete data issues are more pertinent than ever. We need into sharp focus." were able to contribute more easily to the research response. could be key to understanding the medium- ongoing scrutiny to ensure that any tools and and long-term impacts on underrepresented technologies developed by the data science There was also geographical inequality within groups. Participants raised concerns that the scramble and AI community respond to people’s health, the data science response, with some areas – Develop clear protocols for collecting to use existing datasets, or to rapidly create socio-economic and security needs in a fair of the UK having far greater access to local data on protected characteristics, new ones, ran the risk of proceeding without and equitable manner. data and success working with it than others. including, for example, ethnicity, sex, due regard for sampling biases. These biases, Participants commented that higher quality, age and socio-economic status. Better Inequality and exclusion were common such as insufficient representation of minority spatio-temporal data could have assisted the monitoring of this information in studies and themes in the workshops, and the participants’ groups, could be ‘baked into’ datasets, and support of policy at both a local and national clinical trials could help identify critical data comments fell into two main categories: may be due to systemic discrimination, level. gaps. structural inequalities and/or data collection – Inequality and exclusion within society constraints. This could in turn lead to biased Finally, participants noted a lack of diversity – Develop clear protocols for generating research and policies that exacerbate pre- in the data science and AI community, anonymised and synthetic data, so that – Inequality and exclusion within the research existing inequalities. Data availability, detail which could bias research or policy in important demographic information can community and representativeness during clinical trials a number of ways, and emphasises the be included in open datasets (one example also had shortcomings, which inhibited the need to make diversity more of a priority is the OpenPseudonymiser51 application Inequality and exclusion within society inclusion of patient subgroups, and may have in recruitment. This lack of diversity could developed at the University of Nottingham). Participants felt that the potential of the data made it more difficult to assess, for example, lead to lower engagement with vulnerable Great strides have been made over the past science and AI community to help understand the efficacy of COVID-19 treatments and and underrepresented groups, for instance, decade in developing realistic synthetic the impact of COVID-19 on different ethnic and vaccines in minority ethnic groups. potentially resulting in data and policy biases simulation data (e.g. in ‘digital twins’ socio-economic groups was not fully realised due to insufficient representation. The low projects). Such work can be leveraged and due to several challenges. Participants also noted significant inequality take-up of vaccination in some minority extended to provide synthetic datasets for and exclusion challenges around the use of Many participants commented on a lack of ethnic groups demonstrates the need for modelling potential future pandemics and mobility, mobile device and other digital data in relevant or consistent data. While ONS and communication and engagement by trusted their impact on minority groups. behavioural analysis. For instance, traditionally other public data providers released up-to-date representatives50 – and these often differ from underrepresented groups also tend to be COVID-19 data on minority groups, many data the demographics of national representatives. scarce in digital data – especially those on the sources had gaps at both the national and local ‘invisible’ side of the digital divide, who might 1 43 50 https://www.theguardian.com/world/2021/mar/25/clinics-pop-up-in-london-to-help-low-vaccine-take-up 51 https://www.openpseudonymiser.org
4. Communication – Make more granular data available at the – Increase the diversity of representatives local level, and expertise and resources in academia, the government and easier to find and deploy, so that local the media. Increasing diversity and organisations have the autonomy to work descriptive representation in organisations with their data (as opposed to the data – among researchers and UK government Although collaboration within the data science – Make greater strides to bring being held, and decisions made, centrally). representatives alike – can create smarter, and AI community increased during the transdisciplinary groups together. As This can increase local resilience to health more innovative and more inclusive teams pandemic, as demonstrated by the examples well as research groups, this includes emergencies. and solutions.52 For example, diverse in Chapter 1, issues of communication were local communities, health and social care research teams will be key to addressing frequently raised by workshop participants, providers, local partners, analysts, third – Increase engagement with minority falling into three main categories: sector organisations, private organisations, the issue of algorithmic bias. Increasing and underrepresented communities. and industry. diversity also sets role models for future – Communication between experts Improve data representativeness, minimise generations and thus provides lasting – Speed up the research pipeline by algorithmic bias and develop trust by – Communication between researchers and benefits. One way to improve diversity is developing a robust research/analysis/ engaging and consulting with representative policy makers to make it a core criterion in hiring and review framework to reduce delays in groups, such as ‘citizen groups’, at regular appointment decisions. – Communication between researchers and publishing ideas while maintaining high intervals. Develop mechanisms for well- funded community involvement in setting the public standards of quality and transparency. research priorities and for the end-to-end – Promote global data sharing for health participatory design of projects. Communication between experts emergencies. Data sharing agreements Connecting expertise – the right people to between the UK and other countries take the right data to the right problem – was time to organise, and many were previously identified by workshop participants as an coordinated and/or financially supported by important bottleneck in working to understand EU organisations, which will undoubtedly COVID-19, constraining the community’s become a more complicated endeavour ability to respond. More collaboration between post-Brexit. Having more flexible regulations often disparate groups, such as data asset for future emergencies will help facilitate holders, subject matter experts, researchers global learning as well as research and and clinicians, could have helped to share policy solutions in the UK. The editors expertise, avoid duplication, and improve note that facilitating safe and equitable analyses. data sharing between countries is also recommended by the 'Data for international Participants also identified a need for more health emergencies' statement published by international collaboration. While data the science academies of the G7 nations in access and exchanges across national March 2021.54 borders happened to some degree during the pandemic, such as through the EULAR Communication between researchers and COVID-19 Database,53 many other efforts policy makers remained decentralised, informal and ad hoc. More could be learned by the UK from other The transfer of knowledge from academic countries in similar situations, particularly research to real-world policy was a crucial Increasing diversity and those with greater experience or capacity in dealing with health emergencies. aspect of the scientific response to the pandemic. Within the data science and descriptive representation AI community, participants noted that communication with policy makers could be Suggestions in organisations – among Participants made the following suggestions improved in order to better identify policy needs and improve knowledge transfer. researchers and UK government for the data science and AI community: Participants also highlighted a lack of representatives alike – can 1 – Develop collaborative working relationships between data scientists transparency in policy-making. It was difficult to know which studies had ‘cut through’ and create smarter, more innovative and clinicians. Data scientists can provide been considered by government and advisory insights about the collection and storage groups when making policy interventions, and and more inclusive teams and of data that clinicians may not be aware which data policy makers were using to inform of, while clinicians can provide valuable their decisions. Increased transparency would solutions. 1 insights about the multi-dimensional nature help researchers to understand what scientific of health and social data collection. approaches and insights are most valuable to 53 https://www.eular.org/eular_covid19_database.cfm 52 https://hbr.org/2016/11/why-diverse-teams-are-smarter ; https://www.turing.ac.uk/news/where-are-women-map- 54 https://royalsociety.org/-/media/about-us/international/g-science-statements/G7-data-for-international-health-emer- ping-gender-job-gap-ai gencies-31-03-2021.pdf 16 17
policy makers, and would also give the public Workshop participants agreed that, although – Increase the understanding and – Use data visualisation: this can be a clearer picture of the evidence behind policy there had been successful examples of public involvement of data scientists in media an effective means of communicating decisions, potentially increasing public trust engagement from the community during the processes. Media outlets have increasingly complex scientific topics to non-specialists. and compliance. pandemic, there were also shortcomings referenced preprints, which are of Participants cited Harry Stevens’s ‘corona in communication, particularly around the varying quality given preprint protocols simulator’ article in The Washington Post as limitations and uncertainties of research. and accelerated peer review processes. a particularly effective example.59 "The transfer of knowledge Helping the public to understand the findings Communicate to the public what is and what – Support independent input and and caveats of modelling studies, for example, is not (or not yet) quality, peer-reviewed from academic research could enhance trust in the research and research. communication: workshop participants highlighted the benefits of an Independent to real-world policy was a increase support for and compliance with policies. Trust and acceptance of policies might – Consider building on pre-existing SAGE,60 which provided a model for mechanisms to support communication, how scientists can analyse, interpret crucial aspect of the scientific also be increased by communicating how data for example via the Science Media and comment on a situation from a more science and modelling can promote positive response to the pandemic." health outcomes. For example, targeted Centre,58 which curates expert reactions to noteworthy science stories, helping politically neutral standpoint. communication might have helped improve journalists to access nuanced and unfiltered compliance with the isolation notifications sent information. Suggestions through the NHS Test and Trace system.57 Participants made the following suggestions for the data science and AI community: Suggestions Participants made the following suggestions – Build sustainable links between the data for the data science and AI community: science community and those who are closer to policy-making. – Provide clear, simplified, accessible – Provide training for policy makers to help information for non-specialists about them understand the research process, studies’ findings and predictions, as well as e.g. study design and limitations, and the the underlying research designs (data used, uncertainty around estimates and findings. data quality, computational processes, etc.). – Provide training for researchers and – Communicate transparently about the public health professionals to help them limitations of studies, such as uncertainty better communicate to policy makers the in the models/predictions and potential findings, limitations and uncertainties of biases in the data. their studies and models. – Build public trust by addressing concerns related to data security, privacy and Communication between researchers and confidentiality, and by communicating how the public the data and models are informing policies. Throughout the pandemic, data science and – Take a more proactive role in countering AI have been in the public eye as never before. misinformation, for example about testing, Every day presents a new swathe of statistics contact tracing and vaccines. More effective about COVID-19, and data scientists have communication could have mitigated been increasingly called upon to communicate against the many counter-messages, ‘fake their research to non-specialists. Meanwhile, news' stories, and biased reporting that the public has been able to directly input into emerged during the pandemic. Also, present COVID-19 research through initiatives such as data in a way that is complete and not prone the COVID Symptom Study app,55 and there to misinterpretation. have been vigorous debates on the ethics of – Train researchers in how to communicate AI and algorithmic bias, especially around the their findings to the public and media, 1UK GCSE and A-Level grading controversy in summer 2020.56 especially around the use of models in Throughout the pandemic, data making decisions and predictions. science and AI have been in the 1 public eye as never before. 55 https://covid.joinzoe.com 58 https://www.sciencemediacentre.org 56 https://en.wikipedia.org/wiki/2020_UK_GCSE_and_A-Level_grading_controversy 59 https://www.washingtonpost.com/graphics/2020/world/corona-simulator 57 https://www.medrxiv.org/content/10.1101/2020.09.15.20191957v1 60 https://www.independentsage.org 18 19
5. Conclusions Acknowledgements Since the first UK lockdown in March 2020, is key to reducing the chances of data and We thank all conference and workshop This work was supported by Wave 1 of the when little was known about COVID-19, our research being misused or misinterpreted. participants who have shared their experiences UKRI Strategic Priorities Fund under the knowledge of the disease and its effects have and insights with us, and have contributed to EPSRC Grant EP/T001569/1, particularly much progressed. The UK’s data science and Workshop participants have made many this report. the ‘Health’ theme within that grant and The AI community has played an integral role in suggestions for how the community might Alan Turing Institute. We also acknowledge this scientific effort, and this report provides address these challenges. These include We thank Jessie Wand for her amazing efforts the contributions of the ‘Health and medical a snapshot of the community’s contributions. the provision of accessible and centralised in organising the complex series of conference sciences’ programme at The Alan Turing But equally, the pandemic has pulled into sharp ‘data lakes’; more equitable data access; meetings and workshops that led to this report, Institute. focus a number of areas where we can and protocols for data standardisation; increased and James Lloyd, science writer, for his tireless should do better. representation of, and engagement with, efforts in pulling the final draft of this report minority groups; training for researchers in together. We are also grateful to the Centre First, participants reported difficulties communicating their work to non-specialists; for Facilitation for holding the workshops and accessing sufficiently timely, robust, granular, and initiatives to increase public understanding writing the workshop notes. standardised and documented data. There of research findings and uncertainty, to better were also issues of privilege, with some counter misinformation. researchers able to access data much more easily than others. If the community can make progress in these areas, then when we are next faced with a Second, the pandemic has highlighted pre- pandemic – and the historical record strongly existing societal issues of inequality and suggests that this is a ‘when’ rather than an ‘if’ exclusion, and the role that data science – we should be better placed as a collective to and AI can play in mitigating or exacerbating respond. these. In our workshops, participants raised concerns about a lack of data about, and Navigating our way through the pandemic engagement with, minority ethnic and socio- without the knowledge and resources of economic groups, and a lack of diversity within the data science and AI community would the research community and decision-making have been markedly more difficult. These are organisations. transformational times for the community as its research becomes ever more embedded Third, the pandemic has underlined both the in everyday life. We need to draw on our importance and difficulty of communicating experiences during this pandemic to ensure transparently with other researchers, that data science and AI continue to change policy makers and the public, particularly lives for the better. around issues of modelling and uncertainty. Participants agreed that better communication 20 21
You can also read