Data science and AI in the age of COVID-19 - Reflections on the response of the UK's data science and AI community to the COVID-19 pandemic - The ...

Page created by Renee Adkins
 
CONTINUE READING
Data science and AI in the age of COVID-19 - Reflections on the response of the UK's data science and AI community to the COVID-19 pandemic - The ...
Data science and AI
in the age of COVID-19
Reflections on the response of the UK’s
data science and AI community to the
COVID-19 pandemic
Executive summary                                                                                         Contents
This report summarises the findings of a              science activity during the pandemic. The           Foreword                                              4
series of workshops carried out by The Alan           message was clear: better data would
Turing Institute, the UK’s national institute         enable a better response.                           Preface                                               5
for data science and artificial intelligence
(AI), in late 2020 following the 'AI and data         Third, issues of inequality and exclusion
                                                      related to data science and AI arose during         1. Spotlighting contributions from the data science   7
science in the age of COVID-19' conference.                                                               and AI community
The aim of the workshops was to capture               the pandemic. These included concerns
the successes and challenges experienced              about inadequate representation of minority
                                                                                                                + The Turing's response to COVID-19             10
by the UK’s data science and AI community             groups in data, and low engagement with
during the COVID-19 pandemic, and the                 these groups, which could bias research             2. Data access and standardisation                    12
community’s suggestions for how these                 and policy decisions. Workshop participants
challenges might be tackled.                          also raised concerns about inequalities             3. Inequality and exclusion                           14
                                                      in the ease with which researchers could
Four key themes emerged from the                      access data, and about a lack of diversity          4. Communication                                      17
workshops.                                            within the research community (and the
                                                      potential impacts of this).                         5. Conclusions                                        20
First, the community made many
contributions to the UK’s response to                 Fourth, communication difficulties
                                                                                                          Appendix A. Workshop participants and Organising      22
the pandemic, via national organisations,             surfaced. While there have been excellent
                                                                                                          Committee
research institutes and the healthcare                examples of science communication
sector. Researchers responded to the crisis           throughout the pandemic, participants
                                                                                                          Appendix B. Workshop themes and reports               24
with ingenuity and determination, and the             highlighted the challenges of
result was a range of new projects and                communicating research findings and
collaborations that informed the pandemic             uncertainties to policy makers and the
response and opened up new areas for                  public in a timely, accurate and clear
future study.                                         manner.
Second, the single most consistent                    In this report, we outline the workshop
message across the workshops was the                  participants’ reflections and suggestions
importance – and at times lack – of robust            relating to each of these themes, with the
and timely data. Problems around data                 aim of enabling the data science and AI
availability, access and standardisation              community to respond better to the ongoing
spanned the entire spectrum of data                   pandemic, and future emergencies.

Edited by

                  Inken von Borzyskowski                              Anjali Mazumder
                  Assistant Professor of                              AI and Justice and Human
                  Political Science, University                       Rights Theme Lead, The Alan
                  College London                                      Turing Institute

                  Bilal Mateen                                        Michael Wooldridge
                  Clinical Data Science Fellow,                       Turing Fellow and Programme
                  The Alan Turing Institute;                          Co-Director for Artificial
                  Clinical Technology Lead,                           Intelligence, The Alan Turing
                  Wellcome Trust                                      Institute

Workshop theme leads
Mark Briers (The Alan Turing Institute), Tao Cheng (UCL), Spiros Denaxas (UCL), John Dennis (University
of Exeter), Sabina Leonelli (University of Exeter), David Leslie (The Alan Turing Institute), Marion
Mafham (University of Oxford), Ed Manley (University of Leeds), Karyn Morrissey (University of Exeter),
Deepak Parashar (University of Warwick), Jim Weatherall (AstraZeneca)
Foreword                                                                                                                    Preface
    The COVID-19 pandemic has seen scientific                     and non-experts – contain valuable reflections                At the end of 2019, a new highly infectious                  pharmaceutical interventions. The response
    research move into public discourse in                        and suggestions for how the data science                      virus, now known as severe acute respiratory                 was remarkable for its breadth of engagement
    unparalleled ways. Across the scientific                      and AI community might prepare for future                     syndrome coronavirus 2 (SARS-CoV-2),                         across disciplines, as demonstrated by
    spectrum, researchers have stepped up, the                    emergencies. Indeed, the Turing has already                   was identified as the underlying pathogen                    the range of backgrounds of our workshop
    data science and AI community included, to                    begun a large-scale project1 which aims to                    for a series of unexplained pneumonias                       participants and the diverse set of insights that
    work alongside clinicians, policy makers and                  boost societal, governmental and economic                     (subsequently termed coronavirus disease                     they generated.
    the government at the heart of the response,                  resilience to shocks such as this pandemic.                   2019, or COVID-19), clustered in Wuhan,
    directly impacting on our daily lives.                                                                                      China. By 30 January 2020, COVID-19 had                      Goals, origins and structure of the report
                                                                  Ahead of the G7 summit in the UK in June                      become so prevalent that the World Health
    Data science and AI is an inherently                          2021, the leading scientific bodies of the G7                 Organisation (WHO) declared it a Public Health               This report was commissioned by The Alan
    interdisciplinary community, and our activities               nations (the ‘Science 7’) recently published a                Emergency of International Concern – the                     Turing Institute with the aim of reflecting on
    at The Alan Turing Institute in response to the               call for more ‘data readiness’ in preparation                 WHO’s highest level of alarm. As we finish                   the UK's data science and AI response to the
    pandemic reflect this. Our researchers have                   for future health emergencies.2 This is a                     this report in spring 2021, the disease itself               pandemic. The Turing is the UK’s national
    developed algorithms to monitor pedestrian                    timely amplification of the message in the                    has claimed over three million lives globally,               institute for data science and AI, and partners
    density and ensure social distancing on the                   Turing’s report about the need for increased                  with more than 170 million confirmed cases,                  with many of the UK’s leading universities and
    streets of London; combined NHS datasets to                   data access and sharing, at a time when the                   and many more affected by the impacts of                     research centres to advance the country’s
    help answer clinical questions about the effects              pandemic continues to have catastrophic                       lockdowns and the unprecedented disruption                   capacity and competitiveness in these areas,
    of COVID-19; explored what makes people                       impacts around the world.                                     to the global economy. The UK has now been                   with the overall mission of changing the world
    vulnerable to health-related misinformation;                                                                                through two waves of the virus, with infections,             for the better.
    and improved the accuracy of the NHS                          My thanks to the editors for initiating and
                                                                  delivering this report, and to all the workshop               hospitalisations and fatalities in the second                The Turing’s goals in undertaking this work
    COVID-19 app. (See page 10 for more details of                                                                              wave exceeding those in the first.
    our response.)                                                theme leads and participants. We hope it                                                                                   were twofold:
                                                                  will be a valued contribution to the ongoing                  While pandemics appear to have occurred
    This report has been edited by four researchers               discussions about the data science and AI                                                                                  1. To capture the initiatives and resources that
                                                                                                                                throughout human history, the COVID-19                       have been developed by the data science and
    with backgrounds spanning AI, data science,                   response, alongside notable reports from the                  pandemic is unique in one important respect.
    public policy, human rights and medicine. They                Centre for Data Ethics and Innovation,3 the Ada                                                                            AI community during the pandemic.
                                                                                                                                It is the first pandemic to occur in the age of
    have synthesised the views of 96 attendees to                 Lovelace Institute,4 the Royal Society’s DELVE                data science and AI: the first pandemic in a                 2. To gather the experiences and insights of
    a series of workshops held at the end of 2020,                initiative5 and the Royal Statistical Society,6               world of deep learning, ubiquitous computing,                this community during the pandemic – what
    which aimed to provide a snapshot of the uses                 among others.                                                 smartphones, wearable technology and social                  worked well, what didn’t, and how we as
    of data science and AI during the pandemic,                                                                                 media. It is thus unsurprising that governments              a community could respond better to this
    and what we as a community can learn from                     The Turing looks forward to continuing its
                                                                  role to convene and deliver activities and                    across the globe, including the UK’s, looked                 pandemic and future emergencies.
    the experience.                                                                                                             to data to inform their responses and help
                                                                  reflections in response to this pandemic, and in                                                                           The work has its origins in a one-day
    While this represents a small part of what                    preparation for other crises.                                 navigate challenges. The goal was to limit the
                                                                                                                                spread of the disease and its medical, social                conference 'AI and data science in the age
    will surely be a larger reflection exercise to                                                                                                                                           of COVID-19,'7 which was held virtually by the
    come, the central findings – issues of data                                                                                 and economic consequences. As such, the
                                                                  Adrian Smith                                                  UK government stated that its policies were                  Turing on 24 November 2020 and featured talks
    access and quality; inequality among both the                                                                                                                                            from some of the leading voices in the UK’s
    research community and wider society; and                     Institute Director and Chief Executive                        “guided by the science”, and later that ending
                                                                  The Alan Turing Institute                                     lockdowns depended on “data, not dates”.                     response to COVID-19. The free event attracted
    communication difficulties between experts                                                                                                                                               over 1,700 registrants from 35 countries, from
                                                                                                                                In response, many members of the UK’s data                   academia, industry, the public sector and the
                                                                                                                                science and AI community stepped forward,                    general public.
                                                                                                                                spearheading initiatives that they hoped would
                                                                                                                                assist the domestic and international response.              A series of themed, virtual workshops followed
                                                                                                                                These initiatives came from individual                       in November and December 2020. The
                                                                                                                                academics, university research groups, the                   invitation to participate in these workshops
                                                                                                                                healthcare sector, national institutes and                   was widely circulated within the UK academic
                                                                                                                                others. They involved not just experts on                    community, via social media and the Turing’s
                                                                                                                                virology and epidemiological modelling, but                  network of partners and affiliates. The Turing
    1
                                                                                                                                also researchers studying, for example, the                  used a lightweight reviewing process to select
                                                                                                                                social and economic consequences of non-                     the participants, who are listed in Appendix A.
                                                                                                                                1

    1 https://www.turing.ac.uk/research/research-projects/shocks-and-resilience
    2 https://royalsociety.org/-/media/about-us/international/g-science-statements/G7-data-for-international-health-emer-
    gencies-31-03-2021.pdf
    3 https://www.gov.uk/government/publications/covid-19-repository-and-public-attitudes-retrospective
    4 https://www.adalovelaceinstitute.org/summary/learning-data-lessons                                                        7 https://www.turing.ac.uk/events/ai-and-data-science-age-covid-19
    5 https://rs-delve.github.io/reports/2020/11/24/data-readiness-lessons-from-an-emergency.html
    6 https://rss.org.uk/statistics-data-and-covid
4                                                                                                                           5
1. Spotlighting contributions from the
        We took the view that, if the report was to               and analysis were far more widespread. One
        have any credibility, it would be essential to            possible reason for this, suggested by the
        include the views of as diverse a community               CDEI report, is that COVID-19 was such a new
        as possible. The scope of ‘data science and
        AI’ for the purposes of this report is therefore
                                                                  phenomenon that there was a lack of data with
                                                                  which to train AI algorithms.
                                                                                                                          data science and AI community
        deliberately broad, with participants including
        ethicists, clinicians, mathematicians, policy             We also note that while we as editors have
                                                                                                                          A key goal of the workshops was to document                    several healthcare technology companies.
        advisors, and many more.                                  done our best to report the suggestions
                                                                                                                          the contributions from the UK’s data science                   OpenSAFELY provides secure access to
                                                                  of workshop participants as accurately as
                                                                                                                          and AI community in responding to the                          a database of over 58 million NHS patient
        The eight workshop themes are listed in                   possible, in some cases we have applied
                                                                                                                          pandemic. In this chapter, we summarise the                    records, allowing researchers to answer urgent
        Appendix B. These themes are themselves a                 light editing. This especially applies to the
                                                                                                                          contributions highlighted by the workshop                      clinical and public health questions related
        reflection of input from over 20 experts, who             suggestions around data access in Chapter
                                                                                                                          participants, which demonstrate some positive                  to COVID-19. Moreover, for the first time it
        formed the Organising Committee for the                   2, where we have taken care to put the
                                                                                                                          aspects of the data science and AI response:                   separates the development of the analysis
        conference and workshops. All workshops                   participants’ aspirations for more accessible
                                                                                                                          increased data sharing, collaboration across                   code from the actual data (which never leave
        were led by the Centre for Facilitation8 together         and open data in the context of the complex
                                                                                                                          disciplines, and the development of new                        the servers of the electronic health record
        with one or two workshop-specific theme                   legal and ethical obligations surrounding
                                                                                                                          datasets, repositories and initiatives that                    service provider or NHS Digital), and in doing
        leads who were selected by the Organising                 access to sensitive national health, economic
                                                                                                                          mobilised COVID-19 research.                                   so guarantees fully open and reproducible
        Committee to supplement the facilitators’                 and social information. Data access is
                                                                                                                                                                                         code. This is a noteworthy development that
        expertise with domain knowledge. For each                 clearly an important issue which needs to               We emphasise that these spotlights are                         occurred largely due to the permissive nature
        workshop, the facilitators and theme leads                be addressed, and there are many steps to               nothing more than that: spotlights. The                        of the regulatory environment necessitated by
        summarised the discussions in a report,                   enabling responsible data flows.                        workshops provided a snapshot of the                           the pandemic.
        which have been lightly edited and linked from                                                                    community’s response, but there were many
        Appendix B. These workshop reports provided               One final and important word. Our report
                                                                                                                          other contributions that were not covered in                   Another new initiative was the COVID-19
        the ‘raw data’ for this report. Given the range           is preliminary in the sense that we hope
                                                                                                                          the workshops. An overview of The Alan Turing                  Genomics UK Consortium (COG-UK),13
        and complexity of the workshops, it has not               it will serve as a starting point for a more
                                                                                                                          Institute’s key COVID-19 projects can be found                 which has sequenced more than 490,000
        been possible to include all of the participants’         comprehensive and systematic review of
                                                                                                                          on page 10 of this report, and further examples                SARS-CoV-2 virus genomes to date, providing
        views and suggestions, but we as editors have             the uses of data science and AI during the
                                                                                                                          of the community’s response can be found                       important information on viral transmission and
        tried to draw out the main themes.                        pandemic. The report was originally conceived
                                                                                                                          in resources such as the CDEI’s COVID-19                       the emergence of new variants. COG-UK was
                                                                  in summer 2020, when daily reported COVID-19
                                                                                                                          respository.10 Finally, we acknowledge that                    achieved through the coordination of several
        Although our remit includes both data science             fatalities in the UK were down to single figures,
                                                                                                                          there may be some bias in this chapter, with                   organisations (NHS, the UK’s public health
        and AI – reflecting the Turing’s role as the              and there was some hope that the virus might
                                                                                                                          participants more likely to mention projects                   agencies, the Wellcome Sanger Institute and
        national institute for these two subjects – the           be largely contained in a single wave. Work
                                                                                                                          that they are affiliated with.                                 various academic institutions). Having those
        majority of the feedback we received from                 on this report continued during the November
                                                                                                                                                                                         sequences available from the earliest stages of
        workshop participants related specifically                2020 to spring 2021 lockdowns when the
                                                                                                                          Noteworthy datasets and databases                              the pandemic was key to pushing forward the
        to data science, and our report reflects this.            disease was again prevalent. In this sense,
                                                                                                                                                                                         development of vaccines.
        This chimes with the 'COVID-19 repository                 many (but not all) of the observations made             The UK government’s action to issue statutory
        and public attitudes' report9 published by the            here relate to the first wave of the pandemic in        regulators such as NHS Digital with a Control                  Existing systems and databases from ISARIC
        Centre for Data Ethics and Innovation (CDEI)              the UK, from January to summer 2020, prior to           of Patient Information (COPI) notice11 to                      (International Severe Acute Respiratory and
        in March 2021, which found that “conventional             the onset of the UK’s second wave. We hope              make confidential patient data available to                    emerging Infection Consortium)14 and ICNARC
        data analysis has been at the heart of the                that the report will be of value to healthcare          researchers and policy makers for COVID-                       (Intensive Care National Audit & Research
        COVID-19 response, not AI”. There were                    professionals, data science and AI researchers,         19-specific purposes had far-reaching                          Centre)15 were repurposed and expanded for
        certainly innovative applications of AI during            and policy/decision makers as we continue to            consequences, and participants noted the                       the pandemic, which kept costs down and
        the pandemic, but the evidence suggests that              manage the ongoing pandemic.                            positive effect this had on data sharing.                      enabled the data science community to act
        more traditional methods of data collection                                                                                                                                      more quickly. Similarly, linked data from GPs
                                                                                                                          For example, it paved the way for a new                        and NHS trusts were available to researchers
                                                                                                                          statistical analytics platform called                          through the pre-existing Discovery East
                                                                                                                          OpenSAFELY,12 a collaboration between the                      London16 platform. Participants also
                                                                                                                          University of Oxford’s DataLab, the London                     highlighted the utility of aggregate data on
                                                                                                                          School of Hygiene & Tropical Medicine, and                     critical care patients collected by CHESS
                                                                                                                          1

                                                                                                                          10 https://www.gov.uk/government/publications/covid-19-repository-and-public-attitudes-retrospective
    1
                                                                                                                          11 https://digital.nhs.uk/coronavirus/coronavirus-covid-19-response-information-governance-hub/control-of-patient-in-
                                                                                                                          formation-copi-notice
                                                                                                                          12 https://opensafely.org
                                                                                                                          13 https://www.cogconsortium.uk
    8 https://www.centreforfacilitation.co.uk                                                                             14 https://isaric.org
    9 https://www.gov.uk/government/publications/covid-19-repository-and-public-attitudes-retrospective                   15 https://www.icnarc.org
                                                                                                                          16 https://www.eastlondonhcp.nhs.uk/ourplans/discovery-east-london-case-study.htm

6                                                                                                                     7
(COVID-19 Hospitalisation in England                         Participants also cited the DECOVID31 project,              Other initiatives                                          Workshop participants also noted positive
    Surveillance System),17 which was adapted                    which aimed to create a detailed, frequently                                                                           examples of public engagement from the
    from the UK Severe Influenza Surveillance                    updated database of anonymised patient                      The Royal Society was highly active during                 data science and AI community, such as the
    System by Public Health England.                             health data during the pandemic. The project                the pandemic, using its convening power                    contributions of Professor Christina Pagel and
                                                                 was initiated during the early stages of the                to support the UK’s pandemic response                      Professor Devi Sridhar, in helping to improve
    Participants noted that a major breakthrough                 pandemic, and involved researchers from The                 through two initiatives. First, RAMP (Rapid                understanding of the pandemic and policy
    in the data science community’s response to                  Alan Turing Institute and four other founding               Assistance in Modelling the Pandemic)34                    interventions. Participants also highlighted
    the pandemic was the opening up (and sharing)                institutes. The initial funding, diverted from an           brought together individuals with modelling                the work by the Ada Lovelace Institute as a
    of various mobility datasets18 by commercial                 existing Turing grant from the Engineering and              expertise from a diverse range of disciplines              good example of how to effectively engage the
    providers, such as Cuebiq,19 Google,20 and                   Physical Sciences Research Council (EPSRC),                 to support the COVID-19 pandemic modelling                 public during lockdown. In May and June 2020
    Facebook’s Data for Good.21 These enabled                    covered the transfer and combination of data                effort. For example, workshop participants                 (during the first UK lockdown) the Ada Lovelace
    the detailed spatial analysis and large-scale                from two NHS trusts, plus the data analysis                 highlighted a collaboration between RAMP’s                 Institute worked with Traverse,39 Involve,40
    simulation of viral transmission that was                    planning. The work continues with in-kind                   Urban Analytics team and Improbable (a UK                  and Bang the Table41 to conduct a rapid online
    important for generating evidence for the                    contributions from researchers around the                   technology company), which developed a                     deliberation with 28 members of the public to
    government and the public on the potential                   UK, who are analysing the data (now covering                ‘micro-simulation’ model of the spread of                  explore attitudes to the use of COVID-19-related
    and actual impacts of lockdowns. Public                      185,000 patients) to shed light on four key                 COVID-19 based on highly realistic synthetic               technologies, such as contact tracing apps, in
    transport data from Transport for London and                 questions, including when to put critical                   data of people’s daily activity – i.e. where they          exiting lockdown.42 The aim of the project was
    the Department for Transport were cited as                   COVID-19 patients onto a ventilator, and how                go (home, shops, school, work) and for how                 to test this new engagement methodology,
    central to arguments about the effectiveness of              patients with long-term health conditions are               long. The project’s main outcome was a model               which could be used when face-to-face public
    the first lockdown. The ability to monitor data              affected by COVID-19. Ethicists in the Turing’s             of Devon’s entire 800,000-person population,               deliberations are not possible, and it resulted
    in real time through dashboards (e.g. i-sense                public policy programme have been embedded                  which allowed researchers to compare the                   in a report on how to build trust in technologies
    COVID RED,22 Evergreen,23 GOV.UK24 and                       within the research teams from the start of the             impact of different intervention scenarios at a            developed in response to the pandemic.43
    Public Health Scotland25) also helped the                    project to ensure algorithms are implemented                local level.35
    data science community to respond and report                 to the highest standards of transparency and                                                                           Lastly, although not mentioned by workshop
    trends to the public.                                                                                                    Second, DELVE (Data Evaluation and Learning                participants, the editors would like to highlight
                                                                 bias mitigation. The first results from DECOVID             for Viral Epidemics)36 is a multidisciplinary
                                                                 are expected later in 2021.                                                                                            The Trinity Challenge44 as a particularly
    Workshop participants praised the work of                                                                                group of researchers convened by the Royal                 exciting initiative to come out of the pandemic.
    the Office for National Statistics (ONS) in                  Finally, participants noted regional data                   Society to contribute data-driven analysis to              This coalition of organisations from business,
    supplying data during the pandemic, including                management systems that had proved                          the UK’s pandemic response, providing input                academia and the social sector is offering
    its work on deaths stratified by ethnicity,26                valuable. The SAIL Databank32 data linkage                  through SAGE, the government’s Scientific                  an award pool of £10m for ground-breaking,
    the social impacts of COVID-19,27 and how                    service was instrumental in redeploying                     Advisory Group for Emergencies. Outputs from               data-driven solutions that will help countries
    people spent their time during lockdown.28                   existing datasets to support pandemic efforts               the group include reports on the efficacy of               to better identify, respond to, and recover from
    There was also praise for the work of Health                 in Wales. And DataLoch33 is a repository of                 face masks in tackling COVID-1937 and the risks            outbreaks of disease.
    Data Research UK (HDR UK), specifically                      all routine health and social care data for the             associated with pupils returning to schools in
    its Innovation Gateway,29 which provides                     Edinburgh and South East Scotland Region,                   September 2020.38
    information about the datasets and tools held                which provides data to a range of researchers
    by members of the UK Health Data Research                    to address COVID-19-related questions.
    Alliance,30 facilitating the navigation of multiple
    resources relevant to COVID-19.                                                                                                                     ...some positive aspects of the data
    1                                                                                                                                                   science and AI response: increased data
                                                                                                                             1
                                                                                                                                                        sharing, collaboration across disciplines,
                                                                                                                                                        and the development of new datasets,
    17 https://www.england.nhs.uk/coronavirus/wp-content/uploads/sites/52/2020/03/phe-letter-to-trusts-re-daily-covid-
    19-hospital-surveillance-11-march-2020.pdf
    18 http://theodi.org/wp-content/uploads/2021/04/Data4COVID19_0329_v3.pdf
    19 https://www.cuebiq.com
    20 https://www.google.com/covid19/mobility
                                                                                                                                                        repositories and initiatives that mobilised
    21 https://dataforgood.fb.com/docs/covid19
    22 https://covid.i-sense.org.uk                                                                                                                     COVID-19 research.
    23 https://www.evergreen-life.co.uk/covid-19-heat-map
    24 https://coronavirus.data.gov.uk
    25 https://public.tableau.com/profile/phs.covid.19#!/vizhome/COVID-19DailyDashboard_15960160643010/Overview
    26 https://www.ons.gov.uk/peoplepopulationandcommunity/birthsdeathsandmarriages/deaths/articles/coronavirus-
    covid19relateddeathsbyethnicgroupenglandandwales/2march2020to15may2020                                                   34 https://royalsociety.org/topics-policy/Health%20and%20wellbeing/ramp
    27 https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthandwellbeing/bulletins/coro-            35 https://www.improbable.io/blog/improbable-synthetic-environment-technology-accelerates-uk-pandemic-modelling
    navirusandthesocialimpactsongreatbritain/4december2020                                                                   36 https://rs-delve.github.io
    28 https://www.ons.gov.uk/economy/nationalaccounts/satelliteaccounts/bulletins/coronavirusandhowpeoplespent-             37 https://royalsociety.org/news/2020/05/delve-group-publishes-evidence-paper-on-use-of-face-masks
    theirtimeunderrestrictions/28marchto26april2020                                                                          38 https://royalsociety.org/news/2020/07/delve-opening-schools-should-be-prioritised-report
    29 https://www.healthdatagateway.org                                                                                     39 http://traverse.org.uk
    30 https://ukhealthdata.org                                                                                              40 https://www.involve.org.uk
    31 https://www.decovid.org                                                                                               41 https://www.bangthetable.com
    32 https://saildatabank.com                                                                                              42 https://www.adalovelaceinstitute.org/project/rapid-online-deliberation-on-covid-19-technologies
    33 https://www.ed.ac.uk/usher/dataloch                                                                                   43 https://www.adalovelaceinstitute.org/report/confidence-in-crisis-building-public-trust-contact-tracing-app
                                                                                                                             44 https://thetrinitychallenge.org
8                                                                                                                        9
The Turing’s response to COVID-19                                                     Understanding vulnerability to health-related
                                                                                           misinformation
     Since the beginning of the pandemic, The Alan Turing Institute has been
                                                                                           In response to the growing problem of misinformation around COVID-19, the Turing’s
     working to tackle the spread and effects of COVID-19. Here are some of                public policy programme launched a project, funded by The Health Foundation, to
     our key projects – visit our dedicated webpage for more info.                         understand who is most vulnerable. They found that people with lower numerical,
                                                                                           health and digital literacy tend to fare worse at assessing health-related statements,
                                                                                           which suggests that developing people’s literacies has the potential to make a big
                                                                                           difference to their ability to identify misinformation. The researchers are now hoping
                                                                                           to feed into government policy-making around measures to counter the problem.

     Helping London to navigate lockdown safely
                                                                                                              Estimating positive COVID-19 test counts
     As London locked down in spring 2020, a team in the Turing’s data-centric
     engineering programme began Project Odysseus to monitor levels of                                        The Turing has partnered with the Royal Statistical Society to
     activity on the city streets. Working with the Greater London Authority                                  provide modelling and machine learning expertise to the UK
     (GLA) and Transport for London (TfL), the researchers fed their algorithms                               government’s Joint Biosecurity Centre. One of the key outputs so
     with data from traffic cameras and sensors to provide anonymised, near-                                  far is a statistical model that uses incoming COVID-19 test data
     real time estimates of pedestrian densities and distances. TfL used this tool                            to estimate (‘nowcast’) the total number of positive tests in local
     during the pandemic’s first wave to make numerous interventions to keep                                  authorities. It can take up to five days for PCR tests to be processed
     people socially distanced, such as moving bus stops, widening pavements                                  and reported, so these nowcasts will give authorities an earlier
     and closing parking bays.                                                                                picture of the disease’s spread and aid decision-making.

     Combining data from NHS trusts                                                                             Improving the accuracy of the NHS
     Initiated by the Turing and four other partners, the
                                                                                                                COVID-19 app
     DECOVID project has created a detailed database of
     anonymised patient health data. A major breakthrough                                                       Turing researchers have played an integral role in the
     has been the transfer and combination of data from                                                         development of the NHS COVID-19 app, providing technical
     two NHS trusts, covering 185,000 patients. The work                                                        advice to the Department of Health and Social Care. The
     continues with in-kind contributions from researchers                                                      researchers improved the algorithm behind the app’s Bluetooth
     around the UK, who are analysing the data to shed light                                                    Low Energy contract tracing technology, so that it more
     on four key clinical questions, including when to put                                                      accurately calculates the risk that the phone’s user has been
     critical COVID-19 patients onto a ventilator, and how                                                      in contact with a COVID-positive person. A statistical analysis
     patients with long-term health conditions are affected by                                                  published in Nature estimated that the app prevented around
     COVID-19. The first results are expected later in 2021.                                                    600,000 COVID-19 cases in October-December 2020 alone, due
                                                                                                                to people self-isolating following contact with an infected person.

     Modelling the spread of COVID-19 in urban areas
                                                                                                     Building resilience against future crises
     The Turing led the urban analytics workstream of the Royal Society’s Rapid
     Assistance in Modelling the Pandemic (RAMP) initiative. Using highly realistic data             A new two-year, multidisciplinary project at the Turing is seeking to better
     of people’s daily activities in towns and cities, the workstream developed a model              understand how interconnected health, social and economic systems are
     that simulates COVID-19 transmission at an individual level. A demonstration                    affected by shocks such as the pandemic. The ‘Shocks and resilience’
     model of Devon’s entire population allowed researchers to compare the impact of                 project will develop a coupled epidemiologic and socio-economic model
     different lockdown strategies. The team is now scaling up its model to a national               of the spread and societal effects of COVID-19, as well as more generalised
     level, and is in dialogue with policy makers about using it to inform decision-                 models for other complex, socio-economic systems. The overall aim is to
     making in this pandemic and future health emergencies.                                          produce data, methodologies and tools to enable policy makers to make
                                                                                                     better informed decisions, boosting the resilience of local and national
                                                                                                     governments against future shocks.
10
2. Data access and standardisation                                                                           Suggestions                                                techniques work robustly, and at scale, with
                                                                                                                                                                             real-world data, while also considering data
                                                                                                                  Reflecting on these data challenges,                       governance principles and practices.
                                                                                                                  participants made the following suggestions
     Despite the contributions highlighted in          expertise to the national effort. However, there           for the data science and AI community:                  – Automate data collection systems.
     Chapter 1, workshop participants articulated      was a strong perception in the workshops that                                                                        Data collection is resource-intensive, so
     a number of challenges that hindered the          providing researchers with better data access              – Aspire to a research culture in which                   automated data collection could reduce
     efforts of the data science and AI community      and generating a more level playing field                    data are shared as openly as legal and                  the reliance on frontline staff to record
     to contribute to the national effort. The most    regardless of affiliation and connections (see               ethical obligations permit, with central                data using manual systems. This aspiration
     prominent of these were around data access        also Chapter 3) would likely have helped better              repositories, or ‘data lakes’, for cleaned and          needs to be set against the obvious need
     and standardisation – not having access to        address the knowledge and policy challenges                  anonymised data (or weblinks to data) ready             to ensure the rights of individuals and
     the desired data, or having data that were not    during the pandemic.                                         for analysis. Document which data exist (to             communities to privacy and agency.
     suitably formatted or documented.                                                                              avoid duplication of effort) and make them
                                                                                                                                                                          – Encourage more data sharing
                                                                                                                    widely available.
                                                       Data standardisation                                                                                                 agreements. Develop more general data
     Data access                                                                                                  – Data repositories should signpost to                    sharing agreements as opposed to those for
                                                       Many participants also noted a lack of data
                                                                                                                    open datasets and, for non-open data,                   narrow purposes/groups of users to reduce
     While some members of the data science and        standardisation as a significant hurdle for data
                                                                                                                    to the details of data sharing agreements               barriers and delays.
     AI community had easy and rapid access to         scientists during the pandemic. Different data
                                                                                                                    and protocols for securely accessing
     relevant data, several participants commented     standards and codification of metadata, and                                                                        – Develop protocols for minimum
                                                                                                                    the datasets. This can ensure the wider
     that many researchers were limited in their       lack of dataset documentation, meant that data                                                                       standardisation of data fields (including
                                                                                                                    provision and availability of shared datasets
     contributions because access to data was          were difficult to find, link and assess in terms of                                                                  some baseline data to always be included)
                                                                                                                    and equitable data access.
     often restricted, inconsistent or slow.           missingness and biases, limiting the scope of                                                                        to ease linking of datasets and comparable
                                                       and confidence in analyses.                                – Investigate ways in which access to                     metadata. As part of this, consider following
     For example, some hospitalisation data                                                                         sensitive data (e.g. from the NHS) may                  the ‘FAIR Guiding Principles for scientific
     were not available early in the pandemic,         For example, ideally it would have been                      be enabled while respecting professional,               data management and stewardship’:47
     and geographically disaggregated data for         possible to link datasets from different studies,            ethical and legal obligations surrounding the           Findability, Accessibility, Interoperability and
     local analysis and crafting of solutions (e.g.    but many studies and consortia funded                        data. This will require developing new ways             Reusability.
     local lockdowns) were also not available          by UK Research and Innovation (UKRI, e.g.                    to allow researchers to securely access
     to all relevant academic groups. Further,         ISARIC4C45 and PHOSP-COVID46) are relatively                                                                       – Conduct a stress test of the new
                                                                                                                    personal data, perhaps using differential
     local-level economic data on employment           standalone. It would also have been beneficial                                                                       standardisation specifications by using
                                                                                                                    privacy or federated learning techniques.
     and production by industry sector were            to standardise codes across datasets, including                                                                      regular events (such as the annual influenza
                                                                                                                    This in turn presents new research
     not accessible or existent (and possibly          variables for vulnerable and underserved                                                                             season) to prepare for the next pandemic.
                                                                                                                    challenges around making these latter
     often both), making the economic impact of        groups, ethnicity and socio-economic status.
     proposed lockdown measures on different           Better data standardisation would have
     industries, such as the hospitality sector,       enabled data scientists to more easily find,
     more difficult to assess. More broadly,           understand, link and triangulate information
     systematic data on non-pharmaceutical             from different datasets in order to better
     interventions (social distancing, mask wearing,   answer policy-relevant questions and to allow
     lockdowns), and particularly compliance           testing for individual-level or context effects on
     with such interventions, have not been made       outcomes (e.g. healthcare, employment, public
     appropriately available, making it difficult      information). As editors, we acknowledge that
     to measure the impact of these policies on        this aspiration for linked and standardised
     behaviour.                                        datasets will be difficult to achieve in the near-
      Consequently, insights from the data science
                                                       term: different research communities have
                                                                                                                  1
                                                                                                                                                               Aspire to a research culture in which
      and AI community were not as informative and
                                                       different standards, formal or informal, different
                                                       modes of working, and often use similar                                                                 data are shared as openly as legal
      robust as they could have been. It is unclear
      how much the quantity and quality of research
                                                       terminology with quite distinct meanings.
                                                       Nevertheless, the value of such datasets is
                                                                                                                                                               and ethical obligations permit, with
      could have been increased if a larger share of
     1the community had been able to contribute its
                                                       clear, and much more could clearly be done.                                                             central repositories, or ‘data lakes’, for
                                                                                                                                                               cleaned and anonymised data ready
                                                                                                                                                               for analysis.

     45 https://isaric4c.net
     46 https://www.phosp.org                                                                                     47 https://www.go-fair.org/fair-principles
12                                                                                                           13
3. Inequality and exclusion
                                                                                                        not have access to the internet or a computer,              Suggestions
                                                                                                        or who have not engaged with initiatives due to
                                                                                                        historical distrust within their community.                 Reflecting on these challenges, participants
                                                                                                                                                                    made the following suggestions for the data
The COVID-19 pandemic has brought societal         level. For instance, participants cited problems     Finally, participants commented that the                    science and AI community. We note that many
inequality into sharp focus, with the disease      around the standardisation of capture of socio-      data science and AI community had been                      of the suggestions in Chapter 2 regarding data
having a much greater impact on some groups        economic status and ethnicity.                       insufficiently engaged with disadvantaged                   access and standardisation are also relevant
than others. In England and Wales during the                                                            groups on what COVID-19-related research                    to the data privilege issues highlighted in this
first wave of the pandemic, nearly all minority    While participants acknowledged that a robust
                                                                                                        would be useful for them, with the Ada                      chapter.
ethnic groups had higher mortality rates than      and rigorous system exists for obtaining
                                                                                                        Lovelace Institute cited as one of the few
the White ethnic population.48 There were          epidemiological data via institutions such as                                                                    – Prioritise understanding the impact of
                                                                                                        organisations talking to these groups.
also marked socio-economic differences, with       Public Health England, they noted that such                                                                        COVID-19 on different ethnic and social
mortality rates in the most deprived areas         a system does not exist for wider economic,                                                                        groups, and ensure that these insights
                                                   mobility and socio-economic data, and that this      Inequality and exclusion within the research
around double those in the least deprived                                                                                                                             are considered in conjunction with other
                                                   gap hampered socio-economic research in              community
areas.49                                                                                                                                                              key risk factors (e.g. age, pre-existing
                                                   the early weeks of the pandemic. Participants        Many of the issues around data access                         health conditions, profession, care home
Alongside these broader societal issues,           also noted a lack of sufficiently granular, local-   highlighted in Chapter 2 are also issues of                   residency). This would help promote
problems of inequality and bias relating to data   level data on transmission of COVID-19 across        inequality, with some researchers having much                 inclusion of underserved groups by funders,
science and AI have also been highlighted          communities, which hid some of the social            easier access than others to sufficient quality               researchers designing studies, reviewers
by the pandemic. There have been long-term         inequalities associated with the initial phase of    data. Workshop participants recognised that                   evaluating focus areas, and teams delivering
concerns about the potential of these research     the pandemic.                                        researchers with access to certain people                     modelling projects.
areas to marginalise certain groups, but the                                                            or institutes were more privileged, leading to
rise during the pandemic of new datasets, data                                                                                                                      – Address deficiencies in data sources. For
                                                                                                        “data haves and data have-nots”. For instance,
                                                                                                                                                                      example, the physical, mental health and
capture/analysis methods and data-driven           "The COVID-19 pandemic has                           those who had access to members of SAGE
                                                                                                                                                                      economic consequences of the pandemic
technologies such as contact tracing apps,                                                              or who had established relationships with
plus ongoing discussions around immunity           brought societal inequality                          agencies in health and other relevant areas
                                                                                                                                                                      are likely to be severe on some groups, and
and vaccine passports, has meant that these                                                                                                                           capturing any missing or incomplete data
issues are more pertinent than ever. We need       into sharp focus."                                   were able to contribute more easily to the
                                                                                                        research response.
                                                                                                                                                                      could be key to understanding the medium-
ongoing scrutiny to ensure that any tools and                                                                                                                         and long-term impacts on underrepresented
technologies developed by the data science                                                              There was also geographical inequality within                 groups.
                                                   Participants raised concerns that the scramble
and AI community respond to people’s health,                                                            the data science response, with some areas                  – Develop clear protocols for collecting
                                                   to use existing datasets, or to rapidly create
socio-economic and security needs in a fair                                                             of the UK having far greater access to local                  data on protected characteristics,
                                                   new ones, ran the risk of proceeding without
and equitable manner.                                                                                   data and success working with it than others.                 including, for example, ethnicity, sex,
                                                   due regard for sampling biases. These biases,
                                                                                                        Participants commented that higher quality,                   age and socio-economic status. Better
Inequality and exclusion were common               such as insufficient representation of minority
                                                                                                        spatio-temporal data could have assisted the                  monitoring of this information in studies and
themes in the workshops, and the participants’     groups, could be ‘baked into’ datasets, and
                                                                                                        support of policy at both a local and national                clinical trials could help identify critical data
comments fell into two main categories:            may be due to systemic discrimination,
                                                                                                        level.                                                        gaps.
                                                   structural inequalities and/or data collection
– Inequality and exclusion within society          constraints. This could in turn lead to biased       Finally, participants noted a lack of diversity             – Develop clear protocols for generating
                                                   research and policies that exacerbate pre-           in the data science and AI community,                         anonymised and synthetic data, so that
– Inequality and exclusion within the research
                                                   existing inequalities. Data availability, detail     which could bias research or policy in                        important demographic information can
  community
                                                   and representativeness during clinical trials        a number of ways, and emphasises the                          be included in open datasets (one example
                                                   also had shortcomings, which inhibited the           need to make diversity more of a priority                     is the OpenPseudonymiser51 application
Inequality and exclusion within society
                                                   inclusion of patient subgroups, and may have         in recruitment. This lack of diversity could                  developed at the University of Nottingham).
Participants felt that the potential of the data   made it more difficult to assess, for example,       lead to lower engagement with vulnerable                      Great strides have been made over the past
science and AI community to help understand        the efficacy of COVID-19 treatments and              and underrepresented groups, for instance,                    decade in developing realistic synthetic
the impact of COVID-19 on different ethnic and     vaccines in minority ethnic groups.                  potentially resulting in data and policy biases               simulation data (e.g. in ‘digital twins’
socio-economic groups was not fully realised                                                            due to insufficient representation. The low                   projects). Such work can be leveraged and
due to several challenges.                         Participants also noted significant inequality
                                                                                                        take-up of vaccination in some minority                       extended to provide synthetic datasets for
                                                   and exclusion challenges around the use of
Many participants commented on a lack of                                                                ethnic groups demonstrates the need for                       modelling potential future pandemics and
                                                   mobility, mobile device and other digital data in
relevant or consistent data. While ONS and                                                              communication and engagement by trusted                       their impact on minority groups.
                                                   behavioural analysis. For instance, traditionally
other public data providers released up-to-date                                                         representatives50 – and these often differ from
                                                   underrepresented groups also tend to be
COVID-19 data on minority groups, many data                                                             the demographics of national representatives.
                                                   scarce in digital data – especially those on the
sources had gaps at both the national and local    ‘invisible’ side of the digital divide, who might
                                                                                                        1
                                                                                                            43

                                                                                                        50 https://www.theguardian.com/world/2021/mar/25/clinics-pop-up-in-london-to-help-low-vaccine-take-up
                                                                                                        51 https://www.openpseudonymiser.org
4. Communication
     – Make more granular data available at the                 – Increase the diversity of representatives
       local level, and expertise and resources                   in academia, the government and
       easier to find and deploy, so that local                   the media. Increasing diversity and
       organisations have the autonomy to work                    descriptive representation in organisations
       with their data (as opposed to the data                    – among researchers and UK government                   Although collaboration within the data science                 – Make greater strides to bring
       being held, and decisions made, centrally).                representatives alike – can create smarter,             and AI community increased during the                            transdisciplinary groups together. As
       This can increase local resilience to health               more innovative and more inclusive teams                pandemic, as demonstrated by the examples                        well as research groups, this includes
       emergencies.                                               and solutions.52 For example, diverse                   in Chapter 1, issues of communication were                       local communities, health and social care
                                                                  research teams will be key to addressing                frequently raised by workshop participants,                      providers, local partners, analysts, third
     – Increase engagement with minority                                                                                  falling into three main categories:                              sector organisations, private organisations,
                                                                  the issue of algorithmic bias. Increasing
       and underrepresented communities.                                                                                                                                                   and industry.
                                                                  diversity also sets role models for future              – Communication between experts
       Improve data representativeness, minimise
                                                                  generations and thus provides lasting                                                                                  – Speed up the research pipeline by
       algorithmic bias and develop trust by                                                                              – Communication between researchers and
                                                                  benefits. One way to improve diversity is                                                                                developing a robust research/analysis/
       engaging and consulting with representative                                                                          policy makers
                                                                  to make it a core criterion in hiring and                                                                                review framework to reduce delays in
       groups, such as ‘citizen groups’, at regular
                                                                  appointment decisions.                                  – Communication between researchers and                          publishing ideas while maintaining high
       intervals. Develop mechanisms for well-
       funded community involvement in setting                                                                              the public                                                     standards of quality and transparency.
       research priorities and for the end-to-end                                                                                                                                        – Promote global data sharing for health
       participatory design of projects.                                                                                  Communication between experts                                    emergencies. Data sharing agreements
                                                                                                                          Connecting expertise – the right people to                       between the UK and other countries take
                                                                                                                          the right data to the right problem – was                        time to organise, and many were previously
                                                                                                                          identified by workshop participants as an                        coordinated and/or financially supported by
                                                                                                                          important bottleneck in working to understand                    EU organisations, which will undoubtedly
                                                                                                                          COVID-19, constraining the community’s                           become a more complicated endeavour
                                                                                                                          ability to respond. More collaboration between                   post-Brexit. Having more flexible regulations
                                                                                                                          often disparate groups, such as data asset                       for future emergencies will help facilitate
                                                                                                                          holders, subject matter experts, researchers                     global learning as well as research and
                                                                                                                          and clinicians, could have helped to share                       policy solutions in the UK. The editors
                                                                                                                          expertise, avoid duplication, and improve                        note that facilitating safe and equitable
                                                                                                                          analyses.                                                        data sharing between countries is also
                                                                                                                                                                                           recommended by the 'Data for international
                                                                                                                          Participants also identified a need for more                     health emergencies' statement published by
                                                                                                                          international collaboration. While data                          the science academies of the G7 nations in
                                                                                                                          access and exchanges across national                             March 2021.54
                                                                                                                          borders happened to some degree during
                                                                                                                          the pandemic, such as through the EULAR                        Communication between researchers and
                                                                                                                          COVID-19 Database,53 many other efforts                        policy makers
                                                                                                                          remained decentralised, informal and ad hoc.
                                                                                                                          More could be learned by the UK from other                     The transfer of knowledge from academic
                                                                                                                          countries in similar situations, particularly                  research to real-world policy was a crucial

     Increasing diversity and                                                                                             those with greater experience or capacity in
                                                                                                                          dealing with health emergencies.
                                                                                                                                                                                         aspect of the scientific response to the
                                                                                                                                                                                         pandemic. Within the data science and
     descriptive representation                                                                                                                                                          AI community, participants noted that
                                                                                                                                                                                         communication with policy makers could be
                                                                                                                          Suggestions
     in organisations – among                                                                                             Participants made the following suggestions
                                                                                                                                                                                         improved in order to better identify policy
                                                                                                                                                                                         needs and improve knowledge transfer.
     researchers and UK government                                                                                        for the data science and AI community:
                                                                                                                                                                                         Participants also highlighted a lack of
     representatives alike – can
     1
                                                                                                                           – Develop collaborative working
                                                                                                                             relationships between data scientists
                                                                                                                                                                                         transparency in policy-making. It was difficult
                                                                                                                                                                                         to know which studies had ‘cut through’ and
     create smarter, more innovative                                                                                         and clinicians. Data scientists can provide                 been considered by government and advisory
                                                                                                                             insights about the collection and storage                   groups when making policy interventions, and
     and more inclusive teams and                                                                                            of data that clinicians may not be aware                    which data policy makers were using to inform
                                                                                                                             of, while clinicians can provide valuable                   their decisions. Increased transparency would
     solutions.                                                                                                           1
                                                                                                                             insights about the multi-dimensional nature                 help researchers to understand what scientific
                                                                                                                             of health and social data collection.                       approaches and insights are most valuable to

                                                                                                                          53 https://www.eular.org/eular_covid19_database.cfm
     52 https://hbr.org/2016/11/why-diverse-teams-are-smarter ; https://www.turing.ac.uk/news/where-are-women-map-        54 https://royalsociety.org/-/media/about-us/international/g-science-statements/G7-data-for-international-health-emer-
     ping-gender-job-gap-ai                                                                                               gencies-31-03-2021.pdf
16                                                                                                                   17
policy makers, and would also give the public             Workshop participants agreed that, although             – Increase the understanding and                          – Use data visualisation: this can be
     a clearer picture of the evidence behind policy           there had been successful examples of public              involvement of data scientists in media                   an effective means of communicating
     decisions, potentially increasing public trust            engagement from the community during the                  processes. Media outlets have increasingly                complex scientific topics to non-specialists.
     and compliance.                                           pandemic, there were also shortcomings                    referenced preprints, which are of                        Participants cited Harry Stevens’s ‘corona
                                                               in communication, particularly around the                 varying quality given preprint protocols                  simulator’ article in The Washington Post as
                                                               limitations and uncertainties of research.                and accelerated peer review processes.                    a particularly effective example.59
     "The transfer of knowledge                                Helping the public to understand the findings             Communicate to the public what is and what
                                                                                                                                                                                 – Support independent input and
                                                               and caveats of modelling studies, for example,            is not (or not yet) quality, peer-reviewed
     from academic research                                    could enhance trust in the research and                   research.
                                                                                                                                                                                   communication: workshop participants
                                                                                                                                                                                   highlighted the benefits of an Independent
     to real-world policy was a                                increase support for and compliance with
                                                               policies. Trust and acceptance of policies might
                                                                                                                       – Consider building on pre-existing                         SAGE,60 which provided a model for
                                                                                                                         mechanisms to support communication,                      how scientists can analyse, interpret
     crucial aspect of the scientific                          also be increased by communicating how data
                                                                                                                         for example via the Science Media                         and comment on a situation from a more
                                                               science and modelling can promote positive
     response to the pandemic."                                health outcomes. For example, targeted
                                                                                                                         Centre,58 which curates expert reactions
                                                                                                                         to noteworthy science stories, helping
                                                                                                                                                                                   politically neutral standpoint.
                                                               communication might have helped improve
                                                                                                                         journalists to access nuanced and unfiltered
                                                               compliance with the isolation notifications sent
                                                                                                                         information.
     Suggestions                                               through the NHS Test and Trace system.57
     Participants made the following suggestions
     for the data science and AI community:                    Suggestions
                                                               Participants made the following suggestions
     – Build sustainable links between the data
                                                               for the data science and AI community:
       science community and those who are
       closer to policy-making.                                – Provide clear, simplified, accessible
     – Provide training for policy makers to help                information for non-specialists about
       them understand the research process,                     studies’ findings and predictions, as well as
       e.g. study design and limitations, and the                the underlying research designs (data used,
       uncertainty around estimates and findings.                data quality, computational processes, etc.).

     – Provide training for researchers and                    – Communicate transparently about the
       public health professionals to help them                  limitations of studies, such as uncertainty
       better communicate to policy makers the                   in the models/predictions and potential
       findings, limitations and uncertainties of                biases in the data.
       their studies and models.                               – Build public trust by addressing concerns
                                                                 related to data security, privacy and
     Communication between researchers and                       confidentiality, and by communicating how
     the public                                                  the data and models are informing policies.
      Throughout the pandemic, data science and                – Take a more proactive role in countering
      AI have been in the public eye as never before.            misinformation, for example about testing,
      Every day presents a new swathe of statistics              contact tracing and vaccines. More effective
      about COVID-19, and data scientists have                   communication could have mitigated
      been increasingly called upon to communicate               against the many counter-messages, ‘fake
      their research to non-specialists. Meanwhile,              news' stories, and biased reporting that
      the public has been able to directly input into            emerged during the pandemic. Also, present
      COVID-19 research through initiatives such as              data in a way that is complete and not prone
      the COVID Symptom Study app,55 and there                   to misinterpretation.
      have been vigorous debates on the ethics of
                                                               – Train researchers in how to communicate
      AI and algorithmic bias, especially around the
                                                                 their findings to the public and media,
     1UK GCSE and A-Level grading controversy in
      summer 2020.56
                                                                 especially around the use of models in                                                            Throughout the pandemic, data
                                                                 making decisions and predictions.
                                                                                                                                                                   science and AI have been in the
                                                                                                                       1                                           public eye as never before.

     55 https://covid.joinzoe.com                                                                                      58 https://www.sciencemediacentre.org
     56 https://en.wikipedia.org/wiki/2020_UK_GCSE_and_A-Level_grading_controversy                                     59 https://www.washingtonpost.com/graphics/2020/world/corona-simulator
     57 https://www.medrxiv.org/content/10.1101/2020.09.15.20191957v1                                                  60 https://www.independentsage.org
18                                                                                                                19
5. Conclusions                                                                                               Acknowledgements
     Since the first UK lockdown in March 2020,         is key to reducing the chances of data and                We thank all conference and workshop                This work was supported by Wave 1 of the
     when little was known about COVID-19, our          research being misused or misinterpreted.                 participants who have shared their experiences      UKRI Strategic Priorities Fund under the
     knowledge of the disease and its effects have                                                                and insights with us, and have contributed to       EPSRC Grant EP/T001569/1, particularly
     much progressed. The UK’s data science and         Workshop participants have made many                      this report.                                        the ‘Health’ theme within that grant and The
     AI community has played an integral role in        suggestions for how the community might                                                                       Alan Turing Institute. We also acknowledge
     this scientific effort, and this report provides   address these challenges. These include                   We thank Jessie Wand for her amazing efforts        the contributions of the ‘Health and medical
     a snapshot of the community’s contributions.       the provision of accessible and centralised               in organising the complex series of conference      sciences’ programme at The Alan Turing
     But equally, the pandemic has pulled into sharp    ‘data lakes’; more equitable data access;                 meetings and workshops that led to this report,     Institute.
     focus a number of areas where we can and           protocols for data standardisation; increased             and James Lloyd, science writer, for his tireless
     should do better.                                  representation of, and engagement with,                   efforts in pulling the final draft of this report
                                                        minority groups; training for researchers in              together. We are also grateful to the Centre
     First, participants reported difficulties          communicating their work to non-specialists;              for Facilitation for holding the workshops and
     accessing sufficiently timely, robust, granular,   and initiatives to increase public understanding          writing the workshop notes.
     standardised and documented data. There            of research findings and uncertainty, to better
     were also issues of privilege, with some           counter misinformation.
     researchers able to access data much more
     easily than others.                                If the community can make progress in these
                                                        areas, then when we are next faced with a
     Second, the pandemic has highlighted pre-          pandemic – and the historical record strongly
     existing societal issues of inequality and         suggests that this is a ‘when’ rather than an ‘if’
     exclusion, and the role that data science          – we should be better placed as a collective to
     and AI can play in mitigating or exacerbating      respond.
     these. In our workshops, participants raised
     concerns about a lack of data about, and           Navigating our way through the pandemic
     engagement with, minority ethnic and socio-        without the knowledge and resources of
     economic groups, and a lack of diversity within    the data science and AI community would
     the research community and decision-making         have been markedly more difficult. These are
     organisations.                                     transformational times for the community as
                                                        its research becomes ever more embedded
     Third, the pandemic has underlined both the        in everyday life. We need to draw on our
     importance and difficulty of communicating         experiences during this pandemic to ensure
     transparently with other researchers,              that data science and AI continue to change
     policy makers and the public, particularly         lives for the better.
     around issues of modelling and uncertainty.
     Participants agreed that better communication

20                                                                                                           21
You can also read