Using Artificial Intelligence to Support Healthcare Decisions - A Guide for Society


Why we need this guide

Artificial intelligence (AI) is software that can use large amounts of data to assess and make predictions – things that human ‘computing power’ can’t do at all or can’t do quickly and accurately. It is ‘intelligent’ because it works out patterns in the data and tests them, rather than just identifying what it is instructed to find – for example, finding patterns in genomic data that might predict who gets a disease, where humans don’t yet know what to look for.

In healthcare, AI has made advances in analysing data about how diseases progress. It is also being used to identify molecules that could make new drugs, diagnose medical conditions more precisely, predict how patients will respond to treatment, and improve the planning of resources such as hospital beds.

COVID-19 has sped up the introduction of these new health technologies. For example, the BenevolentAI platform took one weekend to identify a drug that could be used to treat the new disease – conventional drug discovery methods would have taken eight years.1 But this rapid introduction of technology has come with the trade-off of less time for robust testing.

With AI development happening so rapidly, and healthcare providers using AI more and more, it’s vital that more people know the important questions to ask about how reliable different applications are – the quality of the data they are based on, and whether we can depend on them to be right.

It is important for society to ask these questions to make sure AI is used responsibly. This kind of accountability makes a difference: patients asking questions about evidence and outcomes has improved many aspects of healthcare.

Similarly, doctors and patients need to understand how reliable their AI-based information is when life-changing decisions are being made.

But what if policymakers, healthcare agencies, journalists, doctors, and patients don’t know the questions to ask about whether a new breakthrough AI application is reliable or suitable for a particular use? What if they pass on flawed information or make bad decisions because they don’t know where to find information about the model the AI is using? Who is accountable if things go wrong?

This guide is not intended to train AI experts or show how interesting AI is, but to help with the important conversations about its use in healthcare. The guide is designed to equip patients, policymakers, journalists, clinicians and decision-makers with the questions for discussing whether a technology is robust enough for its intended use. It aims to transform the conversation about AI from a complex and daunting one to an empowering one – one that can give us confidence in those technologies that do improve medical treatment and avoid harm from those that don’t.

This guide was created through a partnership of: Lloyd’s Register Foundation Institute for Public Understanding of Risk, a research institute at the National University of Singapore committed to improving lives by transforming risk communication and the public understanding of risk in Asia and internationally; the Korea Policy Center for the Fourth Industrial Revolution, a research institute at KAIST working to understand and shape emerging technologies and governance of the Fourth Industrial Revolution for a better and inclusive digital era; and Sense about Science, an independent charity that promotes the public interest in sound science and evidence.

We are grateful for the input and personal time given to us by the many data scientists, doctors, researchers and members of the public who were involved in the development and testing of the guide.


Terms
Technical terms aren’t needed to ask the right questions. But where they are used, it helps to know that terms like “AI”, “algorithm”, “reliability”, “model” and “generalisability” have specific meanings.

How AI is used in treating patients
AI is helping medical professionals in some fields to work more quickly and accurately, but it can’t replace the doctor. Good use of AI depends on its suitability for the decision and the expertise of the medical professionals interpreting it.

Reliability matters
There is a lot at stake. AI can base its recommendations on false or misleading relationships it finds in the data, leading to bad decisions. It can make biases in healthcare worse if the limits of the data are not clear. We can only know how reliable AI is if its testing and performance are clear. Understanding how to check on this is important for journalists who want to report on new developments responsibly. It helps health authorities to select the applications that genuinely improve patient treatment, and it helps the public to have confidence in the right things.

Questions to ask about AI in healthcare

What data is it based on?
To reduce the chance of the AI identifying false or misleading relationships, it’s important to know how the data underpinning it was generated.

What assumptions is AI making about patients and disease?
An AI-supported diagnosis or treatment option might not be useful if the results can’t be generalised across countries or groups, or if key information is missing.

How much decision weight can we put on it?
AI can only support a clinical decision if we know how well it performs.

A reliable future
To make sure we identify genuinely useful innovations, we must ask the right questions now about the reliability of the AI being used for different purposes. The questions in this guide will help society create a benchmark for responsible discussion that will promote clarity and high standards for the use of AI in healthcare.



Algorithm
A set of mathematical instructions to find or calculate something. Algorithms can be used by AI to find relationships between things (variables) in data.

Artificial intelligence (AI)
A machine or system that uses data and rules to make assessments or predictions like a human would.

Big data
A type of data that is large (volume), varies in content and type (variety), and changes quickly (velocity).

In the healthcare context, such data includes many variables (e.g. age, gender, height, weight, average weekly alcohol consumption, smoking habits, chronic conditions, medical treatments, test results and x-rays) and can be in different formats (e.g. sounds, videos, written records, images, charts and graphs).

Generalisability
A measure of whether the conclusion made using a set of data is generally true or not. For example, an AI that is not generalisable can help with a diagnosis of bone conditions for only certain demographic groups but not others.

Model
An equation that an AI uses to represent how conclusions can be made from data the AI hasn’t seen before. For example, new information about changes in smoking habits can be used in a model to predict the number of cases of lung cancer.

Reliability
How trustworthy an AI is or how consistently an AI produces the result we want (e.g. being better at identifying the patients whose disease will improve with surgery) without producing results we don’t want.

It can also mean, technically, the ability of an AI to produce the same result every time.

Variable
A factor or characteristic that might be relevant to answering a question. These could be numbers like age, weight, height, temperature or income. Or they might fall into categories like eye or hair colour, ethnicity, field of work or hobbies.


How AI is used in treating patients

AI is intended to help medical staff work quickly and accurately and to make processes efficient.

Current AI-based software is limited to performing specific tasks to support a doctor’s decision-making. It cannot perform complex tasks such as making clinical decisions, and doctors can consider things that the AI cannot, such as a patient’s cultural practices, when making a treatment plan.

At the current pace of technological development, this is likely to be the case for the near future: AI can support but not replace the doctor.

In South Korea, VUNO Med solutions are AI-based diagnostic support systems that can read medical images or analyse biosignals. VUNO’s BoneAge assessment software compares bone age with chronological age – for example, an eight-year-old child whose bone age is nine years old is assessed to be growing too fast.2

In Germany, a diagnostic AI has been used to detect potentially cancerous skin lesions. It was tested against an international group of 58 dermatologists and proved better at correctly identifying the nature of more suspicious lesions.3

On the other hand, an eye disease diagnostic developed by Google Health4 suffered from a major drawback: the quality of many images taken by nurses was not high enough, so the system rejected more than a fifth of the images and more work had to be done to retake these images. The theoretical accuracy of the diagnostic prediction can only be realised if medical professionals have the confidence and training to use it.

Types of AI in healthcare

Clinical-decision support tools
Medical devices and applications used by clinical practitioners to perform their work. AI is used in diagnostic imaging, predicting treatment outcomes, robotics in surgery and remote monitoring of patients who are using medical devices.

Patient-decision support tools
Medical devices and applications used directly by patients or caregivers. Examples include chatbots or other online tools which help with self-diagnosis, and lifestyle applications such as fitness trackers.

Healthcare administration
Tools used by organisations to improve operations and administration – AI is used in resource allocation, cost reduction (e.g. by reducing test duplications) and automating processes like dispensing medicines.

Therapeutics development
AI used in discovering new drugs and


Reliability matters

The use of AI to help with diagnosis, predict the outcome of treatment or prioritise resources is potentially life-changing.

There is some suspicion about AI among the public and healthcare practitioners. Its inner workings are difficult to see, which makes it difficult to question or contest its conclusions, and there are fears about how it uses personal data.

Privacy issues are often raised, but reliability issues have been neglected, perhaps because it is difficult to know how to question them. While it is important for people to have confidence that their data is secure, it is just as important to know whether data is being used well. It’s unlikely that any of us would accept a technology based on a study with a 10-person sample size on the basis that the 10 participants’ data was kept safe.

Guarantees about privacy are not enough for a technology to be useful, so key questions about the quality of data and reliability of AI need to be asked.

Poor-quality data (or poorly understood data) affects the accuracy of AI. Biases in AI arise from missing or excluded data, existing bias in the training data or errors in the algorithm. Like other data analytics, using data for a purpose it wasn’t collected for can introduce false or misleading relationships. And we can’t be sure how reliably the AI performs if the model hasn’t been rigorously tested in the real world.

So, scrutinising quality and reliability means checking that:

- The source of the data is known
- The data has been collected or selected for the purpose it’s being used for
- Limitations and assumptions for that purpose have been clearly stated
- Biases have been addressed
- It has been properly tested in the real world

How do we know that someone has done these checks? There are questions that everyone can ask – whether a journalist, policymaker, clinician, patient or relative – to find out. These questions are set out in the next few sections.


Questions to ask about AI in healthcare

What data is it based on?

Data is obtained in different ways.

Experimental data (collected from experiments) is collected to answer a specific question. Researchers usually consider possible biases they will get in the data and what might be missing, and take steps to overcome these issues.

Observational data is recorded as we go about our business, such as withdrawing money from a bank or travelling on public transport, and there is also administrative data that is recorded by institutions, such as speeding fines or the issuing of prescriptions at hospitals. The biases and limitations in such data sources are usually not thought about until the records come to be used as data for analysis.

All these data sources can be useful for developing AI, but it’s important to consider how good and relevant they are for a particular purpose, especially if they’ve not been gathered for that purpose.

For instance: ‘What factors cause patients who have recovered from alcohol addiction to relapse?’ Programmers might put together databases containing information (variables) such as age, chronic medical conditions and genetic profiles. AI would look at these detailed datasets for relationships with relapse data.

If the data came only from medical sources, the AI could miss potential major factors such as unemployment and miss people who do not engage with medical services.

So, the aspects of this question to consider are:

- How the data used to train the AI was collected
- Whether the data represents the patients for whom the AI is being used
- Whether the patterns and relationships identified by the AI are accurate

Not everyone will be able to ask about or assess the details of these aspects, but any doctor, patient or reporter can insist on a clear statement of how these aspects have been addressed. Anyone commissioning the AI for use in health services should be confident that they know the answers.

How was the data used to train the AI collected?

If the data comes from an experiment, it should have been collected to answer a specific research question as part of a well-designed study. Signs of quality include:

- A large sample size of participants
- A control group of participants with similar characteristics to compare results against (except for the variable being measured)
- Error estimates
- A discussion of how well the research findings can be extrapolated to real life

AI systems trained using this type of data have a lower risk of having false or misleading relationships if those quality markers are there.

Observational data analysis involves looking at data that already exists and searching for relationships between variables. There are advantages to this approach, such as being able to study many more variables than an experiment would allow. While it is possible to correctly identify relationships with this type of data, the data source should be clearly stated, and information provided about the AI should include how biases have been considered.

We should also note if the data gathered consists of objective measurements (e.g. vital signs from a device) or subjective self-reported data (e.g. survey responses). Subjective data could have more inaccuracies or biases as people’s responses vary for different reasons and responses are self-selecting.

Singapore’s Health Promotion Board is collaborating with Apple on an app called LumiHealth. Developed in close collaboration with doctors and public health experts, LumiHealth aims to deliver personal health recommendations based on factors such as age, gender and weight. These recommendations are driven by AI using real-world data from users (obtained with consent) and include reminders to go for regular health checkups. By following the app’s recommendations, a user can work towards weekly activity goals and participate in challenges that aim to improve sleep habits and food choices.5

How personal and relevant a health app’s recommendations are depends on how the data behind it is gathered. LumiHealth uses user data carefully selected for relevance. But some apps don’t do that. If an app uses observational data from other users of the app to recommend when a person should visit a doctor, the recommendation is likely to be skewed by the fact that healthier people tend to use such apps.
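
For readers who want to see how this kind of skew arises, the following is a minimal sketch in Python. Every number in it is invented for illustration, and the names (population, average_visits, app_users) are hypothetical; it does not describe how LumiHealth or any real app works. It simply compares the average number of doctor visits among app users with the average across everyone, in a made-up population where healthier people are more likely to use the app.

```python
from statistics import mean

# Invented data: (doctor visits needed per year, uses the health app?)
# Healthier people (fewer visits) are assumed to be more likely to use the app.
population = [
    (1, True), (1, True), (2, True), (1, True),
    (2, False), (4, False), (5, False), (6, True),
    (5, False), (3, False),
]

def average_visits(records):
    """Average number of yearly doctor visits in a list of records."""
    return mean(visits for visits, _ in records)

# An app that only sees its own users learns from this subset.
app_users = [record for record in population if record[1]]

everyone_avg = average_visits(population)
app_avg = average_visits(app_users)

print(f"Average visits, whole population: {everyone_avg}")
print(f"Average visits, app users only:   {app_avg}")
```

In this made-up population the app-user average is lower than the population average, so a recommendation based only on app users would tend to suggest fewer doctor visits than the wider population needs.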

Does the data represent the patients for whom the AI is being used?

Data might not be useful for training an AI if it doesn’t represent the target population. It may be missing information about different ethnicities, sexes, and age groups, and in some cases, this missing information has important implications for health. For example, heart problems show up differently in men and women, or the data may be based on people who can afford to seek treatment and therefore biased towards the health of wealthier people.

In Germany, a skin diagnostic AI was trained and validated using images obtained primarily from fair-skinned people in the USA, Australia, and Europe. If the algorithm bases most of its knowledge on how skin lesions appear on fair skin, then there’s a risk that lesions on patients with darker skin are more likely to be misdiagnosed.

The absence of data from people with darker skin won’t make the diagnostic useless, if it can reliably support diagnoses in some people. But clearly the absence of important data about certain ethnic groups should be known about in countries that have multiracial populations, which is the case in many East and Southeast Asian countries.

Overcoming problems with the representativeness of data is a challenge. Some groups are under-represented in health studies and so are under-represented in the data.

Privacy regularly comes up in the public conversation about AI and the use of data. People are concerned that their personal medical data could be used to discriminate against them. For this reason, certain categories of information, such as rare or genetic conditions, require strong anonymisation procedures.

Public concern about privacy influences whether people will share data, and this can affect the accuracy of the AI’s recommendation by giving it too small a pool of information to draw reliable conclusions from. By being transparent and demonstrating the steps taken to check that the AI is reliable, researchers and developers can help give people confidence about providing their data.

During the COVID-19 pandemic, Singapore rolled out the TraceTogether mobile app for contact tracing. The idea behind it was the exchange of Bluetooth signals between mobile phones with the installed app. Each phone could detect other participating TraceTogether phones nearby. The app estimated the distance between users and the duration of any time spent less than two metres apart. Encrypted records of these contacts were stored on each user’s phone for 21 days. An app user identified as having come into contact with a person who had tested positive for COVID-19 could authorise their TraceTogether data to be accessed by the Ministry of Health (MOH). MOH would then decipher the data and get the mobile numbers of the user’s close contacts from the previous 21 days to contact-trace them, ask them to isolate, and test them.

TraceTogether was unable to gain public trust. In June 2020, three months after the app’s launch, approximately 30% of the population had downloaded the application, falling short of the required adoption rate of 50-70% for contact tracing apps to be effective. Many Singaporeans saw the app as a phone surveillance mechanism. By December 2020, nine months after the app’s launch, its adoption rate had barely grown. However, through the government’s distribution of an alternative – an external device with the same function – Singapore achieved 70% adoption.6

[Image: TraceTogether app screen with Check In, Scan QR and Exposure Alert buttons and a count of 400 exchanges today]

The experience of Singapore’s contact-tracing app shows that real-world limits on data can be hugely underestimated by app developers. People are sometimes just not willing to provide the data that will make even a well-designed application work. Some of these concerns may be alleviated by greater transparency in the technology itself, but in other cases they won’t. We have to ask instead whether the application is going to be fed with enough relevant data to continue running reliably.

Finding out if the data is appropriate for its intended use helps to reduce the risk of AI systems spotting false and misleading relationships.
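One simple way to ask whether data represents the target population is to compare each group's share of the training data with its share of the people the AI will be used on. The sketch below is only an illustration: the function name `representation_gaps`, the group labels and every number are invented for the example and do not describe the German skin diagnostic or any real dataset.

```python
def representation_gaps(sample_counts, population_shares):
    """Return (share in training data) minus (share in target population) per group."""
    total = sum(sample_counts.values())
    gaps = {}
    for group, population_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        gaps[group] = round(sample_share - population_share, 3)
    return gaps


# Made-up numbers, for illustration only.
training_images = {"fair skin": 900, "medium skin": 80, "dark skin": 20}
target_population = {"fair skin": 0.30, "medium skin": 0.45, "dark skin": 0.25}

gaps = representation_gaps(training_images, target_population)
for group, gap in gaps.items():
    # A large negative gap means the group is under-represented in the data.
    print(f"{group}: {gap:+.1%}")
```

A negative gap flags a group with less data than its share of patients would suggest, which is a prompt to ask how well the AI has been checked for that group.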

Are the patterns and relationships identified by the AI accurate?

Data is fed into an algorithm, which can analyse the data to find patterns between variables. An AI can learn about these relationships between variables as more data is fed in, apply these relationships, and adjust them.

We’ve seen that the ability to quickly spot patterns in data is a key benefit of using AI in healthcare and that it also presents challenges. It’s possible that an AI might start to spot patterns that are not relevant.

Population-level data contains information about lots of variables – for example, people’s age, gender, ethnicity, marital status, jobs, postcodes, what car they drive, whether they are registered to vote. This type of data is often called “big data”. If an AI searches through enough big data, it will inevitably find patterns and relationships between variables that have nothing to do with each other. This is known as data dredging.

This shows the need to have a specific question to answer when the AI is programmed to look for patterns in a dataset, because it’s then less likely to come up with random relationships that affect the validity of the model.

To make sure the relationships are real, anyone commissioning an AI for healthcare should ask if it has been trained using big data and how data scientists have identified the variables most relevant to what the AI is going to be used for. Moreover, even AI trained using big data can be rigorously tested using an independent dataset – as explained in the next section, AI providers should make clear whether this has been done.

QUESTIONS TO ASK ABOUT AI IN HEALTHCARE

What assumptions is the AI making about patients and disease?

If the algorithm analysing alcohol addiction relapse data finds the relapse rate is higher among low-income people, then it might flag a fall in income as a risk factor for relapse. Information about relationships between variables can then be used to create a predictive model – a mathematical equation that uses information about what happened in the past to make a prediction about what could happen in the future (see the diagram below).

How a model translates to the real world has implications for the reliability, generalisability and fairness of the AI.

The essential aspects are:

• Whether the right relationship is captured
• Whether variables excluded from the model are indeed irrelevant
• Whether the results are generalisable
• Whether the AI eliminates human prejudice from decision-making

[Diagram: PREDICTIVE MODEL]
1. Existing data on recovering alcoholics who have relapsed is fed into an algorithm to train it to spot patterns and relationships.
2. The algorithm identifies which variables are closely associated with relapse (risk factors).
3. A model is created that links incidents of past relapses to the variables associated with them.
4. Data on new patients is fed into the model as part of a clinical assessment.
5. The model uses the new data to predict who is at risk of relapse, using the relationships it knows about.
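The data dredging problem described on this page can be illustrated with a small simulation. The sketch below assumes a made-up setting: 30 patients, one outcome and 1,000 unrelated variables, all filled with random numbers, and an arbitrary cut-off of 0.4 for calling a correlation "strong". None of these figures come from a real study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)       # fixed seed so the run is repeatable
N_PATIENTS = 30      # assumed sample size
N_VARIABLES = 1000   # assumed number of unrelated variables searched
THRESHOLD = 0.4      # assumed cut-off for a "strong" correlation

# The outcome and every variable are pure noise: no real relationship exists.
outcome = [random.gauss(0, 1) for _ in range(N_PATIENTS)]

spurious = []
for index in range(N_VARIABLES):
    variable = [random.gauss(0, 1) for _ in range(N_PATIENTS)]
    r = pearson(variable, outcome)
    if abs(r) >= THRESHOLD:
        spurious.append((index, r))

print(f"{len(spurious)} of {N_VARIABLES} random variables look 'strongly' related to the outcome")
```

Because every variable is noise, any that pass the cut-off are chance findings. Searching with a specific question in mind, and checking results on an independent dataset, reduces the risk of building a model on them.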


Is the right relationship captured?

Sometimes, observational data shows up variables that seem to be related to each other (when one goes up, the other goes up or down). Those variables are “correlated”, but that doesn’t mean one “causes” the other.

In 2017, the University of Chicago Academic Hospital System (UCAHS) developed an AI to predict patients’ length of stay.7 It was intended to help doctors prioritise patients who were more likely to be eligible for rapid discharge and free up beds faster. The AI’s algorithm found patients’ postcodes to be one of the best predictors of length of stay. The postcodes associated with a longer length of stay were those in poor neighbourhoods. In effect, the AI recommended prioritising patients from richer districts. There was a clear correlation between postcode and length of hospital stay, but it is unlikely that a person’s address itself causes them to stay longer in the hospital.

Are the variables excluded from the model actually irrelevant?

The real world contains many millions of variables changing at once. It would be impossible for a model to account for every possible degree of change. Some variables may be readily available and others costly or even impossible to secure. There is only so much computer processing power to draw on and so much time and money to spend. So, data scientists make assumptions and intentionally exclude some variables.

In the Chicago hospital example, the developers needed to consider the missing variables. Perhaps there was a third factor at play, one which caused people to stay longer in hospital when they are ill. In this case, poverty: poor people live in neighbourhoods where housing is more affordable and they also tend to have poorer health outcomes and a higher risk of suffering from chronic illnesses. The detrimental effect of poverty on health is likely to be compounded by a technology that diverts more treatment away from them.

Anyone commissioning an AI product should ask what variables might be missing from the model, why they are missing and how this might affect the outcome. It’s important for developers to understand this themselves and have an open and honest discussion about it with the people they are handing over the technology to.

Are the results generalisable?

AI doesn’t work well when it is required to make a prediction or recommendation on something that differs substantially from its training data (the data used to develop it).

In the VUNO Med-BoneAge example, the AI-based diagnostic supporting solution is built on South Korean population data and validated with multinational data. As the bone growth curve can vary according to race and ethnicity, the accuracy of VUNO’s BoneAge assessment may differ when used with population data from other countries and ethnicities. In this case, fine-tuning or retraining of the algorithm will be necessary to make the output more accurate in Caucasian, Native American and African people.

Variables that could influence the generalisability of an AI application include age distribution, ethnicity, gender, geography and climate. Anyone commissioning an AI product should ask if the results are generalisable, and clinicians should feel confident in the accuracy of the AI’s recommendation for the particular group of patients they are treating.

Does AI eliminate human prejudice from decision-making?

One misconception about AI-supported decision-making is that it is based on cold hard facts without prejudice. But an AI is trained on data from the real world. It sees the world the way it is, not as it could or should be. AI is not inherently good or bad, but it can compound unfairness in healthcare unless the limitations of the data are understood by the developers. Some AI research seeks to address these existing biases through its programming.

If blindly optimising the use of beds was the sole objective, then using patients’ postcodes as a proxy to predict who should be prioritised for treatment wouldn’t be so bad. But the ultimate problem that UCAHS ran into was that poorer people in the USA are disproportionately African American. By diverting treatment away from patients who lived in poorer neighbourhoods, the AI was prioritising white people over black people. It only exacerbated existing racial health inequalities in American society.

Even representative data can embed prejudices, biases and harmful assumptions. In the complex modern world, AI predictions and recommendations can’t be divorced from social realities. Anyone using an AI to aid a clinical decision or any decision in a healthcare setting should consider whether it has the capacity to encode prejudices.

A commentator or patient can ask what assumptions are being made and how we are sure these are fair, even if the AI technically does its job.

This doesn’t mean that any group should be more concerned than another about how AI is used to support their treatment. When the right conversations are had at the right time, everyone involved can be confident in the clinical decision that’s made.
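The postcode example, in which a third factor (poverty) drives both where people live and how long they stay in hospital, can be sketched as a simulation. Everything below is assumed for illustration: the 80% and 20% chances of living in a low-cost postcode, the lengths of stay and the patient count are invented, and they are not figures from the Chicago hospital system.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable


def mean(values):
    return sum(values) / len(values)


patients = []
for _ in range(20000):
    poor = random.random() < 0.5
    # Assumption: poverty makes living in a low-cost postcode more likely.
    low_cost_postcode = random.random() < (0.8 if poor else 0.2)
    # Assumption: length of stay depends on poverty only.
    # Postcode has NO direct effect in this simulation.
    stay_days = 3.0 + (4.0 if poor else 0.0) + random.gauss(0, 1)
    patients.append((poor, low_cost_postcode, stay_days))


def postcode_gap(rows):
    """Mean stay in low-cost postcodes minus mean stay in other postcodes."""
    low_cost = [stay for _, in_low_cost, stay in rows if in_low_cost]
    other = [stay for _, in_low_cost, stay in rows if not in_low_cost]
    return mean(low_cost) - mean(other)


overall_gap = postcode_gap(patients)
gap_among_poor = postcode_gap([p for p in patients if p[0]])
gap_among_non_poor = postcode_gap([p for p in patients if not p[0]])

print(f"Postcode gap, all patients:      {overall_gap:.2f} days")
print(f"Postcode gap, poor patients:     {gap_among_poor:.2f} days")
print(f"Postcode gap, non-poor patients: {gap_among_non_poor:.2f} days")
```

Looking at all patients together, postcode appears to predict length of stay. Once patients are compared within the same poverty group, the gap largely disappears, which is what we would expect when the excluded variable, not the postcode, is doing the work.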

QUESTIONS TO ASK ABOUT AI IN HEALTHCARE

How much decision weight can we put on it?

We’ve seen that an AI’s performance depends on the quality of data it is based on and what assumptions it makes about patients and disease. Taking all this into account makes it more likely that the AI is of good quality, but is it good enough for its intended purpose?

The essential aspects of this question are:

• How well the AI really performs
• Whether its reliability has been properly scrutinised
• Whether it makes a useful real-world recommendation

Has its reliability been properly scrutinised?

As well as accuracy, we should consider the AI’s reliability in making predictions. Independent datasets can be used to test how good the AI is at using the relationships it has identified to make a prediction about data it hasn’t seen before – its reliability.

This is ideally done by holding back a section of the training data and then seeing how well the AI can identify the thing it’s looking for or predict the outcome. Sometimes, an AI that works well on the data used to train it is terrible at making predictions from new data. That could be because the model has not weeded out irrelevant variables or because the model has learnt the training data rather than its underlying relationships. An AI that doesn’t make consistent predictions on similar data is unreliable.

Where apps are based on collaboration between public health and the private sector, there is more opportunity to scrutinise reliability. For example, technologies developed in a public-private collaboration are more likely to have undergone clinical trials – set up to see how well they perform against existing practices or human judgement.

Singapore’s LumiHealth app was developed via close collaboration between Apple and public health authorities. To be authorised for use in public health, the app needed to meet strict criteria. Close collaboration with public health experts reduced the risk of data not being representative, because the app was not relying on volunteer-contributed datasets.
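The idea of holding back a section of the data to test reliability can be sketched as follows. This is a toy example under stated assumptions: the patients, the single "risk score", the 20% of noisy outcomes and both models are invented for illustration. The `memoriser` stands in for a model that has learnt the training data itself, and `simple_rule` for one that has learnt the underlying relationship.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable


def make_patient():
    """Invented patient: a risk score and a noisy relapse outcome."""
    risk_score = random.random()
    relapsed = risk_score > 0.5
    if random.random() < 0.2:  # assumed 20% of outcomes go against the trend
        relapsed = not relapsed
    return risk_score, relapsed


patients = [make_patient() for _ in range(1000)]
training, held_back = patients[:700], patients[700:]  # hold back 300 patients


def accuracy(model, rows):
    """Share of rows where the model's prediction matches the outcome."""
    return sum(model(score) == outcome for score, outcome in rows) / len(rows)


def memoriser(score):
    """Copy the outcome of the closest training patient (learns the data itself)."""
    return min(training, key=lambda patient: abs(patient[0] - score))[1]


# Simple model: pick the single cut-off that works best on the training data.
candidates = [i / 20 for i in range(1, 20)]
threshold = max(candidates, key=lambda t: accuracy(lambda s: s > t, training))


def simple_rule(score):
    return score > threshold


memoriser_train = accuracy(memoriser, training)
memoriser_held_back = accuracy(memoriser, held_back)
simple_train = accuracy(simple_rule, training)
simple_held_back = accuracy(simple_rule, held_back)

print(f"Memoriser:   training {memoriser_train:.0%}, held-back {memoriser_held_back:.0%}")
print(f"Simple rule: training {simple_train:.0%}, held-back {simple_held_back:.0%}")
```

The memoriser scores perfectly on the data it was trained on, which says little about new patients. Only the held-back figures show how each model copes with data it has not seen before.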

                                                                                                                     Does it make a useful real-world recommendation?

                                                                                                                     One way to determine this is to find out if the         deployment as it was during testing. The process
How well does the AI really perform?                                                                                 AI does any better than a human. It’s a good            would be led by experts independent of the AI
                                                                                                                     sign if healthcare professionals were involved in       developers and would show up failures and
                                                                                                                     the AI’s development or deployment. A clinician         unintended outcomes.
We need to know some basic performance                       But choosing the right way to measure
                                                                                                                     might look for trials that show whether the AI
measures that define how good the AI is at                   performance is important, and we should be                                                                      The process would also identify how the
                                                                                                                     performs better than, or at least as well as, their
predicting things or making recommendations.                 careful not to rely too heavily on theoretical                                                                  technology would work in practice when subject
                                                                                                                     trained colleagues.
One measure is accuracy (how often the AI gets               accuracy. With the hypothetical alcohol addiction                                                               to human errors in the way it’s used: for example,
its prediction right).                                       relapse AI, let’s say 10 in 100 recovering alcoholics   The German skin diagnostic AI was shown the             the performance of Google Health’s eye disease
in this dataset actually relapse after two years. If the AI is 85% accurate at predicting relapses, then it’s wrong 15 times out of 100. That means it could miss every relapse and is not much use if it’s being used to assess who needs help.

Even if the AI were highly reliable and underpinned by the finest data, a clinician should consider its recommendation in the context of all the other medical evidence they have for a particular diagnosis or treatment option. The doctor makes the final decision.

same images of skin lesions as an international group of 58 dermatologists. It correctly identified the nature of nearly 87% of suspicious lesions compared to 79% for the clinicians. This is one good sign that the AI provided a useful aid to the clinician’s decision on treatment.9

The AI might also be externally validated, which means tested in the real world. One example would be an AI-based healthcare software company testing its program in a hospital setting to see if it was as accurate in

diagnostic was ultimately hampered by the fact that nurses were not confident in taking high-quality pictures.

We need, finally, to ask what is at stake. A lifestyle app that gives people general advice about diet and exercise perhaps needs only to be roughly reliable. Where the real-world implications of the AI being wrong will be very serious, though, we should expect to see strong evidence of test data, trials and validation.

Google Health developed an AI system in Thailand to help identify diabetic retinopathy and speed up the diagnosis process. The process took up to ten weeks while photos of patients’ eyes were taken by nurses and dispatched to a specialist for analysis. The AI system could produce results in under ten minutes with 90% accuracy.8
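
The point made in this section, that an AI can be 85% accurate and still miss every relapse, can be checked with a small piece of arithmetic. The sketch below is illustrative only. The cohort size (100 patients), the number who relapse (15) and the always-negative "model" are all assumptions made for the example, not figures from any real dataset or system.

```python
def accuracy_and_sensitivity(actual, predicted):
    """Return (accuracy, sensitivity) for two equal-length lists of booleans.

    accuracy    = share of all patients the prediction got right
    sensitivity = share of true relapses the prediction caught
    """
    correct = sum(a == p for a, p in zip(actual, predicted))
    relapses = sum(actual)
    caught = sum(a and p for a, p in zip(actual, predicted))
    accuracy = correct / len(actual)
    sensitivity = caught / relapses if relapses else float("nan")
    return accuracy, sensitivity


# Hypothetical cohort: 100 patients, 15 of whom relapse (assumed figure).
actual = [True] * 15 + [False] * 85

# Hypothetical model that predicts "no relapse" for every patient.
predicted = [False] * 100

accuracy, sensitivity = accuracy_and_sensitivity(actual, predicted)
print(f"accuracy: {accuracy:.0%}, relapses caught: {sensitivity:.0%}")
```

Under these assumptions the headline accuracy is 85% even though no relapse is caught, which is why it is worth asking how often an AI finds the cases that matter, not only how often it is right overall.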

A reliable future

Using AI to support clinicians in treating patients holds great promise. From rapidly identifying new drug candidates in times of pandemic, to supporting the diagnosis of serious diseases, helping hospitals to manage resources and helping public health agencies to promote healthy lifestyles, AI has demonstrated its value and is here to stay.

But problems arise if the quality of data underpinning the AI is not properly scrutinised and if the AI’s reliability hasn’t been tested. From misdiagnosing a serious disease to exacerbating racial and economic health inequalities, AI gone wrong can have life-or-death implications. There’s confusion and fear out there: fear about robots taking people’s jobs, fear about data privacy, fear of who’s ultimately responsible if an AI-supported decision turns out to be wrong. Rather than throwing out tools that can help us, we’ll be better off if we discuss the right questions now about the standards AIs should meet.

By applying these questions, society can ensure AI developers’ solutions to modern healthcare challenges are making good use of the data and knowledge available, with minimal error, across different countries and populations, without deepening inequalities that are already high. These are the AIs that will make useful real-world recommendations that clinicians can have confidence in.

As more people ask the questions in this guide, more people in authority will expect to be asked. In this way, we create a virtuous circle of responsible discussion, and ultimately, higher standards in using AI to guide healthcare decisions.

Project Teams

LLOYD’S REGISTER FOUNDATION INSTITUTE FOR THE PUBLIC UNDERSTANDING OF RISK
Prof Chan Ghee Koh, Lloyd’s Register Foundation Professor, Director
Prof Leonard Lee, Deputy Director
Nathaniel Tan, Senior Manager, Partnerships & Engagement
Jared Ng, Assistant Manager, Communications
Celia Leo, Communications Associate

KOREA POLICY CENTER FOR THE FOURTH INDUSTRIAL REVOLUTION
Prof So Young Kim, Director
Dr Hyeon Dae (Heidi) Rha, Senior Researcher
Dr Cornelius Kalenzi, Postdoctoral Researcher
Dr Moonjung Yim, Postdoctoral Researcher

SENSE ABOUT SCIENCE
Tracey Brown OBE, Director
Dr Hamid Khan, Programme Manager - Research Culture and Quality
Ilaina Khairulzaman, Head of International Public Engagement, Training and Marketing
Joshua Gascoyne, Policy Officer

Contributors

Dr Ashraf Abdul, Affinidi
Prof Daniel Catalan, University Carlos III of Madrid
Prof Edward (Yoonjae) Choi, KAIST
Dr Sarah Cumbers, Lloyd’s Register Foundation
Dr Pin Sym Foong, National University Health System
Prof Yong Jeong, KAIST
Dr Ilyoung Jung, Science & Technology Policy Institute
Dr Kyu-Hwan Jung, VUNO
Prof Steve Keevil, Guy’s and St Thomas’ NHS Foundation Trust
Dr Kyunghoon Kim, Korea Information Society Development Institute
Prof Tackeun Kim, Seoul National University Bundang Hospital
Prof Tze Yun Leong, NUS School of Computing
Prof Brian Lim, NUS School of Computing
Tern Poh Lim, AI Singapore
Prof Tamra Lysaght, NUS Yong Loo Lin School of Medicine
Prof Kee Yuan Ngiam, National University Health System
Prof Joon Beom Seo, University of Ulsan College of Medicine/Asan Medical Center

1 BenevolentAI Named as One of Fierce Medtech’s Fierce 15 Of 2020 (2020). BenevolentAI. Available at: one-of-fierce-medtechs-fierce-15-of-2020
2 VUNO Med®-BoneAge™. Available at: en/boneage
3 Goyal, M., et al. (2020) Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities. Available at: S0010482520303966
4 Heaven, W.D. (2020) Google’s medical AI was super accurate in a lab. Real life was a different story. Available at: www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/
5 LumiHealth™ (2020). Available at:
6 Huang, Z., et al. (2021) Awareness, acceptance, and adoption of the national digital contact tracing tool post COVID-19 lockdown among visitors to a public hospital in Singapore. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC7817417/
7 Nordling, L. (2019) A fairer way forward for AI in health care. Available at: 019-02872-2
8 Heaven, W.D. (2020) Google’s medical AI was super accurate in a lab. Real life was a different story. Available at: www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/
9 Haenssle, H. A., et al. (2018) Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Available at:

