Using Artificial Intelligence to Support Healthcare Decisions: A Guide for Society
Why we need this guide

Artificial intelligence (AI) is software that can use large amounts of data to assess and make predictions – things that human 'computing power' can't do at all or can't do quickly and accurately. It is 'intelligent' because it works out patterns in the data and tests them, rather than just identifying what it is instructed to find – for example, finding patterns in genomic data that might predict who gets a disease, where humans don't yet know what to look for.

In healthcare, AI has made advances in analysing data about how diseases progress. It is also being used to identify molecules that could make new drugs, diagnose medical conditions more precisely, predict how patients will respond to treatment, and improve the planning of resources such as hospital beds.

COVID-19 has sped up the introduction of these new health technologies. For example, the BenevolentAI platform took one weekend to identify a drug that could be used to treat the new disease – conventional drug discovery methods would have taken eight years.1 But this rapid introduction of technology has come with the trade-off of less time for robust testing.

With AI development happening so rapidly, and healthcare providers using AI more and more, it's vital that more people know the important questions to ask about how reliable different applications are – the quality of the data they are based on, and whether we can depend on them to be right. Similarly, doctors and patients need to understand how reliable their AI-based information is when life-changing decisions are being made.

But what if policymakers, healthcare agencies, journalists, doctors, and patients don't know the questions to ask about whether a new breakthrough AI application is reliable or suitable for a particular use? What if they pass on flawed information or make bad decisions because they don't know where to find information about the model the AI is using? Who is accountable if things go wrong?

It is important for society to ask these questions to make sure AI is used responsibly. This kind of accountability makes a difference: patients asking questions about evidence and outcomes has improved many aspects of healthcare.

This guide is not intended to train AI experts or show how interesting AI is, but to help with the important conversations about its use in healthcare. The guide is designed to equip patients, policymakers, journalists, clinicians and decision-makers with the questions for discussing whether a technology is robust enough for its intended use. It aims to transform the conversation about AI from a complex and daunting one to an empowering one – one that can give us confidence in those technologies that do improve medical treatment and avoid harm from those that don't.

This guide was created through a partnership of: the Lloyd's Register Foundation Institute for Public Understanding of Risk, a research institute at the National University of Singapore committed to improving lives by transforming risk communication and the public understanding of risk in Asia and internationally; the Korea Policy Center for the Fourth Industrial Revolution, a research institute at KAIST working to understand and shape emerging technologies and the governance of the Fourth Industrial Revolution for a better and more inclusive digital era; and Sense about Science, an independent charity that promotes the public interest in sound science and evidence.

We are grateful for the input and personal time given to us by the many data scientists, doctors, researchers and members of the public who were involved in the development and testing of the guide.
Contents

Terms
Technical terms aren't needed to ask the right questions. But where they are used, it helps to know that terms like "AI", "algorithm", "reliability", "model" and "generalisability" have specific meanings.

How AI is used in treating patients
AI is helping medical professionals in some fields to work more quickly and accurately, but it can't replace the doctor. Good use of AI depends on its suitability for the decision and the expertise of the medical professionals interpreting it.

Reliability matters
There is a lot at stake. AI can base its recommendations on false or misleading relationships it finds in the data, leading to bad decisions. It can make biases in healthcare worse if the limits of the data are not clear. We can only know how reliable AI is if its testing and performance are clear. Understanding how to check on this is important for journalists who want to report on new developments responsibly. It helps health authorities to select the applications that genuinely improve patient treatment, and it helps the public to have confidence in the right things.

Questions to ask about AI in healthcare
What data is it based on? To reduce the chance of the AI identifying false or misleading relationships, it's important to know how the data underpinning it was generated.
What assumptions is the AI making about patients and disease? An AI-supported diagnosis or treatment option might not be useful if the results can't be generalised across countries or groups, or if key information is missing.
How much decision weight can we put on it? AI can only support a clinical decision if we know how well it performs.

A reliable future
To make sure we identify genuinely useful innovations, we must ask the right questions now about the reliability of the AI being used for different purposes. The questions in this guide will help society create a benchmark for responsible discussion that will promote clarity and high standards for the use of AI in healthcare.
Terms

Algorithm
A set of mathematical instructions to find or calculate something. Algorithms can be used by AI to find relationships between things (variables) in data.

Artificial intelligence (AI)
A machine or system that uses data and rules to make assessments or predictions like a human would.

Big data
A type of data that is large (volume), varies in content and type (variety), and changes quickly (velocity). In the healthcare context, such data includes many variables (e.g. age, gender, height, weight, average weekly alcohol consumption, smoking habits, chronic conditions, medical treatments, test results and x-rays) and can be in different formats (e.g. sounds, videos, written records, images, charts and graphs).

Generalisability
A measure of whether the conclusion made using a set of data is generally true or not. For example, an AI that is not generalisable can help with a diagnosis of bone conditions for only certain demographic groups but not others.

Model
An equation that an AI uses to represent how conclusions can be made from data the AI hasn't seen before. For example, new information about changes in smoking habits can be used in a model to predict the number of cases of lung cancer.

Reliability
How trustworthy an AI is, or how consistently an AI produces the result we want (e.g. being better at identifying the patients whose disease will improve with surgery) without producing results we don't want. It can also mean, technically, the ability of an AI to produce the same result every time.

Variable
A factor or characteristic that might be relevant to answering a question. Variables could be numbers like age, weight, height, temperature or income. Or they might fall into categories like eye or hair colour, ethnicity, field of work or hobbies.
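For readers who want to see the terms "variable" and "model" made concrete, here is a minimal Python sketch using the glossary's own smoking-and-lung-cancer example. All numbers are invented for illustration; the fitted line stands in for a model and is not a real epidemiological result.

```python
# A minimal sketch of the glossary terms, using invented numbers.
# "Variable": smoking rate; "model": an equation linking it to lung
# cancer cases; fitting the model is, in miniature, what training is.
import numpy as np
from sklearn.linear_model import LinearRegression

# Historical (invented) data: smoking rate (%) in a region, and lung
# cancer cases per 100,000 people.
smoking_rate = np.array([[10.0], [15.0], [20.0], [25.0], [30.0]])
cancer_cases = np.array([22.0, 31.0, 40.0, 52.0, 61.0])

model = LinearRegression().fit(smoking_rate, cancer_cases)

# The model is just an equation: cases ~ slope * rate + intercept.
print(f"cases ~ {model.coef_[0]:.2f} * rate + {model.intercept_:.2f}")

# Applying the model to data it hasn't seen before: new information
# about a change in smoking habits predicts future case numbers.
print(f"Predicted cases at 18% smoking rate: {model.predict([[18.0]])[0]:.1f}")
```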
How AI is used in treating patients

AI is intended to help medical staff work quickly and accurately and to make processes efficient.

Current AI-based software is limited to performing specific tasks to support a doctor's decision-making. It cannot perform complex tasks such as making clinical decisions, and doctors can consider things that the AI cannot, such as a patient's cultural practices, when making a treatment plan. At the current pace of technological development, this is likely to be the case for the near future: AI can support but not replace the doctor.

In South Korea, VUNO Med solutions are AI-based diagnostic support systems that can read medical images or analyse biosignals. VUNO's BoneAge assessment software compares bone age with chronological age – for example, an eight-year-old child whose bone age is nine years old is assessed to be growing too fast.2

In Germany, a diagnostic AI has been used to detect potentially cancerous skin lesions. It was tested against an international group of 58 dermatologists and proved better at correctly identifying the nature of more suspicious lesions.3

On the other hand, an eye disease diagnostic developed by Google Health4 suffered from a major drawback: the quality of many images taken by nurses was not high enough, so the system rejected more than a fifth of the images, and more work had to be done to retake them. The theoretical accuracy of the diagnostic prediction can only be realised if medical professionals have the confidence and training to use it.

Types of AI in healthcare

Clinical-decision support tools: medical devices and applications used by clinical practitioners to perform their work. AI is used in diagnostic imaging, predicting treatment outcomes, robotics in surgery and remote monitoring of patients who are using medical devices.

Patient-decision support tools: medical devices and applications used directly by patients or caregivers. Examples include chatbots or other online tools which help with self-diagnosis, and lifestyle applications such as fitness trackers.

Healthcare administration: tools used by organisations to improve operations and administration. AI is used in resource allocation, cost reduction (e.g. by reducing test duplications) and automating processes like dispensing medicines.

Therapeutics development: AI used in discovering new drugs and treatments.
Reliability matters

The use of AI to help with diagnosis, predict the outcome of treatment or prioritise resources is potentially life-changing.

There is some suspicion about AI among the public and healthcare practitioners. Its inner workings are difficult to see, which makes it difficult to question or contest its conclusions, and there are fears about how it uses personal data.

Privacy issues are often raised, but reliability issues have been neglected, perhaps because it is difficult to know how to question them. While it is important for people to have confidence that their data is secure, it is just as important to know whether data is being used well. It's unlikely that any of us would accept a technology based on a study with a 10-person sample size on the basis that the 10 participants' data was kept safe. Guarantees about privacy are not enough for a technology to be useful, so key questions about the quality of data and the reliability of AI need to be asked.

Poor-quality data (or poorly understood data) affects the accuracy of AI. Biases in AI arise from missing or excluded data, existing bias in the training data or errors in the algorithm. As with other data analytics, using data for a purpose it wasn't collected for can introduce false or misleading relationships. And we can't be sure how reliably the AI performs if the model hasn't been rigorously tested in the real world.

So, scrutinising quality and reliability means checking that:
- the source of the data is known
- the data has been collected or selected for the purpose it's being used for
- limitations and assumptions for that purpose have been clearly stated
- biases have been addressed
- it has been properly tested in the real world

How do we know that someone has done these checks? There are questions that everyone can ask – whether a journalist, policymaker, clinician, patient or relative – to find out. These questions are set out in the next few sections.
QUESTIONS TO ASK ABOUT AI IN HEALTHCARE

What data is it based on?

Data is obtained in different ways. Experimental data is collected from experiments designed to answer a specific question. Researchers usually consider the possible biases they will get in the data and what might be missing, and take steps to overcome these issues.

Observational data is recorded as we go about our business, such as withdrawing money from a bank or travelling on public transport. There is also administrative data recorded by institutions, such as speeding fines or the issuing of prescriptions at hospitals. The biases and limitations in such data sources are usually not thought about until the records come to be used as data for analysis.

All these data sources can be useful for developing AI, but it's important to consider how good and relevant they are for a particular purpose, especially if they've not been gathered for that purpose.

So, the aspects of this question to consider are:
- how the data used to train the AI was collected
- whether the data represents the patients for whom the AI is being used
- whether the patterns and relationships identified by the AI are accurate

Not everyone will be able to ask about or assess the details of these aspects, but any doctor, patient or reporter can insist on a clear statement of how they have been addressed. Anyone commissioning AI for use in health services should be confident that they know the answers.

How was the data used to train the AI collected?

If the data comes from an experiment, it should have been collected to answer a specific research question as part of a well-designed study. Signs of quality include:
- a large sample size of participants
- a control group of participants with similar characteristics to compare results against (except for the variable being measured)
- error estimates
- a discussion of how well the research findings can be extrapolated to real life

AI systems trained using this type of data have a lower risk of containing false or misleading relationships if those quality markers are there.

Observational data analysis involves looking at data that already exists and searching for relationships between variables. There are advantages to this approach, such as being able to study many more variables than an experiment would allow. While it is possible to correctly identify relationships with this type of data, the data source should be clearly stated, and information provided about the AI should include how biases have been considered.

We should also note whether the data gathered consists of objective measurements (e.g. vital signs from a device) or subjective self-reported data (e.g. survey responses). Subjective data could contain more inaccuracies or biases, as people's responses vary for different reasons and respondents are self-selecting.

Consider, for instance, the question: 'What factors cause patients who have recovered from alcohol addiction to relapse?' Programmers might put together databases containing information (variables) such as age, chronic medical conditions and genetic profiles, and an AI would look in these detailed datasets for relationships with relapse data. If the data came only from medical sources, the AI could miss potential major factors such as unemployment, and miss people who do not engage with medical services.

Singapore's Health Promotion Board is collaborating with Apple on an app called LumiHealth. Developed in close collaboration with doctors and public health experts, LumiHealth aims to deliver personal health recommendations based on factors such as age, gender and weight. These recommendations are driven by AI using real-world data from users (obtained with consent) and include reminders to go for regular health checkups. By following the app's recommendations, a user can work towards weekly activity goals and participate in challenges that aim to improve sleep habits and food choices.5

How personal and relevant a health app's recommendations are depends on how the data behind it is gathered. LumiHealth uses user data carefully selected for relevance, but some apps don't do that. If an app uses observational data from other users of the app to recommend when a person should visit a doctor, the recommendation is likely to be skewed by the fact that healthier people tend to use such apps.
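The skew just described can be simulated in a few lines. This is an illustrative Python sketch, not a model of any real app: the risk scores, the app-uptake curve and the thresholds are all invented, purely to show how a "when to see a doctor" rule learned from app users alone misrepresents the wider population.

```python
# Illustrative simulation of selection bias: healthier people are more
# likely to use a health app, so data collected from app users does not
# represent the general population. All numbers are invented.
import numpy as np

rng = np.random.default_rng(0)

# A health-risk score for the whole population (higher = worse health).
population_risk = rng.normal(loc=50, scale=15, size=100_000)

# Healthier people are more likely to install the app: the probability
# of being an app user falls as the risk score rises.
p_uses_app = 1 / (1 + np.exp((population_risk - 40) / 10))
is_app_user = rng.random(100_000) < p_uses_app
app_user_risk = population_risk[is_app_user]

print(f"Mean risk, general population: {population_risk.mean():.1f}")
print(f"Mean risk, app users only:     {app_user_risk.mean():.1f}")

# An app that flags the top 10% of *its users* as "see a doctor" sets
# its threshold from skewed data, so it flags a very different share
# of the wider, less healthy population than intended.
threshold = np.percentile(app_user_risk, 90)
flagged = (population_risk > threshold).mean()
print(f"Share of general population above the app's threshold: {flagged:.0%}")
```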
Does the data represent the patients for whom the AI is being used?

Data might not be useful for training an AI if it doesn't represent the target population. It may be missing information about different ethnicities, sexes and age groups, and in some cases this missing information has important implications for health. For example, heart problems show up differently in men and women, or the data may be based on people who can afford to seek treatment and therefore be biased towards the health of wealthier people.

In Germany, a skin diagnostic AI was trained and validated using images obtained primarily from fair-skinned people in the USA, Australia and Europe. If the algorithm bases most of its knowledge on how skin lesions appear on fair skin, then there's a risk that lesions on patients with darker skin are more likely to be misdiagnosed. The absence of data from people with darker skin won't make the diagnostic useless if it can reliably support diagnoses in some people. But clearly the absence of important data about certain ethnic groups should be known about in countries that have multiracial populations, which is the case in many East and Southeast Asian countries.

Overcoming problems with the representativeness of data is a challenge. Some groups are under-represented in health studies, so they are under-represented in the data.

Privacy regularly comes up in the public conversation about AI and the use of data. People are concerned that their personal medical data could be used to discriminate against them. For this reason, certain categories of information, such as rare or genetic conditions, require strong anonymisation procedures. Public concern about privacy influences whether people will share data, and this can affect the accuracy of the AI's recommendations by giving it too small a pool of information to draw reliable conclusions from. By being transparent and demonstrating the steps taken to check that the AI is reliable, researchers and developers can help give people confidence about providing their data.

During the COVID-19 pandemic, Singapore rolled out the TraceTogether mobile app for contact tracing. The idea behind it was the exchange of Bluetooth signals between mobile phones with the installed app: each phone could detect other participating TraceTogether phones nearby, and the app estimated the distance between users and the duration of any time spent less than two metres apart. Encrypted records of these contacts were stored on each user's phone for 21 days. An app user identified as having come into contact with a person who had tested positive for COVID-19 could authorise their TraceTogether data to be accessed by the Ministry of Health (MOH). MOH would then decipher the data and get the mobile numbers of the user's close contacts from the previous 21 days to contact-trace them, ask them to isolate, and test them.

TraceTogether was unable to gain public trust. In June 2020, three months after the app's launch, approximately 30% of the population had downloaded it, falling short of the 50-70% adoption rate required for contact-tracing apps to be effective. Many Singaporeans saw the app as a phone surveillance mechanism. By December 2020, nine months after the app's launch, its adoption rate had barely grown. However, through the government's distribution of an alternative – an external device with the same function – Singapore achieved 70% adoption.6

The experience of Singapore's contact-tracing app shows that real-world limits on data can be hugely underestimated by app developers. People are sometimes just not willing to provide the data that will make even a well-designed application work. Some of these concerns may be alleviated by greater transparency in the technology itself, but in other cases they won't be. We have to ask instead whether the application is going to be fed with enough relevant data to continue running reliably.

Finding out whether the data is appropriate for its intended use helps to reduce the risk of AI systems spotting false and misleading relationships.
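One simple check anyone procuring an AI can ask for is a comparison between the demographic make-up of the training data and the population the tool will serve. Here is a minimal sketch of such a check; the group labels, counts and the 50% flagging rule are invented for illustration and would need tailoring to a real procurement.

```python
# A minimal representativeness check: compare the demographic make-up
# of an AI's training data with the population it will be used on.
# Group names, counts, target shares and the flagging rule are invented.
training_counts = {"fair skin": 9_200, "medium skin": 600, "dark skin": 200}
target_population = {"fair skin": 0.35, "medium skin": 0.40, "dark skin": 0.25}

total = sum(training_counts.values())
print(f"{'group':<12} {'training':>9} {'target':>7}")
for group, count in training_counts.items():
    train_share = count / total
    target_share = target_population[group]
    # Flag any group whose share of the training data is less than half
    # of its share of the target population (an arbitrary cut-off).
    flag = "  <-- under-represented" if train_share < 0.5 * target_share else ""
    print(f"{group:<12} {train_share:>8.1%} {target_share:>7.0%}{flag}")
```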
Are the patterns and relationships identified by the AI accurate?

Data is fed into an algorithm, which can analyse the data to find patterns between variables. An AI can learn about these relationships between variables as more data is fed in, apply them, and adjust them.

If an algorithm analysing alcohol addiction relapse data finds that the relapse rate is higher among low-income people, then it might flag a fall in income as a risk factor for relapse. Information about relationships between variables can then be used to create a predictive model – a mathematical equation that uses information about what happened in the past to make a prediction about what could happen in the future. The steps below show how such a model is built and used, and a sketch of the same steps in code follows the list.

Population-level data contains information about lots of variables – for example, people's age, gender, ethnicity, marital status, jobs, postcodes, what car they drive, whether they are registered to vote. This type of data is often called "big data". If an AI searches through enough big data, it will inevitably find patterns and relationships between variables that have nothing to do with each other. This is known as data dredging. It shows the need to have a specific question to answer when the AI is programmed to look for patterns in a dataset, because it is then less likely to come up with random relationships that affect the validity of the model.

To make sure the relationships are real, anyone commissioning an AI for healthcare should ask whether it has been trained using big data and how the data scientists have identified the variables most relevant to what the AI is going to be used for. Moreover, any AI – even one trained using big data – can be rigorously tested using an independent dataset; as explained in a later section, AI providers should make clear whether this has been done.

A predictive model, step by step:
1. Existing data on recovering alcoholics who have relapsed is fed into an algorithm to train it to spot patterns and relationships.
2. The algorithm identifies which variables are closely associated with relapse (risk factors).
3. A model is created that links incidents of past relapses to the variables associated with them.
4. Data on new patients is fed into the model as part of a clinical assessment.
5. The model uses the new data to predict who is at risk of relapse, using the relationships it knows about.
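To make the five steps concrete, here is an illustrative Python sketch. The data is entirely synthetic, the variable names are hypothetical, and a simple logistic regression stands in for "the AI" – real systems are far more complex, but the shape of the process is the same.

```python
# Illustrative sketch of the five predictive-model steps, using
# synthetic data and a simple logistic regression in place of the AI.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 2_000

# Step 1: existing data on recovering patients (variables + outcome).
age = rng.integers(20, 70, n)
income_fell = rng.binomial(1, 0.3, n)        # 1 = income dropped
chronic_illness = rng.binomial(1, 0.2, n)
# Synthetic ground truth: income drops and chronic illness raise risk;
# age is deliberately irrelevant here.
p_relapse = 1 / (1 + np.exp(-(-2.0 + 1.2 * income_fell + 0.8 * chronic_illness)))
relapsed = rng.binomial(1, p_relapse)

X = np.column_stack([age, income_fell, chronic_illness])

# Steps 2-3: the algorithm identifies which variables are associated
# with relapse and encodes them in a model (here, fitted coefficients;
# the weight on the irrelevant 'age' variable should land near zero).
model = LogisticRegression().fit(X, relapsed)
for name, coef in zip(["age", "income_fell", "chronic_illness"], model.coef_[0]):
    print(f"{name:<16} weight {coef:+.2f}")

# Steps 4-5: data on a new patient is fed in, and the model predicts
# their relapse risk from the relationships it has learnt.
new_patient = np.array([[45, 1, 0]])         # 45 years old, income fell
print(f"Predicted relapse risk: {model.predict_proba(new_patient)[0, 1]:.0%}")
```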
QUESTIONS TO ASK ABOUT AI IN HEALTHCARE

What assumptions is the AI making about patients and disease?

We've seen that the ability to quickly spot patterns in data is a key benefit of using AI in healthcare, and that it also presents challenges. It's possible that an AI might start to spot patterns that are not relevant. How a model translates to the real world has implications for the reliability, generalisability and fairness of the AI.

The essential aspects are:
- that the right relationship is captured
- whether variables excluded from the model are indeed irrelevant
- whether the results are generalisable
- whether the AI eliminates human prejudice from decision-making

Is the right relationship captured?

Sometimes, observational data shows up variables that seem to be related to each other (when one goes up, the other goes up or down). Those variables are "correlated", but that doesn't mean one "causes" the other.

In 2017, the University of Chicago Academic Hospital System (UCAHS) developed an AI to predict patients' length of stay.7 It was intended to help doctors prioritise patients who were more likely to be eligible for rapid discharge, and so free up beds faster. The AI's algorithm found patients' postcodes to be one of the best predictors of length of stay: the postcodes associated with a longer stay were those in poor neighbourhoods. In effect, the AI recommended prioritising patients from richer districts. There was a clear correlation between postcode and length of hospital stay, but it is unlikely that a person's address itself causes them to stay longer in hospital.
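A correlation like the postcode one can be reproduced in a few lines of simulation. This sketch is not the UCAHS data – all numbers are invented – but it shows how a confounder (here, poverty) can drive both where people live and how long they stay in hospital, so that postcode correlates with length of stay without causing it.

```python
# Illustrative simulation of a confounder: poverty influences both a
# patient's neighbourhood and their length of hospital stay, so
# postcode correlates with stay length without causing it. Invented data.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

poor = rng.binomial(1, 0.3, n)                 # 1 = lives in poverty
# Poorer patients mostly live in postcode group 1, others in group 0.
postcode = np.where(rng.random(n) < 0.9, poor, 1 - poor)
# Length of stay depends on poverty (poorer health), not on postcode.
stay_days = 3 + 4 * poor + rng.normal(0, 1, n)

print(f"Correlation(postcode, stay): {np.corrcoef(postcode, stay_days)[0, 1]:.2f}")

# Within each poverty level, the apparent postcode 'effect' vanishes:
for level, label in [(0, "not poor"), (1, "poor")]:
    mask = poor == level
    diff = (stay_days[mask & (postcode == 1)].mean()
            - stay_days[mask & (postcode == 0)].mean())
    print(f"{label}: stay difference between postcodes = {diff:+.2f} days")
```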
So, data affect the outcome. It’s important for developers of the data are understood by the developers. encode prejudices. scientists make assumptions and intentionally to understand this themselves and have an open Some AI research seeks to address these existing exclude some variables. and honest discussion about it with the people biases through its programming. A commentator or patient can ask what they are handing over the technology to. assumptions are being made and how we are In the Chicago hospital example, the developers If blindly optimising the use of beds was the sure these are fair, even if the AI technically does needed to consider the missing variables. sole objective, then using patients’ postcodes as its job. Perhaps there was a third factor at play, one a proxy to predict who should be prioritised for which caused people to stay longer in a hospital treatment wouldn’t be so bad. But the ultimate This doesn’t mean that any group should be when they are ill. In this case, poverty: poor problem that UCAHS ran into was that poorer more concerned than another about how AI is people live in neighbourhoods where housing people in the USA are disproportionately African used to support their treatment. When the right is more affordable AND they also tend to have American. By diverting treatment away from conversations are had at the right time, everyone poorer health outcomes and a higher risk of patients who lived in poorer neighbourhoods, involved can be confident in the clinical decision suffering from chronic illnesses. The detrimental the AI was prioritising white people over black that’s made. people. It only exacerbated existing racial health inequalities in American society. 18 19
QUESTIONS TO ASK ABOUT AI IN HEALTHCARE

How much decision weight can we put on it?

We've seen that an AI's performance depends on the quality of the data it is based on and the assumptions it makes about patients and disease. Taking all this into account makes it more likely that the AI is of good quality – but is it good enough for its intended purpose?

The essential aspects of this question are:
- how well the AI really performs
- whether its reliability has been properly scrutinised
- whether it makes a useful real-world recommendation

How well does the AI really perform?

We need to know some basic performance measures that define how good the AI is at predicting things or making recommendations. One measure is accuracy: how often the AI gets its prediction right.

Google Health developed an AI system in Thailand to help identify diabetic retinopathy and speed up the diagnosis process. The existing process took up to ten weeks, while photos of patients' eyes were taken by nurses and dispatched to a specialist for analysis. The AI system could produce results in under ten minutes with 90% accuracy.8

But choosing the right way to measure performance is important, and we should be careful not to rely too heavily on theoretical accuracy. With the hypothetical alcohol addiction relapse AI, let's say 10 in 100 recovering alcoholics in the dataset actually relapse after two years. If the AI is 85% accurate at predicting relapses, then it's wrong 15 times out of 100. That means it could miss every single relapse and still be "85% accurate" – not much use if it's being used to assess who needs help.

Even if the AI were highly reliable and underpinned by the finest data, a clinician should consider its recommendation in the context of all the other medical evidence they have for a particular diagnosis or treatment option. The doctor makes the final decision.
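The relapse arithmetic above is worth working through. In this sketch the data is synthetic, constructed to mirror the text's hypothetical 10-in-100 example: a useless model that always predicts "no relapse" still scores 90% accuracy, which is why sensitivity (recall) matters alongside accuracy.

```python
# Worked example of the base-rate pitfall: with 10 relapses in 100
# patients, a useless model that always predicts "no relapse" is
# still 90% accurate. Synthetic data mirroring the text's example.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

truth = np.array([1] * 10 + [0] * 90)  # 10 of 100 patients relapse
always_no = np.zeros(100, dtype=int)   # model that never predicts relapse

print(f"Accuracy: {accuracy_score(truth, always_no):.0%}")  # 90%
print(f"Recall:   {recall_score(truth, always_no):.0%}")    # 0%: misses every relapse
```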
Has its reliability been properly scrutinised?

As well as accuracy, we should consider the AI's reliability in making predictions – how good the AI is at using the relationships it has identified to make a prediction about data it hasn't seen before. Independent datasets can be used to test this.

This is ideally done by holding back a section of the training data and then seeing how well the AI can identify the thing it's looking for, or predict the outcome, in the held-back data. Sometimes an AI that works well on the data used to train it is terrible at making predictions from new data. That could be because the model has not weeded out irrelevant variables, or because the model has learnt the training data rather than its underlying relationships. An AI that doesn't make consistent predictions on similar data is unreliable.

Where apps are based on collaboration between public health bodies and the private sector, there is more opportunity to scrutinise reliability. For example, technologies developed in a public-private collaboration are more likely to have undergone clinical trials – set up to see how well the AI performs against existing practices or human judgement.

Singapore's LumiHealth app was developed through close collaboration between Apple and the public health authorities. To be authorised for use in public health, the app needed to meet strict criteria. Close collaboration with public health experts reduced the risk of the data not being representative, because the app was not relying on volunteer-contributed datasets.

Does it make a useful real-world recommendation?

One way to determine this is to find out whether the AI does any better than a human. It's a good sign if healthcare professionals were involved in the AI's development or deployment. A clinician might look for trials that show whether the AI performs better than, or at least as well as, their trained colleagues.

The German skin diagnostic AI was shown the same images of skin lesions as an international group of 58 dermatologists. It correctly identified the nature of nearly 87% of suspicious lesions, compared to 79% for the clinicians. This is one good sign that the AI provided a useful aid to the clinician's decision on treatment.9

The AI might also be externally validated, which means tested in the real world. One example would be an AI-based healthcare software company testing its program in a hospital setting to see whether it was as accurate in deployment as it was during testing. The process would be led by experts independent of the AI developers and would show up failures and unintended outcomes. It would also identify how the technology works in practice when subject to human error in the way it's used: for example, the performance of Google Health's eye disease diagnostic was ultimately hampered by the fact that nurses were not confident in taking high-quality pictures.

We need, finally, to ask what is at stake. A lifestyle app that gives people general advice about diet and exercise perhaps needs only to be roughly reliable. Where the real-world implications of the AI being wrong would be very serious, though, we should expect to see strong evidence of test data, trials and validation.
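Returning to the holdout testing described above: it can be sketched in a few lines. In this invented example, an over-flexible model is allowed to memorise noise, so it looks excellent on its training data and much worse on held-out data it has never seen – exactly the failure that holdout testing is designed to expose.

```python
# Illustrative holdout test: an over-flexible model memorises its
# training data (near-perfect training accuracy) but does much worse
# on held-out data. Entirely synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 20))           # 20 mostly irrelevant variables
y = (X[:, 0] + rng.normal(0, 2, 400) > 0).astype(int)  # weak real signal

# Hold back a quarter of the data that the model never sees in training.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)  # no depth limit: free to memorise
print(f"Accuracy on training data: {model.score(X_train, y_train):.0%}")  # near 100%
print(f"Accuracy on held-out data: {model.score(X_test, y_test):.0%}")    # much lower
```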
A reliable future

Using AI to support clinicians in treating patients holds great promise. From rapidly identifying new drug candidates in times of pandemic, to supporting the diagnosis of serious diseases, helping hospitals to manage resources and helping public health agencies to promote healthy lifestyles, AI has demonstrated its value and is here to stay.

But problems arise if the quality of the data underpinning the AI is not properly scrutinised and if the AI's reliability hasn't been tested. From misdiagnosing a serious disease to exacerbating racial and economic health inequalities, AI gone wrong can have life-or-death implications.

There's confusion and fear out there – fear about robots taking people's jobs, fear about data privacy, fear of who's ultimately responsible if an AI-supported decision turns out to be wrong. Rather than throwing out tools that can help us, we'll be better off if we discuss the right questions now about the standards AIs should meet.

By applying these questions, society can ensure AI developers' solutions to modern healthcare challenges are making good use of the data and knowledge available, with minimal error, across different countries and populations, without deepening inequalities that are already high. These are the AIs that will make useful real-world recommendations that clinicians can have confidence in.

As more people ask the questions in this guide, more people in authority will expect to be asked. In this way, we create a virtuous circle of responsible discussion and, ultimately, higher standards in using AI to guide healthcare decisions.

Project Teams

LLOYD'S REGISTER FOUNDATION INSTITUTE FOR THE PUBLIC UNDERSTANDING OF RISK
Prof Chan Ghee Koh, Lloyd's Register Foundation Professor, Director
Prof Leonard Lee, Deputy Director
Nathaniel Tan, Senior Manager, Partnerships & Engagement
Jared Ng, Assistant Manager, Communications
Celia Leo, Communications Associate

KOREA POLICY CENTER FOR THE FOURTH INDUSTRIAL REVOLUTION
Prof So Young Kim, Director
Dr Hyeon Dae (Heidi) Rha, Senior Researcher
Dr Cornelius Kalenzi, Postdoctoral Researcher
Dr Moonjung Yim, Postdoctoral Researcher

SENSE ABOUT SCIENCE
Tracey Brown OBE, Director
Dr Hamid Khan, Programme Manager - Research Culture and Quality
Ilaina Khairulzaman, Head of International Public Engagement, Training and Marketing
Joshua Gascoyne, Policy Officer

Contributors
Dr Ashraf Abdul, Affinidi
Prof Daniel Catalan, University Carlos III of Madrid
Prof Edward (Yoonjae) Choi, KAIST
Dr Sarah Cumbers, Lloyd's Register Foundation
Dr Pin Sym Foong, National University Health System
Prof Yong Jeong, KAIST
Dr Ilyoung Jung, Science & Technology Policy Institute
Dr Kyu-Hwan Jung, VUNO
Prof Steve Keevil, Guy's and St Thomas' NHS Foundation Trust
Dr Kyunghoon Kim, Korea Information Society Development Institute
Prof Tackeun Kim, Seoul National University Bundang Hospital
Prof Tze Yun Leong, NUS School of Computing
Prof Brian Lim, NUS School of Computing
Tern Poh Lim, AI Singapore
Prof Tamra Lysaght, NUS Yong Loo Lin School of Medicine
Prof Kee Yuan Ngiam, National University Health System
Prof Joon Beom Seo, University of Ulsan College of Medicine/Asan Medical Center

References
1 BenevolentAI named as one of Fierce Medtech's Fierce 15 of 2020 (2020). BenevolentAI. Available at: www.benevolent.com/news/benevolentai-named-as-one-of-fierce-medtechs-fierce-15-of-2020
2 VUNO Med®-BoneAge™. Available at: www.vuno.co/en/boneage
3 Goyal, M., et al. (2020) Artificial intelligence-based image classification methods for diagnosis of skin cancer: Challenges and opportunities. Available at: www.sciencedirect.com/science/article/pii/S0010482520303966
4 Heaven, W.D. (2020) Google's medical AI was super accurate in a lab. Real life was a different story. Available at: www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/
5 LumiHealth™ (2020). Available at: www.lumihealth.sg
6 Huang, Z., et al. (2021) Awareness, acceptance, and adoption of the national digital contact tracing tool post COVID-19 lockdown among visitors to a public hospital in Singapore. Available at: www.ncbi.nlm.nih.gov/pmc/articles/PMC7817417/
7 Nordling, L. (2019) A fairer way forward for AI in health care. Available at: www.nature.com/articles/d41586-019-02872-2
8 Heaven, W.D. (2020) Google's medical AI was super accurate in a lab. Real life was a different story. Available at: www.technologyreview.com/2020/04/27/1000658/google-medical-ai-accurate-lab-real-life-clinic-covid-diabetes-retina-disease/
9 Haenssle, H. A., et al. (2018) Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists. Available at: www.sciencedirect.com/science/article/pii/S0923753419341055
ipur.nus.edu.sg | kpc4ir.kaist.ac.kr | senseaboutscience.org

Designed by Studio Giraffe