Artificial intelligence: Characteristics, regulatory compliance, and legislation
Koen Cobbaert, MSc

This article provides an introduction to artificial intelligence (AI), its characteristics, and how those characteristics affect regulatory compliance. It also examines efforts to regulate the ethical aspects of AI in the EU, future legislative initiatives that may affect AI in medical devices, and the crucial role standards play in supporting legislation.*

Introduction
Although different people may understand artificial intelligence (AI) differently, it has been a reality in health care for decades. Health care has adopted AI technology in medical devices, workflows, and decision-making processes. Rather than replacing the human component of health care delivery, AI has become a vital tool or a companion to improve patient outcomes.

Artificial intelligence refers to a wide variety of techniques.1 While neural networks are in the spotlight today, this article covers all forms of AI, including classical AI (e.g., search and optimization techniques, expert systems,2 hidden Markov models,3 and older forms of computer vision); symbolic AI4 (e.g., logical reasoning5 and decision making); abstract syntax tree6-modifying code; probabilistic reasoning;7 machine learning;8 knowledge representation;9 planning and navigation; natural language processing;10 and perception.11 Hybrid approaches also exist, which use a combination of techniques, such as neural networks and symbolic AI. The connectionist AI takes care of the messiness and correlations of the real world, for example, to detect patient position and anatomy, and helps convert those into symbols that a symbolic AI can use to interact with the patient during physiotherapy. The influence of the latter will likely increase in the future.12 Figure 1 presents a simplified diagram of the different types of AI.13

Artificial general intelligence (AGI) can learn incrementally, reason abstractly, and act effectively over a wide range of domains. The author does not anticipate AGI appearing on the market in the near or medium term, so this article will focus on "narrow AI," that is, artificial intelligence with a specific, rather than general, purpose.

Current medical device legislation applies to machine learning devices. It appears fit to assure safety and reliable performance, including for AI that continues changing after it is placed on the market.14 What is lacking is guidance on how to comply with regulations practically. Despite a flurry of AI standardization activity, practical guidance for medical devices is scarce.

Definition
Having one common, internationally accepted definition for AI would be helpful when comparing AI investments and regulations across the world. There are several definitions of AI,15 each with different flavors. At the time of writing, the International Medical Device Regulators Forum (IMDRF) was drafting a definition for machine learning medical devices.16
Figure 1. A simplified diagram of different AI techniques, spanning reasoning-based (symbolic/classic) AI, learning-based AI (machine learning) with deep learning as a subcategory, basic algorithms, advanced statistics, and robotics (hardware). There are roughly two different techniques. Reasoning-based AI reasons based on symbols or rules programmed into the system; instinctively, the reader can sense that the human has a bit more control over the rules. The second technique is learning-based AI: systems that use large amounts of data to construct a mathematical model that recognizes correlations. This construction can be done in a supervised or unsupervised manner. Supervised means that humans determine the categories of data themselves, e.g., these are cats, these are dogs, now learn to recognize them. Unsupervised learning is not given any labels; deep learning, a subcategory of machine learning, belongs to this group. The developer feeds data into the system, and by itself it recognizes patterns and categories in the data. The big advantage is that the system may recognize things a human might not have thought about; on the other hand, it might also generate categories that are problematic from a moral point of view. When people talk about black box systems, they predominantly talk about deep learning. So, it is incorrect that all AI is not transparent; it is mainly in this category that it is difficult to determine how a decision was made. This is not the case for all AI systems. Sometimes, basic algorithms and advanced statistics are also called AI. Robotics, also known as 'embodied AI,' is shown half in, half out of the box: robots can work using AI but can also operate using simpler algorithms. The drawing overlaps all these types of AI because there are also hybrid forms of AI.

In trying to capture the essence of AI, definitions tend to focus on AI's learning ability or its intelligent nature, two aspects that pose significant interpretation issues in a legal context.17 Regarding AI's intelligent nature, there is no scientific consensus on the meaning of "intelligence."18 As society tends to strip older AI techniques of their "intelligent" status, the term poses a large moving edge. In terms of learning ability, learning is generally understood as "acquiring knowledge or new skills." If acquiring knowledge is sufficient, many digital products qualify as AI as soon as they acquire input data. If learning a new skill is necessary, the meaning of the term narrows significantly.

Today, it is uncertain what is, and what is not, covered by the term AI. Therefore, this article focuses on characteristics commonly associated with AI and their regulatory implications.

Machine learning
A characteristic of machine learning devices is that they can change based on training data (samples used to fit a machine learning model), without being programmed explicitly.
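To make that distinction concrete, the following minimal sketch fits a model from labeled samples instead of encoding rules by hand. The toy data and the choice of scikit-learn's LogisticRegression are illustrative assumptions, not taken from the article.

```python
# A minimal sketch of "fitted from samples, not explicitly programmed."
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training samples: two measurements per patient, with a
# label of 1 (condition present) or 0 (condition absent).
X_train = np.array([[1.2, 3.4], [0.8, 2.9], [3.1, 7.8], [2.9, 8.1]])
y_train = np.array([0, 0, 1, 1])

model = LogisticRegression()
model.fit(X_train, y_train)            # behavior is induced from the samples,
print(model.predict([[3.0, 8.0]]))     # not written down as explicit rules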
In contrast, other AI technologies learn without training data, such as through genetic programming19 or reasoning. For example, semantic computing learns through semantic networks (a knowledge base that represents semantic relations between concepts in a network). Through reason, deduction, and inference, the AI may evolve or adapt during use. There are different perspectives on the aspect of change.

Global versus local change
During global change, the manufacturer or health institution trains a machine learning model that is part of a device, that is, "the global model." After the validation and conformity assessment, if applicable, the device is deployed into clinical use through one or more local models. Each copy has a model identical to that of the original device. The manufacturer or health institution may continue to train the global model. Through periodic or real-time updates, the manufacturer or health institution synchronizes the local and global models. Updates can comprise changes to the settings or the design of the device. The local models are identical to a version of the global model. The synchronization can occur in real time or with a delay, depending on the need for validation and conformity assessment. Conversely, during local change, the local models learn and change independently of the global model (see Figure 2).

Figure 2. Global and local model change. During global change, the local models change in sync with the server model; during local change, the server model is deployed locally, and the local models are then trained with their own data. ©Koen Cobbaert, 2021

Global change is irrespective of whether the global model learns based on local data or on data collected separately, such as through the use of data repositories (e.g., registries or decentralized personal data stores) or data generated through clinical investigations and postmarket clinical follow-up (PMCF).

Federated learning
Federated learning (see Figure 3) is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples without exchanging them with the developer – for example, in health institutions. This approach contrasts with traditional centralized machine learning techniques, in which all the local datasets are uploaded to one server, and with more classic decentralized methods, which often assume local data samples are identically distributed. The main advantage of using federated approaches is to ensure data privacy or data secrecy. Indeed, no local data is uploaded externally, concatenated, or exchanged. Since the entire database is segmented into local bits, it is more difficult to hack into it. With federated learning, only machine learning parameters are exchanged. Also, such parameters can be encrypted before sharing between learning rounds to extend privacy. Homomorphic encryption schemes can be used to make computations directly on the encrypted data without decrypting them beforehand.

Figure 3. Visualization of federated learning. Step 1: The central server chooses a statistical model to be trained. Step 2: The central server transmits the initial model to several nodes. Step 3: The nodes train the model locally with their own data. Step 4: The central server pools the model results and generates one global model without accessing any data.
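As a rough illustration of the rounds shown in Figure 3, the sketch below implements a bare-bones federated averaging loop: each node takes a training step on data that never leaves it, and the server pools only the resulting parameters. The linear least-squares model, the single gradient step per round, and the synthetic node data are simplifying assumptions, not the article's method.

```python
# Hedged sketch of federated averaging: only parameters travel to the server.
import numpy as np

def local_update(w, X, y, lr=0.1):
    # One gradient-descent step on this node's private data.
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

rng = np.random.default_rng(seed=0)
# Three "workers" (e.g., hospitals), each holding data that never leaves the site.
nodes = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(3)]
global_w = np.zeros(3)                        # step 1: server chooses a model

for _ in range(50):                           # repeated learning rounds
    # step 2: transmit the model; step 3: nodes train locally
    local_ws = [local_update(global_w, X, y) for X, y in nodes]
    # step 4: server pools parameters only; raw data is never exchanged
    global_w = np.mean(local_ws, axis=0)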
Predetermined change
Local change occurs in machine learning devices that adapt, refine their performance, or calibrate themselves to the characteristics of a specific patient or health care setting. An AI that learns and changes itself during clinical use requires manufacturers to determine the limits within which such change can occur safely while assuring performance for the intended purpose. Manufacturers should establish those boundaries of change before placing the product on the market. Set boundaries determine the framework in which regulatory approval allows for changes.

Predetermined change is not unique to machine learning devices. Many medical devices adapt to the patient or their environment or compensate for wear and tear by reconfiguring or self-healing their design. For example, a CT machine undergoes regular calibration cycles to adjust for wear and tear. The calibration reconfigures the CT software to compensate for, and adapt to, hardware changes.

Predetermined changes through machine learning enable personalized health care and include a modification of the AI's "working point" based on the local or patient environment; this allows the AI to maximize performance for a given patient or situation and enables the device to adapt to the patient rather than forcing the patient to adapt to the device. For example, AI used in the joints of a bionic foot allows the kinetics to be adapted to the patient, rather than letting the patient adapt their gait to the prosthesis's kinetics.

Change dynamics
Depending on the device, the manufacturer, the health institution, the caregiver, the patient, or a combination of these can control the change. The actor responsible for the change, and whether the change occurs before or after the device was placed on the market, brings forth different regulatory implications (see Figure 4).
Figure 4. Three scenarios for change of machine learning devices in relation to their placement on the market. (1) Locked change: The manufacturer analyzes and controls model changes and decides on updates for release to the market. The global model, i.e., the model at the manufacturer site, may learn 'offline' from real-world data on the market. The local model does not change during use, but the user can optionally select the appropriate working point. (2) Continuous change through learning: The local model is updated without explicit manufacturer or user interaction; optionally, the user can select the appropriate working point during health or clinical use. (3) Change through learning in the backend: The local model learns 'offline' from real-world data generated through health or clinical use. A human, such as a healthcare professional, service engineer, or patient, analyzes and controls the changes to the local model 'in the backend' before putting a new state of the local model into health or clinical use, returning to a previous state, or resetting it to the factory defaults. This kind of change is, for example, used to calibrate the model to changing data inputs or clinical practices at the user site. The local model changes in the 'backend' but uses a 'locked' state in health or clinical use; optionally, the user can select the appropriate working point during health or clinical use. ©Koen Cobbaert, 2021

In the first scenario, the AI is locked, and the manufacturer controls the learning. "Locked AI" does not change its design during runtime. Examples of locked AI are static look-up tables, decision trees, and complex classifiers. Locked AI generally provides the same result each time the same input is applied. However, there are some exemptions to this, for example, if the AI contains nondeterministic20 algorithms or the user can change the working point on the operating curve. The performance of a "locked" AI can therefore still change. Consider an algorithm to screen for tuberculosis at airports. While the algorithm's design is locked, the user may still choose a point on the operating curve that is different from the factory default, for example, to trade off increased sensitivity against reduced specificity.
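The following sketch illustrates such a user-selectable working point: the locked model and its scores stay fixed, and only the decision threshold moves along the operating curve. The scores, labels, and threshold values are invented for illustration.

```python
# Sketch of a working point on the operating curve of a locked model.
import numpy as np

scores = np.array([0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95])  # model output
labels = np.array([0, 0, 1, 0, 1, 1, 1])                       # ground truth

def working_point(threshold):
    pred = scores >= threshold
    sensitivity = (pred & (labels == 1)).sum() / (labels == 1).sum()
    specificity = (~pred & (labels == 0)).sum() / (labels == 0).sum()
    return sensitivity, specificity

print(working_point(0.60))  # factory default: (0.75, 1.0)
print(working_point(0.30))  # user trades specificity for sensitivity: (1.0, 0.67)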
AI can change during runtime in various other ways. In the second scenario – continuous change – the model continues to learn within the manufacturer's boundaries. Consider, for example, an AI for precision medicine. The AI calculates the optimal medication dose to reduce the tremor of a patient with Parkinson's disease while limiting the drug's side effects. As the disease progresses over months or years, symptoms change. The algorithm continues to evolve together with the patient's disease state. In this scenario, the user may still be able to change the working point on the operating curve. A patient may, for example, on a particular day, decide to pick a point on the operating curve that lowers the dose, allowing for more tremor but improving their cognitive performance. A patient may prefer a different operating point because the medication causes mild cognitive impairments, such as distraction, disorganization, and difficulty planning and accomplishing tasks.

In the third scenario, involving discrete changes, the learning initially occurs under the manufacturer's control. The model is then handed over to the user (or another party) for further calibration or adjustment to the local context or to a specific patient. The change occurs within the intended use, within the change boundaries, and according to the manufacturer's algorithm change protocol (ACP; see next section for a description of change boundaries and the ACP). The manufacturer remains responsible for the device. Consider, for example, an AI for the prediction of sepsis. The hospital makes a change in its local practices, now also encoding the blood parameters procalcitonin and interleukin 6 and making these available to the AI. The manufacturer has proved these blood parameters work as valid inputs, as described in the ACP. The user can now further improve the local model's performance by training the AI on these extra blood parameters, following the manufacturer's ACP.

Also, there may be hybrid scenarios, whereby the model continues to evolve, but the user has the ability to revert to an earlier state of the model or to the factory default.

It is worth noting that having a human in the loop during learning, to control what model state is put into clinical use, is different from having a human in the loop to control the device during clinical use. A continuously changing AI does not have a human in the loop to control the learning but can nevertheless have a human in the loop to control device functions during clinical use, for example, through a device override or emergency stop.
Postmarket significant change
Medical device legislation requires a manufacturer to determine whether a new device release changes significantly (the IMDRF refers to "substantial" changes). If a new software release changes significantly, the manufacturer must perform a new conformity assessment before placing the device on the market. Under the EU Medical Device Regulation (EU MDR) and the EU In Vitro Diagnostic Regulation (EU IVDR), a health institution developing a machine learning device for in-house use must also perform such a significant-change determination and perform a new conformity assessment before putting the device into clinical use. In some cases, the health institution may use a manufacturer's machine learning component to build a new device, or the health institution may change an existing device in a significant way. A significant change that occurs postmarket provides a fourth scenario (see Figure 5).

In the fourth scenario, the user intentionally changes the local model in a way not allowed by the manufacturer's change control plan, either by making a change:
• Beyond the manufacturer's predetermined change envelope (also known as change boundaries or change prespecifications), for example, for a purpose not covered by the intended use or for more specific purposes than claimed by the manufacturer; or
• While not complying with the manufacturer's ACP.

Alternatively, the user incorporates the device as a component in a new device.

In this fourth scenario, the device is intentionally changed by the user in a significant way. The user is misusing the device. Consider, for example, an AI that reads quantitative polymerase chain reaction (qPCR) assays.
Vo l u m e 1 • N u m b e r 2 Artificial intelligence: Characteristics, regulatory compliance, and legislation Figure 5. Visualization of postmarket significant change (4a) A manufacturer develops an AI-based device and places it on the market. (a) Placed on the market (b) Placed on the market The user intentionally changes the local model in a way not allowed by the manufacturer’s change control plan, either by making the change: (1) beyond the manufacturer’s pre-determined change envelope (a.k.a. change boundaries, pre-specifications) (2) while not complying with the manufacturer’s operating Algorithm Change Protocol user-initiated point change significant change Or by using the device as a component for a new change locked through device. learning through learning Set ASSESS (4b) If the user places the transformed device on the market, then the user is subject to backend (c) Put into service Continuous manufacturer requirements of the medical by and within (EU) health institution Change device legislation, with the original APPLY through CURATE learning manufacturer its supplier. (d) Placed on the market and used by regular user Proceed? (4c) If the user is an EU health institution having change through learning Manufacturer transformed the device for in-house use operating User purposes only, then the health institution is user-initiated point change an in-house manufacturer subject to significant change change locked through EU MDR/IVDR Art. 5(5). through learning learning Set (4d) If the user does not place the device on the market, is not a health institution that regular user changes the device for in-house use-only, then the user is misusing the device and not subject to medical device regulation. ©Koen Cobbaert, 2021 quantitative polymerase chain reaction (qPCR) assays. a reference to the master file of the original device. That The AI can handle qPCR curves generated for assays master file then has to be made available by the original for plant, animal, or human specimens. Assume a second manufacturer to the notified body. Of note is that the manufacturer or a health institution produces an assay US Food and Drug Administration (FDA) provides for the detection of SARS-COV-2. By continuing the the possibility of using master files. Depending on the AI training, it becomes especially good at reading qPCR technology used, the second manufacturer or health curves associated with a SARS-COV-2 assay. “Reading institution operating under this scenario may need to qPCR curves for SARS-COV-2” is a more specific claim implement technical measures, a quality agreement, and and requires a higher level of performance before it is monitoring of original device updates to ensure the new considered state-of-the-art than “reading qPCR curves device is safe and performant in light of the state-of- for assays for plant, animal or human specimen.” the-art. Consequently, the more specific claim is considered Change boundaries and algorithm change protocol a significant change. A manufacturer or EU-based The manufacturer must specify a change envelope or health institution (following EU MDR, Article 5(5) change boundaries and an ACP. As long as the device on in-house manufacturing) performing such change operates within the related change boundaries, its safety carries the manufacturer responsibilities under medical and performance are guaranteed (e.g., minimum speci- device regulations. If the second manufacturer can prove ficity and sensitivity). 
Change boundaries and algorithm change protocol
The manufacturer must specify a change envelope or change boundaries and an ACP. As long as the device operates within the related change boundaries, its safety and performance are guaranteed (e.g., minimum specificity and sensitivity). The manufacturer can ensure this through procedural (e.g., acceptance testing) or technical measures. How change envelopes can be defined depends on the technology used. The manufacturer should demonstrate why the chosen parameters and threshold values are valid in consideration of the clinical state of the art and the intended purpose of the device. Currently, there is no standard available to guide the drafting of an ACP.21
Manufacturers can increase the trust necessary for implementation by informing regulators and users about the following:
• How the AI learns over time;
• What the allowed change envelope/boundaries are;
• What caused a substantial change in the AI behavior;
• How performance and safety are being assured as the AI adapts;
• How quality control of new training data is assured;
• What triggers algorithm change;
• What performance and confidence levels22 are justified during a given timeframe; and
• Whether and how a user can reject an algorithm change or roll back to a previous state.

For troubleshooting purposes, a manufacturer also may want to monitor the actual performance and change/drift of an evolving algorithm to detect performance deficits, for example, by implementing traceability through an audit trail.23
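A minimal sketch of what such monitoring could look like appears below: each model state is evaluated against the predefined change envelope, and the result is logged to an audit trail. The envelope values, version strings, and record format are this author's assumptions, not requirements drawn from any standard.

```python
# Hedged sketch of change-envelope monitoring with an audit trail.
from datetime import datetime, timezone

ENVELOPE = {"sensitivity": 0.90, "specificity": 0.85}   # assured minimums

audit_trail = []   # in practice: persistent, tamper-evident storage

def check_model_state(version, sensitivity, specificity):
    within = (sensitivity >= ENVELOPE["sensitivity"]
              and specificity >= ENVELOPE["specificity"])
    audit_trail.append({
        "version": version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "within_envelope": within,
    })
    return within

if not check_model_state("1.3.0", 0.93, 0.81):
    print("Drift outside the change envelope: roll back and investigate")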
Change according to medical device legislation
Most medical device regulations require a conformity assessment before a manufacturer can place a device on the market. The conformity assessment must demonstrate that the regulatory requirements are met.24 Devices can change in terms of design and characteristics after conformity assessment, but only if the manufacturer has a quality management system to address these changes in a timely manner with regard to regulatory compliance, including compliance with conformity assessment procedures.25 Manufacturers are prohibited from suggesting uses for the device other than those stated in the intended use for which the conformity assessment was carried out.26 So, what does this mean for devices that change during use?

In the EU, notified bodies must have documented procedures and contractual arrangements in place with the manufacturer relating to the assessment of changes to the approved design of a device and to the intended use or claims made for the device. Manufacturers must submit plans for such changes for prior approval. Such changes may affect conformity with the general safety and performance requirements or with the conditions prescribed for the use of the product, including changes related to limitations of the intended purpose or conditions of use. The notified body must assess the proposed changes and verify whether, after these changes, the design of a device or type of a device still meets the requirements of the regulation; it must notify the manufacturer of its decision and provide a report or, as applicable, a supplementary report to the EU technical documentation assessment certificate containing the conclusions of its assessment.27

This implies that a manufacturer can place a device on the market that can change within a predefined change envelope or tolerances for which a conformity assessment was carried out, provided the manufacturer respects the contractual agreements with the notified body. This approach appears possible under most medical device legislation.

In contrast, no medical device legislation currently exists that allows manufacturers to place machine learning devices on the market that are intended to change outside of the change envelope, or to suggest claims, intended uses, or use conditions for the device for which no conformity assessment was carried out (see Figure 6). For example, if the functionality of a machine learning device placed on the market to detect frontotemporal dementia evolves during runtime to detect dementia with Lewy bodies or Creutzfeldt-Jakob disease, this is considered a new intended purpose and requires a new conformity assessment.

Changes outside of the change envelope, or the assignment of new claims or use conditions, require an update of the technical documentation, including the clinical evaluation, and a new conformity assessment to be carried out.28 Significant changes to the predetermined ACP likewise require an update of the technical documentation, including the clinical evaluation, and a new conformity assessment.
Figure 6. Illustration of AI change types: locked; change within predefined boundaries for which the conformity assessment was carried out; and change outside predefined boundaries for which the conformity assessment was carried out. ©Koen Cobbaert, 2021

Assume a natural or legal person (an individual or company, respectively) wants to place an existing device on the market in the EU, either by changing its intended purpose or by modifying it in a way in which compliance with the applicable requirements may be affected. In those cases, the person will assume the obligations incumbent on manufacturers, except if they change the device, without changing its intended purpose, to adapt it for an individual patient. Then manufacturer obligations do not ensue, but an EU member state may still require the person to register as a manufacturer of custom-made devices. This implies that a person can adapt a machine learning bionic eye to restore a patient's eyesight if the eye is intended for that purpose, even if this involves a modification in such a way that compliance may be affected.

In the EU, a health institution can change the intended purpose or modify a device in such a way that compliance with the applicable requirements is affected so that it can be used on multiple patients, but only if the health institution meets the conditions for in-house manufacturing, meaning:
• The device is not transferred to another legal entity.
• It has an appropriate quality management system in place.
• It justifies that the target patient group's specific needs cannot be met, or cannot be met at the appropriate level of performance, by an equivalent device available on the market.
• It provides information on the manufacturing, modification, and use to its competent authority upon request and draws up a declaration that it shall make publicly available.29

The exception to this rule is when a health institution adapts a device for a purpose not included in the scope of the medical device definition, for example, when a machine learning bionic limb is adapted for superhuman purposes rather than to alleviate or compensate for a disability. The health institution then need not meet the conditions for in-house manufacturing. The author is not aware of any restrictions outside the EU that apply to health institutions performing manufacturing for in-house use.

Controllability and human oversight
Controllability refers to the ability of an external operator to intervene or deactivate a machine learning device in a timely manner.30 Figure 7 illustrates the different types of controllability.31 Having a human in the loop leverages the user's situational awareness and judgement, which may be beneficial for the performance and safety of machine learning devices.

Figure 7. Types of controllability, each a sense-decide-act loop. Direct control: The AI performs a task and then waits for the human user to take an action before continuing its operation. Supervisory control: The AI can sense, decide, and act on its own; the human user supervises and can intervene when required. No control: The AI can sense, decide, and act on its own; the human user cannot intervene in a timely fashion. Devices that are controlled or supervised by humans are also known as heteronomous devices. Devices that do not provide a means to intervene in a timely fashion are also known as autonomous devices. Hybrid devices provide direct control or supervisory control over certain functions and no control over other functions. ©Koen Cobbaert, 2021
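The three types in Figure 7 can be pictured as variations of a single sense-decide-act loop, as in the hedged sketch below. The hook names (human_gate, stop_requested) are invented placeholders for illustration, not an established device API.

```python
# Sketch mapping Figure 7's controllability types onto one sense-decide-act loop.
def control_loop(sense, decide, act, human_gate=None, stop_requested=None):
    while True:
        # Supervisory control: a human can interrupt in a timely fashion.
        if stop_requested is not None and stop_requested():
            break
        action = decide(sense())
        # Direct control: the device waits for the human to approve each action.
        if human_gate is not None and not human_gate(action):
            continue
        act(action)
        # With neither hook wired up, the loop runs autonomously ("no control").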
Having a human in the loop requires situational awareness, enough time to intervene, and a mechanism to interfere (a communication link or physical controls) and take control of or deactivate the device as required.

From a risk management and performance perspective, it may sometimes be necessary to take the human out of the loop to reduce the risk as far as possible and to avoid human-machine interaction problems,32 such as a lack of operator situational awareness (sensing uncertainties/limited situational awareness), that is, the operator having insufficient knowledge of the state of the device at the time of intervention.

When specific actions require the device to perform at high speed or with safety and accuracy, it may be safer to circumvent the limitations of immediate, real-time human control. For example, a machine learning robot for eye surgery33 may require the human to be taken out of the loop because of the user's limited decision-making capacity,34 limited situational awareness, and sensing uncertainties. In this case, the most effective human supervision will be based on continuous oversight and periodic retrospective review of the performance for individual patients or for cohorts of patients, for example, through PMCF.

Automation is, of course, not a panacea. Manufacturers and health institutions must be aware that automation leads to deskilling in some circumstances or may require more user training and higher-skilled individuals than actions performed without automation.35

Overtrust
Manufacturers of machine learning devices sometimes require a human in the loop to exercise control when the AI is less confident of its decision. Human behavior, however, comes with its risks. For example, a Level 2 self-driving car36 requires a human in the loop to drive the car safely. It might have to travel 100,000 km before the car has an accident; the human may by then have put too much confidence (overtrust) and automation bias37 in the car and no longer pay enough attention to take back control of the car in a timely and safe manner.
In a perfect world, we would like users to trust the AI when it works at 100% accuracy but to be hypersensitive and identify when it does not. In reality, people often tend to sway. If the first impression is positive, humans tend not to see when the AI makes mistakes, or they may forget or forgive the AI. How we must design our devices to allow users to calibrate their trust appropriately, and how we must educate them in their first interactions with the device, is an open field of research.

The medical domain can inspire. For example, findings from a radiology study have shown that the accuracy of the human-device combination might improve when a computer-aided detection algorithm identifies more than one choice to the radiologist.38 Offering multiple options can maintain trust in the system and mitigate the risks of overtrust by putting the human expertise to work.

In other cases, the AI's performance alone might be better than the performance of the combined human-AI team. Sometimes, it is necessary to take the human out of the loop altogether to get the best performance and reduce or eliminate use error. For example, in a clinical diagnostic application used to read qPCR assays, FDA requires manufacturers to deactivate the possibility of the molecular biologist intervening, because the precision and reliability of the AI outperform those of the human-AI team.39 On the one hand, taking the human out of the loop takes away human variability and mitigates the risk of a lab using fewer control samples than required by the assay manufacturer. On the other hand, when AI is trained on rare diseases with fewer datasets, it may require humans to be in the loop to reach maximum performance. Striking the right balance is important and differs on a case-by-case basis. Most medical device legislation requires manufacturers to eliminate or reduce the risk of use error.40 Inappropriate levels of trust in automation may cause suboptimal performance.41
Transparency and explicability
Only by gaining the user's trust will machine learning devices find their way into care pathways. One way to gain confidence is to ensure transparency, both in terms of the organization that creates the AI and the AI itself. Transparency is also useful to clarify liability: Did the doctor or the user make a mistake? Was it the AI, incorrect or unforeseen input data, or a malicious actor, for example, hackers or disgruntled employees?

Transparency may also be needed to allow manufacturers to determine operating parameters (when the device works or does not work), limitations to the intended use, contraindications, and inclusion and exclusion criteria for input data, or to enable debugging of the system and detection of potential issues of bias.

Transparent AI presents core decision-making elements in an open, comprehensive, accessible, clear, unambiguous, and interpretable way.42 The first question that comes to mind when considering algorithmic interpretability is: "Interpretable to whom?" The very word "interpretable" implies an observer or subjective recipient who will judge whether they can understand the algorithm's model or its behaviors. Another challenge is the question of what it is we want to be interpretable, that is, the historical data used to train the model, the model itself, the performance of the model found by the algorithm on a population cohort, or the model's decisions for a particular case?

Early well-known machine learning models are rather simple (see Figure 8) and principled (maximizing a natural and clearly stated objective, such as accuracy), and thus are interpretable or understandable to some extent. However, the "rules" used by such models can be complicated to understand fully. They may capture complex and opaque relationships between the variables in what seemed to be a simple dataset. While radiologists may look for "a curved tube resembling a ram's horn, located in the inner region of the brain's temporal lobe," to identify the hippocampus, AI may use features and patterns that are not articulable in human language. While this makes AI an extremely powerful tool for clinical decision-making, it also brings the risk of the AI reflecting spurious correlations in the data or overfitting to a particular dataset at the cost of transferability. Thus, simple algorithms applied to simple datasets can nevertheless lead to inscrutable models.

Figure 8. Transparency and interpretability of AI. Neural networks comprise relatively simple elements: inputs, weights, a transfer function, and an activation function producing an output, evaluated against reality on measures such as accuracy and precision. Still, due to the vast amount of data, a person will not be able to process this information to the point of understanding it. AI being technically interpretable or transparent does not automatically imply that a doctor or a patient can interpret it, that is, understand cause and effect. A different level of abstraction is required to make the neural network interpretable to users. ©Koen Cobbaert, 2021

However, it is also entirely possible that even a model we cannot understand "in the large," or that is hidden from the user, can make specific decisions we can understand or rationalize post hoc, a phenomenon known as emergence.43 For example, a doctor can review the output of an algorithm reporting on the wound healing stage (haemostasis, inflammatory, proliferative, or maturation) by looking at a wound picture to determine whether the algorithm identified the healing phase correctly. Alternatively, we can interrogate the model by having it tell us what it would do on any input. We can explore counterfactuals such as, "What would be the smallest change in input data that would change the decision?" This type of explanatory understanding at the level of individual decisions or predictions is the basis for some of the more promising research on interpretability.44

Explained variance, that is, "Given a blank sheet, what would be the minimum input data needed to receive this decision?" is at the opposing end from counterfactuals, that is, "Given the complete picture, what is the minimum change needed to also change the answer?" Explained variance involves the AI providing the minimum set of input data needed to come close to its decision. The minimum set of information depends on the desired level of closeness, which may differ for novice versus expert users. For example, an AI predicting the probability of survival from COVID-19 infection may explain to the user that "age" contributed 80% of its prediction. For some users, this may be sufficient information. In contrast, other users may want the AI to explain 99% of its prediction, adding that patient history contributed 15%, specific lab results 3%, symptoms 0.5%, and so on.45
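The sketch below illustrates only the "explained variance" side of this idea: report the smallest set of feature contributions that reaches the user's desired level of closeness. The contribution shares echo the COVID-19 example above but are otherwise invented.

```python
# Illustrative sketch of a closeness-driven contribution explanation.
contributions = {"age": 0.80, "patient_history": 0.15,
                 "lab_results": 0.03, "symptoms": 0.005}

def explain(min_coverage):
    # Add contributions from largest to smallest until coverage is reached.
    explanation, covered = [], 0.0
    for feature, share in sorted(contributions.items(),
                                 key=lambda kv: kv[1], reverse=True):
        explanation.append((feature, share))
        covered += share
        if covered >= min_coverage:
            break
    return explanation

print(explain(0.80))   # novice view: [('age', 0.8)]
print(explain(0.98))   # expert view: age, patient history, and lab results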
"Inexplicable" devices are not unusual; health care has long been known for accepting such devices for certain purposes, as long as the technical file includes a description of the technology and adequate evidence of safety and performance.
For example, manufacturers demonstrated through randomized, controlled studies that electroconvulsive therapy is highly effective for severe depression, even though the mechanism of action remains unknown. The same holds true for many drugs under the medicines regulations, such as selective serotonin reuptake inhibitors and anesthetic agents.

Suppose a technology is significantly more effective than traditional methods in terms of diagnostic or therapeutic capabilities, but it cannot be explained. In that case, it poses ethical issues to hold back the technology simply on the basis that we cannot explain or understand it. Explicability is a means (to trust), not a goal.

A blanket requirement that machine learning systems in medicine be explicable or interpretable is therefore unfounded and potentially harmful.46 Of course, the advantage of making the model interpretable is that it helps the user gain confidence in the AI system faster, allowing the company to be successful commercially.

Enclosed AI, that is, AI with no actionable outcome, may not require transparency toward the health care provider or patient, but it requires sufficient explicability47 to the manufacturer or service engineer to allow verification, error detection, and troubleshooting. An example would be an AI that controls the cooling of a motor coil.

Manufacturers must also be transparent about the use of automated decision making. The rules in the EU General Data Protection Regulation 2016/679 (GDPR)48 imply that, when it is not immediately obvious that the user is interacting with an automated decision-making process rather than a human (e.g., because there is no meaningful human involvement in techniques, for example, to improve a sensor or optics within a device), a software device must inform users, in particular patients, of that fact and include meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.

Ethics
AI ethics is used in the meaning of respecting human values, including safety, transparency, accountability, and morality, but also in the meaning of confidentiality, data protection, and fairness. These aspects are not new, as philosophers have been debating ethics for millennia. However, the science behind creating ethical algorithms is relatively new. Computer scientists have the responsibility to think about the ethical aspects of the technologies they are involved in and to mitigate or resolve any issues. Of these ethical aspects, fairness is probably the most complicated because it can mean different things in different contexts to different people.49

To consider the ethical aspects of software that changes itself through learning, the European Parliamentary Research Service50 frames it as a real-world experiment.
In doing so, it shifts the question from evaluating the moral acceptability of AI in general to the question of "under what conditions is it acceptable to experiment with AI in society?" It uses the analogy of health care, in which medical experimentation requires manufacturers to be explicit about the experimental nature of health care technologies by following the rigorous procedures of a clinical investigation, subject to ethics committee approval, patient consent, and careful monitoring to protect the subjects involved or impacted.

Many AI ethics frameworks have appeared in recent years.51 These differ based on the organization's goals and operating contexts.
Such frameworks have limits, for example, because many AI systems comprise a trade-off between algorithmic fairness and accuracy.44,50 On the one hand, a fair algorithm should provide the same benefits for the group while protecting it from discrimination. On the other hand, an accurate algorithm should make a prediction that is as precise as possible for a certain subgroup, for example, according to age, gender, smoking history, previous illnesses, and so on. Where an algorithm lies on the trade-off curve between fairness and accuracy is often a matter of public policy, rather than an isolated decision made by the organization. Consider a hypothetical example of an AI algorithm used during the COVID-19 pandemic to determine which patients should receive treatment. Certain countries or regions would prefer to allocate their scarce resources to patients who are predicted to have the highest chance of survival (accuracy prevails), whereas others may prefer to apply a fair allocation of resources rather than considering a patient's age and gender (fairness prevails). Such national and regional differences require a handshake between the company's ethics framework and that at the policy level. To accommodate national and regional differences, algorithms can be designed so that they can be adjusted by the ethics committees at hospitals or at the regional level to meet the level of accuracy versus fairness desired.
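One hedged sketch of such an adjustable design follows: per-group decision thresholds chosen by grid search, with a committee-tunable weight that trades accuracy against a demographic-parity gap. The threshold grid, the parity measure, and the weighting scheme are illustrative design assumptions, not a recommended fairness metric.

```python
# Sketch of a tunable accuracy-versus-fairness working point.
import numpy as np

def select_thresholds(scores, labels, groups, fairness_weight):
    """Grid-search per-group thresholds, maximizing accuracy minus a
    weighted penalty on the gap in positive-prediction rates."""
    best, best_objective = (0.5, 0.5), -np.inf
    for t_a in np.linspace(0.1, 0.9, 17):
        for t_b in np.linspace(0.1, 0.9, 17):
            pred = np.where(groups == "a", scores >= t_a, scores >= t_b)
            accuracy = (pred == labels).mean()
            parity_gap = abs(pred[groups == "a"].mean()
                             - pred[groups == "b"].mean())
            objective = accuracy - fairness_weight * parity_gap
            if objective > best_objective:
                best, best_objective = (t_a, t_b), objective
    return best  # fairness_weight = 0 optimizes accuracy alone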
Ethics committees emerged in health care in the 1970s at the very heart of hospitals. Initially, they focused on research ethics for clinical investigations on human subjects. Such committees exist throughout Europe and are regulated by law. Today, many of these committees have organized themselves at the regional or national level and also focus on clinical ethics. They have evolved into democratic platforms of public debate on medicine and human values. There is no single model. Every country has created its own organizational landscape, according to its moral and ideological preferences, adapted to the political structure of its health system and its method of financing health care.52 This complex reality and the lack of a unified approach make it challenging for companies to engage with ethics committees and find a single working point on the trade-off curve between accuracy and fairness that is acceptable worldwide.

Bias
From a scientific point of view, bias is the tendency of a statistic to overestimate or underestimate a parameter.53 From a legal perspective, bias is any prejudiced or partial personal or social perception of a person or group.54 It is beyond this article's scope to discuss the differences between the scientific and legal definitions; suffice it to say we attribute a broader meaning to the legal definition, mainly because the scientific definition generally is understood to refer to systematic estimation errors,55 whereas the legal definition also can apply to one-off errors in perception.

Aiming for software to be unbiased is desirable. Zero bias is, however, impossible to achieve. Bias may enter the AI development chain at different stages (see Figure 9). We humans all have our blind spots. Therefore, any data set that relies on humans making decisions will have some form of bias. Every hospital is different, every country is different, and every patient is different. You can never achieve zero bias when extrapolating. In the medical world, clinical investigations of devices for adults have historically underrepresented women, minority racial or ethnic groups, and to some extent, patients over age 65.56 AI can maintain or even amplify such bias through its decisions. In trying to optimize its function, AI might ignore a minority that looks different from the general population in order to optimize for that general population.

For manufacturers to establish that bias is minimized, they need to assess the AI for bias and ensure bias has been minimized if considered harmful. For example, they can compare the software output against an independent reference standard (e.g., a ground truth, biopsy, or expert consensus). They can perform specific sanity tests in terms of accuracy on every group they can efficiently identify from the data.57 The challenge is for computer scientists to get an accurate picture of which group(s) could be biased. The difference might not show up in aggregate, but only when focusing on a sub-population within a group, and no test can exhaustively cover the space of all permutations.
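The sketch below shows the kind of subgroup sanity test described: aggregate accuracy can look acceptable while masking a deficit confined to one group. The device outputs, reference labels, and group tags are invented illustration data.

```python
# Sketch of a per-group accuracy sanity test against a reference standard.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])   # independent reference standard
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 0])   # device output
groups = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

print("aggregate accuracy:", (y_true == y_pred).mean())   # 0.8, looks fine

for g in np.unique(groups):
    mask = groups == g
    accuracy = (y_true[mask] == y_pred[mask]).mean()
    print(f"group {g}: accuracy {accuracy:.2f}")   # a: 1.00, b: 0.60 -> flag b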
A big challenge in addressing bias is that sufficient and complete data must be available, which is rarely possible under GDPR, causing the trade-off between fairness and accuracy. Also, testing AI for nondiscrimination on an ethical basis is at odds with GDPR and poses risks to users' privacy. Under current GDPR requirements, developers should not be able to access attributes such as ethnicity and, therefore, could not test for ethnic representation in a dataset.

Conversely, software can play an essential role in identifying and minimizing bias. This is not specific to artificial intelligence but applies to software in general. Historically, it was hard to prove unintended discriminatory bias based on race, for example. Software can enable feedback loops that make it easier to detect and fix bias issues.

Figure 9. Bias entry points in the AI development chain. Bias can creep in at different stages of AI development: during (1) data collection (as people or the collection methods can be biased); (2) data processing operations (e.g., during cleaning, labelling, and aggregation, or through assumptions in the design of data processing pipelines); (3) algorithm development (e.g., human judgment about AI model construction, implementation, and evaluation metrics); and (4) algorithm deployment (in particular through context changes or when learning from user interactions via feedback loops). Source: Arlette van Wissen

Standardization bodies are currently developing standards to characterize data sets. Manufacturers can use these standards to establish data quality for training or evaluation purposes.58 They can use these characteristics to establish bias and determine whether the AI is suitable for a specific target population. However, mandatory certification of training data against these standards is not an effective mechanism to assure the AI is safe and effective for the target population or that bias is minimized. Manufacturers do not always have access to the training data (see the machine learning section). There are forms of AI that learn without training data. Also, bias can enter the AI development chain at different points; training data is only one of those entry points. Instead, manufacturers can make a more comprehensive assessment of bias through the use of qualitative evaluation data.
AI legislation
European Union
Legislators across the world are focusing on the ethical aspects of AI and the presence of bias. A 2019 heat map published by Anna Jobin59 shows that the number of published AI ethics guidelines has increased, especially in Europe and the US. As ethical guidelines are not enforceable, the EU is assessing if and how to regulate the ethical aspects of AI (see Figure 10).

Figure 10. Geographic distribution of issuers of ethical AI guidelines by number of documents released (legend ranges from no guidelines to 15 or more guidelines per country). Source: Jobin A. Artificial intelligence: The global landscape of ethics guidelines. June 2019. https://www.researchgate.net/publication/334082218_Artificial_Intelligence_the_global_landscape_of_ethics_guidelines. Accessed 16 February 2021.

People often attribute a broad meaning to ethics. Ethics guidelines from the High-Level Expert Group on Artificial Intelligence,60 for example, include algorithmic as well as data aspects: nondiscrimination, diversity, bias, privacy and data governance, societal and environmental wellbeing (e.g., the energy needed to train AI and its impact on global warming), human agency and oversight, transparency, accountability (related to logging), technical robustness, and safety. Most of these aspects are already covered by existing regulations, such as the medical device, data protection, product liability, and carbon dioxide emission regulations, although they are not as specific to AI as digital rights advocates and the European Parliament would like. The European Commission recently published a legislative proposal to address the ethical aspects of AI.61 The EU is also assessing whether and how to address the liability and intellectual property aspects of AI.

China
The Center for Medical Device Evaluation (CMDE), a division of the Chinese regulatory authority, the National Medical Products Administration (NMPA), issued comprehensive requirements62 encompassing scrutiny of machine learning devices across their entire device lifecycle. The National Institute for Food and Drug Control, another NMPA division, supplements these requirements with a growing body of standards, for example, to characterize the data sets used for the training or evaluation of AI (see Figure 11). Manufacturers are well advised to take these recommended standards into account early in the development project when drafting the clinical evaluation and clinical development plans so that there are no surprises when the regulatory submission for China is prepared.
Figure 11. Overview of NMPA regulation and standardization applicable to AI-enabled medical devices. Regulations in effect include the Medical Software Review Guideline, the Medical Device Cybersecurity Guideline, the Mobile Medical Device Guideline, and key review points on deep learning clinical decision support systems and on CT pneumonia decision support; an Artificial Intelligence Medical Device Review Guideline is in draft. Standards in effect, in draft, or proposed cover, among others, performance evaluation of CT lung nodule and fundus image analysis software, CT pneumonia decision support software, and AI medical device quality requirements (terminology, data sets, data annotation, and deep learning algorithm development), as well as AI medical device software production quality management.

Through standardization, the Artificial Intelligence Medical Device Innovation and Cooperation Platform, a subdivision of CMDE, actively encourages the creation of evaluation databases and test platforms, starting with highly prevalent diseases, such as lung cancer and diabetic retinopathy. As the world is a long way from having databases covering all 55,000 diseases and conditions listed in the 11th International Classification of Diseases (ICD-11)63 published by the World Health Organization, this Chinese initiative is a welcome start.

United States
In the US, FDA published a discussion paper that focuses on machine learning devices that change during runtime, citing the Software Precertification (Pre-Cert) Program64 as a possible regulatory pathway for AI. The program is intended to be a regulatory model that is more streamlined and efficient, resulting in getting products to market and to patients faster than the existing 510(k), de novo, and premarket pathways.

The Pre-Cert program involves focusing on the product developer instead of focusing primarily on the product itself.
If the developer can demonstrate a culture of quality, excellence, and responsiveness, FDA believes that a streamlined approval process could be allowed. The shift from a pure product focus to a product-and-process viewpoint is a new pathway for FDA and is a step toward convergence with the quality management system approach used within the EU. At the time of writing, FDA was piloting the Pre-Cert program as a regulatory sandbox. According to personal communication (K. Cobbaert, 8 May 2021), a regulatory sandbox is a framework that provides a structured context for legislators to create and evaluate draft legislation by assessing its impact on a technology, product, service, or approach and its stakeholders. Where appropriate, such an experiment may occur in a real-world environment with appropriate safeguards and under regulatory supervision, while existing legislation is not enforced for a limited time and in a limited part of a sector or area.

The role of AI standards
Generally, legislation provides high-level requirements with which a product must comply, while a standard provides requirements on how a product must comply. Consequently, standards are more prescriptive than legislation.