Louvain School of Management - Patient data and artificial intelligence, the panacea for today's health care? - DIAL@UCL

Louvain School of Management

Patient data and artificial intelligence, the panacea for
                                     today’s health care?

                                                   Project thesis submitted by
                                                      Brice Van Eeckhout

                                            with a view to obtaining the degree of
                  Master (120 credits) in Business Engineering, professional focus

                                                                Supervisor(s)
                                                      Olivier de Broqueville

                                               Academic year 2017-2018

        “Intelligence and capability are not enough.
There must be the joy of doing something beautiful.”

                         Dr. Govindappa Venkataswamy
                    Founder of Aravind Eye Hospitals

Patient data and artificial
intelligence, the panacea for
today’s health care?
Louvain School of Management - Master Thesis                    2018

Promotor                                                       Student
Olivier de Broqueville                               Brice Van Eeckhout
                           In collaboration with
                           Benelux Health Ventures
                            Maastricht University

Table of contents
Introduction

Part 1 – Medical data: Current context
1.      Defining medical data
2.      Current medical data management
     2.1 Gathering & storing medical data
        2.1.1 Storage of Medical Data: Towards FAIR Data
        2.1.2 Enabling Patients to Re-own their Data
     2.2 Using & transforming medical data
        2.2.1 Security Related Issues
3.      Prostate Cancer

Part 2 – Machine learning applied to medical data: The Virtual Patient Avatar
1.      The Virtual Patient Avatar™ (VPA™)
     1.1 Key concepts
        1.1.1 Decision Support Systems (DSS)
        1.1.2 Individualized Patient Decision Aids (IPDA)
        1.1.3 App based monitoring & lifestyle actions
        1.1.4 Distributed Learning
     1.2 The 4Ps of modern healthcare
2.      Controversial aspects of the use of tools such as the VPA™
3.      Actors involved and their interests
     3.1 Patients
     3.2 Doctors & Hospitals
     3.3 Health Insurance Companies, Employers & Society at large
4.      Why prostate cancer as a first module?
5.      Why is there an opportunity now?
Part 3 - Business Model Theory
1.      Definition
2.      Designing a good business model
3.      Assess the attractiveness of the model
4.      The Business Model Canvas
5.      The Lean Startup

Part 4 - Market Research: Are patients ready for this technology?
1.      Research Questions
2.      Objective of the Study
3.      Methodology
     3.1 Format and order of the questions
     3.2 Validation of the study
     3.3 Diffusion of the questionnaire & sample size
     3.4 Flowchart of the study
     3.5 Targeted segments of the population
4.      Description of the results
     4.1 Socio-demographic characteristics of the sample
     4.2 Results of the behavioral and perception questions
        Potential users’ control over their medical data & its purpose
        Potential user’s feeling towards cloud-based EHR lockers
        Potential user’s perception of the VPA™’s features for preventive purposes
        Potential user’s perception of the VPA™’s features for treatment support purposes
        Potential user’s perception of the importance of the quality of life
        Potential user’s perception of the potential payment schemes
        Potential user’s perception of the role of insurance companies in the accessibility to
        tools such as the VPA™
5.      Discussion
     5.1 Is there a gap to be filled and are patients aware of it?
     5.2 Is the proposed solution the right one?
     5.3 Whom to target and how?
        5.3.1 Patient segments
        5.3.2 Financing system
        5.3.3 Promoting actors
     5.4 Limitations of the survey
6.      Future Prospects

Conclusion

Glossary
Bibliography
Acknowledgments

Introduction
Only 3% of medical data is currently being used in medical research and clinical trials. Indeed,
Evidence-Based Medicine (EBM)1 requires homogeneous cohorts of patients with sufficient
follow-up, and thus many patients have to be excluded from trials. Moreover, the
data that is actually used is not representative of the patient population, since these
trials are based on relatively small cohorts. Elderly people, for example, are often absent from
these randomized trials.
At the same time, we now realize that a more personalized, predictive, preventive and
participatory approach is necessary in the treatment of chronic disease patients due to the
complexity of their health status, the important and often underestimated role that their
lifestyle plays, and the increasing number of therapeutic options.

This master thesis aims to inform the reader about the way medical data is
currently gathered, used, stored and owned, and about the ways it could be transformed into
actionable knowledge in order to improve global health in the years to come.

The main question we want to raise is: Can we make better use of the available medical data
to improve chronic disease management through personalized predictions, and gather more
data in an interoperable way that could be used for future clinical trials? And if so, how?

We will focus on prostate cancer as an example, since it is a well-documented, complex
chronic disease for which several therapeutic options are available. It is also the most
prevalent form of cancer in men. That said, we will also explore other chronic diseases in
order to illustrate that the same principles can be applied in many other cases.

1
  Evidence-Based Medicine (EBM) – “Evidence-based medicine is the conscientious, explicit, and
judicious use of current best evidence in making decisions about the care of individual patients”
(Sackett et al., 1996)

Part 1 – Medical data: Current context

1. Defining medical data
The concept of medical data has become broader and broader with the discovery of
numerous new features that can be related to a patient’s health. Many different types of data
can thus be considered medical data; here we discuss them according to
their source and type.

A first source of patient data is the healthcare system and healthcare providers. Ideally, the
patient’s medical record should provide a complete overview of the patient’s medical history
(e.g. appointments with doctors, blood tests, doctors’ notes, diagnoses, drug prescriptions and
other treatments). In practice, however, most patients have an incomplete medical file due
to the absence of a standardized collection and storage system. In other cases, general
practitioners do collect all the necessary patient data but store it in an unstandardized
way, making it almost impossible for scientists to identify and properly use this data for
research purposes.
The standardization of patient data in Belgium is still a work in progress (see full transcript in
Appendix 12). Many steps are being taken to increase the standardization and interoperability2 of
patient data, allowing for better patient care and making large-scale clinical research
possible.

Denmark is another interesting example (see full interview transcripts in Appendices 10 and 11).
The country has had an efficient and interoperable system for medical records in place for years;
it is widely used by the public and shows that such systems can be put in place
nationwide. However, some pain points remain when it comes to giving patients access

2
 Interoperability – “The ability of data or tools from non-cooperating resources to integrate or work
together with minimal effort.” (Wilkinson et al., 2016, p. 2)

to their health records. For example, medical images (e.g. CT-scans3, ultrasounds,
radiographs, magnetic resonance images) are large files that are difficult to add to an
accessible system. For now, only a description of the images is available on the shared
systems.
Another question concerns whether doctors’ personal notes taken during
consultations should be made available to patients. Some of these notes can contain very sensitive
information, for instance in the treatment of mental illnesses or in family-related
issues. Such matters are complicated, and disclosing the doctors’ notes could harm
the patient-doctor relationship. Access to these notes is therefore debated,
even though patients are supposed to be the owners of their own medical data. In Denmark,
it has been decided not to make doctors’ notes available to patients on the e-health
platform. In Belgium, an intermediate solution has been adopted: the patient has to
notify his doctor or psychologist of his wish to obtain those notes, and the health professional
then has 15 days to delete information he considers too sensitive from the copy he will
provide to the patient (Belgian Parlement, 2002).

A second source of data is biological. Recent scientific advances have made new types of data
more widely available, at more reasonable prices and with faster analysis. A good example is DNA
sequencing. The price of the sequencing procedure has fallen by a factor of 100,000 in 15 years
and is expected to keep dropping as computing power increases (see Figure 1). The
analysis of a patient’s sequenced genome can determine whether the patient has a higher risk of
developing certain diseases or will experience more toxicity after specific treatments. More
studies are currently being carried out to better understand the meaning of the
sequenced genome.
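
As a rough illustration of what a 100,000-fold price drop over 15 years implies, the following sketch converts the figure quoted above into an average yearly factor and a halving time. It assumes, purely for illustration, a smooth exponential decline; the function names are hypothetical and the real curve in Figure 1 is less regular.

```python
import math


def yearly_decline_factor(total_factor: float, years: float) -> float:
    """Average factor by which the cost is divided each year."""
    return total_factor ** (1.0 / years)


def halving_time_years(total_factor: float, years: float) -> float:
    """Time needed for the cost to halve under a constant exponential decline."""
    return years * math.log(2) / math.log(total_factor)


# Figures quoted in the text: cost divided by 100,000 in 15 years.
factor = yearly_decline_factor(100_000, 15)
halving = halving_time_years(100_000, 15)
print(f"cost divided by about {factor:.2f} each year")
print(f"cost halves roughly every {halving:.2f} years")
```

Under this simplifying assumption, the cost is divided by a little more than two every year.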

A third relevant source of data is the patient himself. Indeed, patient preferences when it
comes to issues such as treatment, treatment side effects and lifestyle can be gathered and

3
 Computerized Tomography scans (CT-scans) – “A computerized tomography (CT) scan combines a
series of X-ray images taken from different angles around your body and uses computer processing to
create cross-sectional images (slices) of the bones, blood vessels and soft tissues inside your body. CT
scan images provide more-detailed information than plain X-rays do.” MAYO CLINIC

provide essential insight into the patient’s life outside of the healthcare system. This data can
only be gathered when the patient has been properly informed about his disease and the
various treatment options. Indeed, enabling the patient to engage in Shared Decision Making
(SDM)4, in other words to take part in the treatment decision process, has been shown to deliver
considerable benefits for the patient’s health and for society at large. We will develop
this topic further in the section on Patient Decision Aids (PDA)5.

    Figure 1: Cost of sequencing a human genome (National Human Genome Research Institute, 2016)

4
 Shared Decision Making (SDM) – “Shared Decision Making is a process in which patients, when they
reach a decision crossroads in their health care, can review all the treatment options available to them
and participate actively with their healthcare professional in making that decision.” NHS England

5
  Patient Decision Aid (PDA) – “Patient decision aids are tools that help people become involved in
decision making by making explicit the decision that needs to be made, providing information about
the options and outcomes, and by clarifying personal values. They are designed to complement, rather
than replace, counselling from a health practitioner.” The Ottawa Hospital Research Institute

2. Current medical data management
An important step in the management of patient data is its collection and storage in a safe
and usable way, so that clinical research can be carried out and patient care improved. In
most countries, patients’ medical data is currently split between various locations, making it
almost impossible to use in clinical trials. Most clinical trials need continuity in the patient’s
medical history and information from multiple sources (e.g. images, survival, histological
results). In countries such as Belgium, for example, patients do not always go through their
general practitioner before consulting a specialist, so there is no way to ensure
continuity and interoperability in patients’ medical records.
There are three main approaches to collecting patient data in national databases:
centralized, decentralized and a hybrid of both.

Belgium has launched several initiatives to set up partly decentralized national databases (see:
http://www.plan-esante.be/).
The structure put in place in Belgium is called a “metahub” structure (see full transcript of the
interview in Appendix 13). It is based on four hubs, three vaults and a “metahub”, or e-health
platform, that connects them all. The four hubs are technical and organizational
collaborations that cover patient information from hospitals: « Réseau santé Wallon »,
« Réseau santé Bruxelles », a hub for Gent and Antwerp, and a hub covering Leuven.
Each hub has its own technology and facilitates access from one care provider to another
within the hub. For instance, if a patient is treated in a hospital in Oostende but goes to Leuven,
in the same hub, for a second opinion, the physician in Leuven can access
the records held in Oostende. This facilitates the continuity of care for
Belgian patients. However, the majority of general practitioners (GPs)
are not online 24/7, and if a hospital emergency room, for example, urgently needs
information about medication or the patient’s current medical situation, it cannot always
simply call the GP. That is why Belgium has vaults. There is a vault in Flanders,
“Vitalink”, one in Brussels, called “Bruxsafe”, and one in Wallonia. These vaults contain
concise summaries of the GPs’ health records, called “SUMEHR” (Summarized Electronic Health
Record). It is up to the GPs to manually transfer all the necessary information into the vault,
and these vaults are open and accessible 24/7 to all parties.
Finally, all software providers in Belgium are required to have a direct connection with
the “metahub”, the e-health platform that connects all the vaults and hubs with each other.
Consequently, in Belgium, entering a patient’s national registry number into the e-health
platform should make it possible to see the patient’s medical history.
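
The lookup described above can be pictured with a minimal sketch. It is only a toy model of the idea, not the actual e-health platform: the class names (`Hub`, `GpSummaryStore`, `MetaHub`), the method names and the sample registry number are all hypothetical. Hospital hubs and regional GP summary stores each answer for their own data, and the metahub queries them all with the patient's national registry number.

```python
from dataclasses import dataclass, field


@dataclass
class Hub:
    """Regional hospital network holding hospital documents per patient."""
    name: str
    documents: dict[str, list[str]] = field(default_factory=dict)

    def lookup(self, registry_number: str) -> list[str]:
        return self.documents.get(registry_number, [])


@dataclass
class GpSummaryStore:
    """Regional store of GP summaries, reachable around the clock."""
    name: str
    summaries: dict[str, str] = field(default_factory=dict)

    def lookup(self, registry_number: str) -> list[str]:
        summary = self.summaries.get(registry_number)
        return [summary] if summary is not None else []


@dataclass
class MetaHub:
    """Connects every source and is queried with the national registry number."""
    sources: list

    def medical_history(self, registry_number: str) -> dict[str, list[str]]:
        history = {}
        for source in self.sources:
            found = source.lookup(registry_number)
            if found:  # only list sources that actually hold something
                history[source.name] = found
        return history


# Made-up data for illustration only.
patient_id = "00.00.00-000.00"
leuven = Hub("Leuven hub", {patient_id: ["Hospital report: second opinion"]})
vitalink = GpSummaryStore("Vitalink", {patient_id: "GP summary: current medication"})
metahub = MetaHub([leuven, vitalink])
history = metahub.medical_history(patient_id)
```

The point of the design is that no source has to copy its data elsewhere: the metahub only needs a common way to ask each source about one patient.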

Denmark uses a partly decentralized system that has been successfully in place for years. In
Denmark, the five “hospital regions” have contracted with different vendors for
platforms (see Appendix 3) that enable patients, practitioners, and authorized researchers to
access all the different data warehouses of the country and get a complete view of citizens’
health history. The fact that the platforms are offered by different companies ensures a
certain amount of competition between products. Product development is guided by common
standards to ensure interoperability between software in the way that medical data is
extracted and presented. While this kind of system makes the gathering of all new patient
data far easier for public health administrations, older patient data that was mostly
kept in paper files cannot immediately be included in the new digitized system.

    Figure 2: Sundhed.dk basic architecture of the Danish citizens’ access to health data
                                    (Nøhr et al., 2017)

Other initiatives are being presented to create an interoperable EHR system on the European
level in order to facilitate transnational data exchange for research purposes. To be
successful, the systems have to enable data exchange between silos. These initiatives are still

at the project stage but could be launched in the coming years with the help of European
funds.

2.1 Gathering & storing medical data
The way data is gathered and stored is crucial, since these steps determine whether it will be possible
to transform the collected data into actionable knowledge. The amount of data available is
huge, yet only about 3% of it is used, while there is a growing unmet need
for data for research purposes and for training software. The reason is that, when no common
standards exist for handling the data, there is no interoperability between different
systems, and developing tools to extract knowledge from the data is very costly if not
impossible. This is one of the main barriers to personalized care.

As the Danish system shows, there is no need to rely on a single software
developer or a single piece of software to achieve this interoperability. Different data repositories with
different features can work together, provided that they have been built on a set of
common standards. International organizations, some of them backed by the European Commission,
have developed such standards to promote interoperability between medical
repository systems. A good example of usable standards is the FAIR data principles.

2.1.1 Storage of Medical Data: Towards FAIR Data
Recently, governmental agencies have been asking researchers to create FAIR data, a
concept promoted, amongst others, by FORCE11 (the Future of Research Communications
and e-Scholarship), “a community of scholars, librarians, archivists, publishers and research
funders that has arisen organically to help facilitate the change toward improved knowledge
creation and sharing.” (FORCE11, not mentioned). The FAIR principles provide a set of
common standards and advocate for data that is Findable, Accessible, Interoperable and
Reusable:

Findable: The data has to be easy to find for both humans and computers, with metadata6
that facilitates searching for specific datasets.

Accessible: Long-term storage is necessary so that the data can easily be accessed and/or
downloaded under well-defined license and access conditions (open access when possible),
whether at the level of the metadata or at the level of the actual data.

Interoperable: The data also has to be ready to be combined with other datasets by humans
or computers.

Reusable: Finally, the data has to be ready to be used for future research and to be further processed
using computational methods.
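
To make the four criteria more concrete, here is a minimal sketch of a metadata record and a naive check of which criteria it appears to satisfy. The field names and the mapping from fields to criteria are simplifying assumptions made for this illustration; they are not the official FAIR sub-principles, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    identifier: str = ""    # persistent identifier (Findable)
    keywords: tuple = ()    # searchable description (Findable)
    access_url: str = ""    # where the data can be retrieved (Accessible)
    license: str = ""       # explicit access and reuse conditions
    data_format: str = ""   # shared, documented format (Interoperable)
    vocabulary: str = ""    # shared coding system for values (Interoperable)
    provenance: str = ""    # how and by whom the data was produced (Reusable)


def fair_check(meta: DatasetMetadata) -> dict[str, bool]:
    """Naive completeness check: one boolean per FAIR criterion."""
    return {
        "Findable": bool(meta.identifier and meta.keywords),
        "Accessible": bool(meta.access_url and meta.license),
        "Interoperable": bool(meta.data_format and meta.vocabulary),
        "Reusable": bool(meta.license and meta.provenance),
    }


# Invented example: well described and retrievable, but no shared vocabulary.
record = DatasetMetadata(
    identifier="example-id-0001",
    keywords=("prostate cancer", "follow-up"),
    access_url="https://example.org/datasets/0001",
    license="restricted, on request",
    data_format="CSV",
    provenance="hospital registry export",
)
report = fair_check(record)
```

In this example the record would pass on every criterion except interoperability, which matches the observation that interoperability is often the missing piece.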

While other organizations have already proposed similar principles focused on
data management by humans, the FAIR principles “put specific emphasis on enhancing the ability
of machines to automatically find and use the data, in addition to supporting its reuse by
individuals.” (Wilkinson et al., 2016, p. 1).

Many past initiatives have successfully developed tools that integrate health data from
different platforms in an increasingly FAIR way, but they have often failed to demonstrate universal
utility. This is because interoperability7, although highly correlated with the success of
such tools, is rarely achieved in practice.

6
  Metadata – “Metadata is contextual information about a piece of data or a data set that is stored
alongside the data. Metadata gives consumers of data, including applications and users, greater
insight into the meaning and properties of that data.”
“Data with strong metadata can be used to search and access records that meet certain criteria. For
example, in an organization with strong data governance, metadata allows the company to quickly
find comply with data discovery or regulatory compliance requests. It can help users find information
to do their jobs more efficiently. It can lead to greater business intelligence as well as cost savings
from greater data storage efficiency.” Informatica

7
 Interoperability – “The ability of data or tools from non-cooperating resources to integrate or work
together with minimal effort.” (Wilkinson et al., 2016, p. 2)

Initiatives have been launched to define infrastructures for patient-owned medical records,
which may be a good starting point. While a German national initiative has been issued by the
Ministry of Health (Dutch Techcenter for Life Sciences, not mentioned), the Fraunhofer
Society is working on a project called the Medical Data Space (MedDS) aiming at data privacy
and data ownership, tailored to the specific needs of medical applications (Fraunhofer, not
mentioned).

2.1.2 Enabling Patients to Re-own their Data
The next step towards an international solution is to give citizens the opportunity to put their
own health data into a digital file that respects the FAIR principles. A cloud-based solution for
Electronic Health Records (EHR)8 would provide patients with both convenience and a high level of
security. Indeed, they could have a personal health locker in a digital cloud where all their
health data is stored and from which they could grant specific access to trusted parties.
Other solutions could be found in the use of blockchain technology as a way to store
patients’ health records and ensure their security. We will discuss the benefits of this
technology in a later section of this master thesis.
These potential solutions would put patients back in control of their own medical data
and enable them to contribute to medical research and advances if they
wish to do so.
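
The idea of a personal health locker with patient-controlled access can be sketched in a few lines. This is a toy in-memory model under stated assumptions, not a description of any existing product: the class `HealthLocker`, its methods and the sample parties are hypothetical, and a real system would add authentication, encryption and auditing.

```python
class HealthLocker:
    """Toy personal health locker: the owner decides who may read what."""

    def __init__(self, owner: str):
        self.owner = owner
        self._records = {}  # category -> list of entries
        self._grants = {}   # party -> set of categories that party may read

    def add_record(self, category: str, entry: str) -> None:
        self._records.setdefault(category, []).append(entry)

    def grant(self, party: str, categories) -> None:
        """Give a party read access to the listed categories only."""
        self._grants.setdefault(party, set()).update(categories)

    def revoke(self, party: str, categories=None) -> None:
        """Withdraw some categories, or all access when none are listed."""
        if categories is None:
            self._grants.pop(party, None)
        else:
            self._grants.get(party, set()).difference_update(categories)

    def read(self, party: str, category: str) -> list:
        allowed = party == self.owner or category in self._grants.get(party, set())
        if not allowed:
            raise PermissionError(f"{party} has no access to {category}")
        return list(self._records.get(category, []))


# Invented example: the patient shares lab results with a research team only.
locker = HealthLocker(owner="patient")
locker.add_record("lab_results", "PSA test result")
locker.add_record("genome", "sequencing report")
locker.grant("research_team", ["lab_results"])
shared = locker.read("research_team", "lab_results")
```

Access is granted per party and per category, so contributing one type of data to research does not expose the rest of the locker.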

2.2 Using & transforming medical data
Data can be used for many different purposes and in many different ways.
A first and essential use of patient data is to give patients and their different doctors (GPs
and specialists) immediate access to the patient’s complete medical history. In
Denmark, a platform called sundhed.dk is used by all. It provides many different services to
both doctors and patients and is a first step towards transparency and towards giving patients
8
 Electronic Health Records (EHR) – “An electronic health record (EHR) is a digital version of a patient’s
paper chart. EHRs are real-time, patient-centered records that make information available instantly
and securely to authorized users. While an EHR does contain the medical and treatment histories of
patients, an EHR system is built to go beyond standard clinical data collected in a provider’s office and
can be inclusive of a broader view of a patient’s care.” “One of the key features of an EHR is that health
information can be created and managed by authorized providers in a digital format capable of being
shared with other providers across more than one health care organization.” HealthIT.gov

more control over their data. Through this platform, Danes are able to book appointments
and to give another doctor access to their data for a second opinion on their
health status. Patients in Denmark also control which parts of their data they wish to
disclose9, which provides them with strong data disclosure security10.
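
The disclosure rules described in footnotes 9 and 10 can be summarized in a short sketch: patient-defined blocks, an emergency override that is queued for review by a board, and a notification whenever someone other than the usual caregiver views the data. The class and field names are hypothetical, and this is not how sundhed.dk is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    usual_physician: str
    data: dict[str, str]
    blocked: set[str] = field(default_factory=set)          # categories hidden by the patient
    notifications: list[str] = field(default_factory=list)  # messages sent to the patient
    review_queue: list[str] = field(default_factory=list)   # overrides awaiting board review

    def access(self, physician: str, category: str,
               emergency: bool = False, justification: str = "") -> str:
        if category in self.blocked:
            if not emergency:
                raise PermissionError(f"{category} is blocked by the patient")
            if not justification:
                raise ValueError("an emergency override needs a justification")
            # The override is allowed but must later be defended before the board.
            self.review_queue.append(f"{physician}: {category}: {justification}")
        if physician != self.usual_physician:
            # The patient is told whenever an unfamiliar physician views the data.
            self.notifications.append(f"{physician} viewed {category}")
        return self.data[category]


# Invented example data.
record = PatientRecord(
    usual_physician="Dr. A",
    data={"medication": "drug list", "psychiatry": "consultation history"},
    blocked={"psychiatry"},
)
record.access("Dr. A", "medication")  # usual caregiver: no notification
record.access("Dr. B", "psychiatry", emergency=True,
              justification="patient unconscious")
```

After these two calls, the patient has one notification (about Dr. B) and the board has one override to review.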

Making data available to patients and putting them in touch with doctors through a
platform such as sundhed.dk is thus a first step towards transparency for patients,
and it offers convenience to both patients and doctors.
Doctors, for their part, get immediate access to their patients’ complete medical history,
can easily contact their patients, and can even write online prescriptions
that are sent directly to pharmacies’ databases.

Building on these existing features and benefits, prediction models could provide added
insight to both patients and doctors when making medical decisions in which multiple factors
have to be taken into consideration. Indeed, implementing an effective EHR system is only a
first step; technology could then transform the data into actionable information for health
care providers and patients. If the medical data follows the FAIR principles, it could
be processed by software that uses machine learning to calculate the likelihood of
multiple outcomes when the patient takes different possible actions (e.g. treatments,
physical activity, diet). These predictive models can also help inform the patient about his
different treatment options and give him advice based on his personal preferences and
diagnosis, thus better preparing him before he meets his doctor and allowing for an informed
discussion during the consultation.
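
A minimal sketch of such a comparison is given below. It uses a logistic model as a stand-in for whatever machine learning method would actually be used, and every number in it is invented for illustration: the coefficients, the feature names and the patient are hypothetical and carry no medical meaning.

```python
import math

# Invented coefficients; a real model would be fitted on patient data.
INTERCEPT = -1.0
WEIGHTS = {
    "age_over_70": 0.8,
    "treatment_a": -1.2,
    "treatment_b": -0.7,
    "weekly_exercise": -0.4,
}


def outcome_probability(features: dict[str, float]) -> float:
    """Probability of an adverse outcome under a logistic model."""
    score = INTERCEPT + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def compare_actions(patient: dict[str, float],
                    actions: dict[str, dict[str, float]]) -> dict[str, float]:
    """Predicted probability for the same patient under each possible action."""
    return {
        label: outcome_probability({**patient, **action})
        for label, action in actions.items()
    }


# Hypothetical patient and options.
patient = {"age_over_70": 1.0}
actions = {
    "no treatment": {},
    "treatment A": {"treatment_a": 1.0},
    "treatment B": {"treatment_b": 1.0},
    "treatment A + exercise": {"treatment_a": 1.0, "weekly_exercise": 1.0},
}
predictions = compare_actions(patient, actions)
for label, probability in sorted(predictions.items(), key=lambda item: item[1]):
    print(f"{label}: {probability:.0%}")
```

The output is a ranked list of options with their estimated risk, which is the kind of summary a patient and doctor could discuss during the consultation.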

9
 Patients have the right to block physicians from accessing certain health-related information. This
denial of access can be overridden by doctors in an emergency where the patient is
unable to provide proper consent. However, these unusual cases are closely checked by a board, and
the doctors have to prove the need to access this data (see full interview transcript in Appendix 10).

10
  Patients are also notified every time a physician other than their usual caregiver looks at their
data, enabling them to ask for explanations if they do not recognize the physician (see full
interview transcript in Appendix 10).

2.2.1 Security Related Issues
The previously mentioned type of Electronic Health Record (EHR) is currently being developed
by companies such as Apple® with its Health Record app “letting patients use their
smartphones to download their own medical records” (The Economist, 2018b) “to allow users
to view, manage and share their medical records” (The Economist, 2018a).
At the same time, new rules on the protection of European citizens’ data will take effect
across the whole European Union in May 2018.
Most companies still struggle to comply with these new rules. The health care sector is
particularly affected because it deals with extremely sensitive data that are now highly
protected by the GDPR.
The EU General Data Protection Regulation (GDPR)
Regulation (EU) 2016/679 on the protection of personal data aims to give citizens control
over their own data, to create a highly protected environment for data inside and outside the
EU, to reduce the costs and administrative burden for businesses using data through a single
law (saving around €2.3 billion a year), and to apply a single regulation across the EU to
restore trust around data security.
Overall, this new regulation increases the safety requirements related to data collection,
storage and sharing, especially when it comes to sensitive data such as medical records. The
agreement was reached in December 2015, but the regulation only applies from May 25, 2018.

A major change brought by this new regulation is its increased territorial scope. “The
jurisdiction of the GDPR is extended since it now applies to all companies processing the
personal data of data subjects residing in the Union, regardless of the company’s location”
(Home Page of EU GDPR, not mentioned).

New data extraction methods
Some proponents of Artificial Intelligence (AI), such as the famous surgeon and entrepreneur
Laurent Alexandre, co-founder of Doctissimo, see the EU GDPR as a hindrance for European-
based health care data projects (De Greef, 2018). Indeed, by increasing the protection
of European citizens’ data in a less restrictive global environment, the EU GDPR could be a
barrier for European data processing firms to access the data they need and disadvantage
them with respect to their non-European competitors.
That being said, some of those projects might be able to use the regulation to their advantage.
New data extraction solutions such as distributed learning, which is further described in a
later section of this master thesis, can help stay a step ahead of this new EU regulation when
it comes to the creation of predictive models. Indeed, the data used to train the software in
this case never leave the firewalls of their sources (e.g. hospitals, national data repositories),
which removes the need to transfer patient-level data and thus limits the risk of data breaches.
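
As a minimal sketch of this idea, assume a simple one-variable linear model and two hypothetical sites holding made-up values (none of this comes from the thesis or from a real hospital). Each site computes aggregate sums behind its own firewall and shares only those sums; the central step fits the model without ever seeing a patient-level row. Real distributed-learning infrastructures are considerably more involved.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LocalStats:
    """Aggregates a site is willing to share; no patient-level rows."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def compute_local_stats(rows: List[Tuple[float, float]]) -> LocalStats:
    """Runs inside a site's firewall on its own (x, y) records."""
    return LocalStats(
        n=len(rows),
        sum_x=sum(x for x, _ in rows),
        sum_y=sum(y for _, y in rows),
        sum_xx=sum(x * x for x, _ in rows),
        sum_xy=sum(x * y for x, y in rows),
    )


def fit_from_stats(stats: List[LocalStats]) -> Tuple[float, float]:
    """Central step: least-squares fit of y = a + b*x from aggregates only."""
    n = sum(s.n for s in stats)
    sx = sum(s.sum_x for s in stats)
    sy = sum(s.sum_y for s in stats)
    sxx = sum(s.sum_xx for s in stats)
    sxy = sum(s.sum_xy for s in stats)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy records held by two separate sites.
site_a = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
site_b = [(4.0, 8.1), (5.0, 9.8)]

intercept, slope = fit_from_stats(
    [compute_local_stats(site_a), compute_local_stats(site_b)]
)
```

In this toy case the result is identical to a fit on the pooled records, because the least-squares solution depends only on these sums. Aggregates from very small sites can still reveal information, so this is an illustration of the principle rather than a privacy guarantee.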
Cancer & Lifestyle data
Medicines and treatments can usually only be developed for specific cancers at specific stages
of advancement, which means that we are far from “curing cancer”. This, of course, requires
huge investments of time and money, while the outcome is highly specific medication.
For cancer at large, medical research studying the medical history of monozygotic twins, who
by definition share the same genome, has shown that hereditary factors only have a minor
impact on the incidence of cancer, while the biggest source remains lifestyle and
environmental factors (Lichtenstein et al., 2000). One of these studies, researching the
co-incidence of breast cancers in identical twins, found only 20% disease concordance
(Hamilton & Mack, 2003).
Between 90 and 95% of all cancer cases are estimated to be attributable to patients’ lifestyle
and environment, while only the remaining 5-10% are attributable to deterministic
characteristics such as genes (Anand et al., 2008). This means, first, that by adopting the
right lifestyle, people can reduce the risk of being affected by chronic diseases. Even more
important, the same medical papers show that, by changing inappropriate lifestyle factors,
patients affected by cancer are able to change their prognosis or lower their risk of mortality.
First, these lifestyle actions can enhance patients’ health when they are used as supportive
actions to common treatments (e.g. surgery, radiation).
Secondly, the papers go further by demonstrating that, even without undergoing these
treatments, many patients with early prostate cancer who change their lifestyle are able to
avoid invasive and costly treatments that would negatively impact their quality of life.

These findings are of high interest for a disease such as prostate cancer, where the mortality
rate is very low, and where it is often decided to “wait and see” in the case of early stage
cancer.
A study was designed by Ornish et al. to explore the effects of lifestyle adjustments on
prostate cancer patients’ prognosis (Ornish et al., 2005). The study was conducted on patients
who had decided, with their physician, not to undergo treatment and to monitor the evolution
of their cancer regularly. The participants were randomly assigned either to the
experimental group, taking part in an intensive lifestyle program (adapted diet, moderate
aerobic exercise, stress management techniques, and adherence support sessions), or to the
control group, keeping their lifestyle close to unchanged. The study followed patients over
a 1-year period and came to the following main results.
First, 6 out of 49 patients in the control group withdrew from the study to undergo a
conventional treatment, while no member of the experimental group had to do so during the
study.
Second, the serum PSA decreased on average by 4% in the experimental group while it
increased on average by 6% in the control group. In addition, serum from the experimental
group inhibited LNCaP11 prostate cancer cell growth by 70%, compared to only 9% (almost 8
times less) in the control group, suggesting that the changes in lifestyle affected tumor
growth as well as PSA.
The participants’ disease evolution was reviewed in a second study to see if those conclusions
could be extended to a 2-year period (Frattaroli et al., 2008). It was observed that only 5% of
the patients of the experimental group had undergone a conventional treatment after two
years compared to 27% in the control group.
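
The figures reported for these two studies can be put side by side with a few lines of arithmetic. The sketch below only restates numbers quoted in the text; the variable names are illustrative and no statistical testing is attempted.

```python
# Figures as quoted in the text from Ornish et al. (2005).
control_withdrawals = 6
control_group_size = 49
inhibition_experimental_pct = 70.0  # LNCaP growth inhibition, experimental group
inhibition_control_pct = 9.0        # LNCaP growth inhibition, control group
psa_change_experimental_pct = -4.0
psa_change_control_pct = 6.0

# Share of the control group that left the study for conventional treatment.
withdrawal_share_pct = 100 * control_withdrawals / control_group_size

# How many times stronger the inhibition was in the experimental group.
inhibition_ratio = inhibition_experimental_pct / inhibition_control_pct

# Gap between the average PSA changes, in percentage points.
psa_gap_points = psa_change_control_pct - psa_change_experimental_pct

# Figures as quoted from Frattaroli et al. (2008), after two years.
treated_gap_points = 27.0 - 5.0

print(f"{withdrawal_share_pct:.1f}% of the control group withdrew")
print(f"inhibition ratio: {inhibition_ratio:.1f}x")
```

Roughly 12% of the control group withdrew within one year, and the inhibition ratio of about 7.8 matches the "almost 8 times" comparison made above.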

Both studies show the significant benefits of lifestyle change actions in the fight against
prostate cancer. These findings can be extended to most chronic diseases since a common
feature between them is the importance of environmental and lifestyle factors in their
incidence and evolution.

11
  Lymph Node Carcinoma of the Prostate (LNCaP) – “The LNCaP cell line (Lymph Node Carcinoma of
the Prostate) is derived from human prostate adenocarcinoma cells from lymph node metastasis.”
(lncap.com). The growth of those cells is representative of prostate cancer development.

In the case of patients suffering from cardiovascular diseases, statins have, for example,
been shown to have possible positive effects on certain patients’ risks. A study conducted
by Lytsy and Westerling showed that patients with different risk factors, the most rational
measure of morbidity risk, tend to have a similar perception of the benefit of their
treatment. Instead, satisfaction with the doctor’s explanation of the treatment and the
patient’s perceived health control are the strongest explanatory factors for these
perceived benefits. This means that patients do not estimate the benefits of their treatment
on a rational basis, even though this perceived benefit is considered to positively affect
the patient’s long-term compliance with his treatment (Lytsy et al., 2007).

Patients’ misconception of lifestyle risk factors explains why many of them do not adapt
these factors to prevent or cure their chronic disease. Patients with detrimental lifestyle
factors should be made more aware of their consequences, change their perception of the
benefits of treatments, and adopt new lifestyle habits to support the treatment.
At the same time, current trends towards tracking lifestyle habits (e.g. physical activity and
diet through trackers and apps) have enabled the development of effective lifestyle
monitoring hardware and software.
We thus have the tools necessary to help people realize the importance of their lifestyle for
their health and to support them in adapting it.

3. Prostate Cancer
Prostate cancer is the most common type of cancer for men (around 1 man out of 9 will be
affected during his life) (American-Cancer-Society). Although the evolution of the disease is
usually quite slow, and the mortality rate is relatively low (around 1 man out of 41) (American-
Cancer-Society) compared to other types, it is still globally responsible for the death of 1-2%
of men in the world (depending on national efforts on PSA12 screening). Prostate cancer can

12
  Prostate-Specific Antigen (PSA) – “A protein made by the prostate gland and found in the blood.
Prostate-specific antigen blood levels may be higher than normal in men who have prostate cancer,
benign prostatic hyperplasia (BPH), or infection or inflammation of the prostate gland. Also called
PSA.” NIH

spread to other parts of the patient’s body: it can metastasize, for example, to the brain,
lungs, liver and bone tissues, possibly leading to the patient’s death.

To determine the risk of progression of a prostate cancer, it is allocated to one of three
groups (low risk, intermediate or moderate risk, and high risk), based on the D’Amico risk
classification for prostate cancer. This classification, first published by D’Amico et al. in
1998, takes three factors into consideration. First, the clinical TNM stage13 takes into
consideration the size of the main tumor (determined through medical imaging), the number of
nearby lymph nodes that have cancer, and whether the cancer has metastasized or not. Second,
the Gleason score assesses the aggressiveness of the cancer based on its cellular architecture
(determined through a biopsy). Finally, the serum PSA level is determined through a blood
test. The usefulness of a PSA test as a step in the diagnosis of prostate cancer has been
controversial. Indeed, a higher-than-normal serum PSA level can have many causes, such as
prostate cancer but also prostatitis or other kinds of inflammation of the prostate.
Furthermore, the test being quite costly, it is no longer reimbursed as a screening tool for
prostate cancer in countries such as Belgium (Mertens, 2014).
After a diagnosis is made, other factors, such as the symptomatology and the variations in
PSA level, are used to follow the evolution of the disease.

                                 Low Risk               Moderate Risk                High Risk
     Primary tumor (T)             T1/2a                      T2b                      T2c/3
     PSA value (ng/ml)             ≤ 10                  > 10 and ≤ 20                 > 20
     Gleason score                  ≤ 6                        7                        ≥ 8
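
The thresholds in this table can be encoded as a small function. The sketch below assumes the usual reading of the D’Amico scheme, which the table itself does not spell out: a cancer is high risk as soon as one factor falls in the high-risk column, low risk only if all three factors are in the low-risk column, and moderate risk otherwise. The function name and the simplified stage labels are illustrative, and the code is not a clinical tool.

```python
# Simplified clinical stage labels taken from the table columns (assumption:
# sub-stages such as T1c would need to be mapped onto these labels first).
LOW_STAGES = {"T1", "T2a"}
MODERATE_STAGES = {"T2b"}
HIGH_STAGES = {"T2c", "T3"}


def damico_risk_group(t_stage: str, psa_ng_ml: float, gleason: int) -> str:
    """Return 'low', 'moderate' or 'high' using the table's thresholds."""
    if t_stage not in LOW_STAGES | MODERATE_STAGES | HIGH_STAGES:
        raise ValueError(f"unsupported clinical stage: {t_stage!r}")

    # Assumption: a single high-risk factor is enough for the high-risk group.
    if t_stage in HIGH_STAGES or psa_ng_ml > 20 or gleason >= 8:
        return "high"

    # Assumption: all three factors must be low for the low-risk group.
    if t_stage in LOW_STAGES and psa_ng_ml <= 10 and gleason <= 6:
        return "low"

    return "moderate"


# Hypothetical example: stage T2a, PSA of 15 ng/ml, Gleason score of 7.
example_group = damico_risk_group("T2a", 15.0, 7)
```

Here `example_group` is "moderate", since no factor reaches the high-risk column but two of them exceed the low-risk thresholds.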

13
  TNM staging system – “In the TNM system, each cancer is assigned a letter or number to describe
the tumor, node, and metastases: T stands for the original (primary) tumor; N stands for nodes. It tells
whether the cancer has spread to the nearby lymph nodes; M stands for metastasis. It tells whether
the cancer has spread to distant parts of the body.” American Cancer Society

The main treatment options available are active surveillance, radiotherapy and
prostatectomy (surgery), which can be combined with other measures (e.g. spacer
placement, hormone therapy, chemotherapy). The treatment decision can be influenced by
many factors such as the disease’s stage of progression, the patient’s risk factors (e.g. lifestyle,
other health issues), but also patient preferences (e.g. based on treatment side effects).
While the main treatment options have been shown to have similar cure rates, they tend
to have very different side effects. This highlights the importance of patient involvement and
education when making a treatment choice, since patients who make an active and informed
choice tend to benefit more from their treatment, something we will address when focusing
on Patient Decision Aids.
However, only certain patients can benefit, in the form of fewer side effects, from certain
procedures such as spacer placement (see definition and Figure 3 below). Such decisions
depend on many variables that are difficult for one person to consider simultaneously.
This highlights the need for predictive models supporting doctor-patient treatment choice,
something we will also detail in another chapter.

Data from prostate cancer patients could be processed to fulfill three specific aims.
First, it could help determine whether the patient should be treated or not. Not treating is,
indeed, an option chosen by many doctors, especially in the case of elderly patients.
Second, if it is decided to treat the patient, a choice of treatment has to be made, mainly
between surgery and radiotherapy.
Third, if radiotherapy is favored, the doctor, together with the patient, still has to decide
whether proton therapy and/or a spacer should be used.
Finally, we will explore whether an “App” tool could be used to collect and process the data
to follow the patient’s health status in real time.

A spacer is a device, usually in the form of a gel, that is injected between the rectum and the
prostate. The resulting space of about 1.0-1.5 cm between the two protects the rectal wall from
the high isodoses used during radiotherapy (Pinkawa, 2015). It is beneficial for certain
patients but has its own side effects.
(See explanatory video of Bio Protect: https://youtu.be/4lcrSs_4oiE)

Figure 3: T2 weighted magnetic resonance imaging without (left) and with (right) a hydrogel
spacer. Spacer hyperintense, resulting in > 1 cm separation between prostate and rectum.
(Pinkawa, 2015)

The key points to be remembered…
For diseases such as prostate cancer, where treatment decision making is based on
multiple variables in diverse formats (e.g. clinical data, medical imaging, and biopsy
analysis) and can result in multiple options (e.g. active surveillance, photon therapy,
proton therapy, surgery, spacer placement), decision making becomes highly complex.
Adding to this complexity, new factors such as the patient’s lifestyle have been shown to have
strong impacts on the incidence and evolution of these diseases.
To support patients and doctors in this process, tools can be created, but their development
has been hampered by the difficulty of accessing large amounts of standardized data. Today,
important efforts are being made internationally and new data extraction methods are
being used to facilitate this access and enable the development of a new generation of
predictive models to support doctor and patient decision making.

Part 2 – Machine learning applied to medical data: The
Virtual Patient Avatar

1. The Virtual Patient Avatar™ (VPA™)
With today’s technology, we are able to extract most of the patient’s relevant health data,
also called biomarkers14, and store them in a standardized way so that they can be processed
by computers. We are also able to identify most of the interactions between these health
factors and the way they together influence the health status of patients. These interactions
can in turn be expressed in a form that computers can interpret, which is called an ontology15.
Put together, this system can be considered as a virtual representation of an individual
patient, a dynamic digital copy of his personal health.
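
To make this structure more concrete, the following minimal sketch stores biomarkers as standardized records and the ontology as typed relationships between named factors. The class names, the "influences" relationship type and the example values are all hypothetical; they illustrate the idea only and do not describe the data model actually used for the avatar.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Biomarker:
    """One standardized health measurement."""
    name: str
    value: float
    unit: str


@dataclass
class PatientAvatar:
    """Hypothetical container: biomarkers plus an ontology linking them."""
    biomarkers: Dict[str, Biomarker] = field(default_factory=dict)
    # Ontology stored as (subject, relationship type, object) triples.
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def add_biomarker(self, marker: Biomarker) -> None:
        self.biomarkers[marker.name] = marker

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relations.append((subject, relation, obj))

    def influences_on(self, target: str) -> List[str]:
        """Names of all factors recorded as influencing the target."""
        return [s for s, r, o in self.relations if r == "influences" and o == target]


# Made-up example values, for illustration only.
avatar = PatientAvatar()
avatar.add_biomarker(Biomarker("psa", 6.5, "ng/ml"))
avatar.add_biomarker(Biomarker("gleason_score", 6, "score"))
avatar.relate("psa", "influences", "risk_group")
avatar.relate("gleason_score", "influences", "risk_group")

drivers = avatar.influences_on("risk_group")
```

Keeping the relationships separate from the measurements means the same ontology can, in principle, be reused across patients while only the biomarker values change.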

The value of such a virtual representation of a patient is that it can be virtually treated to
see how the health status of the virtual patient evolves. The system is then capable of
determining which actions are the most favorable for the patient, depending on the
importance level assigned by the patient together with his doctor to certain variables (e.g.
occurrence of a disease, survival rate, occurrence of certain side effects).
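
A minimal sketch of this ranking step is shown below, assuming that predictive models have already produced outcome probabilities for each possible action. The option names, probabilities and importance weights are made-up placeholders, not results from the thesis or from any clinical model, and a simple weighted sum is only one of many possible ways to combine them.

```python
from typing import Dict, List

# Hypothetical predicted outcome probabilities for each possible action.
predicted: Dict[str, Dict[str, float]] = {
    "option_a": {"survival": 0.95, "side_effect": 0.05},
    "option_b": {"survival": 0.97, "side_effect": 0.30},
    "option_c": {"survival": 0.97, "side_effect": 0.40},
}

# Importance levels set by the patient with his doctor (placeholders).
# A negative weight marks an outcome the patient wants to avoid.
weights: Dict[str, float] = {"survival": 1.0, "side_effect": -0.5}


def score(outcomes: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of predicted outcome probabilities."""
    return sum(weights[name] * probability for name, probability in outcomes.items())


def rank_options(
    predicted: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[str]:
    """Order the options from most to least favorable for these weights."""
    return sorted(
        predicted, key=lambda option: score(predicted[option], weights), reverse=True
    )


ranking = rank_options(predicted, weights)
```

With these placeholder numbers the option with the fewest side effects ranks first. Setting the side-effect weight to zero moves it to last place, which shows how the patient's stated priorities can change the recommendation.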

We will call this virtual construction the Virtual Patient Avatar or VPA™ in the following
chapters of this master thesis. In short it would thus be a synthetic entity of standardized
biomarkers, linked to each other through an ontology (explanatory animation developed by
Benelux Health Ventures: https://www.youtube.com/watch?v=kbdbpWhBfg0&t=2s).

14
  Biomarkers – “The term “biomarker”, a portmanteau of “biological marker”, refers to a broad
subcategory of medical signs – that is, objective indications of medical state observed from outside
the patient – which can be measured accurately and reproducibly. Medical signs stand in contrast to
medical symptoms, which are limited to those indications of health or illness perceived by patients
themselves.” (Strimbu & Tavel, 2010)

15
  Ontology – “the core meaning within computer science is a model for describing the world that
consists of a set of types, properties, and relationship types.” “There is also generally an expectation
that there be a close resemblance between the real world and the features of the model in an
ontology.” (Lars Marius, 2004, p. 384)

1.1 Key concepts
The key concepts that we will be looking at in this chapter are the main sub-elements of the
VPA™:

      Figure 4: Representation of the main building blocks of the Virtual Patient Avatar

1.1.1 Decision Support Systems (DSS)
Decision Support Systems (DSS)16 can be used in the medical environment by doctors to cope
with the increasing amount of patient data available and the increasing number and
complexity of therapeutic options in chronic disease care (see appendix 4). Clinical DSSs are,
for example, defined by Ida Sim (MD, PhD) et al. as “software that is designed to be a direct
aid to clinical decision-making, in which the characteristics of an individual patient are
matched to a computerized clinical knowledge base and patient-specific assessments or
recommendations are then presented to the clinician or the patient for a decision” (Sim et al,

16
  Decision Support System (DSS) – “A decision support system (DSS) is a computerized information
system used to support decision-making in an organization or a business. A DSS lets users sift through
and analyze massive reams of data and compile information that can be used to solve problems and
make better decisions.
The benefits of decision support systems include more informed decision-making, timely problem
solving and improved efficiency for dealing with problems with rapidly changing variables.”
Investopedia

2001).
The aim of these tools is not to replace doctors, whose knowledge, instincts and emotional
capacity are essential to making the right decisions, but to enhance the doctors who use
them by supporting their judgment with actionable data. This is reflected in studies that
compared the ability of both doctors and predictive models to predict certain patients’
outcomes based on identical information. Oberije et al. studied Radiation Oncologists’ and
predictive models’ ability to forecast treatment outcomes of lung cancer patients and found
the latter more successful at doing so. They concluded that “models, although not perfect, can
offer valuable assistance in clinical decision making. By choosing cut-off points based on the
model predictions we are able to define clinically-relevant low- or high-risk groups.” (Oberije
et al., 2014, p. 6). Another paper from Bennett and Hauser explored the added value of
bringing a non-disease-specific Artificial Intelligence (AI)17 framework to help clinicians make
better treatment decisions and states that “Combining autonomous AI with human clinicians
may serve as the most effective long-term path. Let humans do what they do well, and let
machines do what they do well. In the end, we may maximize the potential of both.” (C. C.
Bennett & Hauser, 2013, p. 2)
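
The cut-off idea in the quotation from Oberije et al. above can be illustrated in a few lines. The sketch assumes a model that outputs a probability between 0 and 1 for some adverse outcome; the patient identifiers, the predictions and the 0.40 threshold are arbitrary placeholders rather than values from that study.

```python
from typing import Dict


def assign_risk_group(predicted_risk: float, cutoff: float) -> str:
    """Label a model prediction as 'low-risk' or 'high-risk'."""
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a probability in [0, 1]")
    return "high-risk" if predicted_risk >= cutoff else "low-risk"


# Hypothetical model outputs for three anonymous patients.
predictions: Dict[str, float] = {
    "patient_1": 0.12,
    "patient_2": 0.48,
    "patient_3": 0.81,
}
CUTOFF = 0.40  # placeholder; a real cut-off would be chosen on clinical grounds

groups = {pid: assign_risk_group(risk, CUTOFF) for pid, risk in predictions.items()}
```

The choice of cut-off is a clinical decision, which is consistent with the point made above that such models support rather than replace the doctor's judgment.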
The VPA™ will include a novel type of DSS that goes even further. Indeed, some multifactorial
Decision Support Systems are being developed by players such as the D-Lab in Maastricht
(see video from Delbressine Media: https://vimeo.com/241154708). For this more advanced
type of tool, the developers seek to “integrate diverse, multimodal information (clinical,
imaging and molecular data) in a quantitative manner to provide specific clinical predictions
that accurately and robustly estimate patient outcomes as a function of the possible
decisions.” (Lambin, van Stiphout, et al., 2013, p. 1). In other words, the aim is to combine the
explanatory power of several factors to choose the most appropriate treatment or to make
the right diagnosis, which provides more precise and more accurate indications than if these
factors were considered individually. These predictive models are based on
sophisticated Artificial Intelligence (AI) algorithms that integrate large volumes of data to be
highly accurate and specific rather than using mere general population data.

17
  Artificial Intelligence (AI) – “The field of computer science dedicated to solving cognitive problems
commonly associated with human intelligence, such as learning, problem solving, and pattern
recognition.” (Amazon, 2018)

The complexity of developing those multifactorial DSSs is due to the completely different
formats of the factors they have to take into consideration (e.g. genome sequence, medical
imaging, clinical data, cost data, biological data).
By providing the right treatment for the right patient and at the right dose, multifactorial
DSSs’ potential benefits are four-fold: “Improve the quality of care, save costs and ensure
value, facilitate the reimbursement of specialized and novel treatments, and decrease risk of
medical errors while increasing protocol compliance.” (Delbressine Media, 2017)
Additionally, the DSSs that are to be integrated in the VPA™ have been externally validated
and are compliant with the TRIPOD guidelines, a “checklist of 22 items, deemed essential
for transparent reporting of a prediction model study.” (Collins, Reitsma, Altman, & Moons,
2015)

1.1.2 Individualized Patient Decision Aids (IPDA)
Adapted DSSs can also be used by a patient when he is facing a disease with multiple
treatments that will have different implications for his Quality of Life (QoL)18. These tools
are then usually called Patient Decision Aids (PDA).
There is Level 1 evidence19 that these tools have a positive impact on the patient’s
understanding of his disease and options, on his involvement in the decision process, and on
his trust in the chosen treatment. He also tends to receive a treatment that better matches his
preferences, without any increase in the length of consultations (Stacey et al., 2017).

18
  Quality of Life (QoL) – “Health-related quality of life is a multi-dimensional concept that includes
domains related to physical, mental, emotional, and social functioning. It goes beyond direct measures
of population health, life expectancy, and causes of death, and focuses on the impact health status
has on quality of life. A related concept of health-related quality of life is well-being, which assesses
the positive aspects of a person’s life, such as positive emotions and life satisfaction.”
HealthyPeople.gov

19
  Evidence Level 1 – “The levels of evidence are a ranking system used to describe the strength of the
results measured in a clinical trial or research study. The design of the study (such as a case report for
an individual patient or a randomized double-blinded controlled clinical trial) and the endpoints
measured (such as survival or quality of life) affect the strength of the evidence.” NIH
“To obtain Evidence Level 1, the study must take the form of a Randomized Controlled Trial (RCT), a
study in which patients are randomly assigned to the treatment or control group and are followed
prospectively, or a meta-analysis of randomized trials with homogeneous results.” Moore, D.

When these decision aid tools use software based on predictive models that calculate the
patient’s own risk factors (based on his own data instead of population data), as in the
VPA™, we can say that they become individualized (IPDA)20, a new generation of PDAs
enabling even more precise care. Additionally, the PDAs integrated in the VPA™ are
compliant with the International Patient Decision Aid Standards instrument (Elwyn et al., 2009).

1.1.3 App based monitoring & lifestyle actions
“Cancer is a lifestyle disease that may require lifestyle changes”. People’s environment and
lifestyle have been shown to have a tremendous influence on the incidence, evolution and
recurrence of chronic diseases. Studies conducted on a community called the Seventh-day
Adventists, which encourages its members to adopt a healthy way of living through
adequate diets, exercise and the avoidance of other risk factors such as smoking, showed that
the incidence and mortality rates of chronic diseases were significantly lower in the community
(Beeson, Mills, Phillips, Andress, & Fraser, 1989; Mills, Beeson, Phillips, & Fraser, 1989).
While, during the last decade, a focus has been put on genes to explain cancer incidence, the
lifestyle and environment of the patient are estimated to account for 90-95% of chronic
diseases (Anand et al., 2008). According to the review of Anand et al., “tobacco, diet, infection,
obesity, and other factors contribute approximately 25–30%, 30–35%, 15–20%, 10–20%, and
10–15%, respectively, to the incidence of all cancer deaths in the USA”, and “almost 90% of
patients diagnosed with lung cancer are cigarette smokers” (Anand et al., 2008). Further
indications of the importance of these lifestyle effects on cancer occurrence can be found in
the higher prevalence of cancer in Western countries and in immigrant populations compared
to other parts of the world (Lee, Demissie, Lu, & Rhoads, 2007).
While most prostate cancer treatments have considerable side-effects that will negatively
impact the Quality of Life (QoL) of patients, changes in lifestyle have very promising effects
on the disease’s incidence risk, the risk of mortality and even on the patient’s QoL.

20
  Individualised Patient Decision Aids (IPDA) – Tools that help people become more involved in
the decision-making process by providing certified medical information about their disease, the
potential treatments and the different side effects. Predictive models are included in the tool,
providing the user with individualised statistics tailored to his personal characteristics.