A Data Mining Approach for Sepsis Validation - CSCE'19 Online ...
Int'l Conf. Artificial Intelligence | ICAI'19 | 323
A Data Mining Approach for Sepsis Validation
Venkata Mayukha Cheekati, Bhargav Pingle, and Ahmad Y. Javaid
Electrical Engineering and Computer Science Department
The University of Toledo
Toledo, OH 43606, USA
{Venkatamayukha.Cheekati, Bhargav.Pingle, ahmad.javaid}@utoledo.edu
ISBN: 1-60132-501-0, CSREA Press ©

Abstract - This paper aims to assist medical doctors in decision-making by predicting the sepsis level. Sepsis is a life-threatening condition and a prevailing cause of death, which occurs when the immune system fails to contain pathogenic growth in the body. This failure leads to severe damage to the tissues of various organs, which can be fatal. Early prediction of sepsis, together with appropriate treatment, helps decrease the mortality of sepsis-affected patients. The large volumes of medical data now being generated can be analyzed to predict impending sepsis and raise an alarm. We present a methodology to predict the septic condition of a patient using data mining algorithms in the open source tool Weka. We also integrate the preprocessed data with Sequential Organ Failure Assessment (SOFA) score analysis to enhance the results; the evaluation of organ dysfunction trends improved on integration with SOFA. The analysis uses the publicly available MIMIC-III database. Various experiments are performed to obtain a better prediction model, and the process followed to improve the prediction accuracy is described in detail. The final model achieves an accuracy of about 92.34%.

Keywords: sepsis, prediction, data mining, SOFA score, MIMIC-III

1. Introduction
Sepsis is defined as a systemic response to an infection [1]. It may occur due to extensive use of broad-spectrum antibiotics, intake of agents that can suppress the immune system, etc. It occurs when the barriers to host invasion, such as the skin and the mucous membranes of the respiratory and gastrointestinal systems, fail. According to [2], sepsis may also result from insect bites that inject pathogens into the host body, and from skin burns and breaks. Sepsis has three stages: sepsis, severe sepsis, and septic shock. The severity of the condition increases with each stage. Septic shock occurs when there is a steep fall in blood pressure, which leads to damage to other organs such as the heart and the respiratory system. These phenomena are very dangerous and can be fatal. The higher the sepsis severity, the higher the mortality and morbidity. According to [3], around seven hundred and fifty thousand patients in the United States are prone to sepsis and septic shock each year. [4] reports sepsis to be the 10th leading cause of death in the United States, with mortality ranging from twenty to ninety percent for severe sepsis and septic shock. To reduce the effect of sepsis, it should be identified in its initial stages. [2] affirms that in the early stages, when mild sepsis prevails, it can be treated with vaccinations for particular pathogens. If sepsis is not detected early, however, the chances of mortality are very high and the critically ill patient is admitted to the Intensive Care Unit (ICU) for more vigorous treatment. Such organ dysfunction and urgency of treatment can be prevented with better healthcare facilities.

With advances in technology, healthcare is also changing for the better. The advent of computers, sensors, and electronic health record systems has made it easier for hospitals to store large amounts of data that can be studied and analyzed to predict the condition of a patient at a given point in time. For a critically ill patient affected by sepsis, the amount of data generated in the ICU is vast; it is large, complex, and difficult for a human to interpret in real time. Using this kind of data, a wide variety of approaches have been proposed in the literature to predict the condition of a critically ill patient admitted to the ICU; these are discussed in the literature review.

2. Related Work & Motivation
According to [5], about US$14.6 billion is spent annually to treat sepsis. Systemic Inflammatory Response Syndrome (SIRS) results if sepsis has occurred due to infection. The SIRS criteria are: heart rate ≥ 90 beats/min; temperature either ≥ 38 °C or ≤ 36 °C; respiratory rate (RR) ≥ 20 breaths/min; and white blood cell count (WBC) either ≥ 12 000 or ≤ 4000 cells/mm3 (or ≥ 10% bands). At least two of the SIRS criteria should be present for a patient to be diagnosed with sepsis. Septic shock occurs in the case of induced hypotension, i.e., systolic blood pressure ≤ 90 mm Hg or 40 mm Hg below baseline, or Mean Arterial Pressure (MAP) ≤ 70 mm Hg.

Many methods have been proposed in the literature to predict the mortality of a person in the ICU and to prevent the patient from falling ill due to sepsis. As shown in [6], [7], one of the many ways to quantify organ failure is the Sequential Organ Failure Assessment (SOFA) score, which was introduced in 1994. The SOFA score presents the number of failed organs (which is termed "sepsis") and the severity of each organ failure. The score is calculated from measurements of the respiratory system, nervous system (Glasgow coma scale), cardiovascular system, liver, renal system, and coagulation.

These values are considered over 24-hour cycles of the ICU stay. The SOFA score assigns a value from 0 to 4 to each of the systems mentioned above. To obtain the SOFA score, the condition should be treated as a continuous process rather than an event, i.e., the assessment should be based on a time scale instead of an abrupt yes/no event. The various ranges of SOFA scores reveal the anomalies of an organ or system, which makes the score suitable for Contextual Anomaly Detection. The higher the SOFA score, the higher the chance of mortality. The initial, highest, and mean SOFA scores correlate well with mortality: initial and highest scores of more than 11, or mean scores of more than 5, correspond to mortality of more than 80%.

Apart from SOFA, quick SOFA (qSOFA) was introduced in February 2016. It takes fewer parameters and gives only a rough idea of whether a patient's condition is good or bad. The score uses only blood pressure (0/1), respiration rate (0/1), and Glasgow coma scale (0/1) [8]. It does not evaluate the parameters continuously but rather gives an index of zero or one for each. Since there are only 3 parameters, the score ranges from 0 to 3. If qSOFA ≥ 2, the mortality rate is very high. qSOFA can be performed in emergencies because it is fast and because the values of these parameters are usually extreme in emergencies. qSOFA serves as a screening mechanism to prompt the clinician to investigate further for sepsis or to transfer the patient to a higher level of care.

Besides SOFA and qSOFA, other scores have been proposed, such as the Modified Early Warning Score (MEWS) [9], Simplified Acute Physiology Score (SAPS II) [10], VitalPAC Early Warning Score (ViEWS) [11], Simple Clinical Score [12], Mortality in Emergency Department Sepsis (MEDS) [13], Acute Physiology and Chronic Health Evaluation II (APACHE II) [14], and Rapid Emergency Medicine Score [15]. [16] proposed a targeted real-time early warning score (TREWScore), which has a specificity of 67% with 85% sensitivity. In September 2016, [17] developed a new machine learning technique named "InSight", which differs from SOFA, qSOFA, and other sepsis prediction scores such as MEWS: it shows and analyses the trends of the patient's organ failure data instead of just classifying them. With this method, instead of using fixed parameters for evaluation, one can include any other parameter if needed. The training and testing process for the InSight technique has 4 stages: data partitioning, feature construction, classifier training, and classifier testing.

The InSight approach is superior to scores like SOFA and qSOFA. The experiments in [17] also show InSight to be relatively resistant to performance loss from reduced measurement availability; it performs better even in input dropout cases. Although it cannot be computed manually like conventional scores, it is a very good alternative to other scores in hospital environments where EHR use is improving and increasing. InSight is a dynamic, active learning approach that is trained on older data to predict better. This can help health-care providers assess the sepsis condition almost accurately and minimize potential harm to a patient.

[18] developed and described, in January 2017, a more efficient tool to predict patient stability called Auto Triage. The tool is reported to have an accuracy of 92.9% with 94.5% specificity, indicating its highly sensitive nature towards the data. The focus of our work is mainly on manipulating the data obtained from the ICU for better prediction. These scoring systems assist the clinician in estimating the damage that has occurred and augment the ability to serve the sepsis patient better. From the literature review, it is evident that prediction accuracy has improved considerably with machine learning techniques like TREWScore, InSight, and Auto Triage, but there has been no significant research on data mining techniques. The data mining study in [19] on ICU patient deterioration prediction suggested the features that most affect accurate prediction. With the help of those attributes, further research is carried out on improving the accuracy of sepsis prediction.

The forthcoming sections describe the process followed to improve the accuracy of the model. Section 3 describes the dataset used and the attributes selected to predict sepsis. The classification algorithms are described in Section 4. The experiments carried out to achieve a higher prediction accuracy are explained in Section 5. Section 6 discusses the results obtained from these experiments.

3. Data collection & pre-processing
Our research is based on the Medical Information Mart for Intensive Care III (MIMIC-III) database. The contributions of [20], [21] made it freely available, with the details of over forty thousand de-identified patients who were treated between 2001 and 2012 in the ICUs of Beth Israel Deaconess Medical Center. This work is done with the help of grants from the National Institute of Biomedical Imaging and Bioengineering (NIBIB) and the National Institutes of Health (NIH). MIMIC-III contains a diverse range of high temporal resolution data, which includes lab results, bedside monitor trends and waveforms, and electronic documentation.
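Since the labelling in Section 3.1 relies on the SIRS rule quoted in Section 2, a minimal sketch of that rule may help before the tables are introduced. It is written in Python, which Section 5 names as one of the tools used for dataset manipulation. The thresholds are the ones stated in Section 2; the `Vitals` record, its field names, the assumption that SIRS is evaluated once per hourly observation, and the `zero_hour` helper are illustrative assumptions and not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Vitals:
    """One set of observations; field names are hypothetical, not MIMIC-III columns."""
    heart_rate: float                  # beats/min
    temperature_c: float               # degrees Celsius
    resp_rate: float                   # breaths/min
    wbc: float                         # white blood cells per mm^3
    bands_pct: Optional[float] = None  # percentage of band forms, if measured


def sirs_count(v: Vitals) -> int:
    """Count how many SIRS criteria are met, using the thresholds quoted in Section 2."""
    criteria = [
        v.heart_rate >= 90,
        v.temperature_c >= 38 or v.temperature_c <= 36,
        v.resp_rate >= 20,
        v.wbc >= 12000 or v.wbc <= 4000
        or (v.bands_pct is not None and v.bands_pct >= 10),
    ]
    return sum(criteria)


def meets_sirs(v: Vitals) -> bool:
    """At least two of the four criteria must be present."""
    return sirs_count(v) >= 2


def zero_hour(hourly_sirs: Sequence[bool], window: int = 5) -> Optional[int]:
    """Index of the first hour that starts `window` consecutive SIRS-positive hours.

    Assumes one SIRS flag per hour; returns None if no such run exists.
    """
    run = 0
    for i, flag in enumerate(hourly_sirs):
        run = run + 1 if flag else 0
        if run == window:
            return i - window + 1
    return None
```

Under these assumptions, a patient with heart rate 95 and temperature 38.5 °C but normal respiratory rate and WBC meets two criteria, and `zero_hour` marks the start of the first persistent 5-hour SIRS run that Section 3.1 uses as the zero hour.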
In our work, we mainly concentrate on a few tables: admissions, which has the patient age, primary diagnosis results, admit time, and discharge/death time (if any); d_labitems, which has the names and definitions of all laboratory items that can be measured in the ICU; d_items, which has the definitions of items in the ICU databases; and chartevents, which holds the value of each measured item and the time at which the measurement was recorded. With the help of these tables, an approach is built to predict the condition of a sepsis patient. As described by MIMIC-III, we accessed the database through a PostgreSQL server.

3.1 Gold standard
The study in [22] is helpful in identifying patients with sepsis by the ICD9 code 995.9. After the patients were selected for inclusion in the retrospective dataset, each underwent a binary classification to designate them as positive or negative for having acquired in-hospital sepsis. The classification was based on the patient meeting both of the following criteria: (1) the patient record contains an ICD9 code (995.9) indicating in-hospital contraction of sepsis; (2) the patient meets the 1991 Systemic Inflammatory Response Syndrome (SIRS) criteria for sepsis for a persistent 5-hour period. The beginning of the patient's first 5-hour SIRS event is defined as the zero hour.

3.2 Attribute Selection
The d_items and d_labitems tables hold an exhaustive list of the measurements taken in the ICU, but considering all the attributes gives lower accuracy. [19] proposed using feature selection, so only a limited set of attributes, called test attributes, is selected by feature selection. The feature selection is carried out in Weka, an open source data mining tool introduced in [23]. Weka is a collection of data mining and machine learning algorithms for preprocessing, classification, and regression analysis of data. The tool can also be extended with custom-designed algorithms implemented in Java.

Following [24] for feature selection, we use supervised.attribute.InfoGainAttributeEval. This filter is a wrapper for the Weka class that computes the information gain with respect to the class.
• Attribute Subset Evaluator: InfoGainAttributeEval
• Search Method: Ranker
• Evaluation mode: evaluate all training data

The attributes selected for prediction are:
From d_labitems:
1) Urea Nitrogen, ItemId = 50851
2) Creatinine, ItemId = 50912
3) INR, ItemId = 51237
4) Partial Thromboplastin Time (PTT), ItemId = 51275
5) Prothrombin Time (PT), ItemId = 51274
6) Glucose, ItemId = 51478
From d_items:
1) Partial Pressure of Oxygen (PaO2), ItemId = 490
2) Fraction of Inhaled Oxygen (FiO2), ItemId = 3420
3) Platelets, ItemId = 225170
4) Bilirubin, ItemId = 225690
5) Glasgow coma score Total, ItemId = 19
6) Heart Rate, ItemId = 211
7) Mean Arterial Pressure (MAP), ItemId = 438
8) Blood Pressure, ItemId = 52
9) Temperature, ItemId = 223761

With the help of the above-defined fifteen attributes and the proposed gold standard, datasets are created, and prediction and validation tests are carried out for various cases using Weka.

4. Classification algorithms
The classification algorithms from the Weka collection that are used in our approach are described below.
1) ZeroR: According to [25], this is a simple classification algorithm that does not consider any of the predictors. It only predicts the majority category. Although ZeroR does not base its predictions on the predictors, it serves as a baseline/benchmark against which to compare the efficiency of other classification algorithms.
2) Naive Bayesian: [25] explains that this classifier is built on Bayes' theorem of probability with the assumption of independent predictors. A Bayesian model is easy to build because of this independence of attributes, which eases the classification of very large datasets.
3) J48: As studied by [26], this algorithm is a decision tree based on the Iterative Dichotomiser 3 (ID3) algorithm. The decision tree consists of decision nodes and leaves. When the algorithm is applied, the dataset is broken down into smaller and smaller subsets, and associations are developed depth-wise. Information gain and entropy are the driving factors of this algorithm. Entropy is the degree of heterogeneity in the data, and information gain is based on the decrease in entropy after a split in the tree.
4) Random Tree: According to [27], a Random Tree's predictions depend on random vectors that are sampled independently. K attributes are chosen at random, several splits are made in the tree based on the selected K attributes, and the best split is chosen.
5) Sequential minimal optimization (SMO): [28] describes SMO as a way of solving the Support Vector Machine (SVM) training problem that is more efficient than a conventional quadratic programming solver. By default, normalization is done by this algorithm. This algorithm uses the SVM output; converts them
into probabilities based on its heuristics and applies a standard sigmoid function to optimize before classifying the data.

5. Experiments
Using the tables from the MIMIC-III dataset, a shortened table is built with the test attributes proposed in Section 3.2. Attributes like the patient ID, patient age, gender, ICU admit time, ICU discharge time or death time (if the patient died), and the status of the patient (dead/alive) are also incorporated into the final shortened table. Several experiments are conducted by building the datasets in various ways. The dataset manipulations presented below are carried out using PostgreSQL and Python.

Experiment 1: The first dataset was built by taking the average of all the test results of each distinct attribute from d_items and d_labitems for each patient, i.e., each patient comprises one row with the patient details mentioned above and the attributes taken from d_items and d_labitems, as shown in Table 1. If an attribute is not present in the person's record, it is flagged zero. If the person is dead, the "Dead?" column is flagged as 1; if the person is alive, it is set to 0.

Experiment 2: Instead of taking the entire set of attributes from the MIMIC-III dataset, only the test attributes mentioned in Section 3.2 are used to build the dataset. Cutting down the features employed in training an algorithm is called dimensionality reduction. The study in [29] stated that this technique improves accuracy greatly. A patient still has a single row as in Table 1, but with only 10 test attributes.

Experiment 3: Instead of taking the overall average of each attribute, the dataset in this experiment was built by binning the average of the test attributes for every hour from the admit time to the discharge/death time. So, in this dataset, a person has as many rows as the number of hours spent in the hospital.

Experiment 4: Another kind of dataset is constructed by binning the test attributes for each hour when the difference between the attribute values at t_i and t_{i+1} is negligible, but binning the average of all the test attributes over a 5-minute span when the difference between the attribute values at t_i and t_{i+1} is significant and the change affects the patient's health. This experiment is based on the hypothesis that the model might understand the data and predict better when such significant changes are considered.

Experiment 5: In this experiment, we use the SOFA score analysis attributes for dataset construction. The attributes are:
1) Partial pressure of Oxygen / Fraction of Inhaled Oxygen (PaO2 / FiO2)
2) Platelet Count
3) Bilirubin
4) Glasgow coma scale
5) Creatinine
6) Hypotension
All the above attributes are available in MIMIC-III except hypotension, which is calculated from Mean Arterial Pressure (MAP). If MAP is greater than 1 and the person is on vasopressors, then hypotension is calculated from Dopamine, Dobutamine, Epinephrine, and Norepinephrine [30]. The dataset is built by assigning SOFA scores to the above-mentioned attributes based on an hourly binned average. So, all the columns are now filled with SOFA scores instead of the actual values of the dataset.

Experiment 6: This experiment applies SOFA score analysis to the type of dataset used in experiment 2. SOFA score attributes are added to the experiment 2 dataset, which holds the averages of the fifteen test attributes. Some of the SOFA score attributes already exist among the test attributes. All attributes that can be given SOFA scores are assigned the scores, and the remaining test attributes keep their regular values. In this dataset, the SOFA scores are assigned to attributes of the averaged data, which does not have any hourly or 5-min bins.

Experiment 7: In this experiment, SOFA scores are applied to the experiment 4 dataset in the same fashion as in experiment 6, but this dataset has hourly and 5-min bins. Also, the SOFA score column is added as an additional attribute alongside the other attributes. In this kind of dataset, we can clearly see how SOFA scores vary just before the patient's death due to sepsis.

6. Results
The datasets constructed in the above experiments are evaluated in the data mining tool Weka using the classifiers described in Section 4. The accuracy and other metrics of each experiment for each classifier are presented in Table 2, Figure 1, Figure 2, and Figure 3. All seven experiments are carried out with 10-fold cross-validation and the default settings of each classifier algorithm.

Table 1: Sample dataset
Patient ID  Avg(Heart rate)  Avg(Attribute 2)  Avg(Attribute 3)  ....  Avg(Attribute 1000)  Dead?
10185       81               10                1.2               ....  0                    1
10185       78               11                1.9               ....  0                    1
10124       73               15                0.5               ....  0.4                  0

Table 2: Experiment wise accuracy (%)
Algorithm    Experiment 1  Experiment 2  Experiment 3  Experiment 4  Experiment 5  Experiment 6  Experiment 7
ZeroR        71.36         71.36         74.85         73.43         74.85         71.36         73.43
J48          72.31         76.78         81.67         77.86         87.60         77.18         92.34
Random Tree  73.53         76.53         77.43         76.40         79.17         75.63         87.95
SMO          71.95         72.12         73.59         75.79         81.52         74.68         82.59
Naive Bayes  56.93         67.91         75.39         75.14         82.19         70.12         86.13

Fig. 1: Experiment wise accuracy
Fig. 2: Experiment wise precision

For experiment 1, the highest accuracy obtained is only 73.53%, from the Random Tree algorithm. Naive Bayes classification fell even below the baseline. Due to the consideration of all the existing features from both d_items and d_labitems,
the classifier algorithms couldn't predict the status of a patient based on all the available attributes, and hence the accuracy is low. The results of [31], which show that Random Trees handle larger data better, align with this result.

For experiment 2, there is a small improvement in accuracy to 76.78%, obtained with the J48 decision tree algorithm. This improvement is due to the smaller number of features used for prediction, which reduced the number of classifications to be done compared with the previous experiment, where a very large number of classifications happen due to the higher number of attributes.

The prediction accuracy for experiment 3 improved because of the hourly binning of data. This enables the system to understand the data of each attribute over a time span of one hour and to classify the data in a much better way. The highest accuracy, 81.67%, is obtained with the J48 classifier. For experiment 4, we expected that the accuracy might improve because the algorithm can learn better from different instances. On the contrary, the accuracy decreased. The highest accuracy obtained in experiment 4 is 77.86%, with the J48 classifier. The decrease in accuracy shows that the system was unable to understand the patterns and achieve better classification with unequal intervals of time (i.e., some data covers a one-hour time span and some data covers a 5-min time span).

With the dataset in experiment 5, the accuracy improved significantly. On integrating the SOFA score with the hourly average dataset, only for certain attributes, the accuracy rose to 87.60%, achieved by the J48 algorithm. The accuracy improved because it is easier for the algorithm to classify a set of 0's, 1's, 2's, 3's and 4's than numerous real values. The accuracy of experiment 6 fell below that of experiment 5. Its highest accuracy is 77.18%, which is lower than the highest accuracy of experiment 5 but higher than that of experiment 2. Experiment 2 does not have SOFA score analysis and so has lower accuracy, whereas experiment 5 has both hourly bins and SOFA score analysis and therefore has higher accuracy. So, when predicting whether a patient is dead or alive, this model gives a prediction that is about 87 percent accurate.

The model trained on the experiment 7 dataset improved significantly, reaching an accuracy of 92.34% with the J48 algorithm. This was a little unexpected, as the accuracy of experiment 4 had decreased. The accuracy improved for the same reason as before: the values are classified in a range of 0 to 4.

Fig. 3: Experiment wise recall

7. Conclusion and future work
In this paper, an improved model of sepsis prediction using data mining is presented. All the experiments that were conducted and the accuracies of each experiment are discussed. An accuracy of 92.34% can be obtained using this model. This higher accuracy is obtained by employing methods like dimensionality reduction, data binning with respect to time, and integration of the SOFA score. [19] reports that the accuracy achieved using similar classifier algorithms in Weka on the MIMIC II dataset is 77.58%. Our result can be considered an improvement over that study.

Future work includes improving the accuracy of the early prediction of sepsis by developing a model using data mining with machine learning. Data mining helps in pre-processing the data, while machine learning techniques like neural networks help in advance prediction. The final goal is to develop a Sepsis Assessment Module. This module, developed during this project, should assist healthcare providers in identifying patients at risk of sepsis. This will provide a potential mechanism for earlier therapeutic interventions that may avoid or mitigate patient harm.

References
[1] Roger C Bone, Robert A Balk, Frank B Cerra, R Phillip Dellinger, Alan M Fein, William A Knaus, Roland MH Schein, and William J Sibbald. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Chest, 101(6):1644–1655, 1992.
[2] Andrew Lever and Iain Mackenzie. Sepsis: definition, epidemiology, and diagnosis. BMJ, 335(7625):879–883, 2007.
[3] Lioudmila V Karnatovskaia and Emir Festic. Sepsis: A review for the neurohospitalist. The Neurohospitalist, 2(4):144–153, 2012.
[4] Mani Chopra and Avadhesh C Sharma. Distinct cardiodynamic and molecular characteristics during early and late stages of sepsis-induced myocardial dysfunction. Life sciences, 81(4):306–316, 2007.
[5] Eren Gultepe, Jeffrey P Green, Hien Nguyen, Jason Adams, Timothy Albertson, and Ilias Tagkopoulos. From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. Journal of the American Medical Informatics Association, 21(2):315–325, 2013.
[6] Alan E Jones, Stephen Trzeciak, and Jeffrey A Kline. The sequential organ failure assessment score for predicting outcome in patients with severe sepsis and evidence of hypoperfusion at the time of emergency department presentation. Critical care medicine, 37(5):1649, 2009.
[7] J-L Vincent, Rui Moreno, Jukka Takala, Sheila Willatts, Arnaldo De Mendonça, Hajo Bruining, CK Reinhart, Peter M Suter, and LG Thijs. The sofa (sepsis-related organ failure assessment) score to describe organ dysfunction/failure, 1996.
[8] Eamon P Raith, Andrew A Udy, Michael Bailey, Steven McGloughlin, Christopher MacIsaac, Rinaldo Bellomo, and David V Pilcher. Prognostic accuracy of the sofa score, sirs criteria, and qsofa score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. Jama, 317(3):290–300, 2017.
[9] VC Burch, G Tarr, and C Morroni. Modified early warning score predicts the need for hospital admission and inhospital mortality. Emergency Medicine Journal, 25(10):674–678, 2008.
[10] Jean-Roger Le Gall, Stanley Lemeshow, and Fabienne Saulnier. A new simplified acute physiology score (saps ii) based on a european/north american multicenter study. Jama, 270(24):2957–2963, 1993.
[11] David R Prytherch, Gary B Smith, Paul E Schmidt, and Peter I Featherstone. Views - towards a national early warning score for detecting adult inpatient deterioration. Resuscitation, 81(8):932–937, 2010.
[12] J Kellett and B Deane. The simple clinical score predicts mortality for 30 days after admission to an acute medical unit. Journal of the Association of Physicians, 99(11):771–781, 2006.
[13] Nathan I Shapiro, Richard E Wolfe, Richard B Moore, Eric Smith, Elizabeth Burdick, and David W Bates. Mortality in emergency department sepsis (meds) score: a prospectively derived and validated clinical prediction rule. Critical care medicine, 31(3):670–675, 2003.
[14] William A Knaus, Jack E Zimmerman, Douglas P Wagner, Elizabeth A Draper, and Diane E Lawrence. Apache-acute physiology and chronic health evaluation: a physiologically based classification system. Critical care medicine, 9(8):591–597, 1981.
[15] Thomas Olsson and Lars Lind. Comparison of the rapid emergency medicine score and apache ii in nonsurgical emergency department patients. Academic emergency medicine, 10(10):1040–1048, 2003.
[16] Katharine E Henry, David N Hager, Peter J Pronovost, and Suchi Saria. A targeted realtime early warning score (trewscore) for septic shock. Science translational medicine, 7(299):299ra122–299ra122, 2015.
[17] Thomas Desautels, Jacob Calvert, Jana Hoffman, Melissa Jay, Yaniv Kerem, Lisa Shieh, David Shimabukuro, Uli Chettipally, Mitchell D Feldman, Chris Barton, et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach. JMIR medical informatics, 4(3), 2016.
[18] Jacob S Calvert, Daniel A Price, Christopher W Barton, Uli K Chettipally, and Ritankar Das. Discharge recommendation based on a novel technique of homeostatic analysis. Journal of the American Medical Informatics Association, 24(1):24–29, 2016.
[19] Noura Alnuaimi, Mohammad M Masud, and Farhan Mohammed. Icu patient deterioration prediction: a data-mining approach. arXiv preprint arXiv:1511.06910, 2015.
[20] Alistair EW Johnson, Tom J Pollard, Lu Shen, H Lehman Li-wei, Mengling Feng, Mohammad Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G Mark. Mimic-iii, a freely accessible critical care database. Scientific data, 3:160035, 2016.
[21] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. Physiobank, physiotoolkit, and physionet. Circulation, 101(23):e215–e220, 2000.
[22] C Bouza, T Lopez-Cuadrado, and JM Amate-Blanco. Use of explicit icd9-cm codes to identify adult severe sepsis: impacts on epidemiological estimates. Critical Care, 20(1):313, 2016.
[23] Geoffrey Holmes, Andrew Donkin, and Ian H Witten. Weka: A machine learning workbench. In Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on, pages 357–361. IEEE, 1994.
[24] Lei Yu and Huan Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th international conference on machine learning (ICML-03), pages 856–863, 2003.
[25] Pedro Domingos and Michael Pazzani. On the optimality of the simple bayesian classifier under zero-one loss. Machine learning, 29(2-3):103–130, 1997.
[26] Neeraj Bhargava, Girja Sharma, Ritu Bhargava, and Manish Mathuria. Decision tree analysis on j48 algorithm for data mining. Proceedings of International Journal of Advanced Research in Computer Science and Software Engineering, 3(6), 2013.
[27] Chris Thornton, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Auto-weka: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 847–855. ACM, 2013.
[28] Lei Chen, Chen Chu, and Kaiyan Feng. Predicting the types of metabolic pathway of compounds using molecular fragments and sequential minimal optimization. Combinatorial Chemistry High Throughput Screening, 19(2):136–143, 2016.
[29] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Application of dimensionality reduction in recommender system-a case study. Technical report, Minnesota Univ Minneapolis Dept of Computer Science, 2000.
[30] Flavio Lopes Ferreira, Daliana Peres Bota, Annette Bross, Christian Mélot, and Jean-Louis Vincent. Serial evaluation of the sofa score to predict outcome in critically ill patients. Jama, 286(14):1754–1758, 2001.
[31] João Maroco, Dina Silva, Ana Rodrigues, Manuela Guerreiro, Isabel Santana, and Alexandre de Mendonça. Data mining methods in the prediction of dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests. BMC research notes, 4(1):299, 2011.