Document Type: Original Research

Authors

1 PhD, Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

2 MD, Hematopoietic Stem Cell Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran

3 PhD, Department of Computer Science, Sama Technical and Vocational Training College, Tehran Branch (Tehran), Islamic Azad University (IAU), Tehran, Iran

4 PhD, Hematopoietic Stem Cell Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran

10.31661/jbpe.v0i0.2012-1244

Abstract

Background: Acute graft-versus-host disease (aGvHD) is a complex and often multisystem disease that causes morbidity and mortality in 35%-50% of patients receiving allogeneic hematopoietic stem cell transplantation (AHSCT).
Objective: This study aimed to implement a Clinical Decision Support System (CDSS) for predicting aGvHD following AHSCT on the transplantation day.
Material and Methods: In this developmental study, the data of 182 patients with 31 attributes each, who were referred to Taleghani Hospital, Tehran, Iran, during 2009–2017, were analyzed with four machine learning (ML) algorithms: XGBClassifier, HistGradientBoostingClassifier, AdaBoostClassifier, and RandomForestClassifier. The algorithms were evaluated by accuracy, sensitivity, and specificity. A CDSS was then implemented using the developed ML model, and its performance was evaluated by Cohen’s kappa coefficient.
Results: Of the 31 included variables, albumin, uric acid, C-reactive protein, donor age, platelet, lactate dehydrogenase, and hemoglobin were identified as the most important predictors. The XGBClassifier and HistGradientBoostingClassifier algorithms, with an average accuracy of 90.70%, sensitivity of 92.5%, and specificity of 89.13%, were selected as the most appropriate ML models for predicting aGvHD. The agreement between the CDSS prediction and the patient outcome was 92%.
Conclusion: ML methods can reliably predict the likelihood of aGvHD at the time of transplantation. These methods can help limit the number of risk factors to those that have significant effects on the outcome. However, their performance depends heavily on selecting the appropriate methods and algorithms. Future generations of CDSSs may increasingly rely on ML approaches.

Keywords

Introduction

Acute graft-versus-host disease (aGvHD) is a complex and often multisystem disease that causes morbidity and mortality in 35%-50% of patients receiving allogeneic hematopoietic stem cell transplantation (AHSCT) [ 1 ]. Within the first 100 days after transplantation, donor T cells invade host tissues, leading to dysfunction of the skin, gastrointestinal tract, and liver [ 1 - 4 ]. Because the disease becomes apparent only once severe tissue damage has occurred, its diagnosis is often late [ 5 ].

In recent years, biomarkers related to aGvHD have been investigated as tools for predicting its occurrence [ 5 ]. However, the multiplicity of these biomarkers and the complexity of the various factors contributing to the disease have made accurate and quick decisions difficult. Moreover, in previous studies [ 6 - 8 ], these biomarkers were analyzed univariately using classical statistical methods [ 9 - 11 ].

Since the 1960s, medical informatics experts have been interested in using clinical decision support systems (CDSSs) to classify patient outcomes, reduce health-care costs, alert physicians to potentially dangerous medication interactions, improve physicians’ diagnostic processes, provide diagnostic suggestions, and increase the safety and quality of patient care [ 12 - 15 ].

CDSS is defined as “a computer system designed to impact clinician decision-making about individual patients at the point in time that these decisions are made” [ 13 ]. CDSSs are divided into two categories: knowledge-based and non-knowledge-based [ 13 , 14 ]. In the knowledge-based type, the goal is to build a system that can simulate human thinking. These CDSSs encode knowledge as a rule or a set of if-then rules, typically derived from clinical practice guidelines (CPGs). In contrast, non-knowledge-based CDSSs use machine learning (ML) algorithms to extract knowledge [ 13 ].

Machine learning (ML) is a subset of artificial intelligence (AI) in which the algorithms that perform the prediction extract the necessary knowledge from past experience and/or find patterns in data [ 16 - 18 ]. ML is any process in which an algorithm is improved, or “trained”, by repeatedly processing a training dataset to perform a task, usually classification or identification [ 16 , 19 ]. The trained algorithm can then be evaluated by measuring its performance on a test dataset [ 17 , 19 , 20 ].

There are several learning methods in ML, of which supervised learning is one of the most widely used. The goal of a supervised learning algorithm is to use a labeled dataset to produce a model that takes a feature vector x as input and outputs information from which the label of this feature vector can be deduced [ 20 ].

The two major types of supervised learning are classification and regression. Examples of classification algorithms are ensemble methods, K-nearest neighbors, support vector machines, decision trees, random forests, and neural networks. Examples of regression algorithms are linear regression and logistic regression, although logistic regression, despite its name, is generally used for classification [ 17 , 19 - 21 ].

Ensemble learning is an ML approach in which multiple models, often trained with the same learning algorithm, are combined [ 22 ]. The two main types of ensemble algorithms are boosting and bagging. Ensemble methods include algorithms such as the eXtreme Gradient Boosting classifier (XGBClassifier), the AdaBoost classifier (AdaBoostClassifier), the Histogram-based Gradient Boosting Classification Tree (HistGradientBoostingClassifier), and the Random Forest classifier (RandomForestClassifier) [ 20 ].

XGBClassifier is a highly adaptable algorithm that works well for most classification tasks. Boosting is a method that seeks to create a strong classifier from weak classifiers; the terms weak and strong refer to how closely a model’s outputs correlate with the actual class. By adding classifiers iteratively, each new classifier can correct the errors of the previous ones. This process is repeated until the training dataset is predicted accurately [ 23 ].
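The iterative error-correcting idea behind boosting can be illustrated with a minimal sketch of gradient boosting under squared-error loss, in which each new one-split tree (a "stump") is fitted to the residual errors of the ensemble built so far. This is a simplification under stated assumptions, not XGBClassifier’s actual algorithm, which optimizes a regularized logistic loss with deeper trees; the function names `fit_stump` and `boost` and the one-feature toy data are hypothetical.

```python
# Minimal sketch of boosting by residual fitting (gradient boosting with
# squared-error loss). NOT XGBoost's algorithm: XGBoost uses a regularized
# logistic objective, second-order gradients, and deeper trees.

def fit_stump(xs, residuals):
    """Fit a one-split regression tree on a 1-D feature by least squares."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # a split needs samples on both sides
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Return a scoring function built from n_rounds stumps."""
    base = sum(ys) / len(ys)  # initial prediction: the mean label
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        # each new stump is fitted to the errors left by the current ensemble
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]  # toy labels: 1 = event, 0 = no event
score = boost(xs, ys)
print([int(score(x) >= 0.5) for x in xs])  # → [0, 0, 0, 1, 1, 1]
```

Each round halves the remaining error on this toy data, so the scores approach 0 and 1 as rounds are added; the learning rate trades convergence speed for robustness.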

The HistGradientBoostingClassifier has support for missing values. During training, the tree grower learns at each split point whether samples with missing values should go to the left or right child, based on the potential gain [ 19 ].
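A rough sketch of how the missing-value direction could be chosen at one candidate split: the summed gradients and Hessians of samples with a missing value are tried in each child, and the side with the larger second-order gain G²/(H + λ) (up to constant factors) is kept. The helper names, the toy gradients, and the `l2` regularization parameter are illustrative assumptions, not scikit-learn internals.

```python
# Hypothetical illustration (not scikit-learn internals) of choosing the child
# that receives samples with a missing feature value at one candidate split.

def leaf_score(g, h, l2=0.0):
    """Second-order gain term G^2 / (H + l2) for one node."""
    return g * g / (h + l2) if h + l2 > 0 else 0.0


def missing_value_direction(values, grads, hess, threshold, l2=0.0):
    """values may contain None; returns ('left' or 'right', gain)."""
    g_left = h_left = g_right = h_right = g_miss = h_miss = 0.0
    for v, g, h in zip(values, grads, hess):
        if v is None:
            g_miss, h_miss = g_miss + g, h_miss + h
        elif v <= threshold:
            g_left, h_left = g_left + g, h_left + h
        else:
            g_right, h_right = g_right + g, h_right + h
    parent = leaf_score(g_left + g_right + g_miss, h_left + h_right + h_miss, l2)
    # try sending the missing-value samples to each side in turn
    gain_if_left = (leaf_score(g_left + g_miss, h_left + h_miss, l2)
                    + leaf_score(g_right, h_right, l2) - parent)
    gain_if_right = (leaf_score(g_left, h_left, l2)
                     + leaf_score(g_right + g_miss, h_right + h_miss, l2) - parent)
    if gain_if_left >= gain_if_right:
        return "left", gain_if_left
    return "right", gain_if_right


# Toy data: the two samples with a missing value have the same gradient sign
# as the samples below the threshold, so they are routed to the left child.
values = [1.0, 2.0, 8.0, 9.0, None, None]
grads = [-1.0, -1.0, 1.0, 1.0, -1.0, -1.0]
hess = [1.0] * 6
direction, gain = missing_value_direction(values, grads, hess, threshold=5.0)
print(direction, round(gain, 3))  # → left 5.333
```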

The AdaBoost classifier (AdaBoostClassifier) is one of the most popular algorithms for building robust classifiers as linear combinations of member classifiers. In each training iteration, a member classifier is chosen to minimize the weighted error [ 24 ].

RandomForestClassifier combines several randomized decision trees and aggregates their predictions by averaging. It has shown excellent performance in settings where the number of variables is much greater than the number of observations [ 25 ].
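The AdaBoost re-weighting loop can be sketched in plain Python with one-feature decision stumps as the member classifiers and labels coded as +1/-1. This is a minimal illustration of discrete AdaBoost under those assumptions, not scikit-learn’s AdaBoostClassifier implementation; the function names and the toy data are hypothetical.

```python
import math


def train_stump(xs, ys, w):
    """Weighted decision stump on a 1-D feature: returns (error, threshold, polarity).
    The stump predicts `polarity` when x > threshold and -polarity otherwise."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w)
                      if (polarity if x > t else -polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best


def adaboost(xs, ys, n_rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}."""
    n = len(xs)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []                          # (alpha, threshold, polarity) triples
    for _ in range(n_rounds):
        err, t, pol = train_stump(xs, ys, w)
        if err >= 0.5:
            break                          # no better than chance: stop
        err = max(err, 1e-10)              # avoid log/division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, t, pol))
        # up-weight misclassified samples, down-weight correct ones
        w = [wi * math.exp(-alpha * y * (pol if x > t else -pol))
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]

    def predict(x):
        score = sum(a * (p if x > th else -p) for a, th, p in ensemble)
        return 1 if score >= 0 else -1

    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # no single stump can reproduce this pattern
one_stump = adaboost(xs, ys, n_rounds=1)
boosted = adaboost(xs, ys, n_rounds=3)
print([one_stump(x) for x in xs])   # → [1, 1, -1, -1, -1, -1]
print([boosted(x) for x in xs])     # → [1, 1, -1, -1, 1, 1]
```

A single stump misclassifies two of the six toy points, whereas three re-weighted rounds combine into a linear combination of stumps that labels all six correctly.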
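The bagging-and-averaging idea can be sketched with a toy forest of one-split trees, each trained on a bootstrap sample using one randomly chosen feature, whose predicted probabilities are averaged. This is a simplification under stated assumptions (real random forests grow deep trees and draw a random feature subset at every split); all names and data are hypothetical.

```python
import random


def fit_stump(rows, labels, feature):
    """One-split tree on a single feature. Leaves store the fraction of
    positive labels, so each tree outputs a probability-like soft vote."""
    values = sorted(set(row[feature] for row in rows))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2                  # candidate threshold between values
        left = [y for row, y in zip(rows, labels) if row[feature] <= t]
        right = [y for row, y in zip(rows, labels) if row[feature] > t]
        # misclassifications if each leaf predicted its majority label
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best is None or errors < best[0]:
            best = (errors, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:                       # one distinct value: constant tree
        p = sum(labels) / len(labels)
        return lambda row: p
    _, t, p_left, p_right = best
    return lambda row: p_left if row[feature] <= t else p_right


def random_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        feature = rng.randrange(n_features)          # random feature choice
        trees.append(fit_stump([rows[i] for i in idx],
                               [labels[i] for i in idx], feature))
    # forest output: average of the trees' predicted probabilities
    return lambda row: sum(tree(row) for tree in trees) / len(trees)


rows = [(1, 10), (2, 20), (3, 30), (7, 70), (8, 80), (9, 90)]
labels = [0, 0, 0, 1, 1, 1]
forest = random_forest(rows, labels)
print([round(forest(row), 2) for row in rows])
```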

Predicting aGvHD before it occurs helps physicians identify high-risk patients and reduce health-care costs through timely treatment interventions [ 26 ].

Previous studies have used neural networks, support vector machines, naive Bayes, K-nearest neighbors, regression, decision trees, and ensemble methods to predict aGvHD [ 27 ]. Although decision trees and ensemble methods have received more attention in recent years for predicting aGvHD, there is no evidence that these algorithms have been used successfully in clinical settings [ 28 - 30 ]. Therefore, this study aimed to design, implement, and validate a clinical decision support system using ensemble methods to predict aGvHD following AHSCT on the day of transplantation.

Material and Methods

Data Source, Study Roadmap, and Tools

In this developmental study, 31 variables [ 27 ] that could potentially affect the transplantation outcome, classified into two groups (baseline and biomarker), were collected on the day of transplantation from 190 patients who received AHSCT at Taleghani Hospital, Tehran, Iran, from 2009 to 2017. The CDSS was then designed and implemented in the Python programming language in four stages (pre-processing, learning, evaluation, and CDSS implementation), as shown in the roadmap diagram (Figure 1).

Figure 1. Roadmap for building clinical decision support systems based on machine learning.

Pre-processing

Imputing missing value

In this phase, missing values in the raw data were handled in two steps:

1- Records and variables with more than 50% missing values were excluded from the dataset.

2- The remaining missing values of continuous and discrete variables were replaced within each class with the mean and the mode, respectively.
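The two imputation steps above can be sketched with records stored as dictionaries in which `None` marks a missing value. The sketch assumes that variables are screened before records, that "class" means the aGvHD outcome, and that the outcome is stored under a hypothetical key `agvhd`; none of these details is stated in the text, and the variable names and values are illustrative.

```python
from statistics import mean, mode


def drop_sparse(records, features, max_missing=0.5):
    """Step 1: drop variables, then records, with more than max_missing missing."""
    kept = [f for f in features
            if sum(r[f] is None for r in records) / len(records) <= max_missing]
    rows = [r for r in records
            if sum(r[f] is None for f in kept) / len(kept) <= max_missing]
    return rows, kept


def impute_by_class(records, continuous, discrete, label="agvhd"):
    """Step 2: fill gaps with the class mean (continuous) or class mode (discrete)."""
    out = [dict(r) for r in records]          # do not modify the input records
    for cls in {r[label] for r in out}:
        members = [r for r in out if r[label] == cls]
        for f in continuous + discrete:
            observed = [r[f] for r in members if r[f] is not None]
            if not observed:
                continue                      # nothing to impute from
            fill = mean(observed) if f in continuous else mode(observed)
            for r in members:
                if r[f] is None:
                    r[f] = fill
    return out


records = [
    {"albumin": 3.0, "smoking": "no", "crp": None, "agvhd": 1},
    {"albumin": None, "smoking": "no", "crp": None, "agvhd": 1},
    {"albumin": 5.0, "smoking": None, "crp": 12.0, "agvhd": 1},
    {"albumin": 3.5, "smoking": "yes", "crp": None, "agvhd": 0},
    {"albumin": None, "smoking": "yes", "crp": None, "agvhd": 0},
]
rows, kept = drop_sparse(records, ["albumin", "smoking", "crp"])
filled = impute_by_class(rows, continuous=["albumin"], discrete=["smoking"])
print(kept)                                # → ['albumin', 'smoking']
print([r["albumin"] for r in filled])      # → [3.0, 4.0, 5.0, 3.5, 3.5]
print(filled[2]["smoking"])                # → no
```

Here `crp` is dropped because 80% of its values are missing, and each remaining gap is filled from patients of the same outcome class only.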

Under-sampling

Under-sampling methods balance the class distribution of an imbalanced dataset by decreasing the number of majority-class records [ 31 ]. An imbalanced dataset has one or more classes with few samples (the minority classes) and one or more classes with many samples (the majority classes). In this study, the RandomUnderSampler method was used to decrease the number of majority-class records [ 32 ]. RandomUnderSampler is a fast and simple method that balances the patient dataset by randomly choosing a subset of records from the targeted classes.
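A minimal stand-alone sketch of random under-sampling, assuming each record carries its class under a hypothetical key `agvhd`. In practice, the imbalanced-learn package provides `RandomUnderSampler`; this standard-library version only illustrates the idea. With the class sizes reported in the Results (71 cases vs. 111 controls), it yields 142 records, 71 per class.

```python
import random
from collections import defaultdict


def random_under_sample(records, label="agvhd", seed=0):
    """Randomly keep, from every class, as many records as the smallest class
    has (sampling without replacement), then shuffle the result."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in records:
        by_class[r[label]].append(r)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Class sizes from the Results: 71 aGvHD cases and 111 controls.
records = ([{"id": i, "agvhd": 1} for i in range(71)]
           + [{"id": 71 + i, "agvhd": 0} for i in range(111)])
balanced = random_under_sample(records)
print(len(balanced), sum(r["agvhd"] for r in balanced))   # → 142 71
```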

Data Splitting

In this phase, the patient dataset was divided into training (70%) and testing (30%) sets.
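The 70/30 split can be sketched as a shuffle followed by holding out the rounded-up 30% of records; rounding the test share up, as scikit-learn’s `train_test_split` does for a fractional `test_size`, turns the 142 balanced records reported in the Results into 99 training and 43 test records. Whether the original split was stratified is not stated, so this sketch assumes a plain random split; the function name and seed are hypothetical.

```python
import math
import random


def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and hold out ceil(test_fraction * n) for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


train, test = split_train_test(list(range(142)))
print(len(train), len(test))   # → 99 43
```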

Feature Scaling

In this phase, the training and test datasets were scaled separately using min-max normalization (Equation 1) [ 32 ], which maps the numerical values of each variable to the range between zero and one.

1) $$X_{\text{normalized}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
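Equation 1 can be written as a short function pair: one step computes X_min and X_max for a variable, and another applies the formula. The text states that the training and test sets were scaled separately, which corresponds to fitting each set on its own; passing the training set’s minimum and maximum to the transform instead is a common alternative, in which case unseen test values can fall slightly outside [0, 1]. Function names and values are illustrative.

```python
def min_max_fit(column):
    """Return (X_min, X_max) for one numeric variable."""
    return min(column), max(column)


def min_max_transform(column, x_min, x_max):
    """Apply Equation 1: (X - X_min) / (X_max - X_min)."""
    span = x_max - x_min
    if span == 0:
        return [0.0 for _ in column]   # constant variable: map every value to 0
    return [(x - x_min) / span for x in column]


train_platelets = [20, 60, 100]        # hypothetical values
x_min, x_max = min_max_fit(train_platelets)
print(min_max_transform(train_platelets, x_min, x_max))   # → [0.0, 0.5, 1.0]
print(min_max_transform([120], x_min, x_max))            # → [1.25]
```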

Feature Selection

In this phase, the Boruta algorithm, a wrapper feature-selection method, was used to select the most important predictors of aGvHD. Built around a random forest classifier (RandomForestClassifier), this method identifies the important features of a dataset in an unbiased and stable manner [ 33 ].
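The core of Boruta can be sketched as follows: each feature is compared with "shadow" features, i.e., shuffled copies that have no real link to the outcome, and a feature scores a hit whenever its importance exceeds the best shadow importance. To keep the sketch self-contained, it swaps in absolute Pearson correlation as the importance measure instead of random forest importance, and it only counts hits rather than applying Boruta’s statistical test for confirming or rejecting features; names and data are hypothetical.

```python
import random


def abs_corr(xs, ys):
    """Absolute Pearson correlation (stand-in importance measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx and vy else 0.0


def boruta_hits(columns, labels, importance=abs_corr, n_iter=20, seed=0):
    """columns maps feature name -> values. Counts how often each real feature's
    importance beats the best shadow (shuffled) feature importance."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        shadow_best = 0.0
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)            # breaks any link with the labels
            shadow_best = max(shadow_best, importance(shadow, labels))
        for name, values in columns.items():
            if importance(values, labels) > shadow_best:
                hits[name] += 1
    return hits


labels = [i % 2 for i in range(20)]
columns = {
    "signal": [2.0 * y + 1.0 for y in labels],   # perfectly tied to the label
    "noise": [float(i % 3) for i in range(20)],  # almost unrelated to the label
}
hits = boruta_hits(columns, labels)
print(hits)
```

The informative feature beats the shuffled copies in essentially every iteration, while the noise feature rarely does; Boruta turns such hit counts into confirm/reject decisions.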

Learning

Hyperparameters are parameters that govern the learning process but are not themselves learned during training, and they have a great impact on the performance and results of ML models [ 34 ]. Tuning these hyperparameters is treated as an optimization problem, commonly solved with search methods such as randomized parameter optimization with k-fold cross-validation (RandomizedSearchCV) [ 19 , 35 ]. In the present study, RandomizedSearchCV was used to optimize the hyperparameters of four ML algorithms: XGBClassifier, HistGradientBoostingClassifier, AdaBoostClassifier, and RandomForestClassifier.
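The mechanics of randomized search with k-fold cross-validation can be sketched without any library: sample a hyperparameter setting, average its validation score over k folds, and keep the best. The `fit`/`score` callbacks, the toy threshold "model", and the contiguous (unshuffled) folds are hypothetical simplifications; scikit-learn’s `RandomizedSearchCV` (with parameters such as `param_distributions`, `n_iter`, and `cv`) performs the same kind of search on real estimators.

```python
import random


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of nearly equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def random_search_cv(fit, score, X, y, param_space, n_iter=10, k=5, seed=0):
    """fit(params, X_train, y_train) -> model; score(model, X_val, y_val) -> float.
    Samples n_iter settings from param_space and keeps the best mean CV score."""
    rng = random.Random(seed)
    folds = k_fold_indices(len(X), k)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(options) for name, options in param_space.items()}
        fold_scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            model = fit(params, [X[i] for i in train], [y[i] for i in train])
            fold_scores.append(score(model, [X[i] for i in fold], [y[i] for i in fold]))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy "model" whose only hyperparameter is a decision threshold.
def fit_threshold(params, X_train, y_train):
    return params["threshold"]          # nothing is learned from the data here


def accuracy(threshold, X_val, y_val):
    return sum((x > threshold) == bool(t) for x, t in zip(X_val, y_val)) / len(y_val)


X = list(range(10))
y = [0] * 5 + [1] * 5
best, cv_score = random_search_cv(fit_threshold, accuracy, X, y,
                                  {"threshold": [1.5, 4.5, 8.5]}, n_iter=50)
print(best, cv_score)
```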

Evaluation

After training the ML models, their performance was evaluated using accuracy, sensitivity, specificity, F-measure, and the area under the ROC curve (AUC) (Equations 2 to 5) [ 36 ].

2) $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

3) $$\text{Sensitivity (TPR)} = \frac{TP}{TP + FN}$$

4) $$\text{Specificity (TNR)} = \frac{TN}{TN + FP}$$

5) $$\text{F-measure} = \frac{2 \times TP}{2 \times TP + FP + FN}$$

A receiver operating characteristic (ROC) curve plots the false positive rate (FPR) on the x-axis against the true positive rate (TPR) on the y-axis, depicting the relative trade-off between true positives (TP) and false positives (FP) [ 36 ].

Here, true positive (TP) is the number of patients with aGvHD correctly predicted to have aGvHD; true negative (TN) is the number of patients without aGvHD correctly predicted not to have aGvHD; false positive (FP) is the number of patients without aGvHD incorrectly predicted to have aGvHD; and false negative (FN) is the number of patients with aGvHD incorrectly predicted not to have aGvHD [ 36 ].
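Equations 2 to 5 can be computed directly from confusion-matrix counts. For hard 0/1 predictions, the ROC curve has a single operating point, so its area reduces to the mean of sensitivity and specificity; the AUC values in Table 4 are consistent with this. The usage example assumes counts back-calculated from Table 4: with 43 test patients, a sensitivity of 95% is only possible with 20 aGvHD cases, leaving 23 non-aGvHD cases, so XGBClassifier would have TP = 19, FN = 1, TN = 20, and FP = 3. These counts are inferred, not reported in the text, and the function name is hypothetical.

```python
def evaluate(tp, tn, fp, fn):
    """Equations 2-5 plus AUC for hard predictions, as percentages (2 decimals)."""
    sensitivity = tp / (tp + fn)                       # Equation 3 (TPR)
    specificity = tn / (tn + fp)                       # Equation 4 (TNR)
    metrics = {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Equation 2
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_measure": 2 * tp / (2 * tp + fp + fn),      # Equation 5
        # single ROC operating point: AUC = (TPR + TNR) / 2
        "auc": (sensitivity + specificity) / 2,
    }
    metrics = {name: round(100 * value, 2) for name, value in metrics.items()}
    metrics["mean"] = round(sum(metrics.values()) / 5, 2)   # "Mean" column
    return metrics


# Counts inferred (not reported) from Table 4 for a 43-patient test set.
print(evaluate(tp=19, tn=20, fp=3, fn=1))   # XGBClassifier row
print(evaluate(tp=18, tn=21, fp=2, fn=2))   # HistGradientBoostingClassifier row
```

The first call reproduces the XGBClassifier row of Table 4 (90.70, 95.00, 86.96, 90.48, 90.98; mean 90.82), and the second reproduces the HistGradientBoostingClassifier row.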

CDSS

After selecting the most appropriate ML models, a CDSS was designed and implemented using the Python programming language and the MySQL database management system (Figure 1, Part D).

The performance of the CDSS was then evaluated by calculating the agreement between the CDSS prediction and the actual patient outcome 100 days after transplantation, using Cohen’s kappa coefficient and the transplant data of 30 patients who received AHSCT in 2018 [ 37 ].
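Cohen’s kappa corrects the observed proportion of agreement p_o for the agreement p_e expected by chance, κ = (p_o − p_e)/(1 − p_e). A minimal sketch comparing CDSS predictions with observed outcomes, returning both quantities; the 0/1 label coding, the function name, and the four-patient example are hypothetical.

```python
from collections import Counter


def cohens_kappa(predicted, observed):
    """Return (p_o, kappa) for two equally long label sequences."""
    n = len(predicted)
    p_o = sum(p == o for p, o in zip(predicted, observed)) / n   # raw agreement
    pred_counts, obs_counts = Counter(predicted), Counter(observed)
    # chance agreement: product of marginal proportions, summed over labels
    p_e = sum(pred_counts[c] * obs_counts[c]
              for c in set(predicted) | set(observed)) / (n * n)
    if p_e == 1:
        return p_o, float("nan")   # kappa is undefined when both raters are constant
    return p_o, (p_o - p_e) / (1 - p_e)


# Hypothetical example: 1 = aGvHD within 100 days, 0 = no aGvHD.
cdss_prediction = [1, 1, 0, 0]
actual_outcome = [1, 0, 0, 0]
print(cohens_kappa(cdss_prediction, actual_outcome))   # → (0.75, 0.5)
```

In this toy example, three of four predictions agree with the outcome (p_o = 0.75), but half of that agreement is expected by chance (p_e = 0.5), giving κ = 0.5.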

Results

a. Patient Characteristics

Table 1 presents the 31 dataset variables considered for predicting aGvHD.

| Type | Row | Variable | Description | Role |
|---|---|---|---|---|
| Baseline | 1 | Patient gender | | Input |
| | 2 | Donor gender | | Input |
| | 3 | Donor-patient gender | | Input |
| | 4 | Patient blood group | | Input |
| | 5 | Donor blood group | | Input |
| | 6 | Delivery | The process of giving birth (for the donor) | Input |
| | 7 | Marital status | | Input |
| | 8 | Smoking | | Input |
| | 9 | Blood group compatibility | Donor and recipient have the same blood group antigens and plasma antibodies | Input |
| | 10 | Donor-recipient relationship | The relationship between donor and patient, including related and sibling | Input |
| | 11 | Patient age | | Input |
| | 12 | Donor age | | Input |
| Biomarker | 1 | Prophylaxis regimen | Regimen used for the prevention of a specific disease | Input |
| | 2 | Chemotherapy regimen | Regimens 1-3: myeloablative, an intensive conditioning regimen to destroy the bone marrow cells. Regimen 4: reduced-intensity conditioning, which uses less chemotherapy and radiation than Regimens 1-3 | Input |
| | 3 | Diagnosis | | Input |
| | 4 | Complete remission | Tests, physical exams, and scans show that all signs of cancer are gone | Input |
| | 5 | Radiotherapy pre-bone marrow transplantation | The treatment of disease with ionizing radiation | Input |
| | 6 | White blood cells | | Input |
| | 7 | Platelet count | | Input |
| | 8 | Lactate dehydrogenase (LDH) | | Input |
| | 9 | Cluster of differentiation 3 (CD3) | | Input |
| | 10 | Cluster of differentiation 34 (CD34) | The CD34 antigen was identified on a myeloid leukemia cell line | Input |
| | 11 | Mononuclear cells (MNC) | | Input |
| | 12 | Diagnosis to transplantation | The time between disease diagnosis and hematopoietic stem cell transplantation | Input |
| | 13 | Patient body mass index | | Input |
| | 14 | Donor body mass index | | Input |
| | 15 | Hemoglobin | | Input |
| | 16 | Creatinine | | Input |
| | 17 | Uric acid | | Input |
| | 18 | Albumin | | Input |
| | 19 | C-reactive protein (CRP) | | Input |
| | | Acute graft-versus-host disease (aGvHD) | | Target |

Table 1. The dataset variables and their descriptions.

b. Pre-processing

After discarding incomplete patient records, the dataset was reduced to 182 patients (71 cases diagnosed with aGvHD vs. 111 controls who did not experience aGvHD). After under-sampling, the final number of scaled (normalized) patient records was 142 (71 cases vs. 71 controls), of which 70% (99 patients) were assigned to the training dataset and 30% (43 patients) to the testing dataset.

Feature selection showed that, of the 31 included variables, albumin, uric acid, C-reactive protein, donor age, platelet, lactate dehydrogenase, and hemoglobin were the seven most important predictors of aGvHD (Table 2); albumin had the highest importance.

| Feature | Importance |
|---|---|
| Albumin | 0.409 |
| Uric acid | 0.151 |
| C-reactive protein | 0.148 |
| Donor age | 0.085 |
| Platelet | 0.081 |
| Lactate dehydrogenase | 0.071 |
| Hemoglobin | 0.055 |

Table 2. The most important predictors of acute graft-versus-host disease

c. Predictive performance

The results of tuning the hyperparameters of ML algorithms are presented in Table 3.

| Classifier | Best F-measure (%) |
|---|---|
| XGBClassifier* | 94 |
| HistGradientBoostingClassifier | 90 |
| AdaBoostClassifier | 90 |
| RandomForestClassifier | 95 |

*XGBClassifier: eXtreme Gradient Boosting classifier

Table 3. Results of optimized hyperparameters of machine learning algorithms

The evaluation results of the ML models on the test dataset are shown in Figure 2 and Table 4. Based on the evaluation criteria (accuracy, sensitivity, specificity, F-measure, and AUC), the XGBClassifier (eXtreme Gradient Boosting classifier) and HistGradientBoostingClassifier algorithms had the best performance. Of the two, XGBClassifier had the highest mean of the evaluation criteria (90.82) and the highest sensitivity (95.00%), that is, the fewest false negatives (Table 4).

Figure 2. Classification reports and receiver operating characteristic (ROC) curves with the area under the curve (AUC) of the machine learning models.

| Row | Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) | F-measure (%) | AUC (%) | Mean (%) |
|---|---|---|---|---|---|---|---|
| 1 | XGBClassifier | 90.70 | 95.00 | 86.96 | 90.48 | 90.98 | 90.82 |
| 2 | HistGradientBoostingClassifier | 90.70 | 90.00 | 91.30 | 90.00 | 90.65 | 90.53 |
| - | Average of 1 and 2 | 90.70 | 92.50 | 89.13 | 90.24 | 90.82 | - |
| 3 | AdaBoostClassifier | 86.05 | 75.00 | 95.65 | 83.33 | 85.33 | 85.07 |
| 4 | RandomForestClassifier | 83.72 | 80.00 | 86.96 | 82.05 | 83.48 | 83.24 |

AUC: area under the curve; XGBClassifier: eXtreme Gradient Boosting classifier

Table 4. Results of performance evaluation of machine learning models

d. CDSS

Using the developed ML model, a CDSS was designed and implemented; it is accessible at https://agprcdss.ir/ (Figure 3). The agreement between the CDSS prediction and the actual outcome within 100 days after AHSCT was 92%.

Figure 3. Graphical user interface of the clinical decision support system.

Discussion

In this study, we designed and implemented AGPRC (Acute GvHD Prediction Transplant Day CDSS) for predicting the likelihood of aGvHD on the transplantation day. The following subsections discuss the most important aGvHD predictors, the selected ML classification models, and the performance evaluation of the CDSS.

I. The most important predictors for aGvHD

Biomarkers play a key role in predicting aGvHD, as they help oncologists identify patients at higher risk of aGvHD and select appropriate pre- and post-transplantation care plans for them. In this study, seven variables were identified as the most important factors associated with aGvHD on the transplantation day: albumin, uric acid, CRP, donor age, platelet, LDH, and hemoglobin.

In our study, the relative importance of albumin in predicting aGvHD was about 41%. Similarly, previous studies have emphasized the importance of the albumin level for predicting aGvHD [ 38 - 40 ], and low albumin levels independently affect the overall mortality of aGvHD patients [ 41 ].

The second predictor of aGvHD was uric acid, with an importance of 15.1%. In previous studies, this variable has been cited as a strong immunological risk signal, and low levels, especially on the day of transplantation, increase the likelihood of aGvHD [ 42 , 43 ].

The third predictor was CRP, with an importance of 14.8%. High CRP levels increase the risk of aGvHD, especially grades II to IV, as well as the risk of asymptomatic death, and are associated with decreased overall survival [ 44 - 48 ].

The fourth predictor was donor age, with an importance of 8.5%. In studies related to AHSCT, donor age is considered an important predictor [ 49 ], and it is one that can be obtained easily and at no cost.

The fifth predictor was platelet count, with an importance of 8.1%. Previous studies have emphasized that, to reduce the likelihood of aGvHD in AHSCT patients, it is essential to maintain platelet counts above 10,000/mm³ [ 50 ].

The sixth predictor was LDH, with an importance of 7.1%. Studies have shown that low LDH levels combined with high serum cyclosporine levels reduce the likelihood of aGvHD [ 51 ], and some previous studies have presented this variable as an important predictor [ 10 , 52 ].

The seventh predictor was hemoglobin, with an importance of 5.5%. Previous studies have emphasized that, to reduce the likelihood of aGvHD in AHSCT patients, it is essential to maintain a hemoglobin level above 8 to 9 g/dL; therefore, these patients routinely receive red blood cell and platelet transfusions [ 50 ].

According to our literature review, each previous study focused on the importance of a single marker for the diagnosis of aGvHD. To our knowledge, the impact of combining these significant factors on aGvHD prediction is therefore reported here for the first time.

II. Selected machine learning models

After identifying the most important predictors, the models were built by optimizing the hyperparameters of each algorithm. Based on accuracy, sensitivity, specificity, AUC, F-measure, and the average of these criteria, the XGBClassifier and HistGradientBoostingClassifier models had the best performance.

In previous studies, ML algorithms have been mainly used in laboratory settings, not in clinical practice. In the present study, the selected and tuned ML models were used as the inference engine of a CDSS to predict aGvHD in the transplantation unit of the target hospital.

In 2015, Cocho et al. [ 53 ] used different ML algorithms, namely support vector machine (SVM), shrinkage discriminant analysis (SDA), and K-nearest neighbors (KNN), without hyperparameter tuning, to diagnose aGvHD from gene expression data. The reported sensitivity, specificity, and AUC were 100%, 92.9%, and 99.5% for SVM; 92.9%, 92.9%, and 95.9% for SDA; and 92.9%, 92.9%, and 92.9% for KNN, respectively. Although these ML models performed very well, the study has three main limitations: 1) it aimed to diagnose aGvHD based only on gene expression data; 2) the models were not tested in a clinical setting; and 3) the system cannot predict aGvHD before transplantation, because the data were measured after transplantation.

In 2019, Arai et al. [ 28 ] conducted a study entitled “Predicting aGvHD following AHSCT using an ML algorithm” using ADTree without hyperparameter optimization. The reported AUC was 61.6% for grades 2-4 aGvHD and 62.3% for grades 3-4. This study aimed to develop ML models to accurately predict grades 2 to 4 aGvHD; however, the performance of their models was poor. In contrast, the selected ML models in our study achieved an accuracy of 90.70% and AUC values above 90%, indicating much better overall performance than in the study by Arai et al.

In 2018, Lee et al. [ 29 ] conducted a study entitled “Predicting the absolute risk of aGvHD following AHSCT” using ensemble methods without optimizing the hyperparameters of the employed algorithms. The reported AUC ranged from 61.3% to 64% for these ensemble models. Although the models in the present study are also ensembles, their performance was much better than in the study by Lee et al., which may be attributable to hyperparameter tuning.

In 2020, Tang et al. [ 54 ] conducted a study entitled “Predicting aGvHD using Machine Learning and Longitudinal Vital Signs Data from Electronic Health Records” using logistic regression without hyperparameter optimization. The reported AUC for grades 2-4 aGvHD was 65.9%. This study, like several others [ 28 , 29 , 53 ], was designed to diagnose aGvHD after transplantation. Compared with the ML models of the present study, the model proposed by Tang et al. performs worse and has not been used in a clinical setting.

Comparing the performance evaluation criteria of XGBClassifier and HistGradientBoostingClassifier with those of the ML models presented in previous studies [ 28 , 29 , 53 , 54 ], using these models in a CDSS to predict aGvHD appears useful and effective for adjusting the care plans of patients receiving AHSCT. We therefore designed and developed a CDSS and applied it in the transplantation unit of the target hospital to predict aGvHD on the day of transplantation.

III. CDSS performance evaluation

In terms of developing aGvHD, there was 92% agreement between the CDSS prediction and the actual patient outcome measured 100 days after AHSCT.

Given that the mean evaluation score of the ML models used in this CDSS was about 91%, the CDSS appears to have acceptable performance.

Conclusion

The current results and previous research suggest that training a model on a combination of the most significant features achieves better performance than building a separate model for each important feature.

In this study, seven variables were identified as the most important factors associated with aGvHD on the transplantation day: albumin, uric acid, CRP, donor age, platelet, LDH, and hemoglobin. Ensemble machine learning methods can reliably predict the likelihood of aGvHD at the time of transplantation, and they can help limit the number of risk factors to those that have significant effects on the outcome. However, their performance depends heavily on selecting the appropriate methods and algorithms. Future studies should focus on determining the most appropriate aGvHD predictive models.

References

1. Gopalakrishnan R, Jagasia M. Pathophysiology and Management of Graft-Versus-Host Disease. In: Hematopoietic Cell Transplantation for Malignant Conditions. 2019;301-19.
2. Ferrara J L, Levine J E, Reddy P, Holler E. Graft-versus-host disease. Lancet. 2009; 373(9674):1550-61.
3. Blazar B R, Murphy W J, Abedi M. Advances in graft-versus-host disease biology and therapy. Nat Rev Immunol. 2012; 12(6):443.
4. Ball L, Egeler R M. Acute GvHD: pathogenesis and classification. Bone Marrow Transplant. 2008; 41(S2):S58.
5. Ali A M, DiPersio J F, Schroeder M A. The Role of Biomarkers in the Diagnosis and Risk Stratification of Acute Graft-versus-Host Disease: A Systematic Review. Biol Blood Marrow Transplant. 2016; 22(9):1552-64.
6. Betts B, Anasetti C, Pidala J. Biomarkers for GVHD prognosis. Lancet Haematol. 2015; 2(1):e4-5.
7. Berger M, Signorino E, Muraro M, Quarello P, Biasin E, Nesi F, et al. Monitoring of TNFR1, IL-2Rα, HGF, CCL8, IL-8 and IL-12p70 following HSCT and their role as GVHD biomarkers in paediatric patients. Bone Marrow Transplant. 2013; 48(9):1230.
8. Gimondi S, Dugo M, Vendramin A, Bermema A, et al. Circulating miRNA panel for prediction of acute graft-versus-host disease in lymphoma patients undergoing matched unrelated hematopoietic stem cell transplantation. Exp Hematol. 2016; 44(7):624-34.
9. Levine J E, Braun T M, Harris A C, Holler E, et al. Blood and Marrow Transplant Clinical Trials Network. A prognostic score for acute graft-versus-host disease based on biomarkers: a multicentre study. Lancet Haematol. 2015; 2(1):e21-9.
10. Paczesny S, Krijanovski O I, Braun T M, Choi S W, et al. A biomarker panel for acute graft-versus-host disease. Blood. 2009; 113(2):273-8.
11. Levine J E, Logan B R, Wu J, Alousi A M, et al. Acute graft-versus-host disease biomarkers measured during therapy can predict treatment outcomes: a Blood and Marrow Transplant Clinical Trials Network study. Blood. 2012; 119(16):3854-60.
12. Shortliffe E H. Biomedical Informatics: the Science and the pragmatics. In: Biomedical informatics. 2014;3-37.
13. Berner E S. Clinical decision support systems. Springer: New York; 2007.
14. Alther M, Reddy C K. Clinical Decision Support Systems. 2015.
15. El-Sappagh S H, El-Masri S. A distributed clinical decision support system architecture. Journal of King Saud University - Computer and Information Sciences. 2014; 26(1):69-78.
16. Raschka S, Mirjalili V. Python machine learning. Packt Publishing Ltd: Birmingham, UK; 2017.
17. Géron A. Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. 1st Edition. 2017.
18. Asadi F, Salehnasab C, Ajori L. Supervised Algorithms of Machine Learning for the Prediction of Cervical Cancer. J Biomed Phys Eng. 2020; 10(4):513.
19. Géron A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. 2nd Edition. 2019.
20. Alpaydin E. Introduction to machine learning. 2014.
21. Rezaianzadeh A, Dastoorpoor M, Sanaei M, et al. Predictors of length of stay in the coronary care unit in patient with acute coronary syndrome based on data mining methods. Clinical Epidemiology and Global Health. 2020; 8(2):383-8.
22. Kumar A, Jain M. Ensemble Learning for AI Developers. Apress: Berkeley, CA; 2020.
23. Sang V, Yano S, Kondo T. On-Body Sensor Positions Hierarchical Classification. Sensors. 2018; 18(11):3612.
24. An. A new diverse AdaBoost classifier. International Conference on Artificial Intelligence and Computational Intelligence; Sanya, China: IEEE; 2010.
25. Biau G, Scornet E. A random forest guided tour. Test. 2016; 25(2):197-227.
26. Obermeyer Z, Emanuel E J. Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med. 2016; 375(13):1216.
27. Salehnasab C, Hajifathali A, Asadi F, et al. Machine Learning Classification Algorithms to Predict aGvHD following Allo-HSCT: A Systematic Review. Methods of Information in Medicine. 2019; 58(06):205-12.
28. Arai Y, Kondo T, Fuse K, Shibasaki Y, Masuko M, et al. Using a machine learning algorithm to predict acute graft-versus-host disease following allogeneic transplantation. Blood Adv. 2019; 3(22):3626-34.
29. Lee C, Haneuse S, Wang H L, Rose S, et al. Prediction of absolute risk of acute graft-versus-host disease following hematopoietic cell transplantation. PLoS One. 2018; 13(1):e0190610.
30. Paun O, Phillips T, Fu P, Novoa R A, et al. Cutaneous complications in hematopoietic cell transplant recipients: impact of biopsy on patient management. Biol Blood Marrow Transplant. 2013; 19(8):1204-9.
31. Barkai K. Accomplishment Classifier with Machine Learning. Minerva Schools at KGI: San Francisco, CA; 2017.
32. Swamynathan M. Mastering machine learning with python in six steps: A practical implementation guide to predictive data analytics using python. Apress: Berkeley, CA; 2019.
33. Kursa M B, Rudnicki W R. Feature selection with the Boruta package. J Stat Softw. 2010; 36(11):1-13.
34. Claesen M, De Moor B. Hyperparameter search in machine learning. arXiv:1502.02127v2. 2015.
35. Effectiveness of random search in SVM hyper-parameter tuning. International Joint Conference on Neural Networks (IJCNN); Killarney, Ireland: IEEE; 2015.
36. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. Australasian joint conference on artificial intelligence; Berlin, Heidelberg: Springer; 2006.
37. Artstein R, Poesio M. Inter-coder agreement for computational linguistics. Computational Linguistics. 2008; 34(4):555-96.
38. Ayuk F, Bussmann L, Zabelina T, Veit R, et al. Serum albumin level predicts survival of patients with gastrointestinal acute graft-versus-host disease after allogeneic stem cell transplantation. Ann Hematol. 2014; 93(5):855-61.
39. Harada K, Sekiya N, Konishi T, Nagata A, et al. Predictive implications of albumin and C-reactive protein for progression to pneumonia and poor prognosis in Stenotrophomonas maltophilia bacteremia following allogeneic hematopoietic stem cell transplantation. BMC Infect Dis. 2017; 17(1):638.
40. Gadallah H A, Khalaf M H, Mohamed H S. Pretransplant C-reactive protein, ferritin, albumin, and platelet count as prognostic biomarkers of hematopoietic stem cell transplantation outcome in hematological malignancies. The Egyptian Journal of Haematology. 2018; 43(2):76.
41. Artz A S, Logan B R, Zhu X, et al. Pre-Transplant C-Reactive Protein (CRP), Ferritin and Albumin As Biomarkers to Predict Transplant Related Mortality (TRM) after Allogeneic Hematopoietic Cell Transplant (HCT). Blood. 2014; 124(21):422.
42. Haen S P, Eyb V, Mirza N, Naumann A, Peter A, et al. Uric acid as a novel biomarker for bone-marrow function and incipient hematopoietic reconstitution after aplasia in patients with hematologic malignancies. J Cancer Res Clin Oncol. 2017; 143(5):759-71.
43. Ostendorf B N, Blau O, Uharek L, Blau I W, Penack O. Association between low uric acid levels and acute graft-versus-host disease. Ann Hematol. 2015; 94(1):139-44.
44. Artz A S, Wickrema A, Dinner S, Godley L A, et al. Pretreatment C-reactive protein is a predictor for outcomes after reduced-intensity allogeneic hematopoietic cell transplantation. Biol Blood Marrow Transplant. 2008; 14(11):1209-16.
45. Fuji S, Kim S W, Fukuda T, Mori S, Yamasaki S, et al. Preengraftment serum C-reactive protein (CRP) value may predict acute graft-versus-host disease and nonrelapse mortality after allogeneic hematopoietic stem cell transplantation. Biol Blood Marrow Transplant. 2008; 14(5):510-7.
46. Jordan K K, Christensen I J, Heilmann C, Sengeløv H, Müller K G. Pretransplant C-reactive protein as a prognostic marker in allogeneic stem cell transplantation. Scand J Immunol. 2014; 79(3):206-13.
47. Minculescu L, Kornblit B T, Friis L S, Schiødt I, et al. C-Reactive Protein Levels at Diagnosis of Acute Graft-versus-Host Disease Predict Steroid-Refractory Disease, Treatment-Related Mortality, and Overall Survival after Allogeneic Hematopoietic Stem Cell Transplantation. Biol Blood Marrow Transplant. 2018; 24(3):600-7.
48. Sato M, Nakasone H, Oshima K, Ishihara Y, et al. Prediction of transplant-related complications by C-reactive protein levels before hematopoietic SCT. Bone Marrow Transplant. 2013; 48(5):698-702.
49. Baygan A, Aronsson-Kurttila W, Moretti G, et al. Safety and Side Effects of Using Placenta-Derived Decidual Stromal Cells for Graft-versus-Host Disease and Hemorrhagic Cystitis. Front Immunol. 2017; 8:795.
50. Govindan R. The Washington manual of oncology. 2007.
51. Song M K, Chung J S, Seol Y M, et al. Influence of lactate dehydrogenase and cyclosporine A level on the incidence of acute graft-versus-host disease after allogeneic stem cell transplantation. J Korean Med Sci. 2009; 24(4):555-60.
52. Amini M, Kazemnejad A, Rasekhi A, Zayeri F, et al. Application of latent class analysis in diagnosis of graft-versus-host disease by serum markers after allogeneic haematopoietic stem cell transplantation. Sci Rep. 2020; 10(1):3633.
53. Cocho L, Fernández I, Calonge M, Martínez V, et al. Gene Expression-Based Predictive Models of Graft Versus Host Disease-Associated Dry Eye. Invest Ophthalmol Vis Sci. 2015; 56(8):4570-81.
54. Tang S, Chappell G T, Mazzoli A, Tewari M, et al. Predicting Acute Graft-Versus-Host Disease Using Machine Learning and Longitudinal Vital Sign Data From Electronic Health Records. JCO Clin Cancer Inform. 2020; 4:128-135.