MySurgeryRisk is an advanced predictive model designed to assess the likelihood of patients requiring prolonged mechanical ventilation (MV) following major surgical procedures. Specifically, it forecasts the risk of a patient needing mechanical ventilation for more than 48 hours post-surgery.
University of Florida Intelligent Clinical Care Center (ic3-center@ufl.edu)
v1.0, Dec 5, 2024
CC BY-NC 4.0
The model is a random forest classifier.
Tabular data with 78 features, including 1) Socio-demographics (e.g., age, sex, race, ethnicity, language, area median income); 2) Admission information (e.g., emergent admission, admission source, night admission); 3) Comorbidities (e.g., diabetes, hypertension, cancer); 4) Scheduled procedure information (e.g., procedure code, surgeons, anesthesia type); 5) Historical medications (e.g., vancomycin, aspirin, beta-blockers); 6) Preoperative laboratory results (e.g., serum creatinine, hemoglobin, serum anion gap).
The model outputs a probability score, ranging from 0 to 1, indicating the likelihood of a patient requiring prolonged MV post-surgery.
UFH Gainesville training dataset: The dataset included all patients 18 years or older who were admitted to University of Florida Health (UFH) Gainesville for any type of inpatient surgical procedure. The final cohort consisted of 41,812 patients who received 52,117 procedures between June 1, 2014 and November 27, 2018. Each patient's medical record contained heterogeneous variables (e.g., demographic characteristics and medical history, diagnoses and procedures, medications, laboratory results, and vital signs).
Labeling: The use of mechanical ventilation was identified using EHR data representing respiratory devices, ventilation modes, and measured values for respiratory vitals, including oxygen flow rate, tidal volume, and positive end-expiratory pressure. The detailed logic for mechanical ventilation identification is illustrated in Figure 2. The outcome distribution is presented in Figure 3.
UFH Gainesville evaluation dataset: The dataset included all patients 18 years or older who were admitted to University of Florida Health (UFH) Gainesville for any type of inpatient surgical procedure. The final cohort consisted of 19,132 patients who received 22,300 procedures between November 28, 2018 and September 20, 2020. We present the outcome distribution in Figure 4.
Hyperparameters were selected using 5-fold cross-validation. The final model was then trained on the entire training dataset with the selected values.
| Hyperparameter | Value |
|---|---|
| min_samples_leaf | 10 |
| n_estimators | 1500 |
| max_features | 10 |
| class_weight | balanced |
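The hyperparameter names in the table match those of scikit-learn's `RandomForestClassifier`, so the configuration can be sketched as below. This is a minimal sketch, not the authors' training code: the use of scikit-learn, the random seed, the `roc_auc` selection metric, the stratified shuffled folds, and the caller-supplied `param_grid` are all assumptions, since the card does not state them. `build_model` and `select_hyperparameters` are hypothetical helper names.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def build_model(random_state=0):
    """Random forest configured with the hyperparameters reported in the table."""
    return RandomForestClassifier(
        n_estimators=1500,
        min_samples_leaf=10,
        max_features=10,            # integer: 10 candidate features per split
        class_weight="balanced",    # reweights classes inversely to their frequency
        random_state=random_state,  # assumption: the seed is not reported
        n_jobs=-1,
    )


def select_hyperparameters(X, y, param_grid):
    """5-fold cross-validated search over a caller-supplied (hypothetical) grid."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    search = GridSearchCV(
        RandomForestClassifier(class_weight="balanced", random_state=0),
        param_grid=param_grid,
        scoring="roc_auc",  # assumption: the selection metric is not stated
        cv=cv,
    )
    # With the default refit=True, the best configuration is refit on all of X, y.
    return search.fit(X, y)


model = build_model()
```

After fitting, `model.predict_proba(X)[:, 1]` would give the probability score in the range 0 to 1 described above. Because `max_features=10` is an integer, the training matrix needs at least 10 columns.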
Model performance was evaluated using several metrics, including area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). 95% confidence intervals (CI) for all performance measures were calculated using bootstrap sampling and nonparametric methods. Detailed evaluation results are presented in the 'Evaluation Results' section.
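One common nonparametric approach to such intervals is the percentile bootstrap, sketched here for AUROC using only the standard library. The card does not specify the bootstrap variant, the number of resamples, or whether resampling was done per patient or per procedure, so `n_boot`, the seed, the percentile method, and the toy data are assumptions. `auroc` and `bootstrap_ci` are hypothetical helper names.

```python
import random


def auroc(y_true, y_score):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true, y_score, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample cases with replacement, recompute the metric."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # skip resamples that lack one of the two classes
        stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy labels and scores for illustration only; not data from the study.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.15, 0.3, 0.05, 0.6, 0.7, 0.4, 0.35, 0.9]
point = auroc(y_true, y_score)
lower, upper = bootstrap_ci(y_true, y_score, auroc, n_boot=200)
```

The same `bootstrap_ci` function accepts any metric with the signature `metric(y_true, y_score)`, so sensitivity or PPV at a fixed threshold could be passed in the same way.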
| Metric | UFH Gainesville evaluation dataset, estimate (95% CI) |
|---|---|
| AUROC | 0.91 (0.9-0.91) |
| AUPRC | 0.45 (0.41-0.48) |
| NPV | 0.99 (0.99-0.99) |
| PPV | 0.21 (0.2-0.24) |
| Sensitivity | 0.85 (0.82-0.87) |
| Specificity | 0.82 (0.8-0.84) |
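A PPV far below the sensitivity and specificity is expected when the outcome is uncommon, because PPV and NPV depend on prevalence through Bayes' rule. The sketch below illustrates that relationship with the reported sensitivity and specificity. The prevalence of 0.05 is an illustrative assumption, not a figure reported in this card, and `predictive_values` is a hypothetical helper name.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence              # true positive fraction
    fn = (1 - sensitivity) * prevalence        # false negative fraction
    tn = specificity * (1 - prevalence)        # true negative fraction
    fp = (1 - specificity) * (1 - prevalence)  # false positive fraction
    return tp / (tp + fp), tn / (tn + fn)


# Reported sensitivity and specificity; prevalence is assumed for illustration.
ppv, npv = predictive_values(sensitivity=0.85, specificity=0.82, prevalence=0.05)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")  # prints "PPV=0.20, NPV=0.99"
```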
Utilizing SHapley Additive exPlanations (SHAP) on the evaluation dataset, we identified the key features contributing to prolonged MV risk prediction, as illustrated in Figure 7. The primary procedure code emerged as the most significant feature. Other top contributors included the attending surgeon, preoperative serum calcium and glucose levels, and surgery type, all ranking among the five most influential features.
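A global ranking of this kind is often obtained by averaging the absolute per-sample SHAP values of each feature. The sketch below shows only that aggregation step, in plain Python. It assumes the SHAP values have already been computed (for example with the `shap` package) and that ranking is by mean absolute value, which the card does not state. The feature names and numbers are hypothetical placeholders.

```python
def rank_features(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across samples, largest first."""
    n_samples = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical SHAP matrix: one row per sample, one column per feature.
feature_names = ["feature_a", "feature_b", "feature_c"]
shap_values = [
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.15],
    [0.25, -0.01, -0.05],
]
ranking = rank_features(shap_values, feature_names)
```

Taking absolute values before averaging matters: a feature that pushes risk up for some patients and down for others would otherwise appear unimportant.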
We evaluated bias in both the dataset and the prediction model across three sensitive attributes: sex, race, and age. The results are shown in Figure 8 and Figure 9. The dataset and the prediction model satisfy several important fairness criteria, such as statistical parity, average odds, and equal opportunity. However, the disparate impact metric indicates potential unfairness in selection rates across groups defined by sex and race. This suggests that while the model maintains overall balance in its predictions, there may be subtle distributional differences that disproportionately affect certain groups (Figures 3 and 4). The 80% rule (Four-Fifths Rule) was applied to determine whether bias is present; under this rule, a disparate impact ratio below 0.8 is flagged as potential adverse impact.
Disparate Impact (DI): DI compares the proportion of individuals that receive a favorable outcome for two groups, a protected group and a reference group. DI=P(outcome|protected group)/P(outcome|reference group).
Statistical Parity Difference (SPD): SPD measures the difference between the protected and reference groups in the probability of receiving a favorable outcome. SPD=P(outcome|protected group)-P(outcome|reference group).
Equal Opportunity Difference (EOD): EOD measures the difference in true positive rates (TPR) between the protected group and the reference group. EOD=TPR(protected group)-TPR(reference group).
Average Odds Difference (AOD): AOD measures the average of two differences: 1) The difference in false positive rates (FPR) between groups; 2) The difference in true positive rates (TPR) between groups. AOD = 0.5 * [(FPR(protected group)-FPR(reference group)) + (TPR(protected group)-TPR(reference group))].
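The four definitions above (DI, SPD, EOD, AOD) can be computed directly from binary labels, binary predictions, and a group indicator, as in the sketch below. It assumes that a prediction of 1 is the "outcome" in the formulas and that the Four-Fifths Rule is checked as DI >= 0.8. The card does not state which label is treated as favorable or how predictions were thresholded, so both are assumptions. The function names and the toy data are hypothetical.

```python
def _rate(values):
    """Fraction of 1s in a list of 0/1 values (raises ZeroDivisionError if empty)."""
    return sum(values) / len(values)


def group_rates(y_true, y_pred, group, value):
    """Selection rate, TPR, and FPR for the members of one group."""
    idx = [i for i, g in enumerate(group) if g == value]
    selection = _rate([y_pred[i] for i in idx])
    tpr = _rate([y_pred[i] for i in idx if y_true[i] == 1])
    fpr = _rate([y_pred[i] for i in idx if y_true[i] == 0])
    return selection, tpr, fpr


def fairness_metrics(y_true, y_pred, group, protected, reference):
    """DI, SPD, EOD, and AOD of the protected group relative to the reference group."""
    sel_p, tpr_p, fpr_p = group_rates(y_true, y_pred, group, protected)
    sel_r, tpr_r, fpr_r = group_rates(y_true, y_pred, group, reference)
    di = sel_p / sel_r
    return {
        "DI": di,
        "SPD": sel_p - sel_r,
        "EOD": tpr_p - tpr_r,
        "AOD": 0.5 * ((fpr_p - fpr_r) + (tpr_p - tpr_r)),
        "passes_four_fifths": di >= 0.8,  # assumption: one-sided 80% check
    }


# Toy data for illustration only: group "A" is protected, "B" is the reference.
group = ["A"] * 4 + ["B"] * 4
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
metrics = fairness_metrics(y_true, y_pred, group, protected="A", reference="B")
```

In this toy example the selection rates are 0.25 for group A and 0.75 for group B, so DI is about 0.33 and the 80% check fails, while SPD, EOD, and AOD all equal -0.5.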
Theil Index (TI): TI measures the inequality in the distribution of outcomes across different groups. Lower values indicate more equality among groups, and a value of 0 indicates that all groups have the same outcome. TI=(1/n) * Σ [(yi/μ) * ln(yi/μ)], where n is the number of groups, yi is the outcome for group i, μ is the mean outcome across all groups, and ln is the natural logarithm.
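The Theil index formula above translates into a few lines of Python. This is a minimal sketch that follows the formula as written, treating a zero outcome as contributing zero (the limit of x ln x as x approaches 0), which is a convention assumed here rather than stated in the card. `theil_index` is a hypothetical helper name and the inputs are illustrative.

```python
import math


def theil_index(outcomes):
    """Theil index of non-negative outcomes: (1/n) * sum((y/mu) * ln(y/mu))."""
    n = len(outcomes)
    mu = sum(outcomes) / n
    total = 0.0
    for y in outcomes:
        ratio = y / mu
        if ratio > 0:  # assumption: a zero outcome contributes 0 to the sum
            total += ratio * math.log(ratio)
    return total / n


equal = theil_index([2, 2, 2])  # identical outcomes give an index of 0
unequal = theil_index([1, 3])   # unequal outcomes give a positive index
```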