Name	Last commit message	Last commit date
combine_puf	uploading files	Feb 8, 2022
hyperparameter_search	uploading files	Feb 8, 2022
all_models.ipynb	uploading files	Feb 8, 2022
curves.ipynb	uploading files	Feb 8, 2022
preproc.ipynb	uploading files	Feb 8, 2022
readme.txt	Update readme.txt	Jul 28, 2022
shap.ipynb	uploading files	Feb 8, 2022
table1.ipynb	uploading files	Feb 8, 2022

readme.txt

This repository accompanies the manuscript "Development and Validation of Machine Learning Models to Predict Readmission after Colorectal Surgery" submitted to Journal of GI Surgery and contains the code needed to reproduce the work.

The colectomy and proctectomy procedure-targeted datasets were downloaded from the ACS NSQIP participant use data file (PUF) website (https://www.facs.org/quality-programs/acs-nsqip/participant-use). Microsoft Excel was used to convert the TXT files to CSV files. Further data processing was then performed using the Pandas library in Python.
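
The TXT-to-CSV conversion above was done in Microsoft Excel. As an alternative that is not part of this repository, the same step could be sketched in Pandas. The function name puf_txt_to_csv and the tab delimiter are assumptions; check the delimiter against the downloaded files before relying on it.

```python
import pandas as pd


def puf_txt_to_csv(txt_path, csv_path, sep="\t"):
    """Read a delimited text file and write it back out as CSV.

    `sep` is an assumption (tab); adjust it to match the actual file.
    """
    # dtype=str keeps codes and identifiers exactly as written in the file.
    df = pd.read_csv(txt_path, sep=sep, dtype=str)
    df.to_csv(csv_path, index=False)
    # Return (rows, columns) so the caller can sanity-check the conversion.
    return df.shape
```

Comparing the returned shape with the row and column counts in the PUF documentation is a quick check that nothing was lost in the conversion.
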
Records with missing values for readmission were dropped. A BMI column was generated from height and weight. Patients undergoing ostomy placement were identified using CPT codes (44211, 44212, 45113, 45119, 44155, 44157, 44158, 44125, 44187, 44141, 44143, 44144, 44146, 44150, 44151, 44206, 44208, 44210, 44187, 44188, 44320, 44310). The ‘COL_APPROACH’ column was condensed, with SILS, endoscopic, NOTES, ‘other MIS’, and hybrid cases recoded as laparoscopic. Procedures were categorized by CPT code as left, right, or total colectomy, or as low anterior resection (LAR), abdominoperineal resection (APR), or proctectomy with perineal approach. A race/ethnicity column was generated by combining the race and ethnicity columns. Missing categorical values were filled with the string ‘Unknown’. Missing numerical values were filled with the median value of the column. The numerical columns were scaled using RobustScaler. The categorical columns were encoded using LabelEncoder.
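
A minimal sketch of several of the steps above, for orientation only; 'preproc.ipynb' remains the reference. The column names READMISSION1, HEIGHT, WEIGHT and CPT, the function name preprocess, and the units (inches and pounds, hence the factor 703) are assumptions. The approach recoding, procedure categorization and race/ethnicity steps are left out.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, RobustScaler

# Ostomy CPT codes as listed in this readme (duplicates collapse in a set).
OSTOMY_CPT = {
    44211, 44212, 45113, 45119, 44155, 44157, 44158, 44125, 44187, 44141,
    44143, 44144, 44146, 44150, 44151, 44206, 44208, 44210, 44188, 44320,
    44310,
}


def preprocess(df, num_cols, cat_cols, target="READMISSION1"):
    """Sketch of the cleaning steps; all column names are assumed."""
    # Drop records where the readmission outcome is missing.
    df = df.dropna(subset=[target]).copy()

    # BMI from height (inches, assumed) and weight (pounds, assumed).
    df["BMI"] = 703.0 * df["WEIGHT"] / df["HEIGHT"] ** 2

    # Flag ostomy placement from the CPT code.
    df["OSTOMY"] = df["CPT"].isin(OSTOMY_CPT).astype(int)

    num_cols = list(num_cols) + ["BMI"]
    cat_cols = list(cat_cols)

    # Fill missing values: 'Unknown' for categorical, median for numerical.
    df[cat_cols] = df[cat_cols].fillna("Unknown")
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Scale numerical columns and integer-encode categorical columns.
    df[num_cols] = RobustScaler().fit_transform(df[num_cols])
    for col in cat_cols:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))
    return df
```

In this sketch the scaler and encoders are fitted on whatever frame is passed in; fitting them on the training split only and reusing them on held-out data is the usual way to avoid leakage.
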
RandomizedSearchCV was used to identify the best hyperparameters for each model. Random forest (RF) and XGBoost (XGB) combinations were tested for 100 iterations with 5-fold cross-validation on the test/validation data. Neural network (NN) combinations were tested for 50 iterations with 5-fold cross-validation. NN models consisted of a series of fully connected (Dense) layers with “relu” activation, followed by Batch Normalization and Dropout. The Adam optimizer and binary cross-entropy loss were used. The hyperparameter search showed that a 2-layer model, with 1000 nodes per layer followed by 1 output node, 80% dropout, and a learning rate of 3 x 10^-3, had the best performance. Training was performed with early stopping, using a patience of 25 epochs and a minimum change of 1 x 10^-8. The DeLong test was implemented using code from https://biasedml.com/roc-comparison/.
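
The random forest search described above might look roughly like the sketch below. The parameter ranges, the ROC AUC scoring choice and the function name search_rf are assumptions made for illustration; the settings actually used are in the hyperparameter_search folder.

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV


def search_rf(X, y, n_iter=100, cv=5, seed=0):
    """Randomized hyperparameter search for a random forest (sketch)."""
    # Hypothetical search space; not taken from the repository.
    param_distributions = {
        "n_estimators": randint(50, 300),
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 20),
    }
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=seed),
        param_distributions=param_distributions,
        n_iter=n_iter,       # 100 sampled combinations, as in the readme
        cv=cv,               # 5-fold cross-validation, as in the readme
        scoring="roc_auc",   # assumed metric
        random_state=seed,
    )
    search.fit(X, y)
    return search.best_params_, search.best_score_
```

Calling search_rf(X, y) returns the best sampled combination and its mean cross-validated score. Lowering n_iter and cv gives a quick smoke test before a full run.
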

The notebooks in the combine_puf folder can be used to combine the colectomy and proctectomy datasets. Once the combined CSV is created, it can be pre-processed using 'preproc.ipynb'. 'table1.ipynb' can be used to generate summary statistics. Scripts in the hyperparameter_search folder can be used to find optimal hyperparameters for each model. These parameters can then be entered into 'all_models.ipynb' to calculate metrics. This notebook also produces the TPR/FPR and precision/recall values used in 'curves.ipynb'. Finally, 'shap.ipynb' can be used to build a NN model and perform SHAP analysis.
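
As a rough illustration of the TPR/FPR and precision/recall values mentioned above, the sketch below computes them with scikit-learn from true labels and predicted probabilities. The function name curve_points and the returned dictionary layout are assumptions; the notebooks may store these values differently.

```python
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve


def curve_points(y_true, y_score):
    """Collect the points needed to draw ROC and precision-recall curves."""
    # ROC curve: false positive rate vs. true positive rate per threshold.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    # Precision-recall curve over the same scores.
    precision, recall, _ = precision_recall_curve(y_true, y_score)
    return {
        "fpr": fpr,
        "tpr": tpr,
        "precision": precision,
        "recall": recall,
        "auc": roc_auc_score(y_true, y_score),
    }
```

The arrays in the returned dictionary can be passed straight to a plotting call, for example fpr against tpr for the ROC curve and recall against precision for the precision-recall curve.
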
