This is the source code for reproducing the inpatient dataset experiments found in the paper "MediSim: Multi-Granular Simulation for Enriching Longitudinal, Multi-Modal Electronic Health Records"
This code interfaces with the pubilc MIMIC-III ICU stay database. Before using the code, you will need to apply, complete training, and download the tables referenced in utils/genMediSim.py
and utils/genNotes.py
from https://physionet.org. From there, generate an empty directory data/
and data_notes/
before editing the mimic_dir
variable in the two files and run them. Finally, apply for and download the MIMIC-CXR-JPG database, generate an empty data_images/
, and configure then run that file. This will generate all of the relevant data files.
Next, a model can be training by creating an empt save/
directory and running all of the desired train_model.py
scripts. The only requirement is to run train_base*
scripts before the corresponding train_ss*
scripts.
Next, any desired baseline models may be trained by changing your working directory to temporal_baselines/
or modality_baselines/
and running the corresponding {baseline_model}.py
script.
The baseline scripts will evaluate as a part of their script. For the MediSim and ablation models, run the corresponding test*
scripts.
To generate simulated/extended dataset, run any desired files in the generate_datasets/
directory and its subdirectories (for baseline models)
To evaluate the utility of enriched data, run augmentation_prediction_temporal.py
and augmentation_prediction_modality.py
. These may take awhile as they loop through all of the compared models.
MediSim code and model weights are released under the MIT License. See LICENSE for additional details.