Skip to content

part-cw/biomed-derivative-project

Repository files navigation

Biomed Derivative Project

Aim

To generate synthetic data which will most accurately resemble the original real dataset.

Files and folders

  • util/preProcess.py: contains data preprocessing and cleaning function

  • util/tools.py: contains function to create synthetic data with CTGAN, TVAE, GaussianCopula GAN and synthPop

  • util/metrics.py: contains functions to compute t-test, ks-test and kl-divergence over all columns of two datasets

  • util/plots.py: contains function to generate a histogram, a cumulative sums graphs, etc.

  • genSynthData.ipynb: script to generate synthetic datasets using CTGAN, TVAE, GaussianCopula GAN and synthPop

  • evaluateSynthData.ipynb: contains t-test, ks-test, histograms and cummulative sums graphs, to evaluate the performance of the datasets

  • plots/: saved histograms and cummulative sums graphs from evaluateSynthData.ipynb

  • syntheticData/: folder that contains gererated datasets

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published