To generate synthetic data which will most accurately resemble the original real dataset.
-
util/preProcess.py: contains data preprocessing and cleaning function
-
util/tools.py: contains function to create synthetic data with CTGAN, TVAE, GaussianCopula GAN and synthPop
-
util/metrics.py: contains functions to compute t-test, ks-test and kl-divergence over all columns of two datasets
-
util/plots.py: contains function to generate a histogram, a cumulative sums graphs, etc.
-
genSynthData.ipynb: script to generate synthetic datasets using CTGAN, TVAE, GaussianCopula GAN and synthPop
-
evaluateSynthData.ipynb: contains t-test, ks-test, histograms and cummulative sums graphs, to evaluate the performance of the datasets
-
plots/: saved histograms and cummulative sums graphs from evaluateSynthData.ipynb
-
syntheticData/: folder that contains gererated datasets