Single Column Metrics

The metrics in this folder operate on individual columns (or univariate random variables), passed as two 1-dimensional arrays.

Implemented metrics:

  • Statistical: Metrics that compare the arrays using statistical tests.
    • CSTest: Chi-Squared test to compare the distributions of two categorical columns.
    • KSTest: Kolmogorov-Smirnov test to compare the distributions of two numerical columns using their empirical CDF.
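To make the "empirical CDF" comparison behind KSTest concrete, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic, which is the largest vertical gap between the two empirical CDFs. The function name ks_statistic is hypothetical and is not part of sdmetrics. This text does not state how the library turns the statistic into the value it returns, so the sketch stops at the statistic itself.

```python
from bisect import bisect_right


def ks_statistic(real_column, synthetic_column):
    """Largest absolute gap between the two empirical CDFs.

    Hypothetical helper for illustration only; not the sdmetrics implementation.
    """
    real_sorted = sorted(real_column)
    synth_sorted = sorted(synthetic_column)
    if not real_sorted or not synth_sorted:
        raise ValueError("both columns must contain at least one value")

    def ecdf(sorted_values, x):
        # Fraction of values that are less than or equal to x.
        return bisect_right(sorted_values, x) / len(sorted_values)

    # Both empirical CDFs are step functions that only change at observed
    # values, so the maximum gap is attained at one of those values.
    points = real_sorted + synth_sorted
    return max(abs(ecdf(real_sorted, x) - ecdf(synth_sorted, x)) for x in points)
```

A statistic of 0 means the two empirical CDFs coincide, and a statistic of 1 means the two samples do not overlap at all.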

SingleColumnMetric

All single column metrics are subclasses of the sdmetrics.single_column.SingleColumnMetric class, which can be used to locate all of them:

In [1]: from sdmetrics.single_column import SingleColumnMetric

In [2]: SingleColumnMetric.get_subclasses()
Out[2]:
{'CSTest': sdmetrics.single_column.statistical.cstest.CSTest,
 'KSTest': sdmetrics.single_column.statistical.kstest.KSTest}
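The lookup above can be pictured with a small, self-contained sketch of subclass discovery. This is an assumption about how such a registry might work, not the actual sdmetrics code: BaseMetric, StatisticalMetric, ToyCSTest and ToyKSTest are hypothetical stand-ins, and the sketch chooses to report only leaf classes so that the intermediate grouping class is left out, as in the output shown above.

```python
class BaseMetric:
    """Hypothetical stand-in for a metric base class."""

    @classmethod
    def get_subclasses(cls):
        """Return a dict mapping class names to leaf descendant classes."""
        found = {}
        for child in cls.__subclasses__():
            if child.__subclasses__():
                # Intermediate class: recurse instead of listing it.
                found.update(child.get_subclasses())
            else:
                found[child.__name__] = child
        return found


class StatisticalMetric(BaseMetric):
    """Hypothetical intermediate class grouping statistical metrics."""


class ToyCSTest(StatisticalMetric):
    """Hypothetical concrete metric."""


class ToyKSTest(StatisticalMetric):
    """Hypothetical concrete metric."""


subclasses = BaseMetric.get_subclasses()
print(sorted(subclasses))
```

Because the mapping is keyed by class name, a metric can then be selected by a string, for example subclasses['ToyKSTest'].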

Single Column Inputs and Outputs

All the single column metrics operate on just two inputs:

  • real_data: A 1d numpy array, coming from the real dataset.
  • synthetic_data: A 1d numpy array, coming from the synthetic dataset.

For example, this is how the KSTest metric can be computed for the age column of the demo data:

In [3]: from sdmetrics import load_demo

In [4]: real_data, synthetic_data, metadata = load_demo()

In [5]: from sdmetrics.single_column import KSTest

In [6]: real_column = real_data['users']['age'].to_numpy()

In [7]: synthetic_column = synthetic_data['users']['age'].to_numpy()

In [8]: KSTest.compute(real_column, synthetic_column)
Out[8]: 0.8