Implemented Privacy Metrics #55

Merged
merged 49 commits into from
Mar 24, 2021
Commits
aa378c1
Implemented base for categorical privacy metrics
ZhuofanXie Dec 28, 2020
753244c
Implemented interface for Attacker Models
ZhuofanXie Dec 28, 2020
960c9a2
Implemented support for customized model_kwargs for the ENS(majority …
ZhuofanXie Jan 4, 2021
4117780
implemented CAP related categorical metrics
ZhuofanXie Jan 4, 2021
de4452f
implemented ENS privacy metric
ZhuofanXie Jan 4, 2021
4979b3a
renamed ENS metrics to CatENS
ZhuofanXie Jan 4, 2021
4d25688
Implemented sklearn based metrics
ZhuofanXie Jan 4, 2021
44cfe0d
Fixed type errors. Now everything is fully working!
ZhuofanXie Jan 4, 2021
9920036
Revised docstrings.
ZhuofanXie Jan 4, 2021
208f4ba
Implemented nearest neighbor based metrics
ZhuofanXie Jan 6, 2021
d7e1ec8
Fixed loopy imports
ZhuofanXie Jan 7, 2021
6ff0d58
Implemented sklearn based numerical metrics
ZhuofanXie Jan 8, 2021
a66684b
Supported filtering out privacy metrics which do not apply
ZhuofanXie Jan 9, 2021
eed0af4
Make ENS able to return nan when model_kwargs is bad
ZhuofanXie Jan 11, 2021
e1d642d
Implemented unit tests, as well as fixed a bug in naive Bayes metric
ZhuofanXie Jan 11, 2021
f022185
Add dependence for copulas.
ZhuofanXie Jan 11, 2021
8de8424
Pulled from origin master and resolved conflicts
ZhuofanXie Jan 11, 2021
45ec9fb
Fixed codestyle
ZhuofanXie Jan 11, 2021
06657e8
Fixed more code style issues
ZhuofanXie Jan 11, 2021
99a2936
Fixed indentations
ZhuofanXie Jan 11, 2021
eac20e5
Fixed more indentations.
ZhuofanXie Jan 11, 2021
c373aa1
Fixed imports
ZhuofanXie Jan 11, 2021
037bb3c
Reorganized imports
ZhuofanXie Jan 11, 2021
a7414ac
Reorganized imports and rewrite init files
ZhuofanXie Jan 12, 2021
b1e4bc3
renamed files
ZhuofanXie Jan 12, 2021
a89fa41
Finished code style fix.
ZhuofanXie Jan 12, 2021
487c15a
Merge pull request #38 from ZhuofanXie/master
csala Mar 3, 2021
aaee991
Merge branch 'master' into 0.3.0-dev
csala Mar 3, 2021
df0a6f5
Draft of possible tests
fealho Mar 17, 2021
5bd7da2
Add notes
fealho Mar 17, 2021
27afeda
Add more notes
fealho Mar 17, 2021
3a609cd
Name changes
fealho Mar 19, 2021
d0ba7d5
Code refactoring
fealho Mar 19, 2021
772d88c
Fix lint
fealho Mar 19, 2021
ff83ab3
Add backslash
fealho Mar 19, 2021
51d58b3
New test
fealho Mar 19, 2021
b1cd61d
Add tests
fealho Mar 19, 2021
65a2605
Fix lint
fealho Mar 19, 2021
ca20fc3
Fix lint
fealho Mar 19, 2021
0a1799d
Fix lint
fealho Mar 19, 2021
00450fd
Lot's of small fixes everywhere
fealho Mar 22, 2021
83a7990
Fix lint
fealho Mar 22, 2021
a868e2f
Fix lint
fealho Mar 22, 2021
7047f97
Update documentation
fealho Mar 22, 2021
8af0003
Fix lint
fealho Mar 22, 2021
c30aac0
Fixes torch < 1.8
fealho Mar 22, 2021
3281a52
Address code style feedback
fealho Mar 23, 2021
2d20b67
Fix lint
fealho Mar 23, 2021
5e65421
Update README and documentation
fealho Mar 24, 2021
6 changes: 3 additions & 3 deletions .github/workflows/tests.yml
@@ -21,7 +21,7 @@ jobs:
python-version: ${{ matrix.python-version }}
- if: matrix.os == 'windows-latest'
name: Install dependencies - Windows
run: pip install 'torch>=1,<2' -f https://download.pytorch.org/whl/torch_stable.html
run: pip install 'torch>=1,<1.8' -f https://download.pytorch.org/whl/torch_stable.html
- name: Install package
run: pip install invoke .[dev]
- name: invoke lint
@@ -58,7 +58,7 @@ jobs:
python-version: ${{ matrix.python-version }}
- if: matrix.os == 'windows-latest'
name: Install dependencies - Windows
run: pip install 'torch>=1,<2' -f https://download.pytorch.org/whl/torch_stable.html
run: pip install 'torch>=1,<1.8' -f https://download.pytorch.org/whl/torch_stable.html
- name: Install package and dependencies
run: pip install invoke .[test]
- name: invoke pytest
@@ -105,7 +105,7 @@ jobs:
- if: matrix.os == 'windows-latest'
name: Install dependencies - Windows
run: |
pip install 'torch>=1,<2' -f https://download.pytorch.org/whl/torch_stable.html
pip install 'torch>=1,<1.8' -f https://download.pytorch.org/whl/torch_stable.html
choco install graphviz
- name: Install package and dependencies
run: pip install invoke jupyter .[ctgan]
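
The Windows install steps above narrow the torch pin from `<2` to `<1.8`. As a rough illustration of what that excludes, here is a minimal sketch that assumes plain numeric release strings; `parse_release`, `pad`, and `satisfies` are hypothetical helpers and do not implement the full version specifier rules that pip applies.

```python
def parse_release(version):
    """Turn a plain release string such as '1.7.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(release, length):
    """Right-pad a release tuple with zeros so '1.8' compares equal to '1.8.0'."""
    return release + (0,) * (length - len(release))


def satisfies(version, lower, upper):
    """Return True when lower <= version < upper (hypothetical helper)."""
    parts = [parse_release(v) for v in (version, lower, upper)]
    width = max(len(p) for p in parts)
    ver, low, up = (pad(p, width) for p in parts)
    return low <= ver < up


old_pin = ("1", "2")    # 'torch>=1,<2'
new_pin = ("1", "1.8")  # 'torch>=1,<1.8'

for candidate in ("1.7.1", "1.8.0"):
    # Prints the candidate, then whether it fits the old pin and the new pin.
    print(candidate, satisfies(candidate, *old_pin), satisfies(candidate, *new_pin))
```

Under these assumptions a 1.7.x release fits both pins, while 1.8.0 fits only the old one.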
3 changes: 2 additions & 1 deletion conda/meta.yaml
@@ -25,8 +25,8 @@ requirements:
- pomegranate >=0.13.4,<0.14.2
- pytorch >=1.4,<2
- sktime >=0.4,<0.6
- copulas>=0.5.0,<0.6
- rdt >=0.4.0,<0.5

run:
- python >=3.6,<3.9
- scikit-learn >=0.23,<1
@@ -36,6 +36,7 @@ requirements:
- pomegranate >=0.13.4,<0.14.2
- pytorch >=1.4,<2
- sktime >=0.4,<0.6
- copulas>=0.5.0,<0.6
- rdt >=0.4.0,<0.5

about:
37 changes: 36 additions & 1 deletion sdmetrics/single_table/README.md
@@ -44,6 +44,30 @@ Implemented metrics:
* `MLEfficacy`: Generic ML Efficacy metric that detects the type of ML Problem associated
with the dataset by analyzing the target column type and then applies all the metrics
that are compatible with it.
* Privacy Metrics: Metrics that fit an adversarial attacker model on the synthetic data and
then evaluate its accuracy (or probability of making the correct attack) on the real data.
* `CategoricalCAP`: Privacy Metric for categorical columns, based
on the Correct Attribution Probability method.
* `CategoricalZeroCAP`: Privacy Metric for categorical columns, based
on the Correct Attribution Probability method.
* `CategoricalGeneralizedCAP`: Privacy Metric for categorical columns, based
on the Correct Attribution Probability method.
* `NumericalMLP`: Privacy Metric for numerical columns, based
on MLPRegressor from scikit-learn.
* `NumericalLR`: Privacy Metric for numerical columns, based
on LinearRegression from scikit-learn.
* `NumericalSVR`: Privacy Metric for numerical columns, based
on SVR from scikit-learn.
* `CategoricalKNN`: Privacy Metric for categorical columns, based
on KNeighborsClassifier from scikit-learn.
* `CategoricalNB`: Privacy Metric for categorical columns, based
on CategoricalNB from scikit-learn.
* `CategoricalRF`: Privacy Metric for categorical columns, based
on RandomForestClassifier from scikit-learn.
* `CategoricalEnsemble`: Privacy Metric for categorical columns, based
on an 'ensemble' of other categorical Privacy Metrics.
* `NumericalRadiusNearestNeighbor`: Privacy Metric for numerical columns, based
on an implementation of the Radius Nearest Neighbor method.
* MultiSingleColumn Metrics: Metrics that apply a Single Column metric on each column from
the table that is compatible with it and then compute the average across all the columns.
* `CSTest`: MultiSingleColumn metric based on applying the Single Column CSTest on all
@@ -86,7 +110,18 @@ Out[2]:
'KSTest': sdmetrics.single_table.multi_single_column.KSTest,
'KSTestExtended': sdmetrics.single_table.multi_single_column.KSTestExtended,
'ContinuousKLDivergence': sdmetrics.single_table.multi_column_pairs.ContinuousKLDivergence,
'DiscreteKLDivergence': sdmetrics.single_table.multi_column_pairs.DiscreteKLDivergence}
'DiscreteKLDivergence': sdmetrics.single_table.multi_column_pairs.DiscreteKLDivergence,
'CategoricalCAP': sdmetrics.single_table.privacy.cap.CategoricalCAP,
'CategoricalGeneralizedCAP': sdmetrics.single_table.privacy.cap.CategoricalGeneralizedCAP,
'CategoricalZeroCAP': sdmetrics.single_table.privacy.cap.CategoricalZeroCAP,
'CategoricalKNN': sdmetrics.single_table.privacy.categorical_sklearn.CategoricalKNN,
'CategoricalNB': sdmetrics.single_table.privacy.categorical_sklearn.CategoricalNB,
'CategoricalRF': sdmetrics.single_table.privacy.categorical_sklearn.CategoricalRF,
'CategoricalEnsemble': sdmetrics.single_table.privacy.ensemble.CategoricalEnsemble,
'NumericalLR': sdmetrics.single_table.privacy.numerical_sklearn.NumericalLR,
'NumericalMLP': sdmetrics.single_table.privacy.numerical_sklearn.NumericalMLP,
'NumericalSVR': sdmetrics.single_table.privacy.numerical_sklearn.NumericalSVR,
'NumericalRadiusNearestNeighbor': sdmetrics.single_table.privacy.radius_nearest_neighbor.NumericalRadiusNearestNeighbor}
```
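
The privacy metrics listed above share one pattern: an attacker is fit on the synthetic data and then scored on the real data. The following is a minimal sketch of that pattern for a categorical column, assuming a simple frequency-based attacker in the spirit of Correct Attribution Probability. The function names, the toy rows, and the choice to skip real rows whose key never appears in the synthetic data are all assumptions of this sketch, not the sdmetrics implementation.

```python
from collections import Counter, defaultdict


def fit_attacker(synthetic_rows, key_fields, sensitive_field):
    """Count, for every key combination in the synthetic data,
    how often each sensitive value occurs."""
    counts = defaultdict(Counter)
    for row in synthetic_rows:
        key = tuple(row[field] for field in key_fields)
        counts[key][row[sensitive_field]] += 1
    return counts


def attribution_probability(counts, real_rows, key_fields, sensitive_field):
    """Average probability that the attacker picks the true sensitive value.

    Real rows whose key never appears in the synthetic data are skipped
    (an assumption of this sketch). Returns None if no row can be scored.
    """
    scores = []
    for row in real_rows:
        key = tuple(row[field] for field in key_fields)
        matches = counts.get(key)
        if not matches:
            continue
        scores.append(matches[row[sensitive_field]] / sum(matches.values()))
    return sum(scores) / len(scores) if scores else None


# Toy data, invented for illustration only.
synthetic = [
    {"age": "30s", "city": "A", "diagnosis": "flu"},
    {"age": "30s", "city": "A", "diagnosis": "flu"},
    {"age": "30s", "city": "A", "diagnosis": "cold"},
    {"age": "40s", "city": "B", "diagnosis": "cold"},
]
real = [
    {"age": "30s", "city": "A", "diagnosis": "flu"},
    {"age": "40s", "city": "B", "diagnosis": "flu"},
    {"age": "50s", "city": "C", "diagnosis": "cold"},
]

attacker = fit_attacker(synthetic, ["age", "city"], "diagnosis")
probability = attribution_probability(attacker, real, ["age", "city"], "diagnosis")
print(probability)
```

With this toy data the attacker is right with probability 2/3 on the first real row and 0 on the second, and the third row is skipped, so the average is about 0.33. How a metric turns such an attack probability into a final privacy score is left out of this sketch.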

## Single Table Inputs and Outputs
25 changes: 25 additions & 0 deletions sdmetrics/single_table/privacy/__init__.py
@@ -0,0 +1,25 @@
from sdmetrics.single_table.privacy.base import CategoricalPrivacyMetric, NumericalPrivacyMetric
from sdmetrics.single_table.privacy.cap import (
CategoricalCAP, CategoricalGeneralizedCAP, CategoricalZeroCAP)
from sdmetrics.single_table.privacy.categorical_sklearn import (
CategoricalKNN, CategoricalNB, CategoricalRF)
from sdmetrics.single_table.privacy.ensemble import CategoricalEnsemble
from sdmetrics.single_table.privacy.numerical_sklearn import (
NumericalLR, NumericalMLP, NumericalSVR)
from sdmetrics.single_table.privacy.radius_nearest_neighbor import NumericalRadiusNearestNeighbor

__all__ = [
'CategoricalCAP',
'CategoricalZeroCAP',
'CategoricalGeneralizedCAP',
'NumericalMLP',
'NumericalLR',
'NumericalSVR',
'CategoricalKNN',
'CategoricalNB',
'CategoricalRF',
'CategoricalPrivacyMetric',
'NumericalPrivacyMetric',
'CategoricalEnsemble',
'NumericalRadiusNearestNeighbor'
]
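
The new `__init__.py` imports each metric class and repeats its name in `__all__`, and in the listing above the two sets appear to match. One possible way to check that kind of agreement automatically is sketched below using the standard `ast` module. The `exported_vs_imported` helper and the shortened `pkg` source string are hypothetical and are not part of this pull request.

```python
import ast


def exported_vs_imported(source):
    """Return two sets: names in __all__ that are never imported,
    and imported names that __all__ leaves out."""
    tree = ast.parse(source)
    imported, exported = set(), set()
    for node in tree.body:
        if isinstance(node, ast.ImportFrom):
            imported.update(alias.asname or alias.name for alias in node.names)
        elif isinstance(node, ast.Assign):
            targets = [t.id for t in node.targets if isinstance(t, ast.Name)]
            if "__all__" in targets:
                # literal_eval accepts an expression node as well as a string.
                exported.update(ast.literal_eval(node.value))
    return exported - imported, imported - exported


# Shortened, hypothetical stand-in for a package __init__.py.
SOURCE = '''
from pkg.cap import CategoricalCAP, CategoricalZeroCAP
from pkg.ensemble import CategoricalEnsemble

__all__ = [
    'CategoricalCAP',
    'CategoricalZeroCAP',
    'CategoricalEnsemble',
    'NumericalLR',
]
'''

missing_imports, missing_exports = exported_vs_imported(SOURCE)
print(missing_imports)   # names listed in __all__ without a matching import
print(missing_exports)   # imported names that __all__ leaves out
```

In the stand-in source, `NumericalLR` is listed in `__all__` but never imported, so it is the only name the check reports.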