New SDMetrics from Smartnoise #123

Open
lurosenb opened this issue Apr 28, 2022 · 1 comment
Labels: feature request (Request for a new feature)

Comments

@lurosenb

Hi all!
I am one of the smartnoise-sdk maintainers (part of the OpenDP collaboration). Specifically, I work on differentially private (DP) data synthesizers.

Problem Description

It would be nice if SDMetrics had more methods geared towards DP synthesizers, specifically methods from https://arxiv.org/pdf/2004.07740.pdf, https://arxiv.org/pdf/1604.06651.pdf, and https://arxiv.org/pdf/1806.11345.pdf.

Expected behavior

SDMetrics would be able to produce pMSE (Snoke et al.) and Wasserstein randomization test (Arnold et al.) scores for single_table synthetic data, including data generated under differential privacy. SRA scores (Jordon et al.) could potentially follow, although they are lower priority and may require too much support code to be feasible.
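For readers who have not met pMSE before, here is a minimal sketch of the idea: stack the real and synthetic rows, fit a classifier that tries to tell them apart, and average the squared distance between each predicted propensity and the synthetic share c = n_synth / N. The function names (`fit_logistic`, `pmse`) are hypothetical, only numeric columns are handled, and a plain gradient-descent logistic regression is assumed as the propensity model. This is not the smartnoise or SDMetrics implementation.

```python
import numpy as np


def fit_logistic(X, y, n_iter=500, lr=0.1):
    """Fit logistic regression by batch gradient descent (hypothetical helper)."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # current propensity estimates
        w -= lr * Xb.T @ (p - y) / len(y)       # gradient of the mean log-loss
    return w


def pmse(real, synth, n_iter=500, lr=0.1):
    """Propensity-score mean squared error for two numeric tables.

    Assumption: both inputs are 2-D arrays with the same numeric columns.
    Lower is better; 0 means the classifier cannot separate the two tables.
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    X = np.vstack([real, synth])
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synth))])

    # Standardize the pooled data so one learning rate suits every column.
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    Xs = (X - mu) / sd

    w = fit_logistic(Xs, y, n_iter=n_iter, lr=lr)
    Xb = np.column_stack([np.ones(len(Xs)), Xs])
    propensity = 1.0 / (1.0 + np.exp(-Xb @ w))

    c = len(synth) / len(X)                     # share of synthetic rows
    return float(np.mean((propensity - c) ** 2))


# Example usage on toy data (values are illustrative only).
rng = np.random.default_rng(0)
real_table = rng.normal(0.0, 1.0, size=(200, 2))
print(pmse(real_table, real_table.copy()))      # indistinguishable tables
print(pmse(real_table, real_table + 3.0))       # clearly shifted tables
```

With equal table sizes c is 0.5, so the score is bounded above by 0.25. Any classifier that outputs probabilities could be swapped in for the logistic regression.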

Additional context

Here, we have some light implementations of the aforementioned methods. Though we use them to evaluate DP synthetic data, these metrics would also work for general-purpose synthetic data (pMSE and Wasserstein essentially fit the interface described by the single_table metrics as is).
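To illustrate why a Wasserstein randomization test only needs a real table and a synthetic table as input, here is a rough sketch for a single numeric column. It computes the 1-D Wasserstein distance between the two samples, then compares it with the distances obtained when the pooled values are randomly re-split into two groups. The function names and the permutation-style p-value are assumptions made for illustration; the procedure in Arnold et al. and in the smartnoise code may differ in detail.

```python
import numpy as np


def wasserstein_1d(u, v):
    """1-D Wasserstein-1 distance between two empirical samples."""
    u = np.sort(np.asarray(u, dtype=float))
    v = np.sort(np.asarray(v, dtype=float))
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs of both samples, evaluated at each pooled value.
    u_cdf = np.searchsorted(u, all_values[:-1], side="right") / len(u)
    v_cdf = np.searchsorted(v, all_values[:-1], side="right") / len(v)
    # Area between the two CDFs.
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


def wasserstein_randomization_test(real, synth, n_permutations=1000, seed=0):
    """Return (observed distance, p-value) for one numeric column.

    Assumption: the null distribution is built by shuffling the pooled
    values and splitting them back into groups of the original sizes.
    """
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    observed = wasserstein_1d(real, synth)

    pooled = np.concatenate([real, synth])
    n_real = len(real)
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(pooled)
        null[i] = wasserstein_1d(shuffled[:n_real], shuffled[n_real:])

    # Share of random splits at least as far apart as the observed pair
    # (with the usual +1 correction so the p-value is never exactly 0).
    p_value = (1 + np.sum(null >= observed)) / (1 + n_permutations)
    return observed, float(p_value)


# Example usage on toy data (values are illustrative only).
rng = np.random.default_rng(1)
real_col = rng.normal(0.0, 1.0, size=200)
synth_col = rng.normal(2.0, 1.0, size=200)
distance, p = wasserstein_randomization_test(real_col, synth_col, n_permutations=200)
print(distance, p)
```

A small p-value suggests the synthetic column is further from the real one than chance re-labelling would explain. Extending this to a full table (for example, per column or on joint distributions) is a design choice the sketch leaves open.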

Reasoning for transition: The SDMetrics package is far more mature and better supported than our DP synthetic data gym, so we would like to use SDMetrics instead of our gym for smartnoise synthesizer evaluations. Metric parity would be nice before that transition, so we hope to contribute at least pMSE, hopefully Wasserstein, and perhaps SRA to the SDMetrics package.

I'm adding this issue to gather feedback, before I begin this effort in earnest! Would these metrics be welcome in SDMetrics? Are there concerns/limitations I should be aware of?

lurosenb added the labels "new feature" and "pending review" (This issue needs to be further reviewed, so work cannot be started) on Apr 28, 2022
@npatki
Contributor

npatki commented Jul 14, 2022

Thanks for filing and linking to your code @lurosenb. Let's keep this feature request open for tracking and updating progress.

One thing you should know: We are actively looking into making SDMetrics more usable. You are welcome to propose new metrics, but as we think about incorporating them into the package, it would be great to ensure that each metric can handle all the cases that we currently consider. This Discussion contains some questions to think about.

npatki added the label "feature request" (Request for a new feature) and removed the labels "pending review" and "new feature" on Jul 14, 2022