Hi all!
I am one of the smartnoise-sdk maintainers (part of the OpenDP collaboration). Specifically, I work on differentially private (DP) data synthesizers.
Problem Description
It would be nice if SDMetrics had some more methods geared toward DP synthesizers! (Specifically, methods from https://arxiv.org/pdf/2004.07740.pdf, https://arxiv.org/pdf/1604.06651.pdf, and https://arxiv.org/pdf/1806.11345.pdf.)
Expected behavior
SDMetrics will be able to produce pMSE (Snoke et al.) and Wasserstein randomization test (Arnold et al.) scores for single_table synthetic data (under privacy). (Potentially also SRA scores (Jordon et al.), though this is lower priority and may require too much support code to be feasible.)
Additional context
Here, we have some light implementations of the aforementioned methods. Though we use them to evaluate DP synthetic data, these metrics would also work for general-purpose synthetic data (pMSE and Wasserstein essentially fit the interface described by the single_table metrics as-is).
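To make the fit with a single-table interface concrete, here is a minimal, self-contained sketch of a pMSE-style score in the spirit of Snoke et al. The function name and the hand-rolled logistic-regression propensity model are illustrative assumptions, not SDMetrics or smartnoise-sdk API:

```python
# Hypothetical sketch of a pMSE (propensity mean squared error) score.
# Not the SDMetrics or smartnoise-sdk implementation.
import numpy as np

def pmse(real, synthetic, n_iter=500, lr=0.1):
    """Fit a logistic-regression propensity model to distinguish real from
    synthetic rows, then return the mean squared deviation of the propensity
    scores from c = n_synthetic / (n_real + n_synthetic)."""
    X = np.vstack([real, synthetic]).astype(float)
    # Standardize features so plain gradient descent behaves sensibly.
    X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
    X = np.hstack([np.ones((len(X), 1)), X])  # intercept column
    y = np.concatenate([np.zeros(len(real)), np.ones(len(synthetic))])
    c = len(synthetic) / len(X)

    w = np.zeros(X.shape[1])
    for _ in range(n_iter):  # simple gradient descent on log-loss
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(X)

    p = 1.0 / (1.0 + np.exp(-X @ w))
    return float(np.mean((p - c) ** 2))
```

A score near 0 means the propensity model cannot tell real rows from synthetic ones; for a balanced split the theoretical maximum is 0.25, reached when the model separates the two perfectly.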
Reasoning for transition: The SDMetrics package is far more mature and better supported than our DP synthetic data gym, so we would like to use SDMetrics instead of our gym for smartnoise synthesizer evaluations. We would like metric parity before that transition, so we hope to contribute at least pMSE, hopefully Wasserstein, and perhaps SRA, to the SDMetrics package.
I'm adding this issue to gather feedback before I begin this effort in earnest! Would these metrics be welcome in SDMetrics? Are there concerns/limitations I should be aware of?
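For the Wasserstein randomization test, the core idea can be sketched for a single numeric column as follows. The helper names, the equal-sample-size assumption, and the permutation count are illustrative choices for this sketch, not the Arnold et al. implementation:

```python
# Hypothetical sketch of a Wasserstein randomization test for one numeric
# column. Not the SDMetrics or smartnoise-sdk implementation.
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between equal-size empirical samples:
    the mean absolute difference of the sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def wasserstein_randomization_test(real, synthetic, n_perm=200, seed=0):
    """Compare the observed real-vs-synthetic distance to a null
    distribution built by randomly re-partitioning the pooled sample.
    Assumes len(real) == len(synthetic). Returns (distance, p-value)."""
    rng = np.random.default_rng(seed)
    observed = wasserstein_1d(real, synthetic)
    pooled = np.concatenate([real, synthetic])
    n = len(real)
    null = np.empty(n_perm)
    for i in range(n_perm):
        perm = rng.permutation(pooled)
        null[i] = wasserstein_1d(perm[:n], perm[n:])
    # p-value: fraction of random splits at least as far apart as observed.
    return observed, float(np.mean(null >= observed))
```

A small p-value indicates the synthetic column is farther from the real one than random splits of the pooled data would suggest, i.e. the synthesizer has a detectable distributional gap.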
Thanks for filing and linking to your code @lurosenb. Let's keep this feature request open for tracking and updating progress.
One thing to note: we are actively working on making SDMetrics more usable. You are welcome to propose new metrics, but as we think about incorporating them into the package, it would be great to ensure that each metric can handle all the cases we currently consider. This Discussion contains some questions to think about.