Replace smartnoise-eval with an evaluation suite #556

Closed
joshua-oss opened this issue May 11, 2023 · 2 comments
Comments

@joshua-oss (Contributor) commented May 11, 2023

Currently, smartnoise-eval is a confusing name: people assume it is useful for evaluating the accuracy/privacy tradeoffs of a privacy mitigation pipeline, when it is in fact a stochastic tester. We should rename it to something that indicates its function as a tester, and replace it with a library that actually measures utility.

This is responsive to #502, since often the only way (and usually the best way) to measure utility is to simulate. Things like clamping and reservoir sampling tend to change the accuracy guarantees enough that experimentation is the only way to know the accuracy intervals. Nearly everyone who uses SmartNoise ends up rolling their own evaluation scripts, and we have a lot of boilerplate we regularly use that should be generically available to save people time.
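To make the simulation point concrete, here is a minimal, hedged sketch (plain NumPy, not an existing SmartNoise API) of estimating an empirical accuracy interval for a clamped, Laplace-noised sum by running the release many times; the clamping bounds and epsilon are illustrative:

```python
# Minimal sketch (illustrative, not SmartNoise API): empirically estimate the
# accuracy interval of a clamped, Laplace-noised sum by repeated simulation.
# Clamping introduces bias that analytic noise bounds alone do not capture.
import numpy as np

rng = np.random.default_rng(0)

def simulated_error_bound(values, lower, upper, epsilon, runs=1000, alpha=0.05):
    """Empirical (1 - alpha) bound on absolute error versus the true sum."""
    clamped = np.clip(values, lower, upper)
    scale = (upper - lower) / epsilon          # L1 sensitivity / epsilon
    true_sum = values.sum()                    # unclamped ground truth
    errors = [abs(clamped.sum() + rng.laplace(0.0, scale) - true_sum)
              for _ in range(runs)]
    return float(np.quantile(errors, 1 - alpha))

incomes = rng.lognormal(mean=10.0, sigma=1.0, size=5_000)
print(simulated_error_bound(incomes, lower=0.0, upper=100_000.0, epsilon=1.0))
```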

Desiderata:

  • Support for both synthetic data and ad-hoc SQL workloads
  • Support for a prioritized list of workloads; e.g. n-way marginals, range queries, etc., expressed in a way that can be tested against either SQL or synthetic data
  • Support for counting queries and sums
  • Support for other continuous statistics like mean, variance, etc. with different utility measures
  • Support for measuring bias as well as absolute error (see the sketch after this list)
  • Support for measuring suppression of dimension combinations (up to an arbitrary length)
  • Support for measuring fabrication of dimension combinations (up to an arbitrary length)
  • Optional support for more generic metrics such as pMSE
  • Support for reporting all of the above grouped by bin size/probability mass (e.g. small bins or areas of low probability mass should, by definition, perform worse than large bins or areas of high mass)
  • Also group all of the above by marginal width (e.g. 1-way, 2-way, etc.)
  • Report all of the above by "in workload" and "out of workload". For example, a mitigation might optimize for 2-way marginals within a pre-defined workload, but we also want to know how badly the 2-way marginals outside the workload turn out.
  • Each evaluation can be individually enabled or disabled
  • Evaluation can run in a long-running pipeline, with results collected and analyzed incrementally and independently over time
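As a rough illustration of the bias/error and suppression/fabrication items above, a hedged, pandas-based sketch for a single k-way marginal (the helper name and approach are illustrative, not a committed design):

```python
# Hedged sketch (illustrative names): bias, mean absolute error, and
# suppressed/fabricated dimension combinations for one k-way marginal,
# comparing an original dataset with one mitigated (e.g. synthetic) output.
import pandas as pd

def marginal_metrics(original: pd.DataFrame, mitigated: pd.DataFrame, columns):
    cols = list(columns)
    orig = original.groupby(cols).size().rename("orig")
    mit = mitigated.groupby(cols).size().rename("mit")
    joined = pd.concat([orig, mit], axis=1).fillna(0)     # align on combinations
    diff = joined["mit"] - joined["orig"]
    return {
        "bias": diff.mean(),                               # signed error
        "mae": diff.abs().mean(),                          # absolute error
        "suppressed": int(((joined["orig"] > 0) & (joined["mit"] == 0)).sum()),
        "fabricated": int(((joined["orig"] == 0) & (joined["mit"] > 0)).sum()),
    }
```

Running something like this over every marginal in and out of the workload, grouped by marginal width and bin size, would cover most of the reporting items above.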

Diagnostics:

  • Show how many records in the original data are singled out, for a range of values of k
  • Show which columns and column combinations are most linkable
  • Show which columns and column combinations are feasible to measure given a particular synthesizer or DPSU setting
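For the first diagnostic, a possible pandas-based sketch (illustrative, not a committed design) that counts how many records are singled out, i.e. whose value combination over the chosen columns appears at most k times, for a range of k:

```python
# Hedged sketch: count records whose combination of values over the chosen
# columns occurs at most k times in the original data, for a range of k.
import pandas as pd

def singled_out_counts(df: pd.DataFrame, columns, ks=(1, 2, 5, 10)):
    cols = list(columns)
    # size of each record's value combination, aligned back to the records;
    # dropna=False keeps null/redacted values as their own bin
    combo_size = df.groupby(cols, dropna=False)[cols[0]].transform("size")
    return {k: int((combo_size <= k).sum()) for k in ks}
```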

Inputs:

  • Evaluation should take as input the original (non-mitigated) data, and either one or many mitigated outputs. If only one, many of the above can still be computed (e.g. binned by marginal width or bin size). If many, the variance of the algorithm can be captured more faithfully.
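One possible shape for the entry point (names are hypothetical, not an existing SmartNoise API): accept the original data plus one or many mitigated outputs, compute per-release errors over a workload of marginals, and summarize across-release variance when several releases are supplied:

```python
# Hypothetical entry point: one or many mitigated releases against the
# original data, reporting mean and std of error per workload marginal.
from typing import Iterable, Sequence, Tuple, Union
import pandas as pd

def evaluate(original: pd.DataFrame,
             mitigated: Union[pd.DataFrame, Iterable[pd.DataFrame]],
             workload: Sequence[Tuple[str, ...]]) -> pd.DataFrame:
    releases = [mitigated] if isinstance(mitigated, pd.DataFrame) else list(mitigated)
    rows = []
    for i, release in enumerate(releases):
        for columns in workload:
            cols = list(columns)
            orig_counts = original.groupby(cols).size()
            mit_counts = release.groupby(cols).size()
            mae = mit_counts.sub(orig_counts, fill_value=0).abs().mean()
            rows.append({"release": i, "marginal": tuple(cols), "mae": mae})
    report = pd.DataFrame(rows)
    # with many releases, std captures the variance of the algorithm itself
    return report.groupby("marginal")["mae"].agg(["mean", "std"]).reset_index()
```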

These are all fairly simple, and code already exists for most of them. The proposal is to require PySpark for the evaluation suite, to avoid writing a lot of special-case code to handle measurement of redactions, nulls, and so on. This may or may not be a good idea. Another option would be to use SQLAlchemy to generate measurements that could run against Spark as well as any other supported engine.
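If PySpark is required, much of the null handling comes for free; a minimal sketch (column names and paths are illustrative) of one marginal count over a mitigated output:

```python
# Hedged sketch of the PySpark option: groupBy keeps null/redacted keys as
# their own bin, so missing values do not need special-case handling, and
# aggregates like sum skip nulls by default.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("smartnoise-eval-sketch").getOrCreate()
mitigated = spark.read.parquet("mitigated.parquet")   # hypothetical path

counts = (mitigated
          .groupBy("age", "sex")                      # illustrative columns
          .agg(F.count("*").alias("n"),
               F.sum("income").alias("income_sum")))
counts.show()
```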

@joshua-oss (Contributor, Author) commented:

This also relates to the design of "dpsdgym", which is more about evaluating a list of synthesizers against generic criteria, versus checking a specific set of hyperparameters against a known-important workload.

@joshua-oss (Contributor, Author) commented:

Addressed by #582
