Replace smartnoise-eval with an evaluation suite #556

Closed
joshua-oss opened this issue May 11, 2023 · 2 comments
Comments

@joshua-oss (Contributor) commented May 11, 2023

Currently, smartnoise-eval is a confusing name: people assume it is useful for evaluating the accuracy/privacy tradeoffs of a privacy mitigation pipeline, when it is in fact a stochastic tester. We should rename it to something that indicates its function as a tester, and replace it with a library that actually measures utility.

This is responsive to #502, since often the only way (and usually the best way) to measure utility is to simulate. Things like clamping and reservoir sampling tend to change the accuracy guarantees enough that experimentation is the only way to know the accuracy intervals. Nearly everyone who uses SmartNoise ends up rolling their own evaluation scripts, and we have a lot of boilerplate we regularly use that should be generically available to save people time.
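To make the simulation point concrete, here is a minimal, hedged sketch (plain NumPy, not an existing SmartNoise API) of estimating an empirical accuracy interval for a clamped, Laplace-noised sum by running the release many times; the clamping bounds and epsilon are illustrative:

```python
# Minimal sketch (illustrative, not SmartNoise API): empirically estimate the
# accuracy interval of a clamped, Laplace-noised sum by repeated simulation.
# Clamping introduces bias that analytic noise bounds alone do not capture.
import numpy as np

rng = np.random.default_rng(0)

def simulated_error_bound(values, lower, upper, epsilon, runs=1000, alpha=0.05):
    """Empirical (1 - alpha) bound on absolute error versus the true sum."""
    clamped = np.clip(values, lower, upper)
    scale = (upper - lower) / epsilon          # L1 sensitivity / epsilon
    true_sum = values.sum()                    # unclamped ground truth
    errors = [abs(clamped.sum() + rng.laplace(0.0, scale) - true_sum)
              for _ in range(runs)]
    return float(np.quantile(errors, 1 - alpha))

incomes = rng.lognormal(mean=10.0, sigma=1.0, size=5_000)
print(simulated_error_bound(incomes, lower=0.0, upper=100_000.0, epsilon=1.0))
```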

Desiderata:

  • Support for both synthetic data and ad-hoc SQL workloads
  • Support for a prioritized list of workloads; e.g. n-way marginals, range queries, etc., expressed in a way that can be tested against either SQL or synthetic data
  • Support for counting queries and sums
  • Support for other continuous statistics like mean, variance, etc. with different utility measures
  • Support for measuring bias as well as absolute error (see the sketch after this list)
  • Support for measuring suppression of dimension combinations (up to an arbitrary length)
  • Support for measuring fabrication of dimension combinations (up to an arbitrary length)
  • Optional support for more generic metrics such as pMSE
  • Support for reporting all of the above grouped by bin size/probability mass (e.g. small bins or areas of low probability mass should, by definition, perform worse than large bins or areas of high mass)
  • Also group all of the above by marginal width (e.g. 1-way, 2-way, etc.)
  • Report all of the above by "in workload" and "out of workload". For example, a mitigation might optimize for 2-way marginals within a pre-defined workload, but we also want to know how badly the 2-way marginals outside the workload turn out.
  • Each evaluation can be individually enabled or disabled
  • Evaluation can run in a long-running pipeline, with results collected and analyzed incrementally and independently over time
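As a rough illustration of the bias/error and suppression/fabrication items above, a hedged, pandas-based sketch for a single k-way marginal (the helper name and approach are illustrative, not a committed design):

```python
# Hedged sketch (illustrative names): bias, mean absolute error, and
# suppressed/fabricated dimension combinations for one k-way marginal,
# comparing an original dataset with one mitigated (e.g. synthetic) output.
import pandas as pd

def marginal_metrics(original: pd.DataFrame, mitigated: pd.DataFrame, columns):
    cols = list(columns)
    orig = original.groupby(cols).size().rename("orig")
    mit = mitigated.groupby(cols).size().rename("mit")
    joined = pd.concat([orig, mit], axis=1).fillna(0)     # align on combinations
    diff = joined["mit"] - joined["orig"]
    return {
        "bias": diff.mean(),                               # signed error
        "mae": diff.abs().mean(),                          # absolute error
        "suppressed": int(((joined["orig"] > 0) & (joined["mit"] == 0)).sum()),
        "fabricated": int(((joined["orig"] == 0) & (joined["mit"] > 0)).sum()),
    }
```

Running something like this over every marginal in and out of the workload, grouped by marginal width and bin size, would cover most of the reporting items above.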

Diagnostics:

  • Show how many records in the original data are singled out, for a range of values of k
  • Show which columns and column combinations are most linkable
  • Show which columns and column combinations are feasible to measure given a particular synthesizer or DPSU setting
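For the first diagnostic, a possible pandas-based sketch (illustrative, not a committed design) that counts how many records are singled out, i.e. whose value combination over the chosen columns appears at most k times, for a range of k:

```python
# Hedged sketch: count records whose combination of values over the chosen
# columns occurs at most k times in the original data, for a range of k.
import pandas as pd

def singled_out_counts(df: pd.DataFrame, columns, ks=(1, 2, 5, 10)):
    cols = list(columns)
    # size of each record's value combination, aligned back to the records;
    # dropna=False keeps null/redacted values as their own bin
    combo_size = df.groupby(cols, dropna=False)[cols[0]].transform("size")
    return {k: int((combo_size <= k).sum()) for k in ks}
```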

Inputs:

  • Evaluation should take as input the original (non-mitigated) data, and either one or many mitigated outputs. If only one, many of the above can still be computed (e.g. binned by marginal width or bin size). If many, the variance of the algorithm can be captured more faithfully.
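One possible shape for the entry point (names are hypothetical, not an existing SmartNoise API): accept the original data plus one or many mitigated outputs, compute per-release errors over a workload of marginals, and summarize across-release variance when several releases are supplied:

```python
# Hypothetical entry point: one or many mitigated releases against the
# original data, reporting mean and std of error per workload marginal.
from typing import Iterable, Sequence, Tuple, Union
import pandas as pd

def evaluate(original: pd.DataFrame,
             mitigated: Union[pd.DataFrame, Iterable[pd.DataFrame]],
             workload: Sequence[Tuple[str, ...]]) -> pd.DataFrame:
    releases = [mitigated] if isinstance(mitigated, pd.DataFrame) else list(mitigated)
    rows = []
    for i, release in enumerate(releases):
        for columns in workload:
            cols = list(columns)
            orig_counts = original.groupby(cols).size()
            mit_counts = release.groupby(cols).size()
            mae = mit_counts.sub(orig_counts, fill_value=0).abs().mean()
            rows.append({"release": i, "marginal": tuple(cols), "mae": mae})
    report = pd.DataFrame(rows)
    # with many releases, std captures the variance of the algorithm itself
    return report.groupby("marginal")["mae"].agg(["mean", "std"]).reset_index()
```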

These are all fairly simple, and code already exists for most of them. The proposal is to require PySpark for the evaluation suite, to avoid writing a lot of special-case code to handle measurement of redactions, nulls, and so on. This may or may not be a good idea. Another option would be to use SQLAlchemy to generate measurements that could run against Spark as well as any other supported engine.
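If PySpark is required, much of the null handling comes for free; a minimal sketch (column names and paths are illustrative) of one marginal count over a mitigated output:

```python
# Hedged sketch of the PySpark option: groupBy keeps null/redacted keys as
# their own bin, so missing values do not need special-case handling, and
# aggregates like sum skip nulls by default.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("smartnoise-eval-sketch").getOrCreate()
mitigated = spark.read.parquet("mitigated.parquet")   # hypothetical path

counts = (mitigated
          .groupBy("age", "sex")                      # illustrative columns
          .agg(F.count("*").alias("n"),
               F.sum("income").alias("income_sum")))
counts.show()
```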

@joshua-oss (Contributor, Author) commented:

This also relates to the design of "dpsdgym", which is more about evaluating a list of synthesizers against generic criteria, versus checking a specific set of hyperparameters against a known-important workload.

@joshua-oss (Contributor, Author) commented:

Addressed by #582
