
Integrate: answer to question "can observations be made public" into preprocessors and release text #295

Open
6 tasks
raprasad opened this issue Sep 10, 2021 · 5 comments · May be fixed by #667

Comments


raprasad commented Sep 10, 2021

  • 1. create a checklist table of current stats and if/how the computation changes when the # of observations cannot be made public

See google doc: https://docs.google.com/document/d/1xUihcjh4zmfnhG0-2EC-uG-qzpde8WXphRksB0NvHe8/edit#

(Redo steps below after doc discussion)

  • 2. update the StatSpec class (stat_spec.py) to include a variable indicating is_dataset_size_public
  • 3. ^ update the computation chains for existing stats appropriately.
    • e.g. if the is_dataset_size_public == True, update the chain, use a different chain, etc.
    • include tests for each stat. (Check that if the dataset size is private, more epsilon is used, etc.)
  • 4. Integrate into larger workflow. e.g. ValidateReleaseUtil.build_stat_specs()
    • ValidateReleaseUtil.__init__ : add self.is_dataset_size_public = None
    • ValidateReleaseUtil.run_preliminary_steps: set self.is_dataset_size_public to True or False
    • Add function DatasetInfo.is_dataset_size_public()
      • similar to get_dataset_size()
      • except finds answer to the dataset question within DepositorSetupInfo
    • ValidateReleaseUtil.build_stat_specs(): use self.is_dataset_size_public when building the StatSpec objects
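A minimal sketch of what step 2 could look like. The `StatSpec` here is a stand-in (not the project's actual stat_spec.py class), and the 10% budget reservation for a DP count is an invented placeholder parameter:

```python
# Hypothetical sketch: StatSpec and the 10% reservation are placeholders,
# not the project's real implementation.
from dataclasses import dataclass


@dataclass
class StatSpec:
    epsilon: float
    is_dataset_size_public: bool = False

    def epsilon_for_statistic(self) -> float:
        # If the dataset size is public, the full budget goes to the statistic.
        if self.is_dataset_size_public:
            return self.epsilon
        # Otherwise, reserve part of the budget (here 10%) for a DP count,
        # leaving less for the statistic itself.
        return self.epsilon * 0.9


private_spec = StatSpec(epsilon=1.0, is_dataset_size_public=False)
public_spec = StatSpec(epsilon=1.0, is_dataset_size_public=True)
assert private_spec.epsilon_for_statistic() < public_spec.epsilon_for_statistic()
```

This also suggests a shape for the tests in step 3: assert that a private dataset size leaves strictly less epsilon for the statistic than a public one.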
@raprasad raprasad added this to the Create Statistic milestone Sep 10, 2021
@raprasad raprasad changed the title profiler/epsilon question, can observations be made public Integrate: answer to question "can observations be made public" into preprocessors and release text Sep 23, 2021

ecowan commented May 23, 2022

There are two avenues here, each with its own set of logical steps:

Using DP Count:

  1. When the user selects private count = True, then the "create statistic" view should be pre-populated with a row for a DP count, the result of which will be passed into any other statistics that the user selects

  2. If the user selects private count = True and in "create statistic" selects a count, it should override the pre-populated one - we only need this to be calculated once.

Using User Estimation:

  1. One of the views (likely "create statistic") needs a way for the user to specify their best estimate of the count, which is then passed to the backend and used in the computation chains.

  2. If a DP Count is also requested, then we would need to decide which takes precedence.
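The precedence question in the two avenues above could be sketched as a small selector. The function name and the rule itself (a requested DP count wins over a user estimate) are assumptions for illustration, not a settled decision:

```python
# Sketch of a possible precedence rule; names and the rule are assumptions.
from typing import Optional


def count_source(dp_count_requested: bool, user_estimate: Optional[int]) -> str:
    if dp_count_requested:
        return "dp_count"            # an explicitly requested DP count takes precedence
    if user_estimate is not None:
        return "user_estimate"       # fall back to the user's best estimate
    return "prepopulated_dp_count"   # otherwise use the auto-added DP count row


assert count_source(True, 1000) == "dp_count"
assert count_source(False, 1000) == "user_estimate"
```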

@raprasad @ekraffmiller

Thanks to @Shoeboxam for the discussion


ecowan commented May 23, 2022

Needed for computing DP counts:

  1. Select any one of the columns in the data set
  2. Set a parameter (epsilon/10, etc.) that determines how much budget should be used to calculate the count estimate
  3. Construct a new class with similar functionality to ValidateReleaseUtil that can return a DP count only
  4. The result of this class needs to be passed into ValidateReleaseUtil to be used in the resize step of each statistic
  5. ValidateReleaseUtil also needs to lower the maximum_epsilon based on how much was used by the DP count
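A hedged sketch of steps 2–5. `DPCountUtil`, the inlined Laplace noise, and the epsilon/10 split are placeholders (real code would use a DP library and the project's actual classes):

```python
# Illustrative only: DPCountUtil, the noise mechanism, and the epsilon
# split are invented stand-ins for the steps listed above.
import random


def laplace_count(n_rows: int, epsilon: float) -> int:
    # Laplace(1/epsilon) noise via a difference of exponentials.
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return max(0, round(n_rows + noise))


class DPCountUtil:
    COUNT_FRACTION = 0.1  # step 2: e.g. spend epsilon/10 on the count estimate

    def __init__(self, n_rows: int, max_epsilon: float):
        self.count_epsilon = max_epsilon * self.COUNT_FRACTION
        # step 5: lower the budget remaining for the other statistics
        self.remaining_epsilon = max_epsilon - self.count_epsilon
        # steps 1 and 3: a DP count over the rows of any one column
        self.dp_count = laplace_count(n_rows, self.count_epsilon)


util = DPCountUtil(n_rows=500, max_epsilon=1.0)
assert abs(util.remaining_epsilon - 0.9) < 1e-9
```

Step 4 would then pass `util.dp_count` into the resize step of each statistic, and step 5 is the `remaining_epsilon` bookkeeping above.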


raprasad commented May 25, 2022

An old slide. We're not getting user input--yet.

This ticket is for implementing the green box labeled: "Use privacy budget to capture size"

(Screenshot: 2022-0525-iqss-dataflow, Google Slides)


ecowan commented Jun 1, 2022

@raprasad Why don't we approach this incrementally and first build a feature where the user has to answer "yes"? This way, we can first develop the part of the code that takes the estimate from the front end and passes it into the process. Once this is merged, we can add functionality for the case where they say "no".


ecowan commented Jun 1, 2022

Another option is to create two analysis objects, one for the DP count and one for the rest, and split the budget between them. This way we could reuse the existing ValidateReleaseUtil class to compute what we need, rather than creating new classes to compute the DP count separately.

The workflow could look like this:

  1. User selects "count is private"
  2. Make two API calls to create new analyses, and link them to each other
  3. When dp count analysis completes, save the dp count to the analysis object
  4. When the second analysis runs, look to the linked analysis object and take the dp count from it
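The four-step workflow above could be sketched as follows. `Analysis`, its fields, and the 10/90 budget split are invented for illustration, not the project's actual models or API:

```python
# Minimal sketch of the linked-analysis workflow; all names are assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Analysis:
    epsilon: float
    linked: Optional["Analysis"] = None
    dp_count: Optional[int] = None


def run_linked_analyses(total_epsilon: float, noisy_count: int) -> Analysis:
    # steps 1-2: create two analyses and link them to each other
    count_analysis = Analysis(epsilon=total_epsilon * 0.1)
    main_analysis = Analysis(epsilon=total_epsilon * 0.9, linked=count_analysis)
    # step 3: when the DP count analysis completes, save the count on it
    count_analysis.dp_count = noisy_count
    # step 4: the second analysis reads the count through the link
    return main_analysis


main = run_linked_analyses(total_epsilon=1.0, noisy_count=512)
assert main.linked is not None and main.linked.dp_count == 512
```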

@raprasad raprasad assigned raprasad and unassigned ecowan Jun 23, 2022
raprasad added a commit that referenced this issue Jul 11, 2022
@raprasad raprasad modified the milestones: Create Statistic, Create stats fixes Jul 11, 2022
@raprasad raprasad linked a pull request Jul 22, 2022 that will close this issue
raprasad added a commit that referenced this issue Jul 25, 2022
raprasad added a commit that referenced this issue Jul 25, 2022
raprasad added a commit that referenced this issue Jul 25, 2022
raprasad added a commit that referenced this issue Jul 26, 2022
raprasad added a commit that referenced this issue Aug 2, 2022
@raprasad raprasad removed this from the Create stats fixes milestone Aug 3, 2022