Add NA bin to histogram when user enters a list of histogram bins that are a subset of all variable levels (tabled until library restructured) #91

Open
MeganFantes opened this issue Aug 12, 2019 · 2 comments

@MeganFantes
Contributor

Right now there are 2 cases when making a histogram for a categorical variable:

  1. The user enters a list of bins, and the Laplace mechanism is used
  2. The user does NOT enter a list of bins, and the stability mechanism is used

We want to implement a third case:

  3. The user enters a list of bins, but the list is a subset of the full set of levels the variable takes. In that case we add an NA bin to the list of bins, set all levels that were not entered in the list of bins to NA, and then use the stability mechanism.

To implement this third case, we will use the existing histogramCategoricalBins function in utilities-histogram.R.
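
The relabeling step in the third case can be sketched in base R as follows. This is only a minimal illustration: `collapse_unlisted_levels` is a hypothetical helper, not a function from the library, and it does not call histogramCategoricalBins, whose signature is not shown in this issue. The noise and stability steps are deliberately left out.

```r
# Hypothetical helper for illustration only; not part of the library.
collapse_unlisted_levels <- function(x, bins) {
  x <- as.character(x)
  # Any level the user did not list is set to NA.
  x[!(x %in% bins)] <- NA
  # Fix the levels to the user's bins so that empty bins still appear with count 0.
  factor(x, levels = bins)
}

x <- c("a", "b", "c", "a", "d", "b", "a")  # observed data with levels a, b, c, d
bins <- c("a", "b", "z")                   # user-supplied bins, a subset plus an unseen level

binned <- collapse_unlisted_levels(x, bins)

# useNA = "always" appends an NA bin even when its count is 0.
counts <- table(binned, useNA = "always")
print(counts)
```

Here the levels "c" and "d" end up in the NA bin, and the bin "z" is kept with a count of zero because the factor levels come from the user's list rather than from the data.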

@MeganFantes MeganFantes self-assigned this Aug 12, 2019
@MeganFantes
Contributor Author

MeganFantes commented Aug 20, 2019

Updated idea:

Do not implement a third case, instead change the first case:

  1. bins entered: use the Laplace mechanism and check the impute parameter; always add an NA bucket if impute = FALSE
  2. bins not entered: use stability mechanism
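
The two-way decision above could be written down roughly like this. It is a sketch under assumptions: `plan_histogram` and the fields of the list it returns are hypothetical names chosen for illustration, and the behavior when impute = TRUE (no NA bucket) is inferred from the rule above rather than taken from the library.

```r
# Hypothetical planner for illustration only; not the library's API.
plan_histogram <- function(bins = NULL, impute = FALSE) {
  if (is.null(bins)) {
    # Case 2: no bins entered, so the stability mechanism is used.
    return(list(mechanism = "stability", bins = NULL))
  }
  # Case 1: bins entered, so the Laplace mechanism is used.
  # Assumption: the NA bucket is added exactly when impute is FALSE.
  released_bins <- if (impute) bins else c(bins, NA)
  list(mechanism = "laplace", bins = released_bins)
}

with_na    <- plan_histogram(bins = c("a", "b"), impute = FALSE)
without_na <- plan_histogram(bins = c("a", "b"), impute = TRUE)
no_bins    <- plan_histogram()
```

Keeping the decision in one small function makes it easy to see that impute only matters when bins are supplied.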

We also need to update the histogram vignette to make sure impute is used in all contexts.

MeganFantes added a commit that referenced this issue Aug 20, 2019
@MeganFantes
Contributor Author

Ira and I discussed this at length, and we decided this issue should be tabled for now.

Given the way the library is structured now, where there are export() statements in the statistics to call the mechanisms, there is no logical way to set a local attribute in a subclass and then check for its existence.

We plan to do a major restructuring of the library so that the mechanisms and statistics are completely separate entities. Once that is done, it will be possible to set impute as an attribute of only the histogram statistic.

When the library is restructured, we can revisit the issue of conditioning the call to fillMissing() on impute for the histogram statistic.

@MeganFantes MeganFantes changed the title from "Add NA bin to histogram when user enters a list of histogram bins that are a subset of all variable levels" to "Add NA bin to histogram when user enters a list of histogram bins that are a subset of all variable levels (tabled until library restructured)" Aug 21, 2019