Feature Description
The current SKLearn Pipeline noise mechanism "operator" for Local DP assumes that the given dataset contains only numerical features. This means the noise is derived from a sensitivity calculation that is defined for numerical features only.
However, datasets often contain categorical features as well, which require a different method of calculating the sensitivity. The "operator" should also support categorical features.
This applies to all the noise mechanisms: LaplaceMechanism, GaussianMechanism, and GeometricMechanism.
Note: at the time this issue was created, only LaplaceMechanism had been implemented, so it is a good starting point. Once GeometricMechanism and GaussianMechanism have been implemented, the same specification for categorical feature support applies to them.
Additional Context
Preferably, support for categorical features would take the form of parameters on the "operator" class.
For example, in the case of LaplaceMechanism, it would look something like this:
# scikit-learn imports; BudgetAccountant and LaplaceMechanism come from
# this project (see the LaplaceMechanism source referenced below)
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Set a privacy budget accountant
accountant = BudgetAccountant(10000)
# Set sensitivity function for numerical data
sensitivity = lambda x: (max(x) - min(x)) / (len(x) + 1)
# Set sensitivity function for categorical data
sensitivity_cat = lambda x: ...
# Indices of the categorical features in the dataset
cat_features = [0, 1, ...]
# Set Laplace mechanism with epsilon, sensitivity, and accountant,
# plus the proposed parameters for categorical features
laplace = LaplaceMechanism(
    epsilon=0.1,
    sensitivity=sensitivity,
    accountant=accountant,
    sensitivity_cat=sensitivity_cat,
    cat_features=cat_features
)
# Initialize scaler and naive Bayes estimator
scaler = StandardScaler()
nb = GaussianNB()
# Create the pipeline
pipe = Pipeline([('scaler', scaler), ('laplace', laplace), ('nb', nb)])
For more examples, please have a look at the notebook example of the LaplaceMechanism implementation.
As starting guidance, please refer to the source code for LaplaceMechanism here.
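To make the proposal more concrete, here is a minimal, standard-library-only sketch of the per-column dispatch that `sensitivity_cat` and `cat_features` imply: each column's Laplace scale is its sensitivity divided by epsilon, with the sensitivity function chosen by whether the column index is listed as categorical. The helper names `column_scales` and `add_laplace_noise` are hypothetical and not part of the existing LaplaceMechanism, and the constant used for `sensitivity_cat` is only a placeholder assumption, since the issue leaves that function open. Whether additive noise is appropriate for encoded categories at all is also left to the implementer.

```python
import random


def column_scales(rows, epsilon, sensitivity, sensitivity_cat, cat_features):
    """Return one Laplace scale (sensitivity / epsilon) per column.

    Hypothetical helper: columns whose index is in cat_features use
    sensitivity_cat, all other columns use sensitivity.
    """
    cat = set(cat_features)
    columns = list(zip(*rows))  # transpose rows into columns
    return [
        (sensitivity_cat(col) if j in cat else sensitivity(col)) / epsilon
        for j, col in enumerate(columns)
    ]


def add_laplace_noise(rows, scales, seed=None):
    """Add independent Laplace(0, scale) noise to every cell, column by column."""
    rng = random.Random(seed)

    def laplace(scale):
        # The difference of two independent Exp(1) draws is Laplace(0, 1).
        return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

    return [[x + laplace(b) for x, b in zip(row, scales)] for row in rows]


# Numerical sensitivity taken from the example in this issue
sensitivity = lambda x: (max(x) - min(x)) / (len(x) + 1)
# Placeholder assumption only; the issue does not define this function
sensitivity_cat = lambda x: 1.0

# Column 0 is an integer-encoded category, column 1 is numerical
rows = [[0, 1.0], [1, 3.0], [0, 9.0]]
scales = column_scales(
    rows,
    epsilon=0.5,
    sensitivity=sensitivity,
    sensitivity_cat=sensitivity_cat,
    cat_features=[0],
)
noisy = add_laplace_noise(rows, scales, seed=0)
```

In the actual "operator", the same dispatch would presumably live inside the mechanism's transform step, with the accountant charged as it is today; that wiring is not shown here.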