Membership Inference Attack

  • Full demonstration of the complete attack pipeline (train/attack/evaluate) in 3 domains (tabular, NLP, image), all on classification tasks
  • Multi-threading support
  • Easy-to-use API for 4 attack methods

Data Generation

Naïve hill-climbing search over the space of possible model inputs, as proposed by R. Shokri: the search looks for data on which the model gives a confidence higher than a threshold; if it does not, the data is perturbed randomly and the process is repeated.

Warning :

Very costly; should only be used when the dimension of the data is relatively small and the data consists only of bool, int, or float values.
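
As a rough illustration, the sketch below shows the hill-climbing idea for a model exposed through a predict_proba-style function; the name synthesize_record and its parameters are illustrative and not the repository's API.

import numpy as np

def synthesize_record(predict_proba, n_features, target_class,
                      conf_threshold=0.9, max_iter=1000, k=4, rng=None):
    # Illustrative hill-climbing synthesis (not the repository's API):
    # start from a random record and keep random perturbations that do not
    # decrease the target-class confidence, until it exceeds the threshold.
    rng = np.random.default_rng() if rng is None else rng
    x = rng.random(n_features)                      # random starting record
    best_conf = predict_proba(x[None, :])[0, target_class]
    for _ in range(max_iter):
        candidate = x.copy()
        idx = rng.choice(n_features, size=k, replace=False)
        candidate[idx] = rng.random(k)              # randomize a few features
        conf = predict_proba(candidate[None, :])[0, target_class]
        if conf >= best_conf:                       # accept only non-degrading moves
            x, best_conf = candidate, conf
            if best_conf > conf_threshold:
                return x                            # confident enough: keep the record
    return None                                     # search failed within the budget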

Shadow Model

Trains the Shadow Models, or reuses already trained Shadow Models (scikit-learn or torch), to infer confidence vectors for the data used to train them.

This class is integrated into the Confidence Vector Attack, Boundary Attack and Noise Attack (the Augmentation Attack is unsupervised).
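
The underlying idea can be sketched as follows; this is an illustrative stand-in (not the repository's shadow-model class): each shadow model is trained on half of its split, and its confidence vectors are labelled member or non-member depending on whether the data was in that training half.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def shadow_attack_dataset(X, Y, n_shadow=5, seed=0):
    # Illustrative sketch: confidence vectors on the training half are labelled
    # 1 (member), on the held-out half 0 (non-member).
    # Assumes every class is present in each split.
    conf, member = [], []
    for i in range(n_shadow):
        X_in, X_out, Y_in, Y_out = train_test_split(X, Y, test_size=0.5, random_state=seed + i)
        shadow = RandomForestClassifier().fit(X_in, Y_in)
        conf.append(shadow.predict_proba(X_in))
        member.append(np.ones(len(X_in)))
        conf.append(shadow.predict_proba(X_out))
        member.append(np.zeros(len(X_out)))
    return np.vstack(conf), np.concatenate(member)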

Attack Model

Note :

  • transform is to be performed on images
  • collate_fn is used in the DataLoader to perform extra transformations on the data

Confidence Vector

Direct Classification of Confidence Vectors by a simple MLP defined in MIA.utils.attackmodel

Warning :

When topx = -1, the whole Confidence Vector is used and a separate attack model is trained for each class

When topx = k >= 1, the k largest probabilities of each Confidence Vector are used and only one attack model is trained

from sklearn.model_selection import train_test_split

attack_model = ConfVector(shadow_models, attack_nepoch, device, topx, transform)
attack_model.train()
# show the 3D distribution of Confidence Vectors (topx forced to 3)
# only valid for Confidence Vectors longer than 3
attack_model.show()
# evaluate on the shadow models
attack_model.evaluate()
# evaluate on the target
attack_model.evaluate(target, *train_test_split(target_X, target_Y, test_size=0.5, random_state=42))
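
The attack model itself is a small classifier over Confidence Vectors; a minimal sketch of such an MLP is shown below (the actual architecture is defined in MIA.utils.attackmodel and may differ).

import torch.nn as nn

class AttackMLP(nn.Module):
    # Illustrative membership classifier over Confidence Vectors; the actual
    # architecture is defined in MIA.utils.attackmodel and may differ.
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 2),  # member / non-member
        )

    def forward(self, x):
        return self.net(x)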

Augmentation

  1. For each data point, compute a vector of size len(trans) * times whose entries are 0 (augmented data classified correctly) or 1 (augmented data classified incorrectly); see the sketch at the end of this section
  2. Cluster the computed vectors with KMeans

Warning :

  • This method is unsupervised, so training Shadow Models is unnecessary
  • The transformation methods used should be tailored to the dataset

attack_model = Augmentation(device, trans, times, transform, collate_fn, batch_size)
# show=True: t-SNE visualization of the computed vectors
attack_model.evaluate(target, *train_test_split(target_X, target_Y, test_size=0.5, random_state=42), show=True)
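
A minimal sketch of steps 1 and 2, assuming trans is a list of callable image transformations and model returns logits; the helper names are illustrative, not the repository's API.

import numpy as np
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def augmentation_signature(model, x, y, trans, times, device):
    # per-sample vector of length len(trans) * times:
    # 0 if the augmented sample is still classified correctly, 1 otherwise
    bits = []
    for t in trans:
        for _ in range(times):
            pred = model(t(x).unsqueeze(0).to(device)).argmax(dim=-1).item()
            bits.append(0 if pred == y else 1)
    return np.array(bits)

def cluster_signatures(signatures):
    # step 2: cluster the per-sample signatures into 2 groups (member / non-member)
    return KMeans(n_clusters=2, n_init=10).fit_predict(signatures)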

Boundary Distance

  1. For each data point, use the Carlini-Wagner attack from cleverhans to generate the associated adversarial example
  2. Compute the distance between the data point and its adversarial example as the distance to the decision boundary
  3. Exploit the fact that the larger this distance, the more likely the data point was used during training
  4. Find two thresholds, one maximizing accuracy and one maximizing precision (see the sketch at the end of this section)

Warning :

  • Only valid for continuous data
  • Very time-consuming
attack_model = Boundary(shadow_models, device, classes, transform)
# show=True: histogram of boundary distances
attack_model.train(show=True)
attack_model.evaluate(target, *train_test_split(target_X, target_Y, test_size=0.5, random_state=42))
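
A minimal sketch of step 4, assuming boundary distances have already been computed for shadow members and non-members and that larger distances indicate membership; the helper name is illustrative.

import numpy as np

def best_thresholds(dist_member, dist_nonmember):
    # sweep candidate thresholds over the shadow distances, predicting "member"
    # when distance >= threshold; return the threshold maximizing accuracy and
    # the one maximizing precision
    d = np.concatenate([dist_member, dist_nonmember])
    truth = np.concatenate([np.ones(len(dist_member)), np.zeros(len(dist_nonmember))])
    best_acc, best_prec = (None, -1.0), (None, -1.0)
    for t in np.unique(d):
        pred = (d >= t).astype(float)
        acc = (pred == truth).mean()
        prec = truth[pred == 1].mean() if pred.sum() > 0 else 0.0
        if acc > best_acc[1]:
            best_acc = (t, acc)
        if prec > best_prec[1]:
            best_prec = (t, prec)
    return best_acc[0], best_prec[0]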

Noise

  1. For each data point, add Gaussian noise with each stddev from the input stddev list
  2. Count the number of times the noisy data point is still classified correctly (see the sketch at the end of this section)
  3. Exploit the fact that the larger this count, the more likely the data point was used during training
  4. Find two thresholds, one maximizing accuracy and one maximizing precision

Warning :

  • Only valid for continuous data
  • The valid range of the data should be set manually
  • The stddev values should be tailored to the dataset
attack_model = Noise(shadow_models, stddev, device, transform)
# show=True: histogram of correct-classification counts
attack_model.train(show=True)
attack_model.evaluate(target, *train_test_split(target_X, target_Y, test_size=0.5, random_state=42))
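
A minimal sketch of steps 1 and 2, assuming x is a single sample tensor without a batch dimension and model returns logits; clip_min/clip_max stand for the manually chosen valid data range mentioned in the warning above, and the helper name is illustrative.

import torch

@torch.no_grad()
def noise_robustness(model, x, y, stddevs, times, device, clip_min=0.0, clip_max=1.0):
    # add Gaussian noise at each stddev several times and count how often the
    # sample is still classified correctly; larger counts suggest membership
    x = x.to(device)
    correct = 0
    for s in stddevs:
        for _ in range(times):
            noisy = torch.clamp(x + torch.randn_like(x) * s, clip_min, clip_max)
            correct += int(model(noisy.unsqueeze(0)).argmax(dim=-1).item() == y)
    return correct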

Defense Methods

The effectiveness of these methods is related to reducing overfitting.

L2 regularization

Add L2 regularization by setting the weight_decay parameter of the optimizer.
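
For example, with a torch optimizer, assuming an existing model (the concrete values are illustrative):

import torch

# weight_decay adds an L2 penalty on the model parameters (value illustrative)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)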

The effectiveness of this method is related to reducing overfitting.

Mixup

mixup_data(x, y, alpha, device) mixes pairs of data points with a coefficient drawn from the beta distribution Beta(alpha, alpha)

Warning :

The loss function must be replaced by mixup_criterion

The effectiveness of this method can be found in this paper.
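
A typical training step might look as follows, assuming mixup_data returns the mixed batch together with both label sets and the mixing coefficient, and mixup_criterion weights the two losses accordingly, as in the common reference implementation; the exact signatures in this repository may differ.

# assumes mixup_data returns (mixed_x, y_a, y_b, lam) and mixup_criterion
# combines criterion(pred, y_a) and criterion(pred, y_b) weighted by lam
mixed_x, y_a, y_b, lam = mixup_data(x, y, alpha, device)
optimizer.zero_grad()
pred = model(mixed_x)
loss = mixup_criterion(criterion, pred, y_a, y_b, lam)
loss.backward()
optimizer.step()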

Smart-Noise

mix(X, Y, ratio)

For more information on the parameters of the Synthesizer, please refer to the Smart-Noise library.

memguard(scores, epsilon) is the simplified version of MemGuard proposed by C.C.Christopher

import torch.nn.functional as F

# it is better to apply softmax before memguard
output_in = F.softmax(output_in, dim=-1)
# memguard works on numpy arrays
output_in = memguard(output_in.cpu().numpy())
