Example Code for "Adversarial Explanations for Understanding Image Classification Decisions and Improved Neural Network Robustness"
This code demonstrates the techniques from the above paper, a pre-print of which is available on arXiv. It is not the exact code used in the research, but a cleaned-up reproduction of the paper's key insights.
From scratch without a Python environment, installation takes 10-20 minutes. With Python already installed, installation takes only a few minutes.
Install PyTorch, torchvision, and click, potentially via Miniconda with Python 3:
$ conda install -c pytorch pytorch torchvision
$ pip install click
Code was tested with:
- Python 3.6
- PyTorch 1.1 + torchvision 0.2.2
- click 7.0
Any operating system supporting the above libraries should work; we tested using Ubuntu 18.04.
An NVIDIA GPU is not required, but one or more GPUs will greatly accelerate network training.
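To compare a local installation against the tested versions above, a small check along the following lines may help. It is only a sketch and not part of the repository: the helper names `installed_version` and `describe_devices` are made up for this example, and it assumes each package exposes a `__version__` attribute.

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Assumption: the package exposes __version__; fall back to "unknown" otherwise.
    return getattr(module, "__version__", "unknown")


def describe_devices():
    """Summarize whether PyTorch can see a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return "{} CUDA GPU(s) visible".format(torch.cuda.device_count())
    return "no CUDA GPU visible; PyTorch will run on the CPU"


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name in ("torch", "torchvision", "click"):
        print(name, installed_version(name) or "not installed")
    print(describe_devices())
```

Newer versions than those listed may well work; the list above only records what was tested.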
This repository contains several pre-built networks, corresponding with the CIFAR-10 networks highlighted in the paper.
The application has two modes: explaining a trained model, and training a model from scratch.
When running the application, the CIFAR-10 dataset will be automatically downloaded via the torchvision library; the desired download location for the CIFAR-10 data must be specified via the environment variable CIFAR10_PATH.
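The following sketch illustrates how an environment variable such as CIFAR10_PATH can drive a torchvision download. It shows the general pattern only and is not necessarily how main.py reads the variable; the function names `cifar10_root` and `load_cifar10_test` are hypothetical.

```python
import os


def cifar10_root(environ=os.environ):
    """Return the CIFAR-10 directory named by CIFAR10_PATH, or raise if unset."""
    path = environ.get("CIFAR10_PATH")
    if not path:
        raise RuntimeError("Set CIFAR10_PATH to the desired CIFAR-10 download location")
    return os.path.abspath(os.path.expanduser(path))


def load_cifar10_test(root):
    """Download CIFAR-10 into root if it is missing and return the test split."""
    import torchvision  # imported lazily so cifar10_root works without torchvision
    return torchvision.datasets.CIFAR10(root=root, train=False, download=True)


# Example usage (downloads the dataset on first run):
#     dataset = load_cifar10_test(cifar10_root())
#     print(len(dataset), "test images")
```

In a shell, the variable can be set for a single invocation by prefixing the command, for example `CIFAR10_PATH=/some/dir python main.py ...`, where `/some/dir` is a placeholder.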
The repository contains four prebuilt networks:
- prebuilt/resnet44-standard.pt: A standard ResNet-44 with no special training.
- prebuilt/resnet44-adv-train.pt: A ResNet-44 trained with --adversarial-training.
- prebuilt/resnet44-all.pt: A ResNet-44 trained with --robust-additions, --adversarial-training, and --l2-min.
- prebuilt/resnet44-robust.pt: A ResNet-44 trained with --robust-additions.
These correspond with, but are not the same as, the networks denoted N1, N2, N3, and N4 in the paper. The training of these networks resulted in the following statistics:
See the paper or the "github-prebuilt-images" command in main.py for additional information on the above table and its images.
Attack and BTR ARAs may be calculated via the calculate-ara command. For example, to use a pre-built network with both adversarial training and the robustness additions from the paper:
$ python main.py calculate-ara prebuilt/resnet44-all.pt [--n-images 1000] [--eps 20] [--steps 450] [--momentum 0.9]
Note that arguments in [brackets] are optional. This produces textual output which indicates the calculated attack and BTR ARAs as per Section III.A of the paper. The resulting ARAs for all prebuilt networks are shown in the table above. Calculating both ARAs as in the original paper (default settings) takes around 30 minutes per network, depending on GPU.
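To evaluate all four prebuilt networks in turn, the command line can be assembled programmatically. The sketch below only builds and prints the commands. The helper `ara_command` is hypothetical, and the only flags it should be given are the ones shown in the example above (--n-images, --eps, --steps, --momentum).

```python
import shlex

# Paths of the prebuilt networks shipped with the repository.
PREBUILT = [
    "prebuilt/resnet44-standard.pt",
    "prebuilt/resnet44-adv-train.pt",
    "prebuilt/resnet44-all.pt",
    "prebuilt/resnet44-robust.pt",
]


def ara_command(model_path, **options):
    """Build the argument list for `python main.py calculate-ara`.

    Keyword names use underscores (n_images) and become flags (--n-images).
    With no keywords, the command relies on the application's defaults.
    """
    command = ["python", "main.py", "calculate-ara", model_path]
    for name, value in options.items():
        command += ["--" + name.replace("_", "-"), str(value)]
    return command


# Print one shell-ready command per prebuilt network.
for path in PREBUILT:
    print(" ".join(shlex.quote(part) for part in ara_command(path)))
```

Each list can be passed to `subprocess.run(command, check=True)` from the repository root, with CIFAR10_PATH set, to actually run the calculation.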
To generate explanations on the first 10 CIFAR-10 testing examples with a trained network, use the explain command. For example, to use a pre-built network with both adversarial training and the robustness additions from the paper:
$ python main.py explain prebuilt/resnet44-all.pt [--eps 0.1]
This will create images in the output/ folder, designed to be viewed in alphabetical order. For example, output/0-cat will contain:
- _input.png: the unmodified input image.
- _real_was_xxx.png: an explanation using g_{explain+} from the paper on the real class (cat).
- _second_dog_was_xxx.png: an explanation using g_{explain+} on the most confident class that was not the correct class.
- 0_airplane_was_xxx.png, 1_automobile_was_xxx.png, 2_bird_was_xxx.png, ..., 9_truck_was_xxx.png: an explanation targeted at each class of CIFAR-10 as indicated in the filename.
In all cases, the _xxx preceding the .png extension indicates the post-softmax confidence of that class on the original image. The images look like this:
_input | _real | _second
0_airplane | 1_automobile | 2_bird | 3_cat | 4_deer
5_dog | 6_frog | 7_horse | 8_ship | 9_truck
Note that arguments in [brackets] are optional. --eps X specifies that the adversarial explanations should be built with rho=X. The process could be further optimized, but presently takes a minute or two.
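The naming scheme of the generated images can be decoded mechanically, which is handy when post-processing the output/ folder. The sketch below assumes the filename patterns exactly as described above. The function name and the "kind" labels ("input", "real", "second", "target") are this example's own, and since the exact format of the confidence suffix is not specified here, it is kept as a plain string.

```python
import os


def parse_explanation_name(filename):
    """Split an explanation image name into kind, class label, and confidence.

    Returns None for names that do not follow the described scheme.
    """
    stem, extension = os.path.splitext(filename)
    if extension != ".png":
        return None
    # Everything after "_was_" is the confidence suffix, when present.
    head, separator, confidence = stem.partition("_was_")
    info = {"kind": None, "label": None, "confidence": confidence if separator else None}
    if head == "_input":
        info["kind"] = "input"
    elif head == "_real":
        info["kind"] = "real"
    elif head.startswith("_second_"):
        info["kind"] = "second"
        info["label"] = head[len("_second_"):]
    else:
        # Targeted explanations look like "<class index>_<class name>".
        index, _, label = head.partition("_")
        if not index.isdigit() or not label:
            return None
        info["kind"] = "target"
        info["label"] = label
    return info


for name in ("_input.png", "_real_was_xxx.png", "_second_dog_was_xxx.png", "3_cat_was_xxx.png"):
    print(name, parse_explanation_name(name))
```

The same function can be applied to the entries returned by `os.listdir` on one of the per-example folders, such as output/0-cat.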
To train a new network:
$ python main.py train path/to/model.pt [--adversarial-training] [--robust-additions] [--l2-min]
See python main.py train --help for additional information on these options.
Training time varies greatly based on available GPU(s). With both adversarial training and the robustness additions from the paper, training can take up to several days on a single computer. Turning off either adversarial training or robustness additions would lead to a significant speedup.
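As a rough guide to which flags go with which kind of network, the sketch below maps each prebuilt network's description to a corresponding train command. The output directory `my-models/` and the helper `train_command` are hypothetical, and retraining with these flags is not guaranteed to reproduce the shipped prebuilt files.

```python
# Flags listed in the description of each prebuilt network.
PREBUILT_TRAINING_FLAGS = {
    "resnet44-standard": [],
    "resnet44-adv-train": ["--adversarial-training"],
    "resnet44-all": ["--robust-additions", "--adversarial-training", "--l2-min"],
    "resnet44-robust": ["--robust-additions"],
}


def train_command(model_path, flags=()):
    """Build the argument list for `python main.py train`."""
    return ["python", "main.py", "train", model_path] + list(flags)


# Print one command per configuration; "my-models/" is a placeholder directory.
for name, flags in PREBUILT_TRAINING_FLAGS.items():
    print(" ".join(train_command("my-models/{}.pt".format(name), flags)))
```

Given the training times noted above, it may be sensible to start with the flag-free configuration to confirm the setup works before launching the longer runs.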
At the top of the main.py file are many CAPITAL_CASE variables which may be modified to affect the training process. Their definitions match those in the paper.