Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning
A TensorFlow implementation of the adversarial examples generating models described in the paper:
"Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning"
Hongge Chen*, Huan Zhang*, Pin-Yu Chen, Jinfeng Yi and Cho-Jui Hsieh
This paper is accepted by the 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). Hongge Chen and Huan Zhang contribute equally to this work.
Full text available at: https://arxiv.org/pdf/1712.02051.pdf
The Show-and-Fool model is designed to generate adversarial examples for neural image captioning. Our model is based on Show and Tell.
For example:
The following python packages are required to run this code repository:
sudo apt-get install python3-pip
sudo pip3 install --upgrade pip
sudo pip3 install pillow scipy numpy tensorflow-gpu keras
Note: Tensorflow 1.3 or later is recommended because tf.contrib.rnn.BasicLSTMCell
is used in Show and Tell model.
Clone this directory:
git clone https://github.com/IBM/Image-Captioning-Attack.git
Download pretrained model for Show-and-Tell:
./download_model.sh
Pretrained model will be saved to im2txt/pretrained
.
To quickly try our image captioning attack, a demo script is provided. By default, this script will turn the caption of a specified image into the caption of another arbitrary target image:
./demo.sh <image_to_attack> <image_of_target_sentence> <result_output_directory> [additional options] ...
Specifically, given an image <image_to_attack>
to attack and another
irrelevant target image <image_of_target_sentence>
as our target, we first
infer the caption using Show-and-Tell model on the target image. This caption
is called the target caption. Then we try to generate an adversarial image
that looks almost identical to <image_to_attack>
but on which the neural
image captioning system will generate exactly the same caption as the target
caption. Resulting adversarial image (and some relevant information) is saved
into folder <result_output_directory>
.
We provided some example images from the COCO dataset in the examples
folder,
so you can quickly test the targeted attack by running:
./demo.sh examples/image1.jpg examples/image2.jpg result_dir
In this demo we provide 4 attack modes: targeted caption attack, untargeted
caption attack, targeted keyword attack and untargeted keyword attack.
We can pass additional parameters to demo.sh
to select an attack mode. There are
two boolean parameters to control the attack mode, use_keywords
and
targeted
. For example, --use_keywords=False
and
--targeted=True
give you targeted caption attack. If you are using
untargeted attack, <image_of_target_sentence>
is ignored.
By default, --targeted=True --use_keywords=False
.
An example of targeted keywords attack with keywords dog
and frisbee
is:
./demo.sh examples/image1.jpg examples/image2.jpg result_dir --use_keywords --input_feed="dog frisbee"
To run multiple attacks on MSCOCO dataset, first you need to download MSCOCO dataset (images and caption files). Please refer to the "Prepare the Training Data" section in Show and Tell's readme file (we also have a copy here in this repo as ShowAndTellREADME.md
). After you finish your download, please go to run_attack.sh
and specify 2 paths:
(i) ${CAPTION_FILE}
is the path to the validation set's caption file, in JSON format (for example ../coco-caption/annotations/captions_val2014.json)
(ii) ${IMAGE_DIRECTORY}
is the directory of MSCOCO validation set (for example ../mscoco/image/val2014/)
Then you need to do ./run_attack.sh
. It is similar to ./demo.sh
. There are 3 required parameters, OFFSET
. NUM_ATTACKS
and OUTPUT_DIR
. When we do the experiments on MSCOCO validation set, we first randomly shuffle the images. Then we pick images in this queue one by one to attack. NUM_ATTACKS
detemines the number of experiments. One experiment means attack on one image. OFFSET
is the index of the first image in the queue to be attacked. OUTPUT_DIR
is the directory in which you save the results. We also create a /fail_log
directory in the result directory to save the fail imagess' attcak results with each C. If you choose untargeted caption attack, all attempts are assumed to be fail.
We also have a parameter use_logits
for you to choose between the logits loss or log-prob loss. To use logits loss, simply add --use_logits=True
and to use log-prob loss, add --use_logits=False
. By default we use logits loss. The detailed forms of our losses can be find in our paper. There are other parameters for you to tune, such as number of iterations, initial constant C, norm (l2 or l_infinity) and beam search size. You can check run_attack_BATCH_search_C.py
for details.
Some examples:
(i) (using default parameters) ./run_attack.sh 0 100 result_dir
Attack 100 images starting from the 0th image. The attacks are targeted caption attack with 1000 iterations for each C. The attack results will be saved in /result_dir
.
(ii) ./run_attack.sh 0 100 result_dir --use_keywords=True --keywords_num=2 --iters=200
Attack 100 images starting from the 0th image. The attacks are targeted keyword attack with 2 keywords selected for each image and 200 iterations for each C. The attack results will be saved in /result_dir
.
(iii) ./run_attack.sh 0 100 result_dir --targeted=False --iters=200
Attack 100 images starting from the 0th image. The attacks are untargeted caption attack with 200 iterations for each C. The attack results will be saved in /result_dir
.