seclab-yonsei/amplifying-exposure

Amplifying Exposure

This repository is the official implementation of our paper, 'Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships.'

Figure: graphical abstract.

Environments

To prepare the dataset, a single GPU with 24GB of VRAM is sufficient (e.g., one NVIDIA RTX 3090 GPU).

To perform PPO fine-tuning, at least two GPUs with 32GB of VRAM each are required (e.g., two NVIDIA Tesla V100 GPUs). This environment can train models up to OPT-2.7B.

Requirements

This repository requires Python 3.10 or higher:

python -V
>>> Python 3.10.11

To install requirements:

pip install --upgrade pip
pip install torch transformers easydict tqdm scikit-learn pandas==2.0 tensorboard
DS_BUILD_OPS=0 pip install transformers[deepspeed]
sudo apt-get install -y libaio-dev

Then use the ds_report command to verify that your environment is ready for inference using DeepSpeed.

ds_report

Pseudo-Labeling based on Machine-Generated Probabilities

Generating Texts

Create a candidate training dataset from the target model:

deepspeed --num_gpus=2 extract.py \
    --pretrained_model_name facebook/opt-1.3b \
    --n_generated_samples 100_000 \
    --batch_size 256 \
    --do_sample \
    --min_new_tokens 256 \
    --max_new_tokens 256 \
    --no_repeat_ngram_size 3 \
    --top_p 0.95 \
    --top_k 40 \
    --temperature 1.0 \
    --mi_metrics ce_loss \
    --assets assets \
    --nowtime 20230927-114523 \
    --deepspeed ./ds_config/ds_config_zero3.json

Then you get a file named facebook_opt-1.3b/facebook_opt-1.3b.100000.20230927-114523.json under your assets folder. Here's an example of an element:


[
    {
        "text": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as well, but I've heard that the graphics are terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about the visuals.\nI heard the same about PS3. Then the graphics fell way off compared to the next gen.\nWell they had to push the graphics more with the ps3 as its one of the oldest consoles. But its not a problem with the switch.\nThat's good to know. If you have any links to games that you feel are worth playing on the switch I would love to see them!\nYeah, it all comes down to if you think you're going to spend the extra money. I got it for $30 CAD and its a fun and cheap game. But if you're not a hardcore gamer you might want to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", which I was originally gonna get on the PS3",
        "n_tokens": 256,
        "n_words": 208,
        "ce_loss": 2.087890625
    },
    ...
]
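The ce_loss field stores the mean per-token cross-entropy (negative log-likelihood, in nats) of each generation under the target model; lower values mean the model is more confident about the text. A minimal sketch with made-up per-token log-probabilities (the sample names and values are purely illustrative):

```python
# Hypothetical per-token log-probabilities for three generations;
# in practice these come from the target model's forward pass.
samples = {
    "a": [-2.0, -1.5, -2.5],
    "b": [-0.5, -0.4, -0.6],
    "c": [-3.0, -2.8, -3.2],
}

def ce_loss(token_logprobs):
    # Mean negative log-likelihood, matching the `ce_loss` field above.
    return -sum(token_logprobs) / len(token_logprobs)

losses = {name: ce_loss(lp) for name, lp in samples.items()}
# Lower loss = higher model confidence, the raw signal behind pseudo-labeling.
ranked = sorted(losses, key=losses.get)  # → ["b", "a", "c"]
```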

Perturbing Generated Texts

We then perturb each generation in a mask-and-fill manner; the perturbed copies are used later to compute the perturbation discrepancy:

deepspeed --num_gpus=2 perturb.py \
    --mask_filling_model_name t5-large \
    --pretrained_model_name facebook/opt-1.3b \
    --n_generated_samples 100_000 \
    --threshold 20 \
    --span_length 2 \
    --buffer_size 2 \
    --pct_words_masked 0.3 \
    --n_perturbed_samples 10 \
    --batch_size 128 \
    --do_sample \
    --min_new_tokens 64 \
    --max_new_tokens 256 \
    --no_repeat_ngram_size 3 \
    --top_p 0.95 \
    --top_k 40 \
    --temperature 1.0 \
    --assets assets \
    --nowtime 20230927-114523 \
    --deepspeed ./ds_config/ds_config_zero3.json

Then you get a file named facebook_opt-1.3b/facebook_opt-1.3b.100000.20230927-114523.perturb.json under your assets folder. Here's an example of an element:


[
    {
        "text": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as well, but I've heard that the graphics are terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about the visuals.\nI heard the same about PS3. Then the graphics fell way off compared to the next gen.\nWell they had to push the graphics more with the ps3 as its one of the oldest consoles. But its not a problem with the switch.\nThat's good to know. If you have any links to games that you feel are worth playing on the switch I would love to see them!\nYeah, it all comes down to if you think you're going to spend the extra money. I got it for $30 CAD and its a fun and cheap game. But if you're not a hardcore gamer you might want to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", which I was originally gonna get on the PS3",
        "n_tokens": 256,
        "n_words": 208,
        "ce_loss": 2.087890625,
        "masked_text_0": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as well, but I've heard that the <extra_id_0> terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about <extra_id_1> heard the same about <extra_id_2> the graphics fell way off compared to the next gen.\nWell <extra_id_3> to push the graphics more with the ps3 as <extra_id_4> of the oldest <extra_id_5> its not a problem with the switch.\nThat's good to know. If you have any links to games that you <extra_id_6> worth playing on the switch I would love to see them!\nYeah, <extra_id_7> comes down to if <extra_id_8> you're going to spend the extra money. I got it for $30 CAD and <extra_id_9> fun and cheap game. But if you're not a hardcore gamer you might want to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", <extra_id_10> was originally gonna get on the PS3",
        ...,
        "masked_text_9": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as <extra_id_0> I've heard that <extra_id_1> are terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about the visuals.\nI heard the same about PS3. <extra_id_2> graphics fell way <extra_id_3> to the next gen.\nWell they had to push the graphics more with the ps3 as <extra_id_4> of the oldest consoles. But its not a problem with the switch.\nThat's good to know. If you have any links to games that you feel <extra_id_5> playing on the switch <extra_id_6> love to see them!\nYeah, it all comes down to if you think you're going to <extra_id_7> extra money. I got it <extra_id_8> CAD and its a fun and cheap game. But if you're not a hardcore gamer you <extra_id_9> to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", which I was originally <extra_id_10> on the PS3",
        "perturbed_text_0": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as well, but I've heard that the graphics have become terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about it. I heard the same about the ps3 the graphics fell way off compared to the next gen.\nWell yeah they try to push the graphics more with the ps3 as one of the oldest systems so its not a problem with the switch.\nThat's good to know. If you have any links to games that you believe are worth playing on the switch I would love to see them!\nYeah, honestly it comes down to if you enjoy the game and if you're going to spend the extra money. I got it for $30 CAD and it was a fun and cheap game. But if you're not a hardcore gamer you might want to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", which I was originally gonna get on the PS3",
        ...,
        "perturbed_text_9": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as you, but I've heard that the graphics are terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about the visuals.\nI heard the same about PS3. Even then, the graphics fell way down compared to the next gen.\nWell they had to push the graphics more with the ps3 as it is one of the oldest consoles. But its not a problem with the switch.\nThat's good to know. If you have any links to games that you feel you should try playing on the switch , I'd love to see them!\nYeah, it all comes down to if you think you're going to play it and have extra money. I got it for $10 CAD and its a fun and cheap game. But if you're not a hardcore gamer you might want to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", which I was originally looking forward to playing on the PS3"
    },
    ...
]
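Each masked_text_k replaces roughly pct_words_masked of the words with T5 sentinel tokens (<extra_id_0>, <extra_id_1>, …) in spans of span_length words, and t5-large then fills the sentinels to produce the corresponding perturbed_text_k. A toy sketch of the masking step (the exact span-sampling logic in perturb.py is an assumption):

```python
import random

def mask_spans(words, span_length=2, pct_words_masked=0.3,
               buffer_size=2, seed=0):
    """Replace random word spans with T5 sentinels (<extra_id_k>).
    Sketch of the masking half of mask-and-fill; the fill half would
    run the result through the t5-large mask-filling model."""
    rng = random.Random(seed)
    n_spans = max(1, round(len(words) * pct_words_masked / span_length))
    starts, tries = [], 0
    while len(starts) < n_spans and tries < 1000:
        s = rng.randrange(0, len(words) - span_length + 1)
        tries += 1
        # Keep masked spans at least buffer_size words apart.
        if all(abs(s - t) >= span_length + buffer_size for t in starts):
            starts.append(s)
    starts, masked, k, i = sorted(starts), [], 0, 0
    while i < len(words):
        if i in starts:
            masked.append(f"<extra_id_{k}>")  # next sentinel id
            k += 1
            i += span_length                  # skip the masked words
        else:
            masked.append(words[i])
            i += 1
    return " ".join(masked)

text = "the quick brown fox jumps over the lazy dog today"
masked = mask_spans(text.split())
```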

Calculating Perturbation Discrepancy and Pseudo-Labeling Texts

Finally, we compute the perturbation discrepancy based on the log likelihood for each text and pseudo-label based on the perturbation discrepancy:

deepspeed --num_gpus=2 detectgpt.py \
    --pretrained_model_name facebook/opt-1.3b \
    --n_generated_samples 100_000 \
    --batch_size 128 \
    --n_perturbed_samples 10 \
    --test_size 0.2 \
    --assets assets \
    --nowtime 20230927-114523 \
    --deepspeed ./ds_config/ds_config_zero3.json

Then you get three files under your assets folder: facebook_opt-1.3b/facebook_opt-1.3b.100000.20230927-114523.perturb.pairs.train.json, facebook_opt-1.3b/facebook_opt-1.3b.100000.20230927-114523.perturb.pairs.eval.json, and facebook_opt-1.3b/facebook_opt-1.3b.100000.20230927-114523.perturb.detectgpt.json. Here's an example of a pseudo-labeled pair:


[
    {
        "prompt": "",
        "chosen": "A bit late, but this was the first place where I came across this awesome video. I'm not a fan of the video that was originally linked, but it really speaks for itself.\n\nIf you haven't read this excellent book, check it out. It takes a deep dive into how the mind works and how the brain perceives the world around us.\nThe authors are amazing people who seem to know much more about the brain than we do.\nIf this sounds like something you would like, there are many places to download it for free.\nhttp://www.theshadesofdeath.net/\n\n\"In the end, however, the ultimate determiners of our fate are our genes. As we can see, the choice of one person over another does not make a difference. But that is exactly what those genes say they do. They are saying, in a loud voice, \u2018Be like me.\u2019\n\nThe more we realize that the genes say what they say, the more we can understand the value of love, the value and worth of friendships, the importance of community, and the importance and value of work.\"\n\nOne of the reasons people become depressed when things go wrong is the inability to process things",
        "chosen_score": -1.8647980075,
        "rejected": "If you don't mind me asking what is the reason you are considering a switch? I'm considering the same thing as well, but I've heard that the graphics are terrible.\nGraphics have been really, really good with this console. Some people have problems with a bit more motion blur, but its still a very enjoyable experience, and no one's ever said anything bad about the visuals.\nI heard the same about PS3. Then the graphics fell way off compared to the next gen.\nWell they had to push the graphics more with the ps3 as its one of the oldest consoles. But its not a problem with the switch.\nThat's good to know. If you have any links to games that you feel are worth playing on the switch I would love to see them!\nYeah, it all comes down to if you think you're going to spend the extra money. I got it for $30 CAD and its a fun and cheap game. But if you're not a hardcore gamer you might want to check out something else.\nIt's funny you say that, I've been looking at some of the other console games. One of them is \"Legend of Zelda: Breath of the Wild\", which I was originally gonna get on the PS3",
        "rejected_score": -0.3322853854,
        "score_diff": 1.5325126221
    },
    ...
]
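The chosen_score and rejected_score fields are DetectGPT-style perturbation discrepancies: the log-likelihood of a text minus the mean log-likelihood of its perturbed copies. In the example above, the text with the lower discrepancy is labeled chosen, and score_diff is rejected_score minus chosen_score. A toy sketch with hypothetical log-likelihoods (the exact pairing strategy used by detectgpt.py is an assumption):

```python
def discrepancy(logp_original, logp_perturbed):
    # DetectGPT perturbation discrepancy: log p(x) minus the mean
    # log-likelihood of the perturbed copies of x.
    return logp_original - sum(logp_perturbed) / len(logp_perturbed)

# Hypothetical (text, log p(x), [log p(perturbed x)]) triples.
samples = [
    ("a", -120.0, [-118.0, -119.0, -117.0]),  # d = -2.0
    ("b", -100.0, [-101.0, -103.0, -102.0]),  # d = +2.0
    ("c",  -90.0, [-90.5, -89.5, -90.0]),     # d =  0.0
    ("d", -110.0, [-109.0, -108.0, -110.0]),  # d = -1.0
]

scored = sorted(((name, discrepancy(lp, lps)) for name, lp, lps in samples),
                key=lambda t: t[1])
# Pair the lowest-discrepancy text (chosen) with the highest (rejected).
pairs = [
    {"chosen": lo[0], "chosen_score": lo[1],
     "rejected": hi[0], "rejected_score": hi[1],
     "score_diff": hi[1] - lo[1]}
    for lo, hi in zip(scored, reversed(scored))
][: len(scored) // 2]
```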

To summarize, you should have the following files ready to go:

./assets/facebook_opt-1.3b/
├── [112M]  facebook_opt-1.3b.100000.20230927-114523.json
├── [2.2G]  facebook_opt-1.3b.100000.20230927-114523.perturb.detectgpt.json
├── [2.2G]  facebook_opt-1.3b.100000.20230927-114523.perturb.json
├── [ 22M]  facebook_opt-1.3b.100000.20230927-114523.perturb.pairs.eval.json
└── [ 88M]  facebook_opt-1.3b.100000.20230927-114523.perturb.pairs.train.json

Reinforcement Learning with Self-Generations

Setup

We use DeepSpeedExamples, Microsoft's reference repository for reinforcement learning from human feedback (RLHF). First, clone the repository and install its requirements:

## Download
git clone https://github.com/microsoft/DeepSpeedExamples.git

## Setup
pip install -r DeepSpeedExamples/applications/DeepSpeed-Chat/requirements.txt

Then move the training dataset to the data folder. Note that training with custom JSON files expects the fixed file names train.json and eval.json, so you need to rename the files accordingly. Also, if you use multiple versions of the OPT model (e.g., OPT-2.7B or OPT-13B), remove the cache files beforehand so that a fresh dataset is built each time:

## Make a directory
mkdir -p ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/

## Remove cache
rm -rf ~/.cache/huggingface/datasets/*
rm -rf /tmp/data_files/*

## Copy train and eval data
cp ./assets/facebook_opt-1.3b/*.pairs* ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/
mv ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/*train.json ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/train.json
mv ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/*eval.json ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/eval.json
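A quick sanity check that the renamed files still match the pairwise schema shown earlier (prompt / chosen / rejected) can catch copy or rename mistakes before training; check_pairs below is a hypothetical helper, not part of either repository:

```python
import json
import tempfile

REQUIRED = {"prompt", "chosen", "rejected"}

def check_pairs(path):
    """Assert every record carries the keys the pairwise dataset needs."""
    with open(path) as f:
        data = json.load(f)
    assert isinstance(data, list) and data, f"{path}: empty or not a list"
    for i, row in enumerate(data):
        missing = REQUIRED - row.keys()
        assert not missing, f"{path}[{i}]: missing keys {missing}"
    return len(data)

# Demo on a throwaway file shaped like the pairs shown earlier.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump([{"prompt": "", "chosen": "...", "rejected": "...",
                "chosen_score": -1.86, "rejected_score": -0.33}], f)
    demo_path = f.name

n_records = check_pairs(demo_path)  # → 1
```

In practice, point check_pairs at ./DeepSpeedExamples/applications/DeepSpeed-Chat/data/train.json and data/eval.json after the mv commands above.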

Finally, copy the scripts for reward model training and PPO fine-tuning into the DeepSpeed-Chat tree:

## Step2: reward model training
cp ./scripts/step2_single_node_run.sh ./DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/training_scripts/opt/single_node/

## Step3: PPO fine-tuning
cp ./scripts/step3_single_node_run.sh ./DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/training_scripts/opt/single_node/

Reward Model Training

Fine-tune the reward model. The default reward model is OPT-350M:

## Run script
cd ./DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/
bash ./training_scripts/opt/single_node/step2_single_node_run.sh 

The execution results (logs and trained model weights) will be stored in ./DeepSpeedExamples/applications/DeepSpeed-Chat/training/step2_reward_model_finetuning/output.
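Reward model training optimizes a pairwise ranking objective over the (chosen, rejected) pairs built earlier: the reward head should score the pseudo-member text above its partner. A minimal sketch of that loss on scalar rewards (DeepSpeed-Chat's actual implementation scores full token sequences with the OPT-350M backbone):

```python
import math

def pairwise_loss(r_chosen, r_rejected):
    # -log(sigmoid(r_chosen - r_rejected)): small when the reward model
    # ranks the chosen text above the rejected one by a wide margin.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

uninformed = pairwise_loss(0.0, 0.0)   # log(2) ≈ 0.693
correct = pairwise_loss(3.0, 0.0)      # small: correct ranking
inverted = pairwise_loss(0.0, 3.0)     # large: inverted ranking
```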

PPO Fine-tuning

Fine-tune the target model with PPO:

cd ../step3_rlhf_finetuning/
bash ./training_scripts/opt/single_node/step3_single_node_run.sh facebook/opt-1.3b ../step2_reward_model_finetuning/output 3 3

The execution results (logs and trained model weights) will be stored in ./DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output.

Performing TDE Attacks on the Reference Model and Fine-tuned Model

Move all models and their tensorboard logs to the assets folder:

mkdir -p ./assets/facebook_opt-1.3b/actor_ema
cp -rf ./DeepSpeedExamples/applications/DeepSpeed-Chat/training/step3_rlhf_finetuning/output/actor_ema/ ./assets/facebook_opt-1.3b

Then perform training data extraction (TDE) attacks:

## Extract from the reference model
deepspeed --num_gpus=2 extract.py \
    --load_file \
    --pretrained_model_name facebook/opt-1.3b \
    --n_generated_samples 100_000 \
    --n_selected_samples 100 \
    --batch_size 128 \
    --do_sample \
    --min_new_tokens 256 \
    --max_new_tokens 256 \
    --no_repeat_ngram_size 3 \
    --top_p 0.95 \
    --top_k 40 \
    --temperature 1.0 \
    --mi_metrics ce_loss ppl zlib lower window \
    --assets assets \
    --do_scoring \
    --nowtime 20230927-114523 \
    --deepspeed ./ds_config/ds_config_zero3.json

## Extract from the fine-tuned model
deepspeed --num_gpus=2 extract.py \
    --pretrained_model_name ./assets/facebook_opt-1.3b/actor_ema  \
    --n_generated_samples 100_000 \
    --n_selected_samples 100 \
    --batch_size 128 \
    --do_sample \
    --min_new_tokens 256 \
    --max_new_tokens 256 \
    --no_repeat_ngram_size 3 \
    --top_p 0.95 \
    --top_k 40 \
    --temperature 1.0 \
    --mi_metrics ce_loss ppl zlib lower window \
    --assets assets \
    --do_scoring \
    --nowtime 20230927-114523 \
    --deepspeed ./ds_config/ds_config_zero3.json
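Besides ce_loss, the --mi_metrics flags enable additional membership-inference scores. The zlib metric, for example, relates the model's loss to the zlib entropy of the text, following the training-data-extraction literature, so that trivially repetitive low-loss text is discounted. A rough sketch (the exact normalization used by extract.py is an assumption):

```python
import random
import zlib

def zlib_entropy(text):
    # Bits needed to zlib-compress the text: a crude entropy estimate.
    return 8 * len(zlib.compress(text.encode("utf-8")))

def zlib_score(ce_loss, n_tokens, text):
    # Total model loss (nats) relative to the text's compressibility;
    # memorized text tends to keep low loss even at high zlib entropy.
    return (ce_loss * n_tokens) / zlib_entropy(text)

rng = random.Random(0)
repetitive = "the same line again " * 20                         # 400 chars
varied = "".join(chr(rng.randrange(65, 91)) for _ in range(400)) # 400 chars
```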

Finally, we will have the following files ready to go:

./assets/facebook_opt-1.3b/
├── [4.0K]  actor_ema
│   ├── [ 770]  config.json
│   ├── [446K]  merges.txt
│   ├── [2.5G]  pytorch_model.bin
│   └── [780K]  vocab.json
├── [146M]  facebook_opt-1.3b.100000.20230927-114523.extract.json (<- !! EXTRACTED FROM REFERENCE LM !!)
├── [112M]  facebook_opt-1.3b.100000.20230927-114523.json
├── [2.2G]  facebook_opt-1.3b.100000.20230927-114523.perturb.detectgpt.json
├── [2.2G]  facebook_opt-1.3b.100000.20230927-114523.perturb.json
├── [ 22M]  facebook_opt-1.3b.100000.20230927-114523.perturb.pairs.eval.json
├── [ 88M]  facebook_opt-1.3b.100000.20230927-114523.perturb.pairs.train.json
└── [141M]  facebook_opt-1.3b_actor_ema.100000.20230928-123456.extract.json (<- !! EXTRACTED FROM FINE-TUNED LM !!)

Evaluation

To perform an evaluation of the extracted files, please refer to the following repository. See this issue for how to compare the two datasets.

Pre-trained Models and Extracted Generations

You can get fine-tuned model weights and extracted generations from our anonymous Google Drive. We will release the weights publicly on the Hugging Face Hub after the review process.

Results

Our approach achieves the following extraction performance (number of extracted samples per length bucket):

Each cell shows Ref. / Tuned:

| OPT  | [50,64)  | [64,128)  | [128,192) | [192,256) | 256   | Total     | Inc.  |
|------|----------|-----------|-----------|-----------|-------|-----------|-------|
| 125M | 64 / 54  | 24 / 80   | 5 / 20    | 8 / 15    | 0 / 0 | 101 / 169 | ×1.7↑ |
| 350M | 103 / 91 | 64 / 128  | 11 / 35   | 29 / 71   | 0 / 0 | 207 / 325 | ×1.6↑ |
| 1.3B | 58 / 241 | 38 / 337  | 0 / 52    | 1 / 139   | 0 / 6 | 97 / 775  | ×1.7↑ |
| 2.7B | 53 / 216 | 72 / 253  | 2 / 21    | 0 / 27    | 0 / 0 | 127 / 517 | ×1.7↑ |
| 6.7B | 87 / 174 | 57 / 220  | 1 / 53    | 0 / 98    | 0 / 0 | 145 / 545 | ×1.7↑ |
| 13B  | 87 / 347 | 101 / 394 | 5 / 27    | 0 / 18    | 0 / 0 | 193 / 786 | ×1.7↑ |

Here, Ref. refers to the reference model before fine-tuning, and Tuned refers to the model after PPO fine-tuning.

Reference

@article{mitchell2023detectgpt,
  title={{DetectGPT}: Zero-shot machine-generated text detection using probability curvature},
  author={Mitchell, Eric and Lee, Yoonho and Khazatsky, Alexander and Manning, Christopher D and Finn, Chelsea},
  journal={arXiv preprint arXiv:2301.11305},
  year={2023}
}

@article{ouyang2022training,
  title={Training language models to follow instructions with human feedback},
  author={Ouyang, Long and Wu, Jeffrey and Jiang, Xu and Almeida, Diogo and Wainwright, Carroll and Mishkin, Pamela and Zhang, Chong and Agarwal, Sandhini and Slama, Katarina and Ray, Alex and others},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={27730--27744},
  year={2022}
}

Citation

@misc{oh2024amplifying,
      title={Amplifying Training Data Exposure through Fine-Tuning with Pseudo-Labeled Memberships}, 
      author={Myung Gyo Oh and Hong Eun Ahn and Leo Hyun Park and Taekyoung Kwon},
      year={2024},
      eprint={2402.12189},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
