Code for our paper: IDEATOR: Jailbreaking Large Vision-Language Models Using Themselves
The IDEATOR framework uses large Vision-Language Models (VLMs) themselves as red-team models to autonomously generate malicious multimodal prompts for black-box jailbreak attacks. The core insight is that a VLM can exploit its own understanding of multimodal inputs to craft adversarial prompts tailored to a specific malicious objective. Concretely, the framework uses a capable VLM to generate targeted jailbreak text prompts, which are then paired with visually aligned jailbreak images produced by a state-of-the-art diffusion model. By combining these multimodal pairs, IDEATOR achieves high attack effectiveness and strong transferability across VLM architectures. This automated process exposes vulnerabilities of VLMs under black-box conditions and provides a practical tool for evaluating and improving their safety.
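For orientation, the sketch below shows a minimal, hypothetical version of the attack loop described above. The interfaces (`AttackerVLM`, `DiffusionModel`, `VictimVLM`, `Judge`) and the iterative refine-and-query structure are placeholders introduced purely for illustration, not the released implementation; the actual prompting strategy and model backends are described in the paper and the attack scripts.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Tuple

class AttackerVLM(Protocol):
    """Red-team VLM that proposes the next multimodal attack attempt (placeholder interface)."""
    def propose(self, goal: str, history: List[Tuple[str, str, str]]) -> Tuple[str, str]:
        """Return (jailbreak_text_prompt, image_description) for the current attempt."""
        ...

class DiffusionModel(Protocol):
    """Text-to-image model that renders the proposed jailbreak image (placeholder interface)."""
    def generate(self, image_description: str): ...

class VictimVLM(Protocol):
    """Black-box target VLM; only its text responses are observable (placeholder interface)."""
    def respond(self, text_prompt: str, image) -> str: ...

class Judge(Protocol):
    """Decides whether a victim response fulfils the attack goal (placeholder interface)."""
    def is_jailbroken(self, goal: str, response: str) -> bool: ...

@dataclass
class AttackResult:
    success: bool
    text_prompt: Optional[str] = None
    image_description: Optional[str] = None
    response: Optional[str] = None

def ideator_attack(goal: str, attacker: AttackerVLM, diffusion: DiffusionModel,
                   victim: VictimVLM, judge: Judge, max_iters: int = 10) -> AttackResult:
    """Iteratively query a black-box victim VLM with attacker-generated multimodal prompts."""
    history: List[Tuple[str, str, str]] = []
    for _ in range(max_iters):
        # 1. The attacker VLM drafts a jailbreak text prompt plus a matching image description.
        text_prompt, image_description = attacker.propose(goal, history)
        # 2. A diffusion model renders the visually aligned jailbreak image.
        image = diffusion.generate(image_description)
        # 3. The black-box victim is queried with the multimodal pair.
        response = victim.respond(text_prompt, image)
        # 4. A judge scores the response; failed attempts are fed back for the next round.
        if judge.is_jailbroken(goal, response):
            return AttackResult(True, text_prompt, image_description, response)
        history.append((text_prompt, image_description, response))
    return AttackResult(False)
```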
- Prepare the pretrained weights for MiniGPT-4 (Vicuna-13B v0): please follow the guide from the MiniGPT-4 repository to obtain the Vicuna weights, then set the path to the Vicuna weights in the model config file here.
- Get the MiniGPT-4 (13B version) checkpoint: download it from here, then set the path to the pretrained checkpoint in minigpt4_eval.yaml.
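For reference, the two path settings typically look like the excerpt below. The keys (`llama_model`, `ckpt`) and the placeholder paths follow the upstream MiniGPT-4 configs and are an illustrative assumption here; verify the exact field names against your local copies of the config files.

```yaml
# Excerpt 1 — model config file: point llama_model at the merged Vicuna-13B v0 weights.
model:
  llama_model: "/path/to/vicuna-13b-v0/"
---
# Excerpt 2 — minigpt4_eval.yaml: point ckpt at the downloaded MiniGPT-4 (13B) checkpoint.
model:
  ckpt: "/path/to/pretrained_minigpt4.pth"
```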
python ideator_attack_minigpt4.py --cfg-path minigpt4_eval.yaml --gpu-id 0
We found that stronger base models significantly increase jailbreak success rates. For instance, with its safety settings disabled, Gemini can efficiently jailbreak commercial models: combining Gemini with Stable Diffusion 3.5 Large, we achieved a 46% success rate in jailbreaking GPT-4o. We have released a demo showcasing how Gemini generates jailbreak image-text prompts; however, for safety reasons, we have not made the complete codebase publicly available.
python ideator_attack_gemini_demo.py