Skip to content

Releases: cybozu/prompt-hardener

v0.2.0

03 Jun 07:33
b27268c
Compare
Choose a tag to compare

What's Changed

Full Changelog: v0.1.0...v0.2.0

v0.1.0

06 May 21:48
Compare
Choose a tag to compare

✨ Prompt Hardener v0.1.0

The first official release of Prompt Hardener – a tool for improving, evaluating, and testing the security of LLM system prompts.

🚀 Features

  • Supports both OpenAI and Anthropic Claude APIs
  • Self-refines system prompts via LLM-based evaluation and improvement
  • Implements multiple hardening techniques:
    • Spotlighting: marks user input with tags and encoding (e.g., ^encoded^content )
    • Signed Prompt: isolates trusted instructions with <{{RANDOM}}> tags
    • Rule Reinforcement: repeats key security constraints within prompt
    • Structured Output Enforcement: enforces JSON/XML-like responses
    • Role Consistency: ensures proper separation of user/system roles
  • Automatic prompt injection testing with several attack categories.
  • Full report generation (HTML + JSON)

🛠 CLI Updates

  • CLI supports different APIs/models for:
    • Evaluation & Improvement (--eval-api-mode, --eval-model)
    • Attack execution (--attack-api-mode, --attack-model)
    • Injection success judgment (--judge-api-mode, --judge-model)
  • Optional tool specifications via --tools-path
  • Report output via --report-dir

📦 Web UI (Gradio)

  • Simple browser interface for uploading a prompt, applying techniques, and downloading results
  • Support for multi-API selection and test configuration

📘 Examples

  • AI Assistant prompt hardening with tools and output report
  • Comment summarization prompt hardening with JSON-formatted inputs
  • View tutorials under README.md → 💪 Tutorials

🔒 Built with a focus on robust security, evaluability, and cross-model compatibility.