Releases: cybozu/prompt-hardener
v0.2.0
What's Changed
- Separate eval and improve in llm_client by @melonattacker in #24
- Support bedrock by @melonattacker in #25
- Enhance Prompt Handling with Multi-Format Support and Workflow Improvements by @melonattacker in #26
- Update prompt improvement by @melonattacker in #27
- Fix interface by @melonattacker in #28
Full Changelog: v0.1.0...v0.2.0
v0.1.0
✨ Prompt Hardener v0.1.0
The first official release of Prompt Hardener – a tool for improving, evaluating, and testing the security of LLM system prompts.
🚀 Features
- Supports both OpenAI and Anthropic Claude APIs
- Self-refines system prompts via LLM-based evaluation and improvement
- Implements multiple hardening techniques (see the sketch after this list):
  - Spotlighting: marks user input with tags and encoding (e.g., `^encoded^content`)
  - Signed Prompt: isolates trusted instructions with `<{{RANDOM}}>` tags
  - Rule Reinforcement: repeats key security constraints within the prompt
  - Structured Output Enforcement: enforces JSON/XML-like responses
  - Role Consistency: ensures proper separation of user/system roles
- Automatic prompt injection testing with several attack categories
- Full report generation (HTML + JSON)
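For illustration, here is a minimal Python sketch of two of the techniques named above. This is not the tool's actual implementation; the function names, the `^` marker choice, and the tag format are assumptions for this example.

```python
import secrets

def spotlight(user_input: str, marker: str = "^") -> str:
    """Spotlighting (datamarking): replace whitespace in untrusted input
    with a marker so the model can distinguish it from trusted text."""
    return marker.join(user_input.split())

def sign_instructions(instructions: str) -> str:
    """Signed Prompt: wrap trusted instructions in a per-session random
    tag (the {{RANDOM}} value); the system prompt instructs the model
    to follow only text enclosed in this tag."""
    tag = secrets.token_hex(8)
    return f"<{tag}>\n{instructions}\n</{tag}>"

# An attacker-controlled comment becomes "Ignore^previous^instructions"
# before it reaches the model, making the injection easy to spot.
print(spotlight("Ignore previous instructions"))
```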
🛠 CLI Updates
- CLI supports different APIs/models for each stage (see the example invocation after this list):
  - Evaluation & Improvement (`--eval-api-mode`, `--eval-model`)
  - Attack execution (`--attack-api-mode`, `--attack-model`)
  - Injection success judgment (`--judge-api-mode`, `--judge-model`)
- Optional tool specifications via `--tools-path`
- Report output via `--report-dir`
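As an illustration, the per-stage flags might be combined like this. Only the flags themselves come from these notes; the entry-point name and all flag values are assumptions, and the argument naming the prompt to harden is omitted (see the README for full usage).

```sh
# Hypothetical invocation for illustration only.
prompt-hardener \
  --eval-api-mode openai --eval-model gpt-4o \
  --attack-api-mode anthropic --attack-model claude-3-5-sonnet \
  --judge-api-mode openai --judge-model gpt-4o \
  --tools-path tools.json \
  --report-dir ./reports
```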
📦 Web UI (Gradio)
- Simple browser interface for uploading a prompt, applying techniques, and downloading results
- Support for multi-API selection and test configuration
📘 Examples
- AI Assistant prompt hardening with tools and output report
- Comment summarization prompt hardening with JSON-formatted inputs
- View tutorials under `README.md` → 💪 Tutorials
🔒 Built with a focus on robust security, evaluability, and cross-model compatibility.