A rate limiting and spend management tool for AI APIs. Strainer helps you control and manage your API usage by providing configurable rate limiting, budget controls, and usage tracking.
- Rate limiting for AI API calls with configurable thresholds
- Spend management and budget controls
- Support for multiple API providers
- Configurable limits and quotas
- Automatic backoff strategies
- Usage tracking and reporting
cargo install strainer
Basic usage involves creating a configuration file and running strainer with it:
strainer --config config.toml
Strainer uses TOML for configuration and looks for configuration files in the following locations, in order of priority:
- CLI specified config file (using
--config
flag) - User config:
~/.config/strainer/config.toml
- System-wide config:
/etc/strainer/config.toml
Configuration values can also be overridden by environment variables, and finally by CLI arguments which take the highest precedence.
Here's a complete example configuration file with all available options:
# API Configuration
[api]
# Provider Configuration - Choose one of the following provider sections:
# For Anthropic:
[api.provider]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}" # Can use environment variables
base_url = "https://api.anthropic.com/v1" # Optional, defaults to official API
model = "claude-2"
max_tokens = 100000
# For OpenAI (example):
# [api.provider]
# type = "openai"
# api_key = "${OPENAI_API_KEY}"
# model = "gpt-4"
# max_tokens = 100000
#
# For Mock Provider (testing):
# [api.provider]
# type = "mock"
# delay_ms = 100 # Simulated API delay
# Rate Limits
[limits]
requests_per_minute = 100 # Optional: limit requests per minute
tokens_per_minute = 100000 # Optional: limit tokens per minute
input_tokens_per_minute = 50000 # Optional: limit input tokens per minute
# Threshold Configuration
[thresholds]
warning = 30 # Percentage at which to start showing warnings (default: 30)
critical = 50 # Percentage at which to stop processing (default: 50)
resume = 25 # Percentage at which to resume after hitting critical (default: 25)
# Backoff Configuration
[backoff]
min_seconds = 5 # Minimum backoff time in seconds (default: 5)
max_seconds = 60 # Maximum backoff time in seconds (default: 60)
# Process Configuration
[process]
pause_on_warning = false # Pause process when warning threshold is reached
pause_on_critical = true # Pause process when critical threshold is reached (default: true)
# Logging Configuration
[logging]
level = "info" # Log level: error, warn, info, debug, trace
format = "text" # Log format: text or json
All configuration values can be set via environment variables using the ${VAR_NAME}
syntax in the TOML file. For example:
[api.provider]
type = "anthropic"
api_key = "${ANTHROPIC_API_KEY}"
The provider type and other configuration settings can also be set via environment variables:
STRAINER_PROVIDER_TYPE=anthropic
STRAINER_API_KEY=your-api-key
warning
: When usage reaches this percentage, warnings will be logged but processing continuescritical
: When usage reaches this percentage, processing stops and maximum backoff is appliedresume
: After hitting critical, processing resumes when usage drops below this percentage
Rate limits can be configured for:
- Requests per minute
- Total tokens per minute (input + output)
- Input tokens per minute
If any limit is omitted, that particular limit won't be enforced.
Example with only request limiting:
[limits]
requests_per_minute = 100
When limits are approached, Strainer implements an automatic backoff strategy:
- Below warning threshold: Uses minimum backoff time
- At warning threshold: Uses minimum backoff time with warnings
- At critical threshold: Uses maximum backoff time and pauses processing
- Below resume threshold: Resumes processing with minimum backoff
We provide a convenient script to run all quality checks:
./scripts/check.sh
This script will:
- Check and fix code formatting
- Run clippy lints with all features enabled
- Run all tests (basic and extended)
- Generate code coverage reports
The script requires cargo-tarpaulin
for coverage reporting and will install it if not present.
The project uses a two-tier testing strategy:
- Basic Tests (No Mocking Required)
cargo test
This runs:
- Unit tests
- Basic integration tests
- Property tests for configuration validation
- Extended Tests (With Mocking Support)
cargo test --features testing
This runs all tests, including:
- All unit tests
- Extended integration tests with mocks
- Property-based tests with mocked providers
- Test utilities for custom test scenarios
- Core rate limiting logic: 100% coverage
- Process control code: 100% coverage
- Overall minimum: 90% coverage
- All PRs must include tests
- Integration tests must cover CLI workflows
This project is licensed under the MIT License - see the LICENSE file for details.