
A/B testing #680

Open
boxabirds opened this issue Feb 17, 2025 · 1 comment
Labels: enhancement (New feature or request)

Comments


boxabirds commented Feb 17, 2025

Problem: output varies wildly with both the choice of LLM and the prompt, so small changes can produce significantly different results.

Solution: the ability to specify a list of prompt variations and a list of LLMs to try.

You could use Optuna for efficient evaluation (cf. DSPy), along with Argilla for human evaluation.
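
A minimal sketch of what this could look like with Optuna's categorical search over prompt × model combinations. `run_agent`, `score_output`, and the prompt/model lists are hypothetical placeholders, not part of this project's API:

```python
import optuna

# Hypothetical candidate lists supplied by the user.
PROMPTS = [
    "You are a concise assistant. {task}",
    "Think step by step, then answer. {task}",
]
MODELS = ["gpt-4o-mini", "claude-3-haiku", "llama-3-8b"]

def run_agent(prompt: str, model: str, task: str) -> str:
    # Stand-in: a real implementation would invoke the framework
    # with the chosen prompt template and model.
    return f"[{model}] " + prompt.format(task=task)

def score_output(output: str) -> float:
    # Stand-in metric: replace with a real automatic metric,
    # or human scores collected via Argilla.
    return 1.0 / (1 + len(output))

def objective(trial: optuna.Trial) -> float:
    # Optuna samples one prompt/model combination per trial,
    # focusing later trials on promising regions.
    prompt = trial.suggest_categorical("prompt", PROMPTS)
    model = trial.suggest_categorical("model", MODELS)
    output = run_agent(prompt, model, task="Summarise this ticket.")
    return score_output(output)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)  # best prompt/model combination found
```

The appeal over plain grid search is that Optuna's sampler can prune weak prompt/model pairs early, which matters when every trial costs real LLM calls.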

boxabirds added the enhancement label on Feb 17, 2025
sysradium (Contributor) commented

@boxabirds have you got an API in mind?
