
πŸŽ‰ New Interfaces, Cut ETL Code by up to 50%!

@penguine-ip penguine-ip released this 04 Aug 07:15
· 267 commits to main since this release

Less Code to Load Data In and Out of DeepEval's Ecosystem :)

If you're using any of the features below, you'll likely see around a 50% reduction in the code required, especially the ETL needed to format data in and out of DeepEval's ecosystem. This includes:

πŸ†š Arena-GEval

The first LLM-arena-as-a-judge metric. It now runs a blinded experiment, randomly swapping contestant positions, to reach a fair verdict on which LLM output is better.

Docs: https://deepeval.com/docs/metrics-arena-g-eval
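To illustrate the idea, here is a minimal, self-contained sketch of the blinding trick described above: contestants are shuffled into anonymous slots before judging, and the verdict is mapped back afterwards. This is not DeepEval's implementation; `judge_fn` is a hypothetical stand-in for the LLM judge, which here only sees anonymized slots, never model identities.

```python
import random

def blinded_arena_judge(output_a, output_b, judge_fn, seed=None):
    """Sketch of a blinded, position-swapped arena comparison.

    `judge_fn(first, second)` is a stand-in for the LLM judge: it
    receives two anonymized outputs and returns "first" or "second".
    The real identities ("A"/"B") are hidden and positions are
    shuffled, so the judge cannot favor a fixed slot or model name.
    """
    rng = random.Random(seed)
    contestants = [("A", output_a), ("B", output_b)]
    rng.shuffle(contestants)  # random position swap

    # Present under neutral slot labels; identities stay hidden
    blinded = {"first": contestants[0], "second": contestants[1]}
    winner_slot = judge_fn(blinded["first"][1], blinded["second"][1])

    # Map the blinded verdict back to the real contestant
    return blinded[winner_slot][0]
```

Because the winner is resolved by content rather than position, the verdict is stable no matter which slot a contestant lands in.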

βš›οΈ You can now run component-level evals by simply running a for loop against your dataset of goldens.

Simply run your loop -> call your agent X number of times -> get your evaluation results. No more trying to fit non-test-case-friendly formats. Instead DeepEval will find your LLM traces automatically to run evals.

import asyncio

from somewhere import your_async_llm_app  # Replace with your async LLM app
from deepeval.dataset import EvaluationDataset, Golden

dataset = EvaluationDataset(goldens=[Golden(input="...")])

for golden in dataset.evals_iterator():
    # Create task to invoke your async LLM app
    task = asyncio.create_task(your_async_llm_app(golden.input))
    dataset.evaluate(task)

Docs: https://deepeval.com/docs/evaluation-component-level-llm-evals

πŸ’¬ Conversation simulator is now based on goldens.

Previously, you had to define a list of user intentions, profile items, and a ton of other configs to juggle. Now you define a list of goldens, a standardized benchmark of scenarios, and turns are generated for each one.

from deepeval.test_case import Turn
from deepeval.dataset import ConversationalGolden
from deepeval.simulator import ConversationSimulator

# Create ConversationalGolden
conversation_golden = ConversationalGolden(
    scenario="Andy Byron wants to purchase a VIP ticket to a Coldplay concert.",
    expected_outcome="Successful purchase of a ticket.",
    user_description="Andy Byron is the CEO of Astronomer.",
)

# Define chatbot callback
async def chatbot_callback(input):
    return Turn(role="assistant", content=f"Chatbot response to: {input}")

# Run Simulation
simulator = ConversationSimulator(model_callback=chatbot_callback)
conversational_test_cases = simulator.simulate(goldens=[conversation_golden])
print(conversational_test_cases)

Docs: https://deepeval.com/docs/conversation-simulator

We've also updated our docs, with more improvements to come πŸ‘€