[Python] evaluate() #542

Merged
hinthornw merged 36 commits into main from wfh/eval2.0 on Mar 27, 2024
hinthornw
Collaborator

@hinthornw hinthornw commented Mar 23, 2024

  • base evaluate api
  • wrapper to support off-the-shelf (OTS) evaluators from langchain
  • add examples

Feedback I'd love (all welcome)

  • Desired behavior if the project you specify already exists
  • 2-step API (2 functions? predict, evaluate vs. 1...)
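
To make the 2-step question concrete, here is a minimal sketch of both shapes. Every name in it (`toy_predict`, `toy_score`, `toy_evaluate`, the dict-based examples) is hypothetical and chosen for illustration; none of it is the API proposed in this PR.

```python
from typing import Callable, Dict, List

# Hypothetical example shape: {"input": ..., "expected": ...}
Example = Dict[str, str]
Evaluator = Callable[[str, Example], float]


def toy_predict(target: Callable[[str], str], examples: List[Example]) -> List[str]:
    """Step 1 of the two-step shape: run the target over every example."""
    return [target(ex["input"]) for ex in examples]


def toy_score(
    outputs: List[str], examples: List[Example], evaluators: List[Evaluator]
) -> List[Dict[str, float]]:
    """Step 2 of the two-step shape: score outputs that were already collected."""
    return [
        {ev.__name__: ev(out, ex) for ev in evaluators}
        for out, ex in zip(outputs, examples)
    ]


def toy_evaluate(
    target: Callable[[str], str], examples: List[Example], evaluators: List[Evaluator]
) -> List[Dict[str, float]]:
    """One-step shape: predict and score in a single call."""
    return toy_score(toy_predict(target, examples), examples, evaluators)


def exact_match(output: str, example: Example) -> float:
    """Toy evaluator: 1.0 when the output equals the expected value."""
    return float(output == example["expected"])


examples = [{"input": "a", "expected": "A"}, {"input": "b", "expected": "x"}]

# One call does everything.
one_step = toy_evaluate(str.upper, examples, [exact_match])

# Two calls let the caller keep or re-score the outputs separately.
outputs = toy_predict(str.upper, examples)
two_step = toy_score(outputs, examples, [exact_match])
```

Both paths produce the same scores; the difference is whether the intermediate outputs are exposed to the caller for reuse.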

To add in a second PR:

  • async support
  • hill-climbing as a first-class citizen

hinthornw added a commit that referenced this pull request Mar 25, 2024
Also auto-trace in the runevaluator if possible

Related to: #542
@hinthornw hinthornw changed the title from "V2 API Test" to "[Python] evaluate()" Mar 26, 2024
@hinthornw hinthornw marked this pull request as ready for review March 26, 2024 14:07
    reference_example_id = langsmith_extra.get("reference_example_id")
    id_ = langsmith_extra.get("run_id")
    if (
        not project_cv
Collaborator Author

@hinthornw hinthornw Mar 26, 2024

Still trace if you're manually providing example, project, etc. via the context var

@hinthornw hinthornw force-pushed the wfh/eval2.0 branch 2 times, most recently from 1c03f7f to 18bfc35 March 26, 2024 16:31
python/langsmith/client.py (resolved)
Comment on lines 407 to 423
    if isinstance(project, uuid.UUID) or _is_uuid(project):
        runs = client.list_runs(project_id=project)
    else:
        runs = client.list_runs(project_name=project)

    treemap: DefaultDict[uuid.UUID, List[schemas.Run]] = collections.defaultdict(list)
    results = []
    all_runs = {}
    for run in runs:
        if run.parent_run_id is not None:
            treemap[run.parent_run_id].append(run)
        else:
            results.append(run)
        all_runs[run.id] = run
    for run_id, child_runs in treemap.items():
        all_runs[run_id].child_runs = sorted(child_runs, key=lambda r: r.dotted_order)
    return results
Contributor

Why don't we just map trace_id to a list of runs, then sort each entry by dotted order?

Collaborator Author

@hinthornw hinthornw Mar 26, 2024

Still need to reconstruct the tree, right? It can be nested at an arbitrary depth.

I could also just default to returning only the roots, which is all people need in 90% of usage right now.
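
A small sketch of the two approaches discussed in this thread may help. `ToyRun` is a hypothetical stand-in for `schemas.Run` that carries only the fields used here, and the `dotted_order` values are simplified strings, assuming that lexicographic order matches execution order.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ToyRun:
    """Hypothetical stand-in for schemas.Run with only the fields used here."""

    id: str
    parent_run_id: Optional[str]
    trace_id: str
    dotted_order: str
    child_runs: List["ToyRun"] = field(default_factory=list)


def build_tree(runs: List[ToyRun]) -> List[ToyRun]:
    """Attach every run to its parent and return only the root runs."""
    children: Dict[str, List[ToyRun]] = defaultdict(list)
    by_id: Dict[str, ToyRun] = {}
    roots: List[ToyRun] = []
    for run in runs:
        by_id[run.id] = run
        if run.parent_run_id is None:
            roots.append(run)
        else:
            children[run.parent_run_id].append(run)
    # Because each parent gets its own sorted child list, nesting works
    # at any depth without recursion.
    for parent_id, kids in children.items():
        by_id[parent_id].child_runs = sorted(kids, key=lambda r: r.dotted_order)
    return roots


def group_by_trace(runs: List[ToyRun]) -> Dict[str, List[ToyRun]]:
    """Flat alternative: one dotted_order-sorted list per trace, no nesting."""
    traces: Dict[str, List[ToyRun]] = defaultdict(list)
    for run in runs:
        traces[run.trace_id].append(run)
    return {t: sorted(rs, key=lambda r: r.dotted_order) for t, rs in traces.items()}


# One trace "a": root a -> children b and d; b -> grandchild c.
runs = [
    ToyRun("c", "b", "a", "1.1.1"),
    ToyRun("a", None, "a", "1"),
    ToyRun("d", "a", "a", "1.2"),
    ToyRun("b", "a", "a", "1.1"),
]
roots = build_tree(runs)
flat = group_by_trace(runs)
```

Grouping by trace yields a correctly ordered flat list per trace, but `child_runs` is populated only by the parent-map approach, which is the point raised in the reply above.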

python/langsmith/evaluation/_runner.py (outdated, resolved)
TARGET_T = PIPELINE_T
# dataset-name, dataset_id, or examples
DATA_T = Union[str, uuid.UUID, Iterable[schemas.Example]]
SUMMARY_EVALUATOR_T = Callable[
Contributor

What is summary evaluator?

Collaborator Author

An aggregate-level evaluator, computed over the whole experiment rather than per run.
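
As a rough illustration of the distinction, the sketch below contrasts a row-level evaluator with a summary evaluator. It uses plain dicts and hypothetical function names; the actual `SUMMARY_EVALUATOR_T` signature is truncated in the excerpt above, so this only assumes that such an evaluator sees all runs and examples at once.

```python
from typing import Dict, List


def exact_match(run: Dict, example: Dict) -> Dict:
    """Row-level evaluator: scores one run against its one example."""
    return {"key": "exact_match", "score": int(run["output"] == example["expected"])}


def accuracy_summary(runs: List[Dict], examples: List[Dict]) -> Dict:
    """Summary evaluator: produces a single score for the whole experiment."""
    if not runs:
        return {"key": "accuracy", "score": None}
    correct = sum(r["output"] == e["expected"] for r, e in zip(runs, examples))
    return {"key": "accuracy", "score": correct / len(runs)}


# Toy data (assumed shapes, not the library's schemas).
runs = [{"output": "4"}, {"output": "5"}, {"output": "9"}, {"output": "7"}]
examples = [{"expected": "4"}, {"expected": "6"}, {"expected": "9"}, {"expected": "7"}]

row_scores = [exact_match(r, e)["score"] for r, e in zip(runs, examples)]
summary = accuracy_summary(runs, examples)
```

A summary evaluator is the natural place for metrics that cannot be computed one row at a time, such as precision, recall, or F1 over the full dataset.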

python/langsmith/evaluation/_runner.py (outdated, resolved)
python/langsmith/evaluation/_runner.py (resolved)
@hinthornw hinthornw merged commit 545b7fa into main Mar 27, 2024
7 checks passed
@hinthornw hinthornw deleted the wfh/eval2.0 branch March 27, 2024 01:35
2 participants