[Python] evaluate()
#542
Conversation
Also auto-trace in the run evaluator if possible. Related to: #542
reference_example_id = langsmith_extra.get("reference_example_id")
id_ = langsmith_extra.get("run_id")
if (
    not project_cv
Still trace if you're manually providing example, project, etc. via the context var
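A hypothetical reconstruction of the guard being discussed, reusing `langsmith_extra` and `project_cv` from the hunk above; the fall-through call is assumed for illustration and is not the PR's actual code:

```python
# Skip tracing only when *nothing* is supplied, so a manually provided
# reference example, run ID, or project (via the context var) still
# produces a trace.
reference_example_id = langsmith_extra.get("reference_example_id")
id_ = langsmith_extra.get("run_id")
if not project_cv and reference_example_id is None and id_ is None:
    return func(*args, **kwargs)  # assumed: call through without tracing
```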
Compare: 1c03f7f to 18bfc35
if isinstance(project, uuid.UUID) or _is_uuid(project):
    runs = client.list_runs(project_id=project)
else:
    runs = client.list_runs(project_name=project)

treemap: DefaultDict[uuid.UUID, List[schemas.Run]] = collections.defaultdict(list)
results = []
all_runs = {}
for run in runs:
    if run.parent_run_id is not None:
        treemap[run.parent_run_id].append(run)
    else:
        results.append(run)
    all_runs[run.id] = run
for run_id, child_runs in treemap.items():
    all_runs[run_id].child_runs = sorted(child_runs, key=lambda r: r.dotted_order)
return results
Why don't we just map trace_id to a list of runs, then sort each entry by dotted order?
Still need to reconstruct the tree, right? It can be nested at an arbitrary depth.
I could also just default to returning only the roots, which is all people need in ~90% of usage right now.
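For reference, a sketch of the trace_id-based alternative being suggested, not the code that was merged. It reuses `schemas.Run` from the hunk above and assumes runs carry `trace_id` and `dotted_order` fields:

```python
import collections
import uuid
from typing import DefaultDict, List

from langsmith import schemas  # Run model, as in the hunk above


def group_by_trace(
    runs: List[schemas.Run],
) -> DefaultDict[uuid.UUID, List[schemas.Run]]:
    # Bucket every run under its trace, then sort each bucket by
    # dotted_order. dotted_order encodes the position in the call tree,
    # so the sorted bucket is a depth-first walk of the trace without
    # rebuilding explicit parent/child links.
    trace_map: DefaultDict[uuid.UUID, List[schemas.Run]] = collections.defaultdict(list)
    for run in runs:
        trace_map[run.trace_id].append(run)
    for trace_runs in trace_map.values():
        trace_runs.sort(key=lambda r: r.dotted_order)
    return trace_map
```

As the reply above notes, this yields a flat per-trace ordering rather than a nested tree, so callers that need child access at arbitrary depth would still have to reconstruct parent/child links.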
TARGET_T = PIPELINE_T
# dataset-name, dataset_id, or examples
DATA_T = Union[str, uuid.UUID, Iterable[schemas.Example]]
SUMMARY_EVALUATOR_T = Callable[
What is a summary evaluator?
It evaluates at an aggregate level, over the whole experiment.
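For concreteness, a summary evaluator in this sense might look like the sketch below. The `SUMMARY_EVALUATOR_T` definition is truncated in the hunk above, so the `(runs, examples) -> dict` shape and the feedback-dict keys here are assumptions, not the PR's actual contract:

```python
from typing import List

from langsmith import schemas


def exact_match_rate(
    runs: List[schemas.Run], examples: List[schemas.Example]
) -> dict:
    # Assumed contract: called once with every run and its reference
    # example, returning a single experiment-level feedback score
    # instead of one score per row.
    matches = sum(
        run.outputs == example.outputs for run, example in zip(runs, examples)
    )
    return {"key": "exact_match_rate", "score": matches / len(examples)}
```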
evaluate API

Feedback I'd love (all welcome): (predict, evaluate vs. 1...)

To add in a second PR: