Add tool testing #164

Merged
merged 12 commits into from
Jul 10, 2024

Conversation

dillonalaird
Member

@dillonalaird dillonalaird commented Jul 8, 2024

This PR adds several items:

  • The planner now outputs 3 plans
  • A new tester agent tests the plans and picks the best one
  • LMM now supports mp4 files as media input
  • Max tokens increased, and traceback stripped of color codes
  • Remove reflection
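
For the traceback item, here is a minimal sketch of stripping color codes, assuming they are standard ANSI SGR escape sequences (for example `\x1b[31m`). The helper name `strip_color_codes` is hypothetical, and this is not necessarily how the PR implements it:

```python
import re

# Matches ANSI SGR (color/style) escape sequences such as "\x1b[31m" or "\x1b[0m".
# Assumption: the traceback only contains SGR sequences, not cursor-movement codes.
ANSI_COLOR_RE = re.compile(r"\x1b\[[0-9;]*m")


def strip_color_codes(text: str) -> str:
    """Return `text` with ANSI color escape sequences removed."""
    return ANSI_COLOR_RE.sub("", text)
```

Cleaning the traceback this way keeps escape bytes out of the text that is passed back to the model, so they neither consume tokens nor obscure the error message.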

Collaborator

@shankar-vision-eng shankar-vision-eng left a comment

Left some comments

@@ -29,23 +29,110 @@
{feedback}

**Instructions**:
1. Based on the context and tools you have available, write a plan of subtasks to achieve the user request.
2. Go over the users request step by step and ensure each step is represented as a clear subtask in your plan.
1. Based on the context and tools you have available, create a plan of subtasks to achieve the user request.
Collaborator

NIT: consider "create a set of different plans to achieve the target in the user request" instead of "create a plan of subtasks to achieve the user request". "Subtasks" reads more like sub-plans. You might have to run this through the benchmark.

Member Author

Good catch, that's a good suggestion. I'll test it out.

Member Author

Not entirely sure why, but this change significantly lowered the benchmark results.

vision_agent/agent/vision_agent_prompts.py (2 resolved threads)
vision_agent/lmm/lmm.py (2 resolved threads)
Collaborator

@shankar-vision-eng shankar-vision-eng left a comment

Left a few more comments

vision_agent/agent/vision_agent.py (2 resolved threads)
Collaborator

@shankar-vision-eng shankar-vision-eng left a comment

LGTM!

Regarding the change to the plan prompt, I'll leave it to you. Can you go through the benchmark cases that scored lower? I'm assuming the benchmark is also not deterministic, so there should be some standard deviation (SD). If the drop is within the SD, I'd go with the prompt change.
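
One rough way to run that check, sketched under assumptions: the benchmark is repeated several times per prompt variant and each run yields a single numeric score. The function name `within_noise` and the threshold `k` are hypothetical and not part of the project:

```python
from statistics import mean, stdev
from typing import Sequence


def within_noise(
    baseline: Sequence[float], candidate: Sequence[float], k: float = 1.0
) -> bool:
    """Return True if the gap between mean scores is at most k baseline SDs."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline runs to estimate the SD")
    if not candidate:
        raise ValueError("need at least one candidate run")
    # Absolute difference between the average scores of the two prompt variants.
    gap = abs(mean(candidate) - mean(baseline))
    # Sample standard deviation of the baseline runs serves as the noise estimate.
    return gap <= k * stdev(baseline)
```

Pass the per-run scores of the old prompt as `baseline` and those of the new prompt as `candidate`. A `True` result suggests the drop is indistinguishable from run-to-run variation at the chosen `k`, though with only a handful of runs this is a coarse heuristic rather than a significance test.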

@dillonalaird dillonalaird merged commit 169d650 into main Jul 10, 2024
8 checks passed
@dillonalaird dillonalaird deleted the add-tool-testing branch July 10, 2024 20:04