Vision Agent v3 #89

dillonalaird · 2024-05-18T04:59:44Z

A few pros and cons of Data Interpreter and Agent Coder led to this design.

Data Interpreter Cons:

The subtasks are too small: it takes tasks that GPT-4o can handle fine and subdivides them into 5-7 unnecessarily small subtasks.
When the subtasks are small, it tends to overwrite correct code a lot. For example, it will write correct code for subtask 1 and then remove it by subtask 5.

Data Interpreter Pros:

Tool recommendation helps scale to more tools
Long term memory helps the agent framework "learn" to some degree
Planning helps with tool choice and task decomposition if it's executed in the correct way

Agent Coder Pros:

Does very well on coding tasks (first place on HumanEval)

Agent Coder Cons:

Can't plan long term
No tool recommendation, needs all tools at once
Nothing like long term memory to help it "learn"

This version of Vision Agent keeps planning, but executes the entire plan in one shot using the Agent Coder framework. It uses the plan to drive tool recommendation and also allows long-term memory lookup. For planning beyond the initial plan, it runs a reflection step to decide whether an additional plan needs to be executed.
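The control flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `AgentState`, `recommend_tools`, `run_agent`, and the stub `planner`, `coder`, and `reflect` callables are all hypothetical names, tool recommendation is reduced to a toy keyword match, and long-term memory lookup is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AgentState:
    """Hypothetical container for one run; not the PR's actual data model."""
    plan: List[str] = field(default_factory=list)
    tools: List[str] = field(default_factory=list)
    code: str = ""


def recommend_tools(plan: List[str], tool_docs: Dict[str, str]) -> List[str]:
    """Toy recommendation: keep tools whose description shares a word with the plan."""
    plan_words = {w.lower() for step in plan for w in step.split()}
    return [
        name
        for name, doc in tool_docs.items()
        if plan_words & {w.lower() for w in doc.split()}
    ]


def run_agent(
    task: str,
    planner: Callable[[str, str], List[str]],
    coder: Callable[[List[str], List[str], str], str],
    reflect: Callable[[str, str], bool],
    tool_docs: Dict[str, str],
    max_rounds: int = 3,
) -> AgentState:
    """Plan, recommend tools, write code for the whole plan at once, then reflect."""
    state = AgentState()
    for _ in range(max_rounds):
        # Plan (or extend the plan) given the code written so far.
        state.plan.extend(planner(task, state.code))
        # Use the plan to narrow down which tools the coder sees.
        state.tools = recommend_tools(state.plan, tool_docs)
        # One-shot: the coder receives the entire plan, not one subtask at a time.
        state.code = coder(state.plan, state.tools, state.code)
        # Reflection decides whether an additional plan is needed.
        if reflect(task, state.code):
            break
    return state


# Stub callables standing in for LLM calls (assumptions for illustration only).
tool_docs = {"ocr": "extract text", "detect": "locate people"}


def planner(task: str, code: str) -> List[str]:
    return ["Load the image", "Find people"] if not code else ["Count the people"]


def coder(plan: List[str], tools: List[str], code: str) -> str:
    return code + f"# {len(plan)} steps using {sorted(tools)}\n"


def reflect(task: str, code: str) -> bool:
    # Pretend the task is done once two code revisions have been written.
    return code.count("\n") >= 2


state = run_agent("count people in an image", planner, coder, reflect, tool_docs)
```

Passing the full plan to the coder in each round is what distinguishes this from a per-subtask loop, where later subtasks can overwrite code written for earlier ones.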

shankar-vision-eng

Looks good. Left very minor comments wherever the prompt was not very clear.

vision_agent/agent/vision_agent_v3_prompts.py

shankar-vision-eng · 2024-05-20T22:02:56Z


+"""
+
+
+SIMPLE_TEST = """


What is the difference between TEST and SIMPLE_TEST? When do we use one over the other?

Good question. TEST is based on the original Agent Coder, but it requires the tester to be very thorough, and for these vision tasks you can't really test a lot of the edge cases (like images without a person, or where the text for OCR is in a different spot), so it tends to hallucinate file names (even after I tell it specifically not to). So I made SIMPLE_TEST, which just asks it to test basic functionality. This seems to be working better on the pool case.
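To make the distinction concrete, a basic-functionality test of the kind described might look like the sketch below. It is hypothetical and not taken from the prompts in this PR: `count_label` and `simple_test` are made-up names, and the point is only that the test runs on in-memory data instead of referencing image files that may not exist.

```python
from typing import Dict, List


def count_label(detections: List[Dict[str, object]], label: str) -> int:
    """Hypothetical function under test: count detections with the given label."""
    return sum(1 for d in detections if d["label"] == label)


def simple_test() -> None:
    """Basic-functionality check on in-memory data; no file names are involved."""
    detections = [
        {"label": "person", "score": 0.9},
        {"label": "dog", "score": 0.8},
        {"label": "person", "score": 0.7},
    ]
    assert count_label(detections, "person") == 2


simple_test()
```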

Just went through the code again. I see that TEST is not used here; it is only used in v1 (Agent Coder) and v2. Should we remove it from this file, then?

Will leave it in the file for now; we can remove it later.

shankar-vision-eng

LGTM

dillonalaird added 9 commits May 17, 2024 21:53

renamed prompt to prompts

c8d5343

added vision agent v3

6f3e62f

fixed execute issue

81ea443

fixed type issue

a61497b

fixed flake8

482fe2d

black and isort

6d78637

switch to simple test case

a539cbc

fixed issue with chat not resetting

b23a480

removed unused import

f74eacb

shankar-vision-eng reviewed May 20, 2024


dillonalaird and others added 4 commits May 21, 2024 07:16

prmopt updates

f08fb90

update prompts

e28fd08

remove debug

0b1fa8b

Merge branch 'main' into vision-agent-v3

f901b4a

shankar-vision-eng approved these changes May 21, 2024


shankar-vision-eng merged commit 54ae662 into main May 21, 2024
7 checks passed

dillonalaird deleted the vision-agent-v3 branch May 29, 2024 00:25
