Add tool testing #164
Left some comments
@@ -29,23 +29,110 @@
{feedback}

**Instructions**:
1. Based on the context and tools you have available, write a plan of subtasks to achieve the user request.
2. Go over the users request step by step and ensure each step is represented as a clear subtask in your plan.
1. Based on the context and tools you have available, create a plan of subtasks to achieve the user request.
Nit: consider "create a set of different plans to achieve the target in the user request" instead of "create a plan of subtasks to achieve the user request". Subtasks seem to be more like sub-plans. You might have to run this through the benchmark.
Good catch, that's a good suggestion. I'll test it out.
Not entirely sure why, but this change significantly lowered the benchmark results.
Left a few more comments
LGTM!
Regarding the change to the plan prompt, I'll leave it to you. Can you go through the benchmark cases that scored lower? I'm assuming the benchmark is also not deterministic, so there should be some standard deviation (SD). If the drop is within the SD, I would go with the prompt change.
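For the SD check, something like the following might work as a rough sketch. It assumes the benchmark can be run several times per prompt variant and produces one numeric score per run; `within_noise`, the threshold `k`, and the scores below are all hypothetical and not taken from this repo's benchmark.

```python
from statistics import mean, stdev


def within_noise(baseline_scores, candidate_scores, k=1.0):
    """Return True if the candidate mean lies within k baseline SDs of the baseline mean.

    Both arguments are lists of per-run benchmark scores (hypothetical data).
    """
    if len(baseline_scores) < 2:
        raise ValueError("need at least two baseline runs to estimate the SD")
    base_mean = mean(baseline_scores)
    base_sd = stdev(baseline_scores)  # sample standard deviation
    return abs(mean(candidate_scores) - base_mean) <= k * base_sd


# Made-up scores from repeated runs of the old and new plan prompt.
baseline = [0.80, 0.78, 0.82, 0.79, 0.81]
candidate = [0.79, 0.77, 0.80]

if within_noise(baseline, candidate):
    print("Difference is within run-to-run noise")
else:
    print("Difference exceeds run-to-run noise")
```

A one-SD threshold is a crude heuristic; with only a handful of runs per variant the SD estimate itself is noisy, so more runs would give a more trustworthy comparison.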
This PR adds several items: