
Doesn't seem to notice when it hallucinates a command? #35

Open
swapneils opened this issue May 5, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@swapneils

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I ran the sample for getting weather information, but without setting up the fetch_weather command, which it tried to run anyway. Ideally, the system would notice that fetch_weather failed and construct an alternate plan not using the OpenWeatherMap API, and then continue with the rest of the goals (getting dressing tips and writing them to dressing_tips.txt).

Current behaviour

Instead, the system pretended that it was successful, and said it had "not been given any new commands since the last time [it] provided an output", choosing the do_nothing action.

Steps to reproduce

Run the WeatherGPT example from the README, but comment out the code setting up GetWeather (roughly as sketched below).
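For concreteness, this is roughly what I ran. It is a sketch from memory of the README example, so the exact names (GetWeather, BaseTool, register_tool_type) and the goal wording may not match the README precisely:

```python
import loopgpt

# Custom weather tool from the README, left commented out so it never gets registered
# (names recalled from memory, may differ from the actual README):
#
# from loopgpt.tools import BaseTool
#
# class GetWeather(BaseTool):
#     @property
#     def args(self):
#         return {"city": "name of the city"}
#
#     @property
#     def resp(self):
#         return {"report": "weather report for the given city"}
#
#     def run(self, city):
#         ...  # call the OpenWeatherMap API here
#
# loopgpt.tools.register_tool_type(GetWeather)

agent = loopgpt.Agent()
agent.name = "WeatherGPT"
agent.goals = [
    "Get the current weather for my city",
    "Give me dressing tips based on the weather and write them to dressing_tips.txt",
    "Terminate",
]
agent.cli()
```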

Possible solution

I don't see any place where the system itself is asked whether it has completed a step of the plan. Maybe add that check to the prompt, or add a "cheap model" (Auto-GPT uses ada, as I recall) to evaluate this from the command output and the plan step?
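For illustration only (this is a hypothetical helper, not LoopGPT's existing API; the prompt wording and model choice are assumptions), a cheap-model verdict on each step could look something like:

```python
import openai

def step_completed(plan_step: str, command: str, output: str) -> bool:
    """Hypothetical verifier: ask a cheap model whether the command output
    actually satisfies the current plan step."""
    prompt = (
        f"Plan step: {plan_step}\n"
        f"Command executed: {command}\n"
        f"Command output: {output}\n"
        "Did the output show that the plan step completed successfully? Answer YES or NO."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # any cheap model would do here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"].strip().upper().startswith("YES")
```

If the verdict is NO, the failure could be fed back into the agent's next prompt so it replans instead of choosing do_nothing.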

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

  • >= v3.11
  • v3.10
  • v3.9
  • <= v3.8

LoopGPT Version

latest

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of LoopGPT.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
swapneils added the bug label on May 5, 2023
@iskandarreza (Contributor)

Yes, I am also encountering the same issue, even with simple built-in tools. In cases where I removed a tool, the agent still runs the command, so the output is a predictable error message, but the agent's response to that error can be unpredictable. Sometimes it acknowledges the error and attempts a different action, but I have recorded many cases where it hallucinates that the command was a success, or misinterprets a response as a success. The misinterpretation happens very often when the command was run by a delegate agent/subagent.

For command responses from delegate agents spawned through the create_agent command, there are two recurring misinterpretation cases that I have observed and recorded:

  1. The main agent creates a subagent and assigns it a task to follow up on later. When it later requests a report using the message_agent command, it sends the request to an invalid agent ID and misinterprets the 'agent not found' error as a task failure, or sometimes, bizarrely, as a success.
  2. The main agent creates a subagent and assigns a task, then immediately gets a response from the subagent asking to clarify the request, which the main agent interprets as the end result of the subagent's task.

I have noticed that the agent will sometimes misinterpret the results of its own commands too. I think the cause of these problems lies somewhere in the following code blocks:
https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L154-L181
https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L287-L340
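As a stopgap while the root cause is being tracked down, a simple pre-check on tool/subagent responses might help. This is purely a hypothetical sketch; none of these names exist in LoopGPT today:

```python
ERROR_MARKERS = ("traceback", "agent not found", "command not found", "error:")

def annotate_result(command: str, result: str) -> str:
    """Hypothetical helper: flag obvious failures so the agent can't silently
    treat an error string as a successful result."""
    if any(marker in result.lower() for marker in ERROR_MARKERS):
        return (
            f"The command '{command}' FAILED with the error below. "
            f"Do NOT treat this as a success; pick a different action.\n{result}"
        )
    return result
```

The agent's own self-evaluation would still be needed for the subtler cases (like mistaking a subagent's clarifying question for a final answer), but explicit failure annotations should at least stop responses like 'agent not found' from being read as successes.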
