
Doesn't seem to notice when it hallucinates a command? #35

Open
swapneils opened this issue May 5, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@swapneils

Please check that this issue hasn't been reported before.

  • I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I ran the sample for getting weather information, but without setting up the fetch_weather command, which it tried to run anyway. Ideally, the system would notice that fetch_weather failed and construct an alternate plan not using the OpenWeatherMap API, and then continue with the rest of the goals (getting dressing tips and writing them to dressing_tips.txt).

Current behaviour

Instead, the system pretended that it was successful, and said it had "not been given any new commands since the last time [it] provided an output", choosing the do_nothing action.

Steps to reproduce

Run the WeatherGPT example from the README, but comment out the code setting up GetWeather (roughly as sketched below).
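For concreteness, this is roughly what I ran. It is a sketch from memory of the README example, so the exact names (GetWeather, BaseTool, register_tool_type) and the goal wording may not match the README precisely:

```python
import loopgpt

# Custom weather tool from the README, left commented out so it never gets registered
# (names recalled from memory, may differ from the actual README):
#
# from loopgpt.tools import BaseTool
#
# class GetWeather(BaseTool):
#     @property
#     def args(self):
#         return {"city": "name of the city"}
#
#     @property
#     def resp(self):
#         return {"report": "weather report for the given city"}
#
#     def run(self, city):
#         ...  # call the OpenWeatherMap API here
#
# loopgpt.tools.register_tool_type(GetWeather)

agent = loopgpt.Agent()
agent.name = "WeatherGPT"
agent.goals = [
    "Get the current weather for my city",
    "Give me dressing tips based on the weather and write them to dressing_tips.txt",
    "Terminate",
]
agent.cli()
```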

Possible solution

I don't see any place where the system itself is asked whether it has completed a step of the plan. Maybe add that check to the prompt, or add a "cheap model" (Auto-GPT uses ada, as I recall) to evaluate this from the command output and the plan step?
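For illustration only (this is a hypothetical helper, not LoopGPT's existing API; the prompt wording and model choice are assumptions), a cheap-model verdict on each step could look something like:

```python
import openai

def step_completed(plan_step: str, command: str, output: str) -> bool:
    """Hypothetical verifier: ask a cheap model whether the command output
    actually satisfies the current plan step."""
    prompt = (
        f"Plan step: {plan_step}\n"
        f"Command executed: {command}\n"
        f"Command output: {output}\n"
        "Did the output show that the plan step completed successfully? Answer YES or NO."
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # any cheap model would do here
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp["choices"][0]["message"]["content"].strip().upper().startswith("YES")
```

If the verdict is NO, the failure could be fed back into the agent's next prompt so it replans instead of choosing do_nothing.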

Which Operating Systems are you using?

  • Linux
  • macOS
  • Windows

Python Version

  • >= v3.11
  • v3.10
  • v3.9
  • <= v3.8

LoopGPT Version

latest

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this bug has not been reported yet.
  • I am using the latest version of LoopGPT.
  • I have provided enough information for the maintainers to reproduce and diagnose the issue.
swapneils added the bug label on May 5, 2023
@iskandarreza (Contributor)

Yes, I am also encountering the same issue, even with simple built-in tools. In cases where I removed a tool, the agent still runs the command, so the output is a predictable error message, but the agent's response to that error can be unpredictable. Sometimes it acknowledges the error and attempts a different action, but I have recorded many cases where it hallucinates that the command was a success, or misinterprets a response as a success. The misinterpretation happens very often when the command was run by a delegate agent/subagent.

For command responses from delegate agents spawned through the create_agent command, there are two recurring misinterpretation cases that I have observed and recorded:

  1. The main agent creates a subagent and assigns it a task to follow up on later. When it later requests a report using the message_agent command, it sends the request to an invalid agent ID and misinterprets the 'agent not found' error as a task failure, or sometimes, bizarrely, as a success.
  2. The main agent creates a subagent and assigns a task, then immediately gets a response from the subagent asking to clarify the request, which the main agent interprets as the end result of the subagent's task.

I have noticed that the agent will sometimes misinterpret the results of its own commands too. I think the cause of these problems lies somewhere in the following code blocks:
https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L154-L181
https://github.com/farizrahman4u/loopgpt/blob/main/loopgpt/agent.py#L287-L340
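As a stopgap while the root cause is being tracked down, a simple pre-check on tool/subagent responses might help. This is purely a hypothetical sketch; none of these names exist in LoopGPT today:

```python
ERROR_MARKERS = ("traceback", "agent not found", "command not found", "error:")

def annotate_result(command: str, result: str) -> str:
    """Hypothetical helper: flag obvious failures so the agent can't silently
    treat an error string as a successful result."""
    if any(marker in result.lower() for marker in ERROR_MARKERS):
        return (
            f"The command '{command}' FAILED with the error below. "
            f"Do NOT treat this as a success; pick a different action.\n{result}"
        )
    return result
```

The agent's own self-evaluation would still be needed for the subtler cases (like mistaking a subagent's clarifying question for a final answer), but explicit failure annotations should at least stop responses like 'agent not found' from being read as successes.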
