Description
Hi team,
First of all, thank you for this fantastic repository! I've been exploring OpenManus-RL and it works great.
I am currently trying to enhance the agent's capabilities by integrating a long-term memory module (e.g. mem0). My goal is to train this memory-augmented agent using GRPO with the verl backend.
I would appreciate some guidance on the architectural changes required to do this properly. Specifically, I have questions about the retrieval and storage phases, as well as compatibility with verl:
1. Retrieval (Injecting Memory): To inject retrieved memory into the context window during the rollout phase, which component should I prioritize modifying? (A rough sketch of the wrapper option I have in mind is included after this list.)
   - Should this be handled inside the Environment wrapper (treating memory as part of the observation)?
   - Or should I modify the Actor/Rollout Worker logic directly to intercept the prompt before it is sent to the model?
2. Storage (Updating Memory): I also need to store successful interactions (or full trajectories) back into mem0 to evolve the memory. Where is the best place to access the complete context for storage? (A rough sketch of the hook I have in mind is included after this list.)
   - Is there a specific callback or post-episode hook in the RolloutWorker where the full trajectory is available?
   - Or should this logic reside in the RewardManager, since it evaluates the final outcome?
3. Verl Compatibility: Since verl handles distributed rollouts, are there any specific constraints I need to be aware of when dynamically changing the prompt length (due to retrieved memories) across different interaction steps? I want to ensure this doesn't break the batch processing or the PPO/GRPO data-collection pipeline. (A rough sketch of how I plan to bound the added length is after this list.)
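For question 1, this is roughly the environment-wrapper variant I am imagining. It is only a sketch under my own assumptions: `MemoryAugmentedEnv`, the text-observation `reset()`/`step()` interface, and the way I unwrap the result of mem0's `search()` are placeholders, not existing OpenManus-RL components.

```python
from mem0 import Memory


class MemoryAugmentedEnv:
    """Wraps a text-observation environment and prepends retrieved memories.

    Placeholder sketch: the base env is assumed to return string observations
    from reset()/step(); the real OpenManus-RL env interface may differ.
    """

    def __init__(self, base_env, user_id: str, top_k: int = 3):
        self.base_env = base_env
        self.memory = Memory()  # default in-process mem0 store
        self.user_id = user_id
        self.top_k = top_k

    def _augment(self, observation: str) -> str:
        # Retrieve memories relevant to the current observation and prepend
        # them, so they enter the prompt as part of the observation.
        results = self.memory.search(query=observation, user_id=self.user_id)
        # mem0's return shape has changed across versions; handle both a plain
        # list and a {"results": [...]} dict defensively.
        hits = results.get("results", results) if isinstance(results, dict) else results
        snippets = [h.get("memory", "") for h in hits[: self.top_k]]
        if not snippets:
            return observation
        block = "Relevant past experience:\n" + "\n".join(f"- {s}" for s in snippets)
        return f"{block}\n\n{observation}"

    def reset(self, *args, **kwargs):
        return self._augment(self.base_env.reset(*args, **kwargs))

    def step(self, action):
        # Assuming a (obs, reward, done, info) step signature.
        obs, reward, done, info = self.base_env.step(action)
        return self._augment(obs), reward, done, info
```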
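For question 2, this is the kind of post-episode hook I would like to attach once I know where the full trajectory is exposed. `store_episode`, the `(observation, action)` trajectory format, and the success threshold are my own assumptions; only the `Memory.add()` call is mem0's.

```python
from mem0 import Memory

memory = Memory()


def store_episode(trajectory, episode_reward: float, user_id: str,
                  success_threshold: float = 1.0) -> None:
    """Persist a finished episode into mem0 if it was successful.

    Placeholder sketch: `trajectory` is assumed to be a list of
    (observation, action) pairs; the real trajectory object in the
    RolloutWorker / RewardManager will look different.
    """
    # Only keep successful episodes so the memory evolves toward useful
    # behaviour instead of accumulating noise.
    if episode_reward < success_threshold:
        return
    transcript = "\n".join(f"OBS: {obs}\nACT: {act}" for obs, act in trajectory)
    memory.add(
        f"Successful episode (reward={episode_reward}):\n{transcript}",
        user_id=user_id,
        metadata={"reward": episode_reward},
    )
```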
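For question 3, my current plan for keeping the dynamically retrieved memory from overflowing the prompt budget is to clamp it to a fixed token allowance before it is prepended, so the total prompt stays under whatever maximum prompt length the verl data pipeline is configured with. The tokenizer name and the budget below are placeholders.

```python
from transformers import AutoTokenizer

# Placeholder model; in practice this would match the policy being trained.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")


def clamp_memory_block(snippets, token_budget: int = 512):
    """Greedily keep whole memory snippets until the token budget is used up."""
    kept, used = [], 0
    for snippet in snippets:
        n_tokens = len(tokenizer.encode(snippet, add_special_tokens=False))
        if used + n_tokens > token_budget:
            break
        kept.append(snippet)
        used += n_tokens
    return kept
```

I am unsure whether a per-step budget like this is sufficient, or whether the rollout batching assumes a fixed prompt length per batch, which is the core of my question.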
Any high-level advice or pointers to the relevant code sections (e.g., specific files in verl or the openmanus agent code) would be incredibly helpful!
Thanks again for your hard work.
Additional Information
No response