Asymmetric Actor Critic and Related Memory Processing #180

patricknaughton01 · 2024-07-30T20:38:06Z


@Toni-SM, thanks so much for your work on this library. I'm trying to use it to train an agent with IsaacGym as the simulator, and I want to use the asymmetric actor-critic variant of PPO, as is done in the IsaacGymEnvs repo (for example, in the IndustReal environments). Because of this, my observation is currently a dict that looks like:

{
    "obs": torch.Tensor, # --> given to the policy network
    "state": torch.Tensor # --> given to the value network
}

I'm using PPO_RNN as the agent. The difficulty I run into is that the memory class is built specifically with raw tensors in mind. I wrote a subclass of the RandomMemory class to store these elements (the state) separately, but this loop seems to check every element of the memory to see whether it is a float tensor and, if so, fill it with NaNs, which raises an error when it reaches my shoehorned dict. For now I've worked around this by commenting out these lines in my installation of skrl, but I was wondering whether they need to be there at all. The last commit on those lines mentions that they are there for backward compatibility with old versions of torch, but I didn't follow why that is needed to support old versions of torch.
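To make the failure mode concrete, here is a minimal sketch in plain PyTorch. It is not skrl's actual code: the `storage` dict and the `fill_float_tensors_with_nan` helper are hypothetical names used only for illustration. It shows a NaN-fill loop that checks whether each stored entry is a tensor first, so a nested dict is skipped rather than passed to `torch.is_floating_point` (which only accepts tensors).

```python
import torch


def fill_float_tensors_with_nan(storage):
    """Fill every floating-point tensor in `storage` with NaN, in place.

    `storage` is a hypothetical name -> value mapping. Entries that are
    not tensors (for example a nested dict) are skipped instead of raising,
    and their names are returned so the caller can handle them separately.
    """
    skipped = []
    for name, value in storage.items():
        if isinstance(value, torch.Tensor):
            # Only float tensors can hold NaN; bool/int tensors are left alone.
            if torch.is_floating_point(value):
                value.fill_(float("nan"))
        else:
            skipped.append(name)
    return skipped


# Illustrative storage: two plain tensors plus a dict-valued entry.
storage = {
    "actions": torch.zeros(4, 2),
    "terminated": torch.zeros(4, 1, dtype=torch.bool),
    "states": {"obs": torch.zeros(4, 3), "state": torch.zeros(4, 5)},
}
skipped = fill_float_tensors_with_nan(storage)
```

Whether a guard like this is appropriate inside the library depends on why the NaN fill is there in the first place, which is the open question above.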

Another issue I ran into is this line in the PPO_RNN class itself. The cast to float breaks my code because the observation is a dict, not a tensor. I was wondering whether that cast needs to be there at all, since the user controls the type of the state anyway when they write the environment and can ensure they pass float tensors as input.
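As an illustration of the alternative to removing the cast, the sketch below applies it per tensor so that both a raw tensor and a dict of tensors are accepted. The `to_float` helper is a hypothetical name, not part of skrl, and the example assumes the dict contains only tensors or nested dicts of tensors.

```python
import torch


def to_float(observation):
    """Cast a tensor, or every tensor in a (possibly nested) dict, to float32."""
    if isinstance(observation, torch.Tensor):
        return observation.float()
    if isinstance(observation, dict):
        return {key: to_float(value) for key, value in observation.items()}
    raise TypeError(f"Unsupported observation type: {type(observation)!r}")


# Illustrative dict observation with non-float32 dtypes.
observation = {
    "obs": torch.ones(2, 3, dtype=torch.float64),
    "state": torch.ones(2, 5, dtype=torch.int64),
}
converted = to_float(observation)
```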

Please also let me know if there's a better way to implement asymmetric actor-critic models in skrl. I didn't see anything in the docs, but it's possible I just missed it.

Thanks for your time, and thanks again for all the work on the repo! It's been very readable and easy to work with.

Toni-SM (Maintainer) · 2024-08-04T17:33:39Z

Hi @patricknaughton01

Currently, it is necessary to modify several components in skrl to support asymmetric learning, for example:

Expose the state_space from the environment wrapper (used when instantiating the value model)
Define a tensor to store the state when initializing the memory
Forward the state when calling the value model during the rollout stage
Store the state when calling record_transition
Sample the state on each agent _update and forward it when calling the value model during the training stage
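The data flow described by these steps can be sketched in plain PyTorch. This is only an illustration under stated assumptions, not skrl's implementation: the `memory` dict, `fake_env_step`, and all dimensions are hypothetical, and the policy is deterministic for brevity (PPO would sample from a distribution and also store log-probabilities, rewards, and so on).

```python
import torch
import torch.nn as nn

OBS_DIM, STATE_DIM, ACT_DIM = 3, 5, 2  # illustrative sizes
NUM_ENVS, ROLLOUT_LEN, BATCH_SIZE = 4, 8, 16

# The policy sees only the observation; the value model sees the privileged state.
policy = nn.Sequential(nn.Linear(OBS_DIM, 16), nn.Tanh(), nn.Linear(16, ACT_DIM))
value = nn.Sequential(nn.Linear(STATE_DIM, 16), nn.Tanh(), nn.Linear(16, 1))

# Hypothetical memory: one tensor per quantity, with the state stored separately.
memory = {
    "obs": torch.zeros(ROLLOUT_LEN, NUM_ENVS, OBS_DIM),
    "state": torch.zeros(ROLLOUT_LEN, NUM_ENVS, STATE_DIM),
    "actions": torch.zeros(ROLLOUT_LEN, NUM_ENVS, ACT_DIM),
    "values": torch.zeros(ROLLOUT_LEN, NUM_ENVS, 1),
}


def fake_env_step():
    """Stand-in for an environment that returns the dict observation."""
    return {
        "obs": torch.randn(NUM_ENVS, OBS_DIM),
        "state": torch.randn(NUM_ENVS, STATE_DIM),
    }


# Rollout stage: act from "obs", estimate values from "state", record both.
with torch.no_grad():
    for t in range(ROLLOUT_LEN):
        observation = fake_env_step()
        memory["obs"][t] = observation["obs"]
        memory["state"][t] = observation["state"]
        memory["actions"][t] = policy(observation["obs"])
        memory["values"][t] = value(observation["state"])

# Training stage: sample a mini-batch and forward each model on its own input.
flat = {name: tensor.flatten(0, 1) for name, tensor in memory.items()}
indices = torch.randperm(ROLLOUT_LEN * NUM_ENVS)[:BATCH_SIZE]
new_actions = policy(flat["obs"][indices])
new_values = value(flat["state"][indices])
```

The point of the sketch is that the observation and the state travel through the memory as separate tensors, so each model only ever receives the input it was built for.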

I'm working on separating the concepts of observation and state (currently mixed in skrl) to support asymmetric learning, starting with the environment wrappers on this branch, but it may take some time.


patricknaughton01 Aug 12, 2024
Author

Hi @Toni-SM, thanks for the info. Do you have any thoughts about the two lines that I mentioned? Thanks.
