Asymmetric Actor Critic and Related Memory Processing #180
patricknaughton01 asked this question in Q&A
-
Currently, several components in skrl need to be modified to support asymmetric learning, for example:
I'm working on separating the concepts of observation and state (currently mixed in skrl), starting with the environment wrappers on this branch, to support asymmetric learning, but it may take some time.
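To illustrate what separating the two concepts could mean, here is a minimal sketch in plain Python. It is not skrl code and does not reflect the branch mentioned above: the `AsymmetricSample` class, the `split_env_output` helper, and the `"obs"` / `"states"` key names are all hypothetical choices made for this example. The idea is only that the actor receives the observation, while the critic may receive a richer, privileged state.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AsymmetricSample:
    """One environment output, split by consumer (hypothetical container)."""
    observation: Any  # what the policy (actor) is allowed to see
    state: Any        # privileged information, intended for the value function (critic)


def split_env_output(raw: Dict[str, Any],
                     obs_key: str = "obs",
                     state_key: str = "states") -> AsymmetricSample:
    """Split a dict returned by an environment into observation and state.

    The key names are assumptions for this sketch; adapt them to the environment.
    """
    observation = raw[obs_key]
    # Symmetric fallback: with no privileged state, the critic sees the observation.
    state = raw.get(state_key, observation)
    return AsymmetricSample(observation=observation, state=state)


sample = split_env_output({"obs": [1.0, 2.0], "states": [1.0, 2.0, 3.0]})
actor_input = sample.observation   # fed to the policy
critic_input = sample.state        # fed to the value function
```

Keeping the two fields separate from the wrapper onward means that memories and agents never have to guess which part of a mixed tensor belongs to which model.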
-
@Toni-SM, thanks so much for your work on this library. I'm trying to use it to train an agent with IsaacGym as the simulator, and I want to use the asymmetric actor critic variant of PPO, as is done in the IsaacGymEnvs repo (for example, in the IndustReal environments). Because of this, my observation is currently a dict that looks like:
I'm using PPO_RNN as the agent. The difficulty I run into is that the memory class is built specifically with raw tensors in mind. I wrote a subclass of the `RandomMemory` class to store these elements (the state) separately, but this loop seems to check every element of the memory to see whether it is a float tensor and fill it with NaNs, which raises an error when it reaches my shoehorned dict. For now I've resolved this by commenting out these lines in my installation of skrl, but I was wondering whether they need to be there at all. It looks like the last commit on those lines mentions that they are there for backwards compatibility with old versions of torch, but I didn't follow why exactly that is needed to support old versions of torch.
Another issue I ran into is this line in the `PPO_RNN` class itself. The cast to `float` breaks my code because the observation is a dict, not a tensor. I was wondering whether that cast needs to be there at all, since the user controls the type of the state when they write the environment and can ensure they are passing float tensors as input.
Please also let me know if there's a better way to implement asymmetric actor critic models in skrl. I didn't see anything in the docs, but it's possible I just missed it.
Thanks for your time, and thanks again for all the work on the repo! It's been very readable and easy to work with.
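One possible workaround for the dict problem described above, offered only as a sketch and not confirmed by the skrl maintainers, is to pack the dict into a single flat float vector so that a tensor-only memory can store it unchanged, and to slice it apart again inside the actor and critic models. The example below uses plain Python lists so it runs without dependencies; with torch tensors the same slices would be applied to the last dimension. The helper names `build_layout`, `flatten`, and `unflatten`, and the keys `"obs"` and `"states"`, are hypothetical.

```python
from typing import Dict, List, Sequence


def build_layout(sizes: Dict[str, int]) -> Dict[str, slice]:
    """Assign each named part a contiguous slice of the flat vector."""
    layout: Dict[str, slice] = {}
    start = 0
    for name, size in sizes.items():
        layout[name] = slice(start, start + size)
        start += size
    return layout


def flatten(parts: Dict[str, Sequence[float]], layout: Dict[str, slice]) -> List[float]:
    """Concatenate the parts in layout order, casting every value to float."""
    flat: List[float] = []
    for name, span in layout.items():
        values = list(parts[name])
        expected = span.stop - span.start
        if len(values) != expected:
            raise ValueError(f"{name!r}: expected {expected} values, got {len(values)}")
        flat.extend(float(v) for v in values)
    return flat


def unflatten(flat: Sequence[float], layout: Dict[str, slice]) -> Dict[str, List[float]]:
    """Recover the named parts from the flat vector."""
    return {name: list(flat[span]) for name, span in layout.items()}


layout = build_layout({"obs": 2, "states": 3})
flat = flatten({"obs": [1, 2], "states": [3, 4, 5]}, layout)
actor_input = flat[layout["obs"]]       # the policy reads only its own slice
critic_input = flat[layout["states"]]   # the value function reads the privileged slice
```

Because the stored value is a single float vector, neither the NaN fill nor the cast to `float` would see a dict. The trade-off is that the layout has to be shared between the environment wrapper and both models.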