Multiple terminal states #3
Comments
I think you don't need to change the code. It should work with a list of states thanks to numpy's indexing. The general idea is that you set all terminal states equal to one.
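For concreteness, here is a minimal sketch of what such a backward pass typically looks like (a sketch in the spirit of `local_action_probabilities()`, not the repository's exact code; `p_transition` is assumed to be indexed as `[state, next_state, action]` and `reward` to be state-only):

```python
import numpy as np

def local_action_probabilities(p_transition, terminal, reward):
    """Backward-pass sketch; `terminal` can be a single index or a list/array of indices."""
    n_states, _, n_actions = p_transition.shape
    er = np.exp(reward)
    p = [p_transition[:, :, a] for a in range(n_actions)]

    # set all terminal states equal to one (numpy fancy indexing accepts a list here)
    zs = np.zeros(n_states)
    zs[terminal] = 1.0

    # backward pass for a fixed number of iterations
    for _ in range(2 * n_states):
        za = np.array([er * p[a].dot(zs) for a in range(n_actions)]).T
        zs = za.sum(axis=1)

    # local action probabilities pi(a|s)
    return za / zs[:, None]
```

Calling this with, say, `terminal=[3, 7]` (hypothetical indices) initializes both entries of `zs` in a single assignment, so a list of terminal states needs no further changes.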
I am using MaxEnt for the Iterated Prisoner's Dilemma. It is an unending game and I have limited it to n iterations, so any state can be a terminal state. I will try adding all the states to the list, thanks!
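A sketch of that plan, assuming states are plain integer indices (how the iterated game is encoded into states is not shown here):

```python
import numpy as np

n_states = 16                    # hypothetical size of the state space
terminal = np.arange(n_states)   # treat every state as a possible terminal state

# local_action_probabilities(p_transition, terminal, reward) would then set
# zs[terminal] = 1.0 for all states at once via fancy indexing.
```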
Ah, interesting. I haven't worked with this (and IRL in general) for a while, so please be aware that I might not be the best person to answer theoretical questions about it. However, I think that initializing all states to one and then iterating for some fixed number of steps should theoretically work and be a sound approximation to the unending game, although there could be a technical issue with that. Instead, however, you probably want to consider using maximum causal entropy IRL. Also, for the causal function you'll have to specify the terminal reward function yourself (because of this check), but you can just set it to zero everywhere and it should work fine.
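For illustration, here is a generic sketch of a causal-entropy backward pass (soft value iteration) where the terminal reward is simply zero everywhere; the function name and signature are illustrative assumptions, not necessarily the repository's API:

```python
import numpy as np
from scipy.special import logsumexp

def causal_backward_pass(p_transition, reward, terminal_reward, discount=0.9, eps=1e-5):
    """Soft value iteration sketch; terminal_reward (e.g. all zeros) only seeds the values."""
    n_states, _, n_actions = p_transition.shape
    v = terminal_reward.copy()
    while True:
        # Q(s, a) = r(s) + discount * E_{s' ~ P(.|s,a)}[V(s')]
        q = reward[:, None] + discount * np.einsum('ijk,j->ik', p_transition, v)
        v_new = logsumexp(q, axis=1)          # soft maximum over actions
        if np.max(np.abs(v_new - v)) < eps:
            break
        v = v_new
    return np.exp(q - v_new[:, None])         # stochastic policy pi(a|s)
```

With `terminal_reward = np.zeros(n_states)` this matches the "set it to zero everywhere" suggestion, and the discount factor keeps the iteration convergent even without proper terminal states.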
Thanks. As a first step (to verify the new world environment I added) I set one of the states as the terminal state. That worked fairly well, and the recovered reward function was close to the original. After that I tried the infinite-horizon case; for that I commented out the terminal-state handling, but the results were no longer meaningful. At this point I am not sure whether it is a question of tweaking some parameters or whether I have got something fundamentally wrong.
Sorry for the late response. I think this suggests that the expected SVF doesn't converge to something meaningful, which is probably more of a theoretical or algorithmic issue than just a matter of tweaking parameters. Regarding convergence, Ziebart et al. discuss this in their paper.
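To make the convergence point concrete, here is a sketch of the forward pass that accumulates the expected state visitation frequencies (illustrative names and assumed shapes, not the repository's exact code):

```python
import numpy as np

def expected_svf(p_transition, p_initial, p_action, terminal, n_iter=200):
    """Forward-pass sketch: expected state visitation frequencies under policy p_action.

    Assumed shapes: p_transition (n_states, n_states, n_actions),
    p_action (n_states, n_actions), p_initial (n_states,).
    """
    d = p_initial.copy()       # D^(0): distribution over initial states
    svf = d.copy()
    for _ in range(n_iter):
        d[terminal] = 0.0      # terminal states absorb: their mass stops propagating
        # D^(t+1)(s') = sum_{s,a} D^(t)(s) * pi(a|s) * P(s'|s,a)
        d = np.einsum('i,ik,ijk->j', d, p_action, p_transition)
        svf += d
    return svf
```

If no state is terminal (and no discounting is used), the per-step mass `d` never dies out, so `svf` keeps growing with the number of iterations instead of settling, which is one way to see why the infinite-horizon case does not yield a meaningful SVF here.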
There is a paper (https://arxiv.org/abs/2012.00889) proposing an algorithm that doesn't explicitly rely on terminal states (and has some other improvements), so maybe that would work better here.
Thanks a lot for such well-documented code. It is making it really easy for me to adapt it to my use case.
My MDP has multiple terminal states and I was wondering how to change the local_action_probabilities() code for that scenario. The Ziebart paper mentions you have to do it for all terminal states, but I am not sure how to combine them. Thanks for your help!