DictPolicy and special Q-learning based on key-value storage (#459)
@NeroBlackstone sorry that we never responded to this! This is actually something that people often want to do. If you're still interested in contributing it, I think we can integrate it in with a few small adjustments. Let me know if you're interested in doing that.
Hi, thanks for your comment. I will do these things:
I will open a PR for the first step soon. If there are any problems with the code, please point them out. Thank you very much again.
Suppose we have a generative MDP with discrete states and discrete actions, but the state and action spaces are hard to enumerate. We may still want to solve it with a traditional tabular RL algorithm.
So I implemented a
DictPolicy
, which stores state-action pair values in a dictionary. (Of course, users need to define Base.isequal()
and Base.hash()
for their state and action types.) DictPolicy.jl :
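The original file is not included above; the following is only a minimal hypothetical sketch of what such a dictionary-backed policy might look like. All names (`DictPolicy`, `qvalue`, `greedy_action`) are illustrative, not the actual DictPolicy.jl source.

```julia
# Hypothetical sketch: Q-values keyed by (state, action) tuples, so neither
# space has to be enumerated up front.
struct DictPolicy{S,A}
    value_dict::Dict{Tuple{S,A},Float64}  # (s, a) => Q(s, a)
end

DictPolicy{S,A}() where {S,A} = DictPolicy(Dict{Tuple{S,A},Float64}())

# Q-value lookup; unseen (s, a) pairs default to 0.0
qvalue(p::DictPolicy{S,A}, s::S, a::A) where {S,A} = get(p.value_dict, (s, a), 0.0)

# Greedy action: best action among those recorded for state s
# (returns `nothing` if the state has never been visited)
function greedy_action(p::DictPolicy{S,A}, s::S) where {S,A}
    best_a, best_q = nothing, -Inf
    for ((s2, a), q) in p.value_dict
        if isequal(s2, s) && q > best_q
            best_a, best_q = a, q
        end
    end
    return best_a
end

# For custom or mutable state types, define hash and isequal so that Dict
# lookup treats equal states as the same key, e.g.:
# Base.hash(s::MyState, h::UInt) = hash((s.x, s.y), h)
# Base.isequal(a::MyState, b::MyState) = a.x == b.x && a.y == b.y

p = DictPolicy{Int,Symbol}()
p.value_dict[(1, :up)] = 1.0
p.value_dict[(1, :down)] = 0.5
greedy_action(p, 1)  # → :up
```

A real integration with POMDPs.jl would presumably implement the `action` and `value` interface functions instead of the standalone names used here.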
Then we have a special Q-learning based on key-value storage, so we don't need to enumerate the state and action spaces in the MDP definition. (Most of the code is copied from TabularTDLearning.jl, but the Q-value storage and lookup are changed.)
dict_q_learning.jl :
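Again, the file itself is not shown; below is a hedged sketch of Q-learning over a key-value store, not the actual dict_q_learning.jl. It assumes a generative model `gen(s, a) -> (s′, r)` and a known action list `acts` (both assumptions; the real proposal may obtain actions differently), with ε-greedy exploration.

```julia
# Illustrative Q-learning over a Dict instead of a dense Q-matrix.
function dict_q_learning!(Q::Dict{Tuple{S,A},Float64}, gen, s0::S, acts::Vector{A};
                          n_steps=10_000, α=0.1, γ=0.95, ε=0.1) where {S,A}
    s = s0
    for _ in 1:n_steps
        # ε-greedy action selection over the stored Q-values
        a = rand() < ε ? rand(acts) :
            acts[argmax([get(Q, (s, a′), 0.0) for a′ in acts])]
        s′, r = gen(s, a)
        # bootstrap from the best next-state value; unseen pairs default to 0.0
        maxq = maximum(get(Q, (s′, a′), 0.0) for a′ in acts)
        q = get(Q, (s, a), 0.0)
        Q[(s, a)] = q + α * (r + γ * maxq - q)  # standard Q-learning TD update
        s = s′
    end
    return Q
end
```

The only change from a tabular implementation is that `Q[s, a]` indexing becomes `get(Q, (s, a), 0.0)` with a default for unvisited pairs, so the table grows lazily as states are encountered.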
What's your point of view? Do you have any advice?
Thank you for taking the time to read my issue.
If you think it's meaningful, I can open a PR and add some tests.
It's also okay if you think it's not useful or general enough; I originally wrote it just to solve my own MDP.