Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dict policy #491

Merged
merged 10 commits into from
Jun 10, 2023
Merged

Dict policy #491

merged 10 commits into from
Jun 10, 2023

Conversation

NeroBlackstone
Copy link
Contributor

Related issue

Copy link
Member

@zsunberg zsunberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NeroBlackstone thanks for your contribution! Can you have a look at my comments and make those changes before we merge?

lib/POMDPTools/src/Policies/dict.jl Outdated Show resolved Hide resolved
lib/POMDPTools/src/Policies/dict.jl Outdated Show resolved Hide resolved
lib/POMDPTools/src/Policies/dict.jl Outdated Show resolved Hide resolved
end
end
if max_action === nothing
max_action = available_actions[1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
max_action = available_actions[1]
max_action = first(available_actions)

first is preferable to [1] because it will work for containers that are not indexable (e.g.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly, I think it would be better to use a default policy here, i.e. store default in the struct and call action(p.default, s) here. I think it would be even better to have a user-defined default value (implicitly you have defined it with max_action_value=0 here. Then, if none of the values in the Dict are higher, it will use the default policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. So I will add a new field default_policy in struct ValueDictPolicy.
And I think it would be better if we have two constructors for ValueDictPolicy.
ValueDictPolicy(mdp::MDP) : Random return one of action from actions(p.mdp,s), if max value action is not exist.
ValueDictPolicy(mdp::MDP, default_policy::Policy) : Return action by default_policy, if max value action is not exist.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we let user define default value?
I set max_action_value=0 because I want to keep the same behavior with ValuePolicy, the default value of ValuePolicy is 0, before we encounter a specific state-action pair in the environment. Souce Code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly, I think it would be better to use a default policy here, i.e. store default in the struct and call action(p.default, s) here.

Do you mean to expose how to select action to the user?
Or just a policy seal in the struct and only execute in action() function?

Copy link
Member

@zsunberg zsunberg May 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I set max_action_value=0 because I want to keep the same behavior with ValuePolicy, the default value of ValuePolicy is 0, before we encounter a specific state-action pair in the environment. Souce Code

Yes, but in ValuePolicy the user can change that if they want to. I think you should have

struct ValueDictPolicy{M<:MDP, T<:AbstractDict{Tuple,Float64}, P<:Policy} <: Policy
    mdp::M
    value_dict::T
    default_value::Float64
    default_policy::P
end

with the constructor setting default_value=-Inf unless the user overrides it. And then

max_action_value = p.default_value

initially, and

if isnothing(max_action)
    return action(p.default_policy)

near the end of the function. Does that make sense?

Copy link
Contributor Author

@NeroBlackstone NeroBlackstone May 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think default_value is very helpful, users could control the "initial value" of the "value table" (it's dict here)

But default_policy, it will be available on the situation "there is no max value action, so let default_policy help me decide what action to return" only. In most instances, it's the first time we encounter a state, so all the values are the same.

But shall we let users decide what action returned, when all the action values are the same?
I'm not sure, maybe just returning random action is better?

Could you please give me an example about "in this situation we must let users define policy select action"?
Any solver algorithm need control policy here?

Thank you very much, if you have any ideas please let me know, I will finish it.

lib/POMDPTools/src/Policies/dict.jl Outdated Show resolved Hide resolved
lib/POMDPTools/test/policies/test_dict_policy.jl Outdated Show resolved Hide resolved
@NeroBlackstone
Copy link
Contributor Author

I have added default_value. If you decide to add default_policy, please tell me. ( For now, I'm not sure it makes sense.
Thank you very much.

@NeroBlackstone NeroBlackstone force-pushed the DictPolicy branch 3 times, most recently from 5ff641a to b53dcc8 Compare June 7, 2023 18:14
@NeroBlackstone
Copy link
Contributor Author

Deleted some test code.
I think the only thing not finished is default_policy.
I don't think it makes sense. If you have more ideas please comment. Thank you.

@zsunberg
Copy link
Member

I just added the default policy if you want to see what I meant. Thanks again for your contribution @NeroBlackstone !

@zsunberg zsunberg merged commit 9c36eba into JuliaPOMDP:master Jun 10, 2023
@NeroBlackstone
Copy link
Contributor Author

Thank you very much, I learn a lot.

johannes-fischer pushed a commit to johannes-fischer/POMDPs.jl that referenced this pull request Jul 12, 2023
* add dict policy

* update dict.jl

* add docs for DIctPolicy

* Apply suggestions from code review

Co-authored-by: Zachary Sunberg <[email protected]>

* add default_value

* revert runtest.jl

* add return type

* added a default policy, changed actionvalues to valuemap

* updated docs

---------

Co-authored-by: Zachary Sunberg <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants