Dict policy #491

NeroBlackstone · 2023-05-14T20:37:21Z

Related issue

zsunberg

@NeroBlackstone thanks for your contribution! Can you have a look at my comments and make those changes before we merge?

lib/POMDPTools/src/Policies/dict.jl

zsunberg · 2023-05-24T16:05:36Z

+ end
+ end
+ if max_action === nothing
+ max_action = available_actions[1]


Suggested change

- max_action = available_actions[1]
+ max_action = first(available_actions)

first is preferable to [1] because it will work for containers that are not indexable (e.g.)
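As a small illustration of the point above, the following sketch compares `first` and `[1]` on two iterables that do not support positional indexing. The `Set` and the generator are illustrative choices, not containers named in the review, and the variable names are hypothetical.

```julia
# A Set is iterable but has no positional indexing.
available_actions = Set([:left])

# `first` only requires iteration, so it works on the Set.
a = first(available_actions)

# Indexing with [1] is not defined for a Set and throws a MethodError.
indexing_failed = try
    available_actions[1]
    false
catch err
    err isa MethodError
end

# A generator is another iterable without getindex; `first` still works.
g = first(x^2 for x in 3:5)
```

For a `Vector` both forms return the same element, so switching to `first` costs nothing in the indexable case.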

More importantly, I think it would be better to use a default policy here, i.e. store default in the struct and call action(p.default, s) here. I think it would be even better to have a user-defined default value (implicitly, you have defined it with max_action_value=0 here). Then, if none of the values in the Dict are higher, it will use the default policy.

I agree with you, so I will add a new field default_policy to the struct ValueDictPolicy.
I also think it would be better to have two constructors for ValueDictPolicy:
ValueDictPolicy(mdp::MDP): returns a random action from actions(p.mdp, s) if no max-value action exists.
ValueDictPolicy(mdp::MDP, default_policy::Policy): returns the action chosen by default_policy if no max-value action exists.

Should we let the user define the default value?
I set max_action_value=0 because I want to keep the same behavior as ValuePolicy: the default value in ValuePolicy is 0 before we encounter a specific state-action pair in the environment. Source code

More importantly, I think it would be better to use a default policy here, i.e. store default in the struct and call action(p.default, s) here.

Do you mean exposing the way an action is selected to the user?
Or just a policy sealed inside the struct that is only executed in the action() function?

I set max_action_value=0 because I want to keep the same behavior as ValuePolicy: the default value in ValuePolicy is 0 before we encounter a specific state-action pair in the environment. Source code

Yes, but in ValuePolicy the user can change that if they want to. I think you should have

struct ValueDictPolicy{M<:MDP, T<:AbstractDict{Tuple,Float64}, P<:Policy} <: Policy
    mdp::M
    value_dict::T
    default_value::Float64
    default_policy::P
end

with the constructor setting default_value=-Inf unless the user overrides it. And then

max_action_value = p.default_value

initially, and

if isnothing(max_action)
    return action(p.default_policy, s)
end

near the end of the function. Does that make sense?
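To make the proposal concrete, here is a minimal, self-contained sketch of how the pieces could fit together. It is not the merged implementation. `Policy`, `MDP`, `actions`, and `action` are local stand-ins for the POMDPs.jl names so the block runs without the package, and `ToyMDP` and `ConstantPolicy` are hypothetical helpers. The sketch also assumes `AbstractDict{<:Tuple,Float64}` rather than `AbstractDict{Tuple,Float64}`, because Julia type parameters are invariant and a `Dict{Tuple{Int,Symbol},Float64}` would not match the latter.

```julia
# Stand-ins so the sketch runs without POMDPs.jl (assumption: in the package
# these names come from POMDPs.jl itself).
abstract type Policy end
abstract type MDP end

struct ToyMDP <: MDP end                 # hypothetical two-action problem
actions(::ToyMDP, s) = (:left, :right)

struct ConstantPolicy{A} <: Policy       # hypothetical fallback policy
    a::A
end
action(p::ConstantPolicy, s) = p.a

struct ValueDictPolicy{M<:MDP, T<:AbstractDict{<:Tuple,Float64}, P<:Policy} <: Policy
    mdp::M
    value_dict::T
    default_value::Float64
    default_policy::P
end

# Convenience constructor: default_value is -Inf unless the user overrides it.
ValueDictPolicy(mdp, value_dict, default_policy; default_value=-Inf) =
    ValueDictPolicy(mdp, value_dict, Float64(default_value), default_policy)

function action(p::ValueDictPolicy, s)
    max_action = nothing
    max_action_value = p.default_value
    for a in actions(p.mdp, s)
        v = get(p.value_dict, (s, a), nothing)
        # Only actions with a stored value above the current maximum are chosen.
        if v !== nothing && v > max_action_value
            max_action = a
            max_action_value = v
        end
    end
    # No stored value exceeded default_value: defer to the default policy.
    if isnothing(max_action)
        return action(p.default_policy, s)
    end
    return max_action
end

values = Dict((1, :right) => 2.0, (1, :left) => 1.0)
p = ValueDictPolicy(ToyMDP(), values, ConstantPolicy(:left))
known = action(p, 1)     # state 1 has stored values
unseen = action(p, 2)    # state 2 has none, so the default policy decides

# With a default_value above every stored value, the default policy always wins.
p_high = ValueDictPolicy(ToyMDP(), values, ConstantPolicy(:left); default_value=5.0)
overridden = action(p_high, 1)
```

With `default_value=-Inf`, any stored value wins over the fallback, so the default policy is only consulted for states with no entries in the dictionary.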

I think default_value is very helpful: users can control the "initial value" of the "value table" (a dict here).

But default_policy would only be used in the situation "there is no max-value action, so let default_policy decide which action to return". In most cases, that is the first time we encounter a state, so all the values are the same.

But should we let users decide which action is returned when all the action values are the same?
I'm not sure; maybe just returning a random action is better?

Could you please give me an example of a situation where we must let users define the policy that selects the action?
Does any solver algorithm need to control the policy here?

Thank you very much. If you have any ideas, please let me know and I will finish it.

lib/POMDPTools/src/Policies/dict.jl

lib/POMDPTools/test/policies/test_dict_policy.jl

Co-authored-by: Zachary Sunberg <[email protected]>

NeroBlackstone · 2023-06-04T18:03:10Z

I have added default_value. If you decide to add default_policy, please tell me. (For now, I'm not sure it makes sense.)
Thank you very much.

NeroBlackstone · 2023-06-07T18:23:43Z

Deleted some test code.
I think the only thing not finished is default_policy.
I don't think it makes sense. If you have more ideas please comment. Thank you.

zsunberg · 2023-06-10T01:24:42Z

I just added the default policy if you want to see what I meant. Thanks again for your contribution, @NeroBlackstone!

NeroBlackstone · 2023-06-10T03:02:09Z

Thank you very much, I learned a lot.

* add dict policy
* update dict.jl
* add docs for DIctPolicy
* Apply suggestions from code review

Co-authored-by: Zachary Sunberg <[email protected]>

* add default_value
* revert runtest.jl
* add return type
* added a default policy, changed actionvalues to valuemap
* updated docs

---------

Co-authored-by: Zachary Sunberg <[email protected]>

NeroBlackstone added 3 commits May 15, 2023 03:53

add dict policy

2e0850e

update dict.jl

5e02f71

add docs for DIctPolicy

fb40360

zsunberg requested changes May 24, 2023

NeroBlackstone and others added 3 commits May 25, 2023 15:14

Apply suggestions from code review

8c195bf

Co-authored-by: Zachary Sunberg <[email protected]>

add default_value

8fbd6c5

revert runtest.jl

b53dcc8

NeroBlackstone force-pushed the DictPolicy branch from 3107806 to 5ff641a Compare June 7, 2023 18:02

add return type

5a48b04

NeroBlackstone force-pushed the DictPolicy branch 3 times, most recently from 5ff641a to b53dcc8 Compare June 7, 2023 18:14

NeroBlackstone force-pushed the DictPolicy branch from d29d2f7 to b53dcc8 Compare June 7, 2023 18:25

added a default policy, changed actionvalues to valuemap

cff4ef7

zsunberg approved these changes Jun 10, 2023

zsunberg added 2 commits June 9, 2023 18:26

updated docs

cf912f8

Merge branch 'master' into DictPolicy

923eb57

zsunberg merged commit 9c36eba into JuliaPOMDP:master Jun 10, 2023
