
Sampling action when using deep policy #101

Answered by conorheins
sunghwan87 asked this question in Q&A

Hi @sunghwan87,

Thanks for your comment / question. This is a good and important point; I hope my explanation below clarifies it. Let me know if not.

So the reason the agent only samples its action from the first timestep of the currently evaluated policy posterior (which, as you note, may extend multiple timesteps into the future) is that the agent can only take one action per timestep. It cannot "pre-determine" its future actions without first gathering observations.
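
To make that concrete, here is a schematic sketch (not pymdp's exact implementation) of how the action for the current timestep can be read off the policy posterior: the posterior over multi-step policies is marginalized onto the first action of each policy, and only that marginal is sampled from. The policies and probabilities below are made-up toy values.

```python
import numpy as np

# Hypothetical toy example: 4 policies, each a sequence of 2 actions (policy_len = 2).
policies = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1],
])
q_pi = np.array([0.1, 0.4, 0.3, 0.2])  # posterior over policies (sums to 1)
num_actions = 2

# Marginalize the policy posterior onto the action taken at the *first* timestep;
# the later entries of each policy are not used for acting now.
p_action = np.zeros(num_actions)
for policy, prob in zip(policies, q_pi):
    p_action[policy[0]] += prob

action = np.random.choice(num_actions, p=p_action)  # the single action executed this timestep
```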

The loop is action --> observation --> inference (of both states and policies) --> take another action. So we must first sample another observation, and thus re-estimate our policy posterior, before we can take another action.
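
For reference, here is a minimal sketch of that loop, assuming the pymdp Agent interface (infer_states / infer_policies / sample_action) and a randomly generated toy generative model; the commented-out env.step call stands in for whatever environment actually supplies the observations.

```python
import numpy as np
from pymdp import utils
from pymdp.agent import Agent

# Toy generative model: one observation modality with 3 outcomes, one hidden-state
# factor with 3 states and 2 possible actions (control states).
A = utils.random_A_matrix([3], [3])    # observation likelihood p(o | s)
B = utils.random_B_matrix([3], [2])    # transition likelihood p(s' | s, a)

agent = Agent(A=A, B=B, policy_len=3)  # policies extend 3 timesteps into the future

obs = [0]                              # initial observation (one index per modality)
for t in range(5):
    qs = agent.infer_states(obs)       # posterior over hidden states
    q_pi, G = agent.infer_policies()   # posterior over multi-step policies
    action = agent.sample_action()     # samples only the first action of the plan
    # obs = env.step(action)           # placeholder: get the next observation from the world
    obs = [np.random.randint(3)]       # faked observation, just so the sketch runs
```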

Category: Q&A
Labels: good first issue (Good for newcomers)

This discussion was converted from issue #99 on October 26, 2022 12:52.