Sampling action when using deep policy #101
-
Hi, thank you for this package. I'm really enjoying learning about active inference, and I deeply appreciate the contributors. I have a question that came up while implementing the explore-exploit task from Smith et al., "A step-by-step tutorial on active inference and its application to empirical data" (https://doi.org/10.1016/j.jmp.2021.102632), which is already implemented in MATLAB and in "pymdp". I tried to run a loop of active inference with a deep policy (two time-steps), following the "complete recipe for active inference" in the "pymdp" tutorial notebook. However, I found that the "sample_action" method of the "Agent" class only samples an action from the first timestep of each policy (each policy has shape (2, 2), where the first dimension is the number of timesteps and the second is the number of factors), via the "control.sample_policy" function, as below (lines 674-675 of control.py):
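(The snippet I pasted did not survive here; the lines I mean in "control.sample_policy" are roughly the following, paraphrased from memory rather than copied verbatim, so variable names may differ slightly:)

```python
# paraphrase of pymdp/control.py, sample_policy (around lines 674-675); not verbatim
for factor_i in range(num_factors):
    # note: always indexes timestep 0 of the selected policy
    selected_policy[factor_i] = policies[policy_idx][0, factor_i]
```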
My setup of the "Agent" class was:
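(The exact constructor call from my script did not carry over; it was along these lines, with a two-step policy horizon for the explore-exploit task. The generative-model arrays A, B, C, D are built as in the tutorial and are not shown:)

```python
from pymdp.agent import Agent

# A, B, C, D are the generative-model arrays built as in the Smith et al.
# explore-exploit tutorial (not shown here); policy_len=2 yields two-step policies
my_agent = Agent(A=A, B=B, C=C, D=D, policy_len=2)
```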
In my understanding, to sample the action of the other timestep in each policy, line 675 would be better changed like this:
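(Reconstructing my proposal, since the snippet was also lost here: "t" below is a hypothetical index of the current timestep within the policy, not an existing variable in control.py:)

```python
# proposed sketch: index the current within-policy timestep "t"
# rather than always taking timestep 0
selected_policy[factor_i] = policies[policy_idx][t, factor_i]
```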
If I have not understood this package well, then please let me know how to correct it. Thank you!
-
Hi @sunghwan87,

Thanks for your comment / question. This is a good and important point to make; I hope my explanation below is clarifying. Let me know if not.

The reason the agent only samples its action from the first timestep of the currently evaluated policy posterior (which, as you note, may extend multiple timesteps into the future) is that the agent can only take one action per timestep. It cannot "pre-determine" its future actions without first gathering observations.

The loop is action --> observation --> inference (of both state and policy) --> take another action. So we must first sample another observation, and thus re-estimate our policy posterior, before we can take another action. Your "forward horizon" of policy estimation therefore constantly re-relativizes itself to the current timestep, in an online fashion, as you gather new observations one timestep at a time. Does that make sense?

In other words, your idea to "sample the action of the other timestep in each policy" doesn't really make sense, because we can't sample an action in the future until we've reached the future (and the future is now the present).
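To make that loop concrete, here is a minimal sketch of the cycle using the "Agent" API. The environment object "env", its "reset"/"step" methods, and the horizon "T" are placeholders for whatever task wrapper you are using, and A, B, C, D are your generative-model arrays; the point is just that only one action is sampled per pass, and the policy posterior is re-inferred after every new observation:

```python
from pymdp.agent import Agent

# A, B, C, D: generative-model arrays; policy_len=2 means each policy spans
# two future timesteps, but only one action is emitted per timestep
agent = Agent(A=A, B=B, C=C, D=D, policy_len=2)

obs = env.reset()  # "env" is a placeholder for your task wrapper

for t in range(T):
    # 1. infer hidden states from the latest observation
    qs = agent.infer_states(obs)

    # 2. re-evaluate the posterior over (two-step) policies, relative to "now"
    q_pi, neg_efe = agent.infer_policies()

    # 3. sample a single action, i.e. the first step of the selected policy;
    #    the remaining step is re-planned after the next observation
    action = agent.sample_action()

    # 4. act, observe, and go around the loop again
    obs = env.step(action)
```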
-
Hi @sunghwan87, going to move this into a discussion and close.