
Why append an additional (s, a, r) pair to the replay buffer after an episode is done? #8

Open
Hanrui-Wang opened this issue Mar 16, 2019 · 3 comments

Comments

@Hanrui-Wang

Hi Guan-Horng,

Thanks for your great implementation! I am wondering why we append an additional (s, a, r) tuple to the replay buffer after an episode is done. The reward in that tuple is zero, and as far as I can tell this step is not mentioned in the original paper.

agent.memory.append(
    observation,
    agent.select_action(observation),
    0., False
)

Thank you!
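
One plausible reading, not confirmed anywhere in this thread: keras-rl-style sequential memories store flat (observation, action, reward, done) records and reconstruct the next state of a transition from the record that follows it, so one extra record after a terminal step is what gives the last real transition its s'. A minimal sketch of that storage pattern, with hypothetical names:

import random
from collections import deque

class SequentialMemorySketch:
    # Hypothetical illustration: each record is a flat (obs, action, reward,
    # done) tuple, and s' for the transition at index i is taken from the
    # observation stored at index i + 1.
    def __init__(self, limit):
        self.records = deque(maxlen=limit)

    def append(self, obs, action, reward, done):
        self.records.append((obs, action, reward, done))

    def sample(self, batch_size):
        batch = []
        while len(batch) < batch_size:
            i = random.randrange(len(self.records) - 1)
            obs, action, reward, done = self.records[i]
            next_obs = self.records[i + 1][0]  # s' comes from the next record
            batch.append((obs, action, reward, next_obs, done))
        return batch

Under this layout, the extra append at the end of an episode stores the terminal observation so that the final real transition has a next state to point to.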

@zhihanyang2022

zhihanyang2022 commented Apr 15, 2021

I think this is weird, too.

agent.memory.append(
    observation,
    agent.select_action(observation),  # extra action chosen at the final observation
    0., False                          # stored with zero reward and done=False
)

Also, done is set to False in this tuple, which is even more perplexing.
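
For context on why the flag matters: in a standard DDPG critic update the done flag masks the bootstrap term of the TD target, so a terminal step stored with done=False would let the critic bootstrap across the episode boundary. A minimal sketch of the usual target computation (hypothetical names, assuming PyTorch tensors):

import torch

def td_target(reward, next_q, done, gamma=0.99):
    # Standard DDPG target: bootstrap from Q'(s', mu'(s')) only on
    # non-terminal transitions; done masks out the bootstrap term.
    return reward + gamma * (1.0 - done) * next_q

r = torch.tensor([1.0])
q_next = torch.tensor([5.0])
print(td_target(r, q_next, torch.tensor([1.0])))  # terminal: tensor([1.])
print(td_target(r, q_next, torch.tensor([0.0])))  # non-terminal: tensor([5.9500])

Whether the padding tuple's done=False actually causes harm depends on whether that tuple can ever be sampled as the first element of a transition.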

@zhihanyang2022

That said, I think this probably has a negligible effect on learning, given how large the replay buffer is, but it would be good for the author to check on this @ghliu.

@friedmainfunction

In the buffer's code, I guess the terminal flag can be used to divide transitions from different episodes; if that's the case, I think this is probably a bug.
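
If that reading is right, a sampler over this layout would use the flag to avoid forming transitions that straddle episodes. A minimal sketch (hypothetical, not the repository's actual code), assuming records are laid out as real steps with done=True on the last one, followed by one padding record holding the terminal observation:

import random

def sample_transition(records):
    # records: flat list of (obs, action, reward, done) tuples.
    # Index i is a valid transition only if records[i] is not the padding
    # record, i.e. the record before it did not end an episode.
    while True:
        i = random.randrange(len(records) - 1)
        if i > 0 and records[i - 1][3]:
            continue  # records[i] is the padding record after a terminal step
        obs, action, reward, done = records[i]
        next_obs = records[i + 1][0]  # s' taken from the following record
        return obs, action, reward, next_obs, done

A guard like this relies on the done flag of the last real step rather than the padding record's, so whether the False in the padding tuple actually breaks episode division depends on the exact indexing in the buffer's sampling code.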
