
Why append an additional (s, a, r) pair to the replay buffer after an episode is done? #8

Open
Hanrui-Wang opened this issue Mar 16, 2019 · 3 comments

Comments

@Hanrui-Wang

Hi Guan-Horng,

Thanks for your great implementation! I am wondering why we append an additional (s, a, r) tuple to the replay buffer after an episode is done. The reward in that tuple is zero, and as far as I can tell this step is not mentioned in the original paper.

agent.memory.append(
    observation,
    agent.select_action(observation),
    0., False
)

Thank you!
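
One plausible reading, not confirmed anywhere in this thread: keras-rl-style sequential memories store flat (observation, action, reward, done) records and reconstruct the next state of a transition from the record that follows it, so one extra record after a terminal step is what gives the last real transition its s'. A minimal sketch of that storage pattern, with hypothetical names:

import random
from collections import deque

class SequentialMemorySketch:
    # Hypothetical illustration: each record is a flat (obs, action, reward,
    # done) tuple, and s' for the transition at index i is taken from the
    # observation stored at index i + 1.
    def __init__(self, limit):
        self.records = deque(maxlen=limit)

    def append(self, obs, action, reward, done):
        self.records.append((obs, action, reward, done))

    def sample(self, batch_size):
        batch = []
        while len(batch) < batch_size:
            i = random.randrange(len(self.records) - 1)
            obs, action, reward, done = self.records[i]
            next_obs = self.records[i + 1][0]  # s' comes from the next record
            batch.append((obs, action, reward, next_obs, done))
        return batch

Under this layout, the extra append at the end of an episode stores the terminal observation so that the final real transition has a next state to point to.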

@zhihanyang2022

zhihanyang2022 commented Apr 15, 2021

I think this is weird, too.

agent.memory.append(
    observation,
    agent.select_action(observation),  # extra action chosen at the final observation
    0., False                          # stored with zero reward and done=False
)

Also, done is set to False in this tuple, which is even more perplexing.
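
For context on why the flag matters: in a standard DDPG critic update the done flag masks the bootstrap term of the TD target, so a terminal step stored with done=False would let the critic bootstrap across the episode boundary. A minimal sketch of the usual target computation (hypothetical names, assuming PyTorch tensors):

import torch

def td_target(reward, next_q, done, gamma=0.99):
    # Standard DDPG target: bootstrap from Q'(s', mu'(s')) only on
    # non-terminal transitions; done masks out the bootstrap term.
    return reward + gamma * (1.0 - done) * next_q

r = torch.tensor([1.0])
q_next = torch.tensor([5.0])
print(td_target(r, q_next, torch.tensor([1.0])))  # terminal: tensor([1.])
print(td_target(r, q_next, torch.tensor([0.0])))  # non-terminal: tensor([5.9500])

Whether the padding tuple's done=False actually causes harm depends on whether that tuple can ever be sampled as the first element of a transition.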

@zhihanyang2022

That said, I think this probably has a negligible effect on learning, given how large the replay buffer is, but it would be good for the author to check on this @ghliu.

@friedmainfunction

In the buffer's code, I guess the terminal flag can be used to divide transitions from different episodes; if that's the case, I think this is probably a bug.
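
If that reading is right, a sampler over this layout would use the flag to avoid forming transitions that straddle episodes. A minimal sketch (hypothetical, not the repository's actual code), assuming records are laid out as real steps with done=True on the last one, followed by one padding record holding the terminal observation:

import random

def sample_transition(records):
    # records: flat list of (obs, action, reward, done) tuples.
    # Index i is a valid transition only if records[i] is not the padding
    # record, i.e. the record before it did not end an episode.
    while True:
        i = random.randrange(len(records) - 1)
        if i > 0 and records[i - 1][3]:
            continue  # records[i] is the padding record after a terminal step
        obs, action, reward, done = records[i]
        next_obs = records[i + 1][0]  # s' taken from the following record
        return obs, action, reward, next_obs, done

A guard like this relies on the done flag of the last real step rather than the padding record's, so whether the False in the padding tuple actually breaks episode division depends on the exact indexing in the buffer's sampling code.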
