Temporal difference (TD) learning refers to a class of model-free reinforcement learning algorithms that learn by bootstrapping from the current estimate of the value function. The method was first introduced by Sutton in 1988. In his paper, Sutton argued that temporal-difference methods can produce better predictions with less memory and less peak computation than supervised-learning methods, and he supported this claim with two computational experiments in which the temporal-difference learning procedure outperformed the supervised-learning procedure at learning to predict in both cases. In our code and paper, we replicate these experiments to examine the prediction-learning performance of temporal-difference methods under different data sets and key parameters. Our results indicate that temporal-difference learning procedures indeed learn faster than supervised learning and outperform it in producing predictions. This paper thus reexamines Sutton's (1988) study of temporal-difference methods' learning performance by replicating his findings.
Sutton (1988): Click Here
Replication Analysis: Click Here
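For readers skimming the repository, the snippet below is a minimal sketch of the kind of procedure being replicated: TD(λ) prediction on Sutton's bounded random walk, where the walk starts in the middle state and terminates with outcome 0 at the left boundary or 1 at the right. The function name, learning rate, and trace-decay value are illustrative assumptions, not the settings used in our replication; the bootstrapped TD error (the next state's prediction standing in for the not-yet-observed outcome) is marked in the comments.

```python
import numpy as np

def td_lambda_random_walk(n_episodes=100, alpha=0.05, lam=0.3, seed=0):
    """TD(lambda) prediction on the bounded random walk from Sutton (1988).

    The five nonterminal states B..F are encoded as unit basis vectors, the
    walk starts at the middle state D, and it terminates with outcome 0 at the
    left boundary or 1 at the right. True values are 1/6, 2/6, ..., 5/6.
    """
    rng = np.random.default_rng(seed)
    n_states = 5                     # nonterminal states B..F
    w = np.full(n_states, 0.5)       # predictions, initialized at 0.5

    for _ in range(n_episodes):
        s = 2                        # start at the middle state D
        z = np.zeros(n_states)       # accumulating eligibility trace
        while True:
            z[s] += 1.0              # credit the state just visited
            s_next = s + rng.choice([-1, 1])
            if s_next < 0:           # left termination: outcome 0
                w += alpha * (0.0 - w[s]) * z
                break
            if s_next >= n_states:   # right termination: outcome 1
                w += alpha * (1.0 - w[s]) * z
                break
            # Bootstrapped TD error: the next prediction w[s_next] stands in
            # for the final outcome, which has not been observed yet.
            w += alpha * (w[s_next] - w[s]) * z
            z *= lam                 # decay traces (undiscounted task, gamma = 1)
            s = s_next
    return w


if __name__ == "__main__":
    # Estimates should move toward [1/6, 2/6, 3/6, 4/6, 5/6].
    print(td_lambda_random_walk())
```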
A small note on the bootstrapping nature of the TD method and the nature of expectation/happiness/intelligence in humans:
TD algorithms work by bootstrapping: instead of trying to calculate the total expected reward, the method calculates only the immediate reward plus the reward expectation at the next step (hence "bootstrapping"). As an example, imagine two people: the first person is going to the bank to cash out his $100 check, and the second person has already cashed out the