
Self Driving Car in TORCS Simulator

This project aims to develop a self-driving car agent in the TORCS simulator using the Actor-Critic algorithm from deep reinforcement learning.

Dependencies

You can install the Python dependencies with pip install -r Requirements.txt, and it should just work. If you want to install the packages manually, here's the list:

  • Python==3.7
  • Tensorflow-gpu==2.3.0
  • Keras==2.6.0
  • Numpy==1.18.5
  • gym_torcs

Background

TORCS is an open-source car racing simulator that is widely used in AI research. It was chosen for this project because the gym_torcs library makes it easy to read states from the game: it uses the SCR plugin to set up a connection with the game, which makes it straightforward to send commands and retrieve the current state. Reinforcement learning needs a continuous stream of state data and action values, so this simulator suited the project well. Self-driving cars are a broad research area spanning many fields, and implementing this project was a good way to apply various reinforcement learning concepts in practice.

Approach

Actor-Critic Background


Imagine you play a video game with a friend who gives you feedback. You're the Actor and your friend is the Critic. At the beginning you don't know how to play, so you try some actions at random. The Critic observes your actions and provides feedback. Learning from this feedback, you update your policy and get better at the game. Your friend (the Critic), in turn, updates their own way of providing feedback so it is more useful next time. As we can see, the idea of Actor-Critic is to have two neural networks, estimated and run in parallel. Because we have two models (Actor and Critic) that must be trained, we have two sets of weights: the actor network's weights are updated with respect to the output of the critic network. The target networks are updated with a soft update.
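To make the soft update concrete, here is a minimal sketch; the function name, the use of Keras get_weights/set_weights, and the rate tau=0.001 are illustrative assumptions (0.001 matches the factor used in the update rule later in this README):

```python
def soft_update(target_model, source_model, tau=0.001):
    """Move every target-network weight a small step towards the learned network's weight."""
    new_weights = [
        tau * w_src + (1.0 - tau) * w_tgt
        for w_src, w_tgt in zip(source_model.get_weights(), target_model.get_weights())
    ]
    target_model.set_weights(new_weights)
```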

Why Actor-Critic?


The Actor-Critic model gives a better score function. Instead of waiting until the end of the episode, as in Monte Carlo REINFORCE, we make an update at each step (TD learning). Because we update at every time step, we can't use the total reward R(t). Instead, we train a Critic model that approximates the value function (recall that the value function estimates the expected future reward given a state and an action). This value function replaces the reward term in the policy gradient, which is otherwise only available at the end of the episode.
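As an illustration, the critic can be trained towards a one-step TD target instead of the full return. This is a minimal sketch; the discount factor GAMMA, the dones mask, and the target networks actor_target / critic_target are assumptions for illustration, not taken from the repository:

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed)

def td_targets(rewards, next_states, dones, actor_target, critic_target, gamma=GAMMA):
    """One-step TD target: y_t = r_t + gamma * Q_target(s_{t+1}, mu_target(s_{t+1}))."""
    # rewards and dones are 1-D arrays of length batch_size.
    next_actions = actor_target(next_states)
    next_q = critic_target([next_states, next_actions]).numpy().squeeze()
    # dones masks out the bootstrap term at the end of an episode.
    return rewards + gamma * (1.0 - dones) * next_q
```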

Data Exchange Between the Client and Game

from gym_torcs import TorcsEnv imports the gym_torcs library, which is used to set up the connection.

env = TorcsEnv(vision=False, throttle=True, gear_change=False) sets up the TORCS environment.

ob = env.reset() resets the environment and returns the initial observation.

s_t = np.hstack((ob.angle, ob.track, ob.trackPos, ob.speedX, ob.speedY, ob.speedZ, ob.wheelSpinVel/100.0, ob.rpm)) builds the state vector from the data (states) retrieved from the game server.

ob, r_t, done, info = env.step(action) sends a command (the action to be taken) to the game server; r_t is the reward for taking that action and done signals the end of the episode.
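Putting these pieces together, a minimal interaction loop might look like the sketch below. The constant action, the episode length, and the three-component action layout (steering, acceleration, brake) are placeholders for illustration; the actual agent picks actions with the actor network:

```python
import numpy as np
from gym_torcs import TorcsEnv

env = TorcsEnv(vision=False, throttle=True, gear_change=False)

ob = env.reset()
s_t = np.hstack((ob.angle, ob.track, ob.trackPos, ob.speedX, ob.speedY,
                 ob.speedZ, ob.wheelSpinVel / 100.0, ob.rpm))

for step in range(1000):
    # Placeholder action [steering, acceleration, brake]; the real agent uses the actor network.
    action = np.array([0.0, 0.5, 0.0])
    ob, r_t, done, info = env.step(action)
    s_t = np.hstack((ob.angle, ob.track, ob.trackPos, ob.speedX, ob.speedY,
                     ob.speedZ, ob.wheelSpinVel / 100.0, ob.rpm))
    if done:
        ob = env.reset()

env.end()  # shut down the connection to TORCS
```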

Actor-Critic Models

Actor Model

(Actor network architecture diagram.)

Critic Model

(Critic network architecture diagram.)
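The architecture diagrams are not reproduced in this text, so the sketch below only illustrates what a typical pair of actor and critic networks for this setup could look like; the layer sizes, activations, and the 29-dimensional state / 3-dimensional action shapes are assumptions, not read from the diagrams:

```python
from tensorflow.keras import layers, Model

STATE_DIM = 29   # assumed size of the stacked observation vector
ACTION_DIM = 3   # assumed actions: steering, acceleration, brake

def build_actor():
    """Maps a state to a continuous action; layer sizes are illustrative."""
    states = layers.Input(shape=(STATE_DIM,))
    x = layers.Dense(300, activation="relu")(states)
    x = layers.Dense(600, activation="relu")(x)
    steering = layers.Dense(1, activation="tanh")(x)          # in [-1, 1]
    acceleration = layers.Dense(1, activation="sigmoid")(x)   # in [0, 1]
    brake = layers.Dense(1, activation="sigmoid")(x)          # in [0, 1]
    actions = layers.Concatenate()([steering, acceleration, brake])
    return Model(states, actions)

def build_critic():
    """Maps a (state, action) pair to a scalar Q-value; layer sizes are illustrative."""
    states = layers.Input(shape=(STATE_DIM,))
    actions = layers.Input(shape=(ACTION_DIM,))
    s = layers.Dense(300, activation="relu")(states)
    a = layers.Dense(600, activation="linear")(actions)
    x = layers.Concatenate()([s, a])
    x = layers.Dense(600, activation="relu")(x)
    q_value = layers.Dense(1, activation="linear")(x)
    return Model([states, actions], q_value)
```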

Model Working

(Diagram of the overall actor-critic training loop.)

This algorithm was implemented using TensorFlow as follows:

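The original code screenshot is not reproduced here; the sketch below is a minimal reconstruction of one training step based on the lines explained next. It assumes a replay-buffer batch (states, actions, rewards, next_states, dones as column vectors), the four networks actor, critic, actor_target, critic_target (with the critic compiled with an MSE loss), a discount factor GAMMA, the soft-update rate 0.001, and an Adam optimizer opt; the minus sign on the Q-value is the usual DDPG convention so that apply_gradients performs gradient ascent on Q:

```python
import tensorflow as tf

GAMMA = 0.99  # discount factor (assumed)
TAU = 0.001   # soft-update rate (matches the 0.001 factor below)
opt = tf.keras.optimizers.Adam(learning_rate=1e-4)  # actor optimizer (learning rate assumed)

def train_step(states, actions, rewards, next_states, dones):
    # TD target from the target networks: y_t = r + gamma * Q'(s', mu'(s')).
    # rewards and dones are expected as column vectors of shape (batch_size, 1).
    target_q = critic_target([next_states, actor_target(next_states)])
    y_t = rewards + GAMMA * (1.0 - dones) * target_q

    # Critic update: regress Q(s, a) towards y_t (critic compiled with an MSE loss).
    loss = tf.convert_to_tensor(critic.train_on_batch([states, actions], y_t))

    # Actor update: follow the gradient of Q(s, mu(s)) with respect to the actor weights.
    with tf.GradientTape() as tape:
        a_for_grad = actor(states)
        qsa = -tf.reduce_mean(critic([states, a_for_grad]))  # minus sign -> gradient ascent on Q
    grads = tape.gradient(qsa, actor.trainable_weights)
    opt.apply_gradients(zip(grads, actor.trainable_weights))

    # Soft update of both target networks.
    for target, source in ((critic_target, critic), (actor_target, actor)):
        target.set_weights([TAU * w_s + (1.0 - TAU) * w_t
                            for w_s, w_t in zip(source.get_weights(), target.get_weights())])
    return loss
```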

loss = tf.convert_to_tensor(critic.train_on_batch([states,actions], y_t)) Trained the critic network on the batch of states and actions; the target values y_t are obtained from the target networks.

a_for_grad = actor(states) Obtained actions from the actor network by passing the states through it.

qsa = critic([states,a_for_grad]) qsa is the output of the critic network for (states, a_for_grad), which is used to update the actor policy.

grads = tape.gradient(qsa,actor.trainable_weights) Computed the gradients of the critic output qsa with respect to the actor's trainable weights.

opt.apply_gradients(zip(grads, actor.trainable_weights)) Updated actor weights using gradients obtained above.

critic_target.trainable_weights[i].assign(0.001*critic.trainable_weights[i] + (1-0.001)*critic_target.trainable_weights[i]) Soft update of the critic_target network's parameters towards the critic network's parameters. The actor_target network's parameters were updated similarly.

Result

(Demo animation of the trained agent driving in TORCS.)
