
Add checkpoint monitoring class #75

Closed
wil3 opened this issue Jun 14, 2020 · 11 comments · Fixed by #77
Labels
enhancement New feature or request

Comments

@wil3
Owner

wil3 commented Jun 14, 2020

Is your feature request related to a problem? Please describe.
RL training produces checkpoints; however, the examples do not include their evaluation.

Describe the solution you'd like
The thesis work used a checkpoint monitor to evaluate new checkpoints as they were created.

Describe alternatives you've considered
We could alternatively do this on demand, but we'll leave that to a new issue/PR.

Additional context
This would be one of several new features to support RL training and evaluation.

@wil3 wil3 added the enhancement label Jun 14, 2020
wil3 added a commit that referenced this issue Jun 14, 2020
wil3 added a commit that referenced this issue Jun 17, 2020
Closes: #75. Disable gravity to match thesis results. Fix bug where
RNG seed was ignored.
@wil3 wil3 mentioned this issue Jun 17, 2020
@wil3 wil3 closed this as completed in #77 Jun 17, 2020
@xabierolaz

Will test this tomorrow and see if I can gradually add my own landing and takeoff scripts.

@wil3
Owner Author

wil3 commented Jun 17, 2020

@xabierolaz this has been merged into master; I was just referencing the changes. If you are looking to merge navigation support into GymFC, it would be a good idea to open a feature request issue so others know you are working on it and so the approach can be discussed. #76 was recently added, however I have no idea if anything will come of it; if multiple people were working on it, it would be completed much faster.

@xabierolaz

@wil3, I have opened a new issue, #79, and mentioned #76 in case we can complete it faster.

@varunag18

varunag18 commented Jun 23, 2020

Hi Wil, I have been going through the two scripts ppo_baselines_train.py and tf_checkpoint_evaluate.py. As per my understanding, ppo_baselines_train.py is the training script which saves the trained model in the checkpoints folder, while tf_checkpoint_evaluate.py is the script that retrieves the contents of the checkpoints folder (.meta file) and creates the trial.csv files.
If my understanding is correct, then there are a couple of points I need clarification on:

1. Why do we have the below code in tf_checkpoint_evaluate.py:

       ac = pi.action(ob, env.sim_time, env.angular_rate_sp, env.imu_angular_velocity_rpy)
       ob, reward, done, _ = env.step(ac)

   It invokes the step function of the GymFC environment, and it is this data that is getting logged. Should we not be retrieving the data from the checkpoints folder for storing in the .csv files?
2. What is the need to run tf_checkpoint_evaluate.py in parallel with ppo_baselines_train.py? Can we not run tf_checkpoint_evaluate.py after the training has completed?
3. tf_checkpoint_evaluate.py is running in an infinite loop for now because of the below code in monitor.py:

       while self.watching:
           self._check_new_checkpoint()
           time.sleep(10)

   I believe there should be a condition to terminate after waiting for a given time duration.

@wil3
Owner Author

wil3 commented Jun 23, 2020

Hi @varunag18 ,

1. Which part of the code do you have a question about? In the first line we use the policy interface so that we have a common interface for testing and benchmarking different policies. You can have a look at the currently available ones here. A policy must have an action and reset function defined. If you look at that policy, it is loading data from the checkpoint folder. The csv file is just a log of the episode; if you use the plotting tools you'll see how it's used. The second line is the Gym interface, which is the same for all OpenAI gyms.
2. There's nothing saying you need to run them in parallel; it's not a requirement, just a suggestion, which is why they are separate processes. Running them in parallel is recommended so you can see the performance in semi real-time and determine whether you need to kill the job, and it's more efficient than running them sequentially. In most cases you want to be training a large number of agents to get the best one.
3. Yep, this is a bug, feel free to submit a bug issue to track it. The monitor callback should probably return true to continue and false otherwise, and the return value here should be assigned to self.watching (see the sketch below).
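
For illustration only, here is a minimal sketch of that fix, assuming a simplified CheckpointMonitor rather than the actual monitor.py: the user-supplied callback returns True to keep watching and False to stop, and its return value drives self.watching.

    import time

    class CheckpointMonitor:
        """Illustrative sketch only; names and structure do not mirror monitor.py."""

        def __init__(self, checkpoint_dir, callback, poll_interval=10):
            self.checkpoint_dir = checkpoint_dir
            self.callback = callback            # called with each new checkpoint path
            self.poll_interval = poll_interval
            self.watching = True

        def watch(self):
            while self.watching:
                for ckpt in self._new_checkpoints():
                    # The callback's return value now controls the loop, so the
                    # evaluation code can terminate the monitor cleanly.
                    self.watching = self.callback(ckpt)
                    if not self.watching:
                        break
                time.sleep(self.poll_interval)

        def _new_checkpoints(self):
            # Placeholder for the directory-scanning logic; returns paths of
            # checkpoints that have not been seen before.
            return []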

@varunag18

varunag18 commented Jun 24, 2020

1. In the first line we use the policy interface so that we have a common interface for testing and benchmarking different policies. You can have a look at the currently available ones here. A policy must have an action and reset function defined. If you look at that policy, it is loading data from the checkpoint folder. The csv file is just a log of the episode; if you use the plotting tools you'll see how it's used. The second line is the Gym interface, which is the same for all OpenAI gyms.

I think I did not understand it clearly then; your response makes it clear to me now. The training script invokes the MlpPolicy class for training the NN, while the evaluation script invokes the PpoBaselinesPolicy class to get the trained data from the checkpoints folder. Am I correct now?

2. In most cases you want to be training a large number of agents to get the best one.

Where exactly are we setting the number of agents?

3. Yep, this is a bug, feel free to submit a bug issue to track it.

Will do this for sure.

@wil3
Owner Author

wil3 commented Jun 24, 2020

Yes, that is correct. Checkpoints are just data files containing the neural network and other functions used during training. Training is specific to the RL algorithm; however, as long as the algorithm produces a TensorFlow graph, we can do the evaluation independent of the training library, using just the TensorFlow checkpoint. This is ideal because it is more scalable. All you need to know is the input and output tensor names to extract the NN subgraph.
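
As a rough sketch of that idea (not the actual tf_checkpoint_evaluate.py), the TF 1.x API can restore the graph from the .meta file and run the policy subgraph by tensor name. The checkpoint path, tensor names, and observation size below are placeholders; substitute the actual values from your graph.

    import numpy as np
    import tensorflow as tf  # TF 1.x API, as used with the baselines-era tooling

    checkpoint = "checkpoints/model.ckpt-2500000"  # hypothetical path

    with tf.Session() as sess:
        saver = tf.train.import_meta_graph(checkpoint + ".meta")
        saver.restore(sess, checkpoint)
        graph = tf.get_default_graph()

        ob_input = graph.get_tensor_by_name("ob:0")           # hypothetical input tensor name
        action_out = graph.get_tensor_by_name("pi/action:0")  # hypothetical output tensor name

        obs_dim = 7  # placeholder; match your environment's observation size
        ob = np.zeros((1, obs_dim), dtype=np.float32)         # dummy observation
        action = sess.run(action_out, feed_dict={ob_input: ob})
        print(action)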

Where exactly are we setting the number of agents?

You don't. When you execute the PPO trainer it trains a single agent. To train more than one, just execute that script N times; the number N depends on your research goals. Wrap the python call in a bash loop script if you want to automate it (a Python equivalent is sketched below).
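
In place of the bash loop, here is a hedged Python sketch of the same idea using subprocess; the script path and flags are placeholders and must be matched to ppo_baselines_train.py's actual command line.

    import subprocess

    N = 5  # number of agents to train; depends on your research goals
    for i in range(N):
        # Launch one independent training run per agent. The flags below are
        # hypothetical; adjust them to the trainer's real CLI.
        subprocess.run(
            ["python", "ppo_baselines_train.py",
             "--seed", str(i),
             "--checkpoint-dir", f"checkpoints/agent_{i}"],
            check=True,
        )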

@varunag18

varunag18 commented Jul 26, 2020

Hi Wil,
I was wondering if it's possible to view the voltage and current values of the motors, as mentioned in Section VI C of your thesis, from which the energy consumed at a given step can be calculated. In fc_env.py, within the _step_sim() method, if I print self.state_message, I get the esc_current and esc_voltage values as 0 in the output. Please guide me on how to add a feature to monitor the energy consumption at each step.

@varunag18

Another question: how do we zero in on the best checkpoint? What are the criteria for doing so? In your thesis you write, "Once training was complete, we select the checkpoint that provided the most stable step responses, which occurred after 2,500,000 steps to use as our flight controller policy." Please elaborate on this.

@xabierolaz

xabierolaz commented Jul 27, 2020

@varunag18
I think he's referring to the fact that we get a checkpoint every 100,000 steps and the model starts to converge (MAE gets closer to 0) after 2 million steps, so picking a checkpoint with the lowest available error would be the best choice.

@wil3
Owner Author

wil3 commented Jul 28, 2020

Hi @varunag18, this issue is currently closed. In the future please open a new issue if you have a new question.

Which chapter are you referring to? ESC voltage and current are baked into the framework (see the message type here), but there currently isn't a model for them; this is not something I explored for my thesis work.

If this is something you are looking to support, you can fork the aircraft-plugin repo, add the model, and pass the value back here.
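
For reference, once such a model reports nonzero values, per-step energy is just electrical power (voltage times current) integrated over the step time. A back-of-the-envelope sketch, not GymFC code; the step time and logged values are placeholders:

    import numpy as np

    dt = 0.001                           # simulation step time in seconds (placeholder)
    esc_voltage = np.full(1000, 11.1)    # hypothetical per-step ESC voltage [V]
    esc_current = np.full(1000, 2.5)     # hypothetical per-step ESC current [A]

    power = esc_voltage * esc_current    # instantaneous electrical power per step [W]
    step_energy = power * dt             # energy consumed at each step [J]
    total_energy = step_energy.sum()     # energy over the whole episode [J]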

As @xabierolaz points out, if you plot the MAE or other error metrics from validation over the course of training, they will begin to converge after a couple million steps. If the reward function were perfect, we'd probably select the longest-trained checkpoint with the highest reward. Unfortunately it isn't perfect, so after convergence it usually takes looking at a number of step-response plots and selecting the agent that produces the best step responses in terms of minimizing error and oscillations. Then, once you think you have a good one, try it out on the drone and confirm there are no visible oscillations.
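
As an illustration of that first, quantitative pass (the step-response plots and the flight test remain the deciding factors), here is a hedged sketch that ranks evaluation trials by the MAE between the angular-rate setpoint and the measured rate. The CSV path pattern and column names are placeholders; match them to the columns tf_checkpoint_evaluate.py actually writes.

    import glob
    import numpy as np
    import pandas as pd

    def trial_mae(csv_path):
        # Mean absolute error between setpoint and measured angular rates
        # for one evaluation episode. Column names are hypothetical.
        df = pd.read_csv(csv_path)
        setpoint = df[["roll_sp", "pitch_sp", "yaw_sp"]].to_numpy()
        measured = df[["roll_rate", "pitch_rate", "yaw_rate"]].to_numpy()
        return np.mean(np.abs(setpoint - measured))

    # Hypothetical layout: one trial CSV per evaluated checkpoint.
    scores = {path: trial_mae(path) for path in glob.glob("checkpoints/*/trial*.csv")}
    best = min(scores, key=scores.get)
    print("Lowest-MAE trial:", best, scores[best])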
