
adding LSTM support to pretrain #315

Open
wants to merge 55 commits into
base: master

Conversation


@XMaster96 XMaster96 commented May 8, 2019

This PR adds LSTM support to pretrain. I am not quite done yet, but there are some implementation matters that I need to discuss first.

Personal edit:
I finally found the time to work more on this PR. The problem is that it took me so long that I forgot half of what I did, so if there is some rough code in there, that is why. I still do not have much time, so expect me not to answer immediately.

closes #253

@XMaster96
Author

The problem is that ExpertDataset now needs to know the exact batch_size and how many envs are used per update (only if LSTM is True). The issue is that this information is contained in, and calculated by, the model, while the ExpertDataset object needs to know it before it is passed into the pretrain function. The user would therefore have to calculate this data themselves before creating an instance of ExpertDataset. My idea is that pretrain accepts either a string with the expert_path for the dataset or an ExpertDataset object, and in the string case creates an instance of ExpertDataset internally, using the parameters contained in the model.
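A minimal sketch of that proposed interface (all class names and attributes below are simplified stand-ins, not the actual stable-baselines code): `pretrain` accepts either a dataset object or a path string, and when given a path it builds the dataset itself from the batch information the model already knows.

```python
class ExpertDataset:
    """Stand-in for the real ExpertDataset class."""
    def __init__(self, expert_path=None, batch_size=64, envs_per_batch=1):
        self.expert_path = expert_path
        self.batch_size = batch_size
        self.envs_per_batch = envs_per_batch


class Model:
    """Stand-in for a model; these values are computed internally."""
    n_batch = 128
    n_envs = 4

    def pretrain(self, dataset):
        # If the user passed a path string, construct the dataset here,
        # where the model-internal parameters are available; otherwise
        # use the ExpertDataset object the user provided as-is.
        if isinstance(dataset, str):
            dataset = ExpertDataset(expert_path=dataset,
                                    batch_size=self.n_batch,
                                    envs_per_batch=self.n_envs)
        return dataset
```

This way the user never has to compute batch_size or envs_per_batch by hand in the common case, while power users can still pass a pre-built dataset.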

@araffin
Collaborator

araffin commented May 18, 2019

Hello,

do you consider this PR ready for review? (After a quick look, I saw that a saved file was still there (nano) and there seems to be some code duplication that can be improved ;))

@XMaster96
Author

o/

I saw that a saved file was still there (nano)

I removed nano.

there seems to be some code duplication

I am not quite sure, but I think you are referring to the _get_pretrain_placeholders function in each model class. I don't think there is an easy way to combine them into one function; many of the models have slightly different placeholders, so it would be quite hard to merge the functions.

The only thing I still plan to do is add a bit more functionality. The code that is already there is finalized and can be reviewed.

@XMaster96
Author

The tests have failed after updating my branch:
ERROR com.codacy Invalid configuration: Empty argument for --project-token
I don't fully understand this error, but after some googling I think it comes from your side?

@AdamGleave
Collaborator

AdamGleave commented Sep 7, 2019

The tests have failed after updating my branch:
ERROR com.codacy Invalid configuration: Empty argument for --project-token
I don't fully understand this error, but after some googling I think it comes from your side?

You can ignore this, I've attempted a fix in #467

@XMaster96
Author

Now that my PR finally passes all the unit tests again, could you start reviewing it, so that I can change it if necessary?
I am basically done with everything I was planning to do.

@araffin (Collaborator) left a comment


Please remove the test_recorded_images folder too.

@@ -87,9 +87,20 @@ def __init__(self, policy, env, gamma=0.99, n_steps=5, vf_coef=0.25, ent_coef=0.

def _get_pretrain_placeholders(self):
policy = self.train_model

if self.initial_state is None:
Collaborator:

You can do:

states_ph, snew_ph, dones_ph = None, None, None

so it's more compact, same for the else case
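For illustration, the suggested compact form in a runnable sketch (the string values below are stand-ins for the real TensorFlow placeholders):

```python
def make_placeholders(recurrent):
    # Compact form suggested in the review: one tuple assignment
    # replaces three separate "x = None" lines.
    if not recurrent:
        states_ph, snew_ph, dones_ph = None, None, None
    else:
        # Stand-ins for the real placeholder tensors.
        states_ph, snew_ph, dones_ph = "states_ph", "snew_ph", "dones_ph"
    return states_ph, snew_ph, dones_ph
```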

Author:

I think mixing vertical and horizontal variable declarations interrupts the reading flow. Yes, it would make the code shorter, but also less readable in my opinion. But I will change it if you really want it that way.

@@ -152,8 +152,18 @@ def __init__(self, policy, env, gamma=0.99, n_steps=20, num_procs=1, q_coef=0.5,
def _get_pretrain_placeholders(self):
policy = self.step_model
action_ph = policy.pdtype.sample_placeholder([None])

if self.initial_state is None:
Collaborator:

same remark as before

@@ -87,9 +87,20 @@ def __init__(self, policy, env, gamma=0.99, n_steps=5, vf_coef=0.25, ent_coef=0.

def _get_pretrain_placeholders(self):
policy = self.train_model

if self.initial_state is None:
Collaborator:

you should rather check the recurrent attribute of the policy, it is in the base policy class
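A sketch of what this suggestion looks like (class names are simplified stand-ins for the real policy hierarchy):

```python
class BasePolicy:
    """Stand-in for the base policy class, which defines the flag."""
    recurrent = False   # feed-forward policies


class LstmPolicy(BasePolicy):
    """Stand-in for a recurrent policy, which overrides the flag."""
    recurrent = True


def needs_lstm_placeholders(policy):
    # No None-check on initial_state needed; the attribute is authoritative.
    return policy.recurrent
```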

@@ -50,6 +50,10 @@ def __init__(self, policy, env, verbose=0, *, requires_vec_env, policy_base, pol
self.sess = None
self.params = None
self._param_load_ops = None
self.initial_state = None
Collaborator:

you don't need that variable, there is the recurrent attribute for that

@@ -246,13 +250,24 @@ def pretrain(self, dataset, n_epochs=10, learning_rate=1e-4,
else:
val_interval = int(n_epochs / 10)

use_lstm = self.initial_state is not None
Collaborator:

same remark, you can use the recurrent attribute

@@ -272,13 +287,23 @@ def pretrain(self, dataset, n_epochs=10, learning_rate=1e-4,

for epoch_idx in range(int(n_epochs)):
train_loss = 0.0
if use_lstm:
state = self.initial_state[:envs_per_batch]
Collaborator:

initial state is an attribute of the policy

Author:

Yes and no.
All the models which can use LSTM policies have the variable self.initial_state, which gets set to the initial state from the policy, and it is this variable that gets used, not the one in the policy. It is also not that easy to access the initial state from the BaseRLModel. It was much simpler to add the self.initial_state variable to the base model, and then let it be overwritten later at model initialization.


if use_lstm:
feed_dict.update({states_ph: state, dones_ph: expert_mask})
val_loss_, = self.sess.run([loss], feed_dict)
Collaborator:

you only need to update the feed_dict; self.sess.run can be called outside, so you avoid code duplication
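A sketch of the suggested refactor, with a fake session object standing in for `tf.Session` so it runs without TensorFlow:

```python
class FakeSession:
    """Stand-in for tf.Session; pretends the loss is the sum of fed values."""
    def run(self, fetches, feed_dict):
        return [sum(feed_dict.values())]


def eval_loss(sess, loss_op, feed_dict, use_lstm,
              states_ph=None, state=None, dones_ph=None, expert_mask=None):
    if use_lstm:
        # Only the recurrent case needs the extra feed entries...
        feed_dict.update({states_ph: state, dones_ph: expert_mask})
    # ...so the session call lives outside the branch, once, with no
    # duplicated sess.run in both the if and else paths.
    val_loss, = sess.run([loss_op], feed_dict)
    return val_loss
```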

:param batch_size: (int) the minibatch size for behavior cloning
:param traj_limitation: (int) the number of trajectory to use (if -1, load all)
:param randomize: (bool) if the dataset should be shuffled; this will be overwritten to False
if LSTM is True.
Collaborator:

? where is the LSTM variable?

except StopIteration:
dataloader = iter(dataloader)
return next(dataloader)
if traj_data is not None and expert_path is not None:
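As an aside, the wrap-around pattern in the `except StopIteration` lines of the hunk above can be sketched with a plain list standing in for the dataloader:

```python
def next_batch(dataloader, iterator):
    # Fetch the next element; when the iterator is exhausted, re-create
    # it from the dataloader and continue from the start (epoch wrap-around).
    try:
        return next(iterator), iterator
    except StopIteration:
        iterator = iter(dataloader)   # restart the epoch
        return next(iterator), iterator
```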
Collaborator:

Can't this be checked in the base class?
Looks like duplicated code.

Also, I'm not sure if two classes are needed...

Author:

I originally had it in one class, but someone who used the PR suggested splitting it into two classes. I think that was a good idea, because it clearly improved the user-friendliness of the ExpertDataset class.

model.pretrain(dataset, n_epochs=20)
model.save("test-pretrain")
del dataset, model
@pytest.mark.parametrize("model_class_data", [[A2C, 4, True, "MlpLstmPolicy",
Collaborator:

Looks like duplicated code; I think you can handle the different types of policy and expert path in the if.

@araffin araffin removed this from the v2.8.0 milestone Sep 13, 2019
@XMaster96 XMaster96 requested a review from araffin September 14, 2019 12:27
@skervim

skervim commented Oct 21, 2019

Hi everyone, I would like to know whether there is still active work on "LSTM support to pretrain". I've seen that this feature was removed from the v2.8.0 milestone more than a month ago. Is the work on hold? Kind regards!
@araffin @XMaster96

@XMaster96
Author

Hi everyone, I would like to know whether there is still active work on "LSTM support to pretrain". I've seen that this feature was removed from the v2.8.0 milestone more than a month ago. Is the work on hold? Kind regards!
@araffin @XMaster96

@skervim
Well, I am done programming it. I would also like to know what the merge status is. But if you need it, just use my dev fork.

@araffin
Collaborator

araffin commented Oct 28, 2019

hello @skervim,
This PR requires more changes and time (including the code review) than expected. That's why it has been removed from the milestone for now.

As @XMaster96 says, you can use his fork for now if you want to try the feature.

@XMaster96
Author

This PR requires more changes and time (including the code review) than expected. That's why it has been removed from the milestone for now.

Are you referring to the requested changes, which I have partially implemented and on which I have partially commented why I think I shouldn't change them? Or are you referring to future change requests?

I am also aware that I have not yet written documentation for the website; I was planning to do that once everything is OK and merge-ready.


Successfully merging this pull request may close these issues.

pre train LSTM policy [question]