Validating the model #20

Open
Magpi007 opened this issue Oct 3, 2019 · 7 comments

@Magpi007 commented Oct 3, 2019

Hi,

I would like to know whether the model over-fits and what the optimum number of epochs is, by plotting accuracy and loss as shown here. Would it be possible to do this with this repo without making too many changes (maybe using the evaluation results as validation)?

Thanks.

@ThilinaRajapakse (Owner)

You can get the training loss without any changes. You can use tensorboardX to get a graph of the training loss; the loss information is written to the 'runs' directory.

If you want to evaluate on the dev set during training, you can set evaluate_during_training to True in the args dict.

If you want to add additional information to that, you can use additional tb_writer.add_scalar() calls inside the train function.
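For reference, here is a minimal sketch of what such an extra call might look like (not the exact repo code; it assumes the tensorboardX SummaryWriter is named tb_writer as it is inside train(), and eval_accuracy is just a placeholder metric):

```python
# Sketch: log one extra scalar with tensorboardX, next to the loss that
# train() already writes into the 'runs' directory.
from tensorboardX import SummaryWriter

tb_writer = SummaryWriter()  # inside the repo this writer already exists in train()

global_step = 100      # placeholder step value for illustration
eval_accuracy = 0.87   # hypothetical metric computed on the dev set
tb_writer.add_scalar('eval_accuracy', eval_accuracy, global_step)
```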

@Magpi007 (Author) commented Oct 9, 2019

The concept is simple but I am still not able to plot anything.

In the training function we have these lines:

tb_writer.add_scalar('eval_{}'.format(key), value, global_step)
tb_writer.add_scalar('lr', scheduler.get_lr()[0], global_step)
tb_writer.add_scalar('loss', (tr_loss - logging_loss)/args['logging_steps'], global_step)

If I understand it right, at each logging step the first call writes every evaluation result we have generated, and the second and third write the learning rate and the training loss.

In this case I am going to plot this info with only one epoch, but it should still show something. As per the documentation, I understand that we only need to run tensorboard --logdir runs (since the scalars are stored in the runs directory), am I right?

I get no error message at any point (having activated the option evaluate_during_training), but when I try to plot I get this error:

[screenshot of the error]

There is a folder called runs in the experiment folder.
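For what it's worth, a quick way to confirm that event files were actually written (just a sketch, assuming the default runs location used by tensorboardX) would be:

```python
# Sketch: list the tensorboardX event files under the 'runs' directory
# to confirm that something was actually logged during training.
from pathlib import Path

for event_file in Path('runs').rglob('events.out.tfevents.*'):
    print(event_file, event_file.stat().st_size, 'bytes')
```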

@ThilinaRajapakse (Owner)

There should be a subdirectory inside runs for every training run. So your command would look like tensorboard --logdir=runs/subdirectory.

To visualize the latest run, you can use the line below.
tensorboard --logdir=$(ls -td runs/*/ | head -1)
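Roughly the same thing in Python, if the shell substitution is inconvenient in a notebook (a sketch, not code from the repo; it assumes the default runs directory):

```python
# Sketch: find the most recently modified run directory under 'runs'
# and print the matching tensorboard command.
from pathlib import Path

latest_run = max(Path('runs').iterdir(), key=lambda p: p.stat().st_mtime)
print(f'tensorboard --logdir={latest_run}')
```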

@Magpi007 (Author) commented Oct 9, 2019

There is a directory called Oct09_03-14-56_31dd366812b4, but when I run this line:

!tensorboard --logdir="runs/Oct09_03-14-56_31dd366812b4" --host localhost --port 8088

I get a message saying that the site http://localhost:8088/ can't be reached: localhost refused to connect.

I have tried different ports with no luck. I have been researching on the internet, and some people say it is possible to do this through a tunnel with ngrok, here. Before trying it I would like to ask you whether that makes sense, or whether it should work straight out of Google Colab.

@Magpi007 (Author) commented Oct 9, 2019

Supposedly that shouldn't be needed...

https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks

@Magpi007 (Author) commented Oct 9, 2019

OK, I think I got it.

First I loaded the TensorBoard notebook extension:
%load_ext tensorboard

And ran TensorBoard with:
%tensorboard --logdir=runs/Oct09_03-14-56_31dd366812b4

So I got the dashboard.

[screenshot of the TensorBoard dashboard]

I need to play with it a bit more to see if it's working, but it looks like it is.
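One thing I might also try (assuming the default runs layout, since TensorBoard aggregates the subdirectories under the logdir) is pointing the magic at the parent folder to compare runs side by side:

%tensorboard --logdir=runs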

@ThilinaRajapakse (Owner)

Great to see you got it to work. I didn't realize you were on Colab!
