Validating the model #20

Open
Magpi007 opened this issue Oct 3, 2019 · 7 comments

@Magpi007 commented Oct 3, 2019

Hi,

I would like to know whether the model over-fits and what the optimum number of epochs is, by plotting accuracy and loss as shown here. Would it be possible to do this with this repo without making too many changes (maybe using the evaluation results as validation)?

Thanks.

@ThilinaRajapakse (Owner)

You can get the training loss without any changes. You can use tensorboardX to get a graph of the training loss; the loss information is written to the 'runs' directory.

If you want to evaluate on the dev set during training, you can set evaluate_during_training to True in the args dict.

If you want to add additional information to that, you can use additional tb_writer.add_scalar() calls inside the train function.
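For reference, here is a minimal sketch of what such an extra call might look like (not the exact repo code; it assumes the tensorboardX SummaryWriter is named tb_writer as it is inside train(), and eval_accuracy is just a placeholder metric):

```python
# Sketch: log one extra scalar with tensorboardX, next to the loss that
# train() already writes into the 'runs' directory.
from tensorboardX import SummaryWriter

tb_writer = SummaryWriter()  # inside the repo this writer already exists in train()

global_step = 100      # placeholder step value for illustration
eval_accuracy = 0.87   # hypothetical metric computed on the dev set
tb_writer.add_scalar('eval_accuracy', eval_accuracy, global_step)
```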

@Magpi007 (Author) commented Oct 9, 2019

The concept is simple but I am still not able to plot anything.

In the training function we have these lines:

tb_writer.add_scalar('eval_{}'.format(key), value, global_step)
tb_writer.add_scalar('lr', scheduler.get_lr()[0], global_step)
tb_writer.add_scalar('loss', (tr_loss - logging_loss)/args['logging_steps'], global_step)

If I understand it right, at each logging step the first call writes every evaluation result we have generated, and the second and third write the learning rate and the training loss.

In this case I am going to plot this info with only one epoch, but it should still show something. As per the documentation, I understand that we only need to run tensorboard --logdir runs (since the scalars are stored in the runs directory), am I right?

I get no error message at any point (having activated the option evaluate_during_training), but when I try to plot I get this error:

[screenshot of the error]

There is a folder called runs in the experiment folder.
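For what it's worth, a quick way to confirm that event files were actually written (just a sketch, assuming the default runs location used by tensorboardX) would be:

```python
# Sketch: list the tensorboardX event files under the 'runs' directory
# to confirm that something was actually logged during training.
from pathlib import Path

for event_file in Path('runs').rglob('events.out.tfevents.*'):
    print(event_file, event_file.stat().st_size, 'bytes')
```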

@ThilinaRajapakse (Owner)

There should be a subdirectory inside runs for every training run. So your command would look like tensorboard --logdir=runs/subdirectory.

To visualize the latest run, you can use the line below.
tensorboard --logdir=$(ls -td runs/*/ | head -1)
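Roughly the same thing in Python, if the shell substitution is inconvenient in a notebook (a sketch, not code from the repo; it assumes the default runs directory):

```python
# Sketch: find the most recently modified run directory under 'runs'
# and print the matching tensorboard command.
from pathlib import Path

latest_run = max(Path('runs').iterdir(), key=lambda p: p.stat().st_mtime)
print(f'tensorboard --logdir={latest_run}')
```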

@Magpi007 (Author) commented Oct 9, 2019

There is a directory called Oct09_03-14-56_31dd366812b4, but when I run this line:

!tensorboard --logdir="runs/Oct09_03-14-56_31dd366812b4" --host localhost --port 8088

I get a message saying that the site http://localhost:8088/ can't be reached: localhost refused to connect.

I have tried different ports with no luck. I have been researching on the internet, and some people say it is possible to do this through a tunnel with ngrok, here. Before trying it I would like to ask you whether that makes sense, or whether it should work straight out of Google Colab.

@Magpi007 (Author) commented Oct 9, 2019

Supposedly that shouldn't be needed...

https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks

@Magpi007 (Author) commented Oct 9, 2019

OK, I think I got it.

First I loaded the TensorBoard notebook extension:
%load_ext tensorboard

And ran TensorBoard with:
%tensorboard --logdir=runs/Oct09_03-14-56_31dd366812b4

So I got the dashboard.

[screenshot of the TensorBoard dashboard]

I need to play with it a bit more to see if it's working, but it looks like it is.
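One thing I might also try (assuming the default runs layout, since TensorBoard aggregates the subdirectories under the logdir) is pointing the magic at the parent folder to compare runs side by side:

%tensorboard --logdir=runs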

@ThilinaRajapakse (Owner)

Great to see you got it to work. I didn't realize you were on Colab!
