Unable to import own training checkpoints #4

suissemaxx · 2017-02-22T09:35:32Z

I can successfully train on a different corpus with rnn_train.py and get these files in /checkpoints:

rnn_train_1487755124-1500000.meta
rnn_train_1487755124-1500000.data-00000-of-00001
rnn_train_1487755124-1500000.index
checkpoint

Unfortunately I am unable to use the saved checkpoint with rnn_play.py.

I changed the file paths in rnn_play.py to point at the .meta and .data files above, but I get this error:

DataLossError (see above for traceback): Unable to open table file .\rnn_train_1487755124-1500000.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?

I already checked GitHub and SO for possible answers but couldn't solve it that way.

How can I fix this? Any help is very much appreciated.

suissemaxx · 2017-02-23T11:13:23Z

I found a quick workaround. I changed the tf.train.Saver in rnn_train.py (line 126) to write V1 checkpoints:

saver = tf.train.Saver(write_version=tf.train.SaverDef.V1, max_to_keep=1)

Now I can successfully run rnn_play.py.

How can I correctly save & restore V2 checkpoints?

Many thanks in advance.

martin-gorner · 2017-02-23T13:57:09Z

Hmm, interesting...
I will have to investigate this one. Thanks for reporting the issue.

attilaaronnagy · 2017-05-07T18:34:15Z

For me, the problem was that I did not change the name of the "author" in the rnn_play session restore. Change the restore call to use tf.train.latest_checkpoint and then (at least for me) it works with V2:

new_saver = tf.train.import_meta_graph('./checkpoints/rnn_train_1494179714-1800000.meta')
new_saver.restore(sess, tf.train.latest_checkpoint('./checkpoints/'))
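For anyone else landing here, a rough sketch of why this probably works. With V2 checkpoints the .index and .data files share a common prefix, and restore expects that prefix rather than the name of the .data file. tf.train.latest_checkpoint gets the prefix from the small text file named checkpoint in the same directory. The helper below is a simplified, hypothetical stand-in for that lookup (read_latest_prefix is not a TensorFlow function), and it assumes the file contains the usual model_checkpoint_path: "..." line:

```python
import os
import re


def read_latest_prefix(checkpoint_dir):
    """Return the checkpoint prefix recorded in <checkpoint_dir>/checkpoint.

    Hypothetical helper for illustration only. It mimics, in a very
    simplified way, what tf.train.latest_checkpoint looks up. Returns
    None if the state file or the expected line is missing.
    """
    state_file = os.path.join(checkpoint_dir, "checkpoint")
    if not os.path.isfile(state_file):
        return None
    with open(state_file, encoding="utf-8") as f:
        for line in f:
            # Assumed line format: model_checkpoint_path: "<prefix>"
            match = re.match(r'\s*model_checkpoint_path:\s*"(.*)"\s*$', line)
            if match:
                prefix = match.group(1)
                # Relative prefixes are taken relative to the directory.
                if not os.path.isabs(prefix):
                    prefix = os.path.join(checkpoint_dir, prefix)
                return prefix
    return None
```

Calling read_latest_prefix('./checkpoints/') should give a path that ends in something like rnn_train_1494179714-1800000, with no .meta, .index or .data suffix. That suffix-free path is the form restore wants.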

Dor1s · 2018-06-10T04:07:40Z

Had the same issue, thanks @attilaaronnagy your comment helped.

I've also added

saved_file = saver.save(sess, 'checkpoints/rnn_train_' + timestamp)
print("Saved file: " + saved_file)

right after the loop in https://github.com/Dor1s/tensorflow-rnn-shakespeare/blob/f7038f79328f31302dc8b58b716535e31c54bab8/rnn_train.py#L194
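The value returned by saver.save is the checkpoint prefix, which is the string restore expects. If you only have the file names from the checkpoints directory, the prefix can be recovered by stripping the V2 suffixes. A minimal sketch, assuming the standard .meta, .index and .data-NNNNN-of-NNNNN naming (checkpoint_prefix is a hypothetical helper, not part of TensorFlow):

```python
import re

# Suffixes written next to a V2 checkpoint prefix (assumed standard naming).
_V2_SUFFIX = re.compile(r"\.(meta|index|data-\d{5}-of-\d{5})$")


def checkpoint_prefix(filename):
    """Strip a V2 checkpoint suffix, leaving the prefix to pass to restore.

    Hypothetical helper for illustration. Names without a known suffix
    are returned unchanged.
    """
    return _V2_SUFFIX.sub("", filename)


print(checkpoint_prefix("rnn_train_1487755124-1500000.data-00000-of-00001"))
```

This prints rnn_train_1487755124-1500000. Passing the full .data-00000-of-00001 name to restore instead is a likely cause of the "not an sstable (bad magic number)" error reported at the top of this thread, though I have not verified that against the original setup.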
