Restarting from checkpoint and appending StateReporter to a file results in multiple entries for the same time step. #10

WillGPoole · 2023-07-13T09:18:28Z

Posting this here because it came up yesterday and I forgot to mention it, but it might be something for the main OpenMM repo.

One of the keen-eyed attendees at the workshop yesterday worked out that, because the checkpoint file is written out every 1000 steps, a simulation can finish on a step that is not a multiple of 1000, e.g. 1500.

Restarting the simulation from the checkpoint file written at step 1000 then resulted in the log file having duplicate entries.

When plotting the data, this results in two data points at that time step, although admittedly, the y values are basically identical.

I imagine the problem becomes more obvious when the checkpoint interval is more than 10 times the log interval, since the restart then has to go back over more steps that were already logged.

In the example below I could have got points all the way to 14900 and still restarted at 14000. Imagine if we had written a checkpoint every 10000 steps for example.

Very quick example:

 13900,-146856.87880890456,305.87961649181017,91.87334675537606
 14000,-146838.48677000694,302.0436151483807,91.85773612934197
 14100,-146570.08052000694,302.9515737651756,91.85773612934197
----------- Simulation ended, and restarted from 14000 -------------
 14100,-146569.67427000694,302.94798025386604,91.85773612934197
 14200,-146859.15565928526,305.1387575727,91.68905971263145
 14300,-147399.1003796428,305.3627325364048,91.29362283195196
 14400,-147585.3503796428,301.270411426622,91.29362283195196
 14500,-146945.1628796428,298.35001246529237,91.29362283195196

I guess a fix would be for OpenMM, at some point in the reporter setup, to check the current time step against the latest reported step and delete the later lines from the log file?
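As a rough illustration of that idea, done in the user's script rather than inside OpenMM, here is a minimal sketch that trims the log before the reporter starts appending again. The function name `truncate_log_to_step` and the parameter `restart_step` are hypothetical and not part of OpenMM. It assumes a comma-separated log whose first column is the integer step and whose header line, if any, starts with `#`.

```python
import os
import tempfile


def truncate_log_to_step(log_path, restart_step):
    """Remove data rows whose step is greater than restart_step.

    Hypothetical helper, not an OpenMM API. Assumes the step is the first
    comma-separated column and that header lines start with '#'.
    Returns the number of rows removed.
    """
    dropped = 0
    directory = os.path.dirname(os.path.abspath(log_path))
    # Write the kept lines to a temporary file in the same directory,
    # then swap it in, so a crash does not leave a half-written log.
    with open(log_path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False, newline=""
    ) as dst:
        for line in src:
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                dst.write(line)  # keep blank lines and the header untouched
                continue
            step = int(stripped.split(",")[0])
            if step > restart_step:
                dropped += 1  # logged after the checkpoint, will be re-run
            else:
                dst.write(line)
    os.replace(dst.name, log_path)
    return dropped
```

It would be called after loading the checkpoint and before creating the appending reporter, with `restart_step` set to the step the checkpoint was written at. In the example above that is 14000, which would remove the first 14100 row.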


WillGPoole · 2023-07-13T09:19:16Z

This is temperature

sef43 · 2023-07-13T12:27:33Z

Thanks for pointing this out! This is a limitation of the workshop script. The checkpointing and restarting procedure (or perhaps analysis) will need to be more sophisticated. I am not sure what the current best practice for this is and welcome suggestions!
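On the analysis side, one possible workaround is to drop duplicate steps when the log is read. The sketch below is not an established best practice. It keeps the last row seen for each step, i.e. the value written after the restart. The name `dedupe_by_step` is hypothetical, and the step is assumed to be the first column.

```python
import csv


def dedupe_by_step(lines):
    """Return (step, values) pairs sorted by step, one per step.

    Hypothetical helper. When a step occurs more than once, the last
    occurrence (the one written after the restart) wins.
    """
    latest = {}
    for row in csv.reader(lines):
        if not row or row[0].lstrip().startswith("#"):
            continue  # skip blank lines and the header
        latest[int(row[0])] = [float(v) for v in row[1:]]
    return sorted(latest.items())


# Rows taken from the log excerpt in this issue.
log_lines = [
    "14000,-146838.48677000694,302.0436151483807,91.85773612934197",
    "14100,-146570.08052000694,302.9515737651756,91.85773612934197",
    "14100,-146569.67427000694,302.94798025386604,91.85773612934197",
    "14200,-146859.15565928526,305.1387575727,91.68905971263145",
]
rows = dedupe_by_step(log_lines)
```

Keeping the last occurrence is a design choice. Keeping the first would work equally well for plotting, since the two values at step 14100 are nearly identical.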

sef43 mentioned this issue Jul 13, 2023

log file/checkpoint file discord openmm/openmm#4142

Open
