Are WindowedTimeAverages working properly when picking up simulations? #3485

Open
tomchor opened this issue Feb 27, 2024 · 5 comments

@tomchor
Collaborator

tomchor commented Feb 27, 2024

I've been using WindowedTimeAverages for my simulations (by setting schedule = AveragedTimeInterval(...) in a NetCDFOutputWriter). I noticed that whenever I run out of walltime and have to checkpoint my simulations, when I pick them up again I get the following warning for each of the time-averaged outputs:

┌ Warning: Returning a WindowedTimeAverage before the collection period is complete.
└ @ Oceananigans.OutputWriters /glade/work/tomasc/.julia/packages/Oceananigans/3LHMs/src/OutputWriters/windowed_time_average.jl:201

(which comes from this call.)

Does this mean that the time averages aren't being correctly calculated after picking up? I tried following the trail to figure it out but couldn't determine the answer...

@glwagner
Member

I don't think this is done correctly right now. Somehow the Checkpointer needs to know about the simulation for this to work. But right now it only saves model properties.

@tomchor
Collaborator Author

tomchor commented Feb 28, 2024

Ah, I see. Sounds like it wouldn't be trivial to add that support.

I guess a workaround to avoid partially-averaged results when picking up would be to set the Checkpointer to only write checkpoints when the TimeAveraged results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...
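
To make the alignment idea concrete, here is a minimal sketch in plain Julia (no Oceananigans calls). It assumes averaging windows end at integer multiples of the averaging interval, with all times in whole seconds; `mid_window_checkpoints` is a hypothetical helper, not part of any package. It lists the checkpoint times that would fall in the middle of an averaging window:

```julia
# Hypothetical helper: checkpoint times (in whole seconds) that do NOT coincide
# with the end of an averaging window, assuming windows end at integer
# multiples of `averaging_interval`.
function mid_window_checkpoints(checkpoint_interval, averaging_interval, stop_time)
    checkpoint_times = checkpoint_interval:checkpoint_interval:stop_time
    return [t for t in checkpoint_times if t % averaging_interval != 0]
end

# Checkpoint every 2 hours, average every hour: every checkpoint is aligned.
aligned = mid_window_checkpoints(7200, 3600, 21600)

# Checkpoint every 1.5 hours: every other checkpoint lands mid-window.
misaligned = mid_window_checkpoints(5400, 3600, 21600)
```

If the first list is empty, a pickup from any of those checkpoints would start at a window boundary; this says nothing about how more frequent outputs behave on pickup.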

@glwagner
Member

> Ah, I see. Sounds like it wouldn't be trivial to add that support.
>
> I guess a workaround to avoid partially-averaged results when picking up would be to set the Checkpointer to only write checkpoints when the TimeAveraged results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...

There are two things. One is to fix the flow of information... that's probably pretty easy because we can either 1) make Checkpointer a callback or 2) change write_output! to have the syntax write_output!(writer, simulation) here:

writer.schedule(sim.model) && write_output!(writer, sim.model)

then with a fallback write_output!(writer, sim) = write_output!(writer, sim.model), very little has to change...
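
A self-contained sketch of that fallback pattern, using toy types rather than the real Oceananigans ones (`ToyModel`, `ToySimulation`, `ModelOnlyWriter`, and `SimulationAwareWriter` are all hypothetical names for illustration):

```julia
# Toy stand-ins; none of these are Oceananigans types.
struct ToyModel
    time::Float64
end

struct ToySimulation
    model::ToyModel
    iteration::Int
end

struct ModelOnlyWriter end        # only ever needs the model
struct SimulationAwareWriter end  # wants the whole simulation, as a checkpointer might

# Existing-style method: the writer sees only the model.
write_output!(writer::ModelOnlyWriter, model::ToyModel) = "model at t = $(model.time)"

# Generic fallback: a writer handed a simulation forwards to its model method.
write_output!(writer, sim::ToySimulation) = write_output!(writer, sim.model)

# A writer that needs more information overrides the simulation method.
write_output!(writer::SimulationAwareWriter, sim::ToySimulation) =
    "iteration $(sim.iteration), t = $(sim.model.time)"

sim = ToySimulation(ToyModel(2.5), 10)
model_only_result = write_output!(ModelOnlyWriter(), sim)
sim_aware_result = write_output!(SimulationAwareWriter(), sim)
```

With this arrangement the call site can always pass the simulation; writers that only define a model method keep working through the fallback.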

The other task is to figure out how to save down the "state" of the time-averaging apparatus so that it can be restored correctly. That's maybe the harder part but of course unavoidable to make checkpointing work with it.
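
As a rough illustration of what that "state" might contain, here is a toy running time average in plain Julia. It is not the Oceananigans `WindowedTimeAverage`; the struct, its fields, and the `save_state`/`restore_state` helpers are assumptions made up for this sketch. It compares an uninterrupted average with one that is restored from saved state and one that restarts with no state:

```julia
# Toy running time average; not the Oceananigans WindowedTimeAverage.
mutable struct ToyTimeAverage
    result::Float64             # running average over the window so far
    window_start_time::Float64  # when the current window began
    previous_time::Float64      # time of the last accumulated sample
end

ToyTimeAverage(t0) = ToyTimeAverage(0.0, t0, t0)

# Fold in `value`, treated as constant over the interval (previous_time, t].
function add_sample!(avg::ToyTimeAverage, value, t)
    T_old = avg.previous_time - avg.window_start_time
    T_new = t - avg.window_start_time
    avg.result = (avg.result * T_old + value * (t - avg.previous_time)) / T_new
    avg.previous_time = t
    return avg
end

# Hypothetical helpers: what a checkpoint would have to save and restore.
save_state(avg) = (result = avg.result,
                   window_start_time = avg.window_start_time,
                   previous_time = avg.previous_time)
restore_state(s) = ToyTimeAverage(s.result, s.window_start_time, s.previous_time)

samples = [(1.0, 1.0), (2.0, 3.0), (3.0, 5.0), (4.0, 7.0)]  # (time, value) pairs

# Uninterrupted run over the whole window.
reference = ToyTimeAverage(0.0)
for (t, v) in samples
    add_sample!(reference, v, t)
end

# Interrupted after two samples, with the averaging state saved and restored.
first_half = ToyTimeAverage(0.0)
for (t, v) in samples[1:2]
    add_sample!(first_half, v, t)
end
resumed = restore_state(save_state(first_half))
for (t, v) in samples[3:4]
    add_sample!(resumed, v, t)
end

# Interrupted without saved state: the average restarts at the pickup time.
lost = ToyTimeAverage(2.0)
for (t, v) in samples[3:4]
    add_sample!(lost, v, t)
end
```

In this toy, `resumed.result` matches `reference.result`, while `lost.result` only covers the second half of the window, which is the kind of partial average the warning points at.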

@navidcy
Collaborator

navidcy commented Mar 14, 2024

This might also be relevant when output files are split? See #3506.

@glwagner
Member

Hmm yes, perhaps the output writers need to be re-initialized when picking up as well? That would require extending what we do when we pick up here:

if we_want_to_pickup(pickup)
    checkpoint_file_path = checkpoint_path(pickup, sim.output_writers)
    set!(sim.model, checkpoint_file_path)
end
