Are `WindowedTimeAverage`s working properly when picking up simulations? #3485

tomchor · 2024-02-27T18:03:37Z

I've been using `WindowedTimeAverage`s in my simulations (by setting `schedule = AveragedTimeInterval(...)` in a `NetCDFOutputWriter`). Whenever I run out of walltime and have to checkpoint my simulations, I get the following warning for each of the time-averaged outputs when I pick them up again:

┌ Warning: Returning a WindowedTimeAverage before the collection period is complete.
└ @ Oceananigans.OutputWriters /glade/work/tomasc/.julia/packages/Oceananigans/3LHMs/src/OutputWriters/windowed_time_average.jl:201

(which comes from this call.)

Does this mean that the time averages aren't being correctly calculated after picking up? I tried following the trail to figure it out but couldn't determine the answer...

glwagner · 2024-02-27T19:36:31Z

I don't think this is done correctly right now. Somehow the Checkpointer needs to know about the simulation for this to work. But right now it only saves model properties.

tomchor · 2024-02-28T15:41:36Z

Ah, I see. Sounds like it wouldn't be trivial to add that support.

I guess a workaround to avoid partially-averaged results when picking up would be to set the Checkpointer to only write checkpoints when the TimeAveraged results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...

glwagner · 2024-02-28T15:47:53Z

> Ah, I see. Sounds like it wouldn't be trivial to add that support.
>
> I guess a workaround to avoid partially-averaged results when picking up would be to set the Checkpointer to only write checkpoints when the TimeAveraged results are also written. I'm not sure what that would do to other (more frequent) outputs though, since it'd potentially try to write some time steps twice (and not in monotonic ordering)...

There are two things. One is to fix the flow of information. That's probably pretty easy because we can either 1) make `Checkpointer` a callback or 2) change `write_output!` to have the syntax `write_output!(writer, simulation)` here:

Oceananigans.jl/src/Simulations/run.jl

Line 147 in 643b484

writer.schedule(sim.model) && write_output!(writer, sim.model)

Then, with a fallback `write_output!(writer, sim) = write_output!(writer, sim.model)`, very little has to change...
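A minimal sketch of how option 2 could look, using stand-in types rather than the real Oceananigans ones. `ToyModel`, `ToySimulation`, and both writer types are hypothetical and exist only to show the dispatch pattern: a generic fallback forwards `sim.model` to writers that only know about models, while a writer that needs simulation-level information defines its own, more specific method.

```julia
# Stand-in types for illustration only; these are NOT the Oceananigans definitions.
struct ToyModel
    time::Float64
end

struct ToySimulation
    model::ToyModel
    iteration::Int
end

abstract type AbstractToyWriter end
struct ModelOnlyWriter <: AbstractToyWriter end        # only ever needs the model
struct SimulationAwareWriter <: AbstractToyWriter end  # needs simulation-level info

# Existing style of method: the writer receives the model.
write_output!(writer::ModelOnlyWriter, model::ToyModel) = "model at t = $(model.time)"

# Generic fallback: writers without a simulation-specific method get sim.model.
write_output!(writer::AbstractToyWriter, sim::ToySimulation) = write_output!(writer, sim.model)

# A more specific method wins for writers that need the whole simulation.
write_output!(writer::SimulationAwareWriter, sim::ToySimulation) =
    "iteration $(sim.iteration), t = $(sim.model.time)"

sim = ToySimulation(ToyModel(1.5), 10)
```

With this arrangement the call site can pass the simulation unconditionally: `write_output!(ModelOnlyWriter(), sim)` goes through the fallback, and `write_output!(SimulationAwareWriter(), sim)` hits the specialized method.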

The other task is to figure out how to save down the "state" of the time-averaging apparatus so that it can be restored correctly. That's maybe the harder part but of course unavoidable to make checkpointing work with it.
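To make the "state" concrete, here is a toy running average in plain Julia. The struct and its field names are assumptions chosen for illustration and are not necessarily what `WindowedTimeAverage` stores. The sketch shows that if the accumulated result, the window start time, and the last collection time are saved and restored, an interrupted run reproduces the uninterrupted average, whereas a pickup that starts from a fresh state does not.

```julia
# Toy stand-in for the averaging state; field names are illustrative assumptions.
mutable struct ToyWindowedAverage
    result::Float64                    # running average over the current window
    window_start_time::Float64         # time at which the current window began
    previous_collection_time::Float64  # time of the last accumulated sample
end

# Fold one sample into the running average (right-endpoint rule).
function accumulate!(avg::ToyWindowedAverage, value, t)
    dt = t - avg.previous_collection_time  # span covered by this sample
    elapsed = t - avg.window_start_time    # window length so far
    if elapsed > 0
        avg.result = (avg.result * (elapsed - dt) + value * dt) / elapsed
    end
    avg.previous_collection_time = t
    return avg
end

# What a checkpoint would have to store, and how a pickup would rebuild it.
save_state(avg::ToyWindowedAverage) = (result = avg.result,
                                       window_start_time = avg.window_start_time,
                                       previous_collection_time = avg.previous_collection_time)

restore_state(state) = ToyWindowedAverage(state.result,
                                          state.window_start_time,
                                          state.previous_collection_time)

samples = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]  # (value, time) pairs

# Reference: one uninterrupted run over the whole window.
uninterrupted = ToyWindowedAverage(0.0, 0.0, 0.0)
for (value, t) in samples
    accumulate!(uninterrupted, value, t)
end

# Interrupted run: checkpoint after two samples, then pick up with restored state.
first_half = ToyWindowedAverage(0.0, 0.0, 0.0)
for (value, t) in samples[1:2]
    accumulate!(first_half, value, t)
end
restored = restore_state(save_state(first_half))
for (value, t) in samples[3:4]
    accumulate!(restored, value, t)
end

# Pickup without saved state: the window silently restarts at the pickup time.
unrestored = ToyWindowedAverage(0.0, 2.0, 2.0)
for (value, t) in samples[3:4]
    accumulate!(unrestored, value, t)
end
```

Here `restored.result` matches `uninterrupted.result`, while `unrestored.result` only averages the samples seen after the pickup. A real implementation would also have to handle field-valued results and the schedule's own bookkeeping, which this sketch leaves out.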

navidcy · 2024-03-14T07:11:48Z

This might also be relevant for when output files are split? See #3506.

glwagner · 2024-03-14T14:17:44Z

Hmm yes, perhaps the output writers need to be re-initialized when picking up as well? That would require extending what we do when we pick up here:

Oceananigans.jl/src/Simulations/run.jl

Lines 87 to 90 in 3bb62a6

    if we_want_to_pickup(pickup)
        checkpoint_file_path = checkpoint_path(pickup, sim.output_writers)
        set!(sim.model, checkpoint_file_path)
    end

navidcy added the output 💾 label Feb 28, 2024
