Output cleanup -- release as many file resources as possible when they are no longer needed #722

hellkite500 · 2024-02-07T15:31:42Z

Large scale ngen runs use a significant amount of file handles between the model engine, the formulation init files, and the routing routine. The current mechanics don't close and release file handles (or any other resource, really) between the rainfall-runoff model steps and the routing steps. This PR provides mechanisms for releasing resources when they are no longer needed, and should dramatically reduce the number of open file handles used throught the life cycle of a fully integrated routing run.

Additions

Manager and container public functions for releasing shared resources.

Removals

Changes

ngen main now releases formulation resources before launching routing.

Checklist

Target Environment support

Linux
MacOs

…from backing storage

program--

Just a few comments, mostly to clarify my understanding

program-- · 2024-02-07T18:43:30Z

include/utilities/FileStreamHandler.hpp

+            virtual ~FileStreamHandler() override {
+                auto tmp = std::static_pointer_cast<std::ofstream>(output_stream);
+                if(tmp){
+                    tmp->close();


I don't think this is needed, unless we were checking the failbit. When output_stream's lifetime ends, its destructor will be called, so if it's an ofstream it'll close the file anyways.

I'm not remembering exactly why I did this (that's what I get for leaving it sitting in my working tree for so long 😵‍💫). I think you are correct, and this may be an artifact of some early attempts to ensure the files were were closed but due to the shared references of formulations they were not and it took me a while to sort that out haha. I'll take a closer look and remove this if it isn't being useful.

program-- · 2024-02-07T19:00:02Z

src/NGen.cpp

+  features.release_formulations();
+  manager->clear();


Maybe I'm not understanding it, what's the reason for using shared pointers if we're going to handle manually decrementing the reference counts anyways? It would make more sense to put features and manager within an RAII block so they go out of scope before needing to call these.

Jumping around here in my responses, but they are all somewhat related.

Formulations exist as shared pointers because we want to ensure we only create one formulation per feature and reuse that formulation in different contexts.

The formulation manager (which superseded the network indexing I'll discuss momentarily) exists entirely to abstract and encapsulate the parsing, configuring, and instantiation of formulations, i.e. ready to execute models associated with some feature.

It does this reasonalby well, and ultimately creates a map of feature_id -> Formulation. Each formulation must be instantiated and ready to "run" before be we can begin using them in the modeling steps, so this container does just that. The hy_features container, then, is responsible for aligning the topology and network with these formulations. It uses the constructed formulations from the manager and "ties" them to the HY_Catchment realization. The network uses feature indexing and maps to provide the indirection back to to the formulation when used in the actual computational steps...

I'm not sure if that helped in understanding or not. I can try to articulate that more completely if needed. To your point on using RAII blocks...yes, that would be one solution. The functions to explicitly release the resources by clearing the underlying maps that hold the references are another. Having these functions provides some additional flexibility beyond relying on the RAII blocks, though. For example, a Network still has a valid feature index that may be useful/needed without having a need for the model formulation associated with the features indexed by the network (the complete HY_Catchment "realization"). Similarly, the manager actually manages a little bit more than just formulations -- it manages time and other configurations that have utility without require references to the model formuations for each catchment.

program-- · 2024-02-07T19:00:38Z

src/NGen.cpp

@@ -539,6 +539,17 @@ int main(int argc, char *argv[]) {

    } //done time

+  //Close down any file handles we have
+  for(auto& f : nexus_outfiles){
+    f.second.close();


Similar to below, nexus_outfiles should be moved into the function and placed within an RAII block, so the destructor can handle it.

Nexus output handling is still pretty rough, there's been a long standing TODO to refactor it -- but its been pretty low priority.

program-- · 2024-02-07T19:12:59Z

include/core/catchment/HY_CatchmentArea.hpp

    virtual ~HY_CatchmentArea();

    protected:

    polygon_t bounds;

-    utils::StreamHandler output;
+    std::unique_ptr<utils::StreamHandler> output;


If we're going to use pointers anyways, why not just use std::ostream classes directly?

StreamHandler is a bit of an abstract wrapper around ostream...the original intent was the ability to have an abstract output interface that was easily connected to existing mechanics (like ofstream for csv/file output) but also be able to implement more complicated output mechanics, such as writing to NetCDF, using the same stream operator/semantics and have a simple way of reconfiguring and switching between these using the base StreamHandler interface.

This could definitely evolve and change.

I think the reason I moved these to unique pointers was to provide some protection against pass by value copies, but I could be wrong on that motivation. I know I definitely wanted to ensure that a given formulation/realization was only ever associated with a single output resource that was tightly tied to it its lifecycle.

As an aside to actually reviewing this PR, I'm pretty sure we should just make StreamHandler go away.

PhilMiller · 2024-05-10T22:32:10Z

A lot of the resource releasing logic in here might fit in well with the finalize() interface that I wrote into #819.

As to RAII, I think we should probably aim for explicit high-level resource releases, and maybe embrace RAII-driven destruction in lower-level components where there's no need for distinction between release from destruction. My experience is that it can become way too difficult to arrange scopes and implicit lifetimes to fit initialization, the desired forward execution, and the necessary release/shutdown order. Ideally, everything would be a nice LIFO stack, but dependencies often don't work out that way.

PhilMiller · 2024-05-12T15:36:09Z

As an additional note on explicit release, going down that path also means that there's a point in the program where we can internally assert various checks that resource usage is down as we expect, as a guard against things that we don't leak, but shouldn't have retained.

hellkite500 added 4 commits February 7, 2024 08:21

refactor(output): manage output stream handler as unique pointer

1b5fc82

feat(formulation_manager): add public function to clear formulations …

e2546d4

…from backing storage

feat(hy_features): add public function to release formulation pointers

7102588

feat(ngen): release resources before launching routing

d64795d

hellkite500 requested review from PhilMiller, program-- and robertbartel February 7, 2024 15:31

program-- reviewed Feb 7, 2024

View reviewed changes

program-- assigned hellkite500 Feb 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Output cleanup -- release as many file resources as possible when they are no longer needed #722

Output cleanup -- release as many file resources as possible when they are no longer needed #722

hellkite500 commented Feb 7, 2024

program-- left a comment

program-- Feb 7, 2024

hellkite500 Feb 7, 2024

program-- Feb 7, 2024

hellkite500 Feb 7, 2024

program-- Feb 7, 2024

hellkite500 Feb 7, 2024

program-- Feb 7, 2024

hellkite500 Feb 7, 2024

PhilMiller Feb 7, 2024

PhilMiller commented May 10, 2024

PhilMiller commented May 12, 2024

Output cleanup -- release as many file resources as possible when they are no longer needed #722

Are you sure you want to change the base?

Output cleanup -- release as many file resources as possible when they are no longer needed #722

Conversation

hellkite500 commented Feb 7, 2024

Additions

Removals

Changes

Checklist

Target Environment support

program-- left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

PhilMiller commented May 10, 2024

PhilMiller commented May 12, 2024