-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Output cleanup -- release as many file resources as possible when they are no longer needed #722
base: master
Are you sure you want to change the base?
Conversation
…from backing storage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments, mostly to clarify my understanding
virtual ~FileStreamHandler() override { | ||
auto tmp = std::static_pointer_cast<std::ofstream>(output_stream); | ||
if(tmp){ | ||
tmp->close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is needed, unless we were checking the failbit. When output_stream
's lifetime ends, its destructor will be called, so if it's an ofstream
it'll close the file anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not remembering exactly why I did this (that's what I get for leaving it sitting in my working tree for so long 😵💫). I think you are correct, and this may be an artifact of some early attempts to ensure the files were were closed but due to the shared references of formulations they were not and it took me a while to sort that out haha. I'll take a closer look and remove this if it isn't being useful.
features.release_formulations(); | ||
manager->clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not understanding it, what's the reason for using shared pointers if we're going to handle manually decrementing the reference counts anyways? It would make more sense to put features
and manager
within an RAII block so they go out of scope before needing to call these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Jumping around here in my responses, but they are all somewhat related.
Formulations
exist as shared pointers because we want to ensure we only create one formulation per feature and reuse that formulation in different contexts.
The formulation manager (which superseded the network indexing I'll discuss momentarily) exists entirely to abstract and encapsulate the parsing, configuring, and instantiation of formulations, i.e. ready to execute models associated with some feature.
It does this reasonalby well, and ultimately creates a map
of feature_id -> Formulation
. Each formulation must be instantiated and ready to "run" before be we can begin using them in the modeling steps, so this container does just that. The hy_features
container, then, is responsible for aligning the topology and network with these formulations. It uses the constructed formulations from the manager and "ties" them to the HY_Catchment realization. The network
uses feature indexing and maps to provide the indirection back to to the formulation when used in the actual computational steps...
I'm not sure if that helped in understanding or not. I can try to articulate that more completely if needed. To your point on using RAII blocks...yes, that would be one solution. The functions to explicitly release the resources by clearing the underlying maps that hold the references are another. Having these functions provides some additional flexibility beyond relying on the RAII blocks, though. For example, a Network
still has a valid feature index that may be useful/needed without having a need for the model formulation associated with the features indexed by the network (the complete HY_Catchment "realization"). Similarly, the manager actually manages a little bit more than just formulations -- it manages time and other configurations that have utility without require references to the model formuations for each catchment.
@@ -539,6 +539,17 @@ int main(int argc, char *argv[]) { | |||
|
|||
} //done time | |||
|
|||
//Close down any file handles we have | |||
for(auto& f : nexus_outfiles){ | |||
f.second.close(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to below, nexus_outfiles
should be moved into the function and placed within an RAII block, so the destructor can handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nexus output handling is still pretty rough, there's been a long standing TODO to refactor it -- but its been pretty low priority.
virtual ~HY_CatchmentArea(); | ||
|
||
protected: | ||
|
||
polygon_t bounds; | ||
|
||
utils::StreamHandler output; | ||
std::unique_ptr<utils::StreamHandler> output; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use pointers anyways, why not just use std::ostream
classes directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
StreamHandler is a bit of an abstract wrapper around ostream
...the original intent was the ability to have an abstract output interface that was easily connected to existing mechanics (like ofstream for csv/file output) but also be able to implement more complicated output mechanics, such as writing to NetCDF, using the same stream operator/semantics and have a simple way of reconfiguring and switching between these using the base StreamHandler interface.
This could definitely evolve and change.
I think the reason I moved these to unique pointers was to provide some protection against pass by value copies, but I could be wrong on that motivation. I know I definitely wanted to ensure that a given formulation/realization was only ever associated with a single output resource that was tightly tied to it its lifecycle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside to actually reviewing this PR, I'm pretty sure we should just make StreamHandler
go away.
A lot of the resource releasing logic in here might fit in well with the As to RAII, I think we should probably aim for explicit high-level resource releases, and maybe embrace RAII-driven destruction in lower-level components where there's no need for distinction between release from destruction. My experience is that it can become way too difficult to arrange scopes and implicit lifetimes to fit initialization, the desired forward execution, and the necessary release/shutdown order. Ideally, everything would be a nice LIFO stack, but dependencies often don't work out that way. |
As an additional note on explicit release, going down that path also means that there's a point in the program where we can internally assert various checks that resource usage is down as we expect, as a guard against things that we don't leak, but shouldn't have retained. |
Large scale ngen runs use a significant amount of file handles between the model engine, the formulation init files, and the routing routine. The current mechanics don't close and release file handles (or any other resource, really) between the rainfall-runoff model steps and the routing steps. This PR provides mechanisms for releasing resources when they are no longer needed, and should dramatically reduce the number of open file handles used throught the life cycle of a fully integrated routing run.
Additions
Removals
Changes
main
now releases formulation resources before launching routing.Checklist
Target Environment support