Improve management of analysis setup and results #178
Comments
I did not come up with a good solution, but I agree with all of this. If I recall correctly, the ".pkl" files should currently get aggregated, which reduces I/O on subsequent runs. However, having many separate initial files is a direct consequence of allowing many thousands of independent processes to write to the same directory without conflicts, and that is the big constraint I see going forward. You could imagine all kinds of things, like a database, to manage it, but that would substantially escalate complexity. I concur with "I am not aware of data formats that easily allow for this".
Is your feature request related to a problem? Please describe.
I find it a bit inconvenient that the analysis setup, as represented in `mh_dict`, cannot be stored in and loaded from a simple text file, e.g. in JSON format (because the dataset objects are not serializable). One consequence of this is that reading the results with `ResultsHandler`, if it is not done in the same script or notebook that runs the minimisation, requires replicating the `mh_dict` (even though not all the data in `mh_dict` are necessarily used by `ResultsHandler`).
Describe the solution you'd like
A design in which the analysis setup can be saved to JSON and the results can be retrieved by reading that JSON output and feeding it to the `ResultsHandler`. The only hard requirement would be an interface to access datasets by key/name rather than by object import. There are situations in which datasets can be altered by the user, as in `utils.custom_dataset`, but I think this can be addressed with a limited number of presets.
Describe alternatives you've considered
Leaving things as they are. It works, but I believe it can be improved.
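To make the proposed solution above more concrete, here is a minimal sketch of what "datasets by key/name" could look like. Everything in it is hypothetical and for illustration only: the `Dataset` class, the `DATASET_REGISTRY`, the helper names and the keys of the example setup are assumptions, not the project's actual objects or the real layout of `mh_dict`.

```python
import json


class Dataset:
    """Hypothetical stand-in for a dataset object that is not JSON-serializable."""

    def __init__(self, name):
        self.name = name


# Hypothetical registry: maps a key/name to a factory that builds the dataset.
DATASET_REGISTRY = {
    "example_dataset_v1": lambda: Dataset("example_dataset_v1"),
    "example_dataset_v2": lambda: Dataset("example_dataset_v2"),
}


def setup_to_json(setup):
    """Swap the dataset object for its registry key, then dump to JSON."""
    serializable = dict(setup)  # shallow copy, the caller's dict is untouched
    serializable["dataset"] = setup["dataset"].name
    return json.dumps(serializable, indent=2, sort_keys=True)


def setup_from_json(text):
    """Load the JSON text and rebuild the dataset object from its key."""
    setup = json.loads(text)
    setup["dataset"] = DATASET_REGISTRY[setup["dataset"]]()
    return setup
```

With something like this, a script that only reads results could call `setup_from_json` on the stored file instead of rebuilding the whole setup by hand. User-modified datasets would need their own registered presets, as suggested above.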
Additional context
The analysis result currently takes the form of a large number of extremely small files (one per trial), which is quite inefficient when it comes to I/O and very unfriendly towards most filesystems. It would be nice if every task could dump its results to a single binary file and these could be merged / concatenated by the `ResultsHandler`; however, I am not aware of data formats that easily allow for this (I researched this problem in the past without much success). But if you have ideas, feel free to shout 'em.
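One possible direction, offered only as a sketch: Python pickles are self-delimiting, so several `pickle.dump` calls can be appended to one file and read back with repeated `pickle.load` calls, and files written this way can be concatenated byte-wise. Each task would write only to its own file, so independent processes never touch the same file. The function names and the `task_*.pkl` naming below are assumptions for illustration, not how `ResultsHandler` currently works.

```python
import pickle
import shutil
from pathlib import Path


def append_result(path, result):
    """Append one trial result to a task's own file (one writer per file)."""
    with open(path, "ab") as f:
        pickle.dump(result, f, protocol=pickle.HIGHEST_PROTOCOL)


def read_results(path):
    """Yield every record from a file written by append_result."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return  # end of the pickle stream


def merge_task_files(task_dir, merged_path):
    """Concatenate all task_*.pkl files in task_dir into one file, byte-wise."""
    with open(merged_path, "wb") as out:
        for task_file in sorted(Path(task_dir).glob("task_*.pkl")):
            with open(task_file, "rb") as src:
                shutil.copyfileobj(src, out)
```

The merged file is read with the same `read_results` generator. This reduces the file count from one per trial to one per task, but it is not a substitute for a proper columnar or database format, and pickle files should only be loaded from trusted sources.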