Improve management of analysis setup and results #178

Open
mlincett opened this issue Aug 3, 2022 · 1 comment

@mlincett
Collaborator

mlincett commented Aug 3, 2022

Is your feature request related to a problem? Please describe.

I find it a bit inconvenient that the analysis setup, as represented in mh_dict, cannot be stored in and loaded from a simple text file, e.g. in JSON format, because the dataset objects are not serializable.

One consequence is that reading the results with ResultsHandler outside the script or notebook that ran the minimisation requires rebuilding the mh_dict, even though ResultsHandler does not necessarily use all of the data in it.

Describe the solution you'd like
A design in which the analysis setup can be saved to JSON, and the results can be retrieved by reading that JSON file and feeding it to the ResultsHandler.

The only hard requirement would be an interface to access datasets by key/name rather than by object import. There are situations in which datasets can be altered by the user, as in utils.custom_dataset, but I think this can be addressed with a limited number of presets.
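To make the key/name idea concrete, here is a minimal sketch, not an existing interface. `Dataset`, `DATASET_REGISTRY`, `setup_to_json` and `setup_from_json` are all hypothetical names, and I am assuming the dataset sits under a `"dataset"` key with every other entry already JSON-friendly.

```python
import json
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset object that cannot be serialized."""
    name: str


# Hypothetical registry of presets: string key -> dataset object.
DATASET_REGISTRY = {
    "example_dataset_v1": Dataset("example_dataset_v1"),
    "example_dataset_v2": Dataset("example_dataset_v2"),
}


def dataset_key(dataset):
    """Return the registry key of a dataset object (identity lookup)."""
    for key, value in DATASET_REGISTRY.items():
        if value is dataset:
            return key
    raise KeyError("dataset is not a registered preset")


def setup_to_json(mh_dict):
    """Serialize the setup, replacing the dataset object by its key."""
    serializable = dict(mh_dict)  # shallow copy, the input is left untouched
    serializable["dataset"] = dataset_key(mh_dict["dataset"])
    return json.dumps(serializable, indent=2, sort_keys=True)


def setup_from_json(text):
    """Rebuild the setup, resolving the dataset key through the registry."""
    mh_dict = json.loads(text)
    mh_dict["dataset"] = DATASET_REGISTRY[mh_dict["dataset"]]
    return mh_dict
```

With something like this, a script that only reads results could call `setup_from_json` on the saved file instead of rebuilding the dictionary by hand. User-modified datasets would need their own registered preset key for the round trip to work.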

Describe alternatives you've considered
Leaving things as they are: it works, but I believe it can be improved.

Additional context
The analysis results currently take the form of a large number of extremely small files (one per trial), which is quite inefficient for I/O and very unfriendly towards most filesystems. It would be nice if every task could dump its results to a single binary file, and these could then be merged / concatenated by the ResultsHandler. However, I am not aware of data formats that easily allow for this (I researched this problem in the past without much success). But if you have ideas, feel free to shout 'em.
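One possible direction, as a rough sketch only: each task writes all of its trials to one file as a sequence of pickled records, and a merge step reads every per-task file back in. This relies on `pickle.load` reading one record at a time and raising `EOFError` at the end of the file. The function names and the `task_<id>.pkl` naming are hypothetical, and the sketch assumes each task has a unique id so that no two processes ever write to the same path.

```python
import os
import pickle
import tempfile
from pathlib import Path


def write_task_results(directory, task_id, trials):
    """Write all trials of one task to a single file (hypothetical helper)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / f"task_{task_id}.pkl"
    # Write to a temporary file first, then rename, so that a reader
    # never sees a half-written result file.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        for trial in trials:
            pickle.dump(trial, handle)  # one self-delimiting record per trial
    os.replace(tmp_name, final_path)
    return final_path


def read_records(path):
    """Yield the pickled records of one file until the end is reached."""
    with open(path, "rb") as handle:
        while True:
            try:
                yield pickle.load(handle)
            except EOFError:
                return


def merge_results(directory):
    """Collect the records of every per-task file in the directory."""
    merged = []
    for path in sorted(Path(directory).glob("task_*.pkl")):
        merged.extend(read_records(path))
    return merged
```

This gives one file per task rather than one per trial, so the file count scales with the number of processes instead of the number of trials. It does not address concurrent writes to a shared file, which the sketch avoids entirely.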

@robertdstein
Member

I did not come up with a good solution, but I agree with all of this. Currently the ".pkl" files should get aggregated, if I recall correctly, reducing I/O on subsequent runs. However, having many separate initial files is a direct consequence of allowing many thousands of independent processes to write to the same directory without conflicts, and that is the big constraint I see going forward. You could imagine all kinds of things, like a database to manage it, but that would substantially escalate complexity. I would concur: "I am not aware of data formats that easily allow for this".
