Improve management of analysis setup and results #178

mlincett · 2022-08-03T09:23:10Z

Is your feature request related to a problem? Please describe.

I find it a bit inconvenient that the analysis setup, as represented in mh_dict, cannot be stored to and loaded from a simple text file, e.g. in JSON format, because the dataset objects are not serializable.

One consequence is that reading the results with ResultsHandler outside the script or notebook that ran the minimisation requires replicating the mh_dict, even though ResultsHandler does not necessarily use all the data in it.

Describe the solution you'd like
A design in which the analysis setup can be saved to JSON and the results can be retrieved by reading such JSON output to feed the ResultsHandler.

The only hard requirement would be an interface to access datasets by key/name rather than by object import. There are situations in which datasets can be altered by the user, like in utils.custom_dataset, but I think this can be addressed with a limited number of presets.
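To make the idea concrete, here is a minimal sketch of key-based dataset lookup combined with a JSON round trip of the setup. Everything in it is hypothetical and for illustration only: `DATASET_REGISTRY`, `register_dataset`, `save_setup`, `load_setup`, `resolve_dataset`, the `ExampleDataset` placeholder and the dictionary keys are assumptions, not existing package API.

```python
import json

# Hypothetical index: maps a string key to a factory that builds the dataset object.
DATASET_REGISTRY = {}


class ExampleDataset:
    """Placeholder standing in for a real, non-serializable dataset object."""


def register_dataset(name):
    """Decorator that records a dataset factory under a string key."""
    def decorator(factory):
        DATASET_REGISTRY[name] = factory
        return factory
    return decorator


@register_dataset("example_dataset_v1")
def build_example_dataset():
    return ExampleDataset()


def save_setup(setup, path):
    """Write the setup to JSON; it holds the dataset key, not the object."""
    with open(path, "w") as f:
        json.dump(setup, f, indent=2, sort_keys=True)


def load_setup(path):
    """Read a setup previously written by save_setup."""
    with open(path) as f:
        return json.load(f)


def resolve_dataset(setup):
    """Build the dataset object named in the setup (raises KeyError if unknown)."""
    return DATASET_REGISTRY[setup["dataset"]]()
```

With something like this, a results reader would only need the JSON file: it could call `load_setup(path)` and resolve the dataset from its key when the object is actually required. User-modified datasets would need their own registered presets, as noted above.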

Describe alternatives you've considered
Leaving things as they are. It works, but I believe it can be improved.

Additional context
The analysis results currently take the form of a large number of extremely small files (one per trial), which is quite inefficient in terms of I/O and very unfriendly towards most filesystems. It would be nice if every task could dump its results to a single binary file, and these could then be merged or concatenated by the ResultsHandler. However, I am not aware of data formats that easily allow for this (I researched this problem in the past without much success). If you have ideas, feel free to shout 'em.
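As a starting point for discussion, here is a rough sketch of the "one file per task, merged later" idea using only the Python standard library. It is an assumption-laden illustration rather than a proposal for the final format: `dump_task_results`, `merge_results`, the `task_*.pkl` naming scheme and the use of pickle are all hypothetical choices. Each task writes to its own uniquely named file via a temporary file and an atomic rename, so concurrent tasks never touch the same path.

```python
import os
import pickle
import tempfile
import uuid
from pathlib import Path


def dump_task_results(results_dir, task_id, trials):
    """Write all trials of one task to a single pickle file and return its path."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    # A random suffix keeps file names unique even if a task id is reused.
    final_path = results_dir / f"task_{task_id}_{uuid.uuid4().hex}.pkl"
    # Write to a temporary file in the same directory, then rename it,
    # so that a reader never sees a partially written result file.
    fd, tmp_name = tempfile.mkstemp(dir=str(results_dir), suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(list(trials), f)
    os.replace(tmp_name, final_path)
    return final_path


def merge_results(results_dir):
    """Concatenate the trial lists from every task file in the directory."""
    merged = []
    for path in sorted(Path(results_dir).glob("task_*.pkl")):
        with open(path, "rb") as f:
            merged.extend(pickle.load(f))
    return merged
```

This reduces the file count from one per trial to one per task, but it does not remove the need for a later merge step, and pickle is only a stand-in for whatever binary format is eventually chosen.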


robertdstein · 2022-08-04T18:10:12Z

I did not come up with a good solution, but I agree with all of this. Currently the ".pkl" files should get aggregated, if I recall correctly, reducing I/O on subsequent runs. However, having many separate initial files is a direct consequence of allowing many thousands of independent processes to write to the same directory without conflicts, and that is the big constraint I see going forward. You could imagine all kinds of things, like a database to manage it, but that would substantially escalate complexity. I would concur: "I am not aware of data formats that easily allow for this".

mlincett mentioned this issue Aug 4, 2022

Retrieve dataset by name instead of importing object (build a dataset index) #182

Closed

mlincett mentioned this issue Aug 18, 2023

Improve ResultsHandler #295

Open
