While integration of BMI init config auto-generation capabilities was done in #607, practical performance testing was not conducted. Given #637 and the fact that DMOD currently only implements an object store dataset backing, there may be practical issues with the current implementation; e.g., it may produce configs perfectly correctly but take an impractical or excessive amount of time to complete (compared to the job needing the configs).
First, analysis is needed of the running time in various scenarios, given the current implementation and a more practical, off-the-shelf hardware configuration (i.e., at most, a small cluster of desktop-level machines). Depending on the results, adjustments should be made to the implementation to optimize it for current dataset capabilities. Where possible, this should be done in a way that lends itself well to future dataset backings (i.e., #593), which may or may not have the same IO performance characteristics and thus may need (or benefit from) certain differences in the implementation.
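A simple wall-clock harness could drive that analysis. The sketch below only illustrates the idea; `generate_configs` is a placeholder standing in for the real generation-and-write step against the object store backing, and the domain sizes are arbitrary:

```python
import time


def generate_configs(catchment_ids):
    """Placeholder for the real BMI init config generation + dataset write step."""
    return {cid: {"model": "noah_owp"} for cid in catchment_ids}


def time_generation(catchment_ids):
    """Wall-clock one generation run so it can be compared to the consuming job's runtime."""
    start = time.perf_counter()
    generate_configs(catchment_ids)
    return time.perf_counter() - start


# Sweep domain sizes to see how generation time scales relative to the job needing the configs.
for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} catchments: {time_generation([f'cat-{i}' for i in range(n)]):.2f} s")
```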
I added support for writing config files to various archive formats in this PR. Here is an example of writing config files on the fly to a gzipped archive file. Compression is not required (and of course slows things down). I would be interested to see the performance of writing to just a tar archive.
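For illustration, a minimal sketch of that pattern using Python's standard `tarfile` and `io` modules (not the PR's actual helper API; the `write_configs_to_archive` name and the config contents are hypothetical):

```python
import io
import json
import tarfile


def write_configs_to_archive(archive_path, configs):
    """Write generated BMI init configs directly into a gzipped tar archive.

    `configs` maps archive member names to already-generated config contents.
    Mode "w:gz" enables gzip compression; "w" would produce a plain tar instead.
    """
    with tarfile.open(archive_path, mode="w:gz") as archive:
        for name, config in configs.items():
            data = json.dumps(config, indent=2).encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # addfile() consumes a file-like object, so each config is written on the
            # fly from memory without creating an intermediate file on disk.
            archive.addfile(info, io.BytesIO(data))


# Example: two per-catchment configs written on the fly.
write_configs_to_archive(
    "bmi_configs.tar.gz",
    {
        "cat-1/config.json": {"model": "noah_owp", "forcing": "cat-1.csv"},
        "cat-2/config.json": {"model": "noah_owp", "forcing": "cat-2.csv"},
    },
)
```

Switching the mode to `"w"` gives the uncompressed tar variant whose performance is the open question above.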
This issue will also be useful in conducting benchmarks: it shows an (albeit naive) approach to generating config files concurrently.
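A naive concurrent approach along those lines might look like the following sketch (not the linked issue's actual code; `generate_config` and its contents are placeholders for the real DMOD generator), fanning generation out across processes with `concurrent.futures`:

```python
from concurrent.futures import ProcessPoolExecutor


def generate_config(catchment_id):
    """Build one BMI init config; stands in for the real, CPU/IO-bound generator."""
    return catchment_id, {"model": "noah_owp", "forcing": f"{catchment_id}.csv"}


def generate_all(catchment_ids, workers=4):
    """Fan generation out across worker processes and collect results into one mapping."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(generate_config, catchment_ids))


if __name__ == "__main__":
    configs = generate_all([f"cat-{i}" for i in range(100)])
    print(len(configs), "configs generated")
```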