Fetching and loading many runs (dozens or hundreds) from remote (or local) machines, each of which can be quite large on disk, can be slow and resource-intensive on the machine where `get_runs` is called, even when parallelized. For remote runs, network traffic is also substantial and often the bottleneck.
For example, it took > 2 minutes to fetch 24 real-world experiment runs from a remote node over SSH, where each run has ~100K rows (and dozens of summary tags) and the event file is quite fat (~30MB each, ~700MB of network traffic in total) due to non-scalar artifacts/tensors being saved. This is on top of heavy CPU consumption spread across 8 local processes.
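For reference, a rough sketch of the kind of call being timed above; the SSH path and URL scheme below are hypothetical placeholders, not necessarily expt's actual remote-path syntax:

```python
# A rough timing sketch; the remote path is a hypothetical placeholder.
import time

import expt

t0 = time.time()
runs = expt.get_runs("ssh://remote-node/data/experiments/*")  # placeholder path
print(f"Fetched {len(runs)} runs in {time.time() - t0:.1f}s")
```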
Ideally, parsing such large-scale experiment runs should be much faster. For comparison, TensorBoard can load runs of this scale almost instantly in `--load_fast` mode (i.e., rustboard). There are several ideas and steps towards this goal:
- Use native code (rustboard) to parse TensorBoard event files (see the first sketch below).
- Extract scalar data remotely rather than locally; this would significantly reduce the required network traffic and communication overhead (see the second sketch below).
- Run a remote expt daemon/helper process that would enable incremental data loading. Alternatively, can we fetch raw data directly from TensorBoard?
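For the first idea, a minimal sketch of what delegating parsing to rustboard could look like, using the `tensorboard-data-server` pip package that ships the precompiled binary behind `--load_fast`. The CLI flags passed here are assumptions modeled on how TensorBoard itself launches the server, and the gRPC client side is omitted:

```python
# A minimal sketch, assuming the `tensorboard-data-server` package is
# installed. server_binary() returns the path to the precompiled rustboard
# binary, or None if none is available for this platform.
import subprocess

import tensorboard_data_server

binary = tensorboard_data_server.server_binary()
if binary is None:
    raise RuntimeError("No precompiled rustboard binary for this platform.")

# Serve event data from a logdir over gRPC; the flags below are assumptions
# based on how `tensorboard --load_fast` launches this binary.
proc = subprocess.Popen(
    [binary, "--logdir", "/tmp/logs", "--port", "0"],
    stdout=subprocess.PIPE,
)
# Scalars could then be read through TensorBoard's DataProvider gRPC API
# instead of re-parsing event files in pure Python.
```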
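And for the second idea, a sketch of extracting scalars on the remote host and streaming back only a compact CSV instead of the raw event files. The hostname and paths are placeholders, and the reader used is TensorBoard's standard Python `EventAccumulator`, not expt's actual implementation:

```python
# A minimal sketch of remote scalar extraction over SSH. Only the scalar
# rows cross the network, not the (fat) event files themselves.
import subprocess

REMOTE_SCRIPT = r"""
import csv, sys
from tensorboard.backend.event_processing.event_accumulator import (
    EventAccumulator, SCALARS)

acc = EventAccumulator(sys.argv[1], size_guidance={SCALARS: 0})  # 0 = keep all
acc.Reload()
writer = csv.writer(sys.stdout)
writer.writerow(["tag", "step", "value"])
for tag in acc.Tags()["scalars"]:
    for ev in acc.Scalars(tag):
        writer.writerow([tag, ev.step, ev.value])
"""

def fetch_scalars_csv(host: str, run_dir: str) -> str:
    """Run the extraction script on `host` (over ssh) and return CSV text."""
    result = subprocess.run(
        ["ssh", host, "python3", "-", run_dir],
        input=REMOTE_SCRIPT, capture_output=True, text=True, check=True,
    )
    return result.stdout

# Placeholder host/path for illustration:
csv_text = fetch_scalars_csv("remote-node", "/data/experiments/run-001")
```

This trades one SSH round-trip per run for the full event-file transfer, so for the example above (~700MB of event files) the traffic would shrink to roughly the size of the scalar tables.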