Fetching and loading many runs (dozens or hundreds) from remote (or local) machines, each of which can be quite large on disk, can be slow and resource-intensive on the machine where `get_runs` is called, even when parallelized. For remote runs, network traffic is also substantial and often the bottleneck.
For example, it took > 2 minutes to fetch 24 real-world experiment runs from a remote node over SSH, where each run has ~100K rows (and dozens of summary tags) and the event file is quite fat (~30MB each, ~700MB of network traffic in total) due to non-scalar artifacts/tensors being saved. This is on top of heavy CPU consumption spread across 8 local processes.
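For reference, a rough sketch of the kind of call being timed above; the SSH path and URL scheme below are hypothetical placeholders, not necessarily expt's actual remote-path syntax:

```python
# A rough timing sketch; the remote path is a hypothetical placeholder.
import time

import expt

t0 = time.time()
runs = expt.get_runs("ssh://remote-node/data/experiments/*")  # placeholder path
print(f"Fetched {len(runs)} runs in {time.time() - t0:.1f}s")
```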
Ideally, parsing such large-scale experiment runs should be much faster. For comparison, TensorBoard can load runs of this scale almost instantly in `--load_fast` mode (i.e., rustboard). There are several ideas and steps towards this goal:
- Use native code (rustboard) to parse TensorBoard event files (see the first sketch below).
- Extract scalar data remotely rather than locally; this would significantly reduce the required network traffic and communication overhead (see the second sketch below).
- Run a remote expt daemon/helper process that would enable incremental data loading. Alternatively, can we fetch raw data directly from TensorBoard?
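For the first idea, a minimal sketch of what delegating parsing to rustboard could look like, using the `tensorboard-data-server` pip package that ships the precompiled binary behind `--load_fast`. The CLI flags passed here are assumptions modeled on how TensorBoard itself launches the server, and the gRPC client side is omitted:

```python
# A minimal sketch, assuming the `tensorboard-data-server` package is
# installed. server_binary() returns the path to the precompiled rustboard
# binary, or None if none is available for this platform.
import subprocess

import tensorboard_data_server

binary = tensorboard_data_server.server_binary()
if binary is None:
    raise RuntimeError("No precompiled rustboard binary for this platform.")

# Serve event data from a logdir over gRPC; the flags below are assumptions
# based on how `tensorboard --load_fast` launches this binary.
proc = subprocess.Popen(
    [binary, "--logdir", "/tmp/logs", "--port", "0"],
    stdout=subprocess.PIPE,
)
# Scalars could then be read through TensorBoard's DataProvider gRPC API
# instead of re-parsing event files in pure Python.
```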
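And for the second idea, a sketch of extracting scalars on the remote host and streaming back only a compact CSV instead of the raw event files. The hostname and paths are placeholders, and the reader used is TensorBoard's standard Python `EventAccumulator`, not expt's actual implementation:

```python
# A minimal sketch of remote scalar extraction over SSH. Only the scalar
# rows cross the network, not the (fat) event files themselves.
import subprocess

REMOTE_SCRIPT = r"""
import csv, sys
from tensorboard.backend.event_processing.event_accumulator import (
    EventAccumulator, SCALARS)

acc = EventAccumulator(sys.argv[1], size_guidance={SCALARS: 0})  # 0 = keep all
acc.Reload()
writer = csv.writer(sys.stdout)
writer.writerow(["tag", "step", "value"])
for tag in acc.Tags()["scalars"]:
    for ev in acc.Scalars(tag):
        writer.writerow([tag, ev.step, ev.value])
"""

def fetch_scalars_csv(host: str, run_dir: str) -> str:
    """Run the extraction script on `host` (over ssh) and return CSV text."""
    result = subprocess.run(
        ["ssh", host, "python3", "-", run_dir],
        input=REMOTE_SCRIPT, capture_output=True, text=True, check=True,
    )
    return result.stdout

# Placeholder host/path for illustration:
csv_text = fetch_scalars_csv("remote-node", "/data/experiments/run-001")
```

This trades one SSH round-trip per run for the full event-file transfer, so for the example above (~700MB of event files) the traffic would shrink to roughly the size of the scalar tables.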