Full Scale Performance: Multi-Process, Sharrow On #22

i-am-sijia · 2024-06-17T23:02:10Z

This is the issue to report on memory usage and runtime performance...

data_dir: "data-full" full scale skims (24333 MAZs)
households_sample_size: 0 (full scale 100% sample of households)
sharrow: require
multiprocess: True


i-am-sijia · 2024-06-17T23:10:53Z

Used num_processes: 28 on a 512 GB RAM machine with 32 physical cores. Did two runs on June 13, 2024. The only difference between the two runs was the version of Sharrow: one used v2.9.1, the other used a later version with the np.where updates. For more details, see rows 16 and 17 in RunMatrix_PerformanceResults.xls.

with Sharrow v2.9.1: completed in 195.1 mins
with Sharrow main@8d63a66: completed in 197.9 mins

The np.where updates in Sharrow main@8d63a66 do not seem to improve runtime in multiprocessing.
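To put the two runtimes above on a common scale, here is a minimal sketch that computes the relative change between them. The helper name `percent_change` is hypothetical; only the two runtimes come from the runs reported here.

```python
def percent_change(baseline_min: float, candidate_min: float) -> float:
    """Relative runtime change of candidate vs. baseline, in percent."""
    return (candidate_min - baseline_min) / baseline_min * 100.0


# Runtimes in minutes, as reported above.
sharrow_v291_min = 195.1
sharrow_main_min = 197.9

change = percent_change(sharrow_v291_min, sharrow_main_min)
print(f"Sharrow main vs v2.9.1: {change:+.2f}% runtime")
```

The difference is roughly 1.4%. With a single run per configuration, that may well be within run-to-run noise.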

dhensle · 2024-06-18T03:27:57Z

Did an analogous run using num_processes: 20 of the 24 processors on an RSG machine with 500 GB RAM and 2.1 GHz Intel Xeon cores. Used the latest Sharrow code (main@8d63a66) and completed in 289 mins (about 4.8 hours).
sh_mp_full_logs.zip

Notably, on this machine the single-process run took 21.1 hours, which is significantly longer than the single-process runtime for Sijia's run above.
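For reference, the speedup and parallel efficiency implied by these two numbers can be worked out as follows. This is a rough sketch that assumes the single-process and 20-process runs on this machine are otherwise comparable; the function names are hypothetical.

```python
def speedup(single_process_min: float, multi_process_min: float) -> float:
    """How many times faster the multi-process run is."""
    return single_process_min / multi_process_min


def parallel_efficiency(speedup_factor: float, num_processes: int) -> float:
    """Fraction of ideal linear scaling that was achieved."""
    return speedup_factor / num_processes


single_process_min = 21.1 * 60  # 21.1 hours, as reported above
multi_process_min = 289.0       # 20-process run, as reported above
num_processes = 20

s = speedup(single_process_min, multi_process_min)
eff = parallel_efficiency(s, num_processes)
print(f"speedup: {s:.2f}x, efficiency: {eff:.0%}")
```

Under that assumption the 20-process run is a bit more than 4x faster than the single-process run, well short of linear scaling.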

jpn-- · 2024-07-02T16:59:05Z

Ran on the SFCTA server with the following setup:

ActivitySim: pr/867@d98f776af
sandag-abm3-example: main@8b58e69
Sharrow: Release 2.10.0
numba: v0.60.0
full-scale skims
household sample size: 100%
num_processes: 8 (machine has 160 cores)
NUMBA_NUM_THREADS: 4

Total runtime 239.7 minutes (i.e. just under 4 hours)

Archive-SFCTAserver-4thread-8MP.zip
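As a quick check on how much of the machine this configuration uses, the sketch below multiplies processes by Numba threads per process. The variable names are hypothetical. Setting `NUMBA_NUM_THREADS` through `os.environ` is only one option, and it must happen before `numba` is first imported; exporting the variable in the shell before launching the run is the more usual approach.

```python
import os

# Settings as reported above.
num_processes = 8
numba_num_threads = 4
machine_cores = 160

# Numba reads this environment variable when it is first imported,
# so it has to be set before any `import numba` in the process.
os.environ["NUMBA_NUM_THREADS"] = str(numba_num_threads)

# Upper bound on concurrently running compute threads.
total_threads = num_processes * numba_num_threads
utilization = total_threads / machine_cores
print(f"{total_threads} threads on {machine_cores} cores ({utilization:.0%})")
```

So this run uses at most 32 of the 160 cores at any one time.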

dhensle · 2024-07-11T17:57:01Z

Ran the model on an RSG machine with 24 cores and 500 GB of RAM with the following settings:

ActivitySim: main@28a0ad0
sandag-abm3-example: main@8b58e69
Sharrow: Release 2.10.0
full-scale skims
household sample size: 100%
NUMBA_NUM_THREADS: 1

I then varied the number of cores to see how the runtime changes.

Observations:

The main runtime savings come from the interaction simulate models, particularly trip destination and location choice.
Simple simulate models are relatively unaffected by the increase in the number of cores.
The smallest runtime was with about 20 cores and took 177 minutes (2.95 hours). Adding or subtracting 4 cores changed the runtime only minimally, by about 3 minutes either way.
The time it takes to apportion the data across processes increases with the number of cores, and this leads to an inflection point in the runtime.

The results here are very consistent with the observations in the MTC model (see ActivitySim/activitysim-prototype-mtc#12 (comment)). The main difference was that here the runtime minimum was with 20 cores, but with the MTC example it was around 10 cores.
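The inflection point described in the observations can be illustrated with a toy cost model: a fixed serial part, a part that divides across processes, and an apportioning overhead that grows with the process count. This is only a sketch. The functional form and all three coefficients are hypothetical, hand-picked so the minimum lands near 20 processes; they are not measured or fitted to these runs.

```python
import math

# Hypothetical coefficients in minutes (illustrative only, not measured).
SERIAL_MIN = 57.0         # work that does not parallelize
PARALLEL_MIN = 1200.0     # work that divides evenly across processes
OVERHEAD_PER_PROC = 3.0   # apportioning cost added per extra process


def modeled_runtime(num_processes: int) -> float:
    """Toy model: T(n) = serial + parallel / n + overhead * n."""
    return (SERIAL_MIN
            + PARALLEL_MIN / num_processes
            + OVERHEAD_PER_PROC * num_processes)


# Brute-force search over candidate process counts.
best_n = min(range(1, 33), key=modeled_runtime)

# Setting dT/dn = 0 gives n* = sqrt(parallel / overhead).
analytic_optimum = math.sqrt(PARALLEL_MIN / OVERHEAD_PER_PROC)

for n in (16, 20, 24):
    print(f"n={n:2d}: {modeled_runtime(n):.1f} min")
print(f"best n = {best_n}, analytic optimum = {analytic_optimum:.1f}")
```

In a model of this shape the curve is flat near its minimum, which is consistent with the small change seen when adding or removing a few cores. A smaller parallel part or a larger per-process overhead moves the minimum toward fewer processes, which is one possible reading of the lower optimum seen with the MTC example.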

i-am-sijia added the performance-checks label (Issues that report on model performance) Jun 17, 2024

dhensle mentioned this issue Jul 16, 2024

Full Scale Performance: Multi-Process, Sharrow Off #9

Open
