Full Scale Performance: Multi-Process, Sharrow On #22

i-am-sijia opened this issue Jun 17, 2024 · 4 comments
Labels
performance-checks Issues that report on model performance

Comments

@i-am-sijia
Contributor

This is the issue to report on memory usage and runtime performance...

  • data_dir: "data-full" full scale skims (24333 MAZs)
  • households_sample_size: 0 (full scale 100% sample of households)
  • sharrow: require
  • multiprocess: True
@i-am-sijia i-am-sijia added the performance-checks Issues that report on model performance label Jun 17, 2024
@i-am-sijia
Contributor Author

Used num_processes: 28 on a 512 GB RAM machine with 32 physical cores. Did two runs on June 13, 2024. The only difference between the two runs was the Sharrow version: one used v2.9.1, the other a later version with np.where updates. For more details, see rows 16 and 17 in RunMatrix_PerformanceResults.xls.

  • with Sharrow v2.9.1: completed in 195.1 mins
  • with Sharrow main@8d63a66: completed in 197.9 mins

The np.where updates in Sharrow main@8d63a66 do not seem to improve runtime in multiprocessing.
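To put the gap between the two runs in perspective, here is a minimal sketch that computes the relative change from the two runtimes reported above. The function name `relative_change` is just an illustrative helper, not part of ActivitySim or Sharrow.

```python
def relative_change(baseline_mins: float, candidate_mins: float) -> float:
    """Fractional change in runtime of the candidate run relative to the baseline."""
    return (candidate_mins - baseline_mins) / baseline_mins


# Runtimes reported above, in minutes.
sharrow_v291_mins = 195.1
sharrow_main_mins = 197.9

change = relative_change(sharrow_v291_mins, sharrow_main_mins)
print(f"Sharrow main@8d63a66 vs v2.9.1: {change:+.1%}")  # +1.4%
```

A difference of this size from a single pair of runs may well be within run-to-run noise, so it is probably safest to read it as "no measurable change" rather than as a slowdown.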

@dhensle
Contributor

dhensle commented Jun 18, 2024

Did an analogous run using num_processes: 20 of the 24 processors on an RSG machine with 500 GB RAM and 2.1 GHz Intel Xeon cores. Used the latest Sharrow code (main@8d63a66) and completed in 289 mins (about 4.82 hours).
sh_mp_full_logs.zip

Notably, on this machine the single-process run took 21.1 hours, which is significantly longer than the single-process runtime on Sijia's machine.
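As a rough sketch of what these two numbers imply, the snippet below computes the speedup and parallel efficiency of the 20-process run relative to the single-process run on the same machine. It assumes the two runs are otherwise comparable (same inputs and settings), which the comment above does not state explicitly.

```python
# Figures reported above for the RSG machine.
single_process_hours = 21.1
multi_process_mins = 289.0
num_processes = 20

multi_process_hours = multi_process_mins / 60.0

# Speedup: how many times faster the multi-process run finished.
speedup = single_process_hours / multi_process_hours

# Parallel efficiency: speedup divided by the number of processes used.
efficiency = speedup / num_processes

print(f"speedup: {speedup:.2f}x")        # 4.38x
print(f"efficiency: {efficiency:.1%}")   # 21.9%
```

An efficiency well below 100% is consistent with a model run in which some steps parallelize poorly or carry fixed overhead.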

@jpn--
Member

jpn-- commented Jul 2, 2024

Ran on the SFCTA server with the following setup:

  • ActivitySim: pr/867@d98f776af
  • sandag-abm3-example: main@8b58e69
  • Sharrow: Release 2.10.0
  • numba: v0.60.0
  • full-scale skims
  • household sample size: 100%
  • num_processes: 8 (machine has 160 cores)
  • NUMBA_NUM_THREADS: 4

Total runtime: 239.7 minutes (just under 4 hours).
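For readers reproducing this setup, here is a minimal sketch of the thread budget it implies. `NUMBA_NUM_THREADS` is a real numba environment variable that numba reads when it is first imported, so it has to be set beforehand. How the variable was actually set for this run (shell, launcher script, or otherwise) is not stated above, so setting it from Python here is an assumption for illustration.

```python
import os

# Set before numba is imported anywhere in the process; child processes
# started afterwards inherit the environment by default.
os.environ["NUMBA_NUM_THREADS"] = "4"

# Values from the run described above.
num_processes = 8
machine_cores = 160

threads_per_process = int(os.environ["NUMBA_NUM_THREADS"])

# Upper bound on concurrently running numba threads across all processes.
total_threads = num_processes * threads_per_process

print(f"up to {total_threads} threads on {machine_cores} cores")
```

With 8 processes at 4 threads each, the run uses at most 32 numba threads, so most of the 160 cores stay idle in this configuration.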

Archive-SFCTAserver-4thread-8MP.zip

@dhensle
Contributor

dhensle commented Jul 11, 2024

Ran the model on an RSG machine with 24 cores and 500 GB of RAM with the following settings:

  • ActivitySim: main@28a0ad0
  • sandag-abm3-example: main@8b58e69
  • Sharrow: Release 2.10.0
  • full-scale skims
  • household sample size: 100%
  • NUMBA_NUM_THREADS: 1

I then varied the number of cores to see how the runtime changes:
[Figure: runtime by number of cores]

Observations:

  • The main runtime savings come from the interaction simulate models, particularly trip destination and location choice.
  • Simple simulate models are relatively unaffected by the increase in the number of cores.
  • The smallest runtime was with about 20 cores and took 177 minutes (2.95 hours). Adding or subtracting 4 cores changed the runtime only minimally, by about 3 minutes either way.
  • The time it takes to apportion the data across cores increases with the number of cores, and beyond some point this overhead outweighs the gains from parallelism, which produces a minimum in the runtime curve.
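The shape described in these observations can be sketched with a toy cost model: a parallelizable part that shrinks as 1/n plus an apportioning overhead that grows with n. The functional form and the parameter values below are assumptions chosen purely for illustration; they are not fitted to the measurements in this issue.

```python
import math


def toy_runtime(n_cores: int, parallel_work: float, overhead_per_core: float) -> float:
    """Toy model: parallel work divided across cores plus per-core apportioning overhead."""
    return parallel_work / n_cores + overhead_per_core * n_cores


# Hypothetical parameters (arbitrary units), NOT derived from the runs above.
parallel_work = 2000.0
overhead_per_core = 5.0

# Evaluate the model for 1..24 cores and pick the core count with the lowest runtime.
candidates = range(1, 25)
best_n = min(candidates, key=lambda n: toy_runtime(n, parallel_work, overhead_per_core))

# For this functional form the continuous minimum is at sqrt(parallel_work / overhead_per_core).
analytic_n = math.sqrt(parallel_work / overhead_per_core)

print(best_n, analytic_n)
for n in (16, 20, 24):
    print(n, round(toy_runtime(n, parallel_work, overhead_per_core), 1))
```

In a model of this form the curve is flat near its minimum, which would match the observation that adding or removing a few cores around the optimum changes the runtime very little.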

The results here are very consistent with the observations in the MTC model (see ActivitySim/activitysim-prototype-mtc#12 (comment)). The main difference was that here the runtime minimum was with 20 cores, but with the MTC example it was around 10 cores.
