Full Scale Performance: Multi-Process, Sharrow Off #9

Open
dhensle opened this issue Apr 30, 2024 · 3 comments
Labels
performance-checks Issues that report on model performance

Comments

@dhensle
Contributor

dhensle commented Apr 30, 2024

This is the issue to report on memory usage and runtime performance...

  • data_dir: "data-full" full scale skims (24333 MAZs)
  • households_sample_size: 0 (full scale 100% sample of households)
  • sharrow: false
  • multiprocess: True
@dhensle
Contributor Author

dhensle commented Apr 30, 2024

Completed on a 1 TB machine with 48 cores, using num_processes: 40 and no chunking. The run took about 3 hours.

logs_no_sh_full_mp.zip
(The run failed in the final summarize step because the mp config included a step that is not relevant to the ABM3 example, but this has little impact on the runtime estimate.)

dhensle added the performance-checks label Apr 30, 2024
@dhensle
Contributor Author

dhensle commented Jul 16, 2024

Ran multiprocessing with sharrow off while varying the number of cores (just like #22 (comment)). Completed on an RSG machine with 500 GB and 24 cores.

image

Observations:

  • Many of the same patterns seen in the sharrow-on runs are present here: there are diminishing returns as more cores are added.
  • The non-mandatory tour scheduling model got uniquely worse as more cores were added; it is not clear why. This was not seen in the sharrow-on version of the code.
  • Both the sharrow-on and sharrow-off versions of the ABM3 model had a minimum runtime with 20 cores. This suggests that the optimum number of cores may be independent of sharrow and instead a function of the model and the machine.
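
One way to reason about why a minimum runtime can appear at an intermediate core count is a toy model in which part of the work is serial, part divides across processes, and each extra process adds a fixed overhead. The sketch below is only an illustration of that idea: the function names and the parameter values are hypothetical placeholders, not measurements or fits from the runs reported in this issue.

```python
def modeled_runtime(n_procs, serial, parallel, overhead_per_proc):
    """Toy runtime model: fixed serial part + divisible part + per-process overhead."""
    return serial + parallel / n_procs + overhead_per_proc * n_procs


def best_process_count(candidates, **params):
    """Return the candidate process count with the lowest modeled runtime."""
    return min(candidates, key=lambda n: modeled_runtime(n, **params))


# Placeholder parameters (minutes), NOT fitted to the runs in this issue.
params = dict(serial=20.0, parallel=400.0, overhead_per_proc=1.0)

for n in (1, 5, 10, 20, 24):
    print(n, round(modeled_runtime(n, **params), 1))

print("best:", best_process_count(range(1, 25), **params))
```

In this toy model the location of the minimum depends only on the ratio of divisible work to per-process overhead, which is consistent with the idea that the optimum is set by the model and the machine; whether the real runs follow such a curve is an open question.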

@i-am-sijia
Contributor

Performed multiprocessing tests with different numbers of processors on a WSP server that has 512 GB of RAM and 32 cores, with sharrow off, explicit chunking set to 0.2 (5 chunks), and using the zlib skims.
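
The arithmetic behind "0.2 (5 chunks)" can be sketched as follows: a chunk fraction of 0.2 means each chunk covers about one fifth of the rows, so the table is processed in five passes. This is a minimal illustration only; `chunk_bounds` is a hypothetical helper and the table size is a placeholder, and it is not a copy of how ActivitySim implements explicit chunking.

```python
import math


def chunk_bounds(n_rows, fraction):
    """Split n_rows into consecutive (start, stop) slices of about fraction * n_rows rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Small epsilon guards against floating-point noise such as 200.00000000000003.
    rows_per_chunk = max(1, math.ceil(n_rows * fraction - 1e-9))
    return [
        (start, min(start + rows_per_chunk, n_rows))
        for start in range(0, n_rows, rows_per_chunk)
    ]


bounds = chunk_bounds(1_000, 0.2)  # 1_000 rows is a placeholder table size
print(len(bounds), bounds)
```

Smaller fractions lower the peak memory needed per pass at the cost of more passes, which is the usual trade-off when choosing a chunk size.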

The observations are similar to David's. Many components see diminishing returns with more processors. The oddball is non-mandatory tour scheduling, which only gets worse with more processors.
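
A small helper like the one below could be used to flag such oddballs from per-component timings. It is a sketch under stated assumptions: `classify_scaling` is a hypothetical name, the input is assumed to be a mapping from process count to runtime for one component, and the numbers in the usage lines are placeholders rather than values read from the chart.

```python
def classify_scaling(runtimes_by_procs):
    """Classify how one component's runtime changes as processes are added.

    runtimes_by_procs: dict mapping number of processes -> runtime (any unit).
    """
    if len(runtimes_by_procs) < 2:
        raise ValueError("need at least two process counts to compare")
    procs = sorted(runtimes_by_procs)
    times = [runtimes_by_procs[p] for p in procs]
    deltas = [later - earlier for earlier, later in zip(times, times[1:])]
    if all(d > 0 for d in deltas):
        return "worsens"  # slower at every step up in process count
    if all(d < 0 for d in deltas):
        return "improves"  # faster at every step up in process count
    best = procs[times.index(min(times))]
    return f"minimum at {best}"


# Placeholder timings (minutes) for two hypothetical components.
print(classify_scaling({4: 10.0, 8: 12.0, 16: 15.0}))
print(classify_scaling({4: 30.0, 8: 18.0, 16: 20.0}))
```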

image
