Full Scale Performance: Sharrow On #12

Open
dhensle opened this issue Apr 30, 2024 · 6 comments
Labels
performance-checks Issues that report on model performance

Comments

@dhensle
Contributor

dhensle commented Apr 30, 2024

This is the issue to report on memory usage and runtime performance when using sharrow...

@dhensle
Contributor Author

dhensle commented Apr 30, 2024

First ran sharrow compile with the following settings:

  • households_sample_size: 100
  • sharrow: test

Run completed in 76 minutes.
log_sh_compile.zip

Then ran in production mode:

  • households_sample_size: 0 (100 percent sample)
  • sharrow: require

Run completed in 7.7 hours with a memory peak at about 163 GB in trip destination.
image

logs_sh_full.zip

Followed by a multiprocessing run:

  • households_sample_size: 0 (100 percent sample)
  • sharrow: require
  • multiprocessing: True
  • num_processors: 24

Run completed in 110 minutes (1.8 hours).
log_sh_full_mp.zip
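
As a rough reading of the two production timings above, the following sketch computes the speedup and parallel efficiency of the 24-process run relative to the single-process run. It assumes both runs used the same machine and inputs, which the comment does not state explicitly, and the function names are illustrative only.

```python
def speedup(serial_minutes: float, parallel_minutes: float) -> float:
    """Ratio of single-process to multiprocess wall-clock time."""
    return serial_minutes / parallel_minutes


def parallel_efficiency(
    serial_minutes: float, parallel_minutes: float, num_processes: int
) -> float:
    """Speedup divided by process count (1.0 would be perfect scaling)."""
    return speedup(serial_minutes, parallel_minutes) / num_processes


# Figures reported in the comment above.
single_process_minutes = 7.7 * 60  # 7.7 hours, sharrow: require
multiprocess_minutes = 110.0       # num_processors: 24
num_processors = 24

s = speedup(single_process_minutes, multiprocess_minutes)
e = parallel_efficiency(single_process_minutes, multiprocess_minutes, num_processors)
print(f"speedup: {s:.1f}x, efficiency: {e:.1%}")
```

With these inputs the speedup is about 4.2x on 24 processes, or roughly 17.5% parallel efficiency.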

@dhensle added the performance-checks label May 1, 2024
@dhensle
Contributor Author

dhensle commented Jun 24, 2024

Ran with 100% households and sharrow on, single process.

Run completed in 1090.3 minutes (18.2 hours). This is much longer than the previous time posted above of 7.7 hours.

Current run was performed using PR #867 commit c9d4205.

image
log.zip

Timing statements comparing the old run above to this current run show large differences mainly in the destination models:
image

Will try again with the main branch of ActivitySim instead of PR 867 to see if that makes a difference.

@dhensle
Contributor Author

dhensle commented Jun 25, 2024

Ran using an older environment with the current version of ActivitySim (main@bd48d3db) but with sharrow v2.8.2 instead of the previous run's main@8d63a66 (> v2.9.1). Numba was also older: 0.56.4 compared to 0.59.1.

Runtime was essentially unchanged: 1080.3 minutes, compared to 1090.3 minutes for the previous run.
log.zip

One difference between this set of runs and the 7.7 hour run above is the server. The 7.7 hour run was done on SANDAG's 1 TB RAM, 40-core machine; these runs were done on RSG's 500 GB RAM, 24-core machine.

@i-am-sijia
Contributor

Sharrow, single process, MTC extended model ran in 10.7 hours on WSP's 512 GB RAM AMD server, using the latest version of everything as of June 26. Memory peaked at 145 GB in trip destination.

ActivitySim: pr/867@c9d4205
Sharrow: v2.10.0
MTC: extended@a3da8bd

mtc extended single process sharrow

activitysim.log
timing_log.csv

@dhensle
Contributor Author

dhensle commented Jul 1, 2024

Per the discussion at ActivitySim/sandag-abm3-example#6 (comment), performed a series of runs varying only the Numba thread count (i.e. changing only the NUMBA_NUM_THREADS setting):

image

All runs were performed on the same RSG machine, which has 24 threads.

Some observations:

  • Overall runtime decreased when using fewer threads
  • Destination / location choice models in particular showed significant increases in runtime as the thread count increased
  • Scheduling model runtime often decreased as more threads were used, in contrast to the destination models
  • Many other models were affected, but not significantly: mode choice, stop frequency, trip scheduling, cdap, etc.
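
For anyone repeating the thread-count sweep, here is a minimal sketch of launching a run with a given NUMBA_NUM_THREADS value. Numba reads this environment variable when it is first imported, so it is set in the child process environment before launch. The helper names are hypothetical, and the command shown is a stand-in that only echoes the variable; substitute whatever command actually starts the model run.

```python
import os
import subprocess
import sys


def env_with_numba_threads(num_threads: int) -> dict:
    """Return a copy of the current environment with NUMBA_NUM_THREADS set."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    env = dict(os.environ)
    env["NUMBA_NUM_THREADS"] = str(num_threads)
    return env


def run_with_numba_threads(command: list, num_threads: int):
    """Run `command` in a child process that sees the given thread count."""
    return subprocess.run(
        command,
        env=env_with_numba_threads(num_threads),
        capture_output=True,
        text=True,
        check=True,
    )


# Stand-in command (an assumption for illustration): it prints the variable
# the child process sees instead of running the model.
stand_in = [sys.executable, "-c", "import os; print(os.environ['NUMBA_NUM_THREADS'])"]

for n in (1, 4, 24):
    result = run_with_numba_threads(stand_in, n)
    print(n, result.stdout.strip())
```

Setting the variable per child process keeps each run isolated, so a sweep over several thread counts does not depend on the state of the parent shell.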

@dhensle
Contributor Author

dhensle commented Jul 9, 2024

Ran the same tests as above on the same machine, but using multiprocessing instead of multithreading:

image

Comments:

  • 16 and 24 core runs are incomplete due to activitysim#876 (Multiprocess Fails Accessing Sharrow Cache).
  • Saw roughly linear decreases in runtime for computationally intensive models going from 4 to 12 cores, but the gains diminished after that.
  • 20 cores took longer than 12 cores. This is due to some models being slower (school escorting, school location, joint tour scheduling, etc.) and increased time spent apportioning and coalescing across all of the cores. However, this runtime difference was pretty minimal.
  • The runtime in the final activitysim.log file is slightly longer than the total in the timing_log.csv file across all runs. The difference increases with the number of cores.
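
To quantify the gap noted in the last bullet, one option is to sum the per-model times in timing_log.csv and subtract that from the total reported in activitysim.log. The sketch below is only an outline: the layout of timing_log.csv is not shown in this thread, so the name of the column holding the elapsed time is passed in as a parameter and is an assumption, as are the function names.

```python
import csv


def total_from_timing_csv(path: str, column: str) -> float:
    """Sum a numeric column of a CSV file, skipping blank cells."""
    total = 0.0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if not reader.fieldnames or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        for row in reader:
            value = row.get(column)
            if value:  # skip empty or missing cells
                total += float(value)
    return total


def untimed_minutes(log_total_minutes: float, csv_total_minutes: float) -> float:
    """Time in the overall run that no per-model timing entry accounts for."""
    return log_total_minutes - csv_total_minutes
```

Calling `total_from_timing_csv` on each run's timing_log.csv (with the real column name substituted) and passing the result to `untimed_minutes` alongside the activitysim.log total would show how the unaccounted time grows with the number of cores.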


2 participants