Very Slow Chunking #861
Labels: Bug (Something isn't working)

Comments
As mentioned above, I tested with the current main branch of the code and the sandag-abm3-example; the results were very similar. I ran 100k households in chunk_training mode, without sharrow, on 10 cores. The chunk training run took about 24 hours! Log files are attached.
Describe the bug
Chunk training takes a VERY long time.
The run was performed on a SANDAG server with 1 TB of RAM (`chunk_size` was set to 450GB), using only 64k households (~5% sample) and 5 cores. Run time was 66.85 hours, or 2.78 days!
To Reproduce
Run the SANDAG ABM3 model in chunk training mode. This was performed with the BayDAG_estimation branch which is based off ActivitySim version 1.2.
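For reference, chunk training mode is enabled through the top-level settings file. A minimal sketch of the relevant `settings.yaml` entries, assuming the standard ActivitySim setting names (the specific values shown are illustrative, matching the run described above):

```yaml
# settings.yaml (fragment) -- hypothetical values matching this run
chunk_training_mode: training   # collect chunk sizing data for later production runs
chunk_size: 450_000_000_000     # ~450 GB, in bytes
num_processes: 5                # multiprocess worker count
households_sample_size: 64000   # ~5% sample
```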
Expected behavior
Chunk training shouldn't take much longer than actually running the model, and we have not seen chunk training take this long before. Is there something about the SANDAG model that makes training slow (e.g., its two-zone system)? Or was a dependency updated in a way that severely hurt performance?
Additional context
Log files can be seen here: training_log.zip
Running in production mode also took an extremely long time (again, > 2.5 days!). Part of the problem may be that `num_processors` was set to 40 while the machine only has 32 cores, but that alone shouldn't matter this much.
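As a sanity check for the oversubscription point above, a minimal sketch of capping the configured worker count at the machine's actual core count (the function name `clamp_num_processes` is hypothetical, not an ActivitySim API):

```python
import multiprocessing


def clamp_num_processes(requested: int) -> int:
    """Cap a requested worker count at the machine's available cores.

    Oversubscribing (e.g. num_processors = 40 on a 32-core box) makes the
    OS time-slice workers, which can slow a run somewhat, though it should
    not by itself explain a multi-day runtime.
    """
    available = multiprocessing.cpu_count()
    return min(requested, available)
```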
Looking at the production logs shows that about 700 minutes(!) of run time was spent in the parking location choice model. This appears to be because ActivitySim creates a chunk for every single chooser in that model (hence log statements like `Running chunk 10450 of 10456 with 1 of 10456 choosers`). The `chunk_cache.csv` (found in the training_log above) certainly shows that more than one row should be allowed per chunk when `chunk_size` is set to 450GB.
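To quantify the one-chooser-per-chunk symptom, a small sketch that scans log lines for the chunk statements quoted above and counts degenerate chunks (the regex is keyed to the exact message format shown in the logs; `degenerate_chunks` is a hypothetical helper, not part of ActivitySim):

```python
import re

# Matches log lines like:
#   "Running chunk 10450 of 10456 with 1 of 10456 choosers"
CHUNK_RE = re.compile(r"Running chunk (\d+) of (\d+) with (\d+) of (\d+) choosers")


def degenerate_chunks(log_lines):
    """Return (one_chooser_chunks, total_chunks) seen in the log lines.

    A count close to the total indicates the chunker is splitting work down
    to a single chooser per chunk, as observed in the parking location
    choice model here.
    """
    total = degenerate = 0
    for line in log_lines:
        m = CHUNK_RE.search(line)
        if m:
            total += 1
            if int(m.group(3)) == 1:
                degenerate += 1
    return degenerate, total
```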
production_log_subset.zip
Is this behavior related to #860?
(Currently working on reproducing with the main branch, but run is not yet complete. I will update once complete...)