Vehicle type model runtime #787

Closed
2 tasks
jpn-- opened this issue Jan 30, 2024 · 7 comments

@jpn--
Member

jpn-- commented Jan 30, 2024

We ran the MTC extended model with full-size zones and the full-size population. In non-Sharrow mode, the vehicle type choice model failed with a memory error on a Windows machine with 512 GB RAM. In Sharrow mode, the vehicle type choice model completed, but took 17 hours.

Need to improve:

  • Memory in non-Sharrow mode
  • Run time in Sharrow mode
@jpn-- jpn-- added the Bug Something isn't working/bug f label Jan 30, 2024
@dhensle
Contributor

dhensle commented Feb 6, 2024

Some initial steps:

  • Optimize for Sharrow. This involves removing string comparisons (we should add an alts preprocessor). Work can build on progress made for MWCOG.
  • Remove unavailable alts from the alts table instead of relying on availability conditions.
  • Add an option to select only specific chooser columns.
  • Move expensive log expressions to the preprocessor.
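
A minimal sketch of what these steps could look like in plain pandas, outside of ActivitySim. The toy tables, the `available` column, and the derived names `is_suv`, `is_bev`, and `log_new_price` are hypothetical and chosen only for illustration; this is not the code from the linked PRs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables; column names echo the issue but the values are made up.
alts = pd.DataFrame({
    "body_type": ["Car", "SUV", "Van", "Car"],
    "fuel_type": ["Gas", "Gas", "Diesel", "BEV"],
    "NewPrice": [25000.0, 38000.0, 42000.0, 45000.0],
    "available": [True, True, False, True],  # hypothetical availability flag
})
choosers = pd.DataFrame({
    "household_id": [1, 2, 3],
    "income": [40000, 85000, 120000],
    "TOTACRE": [10.0, 20.0, 30.0],  # assumed unused by any utility term here
})

# 1. Drop unavailable alternatives up front instead of carrying them through
#    the cross join and masking them with an availability condition later.
alts = alts[alts["available"]].drop(columns="available").reset_index(drop=True)

# 2. Replace string comparisons with numeric flags computed once per alternative.
alts["is_suv"] = (alts["body_type"] == "SUV").astype("uint8")
alts["is_bev"] = (alts["fuel_type"] == "BEV").astype("uint8")

# 3. Evaluate expensive functions once per alternative, not once per interaction row.
alts["log_new_price"] = np.log(alts["NewPrice"])

# 4. Keep only the chooser columns the utility terms actually use.
choosers = choosers[["household_id", "income"]]

# The interaction table is the cross join of choosers and alternatives.
interaction_df = choosers.merge(
    alts.drop(columns=["body_type", "fuel_type"]), how="cross"
)
print(interaction_df.shape)
```

Because the flags and the log term live on the small alts table, they are computed a handful of times rather than once per chooser-alternative pair, and the cross join carries only numeric columns.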

@dhensle
Contributor

dhensle commented Feb 19, 2024

Config updates: ActivitySim/activitysim-prototype-mtc#3
Code updates: #806

@i-am-sijia
Contributor

i-am-sijia commented Feb 20, 2024

I looked into the memory usage of the vehicle type model in non-Sharrow mode. When running the MTC extended model with 25% of the population, the interaction_df (the joined data frame of choosers and alternatives) for the first vehicle choice uses 212 GB of RAM, which explains why we got a memory error when running 100% of the population.

Below is a table of the memory taken by each column in the interaction_df. The string columns are already converted to pandas categorical. No single column stands out as memory intensive; there are simply too many columns in this table, and they add up. Removing columns that are not used in the utility calculation will help reduce memory.

Column Dtype Memory (GB)
Total   212.1
index int64 2.4
body_type_Car uint8 0.3
body_type_Motorcycle uint8 0.3
body_type_Pickup uint8 0.3
body_type_SUV uint8 0.3
body_type_Van uint8 0.3
age_1 uint8 0.3
age_10 uint8 0.3
age_11 uint8 0.3
age_12 uint8 0.3
age_13 uint8 0.3
age_14 uint8 0.3
age_15 uint8 0.3
age_16 uint8 0.3
age_17 uint8 0.3
age_18 uint8 0.3
age_19 uint8 0.3
age_2 uint8 0.3
age_20 uint8 0.3
age_3 uint8 0.3
age_4 uint8 0.3
age_5 uint8 0.3
age_6 uint8 0.3
age_7 uint8 0.3
age_8 uint8 0.3
age_9 uint8 0.3
fuel_type_BEV uint8 0.3
fuel_type_Diesel uint8 0.3
fuel_type_Gas uint8 0.3
fuel_type_Hybrid uint8 0.3
fuel_type_PEV uint8 0.3
body_type category 0.3
age int32 1.2
fuel_type category 0.3
vehicle_year int64 2.4
NumMakes int64 2.4
NumModels int64 2.4
MPG float64 2.4
Range int64 2.4
NewPrice float64 2.4
auto_operating_cost float64 2.4
co2gpm float64 2.4
vehicle_type category 0.6
household_id int64 2.4
vehicle_num int64 2.4
home_zone_id int64 2.4
income int64 2.4
hhsize int64 2.4
HHT int64 2.4
auto_ownership int32 1.2
num_workers int64 2.4
sample_rate float64 2.4
income_in_thousands float64 2.4
income_segment int32 1.2
median_value_of_time float64 2.4
hh_value_of_time float64 2.4
num_non_workers int64 2.4
num_drivers int8 0.3
num_adults int8 0.3
num_children int8 0.3
num_young_children int8 0.3
num_children_5_to_15 int8 0.3
num_children_16_to_17 int8 0.3
num_college_age int8 0.3
num_young_adults int8 0.3
non_family bool 0.3
family bool 0.3
home_is_urban bool 0.3
home_is_rural bool 0.3
hh_work_auto_savings_ratio float32 1.2
DISTRICT int64 2.4
SD int64 2.4
county_id int64 2.4
TOTHH int64 2.4
TOTPOP int64 2.4
TOTACRE float64 2.4
RESACRE float64 2.4
CIACRE float64 2.4
TOTEMP int64 2.4
AGE0519 int64 2.4
RETEMPN int64 2.4
FPSEMPN int64 2.4
HEREMPN int64 2.4
OTHEMPN int64 2.4
AGREMPN int64 2.4
MWTEMPN int64 2.4
PRKCST float64 2.4
OPRKCST float64 2.4
area_type int64 2.4
HSENROLL float64 2.4
COLLFTE float64 2.4
COLLPTE float64 2.4
TOPOLOGY int64 2.4
TERMINAL float64 2.4
household_density float64 2.4
employment_density float64 2.4
density_index float64 2.4
is_cbd bool 0.3
TOTENR_univ float64 2.4
ext_work_share float64 2.4
RETEMPN_scaled float64 2.4
FPSEMPN_scaled float64 2.4
HEREMPN_scaled float64 2.4
OTHEMPN_scaled float64 2.4
AGREMPN_scaled float64 2.4
MWTEMPN_scaled float64 2.4
TOTEMP_scaled float64 2.4
auPkRetail float64 2.4
auPkTotal float64 2.4
auOpRetail float64 2.4
auOpTotal float64 2.4
trPkRetail float64 2.4
trPkTotal float64 2.4
trOpRetail float64 2.4
trOpTotal float64 2.4
nmRetail float64 2.4
nmTotal float64 2.4
already_owned_veh category 0.6
total_hh_dist_to_work float32 1.2
total_hh_dist_to_work_cap float64 2.4
avg_hh_dist_to_work float32 1.2
hh_per_mi float64 2.4
hh_veh_gt_drivers int32 1.2
num_hh_veh_owned float64 2.4
num_hh_Van float64 2.4
num_hh_SUV float64 2.4
num_hh_Pickup float64 2.4
num_hh_Motorcycle float64 2.4
num_hh_Hybrid float64 2.4
num_hh_BEV float64 2.4
num_hh_PEV float64 2.4
num_hh_EV float64 2.4
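
A per-column report like the table above can be produced with pandas' `DataFrame.memory_usage`. The sketch below uses a tiny stand-in frame with a few of the column names and dtypes from the table. The row count and the `used` list are assumptions for illustration, not values taken from the model run.

```python
import numpy as np
import pandas as pd

n_rows = 1_000  # toy size; the real interaction_df is far larger

# Hypothetical subset of interaction_df columns, with dtypes as listed in the table.
interaction_df = pd.DataFrame({
    "body_type_SUV": np.zeros(n_rows, dtype="uint8"),
    "age": np.zeros(n_rows, dtype="int32"),
    "NewPrice": np.zeros(n_rows, dtype="float64"),
    "income": np.zeros(n_rows, dtype="int64"),
    "TOTACRE": np.zeros(n_rows, dtype="float64"),
    "RETEMPN": np.zeros(n_rows, dtype="int64"),
})

# Per-column memory in bytes; deep=True also counts object payloads if present.
mem = interaction_df.memory_usage(index=False, deep=True)
report = pd.DataFrame({"Dtype": interaction_df.dtypes.astype(str), "Bytes": mem})
print(report.sort_values("Bytes", ascending=False))

# Assume (hypothetically) that only these columns appear in utility expressions.
used = ["body_type_SUV", "age", "NewPrice", "income"]
slim_bytes = interaction_df[used].memory_usage(index=False, deep=True).sum()
print(f"full: {mem.sum()} bytes, slim: {slim_bytes} bytes")
```

Each 8-byte column costs eight times as much as a `uint8` flag, so dropping unused `int64` and `float64` land-use columns is where most of the savings would come from.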

@dhensle
Contributor

dhensle commented Feb 20, 2024

Thanks Sijia, I had already added options to specify which columns to keep in the choosers and alts tables in the PRs listed in my previous message.

@joecastiglione
Contributor

Is it possible to systematically, across all models and tables, include only those columns that are used in the utility calculations?

@dhensle
Contributor

dhensle commented Feb 20, 2024

> Is it possible to systematically, across all models and tables, include only those columns that are used in the utility calculations?

Yes, we brought this up a week or two ago and created an issue for it: #792
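
One way such a systematic filter might work is to parse each utility expression and collect the column names it references. The sketch below uses only Python's standard `ast` module. The function `columns_used`, the sample expressions, and the handling of a leading `@` as a Python-expression marker are assumptions for illustration; the approach tracked in #792 may differ.

```python
import ast

def columns_used(expressions, available_columns):
    """Return the available column names referenced by any expression."""
    available = set(available_columns)
    used = set()
    for expr in expressions:
        # Assumption: a leading "@" marks a Python expression and is not code.
        tree = ast.parse(expr.lstrip("@").strip(), mode="eval")
        for node in ast.walk(tree):
            if isinstance(node, ast.Name) and node.id in available:
                used.add(node.id)  # bare reference, e.g. hhsize
            elif isinstance(node, ast.Attribute) and node.attr in available:
                used.add(node.attr)  # attribute reference, e.g. df.NewPrice
    return sorted(used)

# Hypothetical utility expressions and table columns, for illustration only.
spec_expressions = [
    "income_in_thousands < 50",
    "@np.log(df.NewPrice) * (df.num_drivers > 1)",
    "is_suv & (hhsize >= 4)",
]
table_columns = [
    "income_in_thousands", "NewPrice", "num_drivers",
    "hhsize", "is_suv", "TOTACRE", "RETEMPN",
]
keep = columns_used(spec_expressions, table_columns)
print(keep)
```

This sketch does not catch columns referenced through string subscripts such as `df["income"]`, so a real implementation would need to handle those cases or fall back to keeping all columns.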

@i-am-sijia
Contributor

> Thanks Sijia, I had already added options to specify which columns to keep in the choosers and alts tables in the PRs listed in my previous message.

Yep, I saw that. I wanted to provide more context and evidence that dropping unused columns will help.

@dhensle dhensle closed this as completed Feb 27, 2024