Description
I am currently running WRF-GC v3.0 at 12km resolution over the continental U.S. I followed this GNU pull request to make it run on AWS. Chemical timestep is 6 minutes.
At first, my 12 km run was very slow, with the bottleneck occurring at every 6-minute chemical timestep. With 384 cores (192 cores per node on 2 nodes), it took about 5 days to generate 40 days of model output. When I tested 576 cores (192 cores per node on 3 nodes), the model became even slower. My labmates Yuanjian and Haipeng helped identify that this was related to the HEMCO write step.
- For the 384-core case: once the model stabilized (past the first time step), the HEMCO write stage took about 30 seconds, so each chemical timestep took about 31.4 seconds:
hemco write time: 29.9510498
Timing for main: time 2015-08-01_00:24:00 on domain 1: 31.46896 elapsed seconds
- For the 576-core case: I don't have the rsl file, but it took 20 minutes to generate the first output, and the HEMCO write time was 1100 seconds.
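For anyone who wants to check their own logs, here is a minimal sketch that pairs each `hemco write time` line with the next `Timing for main` line and prints how much of that step was spent in the HEMCO write. The function name `summarize_rsl` is made up for illustration, and it assumes the log lines look like the two quoted above, with the HEMCO line printed before the timing line of the same step.

```shell
#!/bin/sh
# Hypothetical helper: pair each "hemco write time" line with the next
# "Timing for main" line in an rsl log, and print the HEMCO share of the step.
# Assumes the HEMCO line appears before the timing line of the same step.
summarize_rsl() {
    awk '
        # Remember the most recent HEMCO write time (last field on the line).
        /hemco write time:/ { hw = $NF; seen = 1; next }
        # On the next timing line, report timestamp, both times, and the ratio.
        /Timing for main:/ && seen {
            step = $(NF-2)   # the number just before "elapsed seconds"
            printf "%s hemco=%.2fs step=%.2fs share=%.0f%%\n", $5, hw, step, 100 * hw / step
            seen = 0
        }
    ' "$1"
}
```

Usage would be something like `summarize_rsl rsl.out.0000`. For the two lines quoted above, the HEMCO write accounts for roughly 95% of the step (29.95 s out of 31.47 s).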
To overcome this, our quick patch was to disable PNETCDF like this:
source /opt/geos-chem/env/wrfgc.env
unset PNETCDF
With PNETCDF disabled, it now takes about 2 days to generate 40 days of model output, and the chemical timestep takes about 6 seconds instead of 31 seconds. One downside is that you then have to comment out KppDiags in HISTORY.rc, and you can no longer write HEMCO-related files such as the HEMCO_manual, HEMCO_restart, and HEMCO_Diagn outputs.
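Since the workaround depends on `PNETCDF` really being absent from the environment, a small check like the following may help confirm it after sourcing the env file. This is only a sketch: the function name `pnetcdf_status` is hypothetical, and it only inspects the current shell. It says nothing about how WRF-GC itself reads the variable.

```shell
#!/bin/sh
# Hypothetical helper: report whether PNETCDF is set in the current shell.
# ${PNETCDF+x} expands to "x" only if the variable is set (even if empty).
pnetcdf_status() {
    if [ -n "${PNETCDF+x}" ]; then
        echo "PNETCDF is set to: ${PNETCDF}"
    else
        echo "PNETCDF is unset"
    fi
}

# Example use after applying the workaround above:
#   source /opt/geos-chem/env/wrfgc.env
#   unset PNETCDF
#   pnetcdf_status
```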
I'm not sure whether this PNETCDF issue is due to how the pull request modified the code or is expected behavior of WRF-GC.
Thanks!
rsl.out.0000_with_PNETCDF_384_cores.txt
rsl.out.0000_without_PNETCDF_384_cores.txt