application called MPI_Abort(MPI_COMM_WORLD, 0) - process 73 #446

Open
InterstellarPenguin opened this issue Sep 25, 2024 · 6 comments
Assignees: lizziel
Labels: category: Debug Help (Request for help debugging GCHP), topic: Runtime (Related to runtime issues, e.g. simulation stops with error)

Comments

@InterstellarPenguin

Your name

Linyang Guo

Your affiliation

UCAS

What happened? What did you expect to happen?

Hi, all! I'm running a C48 simulation, but it crashed with the following error:
[screenshot of the MPI_Abort error message]
I'm not sure whether the error is related to my settings or to MPI.

What are the steps to reproduce the bug?

setCommonSettings.sh: [screenshot]
gchp.job: [screenshot]
ExtData.rc: [screenshot]

Please attach any relevant configuration and log files.

setCommonSettings.txt.txt
ExtData.txt

By the way, MetDir has been changed to point to my own ExtData directory, and when I run GCHP at C24 instead of C48, it completes successfully.

What GCHP version were you using?

14.4.3

What environment were you running GCHP on?

Local cluster

What compiler and version were you using?

ifort 2021.3.0

What MPI library and version were you using?

Intel MPI 2021.3.0

Will you be addressing this bug yourself?

Yes

Additional information

No response

@InterstellarPenguin InterstellarPenguin added the category: Bug Something isn't working label Sep 25, 2024
@yantosca yantosca added the topic: Runtime Related to runtime issues (e.g. simulation stops with error) label Sep 25, 2024
@yantosca
Contributor

Thanks for writing @InterstellarPenguin. Could you also post the gchp*.log and the allPEs.log files?

You can also enable extra debug information via the logging.yml file, as described in our ReadTheDocs documentation.
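
For example, a minimal sketch of the kind of edit meant here, assuming the stock logging.yml shipped in the run directory (keep the existing handlers lines; only the levels change, and your logger names may differ slightly):

    loggers:
       root:
          level: DEBUG        # typically WARNING by default
       CAP.EXTDATA:
          level: DEBUG        # prints detailed messages about ExtData reads to allPEs.log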

@lizziel lizziel added category: Debug Help Request for help debugging GCHP and removed category: Bug Something isn't working labels Sep 25, 2024
@lizziel lizziel self-assigned this Sep 25, 2024
@lizziel
Contributor

lizziel commented Sep 25, 2024

@InterstellarPenguin, please note that we do not recommend using GCHP with coarse resolution meteorology. I do not think using the 2x2.5 fields is causing the problem, but it will give less accurate results.

@lizziel
Contributor

lizziel commented Sep 25, 2024

If C24 works but C48 does not, I recommend trying to run with more cores. Also try explicitly requesting all memory per node with SBATCH.
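
For example, something along these lines in gchp.job (the node and core counts below are placeholders for your cluster; --mem=0 tells Slurm to allocate all of the memory on each node):

    #SBATCH --nodes=2              # placeholder: C48 generally needs more cores than C24
    #SBATCH --ntasks-per-node=48   # placeholder: total core count must be a multiple of 6 for GCHP
    #SBATCH --mem=0                # request all available memory on each node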

@InterstellarPenguin
Author

Thanks @lizziel @yantosca. I've checked allPEs.log; there are errors related to some ExtData inputs, shown in the screenshot below:
[screenshot of allPEs.log ExtData errors]
I then came across a solution in another issue, #429, which says I should rewrite the setting in the ExtData.rc file like this:
[screenshot of the revised ExtData.rc entry]

By the way, in GCHP.rc I'm not sure whether 'GCHPchem_INTERNAL_CHECKPOINT_FILE: Restarts/gcchem_internal_checkpoint' is correct.
[screenshot of GCHP.rc]
The simulation sometimes crashed with a 'netcdf4' error while reading checkpoint files (unfortunately I deleted that case, so I no longer have the logs, sorry about that). But when I add the '.nc4' extension, i.e. 'GCHPchem_INTERNAL_CHECKPOINT_FILE: Restarts/gcchem_internal_checkpoint.nc4', or turn on the 'WRITE_RESTART_BY_OSERVER' switch in GCHP.rc, it completes successfully. Is that a bug, or was it my mistake?
[screenshot of GCHP.rc checkpoint settings]
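
In plain text, the two variants of the GCHP.rc line that I compared are:

    # Default setting from the run directory (sometimes crashed for me)
    GCHPchem_INTERNAL_CHECKPOINT_FILE: Restarts/gcchem_internal_checkpoint

    # With the .nc4 extension added (completed successfully)
    GCHPchem_INTERNAL_CHECKPOINT_FILE: Restarts/gcchem_internal_checkpoint.nc4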

@lizziel
Contributor

lizziel commented Sep 26, 2024

If a previous run generated Restarts/gcchem_internal_checkpoint and it was not renamed or deleted by the run script, then the model will crash when you try to run again. Do you still have this file after your run crashes? What run script are you using? The run scripts are designed to avoid this issue, so if the one you are using is not catching this, we would definitely like to know.

Generally the O-server is only needed on certain systems when running with more than 1000 cores. Try running again with the O-server off, with gcchem_internal_checkpoint deleted if it is present, and with GCHPchem_INTERNAL_CHECKPOINT_FILE set back to the original setting.
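
Roughly, from the run directory (setting names as in the stock GCHP.rc; double-check the values against your own file):

    # 1. Remove any leftover internal checkpoint from the crashed run
    rm -f Restarts/gcchem_internal_checkpoint

    # 2. In GCHP.rc, keep the O-server off and restore the default checkpoint name:
    #      WRITE_RESTART_BY_OSERVER: NO
    #      GCHPchem_INTERNAL_CHECKPOINT_FILE: Restarts/gcchem_internal_checkpoint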

Please note that we do not recommend using the carbon simulation with version 14.4. Fixes are coming in 14.5.1. See these GitHub issues:
#440
#437
geoschem/geos-chem#2463

@InterstellarPenguin
Author

InterstellarPenguin commented Sep 26, 2024

Thanks @lizziel! The run script I used was not the one from the GCHP directory; that's why the crash happened. I appreciate your reminder about my error and about the bug in the carbon simulation!

In HISTORY.rc, I've noticed that there are two different types of CO2 output, EmisCO2 and ProdCO2fromCO, and I wonder whether these are different methods for calculating CO2.

My second question: since GCHP, unlike GCClassic, does not use HEMCO to read files, if I want to change the land and ocean carbon flux input data, is it necessary to keep ExtData.rc aligned with HEMCO_Config.rc, or can I just rewrite the inventory entry in ExtData.rc?
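
To illustrate what I mean, here is a made-up entry of the kind I would add (the name OCEAN_CO2_FLUX, the variable name, and the file path are placeholders; I would copy the exact column layout from the existing lines in my ExtData.rc):

    # Hypothetical replacement ocean CO2 flux entry; the export name in the first column
    # is what I assume must also appear as the container name in HEMCO_Config.rc
    OCEAN_CO2_FLUX kg/m2/s N Y %y4-%m2-01T00:00:00 none none CO2_flux ./MyFluxes/ocean_co2_flux_%y4.nc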

In HEMCO_Config.rc, I noticed that 'GC_restart' is set to false. I'm curious whether GCHP can automatically recognize the restart file. The simulation runs well with this switch off, but it crashes when I turn it on (see if-restart-2019.log). What should I do to configure a spin-up without a restart file?
[screenshot of the HEMCO_Config.rc setting]
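
The toggle I am referring to looks roughly like this in HEMCO_Config.rc (spacing approximate):

    --> GC_RESTART             :       false   # runs fine
    --> GC_RESTART             :       true    # crashes (see if-restart-2019.log)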

gchp.job.txt
if-restart-2019.log
