Description
Describe the Code Quality Issue
In #5028, a problem related to ELPA was found: for large systems (more than 1000 atoms), the SCF calculation crashes with the following backtrace:
==== backtrace (tid: 138369) ====
0 0x0000000000012cf0 __funlockfile() :0
1 0x0000000000254159 elpa2_compute_mp_trans_ev_band_to_full_complex_double_() /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa2_elpa2_compute.F90-src_elpa2_.libs_libelpa_openmp_private_la-elpa2_compute.o.F90:15626
2 0x00000000003717aa elpa2_impl_mp_elpa_solve_evp_complex_2stage_a_h_a_double_impl_() /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa2_elpa2.F90-src_elpa2_.libs_libelpa_openmp_private_la-elpa2.o.F90:6441
3 0x00000000000c512f elpa_impl_mp_elpa_eigenvectors_a_h_a_dc_() /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa_impl.F90-src_.libs_libelpa_openmp_private_la-elpa_impl.o.F90:5570
4 0x00000000000c4709 elpa_eigenvectors_a_h_a_dc() /lustre/home/2201110432/apps/abacus/toolchain_used/toolchain-icx/build/elpa-2024.03.001/build_cpu/manually_preprocessed_.._src_elpa_impl.F90-src_.libs_libelpa_openmp_private_la-elpa_impl.o.F90:5706
5 0x0000000000bde2e2 elpa_eigenvectors() /lustre/home/2201110432/lib/elpa/2024.03.001-icx/cpu/include/elpa/elpa_generic.h:82
6 0x0000000000bde8ae ELPA_Solver::generalized_eigenvector() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/genelpa/elpa_new_complex.cpp:130
7 0x00000000007641c3 hsolver::DiagoElpa<std::complex<double> >::diag() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/diago_elpa.cpp:90
8 0x00000000007641c3 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string() /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:519
9 0x00000000007641c3 hsolver::DiagoElpa<std::complex<double> >::diag() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/diago_elpa.cpp:95
10 0x000000000075c3d1 hsolver::HSolverLCAO<std::complex<double>, base_device::DEVICE_CPU>::hamiltSolvePsiK() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/hsolver_lcao.cpp:149
11 0x000000000075c3d1 hsolver::HSolverLCAO<std::complex<double>, base_device::DEVICE_CPU>::hamiltSolvePsiK() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/hsolver_lcao.cpp:150
12 0x000000000075a7d1 hsolver::HSolverLCAO<std::complex<double>, base_device::DEVICE_CPU>::solve() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_hsolver/hsolver_lcao.cpp:104
13 0x00000000008ba78f ModuleESolver::ESolver_KS_LCAO<std::complex<double>, double>::hamilt2density() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_esolver/esolver_ks_lcao.cpp:713
14 0x00000000008ba78f ???() /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:215
15 0x00000000008ba78f ???() /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:224
16 0x00000000008ba78f std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() /usr/lib/gcc/x86_64-redhat-linux/8/../../../../include/c++/8/bits/basic_string.h:661
17 0x00000000008ba78f ModuleESolver::ESolver_KS_LCAO<std::complex<double>, double>::hamilt2density() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_esolver/esolver_ks_lcao.cpp:713
18 0x000000000085b0f9 ModuleESolver::ESolver_KS<std::complex<double>, base_device::DEVICE_CPU>::runner() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_esolver/esolver_ks.cpp:449
19 0x00000000006f9265 Relax_Driver::relax_driver() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_relax/relax_driver.cpp:49
20 0x000000000070f442 Driver::driver_run() /lustre/home/2201110432/apps/abacus/abacus-test/source/driver_run.cpp:68
21 0x000000000070f442 Relax_Driver::~Relax_Driver() /lustre/home/2201110432/apps/abacus/abacus-test/source/module_relax/relax_driver.h:14
22 0x000000000070f442 Driver::driver_run() /lustre/home/2201110432/apps/abacus/abacus-test/source/driver_run.cpp:69
23 0x000000000070e665 Driver::atomic_world() /lustre/home/2201110432/apps/abacus/abacus-test/source/driver.cpp:186
24 0x000000000070df5e Driver::init() /lustre/home/2201110432/apps/abacus/abacus-test/source/driver.cpp:40
25 0x00000000004359e6 main() ???:0
26 0x000000000003ad85 __libc_start_main() ???:0
27 0x000000000043589e _start() ???:0
=================================
Users currently have to switch to scalapack_gvx as a workaround. Can this be fixed in the ELPA path?
Also, is this problem related to #5707?
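For reference, the workaround mentioned above would look roughly like this in the ABACUS `INPUT` file. This is a minimal sketch, assuming an LCAO calculation in which the solver is selected through the `ks_solver` keyword; the remaining parameters of the failing case are not given in this report and are omitted here.

```
INPUT_PARAMETERS
# Assumed fragment: only the solver-related keywords are shown.
basis_type   lcao
# Use ScaLAPACK instead of the ELPA-based solver (genelpa) that crashes.
ks_solver    scalapack_gvx
```

This only avoids the crash; it does not address the underlying failure in the ELPA code path.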
Additional Context
No response
Task list for Issue attackers (only for developers)
- Identify the specific code file or section with the code quality issue.
- Investigate the issue and determine the root cause.
- Research best practices and potential solutions for the identified issue.
- Refactor the code to improve code quality, following the suggested solution.
- Ensure the refactored code adheres to the project's coding standards.
- Test the refactored code to ensure it functions as expected.
- Update any relevant documentation, if necessary.
- Submit a pull request with the refactored code and a description of the changes made.