Feature: Implement cal_force_op for sincos parallel #6265

jieli-matrix · 2025-06-04T03:54:26Z

This PR introduces specialized GPU operators to accelerate the sincos bottlenecks in the force calculation. The implementation targets the most computationally intensive loops in the cal_force_loc and cal_force_ew functions, where ModuleBase::libm::sincos has been identified as the primary CPU hotspot.
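For readers unfamiliar with the hotspot, the loop shape being offloaded can be sketched roughly as below. This is a minimal CPU-side illustration, not the code in this PR: the function name `force_loc_one_atom`, its parameters, and the omission of all unit-cell prefactors are assumptions made for the example, `std::sin`/`std::cos` stand in for ModuleBase::libm::sincos, and the exact expression in forces.cpp may differ.

```cpp
#include <array>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical sketch: accumulate one atom's local-potential force
// contribution over all G vectors. Every name here is illustrative.
std::array<double, 3> force_loc_one_atom(
    const std::vector<std::array<double, 3>>& gcar,   // G vectors
    const std::vector<double>& vloc_at_g,             // local potential per G
    const std::vector<std::complex<double>>& aux,     // density in G space
    const std::array<double, 3>& tau)                 // atomic position
{
    const double two_pi = 2.0 * std::acos(-1.0);
    std::array<double, 3> f{{0.0, 0.0, 0.0}};
    for (std::size_t ig = 0; ig < gcar.size(); ++ig)
    {
        // Phase 2*pi*(G . tau); the sin/cos pair of this phase is the hotspot.
        const double phase = two_pi * (gcar[ig][0] * tau[0]
                                     + gcar[ig][1] * tau[1]
                                     + gcar[ig][2] * tau[2]);
        const double sinp = std::sin(phase);
        const double cosp = std::cos(phase);
        const double factor = vloc_at_g[ig]
                            * (cosp * aux[ig].imag() + sinp * aux[ig].real());
        for (int ipol = 0; ipol < 3; ++ipol)
        {
            f[ipol] += gcar[ig][ipol] * factor;
        }
    }
    return f;
}
```

Each (atom, G) pair is independent up to the final accumulation into the three force components, which is what makes the loop a candidate for a GPU operator.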

Done:

Operator interface design
CPU reference implementations
CUDA/HIP GPU kernels
Code integration and calling interface

ToDos:

atomicAdd optimization

jieli-matrix · 2025-06-04T08:40:49Z

All tests now pass. The new code still needs to be benchmarked on GPU hardware to determine whether further optimizations are warranted (e.g., reducing atomic operations, improving memory access patterns, or using an alternative reduction strategy). cc: @mohanchen @dyzheng
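One of the reduction strategies mentioned above can be illustrated with a two-stage sum. The sketch below is serial host C++ and only shows the pattern, under the assumption that the per-G contributions to one force component are already available in an array; `block_partial_sums` and `combine_partials` are hypothetical names, not functions from this PR. On a GPU, each block would typically be reduced locally and then issue a single atomic update, instead of one atomic update per contribution.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Stage 1: reduce each block of `block_size` contributions to one partial sum.
std::vector<double> block_partial_sums(const std::vector<double>& contrib,
                                       std::size_t block_size)
{
    std::vector<double> partial;
    if (block_size == 0)
    {
        return partial; // nothing sensible to do with an empty block size
    }
    for (std::size_t begin = 0; begin < contrib.size(); begin += block_size)
    {
        const std::size_t end = std::min(begin + block_size, contrib.size());
        double local = 0.0;
        for (std::size_t i = begin; i < end; ++i)
        {
            local += contrib[i];
        }
        partial.push_back(local);
    }
    return partial;
}

// Stage 2: combine the partial sums; this is the only step that would
// need to touch shared memory once per block.
double combine_partials(const std::vector<double>& partial)
{
    double total = 0.0;
    for (double p : partial)
    {
        total += p;
    }
    return total;
}
```

With N contributions, the number of updates to the shared result drops from N to ceil(N / block_size). Whether that pays off here is exactly what the benchmark would need to show.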

source/module_hamilt_pw/hamilt_pwdft/forces.cpp

source/module_hamilt_pw/hamilt_pwdft/kernels/force_op.cpp

jieli-matrix added 5 commits June 4, 2025 11:36

implement gpu op for sincos loops

94bfa42

add cpu kernel for cal_force_loc & cal_force_ew

69aa100

fix sincos op for gpu&cpu

06b2e25

fix vloc computation in cal_force_loc_sincos_op

48e91db

fix cal_force_ew

6638968

jieli-matrix mentioned this pull request Jun 4, 2025

Feature: Implement cal_force_op for sincos parallel #6251

Closed

6 tasks

fix malloc error

f3eafd0

mohanchen reviewed Jun 7, 2025

source/module_hamilt_pw/hamilt_pwdft/forces.cpp

source/module_hamilt_pw/hamilt_pwdft/kernels/force_op.cpp

mohanchen added the GPU & DCU & HPC and Refactor labels Jun 7, 2025

mohanchen approved these changes Jun 12, 2025

mohanchen merged commit 808af53 into deepmodeling:LTS Jun 12, 2025
14 checks passed

mohanchen added the Performance label Jun 12, 2025
