Feature: Implement cal_force_op for sincos parallel #6265

Merged (6 commits) on Jun 12, 2025

Conversation

jieli-matrix
Collaborator

This PR introduces specialized GPU operators to accelerate the sincos computations that bottleneck the force calculations. The implementation targets the most computationally intensive loops in the cal_force_loc and cal_force_ew functions, where ModuleBase::libm::sincos has been identified as the primary CPU hotspot.

Done:

  • Operator interface design
  • CPU reference implementations
  • CUDA/HIP GPU kernels
  • Code Integration and Calling Interface

ToDos:

  • AtomicAdd Optimization
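
For readers unfamiliar with the hot loop described above, here is a minimal CPU sketch of a sincos-driven force accumulation. It is not the operator interface from this PR: the function name `force_from_plane_waves`, the `Vec3` alias, and the assumed per-G-vector contribution `g * (sin(phase) * Re(c) + cos(phase) * Im(c))` with `phase = 2*pi*(g . tau)` are illustrative assumptions, and `std::sin`/`std::cos` stand in for `ModuleBase::libm::sincos`.

```cpp
#include <array>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;  // hypothetical helper type

// Hypothetical CPU reference: accumulate the force on one atom at position
// `tau` from a sum over G-vectors. Assumed form of each contribution:
//   g * (sin(phase) * Re(c) + cos(phase) * Im(c)),  phase = 2*pi*(g . tau)
Vec3 force_from_plane_waves(const std::vector<Vec3>& gvecs,
                            const std::vector<std::complex<double>>& coeff,
                            const Vec3& tau)
{
    const double two_pi = 2.0 * std::acos(-1.0);
    Vec3 force{0.0, 0.0, 0.0};
    for (std::size_t ig = 0; ig < gvecs.size(); ++ig) {
        const Vec3& g = gvecs[ig];
        const double phase = two_pi * (g[0] * tau[0] + g[1] * tau[1] + g[2] * tau[2]);
        // One sine and one cosine per (atom, G-vector) pair: this is the
        // evaluation that dominates the loop's cost.
        const double sinp = std::sin(phase);
        const double cosp = std::cos(phase);
        const double factor = sinp * coeff[ig].real() + cosp * coeff[ig].imag();
        for (int k = 0; k < 3; ++k) {
            force[k] += g[k] * factor;
        }
    }
    return force;
}
```

On a GPU, the iterations over `ig` would typically be spread across threads, which is why the accumulation into `force` becomes a reduction (and a candidate for atomic adds).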

@jieli-matrix
Collaborator Author

All tests now pass. We still need to benchmark the new code on GPU hardware to determine whether further optimizations are warranted (e.g., reducing atomic operations, improving memory access patterns, or trying alternative reduction strategies). cc: @mohanchen @dyzheng
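
To illustrate what an alternative reduction strategy could look like, the sketch below emulates a block-style reduction on the CPU: each simulated thread builds a private partial sum, and the partials are then combined pairwise, so only one final value would need an atomic update. The function `block_style_sum` and its parameters are hypothetical and are not taken from the merged kernels. Whether such a scheme helps here would still have to be measured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical illustration: sum `values` the way a GPU thread block might.
// Step 1: each of `num_threads` simulated threads accumulates a private
//         partial sum over a strided slice of the input.
// Step 2: the partial sums are combined by a pairwise tree reduction.
double block_style_sum(const std::vector<double>& values, std::size_t num_threads)
{
    if (num_threads == 0) {
        return 0.0;
    }
    std::vector<double> partial(num_threads, 0.0);
    for (std::size_t t = 0; t < num_threads; ++t) {
        for (std::size_t i = t; i < values.size(); i += num_threads) {
            partial[t] += values[i];
        }
    }
    // Tree reduction; also handles thread counts that are not powers of two.
    for (std::size_t active = num_threads; active > 1;) {
        const std::size_t half = (active + 1) / 2;
        for (std::size_t t = 0; t + half < active; ++t) {
            partial[t] += partial[t + half];
        }
        active = half;
    }
    return partial[0];
}
```

In a real kernel the partial sums would live in registers or shared memory, and the combined result would be written with a single atomic add per block instead of one per thread.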

@mohanchen added the "GPU & DCU & HPC" and "Refactor" labels on Jun 7, 2025
@mohanchen merged commit 808af53 into deepmodeling:LTS on Jun 12, 2025
14 checks passed
@mohanchen added the "Performance" label on Jun 12, 2025