Randomized solver (both cuda and hip) #83

Merged: 9 commits merged from rand-gmres-dev2 into develop on Dec 2, 2023
Conversation

@kswirydo (Collaborator) commented Nov 28, 2023

Working randomized GMRES with a preconditioner. You can turn it into FGMRES by setting flexible to 1.

Caveat: the CUDA version uses the ancient triangular-solve API cusparseDcsrsv2 (I am using an ILU(0) preconditioner that needs triangular solves); despite my best efforts, I was not able to get the more modern cusparseSpSV to produce correct results. Hence, the code generates correct results, but uses the older triangular solve and therefore compiles with tons of warnings about deprecated CUDA functions. I followed this example from the CUDA samples to no avail. See previous commits.
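For reference, a minimal sketch of the generic-API call sequence the spsv_csr CUDA sample uses for a lower-triangular solve (assuming CUDA 11.3+; the handle and device arrays are assumed to be set up by the caller, error checking is omitted, and all names are illustrative rather than ReSolve's actual interface). The fill-mode and diag-type attributes are easy to miss and a common cause of silently wrong SpSV results:

```c++
#include <cuda_runtime.h>
#include <cusparse.h>

// Hedged sketch: solve L * y = x with the cuSPARSE generic SpSV API,
// following the structure of the official spsv_csr sample.
void spsv_lower(cusparseHandle_t handle, int n, int nnz,
                int* d_csrRowPtr, int* d_csrColInd, double* d_csrVal,
                double* d_x, double* d_y)
{
  cusparseSpMatDescr_t matL;
  cusparseDnVecDescr_t vecX, vecY;
  cusparseSpSVDescr_t  spsvDescr;
  double alpha = 1.0;

  cusparseCreateCsr(&matL, n, n, nnz, d_csrRowPtr, d_csrColInd, d_csrVal,
                    CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                    CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
  cusparseCreateDnVec(&vecX, n, d_x, CUDA_R_64F);
  cusparseCreateDnVec(&vecY, n, d_y, CUDA_R_64F);

  // Without these attributes SpSV treats the matrix as general and can
  // silently produce wrong results for triangular factors.
  cusparseFillMode_t fill = CUSPARSE_FILL_MODE_LOWER;
  cusparseDiagType_t diag = CUSPARSE_DIAG_TYPE_UNIT;
  cusparseSpMatSetAttribute(matL, CUSPARSE_SPMAT_FILL_MODE, &fill, sizeof(fill));
  cusparseSpMatSetAttribute(matL, CUSPARSE_SPMAT_DIAG_TYPE, &diag, sizeof(diag));

  cusparseSpSV_createDescr(&spsvDescr);
  size_t bufferSize = 0;
  void*  dBuffer    = nullptr;
  cusparseSpSV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                          matL, vecX, vecY, CUDA_R_64F,
                          CUSPARSE_SPSV_ALG_DEFAULT, spsvDescr, &bufferSize);
  cudaMalloc(&dBuffer, bufferSize);
  cusparseSpSV_analysis(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                        matL, vecX, vecY, CUDA_R_64F,
                        CUSPARSE_SPSV_ALG_DEFAULT, spsvDescr, dBuffer);
  cusparseSpSV_solve(handle, CUSPARSE_OPERATION_NON_TRANSPOSE, &alpha,
                     matL, vecX, vecY, CUDA_R_64F,
                     CUSPARSE_SPSV_ALG_DEFAULT, spsvDescr);

  cudaFree(dBuffer);
  cusparseSpSV_destroyDescr(spsvDescr);
  cusparseDestroySpMat(matL);
  cusparseDestroyDnVec(vecX);
  cusparseDestroyDnVec(vecY);
}
```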

What has been added:

  • Two randomized sketching classes (one for fast Walsh–Hadamard transform based sketching and one for count-sketch based sketching), with CUDA and HIP kernels for both (a hedged sketch of the count-sketch kernel idea follows this list).
  • A randomized solver that applies Theta (the sketching operator).
  • CUDA- and HIP-based examples.
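To illustrate the count-sketch kernel idea mentioned above (a hedged sketch only; the kernel and array names are made up for illustration and are not the actual ReSolve kernels):

```c++
#include <cuda_runtime.h>

// Hedged illustration of applying a count sketch y = Theta * x: each input
// index i is scattered into a random sketch row h[i] with a random sign s[i]
// in {-1, +1}. Note the atomicAdd on doubles, which is what requires a
// sufficiently high CUDA architecture (see the build discussion below).
__global__ void count_sketch_apply(int n,
                                   const int*    h,   // hash: index -> sketch row, size n
                                   const int*    s,   // random signs (+1/-1), size n
                                   const double* x,   // input vector, size n
                                   double*       y)   // sketched vector, size k (pre-zeroed)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    atomicAdd(&y[h[i]], s[i] * x[i]);
  }
}
```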

To test, run runResolveOneMatrix with a matrix of your choice (I tested with vas_stokes_1M from SuiteSparse).

Also fixes #84

@pelesh (Collaborator) left a comment:

This is great work! My main suggestion is to implement functionality tests for the CUDA and HIP backends and add them to this PR. There are also a few minor issues I flagged in the review.

Files with review comments:

  • resolve/GramSchmidt.cpp
  • resolve/CMakeLists.txt
  • resolve/LinSolverDirectRocSparseILU0.cpp
  • resolve/LinSolverIterativeFGMRES.hpp
  • resolve/cuda/cudaKernels.cu
  • resolve/hip/hipKernels.hip
  • runResolveOneMatrix
  • profile.sysinfo.txt
@pelesh (Collaborator) commented Nov 28, 2023

The Ascent build fails with the following error message:

Building CUDA object resolve/cuda/CMakeFiles/resolve_backend_cuda.dir/MemoryUtils.cu.o
/gpfs/wolf/proj-shared/csc359/ci/489235/resolve/cuda/cudaKernels.cu(157): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)
1 error detected in the compilation of "/gpfs/wolf/proj-shared/csc359/ci/489235/resolve/cuda/cudaKernels.cu".
make[2]: *** [resolve/cuda/CMakeFiles/resolve_backend_cuda.dir/build.make:76: resolve/cuda/CMakeFiles/resolve_backend_cuda.dir/cudaKernels.cu.o] Error 1

@kswirydo (Collaborator, Author) replied:

You need CUDA architecture 7.0 (or 70) to build this. Otherwise atomicAdd is defined only for floats, not for doubles. The default is 5.2 for some reason.
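For context, the usual fix is to configure with an explicit architecture, e.g. cmake -DCMAKE_CUDA_ARCHITECTURES=70 .. (this is what the linked issue about the Ascent pipeline points at). Alternatively, the CUDA Programming Guide documents a software fallback for double-precision atomicAdd on builds below compute capability 6.0; a hedged sketch of that fallback (not code from this PR):

```c++
#include <cuda_runtime.h>

// Hedged sketch: the double-precision atomicAdd fallback documented in the
// CUDA Programming Guide for builds targeting architectures below sm_60.
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val)
{
  unsigned long long int* address_as_ull = (unsigned long long int*) address;
  unsigned long long int old = *address_as_ull, assumed;
  do {
    assumed = old;
    // Recompute the sum against the latest observed value until the CAS succeeds.
    old = atomicCAS(address_as_ull, assumed,
                    __double_as_longlong(val + __longlong_as_double(assumed)));
  } while (assumed != old);
  return __longlong_as_double(old);
}
#endif
```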

@cameronrutherford (Collaborator) commented:

I don't know why the PNNL tests aren't running, but the ORNL tests are now passing. cc @ryandanehy

@kswirydo (Collaborator, Author) commented:

My latest commit has a working version of the CUDA 11+ compatible SpSV triangular solve in the ILU0 class. I also applied Slaven's dependency comments to two files.

@cameronrutherford (Collaborator) left a comment:

Looking good!

Files with review comments:

  • examples/r_randGMRES.cpp
  • examples/CMakeLists.txt
  • resolve/CMakeLists.txt
  • resolve/LinSolverDirectCuSparseILU0.cpp
  • resolve/LinSolverDirectCuSparseILU0.hpp
  • resolve/LinSolverIterativeFGMRES.hpp
  • resolve/LinSolverIterativeRandFGMRES.hpp
  • resolve/RandSketchingManager.hpp
  • resolve/cuda/cudaKernels.cu
  • runResolveOneMatrix
@ryandanehy (Collaborator) commented:

@kswirydo, to get this PR to run the PNNL tests, the merge conflicts need to be fixed (probably best done through a rebase). Then the PNNL tests should start running.

Implementations for AMD and NVIDIA GPUs
@pelesh (Collaborator) commented Nov 29, 2023

> @kswirydo, to get this PR to run the PNNL tests, the merge conflicts need to be fixed (probably best done through a rebase). Then the PNNL tests should start running.

It seems all we needed was to rebase onto develop; not sure why.

@pelesh (Collaborator) left a comment:

Remaining items before this can be merged:

  • Add functionality tests for randomized (F)GMRES on AMD and NVIDIA GPUs
  • Use STL math on the host (this will take care of a big chunk of warnings)
  • Fix other warnings
  • Remove roctx.h header from LinSolverDirectRocSparseILU0.hpp.

Almost there ;)

@pelesh (Collaborator) commented Nov 30, 2023

I fixed our little mishap with rebasing. All that is needed now is to:

  • Generate test matrices inside the functionality tests. This is the lesser evil compared to keeping a 170 MB test matrix in the repo.
  • Re-add the randomized solver tests to the CMake build and testing.

@cameronrutherford (Collaborator) commented:

> I fixed our little mishap with rebasing. All that is needed now is to:
>   • Generate test matrices inside the functionality tests. This is the lesser evil compared to keeping a 170 MB test matrix in the repo.
>   • Re-add the randomized solver tests to the CMake build and testing.

GitHub shouldn't allow files that big. pre-commit can also help detect when large files are added, and custom size thresholds can be configured. cc @ryandanehy

@pelesh (Collaborator) left a comment:

All review issues have been addressed. Merge once CI tests pass.

@pelesh self-assigned this Dec 2, 2023
@pelesh merged commit 4c009e6 into develop Dec 2, 2023 (4 checks passed)
@pelesh deleted the rand-gmres-dev2 branch December 11, 2023 16:57
Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues:

  • CUDA capability not set on Ascent pipeline

4 participants