@JamieJQuinn

This PR implements SPMV, SYMGS, and the prolongation and restriction operators using a matrix-free method: the stencil encoded in the associated sparse matrix is applied manually with explicit for loops instead of reading the matrix directly. This is intended to remove the overhead of accessing the sparse matrix and the indirect memory accesses into the input vector. Things that are working:

  • Valid results (as reported by HPCG) when:
    • compiling and running with arch=Linux_Serial
    • compiling and running with arch=GCC_OMP and multiple OpenMP threads
    • compiling with arch=Linux_MPI and running with one MPI process
  • Toggling between the matrix-free and original versions of each function independently
  • Comparison functions to compare outputs of the SPMV or SYMGS matrix-free calculations with the original versions.
  • Indexing map that exactly matches the flattened memory layout of the matrix or vector inputs
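
As a rough illustration of the last two items, the sketch below shows one way an index map and a comparison helper could look. It is not the code in this PR: `GridDims`, `flatIndex`, `maxAbsDiff` and `resultsMatch` are hypothetical names, and the layout (x varying fastest, then y, then z) is an assumption about how the flattened vectors are ordered.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: local grid dimensions of one subdomain.
struct GridDims {
  std::size_t nx, ny, nz;
};

// Flat index of grid point (ix, iy, iz).
// Assumed layout: x varies fastest, then y, then z.
inline std::size_t flatIndex(const GridDims& g, std::size_t ix,
                             std::size_t iy, std::size_t iz) {
  return ix + g.nx * (iy + g.ny * iz);
}

// Largest absolute element-wise difference between two result vectors.
// Returns infinity if the vectors differ in length.
inline double maxAbsDiff(const std::vector<double>& a,
                         const std::vector<double>& b) {
  if (a.size() != b.size()) {
    return std::numeric_limits<double>::infinity();
  }
  double worst = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    const double d = std::fabs(a[i] - b[i]);
    if (d > worst) worst = d;
  }
  return worst;
}

// True when the matrix-free output agrees with the reference output
// to within an absolute tolerance chosen by the caller.
inline bool resultsMatch(const std::vector<double>& matrixFree,
                         const std::vector<double>& reference,
                         double tol) {
  return maxAbsDiff(matrixFree, reference) <= tol;
}
```

A check of this kind would run the original and matrix-free versions of a function on the same input and pass both outputs to `resultsMatch`; the tolerance is left to the caller because the two versions may sum terms in a different order.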

Things that are still sketchy, broken or untested:

  • Implementing other stencils
  • MPI implementation with >1 MPI process
  • Accessing the stencil properly from the input sparse matrix (or otherwise)

Previous matrix-free implementations handle the boundaries of the domain manually using generated code. This implementation instead places ghost points around the edge of the domain to hold the known boundary values outside it, allowing a single loop over the entire domain. This reduces the complexity and size of the code, at the cost of O(x^2) additional FLOPs. Since the algorithm scales as O(x^3), this should be negligible, and the preliminary performance testing reported below shows the tradeoff is acceptable.
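
A minimal sketch of the ghost-point idea, not the code in this PR: it assumes the 27-point stencil of the HPCG reference problem (diagonal 26, every neighbour -1), an x-fastest layout, and ghost values of zero. `padWithGhosts` and `applyStencil27` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Copy an nx*ny*nz field into an (nx+2)*(ny+2)*(nz+2) array whose outer
// layer of ghost points holds ghostValue (the known boundary value).
std::vector<double> padWithGhosts(std::ptrdiff_t nx, std::ptrdiff_t ny,
                                  std::ptrdiff_t nz,
                                  const std::vector<double>& x,
                                  double ghostValue = 0.0) {
  const std::ptrdiff_t px = nx + 2, py = ny + 2, pz = nz + 2;
  std::vector<double> padded(static_cast<std::size_t>(px * py * pz), ghostValue);
  for (std::ptrdiff_t iz = 0; iz < nz; ++iz)
    for (std::ptrdiff_t iy = 0; iy < ny; ++iy)
      for (std::ptrdiff_t ix = 0; ix < nx; ++ix)
        padded[(ix + 1) + px * ((iy + 1) + py * (iz + 1))] =
            x[ix + nx * (iy + ny * iz)];
  return padded;
}

// y = A*x for the assumed 27-point stencil, with no matrix storage.
// Every point, including those on the boundary, runs the same 27-term
// loop; boundary points simply read ghost values instead of branching.
void applyStencil27(std::ptrdiff_t nx, std::ptrdiff_t ny, std::ptrdiff_t nz,
                    const std::vector<double>& xPadded,
                    std::vector<double>& y) {
  const std::ptrdiff_t px = nx + 2, py = ny + 2;
  y.resize(static_cast<std::size_t>(nx * ny * nz));
  for (std::ptrdiff_t iz = 0; iz < nz; ++iz)
    for (std::ptrdiff_t iy = 0; iy < ny; ++iy)
      for (std::ptrdiff_t ix = 0; ix < nx; ++ix) {
        // Index of this point in the padded array.
        const std::ptrdiff_t c = (ix + 1) + px * ((iy + 1) + py * (iz + 1));
        double sum = 0.0;  // sum over the full 3x3x3 block, centre included
        for (std::ptrdiff_t dz = -1; dz <= 1; ++dz)
          for (std::ptrdiff_t dy = -1; dy <= 1; ++dy)
            for (std::ptrdiff_t dx = -1; dx <= 1; ++dx)
              sum += xPadded[c + dx + px * (dy + py * dz)];
        // 26*x_c - (sum - x_c) == 27*x_c - sum
        y[ix + nx * (iy + ny * iz)] = 27.0 * xPadded[c] - sum;
      }
}
```

The extra work comes from the boundary points, which now read ghost entries that a boundary-aware loop would skip. There are O(x^2) such points on the faces of the domain, against O(x^3) points overall.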

Initial performance for the MPI (single-process) version:

  • This PR: 8.672 GFLOP/s
  • Original: 2.62676 GFLOP/s
  • Generated matrix-free: 5.38418 GFLOP/s

@JamieJQuinn JamieJQuinn changed the base branch from master to original July 4, 2023 13:59