The purpose of this repository is to both showcase the performance of the Ginkgo accessor and to serve as an integration example.
To use Ginkgo's accessor, you need to:
- Use C++14 or higher in your own project
- Use `${GINKGO_DIR}` as an include directory (you only need `${GINKGO_DIR}/accessor` from Ginkgo)
In this repository, we use CMake, which makes the integration straightforward.
We give users the option to either specify a local copy of the Ginkgo repository or have the repository cloned automatically into the build directory and used from there.
We achieve both with these few lines in CMake:
Lines 16 to 43 in 8137ba6
In this repository, we only use the `reduced_row_major` accessor, but all others work accordingly.
For the `reduced_row_major` accessor, you need to specify:
- the dimensionality of the range (we specify 2D, even for vectors, so we can access vectors with a stride)
- the arithmetic and storage type
Now, this type can be used to create the `range<reduced_row_major<...>>` by specifying the size, stride and storage pointer.
We showcase the creation of both constant and non-constant ranges with `reduced_row_major` accessors here:
accessor-BLAS/cuda/gemv_kernels.cuh
Lines 178 to 189 in 8137ba6
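To illustrate the idea without depending on the Ginkgo headers, here is a minimal CPU sketch of a 2D range that stores values in one type and computes in another. The class `range_sketch`, its constructor arguments and `create_ranges_demo` are hypothetical names made up for this example; they are not Ginkgo's API and only mimic the size, stride and storage pointer setup described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in, NOT Ginkgo's range<reduced_row_major<...>>.
// Values live in StorageType; every access converts to or from ArithType.
template <typename ArithType, typename StorageType>
class range_sketch {
public:
    using arithmetic_type = ArithType;

    // Proxy returned by operator(): converts on each read and write.
    class reference {
    public:
        explicit reference(StorageType* ptr) : ptr_{ptr} {}
        reference(const reference&) = default;
        operator ArithType() const { return static_cast<ArithType>(*ptr_); }
        reference& operator=(ArithType value)
        {
            *ptr_ = static_cast<StorageType>(value);
            return *this;
        }
        // Assigning one proxy to another writes the value through.
        reference& operator=(const reference& other)
        {
            return *this = static_cast<ArithType>(other);
        }

    private:
        StorageType* ptr_;
    };

    range_sketch(std::size_t rows, std::size_t cols, std::size_t stride,
                 StorageType* storage)
        : rows_{rows}, cols_{cols}, stride_{stride}, storage_{storage}
    {}

    reference operator()(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return reference{storage_ + row * stride_ + col};
    }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::size_t stride_;
    StorageType* storage_;
};

// Creates a non-constant matrix range and a constant, strided vector range.
inline double create_ranges_demo()
{
    std::vector<float> mtx_data(2 * 3, 0.0f);  // 2x3 matrix, stride 3
    const std::vector<float> vec_data{1.5f, -1.0f, 2.5f, -1.0f};

    // Arithmetic in double, storage in float.
    range_sketch<double, float> mtx(2, 3, 3, mtx_data.data());
    // A vector is treated as a 2x1 range; stride 2 skips every other entry.
    range_sketch<double, const float> vec(2, 1, 2, vec_data.data());

    mtx(1, 2) = 0.25;              // written to storage as float
    return mtx(1, 2) + vec(1, 0);  // both reads are converted to double
}
```

In this sketch the constant range differs from the non-constant one only by the `const` qualifier on the storage type, so an accidental write through it fails to compile.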
Utilizing the range in a kernel (works the same for CPUs) is straightforward:
- Use a templated kernel argument in order to accept all kinds of ranges
- Read and write operations use the call operator `operator()`
To know which arithmetic type is used, we can either use the `accessor::arithmetic_type` property, or detect what type arithmetic operations result in. In this example, we use the second option:
accessor-BLAS/cuda/gemv_kernels.cuh
Line 86 in 8137ba6
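Both options can be sketched in plain C++ as follows. The type `read_only_range_sketch` and the helpers `declared_arithmetic_t`, `deduced_arithmetic_t` and `dot_row` are hypothetical and exist only for this illustration; they are not part of Ginkgo, and here the `arithmetic_type` alias sits directly on the stand-in type for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>

// Hypothetical read-only stand-in for a range (not Ginkgo's API).
template <typename ArithType, typename StorageType>
struct read_only_range_sketch {
    using arithmetic_type = ArithType;

    const StorageType* storage;
    std::size_t stride;

    // Loads the stored value and converts it to the arithmetic type.
    ArithType operator()(std::size_t row, std::size_t col) const
    {
        return static_cast<ArithType>(storage[row * stride + col]);
    }
};

// Option 1: ask the type for its declared arithmetic type.
template <typename Range>
using declared_arithmetic_t = typename Range::arithmetic_type;

// Option 2: deduce the type that an arithmetic operation on two elements
// results in.
template <typename MtxRange, typename VecRange>
using deduced_arithmetic_t = decltype(std::declval<MtxRange>()(0, 0) *
                                      std::declval<VecRange>()(0, 0));

// Dot product of one matrix row with a column vector, accumulated in the
// deduced arithmetic type.
template <typename MtxRange, typename VecRange>
deduced_arithmetic_t<MtxRange, VecRange> dot_row(const MtxRange& mtx,
                                                 const VecRange& vec,
                                                 std::size_t row,
                                                 std::size_t num_cols)
{
    using ar_type = deduced_arithmetic_t<MtxRange, VecRange>;
    ar_type sum{};
    for (std::size_t col = 0; col < num_cols; ++col) {
        sum += mtx(row, col) * vec(col, 0);
    }
    return sum;
}
```

The second option also covers the case where the matrix and vector ranges use different arithmetic types, since the result then follows the usual C++ conversion rules.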
Read and write operations can be observed in GEMV here:
accessor-BLAS/cuda/gemv_kernels.cuh
Line 110 in 8137ba6
Here, we compare the GEMV kernel written with plain pointers:
accessor-BLAS/cuda/gemv_kernels.cuh
Lines 30 to 64 in 8137ba6
and using the range/accessor:
accessor-BLAS/cuda/gemv_kernels.cuh
Lines 79 to 113 in 8137ba6
The main differences between these are:
- We have fewer parameters in the range/accessor kernel because stride and size information are integrated into the ranges.
- We don't need to compute a 1D index with the range/accessor because the indices for both dimensions are passed to the call operator `operator()`.
- For debug purposes (and showcase), we extract the arithmetic type from the accessor.
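The differences listed above can be sketched on the CPU as follows. This is not the CUDA kernel from this repository: `range_sketch`, `gemv_pointer` and `gemv_range` are hypothetical names, the range class is a simplified stand-in for Ginkgo's range/accessor, and the operation is assumed to be the common BLAS form y = alpha * A * x + beta * y.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2D range (not Ginkgo's API).
template <typename ArithType, typename StorageType>
class range_sketch {
public:
    // Proxy that converts between storage and arithmetic type.
    class reference {
    public:
        explicit reference(StorageType* ptr) : ptr_{ptr} {}
        reference(const reference&) = default;
        operator ArithType() const { return static_cast<ArithType>(*ptr_); }
        reference& operator=(ArithType value)
        {
            *ptr_ = static_cast<StorageType>(value);
            return *this;
        }
        reference& operator=(const reference& other)
        {
            return *this = static_cast<ArithType>(other);
        }

    private:
        StorageType* ptr_;
    };

    range_sketch(std::size_t rows, std::size_t cols, std::size_t stride,
                 StorageType* storage)
        : rows_{rows}, cols_{cols}, stride_{stride}, storage_{storage}
    {}

    std::size_t length(int dim) const { return dim == 0 ? rows_ : cols_; }

    reference operator()(std::size_t row, std::size_t col) const
    {
        assert(row < rows_ && col < cols_);
        return reference{storage_ + row * stride_ + col};
    }

private:
    std::size_t rows_, cols_, stride_;
    StorageType* storage_;
};

// Plain pointers: sizes and strides are separate parameters, and every
// access computes a 1D index by hand.
template <typename T>
void gemv_pointer(std::size_t num_rows, std::size_t num_cols, T alpha,
                  const T* mtx, std::size_t mtx_stride, const T* x,
                  std::size_t x_stride, T beta, T* y, std::size_t y_stride)
{
    for (std::size_t row = 0; row < num_rows; ++row) {
        T sum{};
        for (std::size_t col = 0; col < num_cols; ++col) {
            sum += mtx[row * mtx_stride + col] * x[col * x_stride];
        }
        y[row * y_stride] = alpha * sum + beta * y[row * y_stride];
    }
}

// Ranges: sizes and strides travel with the ranges, and both indices go
// straight into operator().
template <typename ArType, typename MtxRange, typename XRange, typename YRange>
void gemv_range(ArType alpha, MtxRange mtx, XRange x, ArType beta, YRange y)
{
    // Arithmetic type deduced from an element-wise product.
    using ar_type = decltype(mtx(0, 0) * x(0, 0));
    for (std::size_t row = 0; row < mtx.length(0); ++row) {
        ar_type sum{};
        for (std::size_t col = 0; col < mtx.length(1); ++col) {
            sum += mtx(row, col) * x(col, 0);
        }
        y(row, 0) = alpha * sum + beta * y(row, 0);
    }
}
```

The pointer version takes ten arguments and the range version five, which mirrors the first two points in the list above.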
So far, we have run the benchmarks and error analysis on an NVIDIA A100 and an NVIDIA V100 GPU.