title | event | lang |
---|---|---|
OpenMP offloading: <br>device functions | CSC Summer School in High-Performance Computing 2024 | en |
- Often it can be useful to call functions within loops to improve readability and modularisation
- By default, the compiler is not guaranteed to generate device code for a function that is called inside an offloaded loop
- One has to instruct the compiler to compile a device version of the function
- Define a function to be compiled for an accelerator as well as the host
- C/C++
    - enclose the function declaration within `#pragma omp declare target` and `#pragma omp end declare target`
- Fortran
    - use `!$omp declare target` within the subroutine
- The functions will now be compiled both for host and device execution
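
The points above can be sketched as a small C example in which the same function is called from an offloaded loop and from plain host code. This is only an illustration: the names `fill_row`, `fill_on_device` and `fill_on_host` are hypothetical, and the explicit `map(from: ...)` clause is an assumption about how the array is made available on the device.

```c
#include <stddef.h>

/* Hypothetical helper: fills row i of an n x n row-major matrix.
   declare target asks the compiler for a host and a device version. */
#pragma omp declare target
void fill_row(float *v, int i, int n)
{
    for (int j = 0; j < n; ++j) {
        /* +1 avoids division by zero with zero-based indices */
        v[(size_t)i * n + j] = 1.0f / ((float)(i + 1) * (float)(j + 1));
    }
}
#pragma omp end declare target

/* Calls fill_row from an offloaded loop; v is mapped back to the host. */
void fill_on_device(float *v, int n)
{
    #pragma omp target teams loop map(from: v[0:n*n])
    for (int i = 0; i < n; ++i) {
        fill_row(v, i, n);   /* device version of the function */
    }
}

/* Calls the very same function from ordinary host code. */
void fill_on_host(float *v, int n)
{
    for (int i = 0; i < n; ++i) {
        fill_row(v, i, n);   /* host version of the function */
    }
}
```

Calling `fill_on_device(a, n)` and `fill_on_host(b, n)` on two arrays of `n*n` floats should give results that agree to within rounding. If the code is built without OpenMP or without an available device, the pragmas are ignored or the region falls back to the host, so the sketch still runs.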
<div class="column">

**C/C++**

```c
#pragma omp declare target
void foo(float* v, int i, int n) {
    for (int j=0; j<n; ++j) {
        // +1 mirrors the 1-based Fortran version and avoids division by zero
        v[i*n+j] = 1.0f/((i+1)*(j+1));
    }
}
#pragma omp end declare target

#pragma omp target teams loop
for (int i=0; i<n; ++i) {
    foo(v,i,n);  // executed on the device
}
```

</div>
<div class="column">

**Fortran**

```fortranfree
subroutine foo(v, i, n)
  !$omp declare target
  real :: v(:,:)
  integer :: i, n
  integer :: j
  do j=1,n
     v(i,j) = 1.0/(i*j)
  enddo
end subroutine

!$omp target teams loop
do i=1,n
  call foo(v,i,n)
enddo
!$omp end target teams loop
```

</div>
- Declare target directive
- Enables one to write device functions that can be called within parallel loops