* #128: set up GPU CI pipelines
* #128: specify cuda dir

* #128: temporarily run GPU on every push; stop MPI builds

* #128: specify correct dockerfile

* #128: provide correct build/source dirs

* #128: rework MPI in build script

* #128: rework MPI in other gpu build script

* #128: CI build script try another configuration and fix invalid path

* #128: fix missing letter in path for a gpu build

* #128: add newlines to end of files

* #128: add spack find -p to find cuda root

* #128: only run one pipeline; add cuda paths

* #128: add kokkos variables

* #128: add Tpetra_INST_SERIAL:BOOL=ON

* #128: add CUDA root flag

* #128: use correct kokkos architecture

* #128: enable cusolver and cusparse

* #128: emulate local build

* #128: try with different docker image

* #128: update cuda path

* Try adding debug flag for Buildx

* Tpetra: Disable cudaMemcpyAsync for Intercept.cpp

* #128: lower -j

* #128: use default kokkos architecture

* #128: use fewer processes for GPU testing

* #128: re-enable Kokkos_ARCH_AMPERE86

* #128: add cuda sample build to CI to validate CUDA

* #128: run cuda test on NGA host

* #128: update jobs dependency on CI for cuda

* #128: add CUDA sample run

* #128: remove command not existing in CI

* #128: change cuda path

* #128: try to display information about driver

* #128: change bad command in CI

* #128: fix command

* #128: try install nvidia util in Docker container

* #128: remove commands

* #128: fix cuda path

* #128: fix dockerfile

* #128: add different cuda test images

* Run container in separate step

* Remove not needed code

* Apply changes to Epetra=OFF

* Try to build and run docker within same step

* #128: remove unused old CI files

* #128: check both gpu pipelines

* #128: Tpetra_INST_SERIAL=ON

* #128: fix workflow name

* #128: rework with cuda 11.4 dockerfile

* #168: try to simplify CI shell script

* #128: try simplify shell script

* #128: remove libraries path for blas and lapack to check if resolved

* #128: try remove Lapack and blas lib paths from cmake call

* #128: try again changing path dynamically

* #128: fix another path

* #128: fix blas path

* #128: apply working configuration to other build scripts

* #128: restore triggering workflows on PR

* #128: disable GPU build job for PR having `EpetraMPI T1` label

* #128: enable GPU build only with EpetraMPI T2 and EpetraMPI T3 labels

* #128: upload test log

* #128: fix typo

* #128: fix artifacts

* #128: add junit report for tests

* #128: add junit reporting in CI and set

* #128: fix artifact name

* #128: fix artifacts missing

* #128: fix extra slash char in path

* #128: fix artifacts path

* #128: fix path in GitHub action

* #128: try mounting artifacts folder into the host runner

* #128: use same logic for gpu or non-gpu pipelines

* #128: Finalize pipelines (GPU on push, MPI cancellations)

* #128: remove label requirements

* Revert "Tpetra: Disable cudaMemcpyAsync for Intercept.cpp"

This reverts commit 5db2d5d.

* #128: test intercept reversion

* Revert "Revert "Tpetra: Disable cudaMemcpyAsync for Intercept.cpp""

This reverts commit de87a22.

* #128: fix underscore

* #128: run GPU pipeline on merge to fy23 develop

---------

Co-authored-by: Thomas Dutheillet-Lamonthézie <[email protected]>
Co-authored-by: Jacob Domagala <[email protected]>
3 people committed Sep 8, 2023
1 parent 80337ab commit afd6f62
Showing 9 changed files with 383 additions and 11 deletions.
55 changes: 55 additions & 0 deletions .github/workflows/ci-gpu-epetraOFF.yml
@@ -0,0 +1,55 @@
name: GPU-EpetraOFF

# Trigger the workflow on merge to NGA-FY23-develop
on:
  push:
    branches:
      - NGA-FY23-develop
  workflow_dispatch:

# Cancel in-progress runs for pull_request events only
# (never true with the push/workflow_dispatch triggers above)
concurrency:
  group: ${{ github.event.repository.name }}-${{ github.ref }}-${{ github.workflow }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  build:
    runs-on: self-hosted
    strategy:
      fail-fast: true
    steps:
      - uses: actions/checkout@v3
      - name: CI Variables
        run: echo "DOCKER_TAG=$(echo ${{ github.ref }} | cut -d'/' -f3- | sed 's/[^a-z0-9_-]/__/gi')" >> $GITHUB_ENV
      - name: Set up Docker Buildx
        # The id is required so that steps.buildx.outputs.* resolves below
        id: buildx
        uses: docker/setup-buildx-action@v2
        with:
          buildkitd-flags: --debug
      - name: Inspect Builder
        run: |
          echo "Name: ${{ steps.buildx.outputs.name }}"
          echo "Endpoint: ${{ steps.buildx.outputs.endpoint }}"
          echo "Status: ${{ steps.buildx.outputs.status }}"
          echo "Flags: ${{ steps.buildx.outputs.flags }}"
          echo "Platforms: ${{ steps.buildx.outputs.platforms }}"
          echo "DOCKER_TAG: ${{ env.DOCKER_TAG }}"
      - name: Build and Run Docker Image
        run: |
          docker build -t ${{ env.DOCKER_TAG }} -f ./nga-ci/gpu-epetraOFF.dockerfile .
          docker run -v /tmp/artifacts:/tmp/artifacts --gpus all ${{ env.DOCKER_TAG }} /opt/src/Trilinos/nga-ci/test-gpu.sh
      - name: Upload artifacts
        uses: actions/upload-artifact@v3
        if: success() || failure()
        with:
          name: Artifacts
          path: /tmp/artifacts/*
          if-no-files-found: ignore
      - name: Report Test results
        uses: phoenix-actions/test-reporting@v12
        if: success() || failure()
        with:
          name: Tests report (GPU-EpetraOFF)
          path: /tmp/artifacts/junit-tests-report.xml
          reporter: java-junit
          output-to: step-summary
          fail-on-error: 'true'
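The "CI Variables" step derives the image tag from the Git ref. A minimal sketch of that derivation as a standalone function follows; the name `docker_tag_from_ref` is hypothetical, and the character class is spelled out as `[A-Za-z0-9_-]` rather than relying on the case-insensitive `sed` flag used in the workflow.

```shell
# Hypothetical helper mirroring the "CI Variables" step: drop the first two
# path components of a Git ref (e.g. "refs/heads/") and replace every
# character outside [A-Za-z0-9_-] with a double underscore.
docker_tag_from_ref() {
  printf '%s\n' "$1" | cut -d'/' -f3- | sed 's/[^A-Za-z0-9_-]/__/g'
}

docker_tag_from_ref "refs/heads/NGA-FY23-develop"   # prints NGA-FY23-develop
docker_tag_from_ref "refs/heads/feature/gpu-ci"     # prints feature__gpu-ci
```

Note that uppercase letters pass through unchanged. Docker repository names are generally required to be lowercase, so a ref such as `NGA-FY23-develop` may need an extra `tr '[:upper:]' '[:lower:]'` stage before it is used with `docker build -t`; that stage is an assumption and is not part of this commit.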
55 changes: 55 additions & 0 deletions .github/workflows/ci-gpu-epetraON.yml
@@ -0,0 +1,55 @@
name: GPU-EpetraON

# Trigger the workflow on merge to NGA-FY23-develop
on:
  push:
    branches:
      - NGA-FY23-develop
  workflow_dispatch:

# Cancel in-progress runs for pull_request events only
# (never true with the push/workflow_dispatch triggers above)
concurrency:
  group: ${{ github.event.repository.name }}-${{ github.ref }}-${{ github.workflow }}
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}

jobs:
  build:
    runs-on: self-hosted
    strategy:
      fail-fast: true
    steps:
      - uses: actions/checkout@v3
      - name: CI Variables
        run: echo "DOCKER_TAG=$(echo ${{ github.ref }} | cut -d'/' -f3- | sed 's/[^a-z0-9_-]/__/gi')" >> $GITHUB_ENV
      - name: Set up Docker Buildx
        # The id is required so that steps.buildx.outputs.* resolves below
        id: buildx
        uses: docker/setup-buildx-action@v2
        with:
          buildkitd-flags: --debug
      - name: Inspect Builder
        run: |
          echo "Name: ${{ steps.buildx.outputs.name }}"
          echo "Endpoint: ${{ steps.buildx.outputs.endpoint }}"
          echo "Status: ${{ steps.buildx.outputs.status }}"
          echo "Flags: ${{ steps.buildx.outputs.flags }}"
          echo "Platforms: ${{ steps.buildx.outputs.platforms }}"
          echo "DOCKER_TAG: ${{ env.DOCKER_TAG }}"
      - name: Build and Run Docker Image
        run: |
          docker build -t ${{ env.DOCKER_TAG }} -f ./nga-ci/gpu-epetraON.dockerfile .
          docker run -v /tmp/artifacts:/tmp/artifacts --gpus all ${{ env.DOCKER_TAG }} /opt/src/Trilinos/nga-ci/test-gpu.sh
      - name: Upload artifacts
        uses: actions/upload-artifact@v3
        if: success() || failure()
        with:
          name: Artifacts
          path: /tmp/artifacts/*
          if-no-files-found: ignore
      - name: Report Test results
        uses: phoenix-actions/test-reporting@v12
        if: success() || failure()
        with:
          name: Tests report (GPU-EpetraON)
          path: /tmp/artifacts/junit-tests-report.xml
          reporter: java-junit
          output-to: step-summary
          fail-on-error: 'true'
95 changes: 95 additions & 0 deletions nga-ci/build-gpu-epetraOFF.sh
@@ -0,0 +1,95 @@
#!/usr/bin/env bash

set -e
set -x

. /opt/spack/share/spack/setup-env.sh
spack env activate trilinos

cd /opt/build/Trilinos

export MPI_ROOT="$(dirname $(which mpicc))"
export MPICC="${MPI_ROOT}/mpicc"
export MPICXX="${MPI_ROOT}/mpicxx"
export MPIF90="${MPI_ROOT}/mpif90"
export MPIRUN="${MPI_ROOT}/mpirun"

export BLAS_ROOT="$(spack location -i openblas)"
export LAPACK_ROOT="${BLAS_ROOT}"

export CUDA_ROOT=/usr/local/cuda
export PATH=${CUDA_ROOT}/bin:$PATH
export OMPI_CXX=/opt/src/Trilinos/packages/kokkos/bin/nvcc_wrapper
export LD_LIBRARY_PATH=${CUDA_ROOT}/lib64:$LD_LIBRARY_PATH
export CUDA_LAUNCH_BLOCKING=1
ENABLE_CUDA=ON

cmake -G "${CMAKE_GENERATOR:-Ninja}" \
-D CMAKE_BUILD_TYPE=DEBUG \
-D Trilinos_ENABLE_DEBUG=ON \
-D Trilinos_PARALLEL_LINK_JOBS_LIMIT=2 \
-D Trilinos_ENABLE_ALL_PACKAGES=ON \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES=ON \
-D Trilinos_ALLOW_NO_PACKAGES=ON \
-D Trilinos_DISABLE_ENABLED_FORWARD_DEP_PACKAGES=ON \
-D Trilinos_IGNORE_MISSING_EXTRA_REPOSITORIES=ON \
-D Trilinos_ENABLE_TESTS=ON \
-D Trilinos_TEST_CATEGORIES=BASIC \
-D Trilinos_ENABLE_ALL_FORWARD_DEP_PACKAGES=ON \
-D Trilinos_VERBOSE_CONFIGURE=ON \
-D BUILD_SHARED_LIBS=ON \
\
-D Trilinos_WARNINGS_AS_ERRORS_FLAGS="-Wno-error" \
-D Trilinos_ENABLE_SEACAS=OFF \
-D Trilinos_ENABLE_Sacado=OFF \
\
-D TPL_ENABLE_CUDA="${ENABLE_CUDA}" \
-D Tpetra_INST_SERIAL=ON \
-D Tpetra_INST_CUDA=ON \
-D Trilinos_ENABLE_Kokkos=ON \
-D Kokkos_ARCH_AMPERE86=ON \
-D Kokkos_ENABLE_OPENMP=OFF \
-D Kokkos_ENABLE_CUDA="${ENABLE_CUDA}" \
-D Kokkos_ENABLE_CUDA_LAMBDA="${ENABLE_CUDA}" \
-D Kokkos_ENABLE_CUDA_UVM=OFF \
\
-D TPL_ENABLE_CUSOLVER=ON \
-D TPL_ENABLE_CUSPARSE=ON \
\
-D TPL_ENABLE_BLAS=ON \
-D TPL_BLAS_LIBRARIES="${BLAS_ROOT}/lib/libopenblas.so" \
-D TPL_ENABLE_LAPACK=ON \
-D TPL_LAPACK_LIBRARIES="${LAPACK_ROOT}/lib/libopenblas.so" \
\
-D TPL_ENABLE_Matio=OFF \
-D TPL_ENABLE_X11=OFF \
-D TPL_ENABLE_Pthread=OFF \
-D TPL_ENABLE_Boost=OFF \
-D TPL_ENABLE_BoostLib=OFF \
-D TPL_ENABLE_ParMETIS=OFF \
-D TPL_ENABLE_Zlib=OFF \
-D TPL_ENABLE_HDF5=OFF \
-D TPL_ENABLE_Netcdf=OFF \
-D TPL_ENABLE_SuperLU=OFF \
-D TPL_ENABLE_Scotch=OFF \
\
-D CMAKE_C_COMPILER=${MPICC} \
-D CMAKE_CXX_COMPILER=${MPICXX} \
-D CMAKE_Fortran_COMPILER=${MPIF90} \
-D TPL_ENABLE_MPI=ON \
-D MPI_BIN_DIR=${MPI_ROOT} \
-D MPI_EXEC=${MPIRUN} \
\
-D Trilinos_ENABLE_Rythmos=OFF \
-D Trilinos_ENABLE_Pike=OFF \
-D Trilinos_ENABLE_Komplex=OFF \
-D Trilinos_ENABLE_TriKota=OFF \
-D Trilinos_ENABLE_Moertel=OFF \
-D Trilinos_ENABLE_Domi=OFF \
-D Trilinos_ENABLE_FEI=OFF \
\
-D Trilinos_ENABLE_PyTrilinos=OFF \
\
-D Trilinos_ENABLE_Epetra=OFF \
-S /opt/src/Trilinos -B /opt/build/Trilinos
ninja -j 4
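The script derives `MPI_ROOT` from the location of `mpicc` on `PATH`. If `mpicc` is missing, `dirname` receives an empty argument and the later compiler paths are silently wrong. A minimal sketch of a stricter lookup follows; the helper name `bin_dir_of` is hypothetical and not part of this commit.

```shell
# Hypothetical helper: print the directory that holds an executable found on
# PATH, or fail with a message on stderr if the executable is missing.
bin_dir_of() {
  bin_dir_of_exe=$(command -v "$1") || {
    echo "error: $1 not found on PATH" >&2
    return 1
  }
  dirname "$bin_dir_of_exe"
}

# Demonstrated with a tool that exists on any POSIX system; the build script
# would pass mpicc instead, e.g. MPI_ROOT="$(bin_dir_of mpicc)".
bin_dir_of sh
```

Because the build script runs under `set -e`, an assignment such as `MPI_ROOT="$(bin_dir_of mpicc)"` on its own line would stop the build early with a clear message instead of continuing with an empty path. When combined with `export` on the same line, the exit status of the substitution is masked, so the assignment and the `export` would need to be separate statements.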
95 changes: 95 additions & 0 deletions nga-ci/build-gpu-epetraON.sh
@@ -0,0 +1,95 @@
#!/usr/bin/env bash

set -e
set -x

. /opt/spack/share/spack/setup-env.sh
spack env activate trilinos

cd /opt/build/Trilinos

export MPI_ROOT="$(dirname $(which mpicc))"
export MPICC="${MPI_ROOT}/mpicc"
export MPICXX="${MPI_ROOT}/mpicxx"
export MPIF90="${MPI_ROOT}/mpif90"
export MPIRUN="${MPI_ROOT}/mpirun"

export BLAS_ROOT="$(spack location -i openblas)"
export LAPACK_ROOT="${BLAS_ROOT}"

export CUDA_ROOT=/usr/local/cuda
export PATH=${CUDA_ROOT}/bin:$PATH
export OMPI_CXX=/opt/src/Trilinos/packages/kokkos/bin/nvcc_wrapper
export LD_LIBRARY_PATH=${CUDA_ROOT}/lib64:$LD_LIBRARY_PATH
export CUDA_LAUNCH_BLOCKING=1
ENABLE_CUDA=ON

cmake -G "${CMAKE_GENERATOR:-Ninja}" \
-D CMAKE_BUILD_TYPE=DEBUG \
-D Trilinos_ENABLE_DEBUG=ON \
-D Trilinos_PARALLEL_LINK_JOBS_LIMIT=2 \
-D Trilinos_ENABLE_ALL_PACKAGES=ON \
-D Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES=ON \
-D Trilinos_ALLOW_NO_PACKAGES=ON \
-D Trilinos_DISABLE_ENABLED_FORWARD_DEP_PACKAGES=ON \
-D Trilinos_IGNORE_MISSING_EXTRA_REPOSITORIES=ON \
-D Trilinos_ENABLE_TESTS=ON \
-D Trilinos_TEST_CATEGORIES=BASIC \
-D Trilinos_ENABLE_ALL_FORWARD_DEP_PACKAGES=ON \
-D Trilinos_VERBOSE_CONFIGURE=ON \
-D BUILD_SHARED_LIBS=ON \
\
-D Trilinos_WARNINGS_AS_ERRORS_FLAGS="-Wno-error" \
-D Trilinos_ENABLE_SEACAS=OFF \
-D Trilinos_ENABLE_Sacado=OFF \
\
-D TPL_ENABLE_CUDA="${ENABLE_CUDA}" \
-D Tpetra_INST_SERIAL=ON \
-D Tpetra_INST_CUDA=ON \
-D Trilinos_ENABLE_Kokkos=ON \
-D Kokkos_ARCH_AMPERE86=ON \
-D Kokkos_ENABLE_OPENMP=OFF \
-D Kokkos_ENABLE_CUDA="${ENABLE_CUDA}" \
-D Kokkos_ENABLE_CUDA_LAMBDA="${ENABLE_CUDA}" \
-D Kokkos_ENABLE_CUDA_UVM=OFF \
\
-D TPL_ENABLE_CUSOLVER=ON \
-D TPL_ENABLE_CUSPARSE=ON \
\
-D TPL_ENABLE_BLAS=ON \
-D TPL_BLAS_LIBRARIES="${BLAS_ROOT}/lib/libopenblas.so" \
-D TPL_ENABLE_LAPACK=ON \
-D TPL_LAPACK_LIBRARIES="${LAPACK_ROOT}/lib/libopenblas.so" \
\
-D TPL_ENABLE_Matio=OFF \
-D TPL_ENABLE_X11=OFF \
-D TPL_ENABLE_Pthread=OFF \
-D TPL_ENABLE_Boost=OFF \
-D TPL_ENABLE_BoostLib=OFF \
-D TPL_ENABLE_ParMETIS=OFF \
-D TPL_ENABLE_Zlib=OFF \
-D TPL_ENABLE_HDF5=OFF \
-D TPL_ENABLE_Netcdf=OFF \
-D TPL_ENABLE_SuperLU=OFF \
-D TPL_ENABLE_Scotch=OFF \
\
-D CMAKE_C_COMPILER=${MPICC} \
-D CMAKE_CXX_COMPILER=${MPICXX} \
-D CMAKE_Fortran_COMPILER=${MPIF90} \
-D TPL_ENABLE_MPI=ON \
-D MPI_BIN_DIR=${MPI_ROOT} \
-D MPI_EXEC=${MPIRUN} \
\
-D Trilinos_ENABLE_Rythmos=OFF \
-D Trilinos_ENABLE_Pike=OFF \
-D Trilinos_ENABLE_Komplex=OFF \
-D Trilinos_ENABLE_TriKota=OFF \
-D Trilinos_ENABLE_Moertel=OFF \
-D Trilinos_ENABLE_Domi=OFF \
-D Trilinos_ENABLE_FEI=OFF \
\
-D Trilinos_ENABLE_PyTrilinos=OFF \
\
-D Trilinos_ENABLE_Epetra=ON \
-S /opt/src/Trilinos -B /opt/build/Trilinos
ninja -j 4
11 changes: 11 additions & 0 deletions nga-ci/gpu-epetraOFF.dockerfile
@@ -0,0 +1,11 @@
# Choose a base image
FROM calebschilly/trilinos-deps:main AS build-stage

COPY . /opt/src/Trilinos
RUN mkdir -p /opt/build/Trilinos

# Build using the spack environment we created
RUN bash /opt/src/Trilinos/nga-ci/build-gpu-epetraOFF.sh

# For running later
RUN chmod +x /opt/src/Trilinos/nga-ci/test-gpu.sh
11 changes: 11 additions & 0 deletions nga-ci/gpu-epetraON.dockerfile
@@ -0,0 +1,11 @@
# Choose a base image
FROM calebschilly/trilinos-deps:main AS build-stage

COPY . /opt/src/Trilinos
RUN mkdir -p /opt/build/Trilinos

# Build using the spack environment we created
RUN bash /opt/src/Trilinos/nga-ci/build-gpu-epetraON.sh

# For running later
RUN chmod +x /opt/src/Trilinos/nga-ci/test-gpu.sh
25 changes: 25 additions & 0 deletions nga-ci/test-gpu.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

set -x
set -e

. /opt/spack/share/spack/setup-env.sh
spack env activate trilinos

cd /opt/build/Trilinos
ret_code=0

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

# path to the artifacts
artifacts_dir=/tmp/artifacts

ctest -j 5 --output-junit junit-tests-report.xml --output-on-failure || ret_code=$?
# We collect the test logs for exporting
echo "ctest returned: $ret_code"
mkdir -p ${artifacts_dir}
cp /opt/build/Trilinos/junit-tests-report.xml ${artifacts_dir}
cp /opt/build/Trilinos/Testing/Temporary/LastTest.log ${artifacts_dir}
echo ${ret_code} > ${artifacts_dir}/success_flag.txt
ls ${artifacts_dir}
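The test script runs under `set -e` yet must keep going after `ctest` fails so that the logs can still be copied. The `|| ret_code=$?` idiom achieves this. A minimal sketch of the idiom in isolation follows, with a hypothetical `fake_ctest` function standing in for `ctest`.

```shell
# Hypothetical stand-in for ctest: returns the exit code given as $1.
fake_ctest() { return "$1"; }

# Run a command under `set -e` without aborting, then print its exit code.
# The subshell keeps `set -e` from leaking into the caller.
run_and_capture() {
  (
    set -e
    ret_code=0
    # A failing command on the left of `||` does not trigger `set -e`.
    "$@" || ret_code=$?
    echo "$ret_code"
  )
}

run_and_capture fake_ctest 8   # prints 8
run_and_capture fake_ctest 0   # prints 0
```

Without the `||` clause, a non-zero exit from the test command would end the script before the artifacts are copied.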
25 changes: 25 additions & 0 deletions nga-ci/test-mpi.sh
@@ -0,0 +1,25 @@
#!/usr/bin/env bash

set -x
set -e

. /opt/spack/share/spack/setup-env.sh
spack env activate trilinos

cd /opt/build/Trilinos
ret_code=0

export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

# path to the artifacts
artifacts_dir=/tmp/artifacts

ctest -j 14 --output-junit junit-tests-report.xml --output-on-failure || ret_code=$?
# We collect the test logs for exporting
echo "ctest returned: $ret_code"
mkdir -p ${artifacts_dir}
cp /opt/build/Trilinos/junit-tests-report.xml ${artifacts_dir}
cp /opt/build/Trilinos/Testing/Temporary/LastTest.log ${artifacts_dir}
echo ${ret_code} > ${artifacts_dir}/success_flag.txt
ls ${artifacts_dir}
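Both test scripts record the `ctest` exit code in `success_flag.txt` and then finish with `ls`, so the script itself exits with status 0 even when tests fail. The workflows in this commit rely on the JUnit report step to surface failures. As an alternative, a caller could read the flag file back; the sketch below assumes a hypothetical helper named `exit_code_from_flag` and uses a temporary directory in place of `/tmp/artifacts`.

```shell
# Hypothetical helper: return the exit code recorded in success_flag.txt
# inside the given artifacts directory; a missing file counts as failure.
exit_code_from_flag() {
  flag_file="$1/success_flag.txt"
  [ -f "$flag_file" ] || return 1
  return "$(cat "$flag_file")"
}

# Usage with a temporary directory standing in for /tmp/artifacts.
demo_dir="$(mktemp -d)"
echo 8 > "$demo_dir/success_flag.txt"
exit_code_from_flag "$demo_dir" || echo "tests failed with code $?"
rm -rf "$demo_dir"
```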