#163: Belos: Provide `Tpetra` version of LSQR examples #168

tlamonthezie · 2023-09-07T15:24:24Z

Fixes #163

* #128: set up GPU CI pipelines
* #128: specify cuda dir
* #128: temporarily run GPU on every push; stop MPI builds
* #128: specify correct dockerfile
* #128: provide correct build/source dirs
* #128: rework MPI in build script
* #128: rework MPI in other gpu build script
* #128: CI build script try another configuration and fix invalid path
* #128: fix missing letter in path for a gpu build
* #128: add newlines to end of files
* #128: add spack find -p to find cuda root
* #128: only run one pipeline; add cuda paths
* #128: add kokkos variables
* #128: add Tpetra_INST_SERIAL:BOOL=ON
* #128: add CUDA root flag
* #128: use correct kokkos architecture
* #128: enable cusolver and cusparse
* #128: emulate local build
* #128: try with different docker image
* #128: update cuda path
* Try adding debug flag for Buildx
* Tpetra: Disable cudaMemcpyAsync for Intercept.cpp
* #128: lower -j
* #128: use defaul kokkos architecture
* #128: use fewer processes for GPU testing
* 128: re-enable Kokkos_ARCH_AMPERE86
* #128: add cuda sample build to CI to validate CUDA
* #128: run cuda test on NGA host
* #128: update jobs dependency on CI for cuda
* #128: add CUD sample run
* #128: remove command not existing in CI
* #128: change cuda path
* #128: try to display information about driver
* #128: change bad command in CI
* #128: fix command
* #128: try install nvidia util in Docker container
* #128: remove commands
* #128: fix cuda path
* #128: fix dockerfile
* #128: add different cuda test images
* Run container in separate step
* Remove not needed code
* Apply changes to Epetra=OFF
* Try to build and run docker within same step
* #128: remove unused old CI files
* #128: check both gpu pipelines
* #128: Tpetra_INST_SERIAL=ON
* #128: fix workflow name
* #128: rework with cuda 11.4 dockerfile
* #168: try to simplify CI sheel script
* #128: try simplify shell script
* #128: remove librairies path for blas and lapack to check if resolved
* #128: try remove Lapack and blas lib paths from cmake call
* #128: try again changing path dynamically
* #128: fix another path
* #128: fix blas path
* #128: apply working conffiguration to other build scripts
* #128: restore triggering workflows on PR
* #128: disable GPU build job for PR having `EpetraMPI T1` label
* #128: enable GPU build only with EpetraMPI T2 and EpetraMPI T3 labels
* #128: upload test log
* #128: fix typo
* #128: fix artifacts
* #128: add junit report for tests
* #128: add junit reporting in CI and set
* #128: fix artifact name
* #128: fix artifacts missing
* #128 fix extra slach char in path
* #128: fix artifacts path
* #128: fix path in gitbub action
* #128: try mounting artifacts folder into the host runner
* #128: use same logic for gpu or non-gpu pipelines
* 128: Finalize pipelines (GPU on push, MPI cancellations)
* 128: remove label requirements
* Revert "Tpetra: Disable cudaMemcpyAsync for Intercept.cpp" This reverts commit 5db2d5d.
* #128: test intercept reversion
* Revert "Revert "Tpetra: Disable cudaMemcpyAsync for Intercept.cpp"" This reverts commit de87a22.
* #128: fix underscore
* #128: run GPU pipeline on merge to fy23 develop

---------

Co-authored-by: Thomas Dutheillet-Lamonthézie <[email protected]>
Co-authored-by: Jacob Domagala <[email protected]>

tlamonthezie · 2023-09-11T14:53:46Z

Hello @stmcgovern

The new PrecLSQRTpetraExFile example would require adding Ifpack2 as a dependency of Belos, but that would create a circular dependency, because Ifpack2 already depends on Belos (only for some tests and examples, as far as I can see).

I see two options if we want a Tpetra version of this example:

Option A: Add PrecLSQRTpetraExFile to the Ifpack2 examples.
Option B: Add PrecLSQRTpetraExFile to the Belos examples, move the Belos/Ifpack2 tests and examples from Ifpack2 to the Belos package, and change the dependencies (make Belos depend on Ifpack2 instead of the other way around, as is already the case with Ifpack).
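To make the circular-dependency argument concrete, the sketch below models the relevant packages as a directed graph and checks it for cycles with a depth-first search. It is a hypothetical, simplified illustration: the `DepGraph` type, the `hasCycle` helper, and the three example graphs are invented for this note and are not part of Trilinos or its build system, and it assumes (as the comment does) that package dependencies must stay acyclic. Option A leaves the graph as in `kToday`; Option B corresponds to `kOptionB`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Package -> packages it depends on. Hypothetical and simplified: real
// Trilinos packages have many more edges, and Ifpack2's edge to Belos is
// modeled here even though it only serves some tests and examples.
using DepGraph = std::map<std::string, std::vector<std::string>>;

namespace {

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; reaching a package that is still InProgress means we
// followed a back edge, i.e. the graph contains a cycle.
bool visit(const DepGraph& g, const std::string& pkg,
           std::map<std::string, Mark>& marks) {
  Mark& m = marks[pkg];  // std::map references stay valid across inserts
  if (m == Mark::InProgress) return true;
  if (m == Mark::Done) return false;
  m = Mark::InProgress;
  auto it = g.find(pkg);
  if (it != g.end()) {
    for (const std::string& dep : it->second) {
      if (visit(g, dep, marks)) return true;
    }
  }
  m = Mark::Done;
  return false;
}

}  // namespace

bool hasCycle(const DepGraph& g) {
  std::map<std::string, Mark> marks;
  for (const auto& entry : g) {
    if (visit(g, entry.first, marks)) return true;
  }
  return false;
}

// Today: Ifpack2 depends on Belos (for some tests/examples).
const DepGraph kToday = {
    {"Tpetra", {}},
    {"Belos", {"Tpetra"}},
    {"Ifpack2", {"Tpetra", "Belos"}},
};

// Adding Ifpack2 as a Belos dependency without any other change.
const DepGraph kNaive = {
    {"Tpetra", {}},
    {"Belos", {"Tpetra", "Ifpack2"}},
    {"Ifpack2", {"Tpetra", "Belos"}},
};

// Option B: move the Belos/Ifpack2 tests to Belos and reverse the edge.
const DepGraph kOptionB = {
    {"Tpetra", {}},
    {"Belos", {"Tpetra", "Ifpack2"}},
    {"Ifpack2", {"Tpetra"}},
};
```

Here `hasCycle(kNaive)` is true, while `kToday` and `kOptionB` are acyclic, which is why Option B has to remove Ifpack2's edge to Belos in the same change that adds Belos's edge to Ifpack2.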

I think option A is easier to implement, but option B might be more logical, even though it means modifying two packages at once.
@stmcgovern could you please give me some guidance on this, or ask the Belos and Ifpack2 package owners?
Or is there perhaps a place outside these two packages where tests and examples could live, to avoid the circular dependency?

While writing this comment, I am working on getting the example to run by ignoring the dependency problem, but we will still need to decide how to handle it at build time so that it does not cause dependency issues.

tlamonthezie · 2023-09-11T15:30:55Z

When the example is registered as a test, it currently does not complete within the 1500-second test timeout in CI.
Yet it WORKS on a local machine in about one second.
So I suspect the problem is not the code itself but rather the parameters, or something specific to the CI machine. I suggest ignoring that timeout for now.
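As a hedged illustration of the "parameters" hypothesis, here is a standalone dense sketch of the Paige-Saunders LSQR recurrence. It is not Belos's LSQR solver manager, and the `lsqr` function, `LsqrResult` struct, `tol`, and `maxIters` names are hypothetical. An LSQR run stops either when the relative normal-equation residual ||A^T r|| / ||A^T b|| falls below a tolerance or when an iteration cap is reached; if the tolerance is never met, only the cap bounds the run time, so a tight tolerance combined with a large cap is one plausible way a run could exceed a CI timeout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major: m rows, n columns

// y = A * x
static Vec matvec(const Mat& A, const Vec& x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  return y;
}

// y = A^T * x
static Vec matTvec(const Mat& A, const Vec& x) {
  Vec y(A[0].size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t j = 0; j < y.size(); ++j) y[j] += A[i][j] * x[i];
  return y;
}

static double norm2(const Vec& v) {
  double s = 0.0;
  for (double e : v) s += e * e;
  return std::sqrt(s);
}

static void scale(Vec& v, double a) {
  for (double& e : v) e *= a;
}

struct LsqrResult {
  Vec x;
  int iters;
  bool converged;
};

// Minimizes ||A x - b||_2 starting from x = 0 (hypothetical helper).
LsqrResult lsqr(const Mat& A, const Vec& b, double tol, int maxIters) {
  assert(!A.empty() && A.size() == b.size());
  const std::size_t n = A[0].size();
  Vec x(n, 0.0);

  // Start Golub-Kahan bidiagonalization: beta*u = b, alpha*v = A^T u.
  Vec u = b;
  double beta = norm2(u);
  if (beta == 0.0) return {x, 0, true};
  scale(u, 1.0 / beta);
  Vec v = matTvec(A, u);
  double alpha = norm2(v);
  if (alpha == 0.0) return {x, 0, true};  // A^T b = 0, so x = 0 is optimal
  scale(v, 1.0 / alpha);

  Vec w = v;
  double phibar = beta, rhobar = alpha;
  const double normATb = alpha * beta;  // ||A^T b||

  for (int it = 1; it <= maxIters; ++it) {
    // Next bidiagonalization step.
    Vec Av = matvec(A, v);
    for (std::size_t i = 0; i < u.size(); ++i) u[i] = Av[i] - alpha * u[i];
    beta = norm2(u);
    if (beta > 0.0) scale(u, 1.0 / beta);
    Vec ATu = matTvec(A, u);
    for (std::size_t j = 0; j < n; ++j) v[j] = ATu[j] - beta * v[j];
    alpha = norm2(v);
    if (alpha > 0.0) scale(v, 1.0 / alpha);

    // Givens rotation eliminating the subdiagonal entry beta.
    const double rho = std::hypot(rhobar, beta);
    const double c = rhobar / rho, s = beta / rho;
    const double theta = s * alpha;
    rhobar = -c * alpha;
    const double phi = c * phibar;
    phibar = s * phibar;

    // Update the iterate and the search direction.
    for (std::size_t j = 0; j < n; ++j) {
      x[j] += (phi / rho) * w[j];
      w[j] = v[j] - (theta / rho) * w[j];
    }

    // Stopping test on ||A^T r|| = phibar * alpha * |c|.
    if (phibar * alpha * std::fabs(c) <= tol * normATb) return {x, it, true};
  }
  return {x, maxIters, false};  // stopped only by the iteration cap
}
```

With a loose enough tolerance the loop exits early; with a tolerance that round-off prevents it from reaching, every run pays for all `maxIters` iterations.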

packages/belos/tpetra/example/LSQR/CMakeLists.txt

packages/belos/tpetra/example/LSQR/LSQRTpetraExFile.cpp

packages/belos/tpetra/example/LSQR/PrecLSQRTpetraExFile.cpp

cwschilly

Looks good to me; after a rebase it should be good to go.

packages/belos/tpetra/example/CMakeLists.txt

stmcgovern

The preconditioned version, `PrecLSQRTpetraExFile.cpp`, is provided but is not built (it is commented out in CMakeLists.txt) because of the circular dependency issue, which is under consideration by the package developers.

tlamonthezie changed the title ~~163 belos add lsqr examples~~ #163: Belos: Provide Tpetra version of LSQR examples Sep 7, 2023

tlamonthezie self-assigned this Sep 7, 2023

tlamonthezie added the NGA-internal (NGA workers will take care of these), EpetraMPI T2, and pkg: Belos labels Sep 7, 2023

tlamonthezie changed the title ~~#163: Belos: Provide Tpetra version of LSQR examples~~ #163: Belos: Provide Tpetra version of LSQR examples Sep 7, 2023

tlamonthezie force-pushed the 163-belos-add-LSQR-examples branch from 9605912 to a1b19bc Compare September 11, 2023 15:49

tlamonthezie requested review from stmcgovern, antoinemeyer5 and cwschilly September 13, 2023 16:13

antoinemeyer5 reviewed Sep 14, 2023

cwschilly previously approved these changes Sep 14, 2023

packages/belos/tpetra/example/CMakeLists.txt

cwschilly mentioned this pull request Sep 14, 2023

List of PRs to Trilinos/develop and fork development #77

Open

tlamonthezie marked this pull request as ready for review September 14, 2023 15:29

cwschilly force-pushed the 163-belos-add-LSQR-examples branch from 56b7a2c to 504841c Compare September 15, 2023 15:28

tlamonthezie added 11 commits September 15, 2023 11:30

#163: start working on the 2 LSQR examples

fa1beda

#163: WIP rewriting LSQRT example

05979db

/ #163: add LSQR subdirectory in cmakelists

97cd7c6

#163: fix method calls

d84c6c7

#163: WIP fix another method calls

666321d

#163: try code from another test to make compile and temporary set verbose output

4e95f0f

#163: clean code

94f1cdc

#163: remove test for Tpetra LSQR example as it converge

fae05cf

#163: start writing second LSQR example for tpetra

0067fb1

#163: add Ifpack2 preconditioner and fix types

d7dc9a2

#163: add Ifpack2 error handling function and fix two constructor calls

200f629

tlamonthezie added 15 commits September 15, 2023 11:32

#163: remove invalid method

638d450

#163: fix bad method call and disable bad old code

0bba07a

#163: clean code ad refactor Preconditioner example

df1c5b3

#163: fix parameters for LSQR Prec example and remove unused using statement

a298349

#163: update comment in CMakeLists

83f1eda

#163: remove a comment

caaad12

#163: remove test of the examples

a5d9c63

#163: test change epsilon

695bd02

#163: update some default parameters

6f221cb

#163: fix type error

5d82fd3

#163: add missing requirement to triutils in cmakelists

beaf12b

#163: code quality following review

3351d20

#163: clean code

8aa8048

#163: clean code

c97603b

#163: remove test from example

0b158b3

cwschilly dismissed their stale review via 0b158b3 September 15, 2023 15:33

cwschilly force-pushed the 163-belos-add-LSQR-examples branch from 504841c to 0b158b3 Compare September 15, 2023 15:33

stmcgovern approved these changes Sep 15, 2023

stmcgovern closed this Sep 15, 2023

stmcgovern reopened this Sep 15, 2023

stmcgovern merged commit 096231a into NGA-FY23-develop Sep 15, 2023
0 of 4 checks passed

tlamonthezie added the PR to Trilinos label Sep 21, 2023
