Belos CGSingleRedIter: Reorder memory so that MvTransMv arguments have constant stride #12797

Open: wants to merge 1 commit into develop

@cgcgcg
Contributor

cgcgcg commented Mar 6, 2024

@trilinos/belos

Motivation

The single-reduce CG iteration uses two MVs to hold its vector data. Different operations act on single vectors or on sub-MVs. In particular, there is a call to MvTransMv for the MVs $S=[W_0, W_1]$ and $T=[W_0,W_2]$, where $W$ is an MV with 3 columns. This call was slow on GPUs because $T$ skips column $W_1$, so it ran on an MV with non-constant stride. By reordering the columns of $W$ we now have $S=[W_0, W_1]$ and $T=[W_1,W_2]$ instead, so both arguments are views of adjacent columns.
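
To illustrate the column selections described above, here is a minimal sketch. It assumes that a sub-MV counts as constant-stride only when it selects adjacent columns of $W$; that is an assumption of this sketch, not something checked against the Belos or Tpetra source. The helper `selectsAdjacentColumns` and the function `checkLayouts` are hypothetical names, not part of any Trilinos API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Belos/Tpetra function): returns true when the
// selected column indices are consecutive (c, c+1, c+2, ...).
// Assumption: only such selections give a constant-stride sub-MV.
bool selectsAdjacentColumns(const std::vector<std::size_t>& cols) {
  for (std::size_t i = 1; i < cols.size(); ++i) {
    if (cols[i] != cols[i - 1] + 1) {
      return false;
    }
  }
  return true;
}

// Checks the two layouts from the PR description, with W having 3 columns.
void checkLayouts() {
  const std::vector<std::size_t> S = {0, 1};     // S = [W_0, W_1], unchanged
  const std::vector<std::size_t> oldT = {0, 2};  // T = [W_0, W_2], before
  const std::vector<std::size_t> newT = {1, 2};  // T = [W_1, W_2], after

  assert(selectsAdjacentColumns(S));
  assert(!selectsAdjacentColumns(oldT));  // skips W_1
  assert(selectsAdjacentColumns(newT));
}
```

Under that assumption, only the old $T$ fails the check, which matches the description of a single MvTransMv argument with non-constant stride.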

cgcgcg self-assigned this Mar 6, 2024
cgcgcg requested a review from a team as a code owner March 6, 2024 18:21
@trilinos-autotester
Contributor

Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request.

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: Trilinos_PR_gcc-8.3.0

  • Build Num: 3759
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-openmp_release-debug_static_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any&&!sandybridge
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-serial

  • Build Num: 2258
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-v2-gnu-8.3.0-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-debug

  • Build Num: 2248
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-8.3.0-openmpi-1.10.1-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_clang-11.0.1

  • Build Num: 2247
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-clang-11.0.1-openmpi-1.10.1-serial_release-debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_python3

  • Build Num: 3421
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-gnu-7.2.0-anaconda3-serial_debug_shared_no-kokkos-arch_no-asan_no-complex_no-fpic_no-mpi_no-pt_no-rdc_no-uvm_deprecated-on_pr-framework
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL ascic
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

  • Build Num: 3250
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_no-uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL GPU
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_intel-2021.3

  • Build Num: 1889
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-intel-2021.3-sems-openmpi-4.0.5_release-debug_shared_no-kokkos-arch_no-asan_no-complex_fpic_mpi_no-pt_no-rdc_no-uvm_deprecated-off_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Build Information

Test Name: Trilinos_PR_cuda-11.4.20-uvm

  • Build Num: 5
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
FORCE_CLEAN true
GENCONFIG_BUILD_NAME rhel7_sems-cuda-11.4.2-sems-gnu-10.1.0-sems-openmpi-4.0.5_release_static_Volta70_no-asan_complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables
PR_LABELS pkg: Belos
PULLREQUESTNUM 12797
PULLREQUEST_CDASH_TRACK Pull Request
TEST_REPO_ALIAS TRILINOS
TRILINOS_NODE_LABEL trilinos-any
TRILINOS_SOURCE_REPO https://github.com/cgcgcg/Trilinos
TRILINOS_SOURCE_SHA cc9f254
TRILINOS_SRN_CONFIG true
TRILINOS_TARGET_BRANCH develop
TRILINOS_TARGET_REPO https://github.com/trilinos/Trilinos
TRILINOS_TARGET_SHA 204363f

Using Repos:

Repo: TRILINOS (cgcgcg/Trilinos)
  • Branch: belosSingleRedConstStride
  • SHA: cc9f254
  • Mode: TEST_REPO

Pull Request Author: cgcgcg

@trilinos-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED

Build Information

Test Name: Trilinos_PR_gcc-8.3.0

  • Build Num: 3759
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-serial

  • Build Num: 2258
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_gcc-8.3.0-debug

  • Build Num: 2248
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_clang-11.0.1

  • Build Num: 2247
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_python3

  • Build Num: 3421
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_cuda-11.4.2-uvm-off

  • Build Num: 3250
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_intel-2021.3

  • Build Num: 1889
  • Status: PASSED

Build Information

Test Name: Trilinos_PR_cuda-11.4.20-uvm

  • Build Num: 5
  • Status: FAILED


CDash Test Results for PR# 12797.


Wiki: How to Reproduce PR Testing Builds and Errors.

@jhux2
Member

jhux2 commented Mar 8, 2024

@cgcgcg Cool. Can you quantify what you saw on GPUs before this change, and the resulting speedup from this PR?

@cgcgcg
Contributor Author

cgcgcg commented Mar 8, 2024

@jhux2 I ran MiniEM with the maxwell-large input deck on a single device on Weaver. Here are the timers of the mass solve with "Fold Convergence Detection Into Allreduce"=true before the change:

CG Q_B: BlockCGSolMgr total solve time: 3.95561 - 22.2116% [100]
|   Belos::MVT::Assign: 0.00350715 - 0.0886627% [200]
|   CG Q_B: Operation Prec*x: 0.46646 - 11.7924% [2179]
|   |   Ifpack2::Relaxation::apply: 0.0237883 - 5.09975% [2179]
|   |   Remainder: 0.442672 - 94.9002%
|   CG Q_B: Operation Op*x: 0.634718 - 16.046% [2179]
|   Belos::MVT::MvTransMv: 2.62291 - 66.3086% [2179]
|   Belos::MVT::MvAddMv: 0.0594899 - 1.50394% [6138]
|   Remainder: 0.168526 - 4.26043%

And with the fix:

CG Q_B: BlockCGSolMgr total solve time: 1.68635 - 10.9587% [100]
|   Belos::MVT::Assign: 0.00506513 - 0.300361% [200]
|   CG Q_B: Operation Prec*x: 0.450976 - 26.7428% [2179]
|   |   Ifpack2::Relaxation::apply: 0.0237705 - 5.2709% [2179]
|   |   Remainder: 0.427206 - 94.7291%
|   CG Q_B: Operation Op*x: 0.631306 - 37.4363% [2179]
|   Belos::MVT::MvTransMv: 0.356611 - 21.147% [2179]
|   Belos::MVT::MvAddMv: 0.0725575 - 4.30264% [6138]
|   Remainder: 0.169832 - 10.071%
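
As a rough reading of the two timer outputs above (plain arithmetic on the reported numbers from this single run, not an additional measurement), the before/after ratios are:

$$
\frac{3.95561}{1.68635} \approx 2.35 \quad \text{(total solve)}, \qquad \frac{2.62291}{0.356611} \approx 7.36 \quad \text{(MvTransMv)}
$$

The total solve time drops by $3.95561 - 1.68635 = 2.26926$ and MvTransMv drops by $2.62291 - 0.356611 = 2.266299$, so essentially all of the saving comes from MvTransMv. The Prec*x and Op*x timers are nearly unchanged.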

3 participants