-
Notifications
You must be signed in to change notification settings - Fork 558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Belos CGSingleRedIter: Reorder memory so that MvTransMv arguments have constant stride #12797
base: develop
Are you sure you want to change the base?
Conversation
…e constant stride
Status Flag 'Pre-Test Inspection' - Auto Inspected - Inspection is Not Necessary for this Pull Request. |
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-11.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.20-uvm
Jenkins Parameters
Using Repos:
Pull Request Author: cgcgcg |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs 30 Mins. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: Trilinos_PR_gcc-8.3.0
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-serial
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_gcc-8.3.0-debug
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_clang-11.0.1
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_python3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.2-uvm-off
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_intel-2021.3
Jenkins Parameters
Build InformationTest Name: Trilinos_PR_cuda-11.4.20-uvm
Jenkins Parameters
|
@cgcgcg Cool. Can you quantify what you saw on GPUs prior to this/the resulting speedup due to this PR? |
@jhux2 I ran MiniEM with the maxwell-large input deck on a single device on Weaver. Here are the timers of the mass solve with
And with the fix:
|
@trilinos/belos
Motivation
The single reduce CG uses two MVs to hold its vector data. Different operations act on single vectors or sub-MVs. In particular, there is a call to$S=[W_0, W_1]$ and $T=[W_0,W_2]$ where $W$ is a MV with 3 columns. This call was slow on GPUs because it ran on a MV with non-constant stride. By reordering $W$ we now have $S=[W_0, W_1]$ and $T=[W_1,W_2]$ instead.
MvTransMv
for MVs