kernel/riscv64: fixed the performance problem in RISCV64_ZVL256 when OPENBLAS_K is small #5291

Merged 2 commits on Jun 10, 2025

guoyuanplct
Contributor

I made these two code modifications to address the HBMV issue. When the problem size is too small, the RVV kernels perform very poorly, so I call the unvectorized code in that case.
The numbers 8 and 16 in the code are the break-even points I found: around these values, the RVV version and the unvectorized version perform about the same.
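
As a rough illustration of the idea described above (not the actual patch), the following sketch falls back to a scalar loop below a size cutoff. The names `SMALL_N_THRESHOLD`, `dot_scalar`, `dot_vector_standin`, and `dot_dispatch` are hypothetical, and the "vector" path is plain portable C standing in for an RVV kernel.

```c
#include <stddef.h>

/* Hypothetical cutoff. The PR mentions 8 and 16 as measured break-even
   points; which value belongs to which kernel is not shown here. */
#define SMALL_N_THRESHOLD 16

/* Plain scalar dot product: no vector setup cost, good for tiny n. */
static double dot_scalar(size_t n, const double *x, const double *y) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}

/* Stand-in for the vectorized kernel: processes 4 elements per step
   with 4 independent accumulators, then handles the tail. */
static double dot_vector_standin(size_t n, const double *x, const double *y) {
    double lane[4] = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        for (size_t j = 0; j < 4; j++) {
            lane[j] += x[i + j] * y[i + j];
        }
    }
    double acc = lane[0] + lane[1] + lane[2] + lane[3];
    for (; i < n; i++) {
        acc += x[i] * y[i];
    }
    return acc;
}

/* Dispatch: small problems take the scalar path, the rest the vector path. */
double dot_dispatch(size_t n, const double *x, const double *y) {
    if (n < SMALL_N_THRESHOLD) {
        return dot_scalar(n, x, y);
    }
    return dot_vector_standin(n, x, y);
}
```

The cutoff is best treated as a tuning parameter measured per kernel and per target, since the break-even point depends on the vector setup cost of the hardware.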

@guoyuanplct
Contributor Author

This PR is directly related to issue #5286.

@martin-frbg
Collaborator

martin-frbg commented Jun 5, 2025

Thanks - that's pretty much the same as what I came up with in my initial experimentation. I do wonder if the _rvv kernels (used by the ZVL128B and x280 targets) perform better - one oddity I noticed is that the _vector kernels always request maximum vector length (VSETVL_MAX) while their _rvv counterparts seem to try to match the vector length to the actual amount of data. (I'm still rather new to RISCV though, so may be misreading the code...)

@guoyuanplct
Contributor Author

I'm also relatively new to RVV. My understanding is that both kernels will try to match the vector length to the actual amount of data. VSETVL_MAX is generally only used outside the computation loop to initialize some variables (or perform other types of operations), which need to be long enough to ensure they can be used for subsequent vector computations. Once inside the loop, both kernels will use VSETVL to match the appropriate length.
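
To make the pattern concrete, here is a minimal portable sketch of it, assuming a simulated vector unit rather than real RVV intrinsics. `SIM_VLMAX`, `sim_vsetvlmax`, `sim_vsetvl`, and `dot_stripmined` are hypothetical names, and `sim_vsetvl` simplifies the real rule to min(remaining, maximum).

```c
#include <stddef.h>

/* Hypothetical hardware maximum: 4 doubles per vector register
   (for example 256-bit registers holding 64-bit elements). */
#define SIM_VLMAX 4

/* Stand-in for VSETVL_MAX: always returns the hardware maximum. */
static size_t sim_vsetvlmax(void) {
    return SIM_VLMAX;
}

/* Stand-in for VSETVL: returns min(remaining elements, hardware maximum).
   Real hardware is allowed slightly more freedom; this is a simplification. */
static size_t sim_vsetvl(size_t avl) {
    return avl < SIM_VLMAX ? avl : SIM_VLMAX;
}

double dot_stripmined(size_t n, const double *x, const double *y) {
    /* Outside the loop: initialize the accumulator at full length so it
       is valid for every later iteration, whatever vl that iteration gets. */
    size_t vlmax = sim_vsetvlmax();
    double acc[SIM_VLMAX];
    for (size_t j = 0; j < vlmax; j++) {
        acc[j] = 0.0;
    }

    /* Inside the loop: request only as many elements as remain. */
    for (size_t i = 0; i < n; ) {
        size_t vl = sim_vsetvl(n - i);
        for (size_t j = 0; j < vl; j++) {
            acc[j] += x[i + j] * y[i + j];
        }
        i += vl;
    }

    /* Final reduction over the full-length accumulator. */
    double sum = 0.0;
    for (size_t j = 0; j < vlmax; j++) {
        sum += acc[j];
    }
    return sum;
}
```

With n = 10 the loop runs with vl = 4, 4, 2, which shows how the length inside the loop follows the remaining data while the accumulator set up outside the loop stays at full length.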

martin-frbg added this to the 0.3.30 milestone Jun 5, 2025
@martin-frbg
Collaborator

Ah, you're right of course, I missed the later vsetvl() in zdot_vector.c.

martin-frbg merged commit fe220a0 into OpenMathLib:develop Jun 10, 2025
83 of 86 checks passed