[OpenCL] Fix Half-Precision Kernels #3119

Open
wants to merge 1 commit into base: main

Contributor

@djeong20 djeong20 commented Apr 16, 2025

This patch fixes half-precision OpenCL kernels to accumulate values in single-precision format. It also corrects the output for the BLAS kernel unit test.

Self-evaluation:

  1. Build test: [X]Passed [ ]Failed [ ]Skipped
  2. Run test: [X]Passed [ ]Failed [ ]Skipped
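
To illustrate why the accumulator precision matters, here is a minimal host-side sketch in C++. It is not the kernel code from this patch. The helper `round_to_half` and both `sum_*` functions are hypothetical names, and half precision is only emulated by rounding a `float` to 11 significant bits (overflow and subnormals are ignored).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumption: emulates only the significand rounding of IEEE 754 binary16
// (11 significant bits, round-to-nearest-even under the default rounding
// mode). Overflow, subnormals, and NaN handling are deliberately ignored.
float round_to_half(float x) {
  assert(std::isfinite(x));
  if (x == 0.0f) return x;
  int exponent = 0;
  const float mantissa = std::frexp(x, &exponent); // |mantissa| in [0.5, 1)
  const float scaled = std::ldexp(mantissa, 11);   // keep 11 significant bits
  return std::ldexp(std::nearbyint(scaled), exponent - 11);
}

// The accumulator is rounded to half precision after every addition.
float sum_half_accumulator(const std::vector<float> &values) {
  float acc = 0.0f;
  for (float v : values) acc = round_to_half(acc + round_to_half(v));
  return acc;
}

// The accumulator stays in single precision; only the result is rounded.
float sum_float_accumulator(const std::vector<float> &values) {
  float acc = 0.0f;
  for (float v : values) acc += round_to_half(v);
  return round_to_half(acc);
}
```

Summing 4096 ones with `sum_half_accumulator` stalls at 2048, because 2049 is not representable with 11 significant bits and rounds back to 2048, while `sum_float_accumulator` returns 4096.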

@djeong20 djeong20 changed the title [ [Wait for #3106][OpenCL] Fix Half-Precision Kernels Apr 16, 2025
@djeong20 djeong20 marked this pull request as ready for review April 16, 2025 01:28
@@ -26,6 +26,7 @@ namespace nntrainer {
// get global cl_context to use in kernels
static ClContext *attention_cc =
  static_cast<ClContext *>(Engine::Global().getRegisteredContext("gpu"));
static ClBufferManager &clbuffInstance = ClBufferManager::getInstance();
Member

Why aren't the results of `clbuffInstance.getOutBufferA()` and `clbuffInstance.getOutBufferB()` stored in local variables, given that each is used multiple times?
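
A rough sketch of what this suggestion could look like, using stand-in types rather than nntrainer's real classes. `MockBuffer`, `MockBufferManager`, `accessor_calls`, and `fill_outputs` are all hypothetical, and the accessor return types are assumptions; only the accessor names mirror the ones mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device buffer; not nntrainer's real type.
struct MockBuffer {
  std::vector<float> data;
};

// Hypothetical stand-in for ClBufferManager. Returning references from the
// accessors is an assumption made for this sketch.
class MockBufferManager {
public:
  static MockBufferManager &getInstance() {
    static MockBufferManager instance;
    return instance;
  }
  MockBuffer &getOutBufferA() { ++accessor_calls; return out_a_; }
  MockBuffer &getOutBufferB() { ++accessor_calls; return out_b_; }
  std::size_t accessor_calls = 0; // counts accessor use, for illustration

private:
  MockBufferManager() = default;
  MockBuffer out_a_;
  MockBuffer out_b_;
};

// Fetch each buffer once and reuse the local references afterwards.
void fill_outputs(std::size_t n) {
  MockBufferManager &mgr = MockBufferManager::getInstance();
  MockBuffer &out_a = mgr.getOutBufferA();
  MockBuffer &out_b = mgr.getOutBufferB();
  out_a.data.assign(n, 1.0f);
  out_b.data.assign(n, 2.0f);
  assert(out_a.data.size() == out_b.data.size());
  for (std::size_t i = 0; i < n; ++i) out_a.data[i] += out_b.data[i];
}
```

With the references held locally, each accessor is called once per function regardless of how often the buffers are touched, which also makes the data flow easier to follow.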

@djeong20 djeong20 force-pushed the bugfix/engine/opencl_context_v8 branch from 65628fd to 1a28224 Compare April 16, 2025 08:33
@djeong20 djeong20 changed the title [Wait for #3106][OpenCL] Fix Half-Precision Kernels [OpenCL] Fix Half-Precision Kernels Apr 16, 2025
This patch fixes half-precision OpenCL kernels to accumulate values in single-precision format. It also corrects the output for the BLAS kernel unit test.

**Self-evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test:   [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghyeon Jeong <[email protected]>
@djeong20 djeong20 force-pushed the bugfix/engine/opencl_context_v8 branch from 1a28224 to d3a38c0 Compare April 23, 2025 00:53
Contributor

@songgot songgot left a comment

LGTM 👍

3 participants