Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable level zero v2 in DPC++ #16656

Open
wants to merge 12 commits into
base: sycl
Choose a base branch
from
1 change: 1 addition & 0 deletions buildbot/configure.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ def do_configure(args):

if sys.platform != "darwin":
sycl_enabled_backends.append("level_zero")
sycl_enabled_backends.append("level_zero_v2")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments

  1. Can you explain what L0 V2 is and how it releases to the current L0/L0 adapter

  2. I agree with James, we shouldn't build two plugins by default. If we have some configure switch to build two, that's fine.

Copy link
Contributor

@pbalcer pbalcer Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. README. In short, it's a new L0 adapter with re-implemented performance-critical APIs. See here for performance validation results.
  2. Why though? As I noted, with the current implementation, users won't be affected by the inclusion of the two adapters. And not including it by default would mean that it's going to be much more difficult for application and framework developers to experiment with the new adapter, which is the whole point of including it in SYCL right now in the first place. In practice, it will likely mean we might need to provide the users that want to use it an alternative prebuilt compiler package.

@igchor ping

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why "v2" instead of incremental refactoring?

Copy link
Contributor

@sarnex sarnex Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Got it, thanks

  2. At this point it seems like the v2 adapter is only intended to be used by DPCPP/UR developers, and IMO features like that should not be built by default. If the feature is ready enough for downstream clients, should we switch over by default?

Copy link
Contributor

@pbalcer pbalcer Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why "v2" instead of incremental refactoring?

The existing L0 adapter code base has accumulated a lot of complexity over the years as L0 and the driver evolved. It has become very difficult to change. For example, there are 4 major queue modes that the adapter supports - "batched in order", "batched out of order", "immediate in order", "immediate out of order". In the existing adapter, they are all implemented together, with various conditions sprinkled all throughout to make it work. In practice, batched mode is all but legacy for 99% of scenarios, and "in order" and "out of order" L0 paths are so different (different types of L0 events, different way of synchronization), that there's not much to gain from sharing all the code between them.
Refactoring this incrementally, without breaking existing code, would be challenging. So the team has decided that a partial re-implementation (we still reuse all that's practical) of the performance critical paths is the best path forward. This also means that we have the opportunity to easily leverage new driver features and do a clean break away from the accumulated legacy tech debt.

At this point it seems like the v2 adapter is only intended to be used by DPCPP/UR developers, and IMO features like that should not be built by default. If the feature is ready enough for downstream clients, should we switch over by default?

In terms of quality, the v2 adapter has higher UR CTS passrate than the current one, and passes the vast majority of SYCL e2e tests (big exception being tests that hard-code exact sequence of operations expected from the adapter). However, this is still work-in-progress.


# lld is needed on Windows or for the HIP adapter on AMD
if platform.system() == "Windows" or (args.hip and args.hip_platform == "AMD"):
Expand Down
2 changes: 1 addition & 1 deletion sycl/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ endif()
# If SYCL_ENABLE_BACKENDS is undefined, we default to enabling OpenCL and Level
# Zero backends.
if (NOT DEFINED SYCL_ENABLE_BACKENDS)
set(SYCL_ENABLE_BACKENDS "opencl;level_zero")
set(SYCL_ENABLE_BACKENDS "opencl;level_zero;level_zero_v2")
endif()

# Option to enable JIT, this in turn makes kernel fusion and spec constant
Expand Down
9 changes: 8 additions & 1 deletion sycl/cmake/modules/FetchUnifiedRuntime.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@ set(UR_ENABLE_TRACING ON)
if("level_zero" IN_LIST SYCL_ENABLE_BACKENDS)
set(UR_BUILD_ADAPTER_L0 ON)
endif()
if("level_zero_v2" IN_LIST SYCL_ENABLE_BACKENDS)
set(UR_BUILD_ADAPTER_L0_V2 ON)
endif()
if("cuda" IN_LIST SYCL_ENABLE_BACKENDS)
set(UR_BUILD_ADAPTER_CUDA ON)
endif()
Expand Down Expand Up @@ -116,7 +119,7 @@ if(SYCL_UR_USE_FETCH_CONTENT)
CACHE PATH "Path to external '${name}' adapter source dir" FORCE)
endfunction()

set(UNIFIED_RUNTIME_REPO "https://github.com/oneapi-src/unified-runtime.git")
set(UNIFIED_RUNTIME_REPO "https://github.com/omarahmed1111/unified-runtime.git")
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/modules/UnifiedRuntimeTag.cmake)

set(UMF_BUILD_EXAMPLES OFF CACHE INTERNAL "EXAMPLES")
Expand Down Expand Up @@ -266,6 +269,10 @@ if("level_zero" IN_LIST SYCL_ENABLE_BACKENDS)
# added to the new build system
endif()

if("level_zero_v2" IN_LIST SYCL_ENABLE_BACKENDS)
add_sycl_ur_adapter(level_zero_v2)
endif()

if("cuda" IN_LIST SYCL_ENABLE_BACKENDS)
add_sycl_ur_adapter(cuda)
endif()
Expand Down
12 changes: 6 additions & 6 deletions sycl/cmake/modules/UnifiedRuntimeTag.cmake
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# commit 023a84744b7db460d42225c8b5ea4d6def085c81
# Merge: 9b57d7b7 0607d2c8
# commit c5bf8fd5c92ab7a24b439cd89cefc9cd425ec787
# Merge: e6d4355f 778085f7
# Author: Kenneth Benzie (Benie) <[email protected]>
# Date: Thu Jan 16 10:17:25 2025 +0000
# Merge pull request #2546 from lukaszstolarczuk/bump-umf-to-latest-main
# [main] Bump UMF to v0.10.1
set(UNIFIED_RUNTIME_TAG 023a84744b7db460d42225c8b5ea4d6def085c81)
# Date: Mon Jan 13 10:41:50 2025 +0000
# Merge pull request #2479 from aarongreig/aaron/parameterizeDeviceTests
# Parameterize CTS tests across all available adapters and devices.
set(UNIFIED_RUNTIME_TAG bbf7e01bd74ac51b518fc044d0ca71a920443e5b)
Loading