-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to avoid UB in thrust::reference
#2813
base: main
Are you sure you want to change the base?
Conversation
@Artem-B can you check whether that fixes some of the UB for you? |
We were dereferencing a nullptr because we wanted to avoid default constructing a system. This is obvious UB and prevents us from running the tests with sanitizers, so just default construct the systems. Addresses NVIDIA#1645
ffe0ddb
to
9763535
Compare
Is that something on your side?
|
This could be ptxas bug (see NVIDIA's bug 4924626). It will be supposedly fixed in cuda-12.8, but I do not have the details what exactly triggers it -- there was nothing obvious in the minimized reproducer, but it did seem to be related to |
The patch unbreaks ~500 out of 680 tests that were failing in ubsan-enabled builds. I do not run into the ptxas crash, but I'm using a very recent clang build, so it likely generates different PTX than your builds. Remaining failing tests report ubsan failures in these places:
A handful of tests failed to compile PTX with missing symbol:
Reproducible on This is the same issue miscco mentioned above. I'll take a closer look tomorrow. |
// for dispatch. This assumes that `get_value` will not access any system | ||
// state. | ||
typename thrust::iterator_system<pointer>::type* system = nullptr; | ||
typename thrust::iterator_system<pointer>::type system; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just pass the type explicitly instead of materializing a value to pass? That way we can avoid both needing to default construct or using a nullptr.
typename thrust::iterator_system<pointer>::type system; | |
typename thrust::iterator_system<pointer>::type system; | |
return convert_to_value_type<decltype(system)>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly would also prefer if we did not need to construct systems. From the recent CI failures and the generally open design (users can bring their own systems) we have to assume that users could rely on Thrust not trying to default construct a system. Given the current code "worked" and no data of the system was actually accessed, we can assume that only the type information should be needed. I assume/hope there is a better solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wondered how thrust::device_vector
produces a system when it inits the elements during construction: It tries to get the system from the allocator. But in case the allocator has no system, it default constructs a system (thrust knows the type of the system for an allocator):
cccl/thrust/thrust/detail/allocator/allocator_traits.inl
Lines 270 to 284 in 87f7246
template <typename Alloc> | |
_CCCL_HOST_DEVICE ::cuda::std::enable_if_t<has_member_system<Alloc>::value, typename allocator_system<Alloc>::type&> | |
system(Alloc& a) | |
{ | |
// return the allocator's system | |
return a.system(); | |
} | |
template <typename Alloc> | |
_CCCL_HOST_DEVICE ::cuda::std::enable_if_t<!has_member_system<Alloc>::value, typename allocator_system<Alloc>::type> | |
system(Alloc&) | |
{ | |
// return a copy of a value-initialized system | |
return typename allocator_system<Alloc>::type(); | |
} |
The default allocator for a thrust::device_vector
does not seem to contain a system. So it seems we have precedence for default constructing systems when needed.
🟨 CI finished in 6h 03m: Pass: 97%/222 | Total: 7d 12h | Avg: 48m 40s | Max: 6h 01m | Hits: 44%/13508
|
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
CUB | |
+/- | Thrust |
CUDA Experimental | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
libcu++ | |
+/- | CUB |
+/- | Thrust |
CUDA Experimental | |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 222)
# | Runner |
---|---|
184 | linux-amd64-cpu16 |
16 | linux-arm64-cpu16 |
13 | linux-amd64-gpu-v100-latest-1 |
9 | windows-amd64-cpu16 |
It appears that allocator.h calls cccl/thrust/thrust/mr/allocator.h Line 205 in f4d358a
But the function itself is defined as a host only function: cccl/thrust/thrust/mr/memory_resource.h Line 207 in f4d358a
That does not work for clang. Clang was supposed to diagnose such use of a host function from a host_device function, but it didn't happen (probably because constructors are special, or we've just missed this corner case).
|
This is caused by a local static initializer here: cccl/thrust/thrust/mr/memory_resource.h Line 209 in 8994dc4
Looks like it's an odd corner case, which may be something for clang to fix. However, when clang sees an existing, but empty constructor, it allows it, but then that constructor is handled by the usual initializer machinery, and, if the call is not eliminated, ends up generating the guard calls. If we're using the default constructor, everything is fine. A work-around is to pass Passing that option to clang also appears to fix the other error about |
@miscco Any thoughts/suggestions on the UB in the thrust_iterator.h ?
As for the |
I believe we should be able to just suppress the warning prior to Clang20 and be done with it. |
Do you have a call stack for that issue? |
Here's one from catch2_test_warp_scan_api:
Afaict, we end up dereferencing cccl/thrust/thrust/detail/vector_base.inl Line 924 in dc920c9
|
That looks like a bug to me, because we never check whether there an allocation there |
Tracked by #2964
Can you elaborate on that? Unresolved reference to |
Oh yeah I mean fixing |
There is indeed a bug in clang which resulted in the call to a wrong-side function not to be diagnosed: llvm/llvm-project#118415 |
The function is called from a host/device constructors and may be invoked from the device code. As it happens neither NVCC nor clang diagnose such an invalid call. llvm/llvm-project#118415 As the result, in unoptimized builds we end up with ptxas failing with an unresolved reference to this function because it is never generated during GPU-side compilation. NVIDIA#2813 (comment)
The function is called from a host/device constructors and may be invoked from the device code. As it happens neither NVCC nor clang diagnose such an invalid call. llvm/llvm-project#118415 As the result, in unoptimized builds we end up with ptxas failing with an unresolved reference to this function because it is never generated during GPU-side compilation. NVIDIA#2813 (comment)
The function is called from a host/device constructors and may be invoked from the device code. As it happens neither NVCC nor clang diagnose such an invalid call. llvm/llvm-project#118415 As the result, in unoptimized builds we end up with ptxas failing with an unresolved reference to this function because it is never generated during GPU-side compilation. NVIDIA#2813 (comment)
The function is called from a host/device constructors and may be invoked from the device code. As it happens neither NVCC nor clang diagnose such an invalid call. llvm/llvm-project#118415 As the result, in unoptimized builds we end up with ptxas failing with an unresolved reference to this function because it is never generated during GPU-side compilation. NVIDIA#2813 (comment)
We were dereferencing a nullptr because we wanted to avoid default constructing a system.
This is obvious UB and prevents us from running the tests with sanitizers, so just default construct the systems.
Addresses #1645