Description
Environment
- GPU: NVIDIA H100
- CUDA: 12.8 (via PyTorch/Lightning AI Cloudspace)
- PyTorch: bundled via Lightning AI Cloudspace
- Build command: make (run from kernels/attention/mha_h100)
Bug Description
Building kernels/attention/mha_h100 fails with two errors in
ThunderKittens/include/common/util.cuh:
/teamspace/studios/this_studio/ThunderKittens/include/common/util.cuh(503):
error: namespace "std" has no member "copy_n"
std::copy_n(other.attributes, num_attributes, attributes);
/teamspace/studios/this_studio/ThunderKittens/include/common/util.cuh(509):
error: namespace "std" has no member "copy_n"
std::copy_n(other.attributes, num_attributes, attributes);
2 errors detected in the compilation of "mha_h100.cu".
make: *** [../../common.mk:121: _C.cpython-312-x86_64-linux-gnu.so] Error 2
Root Cause
std::copy_n is defined in <algorithm>, but that header is not included
in util.cuh. In some environments (e.g. PyTorch + CUDA toolkit combos),
<algorithm> is not pulled in transitively, causing the build to fail.
Note: A similar class of issue was reported in #40 and #45
(std::bit_cast missing in base_types.cuh), but this is a separate
occurrence in util.cuh.
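The failure mode can be reproduced outside ThunderKittens with a minimal sketch. The struct below is a hypothetical stand-in, not the actual code in util.cuh: the name LaunchConfigSketch, the int element type, and the fixed capacity are assumptions for illustration. Only the std::copy_n(other.attributes, num_attributes, attributes) pattern is taken from the error output, and the pairing of a copy constructor with a copy assignment operator is a guess based on the two near-identical errors.

```cpp
#include <algorithm>  // declares std::copy_n; removing this can reproduce the error
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the type in util.cuh (names and types are assumed).
struct LaunchConfigSketch {
    static constexpr std::size_t kMaxAttributes = 4;
    int attributes[kMaxAttributes] = {};
    std::size_t num_attributes = 0;

    LaunchConfigSketch() = default;

    // Copy constructor using the same call pattern as the failing line.
    LaunchConfigSketch(const LaunchConfigSketch& other)
        : num_attributes(other.num_attributes) {
        std::copy_n(other.attributes, num_attributes, attributes);
    }

    // Copy assignment using the same call pattern.
    LaunchConfigSketch& operator=(const LaunchConfigSketch& other) {
        if (this != &other) {
            num_attributes = other.num_attributes;
            std::copy_n(other.attributes, num_attributes, attributes);
        }
        return *this;
    }
};

// Returns true if both copy paths preserve the stored attribute values.
bool copy_preserves_attributes() {
    LaunchConfigSketch a;
    a.num_attributes = 2;
    a.attributes[0] = 7;
    a.attributes[1] = 9;

    LaunchConfigSketch b(a);  // exercises the copy constructor
    LaunchConfigSketch c;
    c = a;                    // exercises the copy assignment operator

    return b.num_attributes == 2 && b.attributes[0] == 7 && b.attributes[1] == 9 &&
           c.num_attributes == 2 && c.attributes[0] == 7 && c.attributes[1] == 9;
}
```

Whether the sketch compiles without the <algorithm> line depends on which standard headers happen to include it transitively, which is why the build succeeds in some environments and fails in others. Including the header explicitly removes that dependency.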
Fix
Adding #include <algorithm> at the top of
ThunderKittens/include/common/util.cuh resolves the error:
+ #include <algorithm>
Result After Fix
Build succeeds with warnings only (harmless #191-D type qualifier warnings
and #177-D unused variable warning). The kernel runs successfully.
Suggested Fix (one-liner)
sed -i '1s/^/#include <algorithm>\n/' \
ThunderKittens/include/common/util.cuh
Or manually add #include <algorithm> at the top of util.cuh.