feedback on group load/store #16800
Comments
Implementation doesn't apply any optimizations without
Thanks. I added the "props" to group load and store
./main 8192 8192 100
Is this reproducible?
I haven't tried, but the link above doesn't have any changes. Also, why has your baseline improved in the last comment when you supposedly changed the "blocked" code path?
I made the changes locally and didn't push them to the repository. I didn't make any changes to the baseline. Sorry about the confusion.
Hello
Please see the benchmark https://github.com/zjin-lcf/HeCBench/tree/master/src/blockAccess-sycl
"main.cpp" evaluates group load/store using a migration of CUB's block load/store.
"main2.cpp" evaluates the SYCL group load/store.
The execution times of the two blockAccess kernels on an Intel Max 1100 device are shown below:
icpx -std=c++17 -Wall -fsycl --gcc-toolchain= -O3 -DUSE_GPU -c main.cpp -o main.o
icpx -std=c++17 -Wall -fsycl --gcc-toolchain= -O3 -DUSE_GPU main.o -o main
./main 8192 8192 100
Average execution time of the reference kernel: 1317.816895 (us)
Average execution time of the blockAccess kernel: 606.993347 (us)
icpx -std=c++17 -Wall -fsycl --gcc-toolchain= -O3 -DUSE_GPU -c main2.cpp -o main2.o
icpx -std=c++17 -Wall -fsycl --gcc-toolchain= -O3 -DUSE_GPU main2.o -o main
./main 8192 8192 100
Average execution time of the reference kernel: 1422.783325 (us)
Average execution time of the blockAccess kernel: 2391.377686 (us)
I am not sure whether the kernel in main.cpp is faster than the one in main2.cpp because of the shared local memory.
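As background for the "blocked" arrangement discussed in this thread, here is a small host-side sketch of how blocked and striped arrangements map a work-item and its item slot to an element of the group's tile. The helper names `blocked_index` and `striped_index` are hypothetical and are not part of SYCL or CUB; this only models the indexing and says nothing about how either implementation stages data.

```cpp
#include <cstddef>

// Hypothetical helper (not a SYCL or CUB function).
// Blocked arrangement: work-item `tid` owns `items_per_thread`
// consecutive elements, so neighbouring work-items are
// `items_per_thread` elements apart at each step `i`.
inline std::size_t blocked_index(std::size_t tid, std::size_t i,
                                 std::size_t items_per_thread) {
  return tid * items_per_thread + i;
}

// Hypothetical helper (not a SYCL or CUB function).
// Striped arrangement: at each step `i`, neighbouring work-items
// touch neighbouring elements of the tile.
inline std::size_t striped_index(std::size_t tid, std::size_t i,
                                 std::size_t group_size) {
  return i * group_size + tid;
}
```

With 4 items per work-item, work-items 0 and 1 read elements 0 and 4 first in the blocked arrangement, but elements 0 and 1 in the striped one. A blocked result can therefore be produced either by strided global reads or by a striped read followed by an exchange between work-items, and the two approaches may perform differently.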
Another question: do the SYCL group load/store functions need an argument for "block_items_end – [in] Number of valid items to load"?
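To make the question concrete, here is a host-side sketch of what such a "number of valid items" argument would mean for one work-item in a blocked arrangement. The function `guarded_blocked_load` and the `fill` parameter are hypothetical names for illustration only; this is not the SYCL extension's API or CUB's implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a guarded blocked load for one work-item.
// Elements whose tile index is at or beyond `valid_items` are not read;
// the corresponding slots keep the caller-supplied `fill` value instead.
template <typename T>
std::vector<T> guarded_blocked_load(const std::vector<T>& tile,
                                    std::size_t tid,
                                    std::size_t items_per_thread,
                                    std::size_t valid_items,
                                    T fill) {
  std::vector<T> items(items_per_thread, fill);
  for (std::size_t i = 0; i < items_per_thread; ++i) {
    const std::size_t idx = tid * items_per_thread + i;  // blocked index
    // Guard: skip the read when the index is past the valid range.
    if (idx < valid_items && idx < tile.size()) {
      items[i] = tile[idx];
    }
  }
  return items;
}
```

For example, with 4 items per work-item and 6 valid items, work-item 1 would load elements 4 and 5 and leave its last two slots at the fill value. Without such an argument, the caller has to handle a partial final tile separately.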
Thanks