- Implemented radix codelets up to 47.
- Implemented composite radix codelets for arbitrary composite stage sizes.
- Implemented new register assignment logic, aimed at optimizing shared memory transfers, register usage and warp utilization.
- Performance improvements for all system sizes - please report regressions if they happen (especially for vendors other than Nvidia and AMD).
- All double pointers passed to VkFFT now make a local copy of their contents (#184, #185).
- Fixed locale setting for the code generator (vincefn/pyvkfft#38).