The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Removed explicit GPU synchronisation barriers (using `KA.synchronize`) by default. This can now be re-enabled by passing `synchronise = true` as a plan argument. Enabling synchronisation is useful for getting accurate timings (in `p.timer`) but may result in decreased performance.
- Faster spatial sorting of non-uniform points (CPU and GPU).
- Tune GPU parameters: kernel workgroupsize; block size for spatial sorting.
- Plans: `block_size` argument can now be a tuple (block size along each separate dimension).
- Avoid recompilation of GPU kernels when the number of non-uniform points changes.
- Fix transforms of real non-uniform data on CUDA.jl.
- Add preliminary GPU support.
- AbstractNFFTs interface: fix 1D transforms.
- Implement AbstractNFFTs interface for easier comparison with other NUFFT packages.