-This update removes the ~2^12 limit for R2C and R2R systems: all of them can now be done in up to three uploads, covering ~2^32 in every dimension, the same as C2C.
-Added versions of all R2C and R2R algorithms, implemented as load/store callbacks. This functionality will be enhanced in the future to support arbitrary user callbacks (I still need to find out how this can be done for user interaction across multiple APIs).
-Restructured internal kernel typing enumeration.
README.md (2 additions & 2 deletions)
@@ -6,10 +6,10 @@ VkFFT is an efficient GPU-accelerated multidimensional Fast Fourier Transform li
 ## Currently supported features:
 - 1D/2D/3D/ND systems - specify VKFFT_MAX_FFT_DIMENSIONS for arbitrary number of dimensions.
 - Forward and inverse directions of FFT.
-- Support for big FFT dimension sizes. Current limits: C2C or even C2R/R2C - (2^32, 2^32, 2^32). Odd C2R/R2C - (2^12, 2^32, 2^32). R2R - (2^12, 2^12, 2^12). Depends on the amount of shared memory on the device. (will be increased later).
+- Support for big FFT dimension sizes. Current limits: approximately 2^32 in all dimensions for all types of transforms. Depends on the amount of shared memory available on the device.
 - Radix-2/3/4/5/7/8/11/13 FFT. Sequences using radix 3, 5, 7, 11 and 13 have comparable performance to that of powers of 2.
 - Rader's FFT algorithm for primes from 17 up to max shared memory length (~10000). Inlined and done without additional memory transfers.
-- Bluestein's FFT algorithm for all other sequences. Full coverage of C2C range, single upload (2^12, 2^12, 2^12) for R2C/C2R/R2R. Optimized to have as few memory transfers as possible by using zero padding and merged convolution support of VkFFT.
+- Bluestein's FFT algorithm for all other sequences. Optimized to have as few memory transfers as possible by using zero padding and merged convolution support of VkFFT.
 - Single, double, half and quad (double-double) precision support. Double and quad precision uses CPU-generated LUT tables. Half precision still does all computations in single and only uses half precision to store data.
 - All transformations are performed in-place with no performance loss. Out-of-place transforms are supported by selecting different input/output buffers.
 - No additional transposition uploads. Note: Data can be reshuffled after the Four Step FFT algorithm with an additional buffer (for big sequences). Doesn't matter for convolutions - they return to the input ordering (saves memory).
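For readers unfamiliar with the Four Step FFT named in the last context line, here is a minimal sketch of the general technique in plain Python. It is not VkFFT's kernel code: the function names `dft` and `four_step_fft` are hypothetical, and the naive DFT only stands in for whatever short transform fits in one pass. It shows how a length `n1 * n2` transform can be split into two passes of short transforms with a twiddle multiplication in between, and why the result comes out in a transposed order that has to be reshuffled.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, standing in for a short single-pass transform."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def four_step_fft(x, n1, n2):
    """DFT of length n1 * n2 built from short DFTs of length n1 and n2."""
    n = n1 * n2
    assert len(x) == n

    # Step 1: treat x as an n1 x n2 row-major matrix and transform each
    # column (stride n2). cols[j2][k1] holds the result.
    cols = [dft([x[j1 * n2 + j2] for j1 in range(n1)]) for j2 in range(n2)]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j2 * k1 / n).
    for j2 in range(n2):
        for k1 in range(n1):
            cols[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)

    # Step 3: transform along the other axis. rows[k1][k2] holds the result.
    rows = [dft([cols[j2][k1] for j2 in range(n2)]) for k1 in range(n1)]

    # Step 4: the output index is k1 + n1 * k2, i.e. transposed relative to
    # the input layout. This is the reshuffle step.
    out = [0j] * n
    for k1 in range(n1):
        for k2 in range(n2):
            out[k1 + n1 * k2] = rows[k1][k2]
    return out
```

Calling `four_step_fft(x, 3, 4)` on a 12-element list should agree with `dft(x)` up to floating-point rounding. If step 4 is skipped, the data stays in transposed order, which is harmless when a matching inverse pass undoes it, as in the convolution case the README line describes.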