libpopcnt-3.0
libpopcnt-3.0 is a major new release with many improvements, but it is still backwards compatible with libpopcnt-2.*!
The two main new features of libpopcnt-3.0 are a new ARM SVE popcount algorithm that is up to 3x faster than the ARM NEON popcount algorithm, and a new AVX512 VPOPCNT algorithm that is up to 35% faster than the old AVX512 Harley-Seal popcount algorithm. Unlike the old AVX512 algorithm, the new AVX512 VPOPCNT algorithm is also fast for short arrays (≥ 48 bytes).
- Add ARM SVE algorithm.
- Replace AVX512BW algorithm with faster AVX512 VPOPCNTDQ algorithm.
- Add MSVC support for ARM NEON.
- Improve preprocessor checks using the `__has_include()` macro.
- Port tests from AppVeyor to GitHub Actions.
- Get rid of unaligned `uint64_t` memory accesses; this fixes test failures when using GCC compiler sanitizers.
- Prefix all libpopcnt macros with `LIBPOPCNT_` to avoid any naming collisions.
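
As an illustration of three of the items above (the `__has_include()` check, the removal of unaligned `uint64_t` accesses, and the `LIBPOPCNT_` macro prefix), here is a minimal sketch. It is not the library's actual code: the names `LIBPOPCNT_EXAMPLE_HAVE_STD_POPCOUNT`, `example_popcount64` and `example_popcount_bytes` are hypothetical, and loading words with `memcpy` is just one common way to avoid unaligned reads, assumed here for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical macro name, used only to illustrate the LIBPOPCNT_ prefix.
// The header <bit> is only included if the compiler reports that it exists.
#if defined(__has_include)
  #if __has_include(<bit>) && __cplusplus >= 202002L
    #include <bit>
    #define LIBPOPCNT_EXAMPLE_HAVE_STD_POPCOUNT 1
  #endif
#endif

// Count the 1 bits of a single 64-bit word.
inline std::uint64_t example_popcount64(std::uint64_t x)
{
#if defined(LIBPOPCNT_EXAMPLE_HAVE_STD_POPCOUNT)
    return static_cast<std::uint64_t>(std::popcount(x));
#else
    // Portable bit-twiddling fallback.
    x = x - ((x >> 1) & 0x5555555555555555ull);
    x = (x & 0x3333333333333333ull) + ((x >> 2) & 0x3333333333333333ull);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0Full;
    return (x * 0x0101010101010101ull) >> 56;
#endif
}

// Count the 1 bits of an arbitrary byte buffer without ever
// dereferencing a (possibly misaligned) uint64_t pointer.
inline std::uint64_t example_popcount_bytes(const void* data, std::size_t size)
{
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t count = 0;

    // Copy 8 bytes at a time into a properly aligned local variable.
    while (size >= sizeof(std::uint64_t)) {
        std::uint64_t word;
        std::memcpy(&word, p, sizeof(word));
        count += example_popcount64(word);
        p += sizeof(word);
        size -= sizeof(word);
    }

    // Handle the remaining 0 to 7 bytes one by one.
    while (size > 0) {
        count += example_popcount64(*p);
        ++p;
        --size;
    }

    return count;
}
```

Casting the buffer to `const uint64_t*` and dereferencing it directly would be flagged by an alignment sanitizer whenever the buffer does not start on an 8-byte boundary. The `memcpy` form avoids this, and compilers typically turn it into a single load instruction.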