libpopcnt-3.0

@kimwalisch kimwalisch released this 27 Jun 18:39
· 3 commits to master since this release

libpopcnt-3.0 is a major new release with many improvements, but it is still backwards compatible with libpopcnt-2.*!

The two main new features of libpopcnt-3.0 are the new ARM SVE popcount algorithm, which is up to 3x faster than the ARM NEON popcount algorithm, and the new AVX512 VPOPCNT algorithm, which is up to 35% faster than the old AVX512 Harley-Seal popcount algorithm. Unlike the old AVX512 algorithm, the new AVX512 VPOPCNT algorithm is also fast for short arrays of 48 bytes or more.

  • Add ARM SVE algorithm.
  • Replace AVX512BW algorithm with faster AVX512 VPOPCNTDQ algorithm.
  • Add MSVC support for ARM NEON.
  • Improve preprocessor checks using __has_include() macro.
  • Port tests from AppVeyor to GitHub Actions.
  • Get rid of unaligned uint64_t memory accesses; this fixes test failures when using GCC compiler sanitizers.
  • Prefix all libpopcnt macros using LIBPOPCNT_ to avoid any naming collisions.
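
Three of the items above (the `__has_include()` checks, the removal of unaligned `uint64_t` accesses, and the macro prefixing) can be illustrated with a small C sketch. This is not libpopcnt's actual code: every `SKETCH_` / `sketch_` name is hypothetical, and a plain scalar bit count stands in for the SIMD kernels. It only shows one common way to read 64-bit words from a byte buffer without an unaligned pointer dereference, assuming `memcpy` is used for the load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature macro, prefixed to avoid collisions with user code.
   The nested form keeps compilers without __has_include from choking on it. */
#if defined(__has_include)
  #if __has_include(<immintrin.h>)
    #define SKETCH_HAVE_IMMINTRIN_H 1
  #endif
#endif

/* Load 8 bytes from any address. memcpy avoids dereferencing a misaligned
   uint64_t pointer, which is what alignment sanitizers complain about. */
static uint64_t sketch_load_u64(const uint8_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Portable scalar bit count of one 64-bit word (stand-in for SIMD code). */
static unsigned sketch_popcount64(uint64_t x)
{
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}

/* Count the 1 bits in an arbitrary, possibly unaligned byte buffer. */
static uint64_t sketch_popcnt(const void *data, size_t size)
{
    const uint8_t *bytes = (const uint8_t *)data;
    uint64_t count = 0;
    size_t i = 0;

    assert(data != NULL || size == 0);

    /* Full 8-byte words, read through the alignment-safe loader. */
    for (; i + 8 <= size; i += 8)
        count += sketch_popcount64(sketch_load_u64(bytes + i));

    /* Remaining tail bytes, one at a time. */
    for (; i < size; i++)
        count += sketch_popcount64(bytes[i]);

    return count;
}
```

Calling `sketch_popcnt(buf + 1, n)` on a buffer that starts at an odd address is well defined here, because no `uint64_t *` is ever formed from the byte pointer. Compilers typically turn the fixed-size `memcpy` into a single load instruction, so this approach usually costs nothing at run time.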