Commit

+add AVX2 optimizations of class ResizerBf16Bilinear (part 4).
ermig1979 committed Jan 6, 2025
1 parent fa8c22f commit 895d5bd
Showing 1 changed file with 14 additions and 1 deletion.
15 changes: 14 additions & 1 deletion src/Simd/SimdAvx2ResizerBilinear.cpp
@@ -997,7 +997,20 @@ namespace Simd
 float* pb = pbx[k];
 const uint16_t* ps = src + (sy + k) * srcStride;
 size_t dx = 0;
-if (cn >= 4)
+if (cn >= 8)
+{
+    for (; dx < rs;)
+    {
+        const uint16_t* ps0 = ps + _ix[dx];
+        __m256 fx1 = _mm256_set1_ps(_ax[dx]);
+        __m256 fx0 = _mm256_sub_ps(_1, fx1);
+        for (size_t end = dx + cnF; dx < end; dx +=F, ps0 += F)
+            _mm256_storeu_ps(pb + dx, BilinearRowSumBf16(ps0, cn, fx0, fx1));
+        if (cnTF)
+            _mm256_storeu_ps(pb + dx + cnLF, BilinearRowSumBf16(ps0 + cnLF, cn, fx0, fx1)), dx += cnTF;
+    }
+}
+else if (cn >= 4)
 {
     for (; dx < rs;)
     {
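The new `cn >= 8` branch can be read as "interpolate each destination pixel in blocks of F = 8 channels, then cover any leftover channels with one more full block that ends at the last channel". The scalar sketch below illustrates that loop shape only; it is not the library's code. It assumes that BF16 is widened by placing its 16 bits in the upper half of a 32-bit float, that `BilinearRowSumBf16(ps0, cn, fx0, fx1)` computes `ps0[i] * fx0 + ps0[i + cn] * fx1` per lane, that `cnF` is `cn` rounded down to a multiple of F, and that `cnTF` is the remainder. None of these definitions appear in the hunk. `RowSumBlock`, `InterpolateRow`, `FloatToBf16Trunc` and the `int32_t` index type are hypothetical names chosen for the illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// BF16 is assumed to hold the upper 16 bits of an IEEE-754 binary32 value.
inline float Bf16ToFloat(uint16_t v)
{
    uint32_t bits = uint32_t(v) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Truncating conversion, used here only to build test input.
inline uint16_t FloatToBf16Trunc(float f)
{
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    return uint16_t(bits >> 16);
}

const size_t F = 8; // number of floats in one 256-bit register

// Hypothetical scalar stand-in for one F-wide BilinearRowSumBf16 call:
// blends F channels of a pixel with the same channels of its right neighbour.
inline void RowSumBlock(const uint16_t* ps0, size_t cn, float fx0, float fx1, float* dst)
{
    for (size_t i = 0; i < F; ++i)
        dst[i] = Bf16ToFloat(ps0[i]) * fx0 + Bf16ToFloat(ps0[i + cn]) * fx1;
}

// Horizontally interpolates one source row into rs = dstW * cn floats.
// ix[dx] and ax[dx] give the source offset and right-hand weight; they are
// read only at the first channel of each destination pixel. Requires cn >= F.
void InterpolateRow(const uint16_t* ps, const int32_t* ix, const float* ax,
    size_t cn, size_t rs, float* pb)
{
    const size_t cnF = cn / F * F; // channels covered by full blocks
    const size_t cnT = cn - cnF;   // leftover channels (0 .. F-1)
    for (size_t dx = 0; dx < rs;)
    {
        const size_t begin = dx;
        const uint16_t* ps0 = ps + ix[dx];
        const float fx1 = ax[dx], fx0 = 1.0f - fx1;
        for (size_t end = dx + cnF; dx < end; dx += F, ps0 += F)
            RowSumBlock(ps0, cn, fx0, fx1, pb + dx);
        if (cnT)
        {
            // One extra full block that ends exactly at the last channel.
            // It overlaps the previous block and rewrites identical values.
            const size_t last = cn - F;
            RowSumBlock(ps + ix[begin] + last, cn, fx0, fx1, pb + begin + last);
            dx += cnT;
        }
    }
}
```

The overlapping final block avoids a masked or scalar tail at the cost of recomputing up to F - 1 channels, which is why a branch like this needs at least F channels per pixel.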
