Simd vector similarity #575

hermeGarcia · 2023-02-08T21:16:37Z

Description

Simd cosine similarity + inlining of function

How was this PR tested?

Local tests + performance analysis with 1m_stats.

codecov · 2023-02-08T21:29:45Z

Codecov Report

Base: 69.46% // Head: 69.46% // Increases project coverage by +0.00% 🎉

Coverage data is based on head (2f6d4c7) compared to base (5a31924).
Patch has no changes to coverable lines.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #575   +/-   ##
=======================================
  Coverage   69.46%   69.46%           
=======================================
  Files         225      225           
  Lines       15745    15751    +6     
=======================================
+ Hits        10937    10942    +5     
- Misses       4808     4809    +1

Flag	Coverage Δ
node-sidecar	`75.32% <ø> (+0.07%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
nucliadb_node/nucliadb_node/servicer.py	`74.07% <0.00%> (-1.40%)`	⬇️
nucliadb_node/nucliadb_node/reader.py	`94.87% <0.00%> (+0.75%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

nucliadb_vectors/src/data_types/vector.rs

fulmicoton · 2023-02-09T09:19:17Z

nucliadb_vectors/src/data_types/vector.rs

+ let partial_dot_product = partial_simd_x * partial_simd_y;
+ let partial_power_x = partial_simd_x * partial_simd_x;
+ let partial_power_y = partial_simd_y * partial_simd_y;
+ dem_x += partial_power_x.reduce_add();


you don't want to reduce here. I think it can be done out of the loop.

Yes I had a version with that out of the loop, but the performance stayed the same, I will do that version again and update the code!

fulmicoton · 2023-02-09T09:20:00Z

nucliadb_vectors/src/data_types/vector.rs

+ let mut free_in_buffer = 0;
+ while i < len {
+ i += 1;
+ x.read_exact(&mut buff_x).unwrap();


I'd avoid using read_exact here... The compiler may or may not be smart enough to convert that into a simd unaligned load.

Im not sure I follow you on this.. Here im only reading the f32 value, that I have as bytes on x, then I add the f32 to the simd buffer.

/// `left` & `right` do not need to be aligned. /// This requires nightly due to portable SIMD, /// and a cpu with AVX2 or equivalent to run. /// f32 are assumed to be using native endianess /// (most likely little endian in your case.) /// /// AVX 512 should be 2x faster. pub fn dot_product_simd(left: &[u8], right: &[u8]) -> f32 { const DIM: usize = 768; const NUM_LANES: usize = 8; assert_eq!(left.len(), DIM * 4); assert_eq!(right.len(), DIM * 4); let mut left_ptr = left.as_ptr() as *const f32x8; let mut right_ptr = right.as_ptr() as *const f32x8; let mut acc: f32x8 = Default::default(); unsafe { //< ok thanks to the assert above. for i in 0..DIM / NUM_LANES { let mut left: f32x8 = std::ptr::read_unaligned(left_ptr); let right: f32x8 = std::ptr::read_unaligned(right_ptr); acc.add_assign(left.mul(right)); left_ptr = left_ptr.offset(1 as isize); right_ptr = right_ptr.offset(1 as isize); } } acc.as_array().iter().copied().sum() }

(You need to add the sqrt root stuff or normalized your vector though)

I see, thank you so much! I am trying to adapt this to nucliadb_vectors where apart of adding the sqrt and all that the dimension of the vectors is user defined..

scalar_dot time: [564.91 ns 574.43 ns 585.44 ns] change: [-19.913% -16.021% -12.075%] (p = 0.00 < 0.05) Performance has improved. simd_dot time: [74.888 ns 76.182 ns 77.759 ns] change: [+1.3111% +3.2754% +5.5679%] (p = 0.00 < 0.05) Performance has regressed. scalar_cosine time: [601.11 ns 602.38 ns 604.07 ns] change: [+0.3208% +0.7946% +1.2116%] (p = 0.00 < 0.05) Change within noise threshold. Found 4 outliers among 100 measurements (4.00%) 2 (2.00%) high mild 2 (2.00%) high severe simd_cosine time: [87.091 ns 87.313 ns 87.598 ns] change: [-34.935% -34.503% -34.119%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 2 (2.00%) high mild 4 (4.00%) high severe

Thank you so much for the help! I am trying to figure out how would this behave in the case where a lane can not be filled, for examen if the dimension of the vectors is 5.

I added the following to the code: 398587e
Now I am seeing a better perfomance!

can you share the bench? (and make sure you compile with target-cpu=native...)

I use this binary for testing the performance.

fulmicoton · 2023-02-09T09:21:45Z

nucliadb_vectors/src/data_types/vector.rs

+ let len_y = Len::from_le_bytes(buff_y);
+ assert_eq!(len_x, len_y);
+ let len = len_x;
+ let mut buff_x = [0; 4];


is it 4 lane or 8 lane?

In that case is 4 bytes, is the buffer used to read the f32 values

javitonino · 2024-05-24T13:35:57Z

Superseded by #2179

hermeGarcia added 2 commits February 8, 2023 18:54

simd similarity

86e1db1

inlining cosine calculation

4633479

light ref

4a40b48

fulmicoton reviewed Feb 9, 2023

View reviewed changes

nucliadb_vectors/src/data_types/vector.rs Outdated Show resolved Hide resolved

fulmicoton reviewed Feb 9, 2023

View reviewed changes

hermeGarcia added 3 commits February 9, 2023 10:31

fix spelling mistake

cd2f1a4

improved simd cosine similarity

398587e

fix on reading len

2f6d4c7

alekece marked this pull request as draft February 15, 2023 08:56

vangheem force-pushed the main branch from d604e37 to 395577d Compare July 18, 2023 00:42

javitonino closed this May 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simd vector similarity #575

Simd vector similarity #575

hermeGarcia commented Feb 8, 2023

codecov bot commented Feb 8, 2023 •

edited

fulmicoton Feb 9, 2023

hermeGarcia Feb 9, 2023

fulmicoton Feb 9, 2023

hermeGarcia Feb 9, 2023

fulmicoton Feb 9, 2023 •

edited

fulmicoton Feb 9, 2023

hermeGarcia Feb 9, 2023

fulmicoton Feb 9, 2023

hermeGarcia Feb 9, 2023

hermeGarcia Feb 9, 2023

fulmicoton Feb 9, 2023

hermeGarcia Feb 10, 2023

fulmicoton Feb 9, 2023

hermeGarcia Feb 9, 2023

javitonino commented May 24, 2024

Simd vector similarity #575

Simd vector similarity #575

Conversation

hermeGarcia commented Feb 8, 2023

Description

How was this PR tested?

codecov bot commented Feb 8, 2023 • edited

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fulmicoton Feb 9, 2023 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

javitonino commented May 24, 2024

codecov bot commented Feb 8, 2023 •

edited

fulmicoton Feb 9, 2023 •

edited