Vectorize base64 decoding #27

k0ekk0ek · 2023-02-20T11:40:41Z

Base64 encoding is used quite a lot. i.e. in RRSIG and DNSKEY records. Wojciech Muła and Daniel Lemire wrote the paper "Faster Base64 Encoding and Decoding Using AVX2 Instructions" in which they outline use of SIMD instructions to improve performance. The current implementation is very basic and improving it is extremely likely to result in a significant performance boost. More research is available too and even improving the current scalar implementation is a big gain.

The flamegraph for parsing an .se zone looks nice and balanced, except for base64 data as the parser spends about 30% of the time in parse_base64.

The text was updated successfully, but these errors were encountered:

lemire · 2023-06-20T14:38:40Z

@k0ekk0ek Are spaces allowed within the codes? That's a big limiting factor pre-AVX-512.

k0ekk0ek · 2023-06-20T14:51:28Z

Sort-of, but I don't think they're an issue(?) Spaces (among others) are delimiters within zone files so we get a separate token for each block of characters (we require this data to be specified as contiguous, so no spaces inside the token). However, we will need to track state because there may be multiple tokens (at least for RRSIG (presentation format)).

An example of how this occurs in zone data:

host.example.com. 86400 IN RRSIG A 5 3 86400 20030322173103 (
                               20030220173103 2642 example.com.
                               oJB1W6WNGv+ldvQ3WDG0MQkg5IEhjRip8WTr
                               PYGv07h108dUKGMeDPKijVCHX3DDKdfb+v6o
                               B9wfuh3DTJXUAfI/M0zmO/zz8bW0Rznl8O3t
                               GNazPwQKkRN20XPXV6nwwfoXmJQbsLNrLfkG
                               J5D6fwFm8nN+6pBzeDQfsS3Ap3o= )

Looking at .se they cut the signature up into blocks of 56 characters (without ( ... ), so one line), while .com seems to keep it as one set of contiguous characters.

lemire · 2023-11-11T09:12:56Z

This is an important problem. Node.js has a related issue.

See
nodejs/performance#128

I want to work on solving this issue as soon as I can.

k0ekk0ek · 2023-11-12T13:20:33Z

Sounds good @lemire! The client9 decoders are probably good to have as fallback(?) At least the base64 one got adopted and modified by @aklomp. The same can be done for base16 (hex). At least, I am planning to do so. The stream decoding interface fits simdzone nicely. Something I noticed in the modified version is that whole blocks need to be written. So the base64 AVX2 version requires at least 45 bytes of input (8 bytes for the fallback) to ensure proper output. Since simdzone guarantees enough padding in input and output buffers, we can perhaps let that requirement go. That may make things speedier/easier(?) Of course, this may or may not apply to Node.js. I'm currently focused on getting the SVCB RRTYPE done, after that testing and api cleanup to get to a first release, but if you need me to look at something, I'd be happy to help!

lemire · 2023-11-12T14:16:59Z

The main challenge I am concerned with are the spaces in the input… The overflows can be handled in some way but I agree that simdzone can benefit of faster routines than other systems.

lemire · 2023-12-13T22:04:35Z

Of course, this may or may not apply to Node.js

I am not sure how much of a difference this makes. It is an empirical issue and it is easy to implement both padded and strict versions.

k0ekk0ek · 2023-12-14T17:32:59Z

I/we'd have to test. Adding the faster base64 decoder made a big difference for the .se zone. Going by that example, any TLD zone that has widespread DNSSEC usage (such as .nl) will benefit. A really simple test would be to replace consecutive dec_loop_generic_32_inner calls in dec_loop_generic_32 by one (or more) SIMD function replacements. Also note that we don't have to account for spaces in the input. That part is handled automatically because they're separate tokens (base64 is to be written as <contiguous> not <quoted>, except for svcb parameters, but there we simply do not allow spaces), so we can quite easily select the right decoder based on token length.

k0ekk0ek · 2023-12-14T17:35:28Z

I'd have to study the library a bit better, but the loop currently selects for rounds 8, 4, 2 and 1. As 8x4 = 32 and 4x4 = 16, we could simply add AVX2 and SSE versions there.

lemire · 2023-12-14T18:27:56Z

I'll be working on this (starting 'now'). My current 'vision' is to make it configurable so one can explore the usage space.

k0ekk0ek · 2023-12-14T21:35:45Z

Sounds good 👍

lemire · 2024-04-10T18:02:22Z

So I think I solved the problem, although maybe not for simdzone directly.

The way the problem appears in simdzone is somewhat non-standard and I wanted to work on a more generic problem. So I have worked on the WHATWG forgiving-base64 problem: you must decode base64 in a string but there might be arbitrary ASCII spaces all over the string. It is part of the latest release of simdutf.

We have currently integrating it in Node.js with decent results: nodejs/node#52428 It passes all the tests in simdutf and in Node.js, so it is pretty solid and it is fast.

Whether it applies to simdzone as is is trickier because simdzone has the pattern where the location of the white spaces are known before you start parsing.

So maybe it is worth experimenting.

k0ekk0ek · 2024-04-11T08:14:11Z

Thanks for the update @lemire. Looked at the numbers, that's another impressive boost!

(sorry for the overlong/overcomplete reply)

Indeed, simdzone tokenizes based on whitespace (amongst other delimiters), so situation is a little different. The work may still be very useful. There are quite some RR TYPEs where the RDATA section contains base64 data. e.g. OPENPGPKEY, DNSKEY, RRSIG, etc. While it may be possible to pop all tokens of the tape and use the work to decode those sequences, simdzone does not read the entire file at once, but in blocks. Popping a token may lead to the parser flushing the buffer and proceding to read the next section. The scanner merely guarantees the complete token is available until the next one is read. Of course, it's entirely possible to alter tape operation for these types of sequences. Whether that's worth the effort is something to investigate. Another complexity is the possible uses of parentheses to wrap newlines. For example, this is a valid RRSIG RR (RFC4034 section 3.2):

host.example.com. 86400 IN RRSIG A 5 3 86400 20030322173103 (
                                  20030220173103 2642 example.com.
                                  oJB1W6WNGv+ldvQ3WDG0MQkg5IEhjRip8WTr
                                  PYGv07h108dUKGMeDPKijVCHX3DDKdfb+v6o
                                  B9wfuh3DTJXUAfI/M0zmO/zz8bW0Rznl8O3t
                                  GNazPwQKkRN20XPXV6nwwfoXmJQbsLNrLfkG
                                  J5D6fwFm8nN+6pBzeDQfsS3Ap3o= )

However, the RDATA section may use parentheses, without nesting, anywhere. i.e. this is valid too:

host.example.com. 86400 IN RRSIG A 5 3 86400 20030322173103 (
                                  20030220173103 2642 example.com.
                                  oJB1W6WNGv+ldvQ3WDG0MQkg5IEhjRip8WTr
                                  ) ( PYGv07h108dUKGMeDPKijVCHX3DDKdfb+v6o
                                  ) ( B9wfuh3DTJXUAfI/M0zmO/zz8bW0Rznl8O3t
                                  ) ( GNazPwQKkRN20XPXV6nwwfoXmJQbsLNrLfkG
                                  ) ( J5D6fwFm8nN+6pBzeDQfsS3Ap3o= )

Of course, no one ever writes/generates the latter, but it is something to consider.

k0ekk0ek added enhancement New feature or request good first issue Good for newcomers labels Feb 20, 2023

k0ekk0ek added this to the Release 0.2.0 milestone Feb 20, 2023

k0ekk0ek added the help wanted Extra attention is needed label Feb 20, 2023

k0ekk0ek mentioned this issue Jun 20, 2023

Optimize base16, base32 and base64 decoders #7

Closed

k0ekk0ek mentioned this issue Nov 10, 2023

Use base64 library by @aklomp #106

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorize base64 decoding #27

Vectorize base64 decoding #27

k0ekk0ek commented Feb 20, 2023 •

edited

lemire commented Jun 20, 2023

k0ekk0ek commented Jun 20, 2023

lemire commented Nov 11, 2023

k0ekk0ek commented Nov 12, 2023

lemire commented Nov 12, 2023

lemire commented Dec 13, 2023

k0ekk0ek commented Dec 14, 2023

k0ekk0ek commented Dec 14, 2023

lemire commented Dec 14, 2023

k0ekk0ek commented Dec 14, 2023

lemire commented Apr 10, 2024

k0ekk0ek commented Apr 11, 2024 •

edited

Vectorize base64 decoding #27

Vectorize base64 decoding #27

Comments

k0ekk0ek commented Feb 20, 2023 • edited

lemire commented Jun 20, 2023

k0ekk0ek commented Jun 20, 2023

lemire commented Nov 11, 2023

k0ekk0ek commented Nov 12, 2023

lemire commented Nov 12, 2023

lemire commented Dec 13, 2023

k0ekk0ek commented Dec 14, 2023

k0ekk0ek commented Dec 14, 2023

lemire commented Dec 14, 2023

k0ekk0ek commented Dec 14, 2023

lemire commented Apr 10, 2024

k0ekk0ek commented Apr 11, 2024 • edited

k0ekk0ek commented Feb 20, 2023 •

edited

k0ekk0ek commented Apr 11, 2024 •

edited