-
Notifications
You must be signed in to change notification settings - Fork 551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor: Have a common base for Kyber/Dilithium structures #4024
base: master
Are you sure you want to change the base?
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as outdated.
This comment was marked as outdated.
64f9db8
to
066fc6d
Compare
dd45e61
to
c30b9fe
Compare
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below are a few highlights and potential conversion starters of things I saw along the way. :)
// This is a mitigation for a potential side channel called "KyberSlash". | ||
// It implements the division by Q using a multiplication and a shift. Most | ||
// compilers would generate similar code for such a division by a constant. | ||
// Though, in some cases, compilers might use a variable-time int division, | ||
// resulting in a potential side channel. | ||
// | ||
// See "Hacker's Delight" (Second Edition) by Henry S. Warren, Jr. | ||
// Chapter 10-9 "Unsigned Division by Divisors >= 1" | ||
auto divide_by_q = [](uint32_t n) -> unsigned_T { | ||
static_assert(KyberConstants::Q == 3329); | ||
BOTAN_DEBUG_ASSERT(n < (1 << 23)); | ||
|
||
// These constants work for all values that appear in Kyber with the | ||
// greatest being 3328 * 2^11 + Q // 2 = 6,817,408 < 2**23 = 8,388,608. | ||
constexpr uint64_t m = 2580335; | ||
constexpr size_t p = 33; | ||
return static_cast<unsigned_T>((n * m) >> p); | ||
}; | ||
|
||
constexpr unsigned_T mask = (1 << d) - 1; | ||
return divide_by_q((static_cast<uint32_t>(x) << d) + KyberConstants::Q / 2) & mask; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've adapted the KyberSlash countermeasure to also work for the compression instances with d = 10
and d = 11
. Not sure it's necessary but doesn't hurt either. I'd definitely be happy about some other sharp pair of eyes.
// TODO: perhaps secure vector | ||
std::vector<T> m_coeffs_storage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Polynomials store their coefficients as a std::vector<>
. Some of those store secret values and should therefore use secure_vector
. Also, it might be beneficial to add first-level support for Polynomials
(and friends) as Strong types. (Most likely this would be another pull request, though.)
namespace detail { | ||
|
||
constexpr auto as_byte_source(BufferSlicer& slicer) { | ||
return [&](std::span<uint8_t> out) { slicer.copy_into(out); }; | ||
} | ||
|
||
constexpr auto as_byte_source(Botan::XOF& xof) { | ||
return [&](std::span<uint8_t> out) { xof.output(out); }; | ||
} | ||
|
||
} // namespace detail | ||
|
||
template <typename T> | ||
concept byte_source = | ||
requires(T& t) { requires std::invocable<decltype(detail::as_byte_source(t)), std::span<uint8_t>>; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this is a top-level concept we might want to establish deeper in the library. Perhaps as an (internal) replacement for the DataSource
class.
Namely: A thing that can provide a concrete number of bytes where I don't really know how many bytes will be needed. The source may be infinite or not.
Things that could fall in this abstraction may be:
- the
RandomNumberGenerator
- a
XOF
- read from some buffer (think
BufferSlicer
) - from some I/O source (e.g. the network)
In Kyber/Dithithium this is used to either sample some random polynomial from a XOF or to unmarshal it from a bitpacked encoding.
Other use cases may be:
- parsing in general (TLS, X.509 and friends)
- data loading, e.g. like:
load_be<uint64_t>(rng)
to get a random 64 bit integer. (See also Internal: load_be/le should accept a BufferSlicer #4068) - among other things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From a technical perspective, one might even implement it like shown here: As a C++20 concept that tries to build a thin wrapper around "the source". That way, new data sources could just hook themselves into this "helper" and decide for themselves to be a "data source" or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool idea 👍. I'm not sure if we want to commit to a pure concept construction, though. In some cases, it may be more convenient to have a class like DataSource. We could still extend the DataSource class to fulfill the respective concept if one wants to juggle with templates.
// TODO: Remove the need for the montgomery transformation | ||
// | ||
// -> When calculating A*s below, A is not in montgomery form, but s is. The | ||
// operation uses fqmul internally, which performs a montgomery reduction. | ||
auto A = montgomery(kyber_sample_matrix(rho, false /* not transposed */, mode)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was different in the reference implementation. @FAlbertDev we did look into this a while ago, I feel we should take this up again and get to the bottom of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just investigated the usage of Montgomery in Kyber. First, a short recap on Montgomery:
The main observation is that every Montgomery multiplication
In our Kyber and Dilithium code, the following operations add or remove the Montgomery factor.
- the function
montgomery
adds a factor$R$ - the function
inv_ntt
adds a factor$R$ . Note thatntt
doesn't add a Montgomery factor. - As already said, a Montgomery multiplication adds a factor
$R^{-1}$ to each poly, vector, or matrix element. This holds for all multiplication types, i.e., polynomial-polynomial, vector-vector, matrix-vector multiplication, etc.
For encryption and decryption, this setup works well since there is always one multiplication (adding factor inv_ntt
(adding factor inv_ntt
. We have a factor of montgomery
. Instead, I propose to add this factor to t
after the multiplication. The fewer elements multiplied by
auto A = kyber_sample_matrix(rho, false /* not transposed */, mode);
auto s = ntt(ps.sample_polynomial_vector_cbd_eta1());
const auto e = ntt(ps.sample_polynomial_vector_cbd_eta1());
auto t = montgomery(A * s);
t += e;
t.reduce();
// End Algorithm 12 ---------------------------
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Thanks for double-checking and the detailed write-up! It would probably be really nice to track the montgomery parameter in the type system as well, but it does feel like overkill.
Also: by applying montgomery()
to the t
vector, we can remove the overload for PolynomialMatrix
altogether. :)
using KyberPolyNTT = Botan::CRYSTALS::Polynomial<KyberPolyTraits, Botan::CRYSTALS::Domain::NTT>; | ||
using KyberPolyVecNTT = Botan::CRYSTALS::PolynomialVector<KyberPolyTraits, Botan::CRYSTALS::Domain::NTT>; | ||
using KyberPolyMat = Botan::CRYSTALS::PolynomialMatrix<KyberPolyTraits>; | ||
|
||
using KyberPoly = Botan::CRYSTALS::Polynomial<KyberPolyTraits, Botan::CRYSTALS::Domain::Normal>; | ||
using KyberPolyVec = Botan::CRYSTALS::PolynomialVector<KyberPolyTraits, Botan::CRYSTALS::Domain::Normal>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before: it would be good if these could be the basis for strong types to disambiguate them in interfaces E.g. to replace the multiple poly_pack_t0
, ..._t1
, ... methods in dilithium_algos.cpp
with an overloaded poly_pack()
that just "does the right thing" ™ based on the strong type.
enum DilithiumTau : uint32_t { _39 = 39, _49 = 49, _60 = 60 }; | ||
|
||
enum DilithiumLambda : uint32_t { _128 = 128, _192 = 192, _256 = 256 }; | ||
|
||
enum DilithiumGamma1 : uint32_t { ToThe17th = (1 << 17), ToThe19th = (1 << 19) }; | ||
|
||
enum DilithiumGamma2 : uint32_t { Qminus1DevidedBy88 = (Q - 1) / 88, Qminus1DevidedBy32 = (Q - 1) / 32 }; | ||
|
||
enum DilithiumEta : uint32_t { _2 = 2, _4 = 4 }; | ||
|
||
enum DilithiumBeta : uint32_t { _78 = 78, _196 = 196, _120 = 120 }; | ||
|
||
enum DilithiumOmega : uint32_t { _80 = 80, _55 = 55, _75 = 75 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dilithium has many integer constants that change depending on the instantiation of the algorithm. I found it worthwhile to restrict them with explicit enums. I'm okay with this concept, also given the restricted (algorithm-internal) scope. But I feel there must be a better way for this. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also like this idea. While the implied switch-cases are annoying, they guarantee we do not forget a value.
dilithium_poly_pack_t0(p, stuffer); | ||
} | ||
|
||
BOTAN_ASSERT_NOMSG(stuffer.full()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those became ubiquitous and at this point I do start wondering whether we should just put that in the destructors of Stuffer
and Slicer
(perhaps explicitly disengagable). What do y'all think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do that after 3.5.0 is cut, to avoid unwanted surprises.
c30b9fe
to
cf16bf3
Compare
cf16bf3
to
3fc00ec
Compare
Most notably, this is an abstract implementation to handle CRYSTALS polynomials, poly vectors and poly matrices. Also, a generic implementation for bit-packed encoding/decoding of coefficients useful for both Kyber and Dilithium. Co-Authored-By: Fabian Albert <[email protected]>
3fc00ec
to
8dcd72b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only looked into dilithium and the crystals module so far. The other sections (including Kyber) will come tomorrow.
For algorithms of this high complexity, the new structure looks very well-organized and clear :)
I've only found some minor nits so far.
(Btw sorry if I also marked lots of removed code. The VSCode extension for GitHub caused this)
size_t Dilithium_PublicKey::key_length() const { | ||
return m_public->mode().public_key_bytes(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get confused by this functions every single time...
/**
* Return an integer value best approximating the length of the
* primary security parameter. For example for RSA this will be
* the size of the modulus, for ECDSA the size of the ECC group,
* and for McEliece the size of the code will be returned.
*/
virtual size_t key_length() const = 0;
I'm not sure which value fits best here. We should, at least, provide some size in bits.
Same for Kyber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Kyber, we could return the ML-KEM-x values: i.e. {512, 768, 1024}. What do these even mean though? My best guess: k*N
as the coefficients in the private vector s
?! At least that has some foundation in the names of the specified parameter sets, but its not a bit length. We could multiply those by 12 as the bit-length of Q, but that will result in fairly arbitrary values {6144, 9216, 12288}.
For Dilithium, we could go for the bit lengths of s1
and s2
perhaps?
@randombit Do you have any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When looking at where key_length
is used at all, it seems that it should be some metric beside estimated_strength
to identify specific instances of an algorithm while being generic enough to be a number. Also, the higher the number, the higher the security. Therefore, I would guess everything with these properties should be sufficient, even {1,2,3}. Therefore, I think {512, 768, 1024} should suffice. But I'll submit to the judgment of our Botan master ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@FAlbertDev has the right idea about the intent. It doesn't really have anything to do with the length of the key in bytes in any direct way (after all a ECDSA P-256 public key is 64 bytes being two field elements, but saying the key length of ECDSA P-256 is 512 bits is not going to illuminate the situation!).
For Kyber I see quite commonly ML-KEM-{512,768,1024} so that seems the obvious choice. NIST uses the same naming.
For Dilithium I have seen less press. The NIST spec (as well as https://www.ietf.org/id/draft-ietf-lamps-dilithium-certificates-03.html) uses 44, 65, and 87. TBH I have no idea where those numbers come from, but it's as good as anything.
Co-authored-by: Fabian Albert <[email protected]>
3cbf069
to
2289246
Compare
Co-Authored-By: Fabian Albert <[email protected]>
t1 is part of the public key and thus independent of any other verification input. Precomputing it saves about 20% of verification runtime when performing more than a single verification with the same Botan::PK_Verifier. Co-Authored-By: Fabian Albert <[email protected]>
1a3ac57
to
ae4298c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Final review iteration.
I really like your work. IMO it's the better reference implementation ;)
size_t Kyber_PublicKey::key_length() const { | ||
// TODO: this should report 512, 768, 1024 | ||
return m_public->mode().public_key_byte_length(); | ||
return m_public->mode().public_key_bytes(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See dilithium comment above. I'm not sure if 512, 768, 1024 are the correct values, though.
// TODO: Remove the need for the montgomery transformation | ||
// | ||
// -> When calculating A*s below, A is not in montgomery form, but s is. The | ||
// operation uses fqmul internally, which performs a montgomery reduction. | ||
auto A = montgomery(kyber_sample_matrix(rho, false /* not transposed */, mode)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just investigated the usage of Montgomery in Kyber. First, a short recap on Montgomery:
The main observation is that every Montgomery multiplication
In our Kyber and Dilithium code, the following operations add or remove the Montgomery factor.
- the function
montgomery
adds a factor$R$ - the function
inv_ntt
adds a factor$R$ . Note thatntt
doesn't add a Montgomery factor. - As already said, a Montgomery multiplication adds a factor
$R^{-1}$ to each poly, vector, or matrix element. This holds for all multiplication types, i.e., polynomial-polynomial, vector-vector, matrix-vector multiplication, etc.
For encryption and decryption, this setup works well since there is always one multiplication (adding factor inv_ntt
(adding factor inv_ntt
. We have a factor of montgomery
. Instead, I propose to add this factor to t
after the multiplication. The fewer elements multiplied by
auto A = kyber_sample_matrix(rho, false /* not transposed */, mode);
auto s = ntt(ps.sample_polynomial_vector_cbd_eta1());
const auto e = ntt(ps.sample_polynomial_vector_cbd_eta1());
auto t = montgomery(A * s);
t += e;
t.reduce();
// End Algorithm 12 ---------------------------
src/tests/test_crystals.cpp
Outdated
Botan_Tests::CHECK("hamming weight of polynomials", | ||
[](Test::Result& res) { | ||
Kyberish_Poly<Domain::Normal> p; | ||
res.test_is_eq<size_t>("hamming weight of 0", p.hamming_weight(), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
already tested below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One is testing poly, the other polyvec. The more the merrier I would say.
src/tests/test_crystals.cpp
Outdated
|
||
} // namespace | ||
|
||
BOTAN_REGISTER_TEST_FN("pubkey", "CRYSTALS", test_extended_euclidean_algorithm, test_polynomial_basics, test_encoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BOTAN_REGISTER_TEST_FN("pubkey", "CRYSTALS", test_extended_euclidean_algorithm, test_polynomial_basics, test_encoding); | |
BOTAN_REGISTER_TEST_FN("pubkey", "crystals", test_extended_euclidean_algorithm, test_polynomial_basics, test_encoding); |
it seems all other test suits use snake_case (see ./botan-test --help
). Maybe, while you're at it, adapt the names for dilithium as well (e.g.dilithium_kat_4x4_AES_Deterministic
, dilithium_kat_4x4_Randomized
, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair. Regarding the dilithium KATs: These names are currently tightly coupled with the DilithiumMode
constants. 😨 This will need a cleanup anyway, but lets not do that in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review still need to look at changes in dilithium
and kyber
a939cf1
to
33c6a10
Compare
Co-Authored-By: Fabian Albert <[email protected]> Co-Authored-By: Jack Lloyd <[email protected]>
33c6a10
to
3ecb3f9
Compare
This is essentially a complete overhaul of the Kyber and Dilithium implementations aiming at as much shared code as possible between the two algorithms. The rough structure is outlined below. I split-up the changes into three commits, that could be separate pull requests, if needed.
Module:
pqcrystals
This is a templated base for the polynomials (as well as poly vectors/matrices) used in both implementations. The base implementation shares math where possible and is pluggable where necessary. It also exposes its domain (NTT or "normal") in the type system. Also, it makes some initial optimization effort (mostly memory layout), but more work could be done here.
Both algorithms contain bit-packing algorithms to efficiently encode polynomial coefficients of various known value ranges. Both Kyber and Dilithium now share a generic and pluggable implementation of this.
Module:
kyber
This now makes use of the
pqcrystals
module. The polynomial instantiation happens inkyber_polynomial.h
(replacing the oldkyber_structures.h
). Most low-level "algorithms" (pseudo-codes of the spec) live inkyber_algos.cpp
. The indcpa "K-PKE" scheme is implemented inkyber_keys.cpp
and the cca2-secure wrapper can be found inkyber.cpp
. Auxiliary stuff like constants and strong types are in other headers.Note that this now contains almost no verbatim code from the reference implementation anymore.
Module:
dilithium
Likewise, this uses the
pqcrystals
module (instantiating the polynomials indilithium_polynomial.h
(replacing the almost verbatim reference implementation code). Thedilithium_algos.cpp
is home to most low-level "algorithms" (as seen in the spec).dilithium.cpp
houses the actual signature/verification algorihtms that bind it all together..Care has been taken to have a consistent implementation structure between Kyber and Dilithium. Note, though, that Dilithium's flexibility might require some love before a clean integration of ML-KEM (similar to what happened to Kyber in #3887).
Benchmark
Despite not explicitly aiming to increase speed,./botan speed
shows nice gains for Kyber and no losses for Dilithium.It might be worth looking into why Kyber is so much faster, though. Feels suspicious.Edit: The new implementation pre-computes and stores the expanded public matrix A. So as long as one re-uses thePK_KEM_Encryptor
for multiple encapsulations, one saves a lot of invocations to the XOF. That explains. 😏GCC 13.2 vs. Clang 18.1 on Linux (in WSL)