# ML-KEM Mythbusting
## What is this?
There have been some recent concerns about ML-KEM, NIST’s post-quantum key encapsulation standard, related standards of the IETF, and lots of conspiracy theories about malicious actors subverting the standardization process. As someone who has been involved with this standardization process at pretty much every level, here is a quick debunking of the various nonsense I have heard. So let’s get started, FAQ style.
## Did the NSA invent ML-KEM?
No. It was first specified by a team of various European cryptographers, whom you can look up on their website.
## Okay, but that was Kyber, not ML-KEM. Did the NSA change Kyber?
No. The differences between Kyber and ML-KEM are pretty minute, mostly editorial changes by NIST. The only change that could be seen as actually interesting was a slight change to how certain key derivation mechanics worked. This change was suggested by Peter Schwabe, one of the original authors of Kyber, and is fairly straightforward to analyze. The reason for this change was that originally, Kyber was able to produce shared secrets of any length, by including a KDF step. But applications usually need their own KDF to apply to shared secrets, in order to bind the shared secret to transcripts and similar, so you would end up with two KDF calls. Since Kyber only uses the KDF to stretch the output, removing it slightly improves the performance of the algorithm without having any security consequences. Basically, there was a feature that turned out to not actually be a feature in real world scenarios, so NIST removed it, after careful consideration, and after being encouraged to do so by the literal author of the scheme, and under the watchful eyes of the entire cryptographic community. Nothing untoward happened here.
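To make that concrete, here is a rough sketch of the before/after, using SHAKE-256 as a stand-in for the internal KDF; the function names and the transcript binding are just illustrative, not the exact FIPS 203 or application flow:

```python
import hashlib

def app_kdf(shared_secret: bytes, transcript: bytes) -> bytes:
    # The KDF an application runs anyway, to bind the secret to its transcript.
    return hashlib.shake_256(shared_secret + transcript).digest(32)

def kyber_round3_output(k_bar: bytes, ciphertext: bytes) -> bytes:
    # Round-3 Kyber (roughly): the KEM already ran its own KDF over the
    # internal secret, mainly so callers could request arbitrary output lengths.
    return hashlib.shake_256(k_bar + hashlib.sha3_256(ciphertext).digest()).digest(32)

def mlkem_output(k_bar: bytes) -> bytes:
    # ML-KEM: the 32-byte secret is handed out directly; the only KDF that
    # runs is the application's own.
    return k_bar

k_bar, ct, transcript = b"\x01" * 32, b"fake ciphertext", b"fake transcript"
old_key = app_kdf(kyber_round3_output(k_bar, ct), transcript)  # two KDF passes
new_key = app_kdf(mlkem_output(k_bar), transcript)             # one KDF pass
```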
## Okay but what about maybe there still being a backdoor?
There is no backdoor in ML-KEM, and I can prove it. For something to be a backdoor, specifically a “nobody but us” (NOBUS) backdoor, you need some way to ensure that nobody else can exploit it; otherwise it is not a backdoor but a broken algorithm, and academia will eventually catch up with whatever internal cryptanalysis you might have. So for something to be a useful backdoor, you need to possess some secret that cannot be brute-forced and that acts as a private key to unlock any ciphertext generated by the algorithm. This is how the backdoor in DUAL_EC_DRBG works, and, since the US plans to use ML-KEM themselves (as opposed to the export cipher shenanigans back in the day), it is the only kind of backdoor they could reasonably insert into a standard. But if you have a private key that cannot be brute-forced, you need to have a public key as well, and that public key needs to be embedded into the algorithm as a parameter. And in order to not be brute-forceable, this public key needs to have at least 128 bits of entropy. This gives us a nice test to see whether a scheme is even capable of having a cryptographic NOBUS backdoor: we tally up the entropy of the parameter space. If the result is definitely less than 128 bits, the scheme can at most be broken, but it cannot be backdoored.
So let’s do that for ML-KEM:
This is the set of parameters; let’s tally them up, with complete disregard for the fact that these choices are much more constrained than random integers would suggest (actually, I am too much of a nerd not to point out the constraints, but I will use the larger number for the tally).
* Degree of the number field: 8 bits (actually, it has to be a power of two, so really only 3 bits)
* Prime: 12 bits (actually, it has to be a prime, so 10.2 bits (actually, actually, it has to be a prime of the form k·256 + 1 so that the NTT works, it has to be at least double the rank times the degree, and 3329 is literally the smallest prime that fits that bill))
* Rank of the module: 3 bits (well, the rank of the module is the main security parameter, it literally just counts from 2 to 4)
* Secret and error term bounds: 2 + 2 bits (really these come from the size of the prime, the module rank, and the number field degree)
* Compression strength: 4 + 3 bits
In total, this gives us 34 bits, counted exceedingly generously. I even gave an extra bit to all the small numbers! Any asymmetric cryptosystem with a 34-bit public key would be brute-forceable by a laptop within a few minutes. There is no backdoor in ML-KEM, because there simply is no space to hide a backdoor in ML-KEM.
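If you want to check my arithmetic, here it is as a trivial sum; the parameter names are the usual ML-KEM symbols, and the bit counts are the deliberately generous ones from the list above:

```python
# Deliberately generous per-parameter entropy estimates, in bits.
parameter_bits = {
    "degree n": 8,            # really a power of two, so ~3 bits
    "prime q": 12,            # really an NTT-friendly prime, so far fewer choices
    "module rank k": 3,       # literally just 2, 3, or 4
    "secret bound eta1": 2,
    "error bound eta2": 2,
    "compression d_u": 4,
    "compression d_v": 3,
}
print(sum(parameter_bits.values()))  # 34 -- nowhere near the ~128 bits a NOBUS key needs
```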
And just to be sure, if you apply this same counting-bits-of-parameters test to the famously backdoored DUAL_EC_DRBG, you indeed find multiple elliptic curve points defined in the standard without any motivation, immediately blowing our 128-bit entropy budget for parameters. In fact, it would be trivial to fix DUAL_EC_DRBG by applying what’s called a “nothing up my sleeve” paradigm: Instead of just having the elliptic curve points sit there, with no explanation, make it so that they are derived from digits of π, e, or the output of some hash function on some published seed. That would still not pass our test, but that is because I designed this test to be way too aggressive; as the parenthetical remarks in the list above show, there is not really any real choice in these parameters, they are just the smallest set of parameters that result in a secure scheme (making them larger would only make the scheme slower and/or add more overhead).
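For illustration, this is the general shape of such a nothing-up-my-sleeve derivation; the seed string here is made up, and an actual Dual_EC fix would additionally need a hash-to-curve step to turn the bytes into a valid point:

```python
import hashlib

# Publish a seed and a rule instead of an opaque constant, so anyone can
# re-derive the parameter and check that no secret key was smuggled in.
PUBLISHED_SEED = b"Dual_EC point Q, counter 0"  # hypothetical seed

def nothing_up_my_sleeve_bytes(seed: bytes, length: int = 32) -> bytes:
    return hashlib.shake_256(seed).digest(length)

print(nothing_up_my_sleeve_bytes(PUBLISHED_SEED).hex())
```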
So no, there is no backdoor in ML-KEM.
## But didn’t NIST fail basic math when picking ML-KEM?
No. In fact, I wrote an entire blog post about that topic, but “no” is an accurate summary of that post.
## I thought ML-KEM was broken, something about a fault attack?
There are indeed fault attacks on ML-KEM. This is not super surprising if you know what a fault attack (also called a glitch attack) is. For a fault attack, you need to insert a mistake – a fault – into the computation of the algorithm. You can do this by messing with the physical hardware, with things like Rowhammer that literally change the memory while the computation is happening. It’s important to analyze these types of failures, but literally every practical cryptographic algorithm in existence is vulnerable to fault attacks. It’s literally computers failing at their one job and not computing very well. CPU and memory attacks are probably one of the most powerful families of attacks we have, and they have proven to be very stubborn to mitigate. But algorithms failing in the face of them is not particularly surprising; after all, if you can flip a single arbitrary bit, you might as well just set “verified_success” to true and call it a day. Technically, this is the strongest form of fault, where the attacker chooses where it occurs, but even random faults usually demolish pretty much any cryptographic algorithm, and us knowing about these attacks is merely evidence of an algorithm being seen as important enough to do the math on how exactly it fails when you literally pull the ground out from beneath it.
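If the “just set verified_success to true” remark sounds too flippant, here is a toy sketch of the shape of the problem; nothing ML-KEM-specific, and the fault itself obviously happens in hardware, not in the source code:

```python
import hmac, hashlib

KEY = b"\x00" * 32  # toy key

def verify(message: bytes, tag: bytes) -> bool:
    expected = hmac.new(KEY, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

def handle(message: bytes, tag: bytes) -> None:
    verified_success = verify(message, tag)
    # A single well-placed fault that flips the bit behind `verified_success`
    # turns a rejected forgery into an accepted message -- no cryptanalysis of
    # HMAC (or ML-KEM, or anything else) required.
    if verified_success:
        print("accepted:", message)
    else:
        print("rejected")

handle(b"hello", b"not a valid tag!")  # rejected -- unless the fault hits
```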
## But what about decryption failure attacks? Those sound scary!
ML-KEM has a weird quirk: It is, theoretically, possible to create a ciphertext, in an honest fashion, that the private key holder will reject. If one were to successfully do so, one would learn information about the private key. But here comes the kicker: The only way to create this poisoned ciphertext is by honestly running the encapsulation algorithm and hoping to get lucky. There is a slight way to bias the ciphertexts, but to do so, one still has to compute them, and the advantage would be abysmal, since ML-KEM forces the hand of the encapsulating party on almost all choices. The probability of this decapsulation failure can be computed with relatively straightforward mathematics, the Cauchy-Schwarz inequality. And well, the parameters of ML-KEM are chosen in such a way that the actual probability is vanishingly small, around 2⁻¹³⁹ even for the smallest parameter set (and smaller still for the larger ones). At this point, the attacker cannot really assume that they were observing a decapsulation failure anymore, as a whole range of other incredibly unlikely events, such as enough simultaneous bit flips due to cosmic radiation to evade error detection, are far more likely. It is true that after the first decapsulation failure has been observed, the attacker has far more ability to stack the deck in their favor, but to do so, you first need that first failure to occur, and there is not really any hope of that.
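To put a number on “not really any hope”: FIPS 203 puts the failure probability at roughly 2⁻¹³⁹ for the smallest parameter set, so even an absurdly overpowered attacker would wait a while for the first failure. A back-of-the-envelope calculation (the attack rate is my invention):

```python
FAILURE_PROB = 2.0 ** -139      # roughly the ML-KEM-512 figure; larger sets are even lower
RATE = 2.0 ** 60                # ~10^18 honest encapsulations per second, absurdly generous
SECONDS_PER_YEAR = 3.156e7

expected_years = (1 / FAILURE_PROB) / RATE / SECONDS_PER_YEAR
print(f"{expected_years:.1e} years")  # ~1.9e16 years, about a million times the age of the universe
```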
On top of this, the average ML-KEM key is used exactly once, as such is the fate of keys used in key exchange, further making any adaptive attack like this meaningless; but ML-KEM keys are safe to use even with multiple decapsulations.
## But wasn’t there something called Kyberslash?
Yeah. It turns out implementing cryptographic code is still hard. My modest bragging right is that my implementation, which would eventually morph into BoringSSL’s ML-KEM implementation, never had this problem, so I guess the answer here is to git gud, or something. But really, especially initially, there are some rough edges in new implementations as we learn the right techniques to avoid them. The good news is that, implementation-wise, ML-KEM is actually a lot simpler than elliptic curves are, so these kinds of minor side-channel issues are likely to be rarer here.
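For the curious, the bug class looks roughly like this: the step that compresses a coefficient to a single message bit was written with a division by q, and some compilers turn that into a variable-time division instruction operating on secret data. The fix is a multiply-and-shift by a precomputed reciprocal. A quick Python sanity check of that idea (the constants mirror the division-free rewrite from memory, so treat them as illustrative):

```python
Q = 3329

def compress_bit_division(x: int) -> int:
    # The problematic shape: a division whose operand depends on secret data.
    # In C, this can compile down to a variable-time division instruction.
    return (((x << 1) + Q // 2) // Q) & 1

def compress_bit_fixed(x: int) -> int:
    # Division-free: multiply by a precomputed reciprocal and shift.
    return ((((x << 1) + 1665) * 80635) >> 28) & 1

# The two agree on every possible coefficient value.
assert all(compress_bit_division(x) == compress_bit_fixed(x) for x in range(Q))
```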
## Okay, enough about ML-KEM, what about hybrids and the IETF?
Okay, this one is a funny one. Well, funny if you like deeply dysfunctional bikeshedding, willful misunderstanding, and drama. First off, what are hybrids? Assume you have two cryptographic schemes that do the same thing, and you distrust both of them, but you do trust the combination of the two. That is, in essence, what hybrids allow you to do: combine two schemes of the same type into one, so that the combined scheme is at least as secure as either of them. The usual line is that this is perfect for PQC, as it allows you to combine the well-studied security of classical schemes with the quantum resistance of PQC schemes. Additionally, the overhead of elliptic curve cryptography, when compared with lattice cryptography, is tiny, so why not throw it in there. And generally I agree with that stance, although I would say that my trust in lattice cryptography is pretty much equal to my trust in elliptic curves, and quite a bit higher than my trust in RSA, so I would not see hybrids as absolutely, always and at every turn, superduper essential. But they are basically free, so why not? In the end, yes, hybrids are the best way to go, and indeed, this is what the IETF enabled people to do.

There are various RFCs to that effect; to understand the current controversy, we need to focus on two TLS-related ones: X25519MLKEM768 aka 0x11EC, and MLKEM1024. The former is a hybrid, the latter is not. And, much in line with my reasoning, 0x11EC is the default key exchange algorithm used by Chrome, Firefox, and pretty much every other TLS client that currently supports PQC. So what’s the point of MLKEM1024? Well, it turns out there is one customer who really, really hates hybrids, and only wants to use ML-KEM-1024 for all their systems. And that customer happens to be the NSA.

And honestly, I do not see a problem with that. If the NSA wants to make their own systems inefficient, then that is their choice. Why inefficient? It turns out that, due to the quirks of how TLS works, the client needs to predict what the server will likely accept. It could predict more things, but since PQC keys are quite chonky, sending more than one PQC key makes your handshakes slower. And so does mispredicting, since it results in the server saying “try again, with the right public key, this time”. So, if everyone but the NSA uses X25519MLKEM768, the main effect is that the NSA has slower handshakes. As said, I don’t think it’s reasonable to say their handshakes are substantially less secure, but sure, if you really think ML-KEM is broken, then yes, the NSA has successfully undermined the IETF in order to make their own systems less secure, while not impacting anyone else. Congratulations to them, I guess.
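To make the “combine two schemes” part concrete, here is the general shape of a hybrid combiner; this is a generic concatenate-and-hash sketch, not the exact X25519MLKEM768 construction (which feeds the concatenated secrets into the TLS key schedule), and the names are made up:

```python
import hashlib

def hybrid_shared_secret(mlkem_ss: bytes, x25519_ss: bytes, context: bytes) -> bytes:
    # As long as at least one of the two input secrets stays secret, the
    # combined output stays secret.
    return hashlib.sha3_256(mlkem_ss + x25519_ss + context).digest()

# In X25519MLKEM768 the client sends both public keys in a single key share,
# the server answers with an ML-KEM ciphertext plus its own X25519 share, and
# both sides end up with the two shared secrets that get combined.
```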
## But doesn’t the IETF actively discourage hybrids?
No. To understand this, we need to look at three flags that come with TLS key exchange algorithms: Recommended, Discouraged, and Mandatory To Implement. Discouraged is a flag used for algorithms known to be broken, such as RC4. Clearly ML-KEM, with or without a hybrid, is not known to be broken, so Discouraged is the wrong category. It is true that 0x11EC is not marked as Recommended, mostly because it started out as an experimental combination that then somehow ended up as the thing everybody was doing, and while lots of digital ink was spilled on whether or not it should be recommended, nobody updated the flag before publishing the RFC. So yes, technically the IETF did not recommend a hybrid algorithm. But your browser, and everybody else, is using it, so there is that. And just in case you were worried about that, the NSA option of MLKEM1024 is also not marked as Recommended. Lastly, Mandatory To Implement is an elaborate prank by the inventors of TLS to create more discussions on mailing lists. As David Benjamin once put it, the only algorithm that is actually mandatory to implement is the null algorithm, as that is the name of the initial state of a TLS connection, before an algorithm has been negotiated. Otherwise, my recommendation, at least, is to respond with this gif whenever someone requests an MTI algorithm you don’t want to support. The flag has literally zero meaning. Oh, and yeah, neither of the two algorithms is MTI.