Attacking 2-Round AES with 1 Known Plaintext
An implementation of 1 known plaintext 2-round AES attack from a paper by Charles Bouillaguet, et. al. regarding AES attacks.
Last updated
An implementation of 1 known plaintext 2-round AES attack from a paper by Charles Bouillaguet, et. al. regarding AES attacks.
Last updated
First of all, disclaimer! I'm not a cryptography guy and I'm also not a scientific-writing guy. This is also my first cryptography article. If there are mistakes in my terminologies, notations, or methodologies, please let me know.
Last week, my faculty had conducted a training session for my faculty's CTF teams, to prepare us for upcoming competitions. For the training, a 3-day jeopardy CTF was conducted. There's one interesting cryptography challenge, authored by prajnapras19, that has zero solves, even until the CTF ended. Let's take a look at the challenge.
We are given three source files. The first one is called aes.py
, a modified version of boppreh
's AES implementation found at . Overall, it is still the same, but most functions are unused and deleted. I assume all of you are already pretty familiar with AES' encryption process, so I'm not gonna explain it too much here.
The second source file is chall.py
. It's the main challenge where the encryption of the flag happens.
The program generates a random 16-byte value as the master encryption key. The master key is hashed using SHA-512, then the hashed key is used to to XOR the flag. The program generates another random 16-byte value as the plaintext. The plaintext is then converted to a 4x4 matrix, where every element is its bytes. The round keys are then generated from the master key through AES key scheduling algorithm, but only for 2 rounds.
Then, the AES suboperations are done in order to the plaintext (initial AddRoundKey
, then SubBytes
, ShiftRows
, MixColumns
, and the next AddRoundKey
) for two rounds. The encrypted plaintext from the second round is then extracted and printed out for the player, along with the protected flag and the initial plaintext. Then, the program deletes most of the bytes from the generated master key, only leaving some behind before printing it to the player. Lastly, we're given a hint for some values: the 1st round key of index [2][1] and index [3][1].
As a non-cryptography guy, it took me an entire 2-3 days to get...nowhere. I tried some approaches:
Carefully and diligently looked through the expand_key
algorithm to try and deduce the other missing round keys and the master key. Only managed to recover around 5 bytes.
Blindly brute-forced some key values, then realized that even 0xFF^4
takes forever.
From the blind brute-force, tried reversing the encryption process from the known plaintext and ciphertext. Needs bruteforcing.
Asked ChatGPT, Claude Sonnet, and DeepSeek R1 — none got the approach right.
I tried to ask the problemsetter for hints considering that the challenge still has 0 solves, but they refused. The challenge ended up not getting any solves until the CTF ended. Only then the problemsetter gave a hint: the challenge (and solution) comes from a paper. Even though I know there are cryptography challenges that refers to a paper, I didn't think that a challenge for "training session" will be one of them. Being a non-cryptography player, I also still don't (and maybe will never) have the gut feeling to Google for papers.
The paper assesses methodologies that can be used for attacking fewer round AES encryptions. Usually, AES encryptions undergo transformations of 10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys. Based on the paper, for a known plaintext-ciphertext pair (or more) retrieved in round 4 or less of AES, there could be methodologies we can use to attack it and retrieve the original keys. It depends on how many rounds the encryption exactly went through and how many plaintext-ciphertext pair we know.
For this challenge, we know that the encryption goes for two rounds, and we only have one pair of plaintext and ciphertext. In the paper, there is a segment written exactly to help us understand how to attack this kind of encryption.
According to C. Bouillaguet in "Low-Data Complexity Attacks on AES" (2012), it is possible to attack two-round AES just with a single plaintext-ciphertext pair. The attack is mainly based on some observations mentioned in the paper and on reversing the key scheduling algorithm.
There are two phases of the attack. For the most part, I will refer to this figure to know the attacking steps in order.
The first row of the figure shows the first round encryption process, and the second row shows the second round encryption process. The grey cells are the informations we know. The numbers in the cells are values retrieved from each step — values we should retrieve in order, to recover the master key correctly. Note that cells with the same number doesn't necessarily mean they have the same number, it just means that the value in that cell will be retrieved on the same step number. The black squares are the values we have to "guess" first before going through the steps — and this can be done via bruteforcing. Well--yes, bruteforcing 0xFF^9 would take forever, but that's not the case for this challenge.
Some notations I use will be different from the papers:
= This will refer to the round key of round , row , column for and , .
The matrices in the challenge are transposed from the one in the paper. For example, if I'm referring to key_matrices[1][3][1]
, I will be referring to of the first round key in zero-indexed matrix notation.
According to section 5.2 of the paper, we have to guess 9 round key bytes marked in black before going through the steps, where 6 bytes are from the master key and 3 bytes are from the first round key. Translating the figure, the round keys we have to guess are:
Luckily, if you noticed, 8 out of 9 key bytes are already given to us on the hints:
Through the key expansion algorithm, we know that the master key is the initial key round, equal to k[0]
. We can group the master key hint given as a list of 4 lists each consisting of 4 elements — a 4x4 matrix. From there, along with the key_matrices
hint, we get:
This greatly narrows down the bruteforcing we have to do — from 0xFF^9 to just 0xFF, because we just need to guess key_matrices[1][3][3]
. Looking at the figure, key_matrices[1][3][3]
is used only after the fourth step, which means we can start running through the steps 1 - 4.
For step 1, we have the full plaintext and known bytes of the master key to perform the initial AddRoundKey
, which is just XOR-ing every byte with the master key's bytes on the same index. Because some master key values are still unknown, we're not able to get the full AddRoundKey
results yet — hence the empty squares. Only the values on the cells marked with 1
are known.
For step 2, with the AddRoundKey
results, we can perform SubBytes
. Again, some values are still unknown, so we only perform SubBytes
on the known AddRoundKey
results — resulting in cells marked 2
.
For step 3, we can perform ShiftRows
. Visually, on the figure, it looks like row (0-indexed) is shifted left time(s). If you remember the reminder above, the matrix in the code is transposed, so visually on the code, every row shifted left is every column shifted up. This results in the cells marked 3
.
For step 4, we can perform MixColumns
. To perform MixColumns
, we can't use partially known columns — only columns with fully known values can be used to MixColumns
. Because of this, MixColumns
only applies to the 3rd column, resulting in cells marked 4
.
Now we have new known cells defined for each AES suboperations result in round 1! But...are these steps really necessary? Turns out, they're not. Quoting Section 5.2 of the paper, for the first phase,
Step 7 uses Observation 3(3), step 8 uses Observation 3(2), steps 5, 6, and 9 use the key schedule, and the remaining steps are computed using the AES algorithm.
If you look at the figure again, the round keys we can deduce other than the guessed bytes (black cells) are only cells marked with 5
, 6
, 7
, 8
, and 9
— cells we can define from step 5 - 9. Steps 1 to 4 doesn't give us any new information for the round keys' missing values, and steps 5 - 9 doesn't use AES operations for their deductions, so there's actually no need to perform steps 1 - 4.
You can still do it for optimization purposes, because with steps 1 to 4 we can get new known values for each step in the AES encryption suboperations. But in my approach, I didn't do it.
Take a look at the figure again, and the quote from Section 5.2.
... steps 5, 6, and 9 use the key schedule, and the ...
The deduction for the next two steps uses the key schedule/key expanding algorithm. In order to fully understand how we can deduce these cells, let's do a full breakdown of the expand_key
function used to generate the round keys (key_matrices
). I will break it down in the context of this challenge only.
Initially, the function creates a matrix of the master key called key_columns
, calculates the iteration size (which equals4
), and initialized a round counter i
. It goes in a while loop and appends key_columns
with new values (word
) until the length reaches 12 (since we are doing it for 2 rounds). The new word
s are taken from the previous one (of index -1
) then modified with some conditions.
If the current length of key_columns
is divisible by 4 (start of the next round), the bytes of word
are first circular-shifted to the left. Then, the bytes are substituted using s_box
, the substitution table for the SubBytes
AES suboperation. The first byte is then XORed with the round constant r_con[i]
, with i
referring to the current round number. i
is then incremented. For the other cases where the current length of key_columns
isn't divisible by 4, word
will directly go to the next step.
There is an else if statement, but this will never get executed in our challenge because our key is only 16 bytes (128-bit).
Then, every byte of the word will be XORed with the equivalent byte from the word of the previous round. For example, if we're in word 2 of round 2, it will get XORed with word 2 of round 1. The word then gets appended to key_columns
. After key_columns
achieved the length 12, it will be returned as a 3-dimensional list, where the outer indices indicates the round, the rest are just 4x4 matrices.
From the breakdown, we can carefully create equations that determines the value of every byte of the round keys:
This will be our "equation template" to deduce the round key values.
Back to the deduction steps 5 and 6. Looking at the figure, the mark 5
is in the cell and the mark 6
is in the cell , which means it's key_matrices[1][3][0]
and key_matrices[1][0][1]
respectively. Using the equation template, we get these:
All the values on the right hand side are known, so we can directly deduce the values from steps 5 and 6.
Step 7 uses Observation 3(3), ...
To understand what Observation 3(3) is, let me quote it from the paper using what I understand and modified using notations:
Consider a series of consecutive round keys and denote and . Then, the round keys can be represented as linear combinations of (the columns of ) and the 32-bit words as shown in the [omitted] table.
As a result, we have the following useful relations between round keys (for i = 0, 1, 2, 3):
C. Bouillaguet, P. Derbez, O. Dunkelman, P. -A. Fouque, N. Keller and V. Rijmen, "Low-Data Complexity Attacks on AES," in IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 7002-7017, Nov. 2012, doi: 10.1109/TIT.2012.2207880
After some Googling and some hints of which paper I should look for, I found this paper written by C. Bouillaguet, et. al. with the title "Low-Data Complexity Attacks on AES" (2012). The paper is publicly available .