Attacking 2-Round AES with 1 Known Plaintext

An implementation of 1 known plaintext 2-round AES attack from a paper by Charles Bouillaguet, et. al. regarding AES attacks.

Last updated 4 months ago

Attacking 2-Round AES with 1 Known Plaintext

An implementation of 1 known plaintext 2-round AES attack from a paper by Charles Bouillaguet, et. al. regarding AES attacks.

First of all, disclaimer! I'm not a cryptography guy and I'm also not a scientific-writing guy. This is also my first cryptography article. If there are mistakes in my terminologies, notations, or methodologies, please let me know.

Introduction

Last week, my faculty had conducted a training session for my faculty's CTF teams, to prepare us for upcoming competitions. For the training, a 3-day jeopardy CTF was conducted. There's one interesting cryptography challenge, authored by prajnapras19, that has zero solves, even until the CTF ended. Let's take a look at the challenge.

Challenge Description

We are given three source files. The first one is called aes.py, a modified version of boppreh's AES implementation found at . Overall, it is still the same, but most functions are unused and deleted. I assume all of you are already pretty familiar with AES' encryption process, so I'm not gonna explain it too much here.

#!/usr/bin/env python3
"""
This is a modified version of boppreh's AES implementation found at https://github.com/boppreh/AES. Follow the original disclaimer
__________________________________
This is an exercise in secure symmetric-key encryption, implemented in pure
Python (no external libraries needed).

Original AES-128 implementation by Bo Zhu (http://about.bozhu.me) at 
https://github.com/bozhu/AES-Python . PKCS#7 padding, CBC mode, PKBDF2, HMAC,
byte array and string support added by me at https://github.com/boppreh/aes. 
Other block modes contributed by @righthandabacus.


Although this is an exercise, the `encrypt` and `decrypt` functions should
provide reasonable security to encrypted messages.
"""


s_box = (
    # omitted for brevity
)

inv_s_box = (
    # omitted for brevity
)


def sub_bytes(s):
    for i in range(4):
        for j in range(4):
            s[i][j] = s_box[s[i][j]]


def inv_sub_bytes(s):
    for i in range(4):
        for j in range(4):
            s[i][j] = inv_s_box[s[i][j]]


def shift_rows(s):
    s[0][1], s[1][1], s[2][1], s[3][1] = s[1][1], s[2][1], s[3][1], s[0][1]
    s[0][2], s[1][2], s[2][2], s[3][2] = s[2][2], s[3][2], s[0][2], s[1][2]
    s[0][3], s[1][3], s[2][3], s[3][3] = s[3][3], s[0][3], s[1][3], s[2][3]


def inv_shift_rows(s):
    s[0][1], s[1][1], s[2][1], s[3][1] = s[3][1], s[0][1], s[1][1], s[2][1]
    s[0][2], s[1][2], s[2][2], s[3][2] = s[2][2], s[3][2], s[0][2], s[1][2]
    s[0][3], s[1][3], s[2][3], s[3][3] = s[1][3], s[2][3], s[3][3], s[0][3]


def add_round_key(s, k):
    for i in range(4):
        for j in range(4):
            s[i][j] ^= k[i][j]


# learned from https://web.archive.org/web/20100626212235/http://cs.ucsb.edu/~koc/cs178/projects/JT/aes.c
def xtime(a): return (((a << 1) ^ 0x1B) & 0xFF) if (a & 0x80) else (a << 1)


def mix_single_column(a):
    # see Sec 4.1.2 in The Design of Rijndael
    t = a[0] ^ a[1] ^ a[2] ^ a[3]
    u = a[0]
    a[0] ^= t ^ xtime(a[0] ^ a[1]) 
    a[1] ^= t ^ xtime(a[1] ^ a[2])
    a[2] ^= t ^ xtime(a[2] ^ a[3])
    a[3] ^= t ^ xtime(a[3] ^ u)


def mix_columns(s):
    for i in range(4):
        mix_single_column(s[i])


def inv_mix_columns(s):
    # see Sec 4.1.3 in The Design of Rijndael
    for i in range(4):
        u = xtime(xtime(s[i][0] ^ s[i][2]))
        v = xtime(xtime(s[i][1] ^ s[i][3]))
        s[i][0] ^= u
        s[i][1] ^= v
        s[i][2] ^= u
        s[i][3] ^= v

    mix_columns(s)


r_con = (
    0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40,
    0x80, 0x1B, 0x36, 0x6C, 0xD8, 0xAB, 0x4D, 0x9A,
    0x2F, 0x5E, 0xBC, 0x63, 0xC6, 0x97, 0x35, 0x6A,
    0xD4, 0xB3, 0x7D, 0xFA, 0xEF, 0xC5, 0x91, 0x39,
)


def bytes2matrix(text):
    """ Converts a 16-byte array into a 4x4 matrix.  """
    return [list(text[i:i+4]) for i in range(0, len(text), 4)]


def matrix2bytes(matrix):
    """ Converts a 4x4 matrix into a 16-byte array.  """
    return bytes(sum(matrix, []))


def xor_bytes(a, b):
    """ Returns a new byte array with the elements xor'ed. """
    return bytes(i ^ j for i, j in zip(a, b))


def inc_bytes(a):
    """ Returns a new byte array with the value increment by 1 """
    out = list(a)
    for i in reversed(range(len(out))):
        if out[i] == 0xFF:
            out[i] = 0
        else:
            out[i] += 1
            break
    return bytes(out)


def pad(plaintext):
    """
    Pads the given plaintext with PKCS#7 padding to a multiple of 16 bytes.
    Note that if the plaintext size is a multiple of 16,
    a whole block will be added.
    """
    padding_len = 16 - (len(plaintext) % 16)
    padding = bytes([padding_len] * padding_len)
    return plaintext + padding


def unpad(plaintext):
    """
    Removes a PKCS#7 padding, returning the unpadded text and ensuring the
    padding was correct.
    """
    padding_len = plaintext[-1]
    assert padding_len > 0
    message, padding = plaintext[:-padding_len], plaintext[-padding_len:]
    assert all(p == padding_len for p in padding)
    return message


def split_blocks(message, block_size=16, require_padding=True):
    assert len(message) % block_size == 0 or not require_padding
    return [message[i:i+16] for i in range(0, len(message), block_size)]


def expand_key(master_key, n_rounds):
    """
    Expands and returns a list of key matrices for the given master_key.
    """
    # Initialize round keys with raw key material.
    key_columns = bytes2matrix(master_key)
    iteration_size = len(master_key) // 4

    i = 1
    while len(key_columns) < (n_rounds + 1) * 4:
        # Copy previous word.
        word = list(key_columns[-1])

        # Perform schedule_core once every "row".
        if len(key_columns) % iteration_size == 0:
            # Circular shift.
            word.append(word.pop(0))
            # Map to S-BOX.
            word = [s_box[b] for b in word]
            # XOR with first byte of R-CON, since the others bytes of R-CON are 0.
            word[0] ^= r_con[i]
            i += 1
        elif len(master_key) == 32 and len(key_columns) % iteration_size == 4:
            # Run word through S-box in the fourth iteration when using a
            # 256-bit key.
            word = [s_box[b] for b in word]

        # XOR with equivalent word from previous iteration.
        word = xor_bytes(word, key_columns[-iteration_size])
        key_columns.append(word)

    # Group key words in 4x4 byte matrices.
    return [key_columns[4*i: 4*(i+1)] for i in range(len(key_columns) // 4)]

The second source file is chall.py. It's the main challenge where the encryption of the flag happens.

from os import urandom
from hashlib import sha512
from Crypto.Util.strxor import strxor

from aes import *

FLAG = open('flag.txt', 'rb').read()
master_key = urandom(16)
flag_key = sha512(master_key).digest()
enc_flag = strxor(FLAG, flag_key[:len(FLAG)])

plaintext = urandom(16)
plain_state = bytes2matrix(plaintext)
key_matrices = expand_key(master_key, 2)

add_round_key(plain_state, key_matrices[0])
sub_bytes(plain_state)
shift_rows(plain_state)
mix_columns(plain_state)
add_round_key(plain_state, key_matrices[1])
sub_bytes(plain_state)
shift_rows(plain_state)
mix_columns(plain_state)
add_round_key(plain_state, key_matrices[2])

ciphertext = matrix2bytes(plain_state)

print(f"Protected flag: {enc_flag.hex()}")
print(f"Plaintext: {plaintext.hex()}")
print(f"Ciphertext: {ciphertext.hex()}")

master_key = [
    0, master_key[1], 0, 0,
    master_key[4], 0, master_key[6], 0,
    0, 0, 0, master_key[11],
    master_key[12], 0, master_key[14], 0,
]

print(f"Hint: {master_key}")
print(f"Hint: {key_matrices[1][2][0]}")
print(f"Hint: {key_matrices[1][3][1]}")

The program generates a random 16-byte value as the master encryption key. The master key is hashed using SHA-512, then the hashed key is used to to XOR the flag. The program generates another random 16-byte value as the plaintext. The plaintext is then converted to a 4x4 matrix, where every element is its bytes. The round keys are then generated from the master key through AES key scheduling algorithm, but only for 2 rounds.

Then, the AES suboperations are done in order to the plaintext (initial AddRoundKey, then SubBytes, ShiftRows, MixColumns, and the next AddRoundKey) for two rounds. The encrypted plaintext from the second round is then extracted and printed out for the player, along with the protected flag and the initial plaintext. Then, the program deletes most of the bytes from the generated master key, only leaving some behind before printing it to the player. Lastly, we're given a hint for some values: the 1st round key of index [2][1] and index [3][1].

(A Little) Story Time

As a non-cryptography guy, it took me an entire 2-3 days to get...nowhere. I tried some approaches:

Carefully and diligently looked through the expand_key algorithm to try and deduce the other missing round keys and the master key. Only managed to recover around 5 bytes.
Blindly brute-forced some key values, then realized that even 0xFF^4 takes forever.
From the blind brute-force, tried reversing the encryption process from the known plaintext and ciphertext. Needs bruteforcing.
Asked ChatGPT, Claude Sonnet, and DeepSeek R1 — none got the approach right.

I tried to ask the problemsetter for hints considering that the challenge still has 0 solves, but they refused. The challenge ended up not getting any solves until the CTF ended. Only then the problemsetter gave a hint: the challenge (and solution) comes from a paper. Even though I know there are cryptography challenges that refers to a paper, I didn't think that a challenge for "training session" will be one of them. Being a non-cryptography player, I also still don't (and maybe will never) have the gut feeling to Google for papers.

Where The Paper Comes In

The paper assesses methodologies that can be used for attacking fewer round AES encryptions. Usually, AES encryptions undergo transformations of 10 rounds for 128-bit keys, 12 rounds for 192-bit keys, and 14 rounds for 256-bit keys. Based on the paper, for a known plaintext-ciphertext pair (or more) retrieved in round 4 or less of AES, there could be methodologies we can use to attack it and retrieve the original keys. It depends on how many rounds the encryption exactly went through and how many plaintext-ciphertext pair we know.

For this challenge, we know that the encryption goes for two rounds, and we only have one pair of plaintext and ciphertext. In the paper, there is a segment written exactly to help us understand how to attack this kind of encryption.

The Attack

According to C. Bouillaguet in "Low-Data Complexity Attacks on AES" (2012), it is possible to attack two-round AES just with a single plaintext-ciphertext pair. The attack is mainly based on some observations mentioned in the paper and on reversing the key scheduling algorithm.

There are two phases of the attack. For the most part, I will refer to this figure to know the attacking steps in order.

The first row of the figure shows the first round encryption process, and the second row shows the second round encryption process. The grey cells are the informations we know. The numbers in the cells are values retrieved from each step — values we should retrieve in order, to recover the master key correctly. Note that cells with the same number doesn't necessarily mean they have the same number, it just means that the value in that cell will be retrieved on the same step number. The black squares are the values we have to "guess" first before going through the steps — and this can be done via bruteforcing. Well--yes, bruteforcing 0xFF^9 would take forever, but that's not the case for this challenge.

Notations

Some notations I use will be different from the papers:

$k_{r,i,j}$ = This will refer to the round key of round $r$ , row $i$ , column $j$ for $r = 0,1,2$ and $0 \leq i, j \leq 3$ , $i,j \in \mathbb{Z}$ .

The matrices in the challenge are transposed from the one in the paper. For example, if I'm referring to key_matrices[1][3][1], I will be referring to $k_{1,1,3}$ of the first round key in zero-indexed matrix notation.

Phase 1

For the phases and steps, I'm gonna write exactly the steps I've gone through. While writing this, I've seen that some optimizations can be done, but for the sake of authenticity, I will not modify my solution and steps.

According to section 5.2 of the paper, we have to guess 9 round key bytes marked in black before going through the steps, where 6 bytes are from the master key and 3 bytes are from the first round key. Translating the figure, the round keys we have to guess are:

k[0][0][1]
k[0][1][0]
k[0][1][2]
k[0][2][3]
k[0][3][0]
k[0][3][2]
k[1][2][0]
k[1][3][1]
k[1][3][3]

Luckily, if you noticed, 8 out of 9 key bytes are already given to us on the hints:

print(f"Hint: {master_key}")
print(f"Hint: {key_matrices[1][2][0]}")
print(f"Hint: {key_matrices[1][3][1]}")

# Output
# Hint: [0, 131, 0, 0, 93, 0, 142, 0, 0, 0, 0, 187, 129, 0, 42, 0]
# Hint: 23
# Hint: 16

Through the key expansion algorithm, we know that the master key is the initial key round, equal to k[0]. We can group the master key hint given as a list of 4 lists each consisting of 4 elements — a 4x4 matrix. From there, along with the key_matrices hint, we get:

k[0][0][1] = master_key[1] = 131
k[0][1][0] = master_key[4] = 93
k[0][1][2] = master_key[6] = 142
k[0][2][3] = master_key[11] = 187
k[0][3][0] = master_key[12] = 129
k[0][3][2] = master_key[14] = 42
k[1][2][0] = 23
k[1][3][1] = 16
k[1][3][3]

This greatly narrows down the bruteforcing we have to do — from 0xFF^9 to just 0xFF, because we just need to guess key_matrices[1][3][3]. Looking at the figure, key_matrices[1][3][3] is used only after the fourth step, which means we can start running through the steps 1 - 4.

Steps 1 - 4 (Spoiler Alert: It's Not Really Necessary)

For step 1, we have the full plaintext and known bytes of the master key to perform the initial AddRoundKey , which is just XOR-ing every byte with the master key's bytes on the same index. Because some master key values are still unknown, we're not able to get the full AddRoundKey results yet — hence the empty squares. Only the values on the cells marked with 1are known.

For step 2, with the AddRoundKey results, we can perform SubBytes. Again, some values are still unknown, so we only perform SubBytes on the known AddRoundKey results — resulting in cells marked 2.

For step 3, we can perform ShiftRows. Visually, on the figure, it looks like row $r$ (0-indexed) is shifted left $r$ time(s). If you remember the reminder above, the matrix in the code is transposed, so visually on the code, every row shifted left is every column shifted up. This results in the cells marked 3.

For step 4, we can perform MixColumns. To perform MixColumns, we can't use partially known columns — only columns with fully known values can be used to MixColumns. Because of this, MixColumnsonly applies to the 3rd column, resulting in cells marked 4.

Now we have new known cells defined for each AES suboperations result in round 1! But...are these steps really necessary? Turns out, they're not. Quoting Section 5.2 of the paper, for the first phase,

Step 7 uses Observation 3(3), step 8 uses Observation 3(2), steps 5, 6, and 9 use the key schedule, and the remaining steps are computed using the AES algorithm.

If you look at the figure again, the round keys we can deduce other than the guessed bytes (black cells) are only cells marked with 5, 6, 7, 8, and 9 — cells we can define from step 5 - 9. Steps 1 to 4 doesn't give us any new information for the round keys' missing values, and steps 5 - 9 doesn't use AES operations for their deductions, so there's actually no need to perform steps 1 - 4.

You can still do it for optimization purposes, because with steps 1 to 4 we can get new known values for each step in the AES encryption suboperations. But in my approach, I didn't do it.

Steps 5 - 6

Take a look at the figure again, and the quote from Section 5.2.

... steps 5, 6, and 9 use the key schedule, and the ...

The deduction for the next two steps uses the key schedule/key expanding algorithm. In order to fully understand how we can deduce these cells, let's do a full breakdown of the expand_key function used to generate the round keys (key_matrices). I will break it down in the context of this challenge only.

def expand_key(master_key, n_rounds):
    """
    Expands and returns a list of key matrices for the given master_key.
    """
    # Initialize round keys with raw key material.
    key_columns = bytes2matrix(master_key)
    iteration_size = len(master_key) // 4

    i = 1
    while len(key_columns) < (n_rounds + 1) * 4:
        # Copy previous word.
        word = list(key_columns[-1])

        # Perform schedule_core once every "row".
        if len(key_columns) % iteration_size == 0:
            # Circular shift.
            word.append(word.pop(0))
            # Map to S-BOX.
            word = [s_box[b] for b in word]
            # XOR with first byte of R-CON, since the others bytes of R-CON are 0.
            word[0] ^= r_con[i]
            i += 1
        elif len(master_key) == 32 and len(key_columns) % iteration_size == 4:
            # Run word through S-box in the fourth iteration when using a
            # 256-bit key.
            word = [s_box[b] for b in word]

        # XOR with equivalent word from previous iteration.
        word = xor_bytes(word, key_columns[-iteration_size])
        key_columns.append(word)

    # Group key words in 4x4 byte matrices.
    return [key_columns[4*i: 4*(i+1)] for i in range(len(key_columns) // 4)]

Initially, the function creates a matrix of the master key called key_columns, calculates the iteration size (which equals4), and initialized a round counter i. It goes in a while loop and appends key_columns with new values (word) until the length reaches 12 (since we are doing it for 2 rounds). The new words are taken from the previous one (of index -1) then modified with some conditions.

If the current length of key_columns is divisible by 4 (start of the next round), the bytes of word are first circular-shifted to the left. Then, the bytes are substituted using s_box, the substitution table for the SubBytes AES suboperation. The first byte is then XORed with the round constant r_con[i] , with i referring to the current round number. iis then incremented. For the other cases where the current length of key_columnsisn't divisible by 4, word will directly go to the next step.

There is an else if statement, but this will never get executed in our challenge because our key is only 16 bytes (128-bit).

Then, every byte of the word will be XORed with the equivalent byte from the word of the previous round. For example, if we're in word 2 of round 2, it will get XORed with word 2 of round 1. The word then gets appended to key_columns. After key_columns achieved the length 12, it will be returned as a 3-dimensional list, where the outer indices indicates the round, the rest are just 4x4 matrices.

From the breakdown, we can carefully create equations that determines the value of every byte of the round keys:

// key_matrices[r][i][j] will be called k[r][i][j] for ease

if i == 0 and j == 0:
    k[r][i][j] = (s_box[k[r-1][3][(j+1)%4]] ^ r_con[r]) ^ k[r-1][i][j]
if i == 0 and j != 0:
    k[r][i][j] = s_box[k[r-1][3][(j+1)%4]] ^ k[r-1][i][j]
else:
    k[r][i][j] = k[r][i-1][j] ^ k[r-1][i][j]

This will be our "equation template" to deduce the round key values.

Back to the deduction steps 5 and 6. Looking at the figure, the mark 5 is in the cell $k_{1,0,3}$ and the mark 6 is in the cell $k_{1,1,0}$ , which means it's key_matrices[1][3][0]and key_matrices[1][0][1] respectively. Using the equation template, we get these:

key_matrices[1][3][0] = key_matrices[1][2][0] ^ key_matrices[0][3][0]
key_matrices[1][0][1] = s_box[key_matrices[0][3][2]] ^ key_matrices[0][0][1]

All the values on the right hand side are known, so we can directly deduce the values from steps 5 and 6.

key_matrices[1][3][0] = 23 ^ 129 = 150
key_matrices[1][0][1] = s_box[key_matrices[0][3][2]] ^ key_matrices[0][0][1] = s_box[42] ^ 131 = 229 ^ 131 = 102

Step 7

Step 7 uses Observation 3(3), ...

To understand what Observation 3(3) is, let me quote it from the paper using what I understand and modified using notations:

Consider a series of consecutive round keys $k_r, k_{r+1}, ...,$ and denote $k_r = (a, b, c, d)$ and $v_r = RotBytes(SubBytes(k_{r,i,3})) \oplus R\_CON[r+1]$ . Then, the round keys $k_{r+1}, k_{r+2}, ...$ can be represented as linear combinations of $(a, b, c, d)$ (the columns of $k_{r}$ ) and the 32-bit words $v_r,v_{r+1}, ...,$ as shown in the [omitted] table.
As a result, we have the following useful relations between round keys (for i = 0, 1, 2, 3):

The rest of the article is Work In Progress. I accidentally clicked merge changes before it even finished 🤦

References

C. Bouillaguet, P. Derbez, O. Dunkelman, P. -A. Fouque, N. Keller and V. Rijmen, "Low-Data Complexity Attacks on AES," in IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 7002-7017, Nov. 2012, doi: 10.1109/TIT.2012.2207880

Last updated 4 months ago

Introduction

Challenge Description

#!/usr/bin/env python3
"""
This is a modified version of boppreh's AES implementation found at https://github.com/boppreh/AES. Follow the original disclaimer
__________________________________
This is an exercise in secure symmetric-key encryption, implemented in pure
Python (no external libraries needed).

Original AES-128 implementation by Bo Zhu (http://about.bozhu.me) at 
https://github.com/bozhu/AES-Python . PKCS#7 padding, CBC mode, PKBDF2, HMAC,
byte array and string support added by me at https://github.com/boppreh/aes. 
Other block modes contributed by @righthandabacus.


Although this is an exercise, the `encrypt` and `decrypt` functions should
provide reasonable security to encrypted messages.
"""


s_box = (
    # omitted for brevity
)

inv_s_box = (
    # omitted for brevity
)


def sub_bytes(s):
    for i in range(4):
        for j in range(4):
            s[i][j] = s_box[s[i][j]]


def inv_sub_bytes(s):
    for i in range(4):
        for j in range(4):
            s[i][j] = inv_s_box[s[i][j]]


def shift_rows(s):
    s[0][1], s[1][1], s[2][1], s[3][1] = s[1][1], s[2][1], s[3][1], s[0][1]
    s[0][2], s[1][2], s[2][2], s[3][2] = s[2][2], s[3][2], s[0][2], s[1][2]
    s[0][3], s[1][3], s[2][3], s[3][3] = s[3][3], s[0][3], s[1][3], s[2][3]


def inv_shift_rows(s):
    s[0][1], s[1][1], s[2][1], s[3][1] = s[3][1], s[0][1], s[1][1], s[2][1]
    s[0][2], s[1][2], s[2][2], s[3][2] = s[2][2], s[3][2], s[0][2], s[1][2]
    s[0][3], s[1][3], s[2][3], s[3][3] = s[1][3], s[2][3], s[3][3], s[0][3]


def add_round_key(s, k):
    for i in range(4):
        for j in range(4):
            s[i][j] ^= k[i][j]


# learned from https://web.archive.org/web/20100626212235/http://cs.ucsb.edu/~koc/cs178/projects/JT/aes.c
def xtime(a): return (((a << 1) ^ 0x1B) & 0xFF) if (a & 0x80) else (a << 1)


def mix_single_column(a):
    # see Sec 4.1.2 in The Design of Rijndael
    t = a[0] ^ a[1] ^ a[2] ^ a[3]
    u = a[0]
    a[0] ^= t ^ xtime(a[0] ^ a[1]) 
    a[1] ^= t ^ xtime(a[1] ^ a[2])
    a[2] ^= t ^ xtime(a[2] ^ a[3])
    a[3] ^= t ^ xtime(a[3] ^ u)


def mix_columns(s):
    for i in range(4):
        mix_single_column(s[i])


def inv_mix_columns(s):
    # see Sec 4.1.3 in The Design of Rijndael
    for i in range(4):
        u = xtime(xtime(s[i][0] ^ s[i][2]))
        v = xtime(xtime(s[i][1] ^ s[i][3]))
        s[i][0] ^= u
        s[i][1] ^= v
        s[i][2] ^= u
        s[i][3] ^= v

    mix_columns(s)


r_con = (
    0x00, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40,
    0x80, 0x1B, 0x36, 0x6C, 0xD8, 0xAB, 0x4D, 0x9A,
    0x2F, 0x5E, 0xBC, 0x63, 0xC6, 0x97, 0x35, 0x6A,
    0xD4, 0xB3, 0x7D, 0xFA, 0xEF, 0xC5, 0x91, 0x39,
)


def bytes2matrix(text):
    """ Converts a 16-byte array into a 4x4 matrix.  """
    return [list(text[i:i+4]) for i in range(0, len(text), 4)]


def matrix2bytes(matrix):
    """ Converts a 4x4 matrix into a 16-byte array.  """
    return bytes(sum(matrix, []))


def xor_bytes(a, b):
    """ Returns a new byte array with the elements xor'ed. """
    return bytes(i ^ j for i, j in zip(a, b))


def inc_bytes(a):
    """ Returns a new byte array with the value increment by 1 """
    out = list(a)
    for i in reversed(range(len(out))):
        if out[i] == 0xFF:
            out[i] = 0
        else:
            out[i] += 1
            break
    return bytes(out)


def pad(plaintext):
    """
    Pads the given plaintext with PKCS#7 padding to a multiple of 16 bytes.
    Note that if the plaintext size is a multiple of 16,
    a whole block will be added.
    """
    padding_len = 16 - (len(plaintext) % 16)
    padding = bytes([padding_len] * padding_len)
    return plaintext + padding


def unpad(plaintext):
    """
    Removes a PKCS#7 padding, returning the unpadded text and ensuring the
    padding was correct.
    """
    padding_len = plaintext[-1]
    assert padding_len > 0
    message, padding = plaintext[:-padding_len], plaintext[-padding_len:]
    assert all(p == padding_len for p in padding)
    return message


def split_blocks(message, block_size=16, require_padding=True):
    assert len(message) % block_size == 0 or not require_padding
    return [message[i:i+16] for i in range(0, len(message), block_size)]


def expand_key(master_key, n_rounds):
    """
    Expands and returns a list of key matrices for the given master_key.
    """
    # Initialize round keys with raw key material.
    key_columns = bytes2matrix(master_key)
    iteration_size = len(master_key) // 4

    i = 1
    while len(key_columns) < (n_rounds + 1) * 4:
        # Copy previous word.
        word = list(key_columns[-1])

        # Perform schedule_core once every "row".
        if len(key_columns) % iteration_size == 0:
            # Circular shift.
            word.append(word.pop(0))
            # Map to S-BOX.
            word = [s_box[b] for b in word]
            # XOR with first byte of R-CON, since the others bytes of R-CON are 0.
            word[0] ^= r_con[i]
            i += 1
        elif len(master_key) == 32 and len(key_columns) % iteration_size == 4:
            # Run word through S-box in the fourth iteration when using a
            # 256-bit key.
            word = [s_box[b] for b in word]

        # XOR with equivalent word from previous iteration.
        word = xor_bytes(word, key_columns[-iteration_size])
        key_columns.append(word)

    # Group key words in 4x4 byte matrices.
    return [key_columns[4*i: 4*(i+1)] for i in range(len(key_columns) // 4)]

The second source file is chall.py. It's the main challenge where the encryption of the flag happens.

from os import urandom
from hashlib import sha512
from Crypto.Util.strxor import strxor

from aes import *

FLAG = open('flag.txt', 'rb').read()
master_key = urandom(16)
flag_key = sha512(master_key).digest()
enc_flag = strxor(FLAG, flag_key[:len(FLAG)])

plaintext = urandom(16)
plain_state = bytes2matrix(plaintext)
key_matrices = expand_key(master_key, 2)

add_round_key(plain_state, key_matrices[0])
sub_bytes(plain_state)
shift_rows(plain_state)
mix_columns(plain_state)
add_round_key(plain_state, key_matrices[1])
sub_bytes(plain_state)
shift_rows(plain_state)
mix_columns(plain_state)
add_round_key(plain_state, key_matrices[2])

ciphertext = matrix2bytes(plain_state)

print(f"Protected flag: {enc_flag.hex()}")
print(f"Plaintext: {plaintext.hex()}")
print(f"Ciphertext: {ciphertext.hex()}")

master_key = [
    0, master_key[1], 0, 0,
    master_key[4], 0, master_key[6], 0,
    0, 0, 0, master_key[11],
    master_key[12], 0, master_key[14], 0,
]

print(f"Hint: {master_key}")
print(f"Hint: {key_matrices[1][2][0]}")
print(f"Hint: {key_matrices[1][3][1]}")

(A Little) Story Time

As a non-cryptography guy, it took me an entire 2-3 days to get...nowhere. I tried some approaches:

Carefully and diligently looked through the expand_key algorithm to try and deduce the other missing round keys and the master key. Only managed to recover around 5 bytes.
Blindly brute-forced some key values, then realized that even 0xFF^4 takes forever.
From the blind brute-force, tried reversing the encryption process from the known plaintext and ciphertext. Needs bruteforcing.
Asked ChatGPT, Claude Sonnet, and DeepSeek R1 — none got the approach right.

Where The Paper Comes In

After some Googling and some hints of which paper I should look for, I found this paper written by C. Bouillaguet, et. al. with the title "Low-Data Complexity Attacks on AES" (2012). The paper is publicly available .

The Attack

There are two phases of the attack. For the most part, I will refer to this figure to know the attacking steps in order.

Notations

Some notations I use will be different from the papers:

$k_{r,i,j}$ = This will refer to the round key of round $r$ , row $i$ , column $j$ for $r = 0,1,2$ and $0 \leq i, j \leq 3$ , $i,j \in \mathbb{Z}$ .

Phase 1

k[0][0][1]
k[0][1][0]
k[0][1][2]
k[0][2][3]
k[0][3][0]
k[0][3][2]
k[1][2][0]
k[1][3][1]
k[1][3][3]

Luckily, if you noticed, 8 out of 9 key bytes are already given to us on the hints:

print(f"Hint: {master_key}")
print(f"Hint: {key_matrices[1][2][0]}")
print(f"Hint: {key_matrices[1][3][1]}")

# Output
# Hint: [0, 131, 0, 0, 93, 0, 142, 0, 0, 0, 0, 187, 129, 0, 42, 0]
# Hint: 23
# Hint: 16

k[0][0][1] = master_key[1] = 131
k[0][1][0] = master_key[4] = 93
k[0][1][2] = master_key[6] = 142
k[0][2][3] = master_key[11] = 187
k[0][3][0] = master_key[12] = 129
k[0][3][2] = master_key[14] = 42
k[1][2][0] = 23
k[1][3][1] = 16
k[1][3][3]

Steps 1 - 4 (Spoiler Alert: It's Not Really Necessary)

Step 7 uses Observation 3(3), step 8 uses Observation 3(2), steps 5, 6, and 9 use the key schedule, and the remaining steps are computed using the AES algorithm.

You can still do it for optimization purposes, because with steps 1 to 4 we can get new known values for each step in the AES encryption suboperations. But in my approach, I didn't do it.

Steps 5 - 6

Take a look at the figure again, and the quote from Section 5.2.

... steps 5, 6, and 9 use the key schedule, and the ...

def expand_key(master_key, n_rounds):
    """
    Expands and returns a list of key matrices for the given master_key.
    """
    # Initialize round keys with raw key material.
    key_columns = bytes2matrix(master_key)
    iteration_size = len(master_key) // 4

    i = 1
    while len(key_columns) < (n_rounds + 1) * 4:
        # Copy previous word.
        word = list(key_columns[-1])

        # Perform schedule_core once every "row".
        if len(key_columns) % iteration_size == 0:
            # Circular shift.
            word.append(word.pop(0))
            # Map to S-BOX.
            word = [s_box[b] for b in word]
            # XOR with first byte of R-CON, since the others bytes of R-CON are 0.
            word[0] ^= r_con[i]
            i += 1
        elif len(master_key) == 32 and len(key_columns) % iteration_size == 4:
            # Run word through S-box in the fourth iteration when using a
            # 256-bit key.
            word = [s_box[b] for b in word]

        # XOR with equivalent word from previous iteration.
        word = xor_bytes(word, key_columns[-iteration_size])
        key_columns.append(word)

    # Group key words in 4x4 byte matrices.
    return [key_columns[4*i: 4*(i+1)] for i in range(len(key_columns) // 4)]

There is an else if statement, but this will never get executed in our challenge because our key is only 16 bytes (128-bit).

From the breakdown, we can carefully create equations that determines the value of every byte of the round keys:

// key_matrices[r][i][j] will be called k[r][i][j] for ease

if i == 0 and j == 0:
    k[r][i][j] = (s_box[k[r-1][3][(j+1)%4]] ^ r_con[r]) ^ k[r-1][i][j]
if i == 0 and j != 0:
    k[r][i][j] = s_box[k[r-1][3][(j+1)%4]] ^ k[r-1][i][j]
else:
    k[r][i][j] = k[r][i-1][j] ^ k[r-1][i][j]

This will be our "equation template" to deduce the round key values.

key_matrices[1][3][0] = key_matrices[1][2][0] ^ key_matrices[0][3][0]
key_matrices[1][0][1] = s_box[key_matrices[0][3][2]] ^ key_matrices[0][0][1]

All the values on the right hand side are known, so we can directly deduce the values from steps 5 and 6.

key_matrices[1][3][0] = 23 ^ 129 = 150
key_matrices[1][0][1] = s_box[key_matrices[0][3][2]] ^ key_matrices[0][0][1] = s_box[42] ^ 131 = 229 ^ 131 = 102

Step 7

Step 7 uses Observation 3(3), ...

To understand what Observation 3(3) is, let me quote it from the paper using what I understand and modified using notations:

Consider a series of consecutive round keys $k_r, k_{r+1}, ...,$ and denote $k_r = (a, b, c, d)$ and $v_r = RotBytes(SubBytes(k_{r,i,3})) \oplus R\_CON[r+1]$ . Then, the round keys $k_{r+1}, k_{r+2}, ...$ can be represented as linear combinations of $(a, b, c, d)$ (the columns of $k_{r}$ ) and the 32-bit words $v_r,v_{r+1}, ...,$ as shown in the [omitted] table.
As a result, we have the following useful relations between round keys (for i = 0, 1, 2, 3):

The rest of the article is Work In Progress. I accidentally clicked merge changes before it even finished 🤦

References

C. Bouillaguet, P. Derbez, O. Dunkelman, P. -A. Fouque, N. Keller and V. Rijmen, "Low-Data Complexity Attacks on AES," in IEEE Transactions on Information Theory, vol. 58, no. 11, pp. 7002-7017, Nov. 2012, doi: 10.1109/TIT.2012.2207880