Protostar Heap 3 Walkthrough

Having previously tackled the last stack challenge in Protostar from Exploit Exercises (https://medium.com/@airman604/protostar-stack7-walkthrough-2aa2428be3e0), we’re now switching to the last heap challenge (https://exploit-exercises.com/protostar/heap3/).

Heap 3 Challenge

Here’s the source code for the Protostar Heap 3 challenge:

The goal is to call the function. As is used, it is possible to overflow all three of the dynamically allocated buffers, so let’s see what we can do with that.

Heap

Heap is used in C programs for dynamic memory management. provides an easy to use interface (/ ) to allocate and de-allocate memory regions. The version of in Protostar uses an implementation of malloc called dlmalloc, after the name of it’s original creator, Doug Lea. The control structures used by malloc are stored in-band with the data, which allows manipulation of the control structures when heap buffer overflow is possible, as above.

allocates memory in chunks that have the following structure:

stores the size of the previous chunk, stores the size of the current chunk. Chunk size will always be a multiple of 8 bytes for alignment, which means that the 3 lowest bits of the size will always be 0. uses these three bits, most notably the least significant bit will indicate whether the previous chunk is in use or free. So, if we see 0x29 in the size field, the size of the current chunk is 40 (hex 0x28), and the previous chunk is in use. When is called, it initializes and and returns the address of the memory right after ( in the picture above). Fields depicted as and are ignored for used chunks and the memory is used for the program data.

When a chunk is free (i.e. after was called to de-allocate the chunk) it is stored in a double-linked list structure, and the field contains the address of the next free chunk (forward pointer), and field contains the address of the previous free chunk (backward pointer). Thus these pointers overwrite the beginning of the data in an unused chunk.

This is how is defined in (simplified):

Now let’s run the program in debugger and see how it all works:

We load the program into GDB and set breakpoints after all of the calls, after all of the calls, and after all of the calls. Let’s run it:

After malloc we inspect program’s memory layout to find the location of the heap () and configure GDB to print current CPU instruction and 40 32-bit values from the top of the heap every time a breakpoint is reached.

The highlighted portion of the heap is the first allocated chunk. We can see that is 0, is 0x29 (40 bytes, and the least significant bit is set to 1 to indicate that the previous chunk is in use), and then 32 bytes of the allocated memory.

Let’s continue:

We see that copied the data into the allocated memory (‘A’ is ASCII 0x41, ‘B’ is 0x42, ‘C’ is 0x43).

And now we see something unexpected. Firstly, is still 0 in all the chunks, but it should contain the size of the previous block. Secondly, while the correctly points to the next free block ( for the first chunk, which is the address of the second chunk), doesn’t seem to be set. Also, the least significant bit of the field have not been set to indicate the previous chunk is free. What is going on?

Enter Fastbins

The reason this is not working the way we expected is due to the fact that the allocated buffers are small. When a chunk is smaller than 64 bytes (by default), will use a simplified data structure (fastbin) and will ignore , , and the “previous chunk in use” bit.

So why did we talk about all these fields if all the chunks are small? For our exploit code we will need the chunks to be treated as regular chunks by , not as fastbin chunks.

Free

When is called on a chunk, if there are free chunks adjacent to the chunk being freed (i.e. right before or right after), will consolidate them into a larger free chunk. Free chunks are stored in a double-linked list (ignoring fastbin chunks for now), and when doing the consolidation will remove the adjacent free chunk that is being consolidated from the list as it will become a part of a new, larger, chunk. Here’s what happens conceptually (there’s much more code and edge cases, like checking if the current chunk is the last one in the heap):

The unlink part is done through macro, here’s a simplified version:

This is called with the chunk to be unlinked as the first argument , and temporary variables to store pointers to previous and next free chunks as arguments and . When a chunk is unlinked it makes the next free chunk and the previous free chunk point at each other. Here’s a visual:

So basically writes the value of to the memory at address and the value of to the memory at address . The changed memory is highlighted in blue. If we can control the values of and we can overwrite arbitrary memory, with the restriction being that both and have to be writeable.

Global Offset Table

Have a look at the disassembly of again. The source code calls at the end of , but as part of optimizations compiler determines that the string is constant and replaces it with a call to at address . Let’s see what it does:

is not called directly, but rather through PLT (Procedure Linkage Table) that jumps to the address contained in GOT (Global Offset Table). This is part of dynamic library linking, here’s a video explanation from LiveOverflow: https://www.youtube.com/watch?v=kUk5pw4w0h4

We can use the Global Offset Table to redirect the execution flow if we overwrite the address at .

Our plan is clear now. We’ll store shellcode that will call somewhere on the heap, we will then force the chunk consolidation and the call to on a specially crafted chunk. The chunk will contain in field and the address of the shellcode in the field. We cannot write the address of to the as that part of memory is not writeable and will also be updated as part of .

Negative Chunk Size

We need one more little trick explained in the Phrack article linked below — what if we have -4 () as the chunk size?

  • When determining whether to use fastbin, malloc is casting the chunk size to unsigned int, so -4 is bigger than 64.
  • The least significant bit of is not set, which indicates the previous adjacent chunk is free and will be called for it!
  • The address of the previous adjacent chunk will be calculated by subtracting -4 (i.e. adding 4) from the current chunk’s beginning.
  • The address of the next adjacent chunk will be calculated by adding -4 (i.e. subtracting 4) from the current chunk’s beginning. Its will also be -4.
  • The value right before the start of the current chunk will be used to determine whether the next adjacent chunk is free. This should be set to an odd number to avoid memory corruption (otherwise will be called for the next adjacent chunk also as part of free chunk consolidation).

If we can get called on this specially crafted chunk, it will cause the value of to be replaced with the address of our shellcode. Note, the shellcode should either be super short (8 bytes or less), or jump 12 bytes ahead as the memory at “addr of shellcode”+8 will be overwritten by .

Using negative size also takes care of the NULL/0x00 bytes, which we cannot put arbitrarily into our buffers as they will be treated as string terminators.

Shellcode

We just need to call :

This assembler snippet should do it:

I used http://shell-storm.org/online/Online-Assembler-and-Disassembler/ to turn these instructions into , which fits into the 8 bytes we have available.

The Exploit

We’ll use the third buffer for our crafted chunk. We’ll store the shellcode in the middle of the second buffer, and also will use it to overwrite and of the last block with .

The exploit saves these buffers in files as , so heap3 should be called as:

It works! Let’s check it in GDB. We’ll set breakpoints at the first call to and at the call to :

I’ve highlighted the shellcode at address and our fake chunk at address . Let’s continue:

We check that we have correctly updated the address in GOT to point to the shellcode. I’ve also highlighted the address after the shellcode that was overwritten by .

Final Thought

It took me a while to figure out how heap exploits work, it seemed like dark magic in the beginning. Also, fastbin stuff was very confusing. I hope this walkthrough helped.

References

Phrack issue #57, Once upon a free() by anonymous — http://phrack.org/issues/57/9.html

LiveOverflow Binary Hacking course — https://www.youtube.com/watch?v=iyAyN3GFM7A&list=PLhixgUqwRTjxglIswKp9mpkfPNfHkzyeN

Protostar Heap 3 Walkthrough from Joshua Wang: https://conceptofproof.wordpress.com/2013/11/19/protostar-heap3-walkthrough/

Random rumblings about #InfoSec. The opinions expressed here are my own and not necessarily those of my employer.