6.858 2015 Lecture 2: Fixing Buffer Overflows Topic: defenses against buffer overflows. Overflows are a popular attack path, worth understanding. Example of evolution of defense/attack over time. It has been very worthwhile to raise the bar, even though defenses are still not perfect. Summary of the situation: * Basic problem: buggy C code that writes beyond end of buffer/array. * The ideal solution is to use a language that enforces bounds, e.g. Python or Java. It's a huge effort to re-train programmers and re-write s/w, but not impossible (e.g. Microsoft and C#). * But C is used for many valuable applications and libraries, so we cannot abandon it, and often can't avoid writing new C code. The C definition makes it hard to impossible to precisely check bounds automatically. * The perfect programmer would check bounds 100% of the time, but it turns out no programmers are perfect. * So we need defenses that make make buffer overflows harder to exploit, for big buggy C programs that we don't understand! Let's start with "classic" stack buffer overflows. Attacks have two parts: 1) write some instructions into buffer on stack. 2) overwrite return PC to point to attacker's instructions. Attacker's instructions can do anything the app can do. Thus it's a powerful attack. Defense idea: O/S tells the hardware not to execute on the stack. E.g. O/S sets Intel's NX (No eXecute) bit on each stack page. This prevents execution of injected instructions on the stack. Q: does NX mean we don't have to worry about buffer overflows? Q: is NX a waste of time if it isn't perfect? Defense idea: stack canaries (e.g., StackGuard, gcc's Stack Smashing Protector) Detects modification of return PC on stack *before* it is used by RET. Compiler generates code that pushes a "canary" value on stack at function entry, pops and checks value before return. Canary sits between variables and return address, e.g.: | | +------------------+ entry %esp ----> | return address | ^ +------------------+ | new %ebp ------> | saved %ebp | | +------------------+ | | CANARY | | Overflow goes +------------------+ | this way. | buf[127] | | | ... | | | buf[0] | | +------------------+ | | Q: what value should we use for the canary? A: perhaps a random number, chosen at program start, stored somewhere. Are we done yet? What kinds of attacks might work despite stack canaries? * maybe attacker can write or read the secret random canary value somehow. * overwrite function pointer on stack before canary. * overwrite some other crucial variable on the stack, e.g. bool ok = ...; char buf[128]; gets(buf); if(ok){ ... } * overflow of one global variable into the next (much like on stack). * overflow of heap-allocated buffer. Are overflows of heap-allocated buffers exploitable? Important b/c modern code tends to use heap a lot. foo(){ char *p = malloc(16); gets(p); } Can attacker predict what is after p in memory? [diagram of free/used malloc blocks] It turns out there are some pretty powerful heap attacks! Here's a simplified version of a real attack. Some malloc()s lay out blocks of free/used memory like this: data next prev ---- data next prev ---- data next prev Malloc keeps free blocks on a doubly-linked list. In address order, so it can merge small adjacent blocks into large one. When a free block is allocated, here's part of what malloc() does: b = choose a free block b->next->prev = b->prev; b->prev->next = b->next; If the attacker overflows a malloc()ed block, the attacker can modify the next and prev pointers in the next block. So: suppose attacker writes x y to start of next block. Call next block b, then b->prev = x, b->next = y. Suppose b free and happens to be chosen by next malloc(): malloc() will effectively execute *y = x Thus writing an attacker-chosen value to any memory location! If attacker can guess address of saved return PC, and can guess address of the buffer being overflowed, can load instructions into the buffer and cause PC to point to injected instructions. Similarly for *any* function pointer with predictable address. Q: how could the attacker predict such addresses? Real attacks have to be more complex; search the web for Smashing the Heap for Fun and Profit, or Vudo Malloc Tricks. Note the pieces the attacker must assemble: * Find a buffer overflow bug in the application or library. * Find a way to get the program to execute the buggy code in a way that causes attacker's bytes to overflow the buffer. * Understand malloc() implementation. * Find a code pointer and guess its address. * Guess the address of the buffer, i.e. attacker's injected instructions. These attacks require attacker effort and skill. Attacker must understand corner cases in app logic, malloc(), compiler output. Bad news for defender: few application programmers think about corner cases. Good news for defender: multiple fragile pieces in attacker's puzzle. A high-level point: if there's a buffer overflow bug, a clever enough attacker can probably exploit it. More generally, many bugs that look harmless can be turned to an attacker's advantage, perhaps in combination with other flaws. Luckily, as the paper shows, one can retrofit automatic array bounds checking to existing C programs! If we're willing to modify the compiler, recompile applications, and perhaps modify the applications. Bounds checking approach #1: Fat pointers Straightforward though not very practical. Each pointer contains start and end of original memory object, as well as current pointer value: +--------------+------------+-------------+ | 32-bit start | 32-bit end | 32-bit curr | +--------------+------------+-------------+ Modify compiler to generate code that: * Sets p.start and p.end in malloc() and p = &x. * During dereference, check start <= curr < end, panic if outside. * During pointer assignment, copy entire fat pointer. Thus: p = malloc(4); // malloc fills in start and end. *p = 1; // start <= curr < end, so check succeeds. q = p; // copy fat ptr to q. q = q + 8; // add 8 to q.curr. *q = 1; // check fails, since curr >= end. But: q = q - 6; *q = 1; // check succeeds. Start/end arrangement remembers what the original object was. Lots of C code requires this, e.g. pointer to one past end. Problem #1: Checks may be expensive. Problem #2: Not very compatible. You can't pass a fat pointer to an unmodified library. Changes size of data structures. Updates to fat pointers are not atomic, because they span multiple words. Applications with threads and async interrupts/signals may fail due to these non-atomic updates. Bounds checking approach #2: Keep bounds information in a separate table. So that we don't have to change pointer representation. Baggy Bounds (and others) do this. -Basic idea: For each allocated object, store start and size of object. malloc() or compiler-generated code maintains the table. e.g. table[p] = ??? -Set an OOB (Out Of Bounds) bit in the pointer if it leaves its original object. -For each pointer, compiler generates special code for two kinds of operations (compiler "interposes" on them): 1. pointer arithmetic: char *q = p + 256; must detect if pointer no longer points into original object, and set OOB bit. compare q to table[p]. 2. pointer dereferencing: char ch = *q; must panic if q's OOB bit is set. Challenge: need to index table[] with both p and e.g. p + 10, i.e. if p points to middle of object rather than start. Challenge: if arithmetic modifies p to point to a different object, we have to know to NOT look at table[p] on future uses of p. can use OOB bit in pointer for this. Challenge: if arithmetic modifies p to return back to its original object, we must detect that and clear OOB, since pointer is now valid again. in particular, how to know what original object was? Challenge: prevent table from being huge, or expensive to access. Baggy Bounds uses five tricks: 1) Each table entry covers slot_size=16 bytes, so table isn't huge. Thus a 64-byte object at address x uses 4 table entries starting at table[x/16]. 2) table entry holds just size, as log base 2, to fit in a byte, so table isn't huge. 2^table[i] yields size, so e.g. 20 means 1 megabyte. 3) Allocate only power-of-two sizes on power-of-two boundaries. Then compiled code can compute start of an object from pointer and table[] entry: just clear low log2(size) bits. 4) Use virtual memory system to prevent out-of-bound derefs: set most significant bit in an OOB pointer, and then mark pages in the upper half of the address space as inaccessible. So, compiler doesn't have to insert checking instructions for dereferences! 5) OOB mark means "ptr is within slot_size/2 of original object". Thus arithmetic code can reconstruct what the original object was, and clear OOB if ptr moves back to original object. Panic if arithmetic tries to move OOB more than slot_size/2 away. Given object p, here's how to find size and start of the object p points into, for slot_size=16: size = 1 << table[p >> 4]; // 4 is the number of bits in 16. start = p & ~(size - 1); // clears the low log2(size) bits. Example: p = malloc(25); // rounds up to 32, sets two table[] entries. q = p + 10; // arithmetic; reads table[], sees OK. r = p + 35; // arithmetic; not OK; sets OOB. *r = 1; // VM system forces crash. s = p + 35; // sets OOB. s = s - 20; // arithmetic clears OOB. *s = 1; // OK t = p + 100; // arithmetic forces crash: not withing slot_size/2. What's the answer to the homework problem? char *p = malloc(256); char *q = p + 256; char ch = *q; //Does this raise an exception? So: Baggy Bounds can be applied to big existing buggy C programs, and automatically stops the program if an attacker tries to exploit a buffer overflow. This is a big win! Can Baggy Bounds ever panic when the code is actually legal? C code that casts pointers to integers, and then compares them, may break due to the OOB bit. Many working C programs store information in the high bits of pointers, which will conflict with the OOB bit. What overflow bugs might Baggy Bounds not catch (Section 7)? Overflow of array inside a larger malloc()'d struct. Cast pointer to int, modify, cast back to pointer. Application might implement its own allocator. Dangling pointers to re-allocated memory. Baggy Bounds, and overflows in general, illustrate a general pattern: Defenders habitually make a specific kind of error. Attackers find a way to exploit that error. Defenders build a tool to spot/fix the error. =============================== Some bugs in the baggy bounds paper: -Figure 3, explicit bounds check should generate the size like this: size = 1 << table[p >> log_of_slot_size] -Figure 3, optimized bounds check should be (p^p') >> table[p >> log_of_slot_size] == 0 -Figures 5 and 18, pointer arithmetic code should be char *p = &buf[i]; or char *p = buf + i;