Buffer Overflows
================

lab1 is out, first part due this friday, other parts due next friday
sent email to class mailing list about lab1, sign up if you didn't get it

today's lecture: control hijacking attacks (lab1)
paper: buffer overflows

quick overview of what's going on
  short code snippet with a buffer overflow: web server
    read_request() {
      char buf[256];
      int i = 0;
      for (;;) {
        buf[i] = read_input();
        if (buf[i] == '\n') break;
        i++;
      }
    }
  what the compiler generates
    stack diagram, %ebp, %esp [points to last thing on the stack]
    stack grows down

                         +----------------+
        entry ebp ---->  | ..prev frame.. |
                         |      ....      |
                         +----------------+
        entry esp ---->  | return address |
                         +----------------+
        new ebp ------>  | saved ebp      |
                         +----------------+
                         | buf[255]       |
                         | ...            |
                         | buf[0]         |
                         +----------------+
        new esp ------>  | i              |
                         +----------------+

    push %ebp
    mov %esp -> %ebp
    sub 296, %esp     # size of stack vars + some other junk
    ...
    mov %ebp -> %esp
    pop %ebp
    ret

what's our threat model here?  what are we worried about?
  assumption: attacker controls input
  policy: unknown, but if this program is privileged and enforcing some
          policy, attacker may be able to subvert it
  attackers send spam, steal data, attack other machines

how does the attacker take advantage of this?
  supply long input, overwrite data on stack past buffer, change ret addr
  set return address to be &buf[0]
  how do we guess the address of where the code is?  or where our buffer is?
    what happens when the same application is run on different machines?
    how much do you have to know about the machine you're attacking?
    one machine might have twice as much memory as another
    does this change memory addresses?  no, virtual memory helps us
    addresses depend largely on software versions

once the attacker is running injected code, what can they do?
  can use all privileges of the process
  OS protection will not help if server is running as root (often)
  even if not root, can access the server's interesting files
  can attack other machines behind a firewall (if there was one)

why would you write such bad code?
  well, even if you don't, libc has plenty of unsafe functions
  strcpy, gets, scanf
  even the safe versions aren't always safe
    strncpy leaves the buffer without null-termination

two things going on with buffer overflows:
  1. gaining control over execution (program counter)
  2. injecting code into the process

what are the difficulties to doing each of these?
  1. requires overwriting some code pointer
     return address is common (on stack), others possible too
     normally shouldn't happen!
  2. often easier: process already has lots of code inside of it
     process accepts inputs that attacker can supply
     main challenges:
       finding a predictable address of this code
       if injecting, ensuring code has no nulls/newlines/etc

protection mechanisms that the paper talks about?
  avoid bugs, auditing
    ensuring the lack of bugs is hard
    finding these bugs can be easy:
      supply large inputs
      watch for a program to crash
      look at what the large input corrupted, see if you can exploit it
    simple approach finds simple bugs, but not tricky corner cases
    tools called "fuzzers" do this mechanically
    we'll look at one of these systems in a later lecture

  non-executable buffers/stack
    [ paper doesn't mention, but can make all writable memory non-exec ]
    works for many programs (i.e. non-executable heap, statics, etc)
    doesn't work for specialized apps: JITs/runtimes
    "arc injection" or "return-to-libc" attacks
      (not necessarily return and not necessarily libc..)
      gaining control is often enough because there's lots of code already
      in particular, standard functions you might want to run are there
      system(), execl(), unlink(), ..
      mention "return-oriented programming"
        first call strcpy with the right args, then call system, ..
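
aside on the earlier strncpy remark: a minimal sketch, assuming a standard hosted C library. the helper names (is_terminated, copy_strncpy, copy_terminated) are made up for illustration and do not come from the paper or the lab. it shows that strncpy writes no NUL byte when the source holds at least n bytes, plus one common workaround.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if dst[0..n-1] contains a NUL byte, 0 otherwise. */
int is_terminated(const char *dst, size_t n) {
    return memchr(dst, '\0', n) != NULL;
}

/* Plain strncpy: copies at most n bytes and adds no NUL
 * when strlen(src) >= n. */
void copy_strncpy(char *dst, const char *src, size_t n) {
    strncpy(dst, src, n);
}

/* Hypothetical helper: truncating copy that always terminates.
 * Assumes n > 0. */
void copy_terminated(char *dst, const char *src, size_t n) {
    strncpy(dst, src, n - 1);
    dst[n - 1] = '\0';
}
```

a later strlen() or printf("%s") on the buffer filled by copy_strncpy would read past its end; the buffer filled by copy_terminated is safe to treat as a string, at the cost of silent truncation.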

  bounds checking
    type-safe languages
      doesn't solve problem for legacy code, or runtime implementations
      why doesn't C have bounds checking?
        performance
        convenience
        few people worried about attackers when the language was designed
    Compaq C compiler
      helpful but not enough
      cannot do bounds-checking across function calls
    modify pointer representation
      prevents buffer overflows but incompatible with lots of code
    keep shadow data structures (Jones&Kelly)
      keeps track of allocated objects in memory
      for each pointer expression (in code compiled with their compiler):
        compute the base address, according to some rules
        compute the pointer expression value
        check that the pointer falls inside the object for base address
        if the pointer is out-of-range, flag an error
      can catch bugs across functions
        as long as both the alloc site and overflow site are recompiled
      slowdown for pointer-intensive code (30x for matrix multiply)
      what cases does this work for?
        struct { char buf[256]; void (*f)(void); } s;
        char *ptr = s.buf;
        for (...) {
          write_byte(ptr);     [ writes a byte to *ptr ]
          ptr++;
        }
        works even if write_byte() is not instrumented: ptr computation aborts
        what about "s.f();" afterwards?
          cannot prevent overflows within an allocation (e.g. struct)
          will invoke attacker-supplied s.f code pointer
        would reordering f and buf help?
          not if there was an array of many struct's (one alloc)
    whole-stack-frame bounds checking (libsafe, Snarskii, etc):
      try to prevent buffer overflows in functions like strcpy, gets, ..
      find if target pointer is in some stack frame
      try to deduce the size of that stack frame
      if we seem to be overflowing it, abort

  code pointer integrity checking
    observation: expensive to prevent buffer overflows at all times
    however, what's really bad is subsequent use of overflowed data
    idea: OK to overflow, as long as we don't use resulting ptrs
    stackguard
      place a canary on the stack when entering, check before return
      [ where does the canary go on the stack diagram? ]
      making the canary hard to forge:
        terminator canary (null, cr, lf, -1)
          why does this work?
          many C functions assume these characters are special
          might not allow overflowing a buffer past them
        random canary
      will stackguard solve every possible buffer overflow problem?
        paper is very aggressive about claiming it's almost perfect
        what about corrupting other pointers?  (doesn't help!)
          function pointers
          c++ vtable
          data pointers (can use later for arbitrary mem writes?)
            char *ptr;
            char buf[256];
            strcpy(buf, .. input ..);
            *ptr = 5;
      can you still get around stackguard to hijack control on return?
        need to guess the canary
        might be able to obtain it?
          perhaps remove null termination from a buffer
          the application sends you an error about the buffer
          then it might send you back the canary, and you can use it
        bypass without knowing the canary?
          somewhere the authentic canary needs to be stored
          if you can do arbitrary mem writes, you can corrupt it
          then canary check will succeed
    pointguard
      "canary" every pointer, not just return addresses
      problems: space for canary, inserting code at pointer-use time
      solving space problem: "encrypt" (XOR with secret value)
      hard for attacker to control decrypted value, likely crash

protection mechanisms the paper doesn't mention?
  make it difficult to inject code: ASLR, stack randomization
    weakness: leak/guess addresses
      function pointers not usually thought of as secret before
      lots of code doesn't treat them as such
        prints them in stack traces, logs, etc
        linux used to expose address space of a process!
      32-bit machines: only a few bits of randomness
        pages are 4K each, which takes up 12 bits of addr
        in theory could get up to 20 bits of randomness
        in practice, fewer (8-16 bits)
        more effective on a 64-bit machine
      fill address space with shell code
        many nop's followed by shell code
        random jump has some reasonable chance of running your code
    doesn't help with logic overflows
    randomization has been used to defend against code injection elsewhere
      syscall # randomization
      SQL injection: SQL language randomization
      instruction set randomization
  prevent execution dependence on injected data: taint tracking
    works OK to a point; sort-of like code pointer integrity checking?
    hard to determine what it means to depend on injected data
      certainly want to execute different code based on inputs
      what about using injected data as an offset?
        looks the same as using a corrupted vtable pointer at low level
        add tainted data to untainted data and deref/call result
  control flow integrity
    statically analyze possible control flows
    dynamically enforce it
    will look at a similar paper later
    prevents some bugs but not others (arbitrary function ptr calls)
  structure your program so buffer overflows don't matter (priv sep)
    requires programmer effort to re-design application

what do you think of this paper in general?
  my thoughts:
    a bit dated but still relevant
    starts out sounding like an overview but then sells stackguard?
    buffer overflows still a big problem, but not dominant
    the web "won": more SQL injection, cross-site scripting bugs now

what are other examples of similar problems?
and how do our techniques apply?
  [ didn't get to most of these ]
  double-free; heap overflows
    can overwrite heap data structures to cause arbitrary mem writes later
    heap maintains doubly-linked list of all elements
    when an object is freed, it might do something like
      prev->next = next;
      next->prev = prev;
    by controlling prev&next can cause two arbitrary mem writes!
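
a toy sketch of the two unlink writes described above. this is not a real allocator: memory is modeled as a small word array so the example stays well-defined C, and every name here (mem, toy_unlink, overflow_then_free, the NEXT/PREV offsets) is hypothetical. indices into mem play the role of addresses.

```c
#include <assert.h>
#include <stddef.h>

#define MEM_WORDS 64
#define NEXT 0  /* offset of the next field in a chunk header */
#define PREV 1  /* offset of the prev field in a chunk header */

/* Toy word-addressed memory; an index stands in for an address. */
static size_t mem[MEM_WORDS];

/* Unlink the chunk at address c, as free() might:
 *   prev->next = next;
 *   next->prev = prev;                                   */
void toy_unlink(size_t c) {
    size_t next = mem[c + NEXT];
    size_t prev = mem[c + PREV];
    mem[prev + NEXT] = next;
    mem[next + PREV] = prev;
}

/* Simulate an overflow that rewrites the header of chunk c with
 * attacker-chosen values, followed by the free of that chunk.
 * Both where and what + PREV must be valid indices into mem. */
void overflow_then_free(size_t c, size_t where, size_t what) {
    mem[c + PREV] = where;  /* prev->next will land on mem[where] */
    mem[c + NEXT] = what;   /* the value that gets written there  */
    toy_unlink(c);
}
```

calling overflow_then_free(10, 40, 20) stores 20 into mem[40], and as a side effect stores 40 into mem[21]. the second write is the reason the attacker-chosen value must itself point at writable memory.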
  integer overflows
    suppose you carefully allocate an array of N elements, 4 bytes each
    attacker says there's 2^30+1 elements, you allocate 4 bytes!
      (with 32-bit size arithmetic, 4 * (2^30+1) wraps around to 4)
  format string bugs
    %n writes current number of characters into supplied pointer
    %p/%d/%.. can leak sensitive randomized pointers
    with sprintf, can overflow buffers

other examples of control hijacking: SQL injection, shell injection
  system("mail " + emailaddr)
    emailaddr = "; rm -rf /"
  sql("UPDATE users SET password=" + pw + " WHERE user=" + user)
    user is authenticated, can't be arbitrary
    user supplies pw to be "x; "
    terminates SQL query, updates everyone's password to x
    can also inject other SQL queries afterwards
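
a rough sketch of the sql() concatenation above, with snprintf standing in for string "+". build_update_query and first_statement_has_where are hypothetical helpers and no real database API is involved. the second function only checks whether the text before the first ';' still contains the WHERE clause, which is what decides whether the UPDATE is limited to one user.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pastes pw and user straight into the query text, mirroring the
 * concatenation in the notes. Returns 0 on success, -1 if the
 * result does not fit in out. */
int build_update_query(char *out, size_t out_len,
                       const char *pw, const char *user) {
    int n = snprintf(out, out_len,
                     "UPDATE users SET password=%s WHERE user=%s",
                     pw, user);
    return (n >= 0 && (size_t)n < out_len) ? 0 : -1;
}

/* Returns 1 if the first ';'-terminated statement in q contains
 * " WHERE ", 0 otherwise. */
int first_statement_has_where(const char *q) {
    const char *end = strchr(q, ';');
    size_t len = end ? (size_t)(end - q) : strlen(q);
    const char *w = strstr(q, " WHERE ");
    return w != NULL && (size_t)(w - q) < len;
}
```

with pw = "x" the WHERE clause stays inside the first statement. with pw = "x; " the first statement ends right after "password=x", so it carries no WHERE clause and applies to every row. the usual fix is to pass pw and user as bound parameters instead of pasting them into the query text.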