Buffer Overflows
================

lab1 is out, first part due this friday, other parts due next friday
sent email to class mailing list about lab1, sign up if you didn't get it

today's lecture: control hijacking attacks (lab1)
    paper: buffer overflows

quick overview of what's going on
    short code snippet with a buffer overflow: web server

	void read_request(void) {
	    char buf[256];
	    int i = 0;

	    for (;;) {
		buf[i] = read_input();	/* nothing checks i against sizeof(buf) */
		if (buf[i] == '\n') break;
		i++;
	    }
	}

    what the compiler generates
	stack diagram, %ebp, %esp [points to last thing on the stack]
	stack grows down

			+----------------+
	entry ebp ----> | ..prev frame.. |
			|      ....      |
			+----------------+
	entry esp ----> | return address |
			+----------------+
	new ebp ------> |   saved ebp    |
			+----------------+
			|    buf[255]    |
			|      ...       |
			|     buf[0]     |
			+----------------+
	new esp ------>	|       i	 |
			+----------------+

	push	%ebp
	mov	%esp, %ebp
	sub	$296, %esp	# size of stack vars + some other junk
	...
	mov	%ebp, %esp
	pop	%ebp
	ret

    what's our threat model here?  what are we worried about?
	assumption: attacker controls input
	policy: unknown, but if this program is privileged and enforcing
		some policy, attacker may be able to subvert it
	attackers send spam, steal data, attack other machines

    how does the attacker take advantage of this?
	supply long input, overwrite data on stack past buffer, change ret addr
	set return address to be &buf[0]
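
    A minimal sketch of the overwrite, assuming the layout in the stack
    diagram above (buf, then saved ebp, then return address, 4 bytes each,
    no padding). The frame is modeled as a plain byte array so the sketch
    stays well-defined C; the names and offsets here are hypothetical and
    only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modeled frame of read_request(), low addresses first.
 * Assumption: 32-bit x86, buf directly below saved ebp. */
enum { BUF_SIZE = 256, SAVED_EBP_OFF = 256, RET_ADDR_OFF = 260, FRAME_SIZE = 264 };

/* Copies input like the loop in read_request(): stops at '\n', never
 * checks BUF_SIZE. Bounded only by the size of the model itself. */
static size_t copy_unbounded(uint8_t frame[FRAME_SIZE], const uint8_t *in, size_t len)
{
    size_t i = 0;
    while (i < len && i < FRAME_SIZE) {
        frame[i] = in[i];
        if (in[i] == '\n')
            break;
        i++;
    }
    return i;
}

/* Reads the 4-byte return address slot of the modeled frame. */
static uint32_t saved_return_address(const uint8_t frame[FRAME_SIZE])
{
    uint32_t ret;
    memcpy(&ret, frame + RET_ADDR_OFF, sizeof ret);
    return ret;
}

/* Attacker input: filler over buf and saved ebp, then the address to
 * return to (e.g. the guessed &buf[0]). */
static size_t build_exploit(uint8_t out[FRAME_SIZE], uint32_t target)
{
    assert(RET_ADDR_OFF + sizeof target == FRAME_SIZE);
    memset(out, 0x90, RET_ADDR_OFF);
    memcpy(out + RET_ADDR_OFF, &target, sizeof target);
    return FRAME_SIZE;
}
```

    Note that a target address containing a '\n' byte would stop the copy
    early in this model, which is why injected bytes must avoid newlines.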

    how do we guess the address of where the code is?  or where our buffer is?
	what happens when the same application is run on different machines?
	how much do you have to know about the machine you're attacking?
	one machine might have twice as much memory as another
	does this change memory addresses?
	    no: with virtual memory, addresses don't depend on physical memory size
	    addresses depend largely on software versions

    once the attacker is running injected code, what can they do?
	can use all privileges of the process
	    OS protection will not help if server is running as root (it often is)
	    even if not root, can access the server's interesting files
	    can attack other machines behind a firewall (if there was one)

    why would you write such bad code?
	well, even if you don't, libc has plenty of unsafe functions
	    strcpy, gets, scanf
	even the safe versions aren't always safe
	    strncpy does not null-terminate dst if src has n or more bytes
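
    A small sketch of the strncpy pitfall and one common workaround.
    copy_terminated() and is_terminated() are hypothetical helpers, not
    libc functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst (capacity dst_size) and always null-terminates.
 * Returns 1 if src fit entirely, 0 if it was truncated. */
static int copy_terminated(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);
    strncpy(dst, src, dst_size - 1);  /* alone, may leave dst unterminated */
    dst[dst_size - 1] = '\0';         /* so terminate explicitly */
    return strlen(src) < dst_size;
}

/* Returns 1 if the first n bytes of buf contain a '\0'. */
static int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}
```

    With a 4-byte destination, strncpy(dst, "abcdef", 4) fills all four
    bytes and writes no terminator; copy_terminated() stores "abc" instead.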

two things going on with buffer overflows:
    1. gaining control over execution (program counter)
    2. injecting code into the process

    what are the difficulties in doing each of these?
    1. requires overwriting some code pointer
	return address is common (on stack), others possible too
	normally shouldn't happen!
    2. often easier:
	process already has lots of code inside of it
	process accepts inputs that attacker can supply
	main challenges:
	    finding a predictable address of this code
	    if injecting, ensuring code has no nulls/newlines/etc

protection mechanisms that the paper talks about?
    avoid bugs, auditing
	ensuring the lack of bugs is hard
	finding these bugs can be easy:
	    supply large inputs
	    watch for a program to crash
	    look at what the large input corrupted, see if you can exploit it
	simple approach finds simple bugs, but not tricky corner cases
	    tools called "fuzzers" do this mechanically
	    we'll look at one of these systems in a later lecture

    non-executable buffers/stack
	[ paper doesn't mention, but can make all writable memory non-exec ]
	works for many programs (can also cover heap, statics, etc)
	doesn't work for specialized apps: JITs/runtimes

	"arc injection" or "return-to-libc" attacks
	    (not necessarily return and not necessarily libc..)
	gaining control is often enough because there's lots of code already
	in particular, standard functions you might want to run are there
	    system(), execl(), unlink(), ..
	generalizes to "return-oriented programming": chain several returns
	    first call strcpy with the right args, then call system, ..

    bounds checking
	type-safe languages
	    doesn't solve problem for legacy code, or runtime implementations
	why doesn't C have bounds checking?
	    performance
	    convenience
	    few people worried about attackers when the language was designed
	Compaq C compiler
	    helpful but not enough
	    cannot do bounds-checking across function calls
	modify pointer representation
	    prevents buffer overflows but incompatible with lots of code
	keep shadow data structures (Jones & Kelly)
	    keeps track of allocated objects in memory
	    for each pointer expression (in code compiled with their compiler):
		compute the base address, according to some rules
		compute the pointer expression value
		check that the pointer falls inside the object for base address
		if the pointer is out-of-range, flag an error
	    can catch bugs across functions
		as long as both the alloc site and overflow site are recompiled
	    slowdown for pointer-intensive code (30x for matrix multiply)
	    what cases does this work for?

		    struct {
			char buf[256];
			void (*f)(void);
		    } s;

		    char *ptr = s.buf;
		    for (...) {
			write_byte(ptr);   /* writes a byte to *ptr */
			ptr++;
		    }

		works even if write_byte() is not instrumented:
		    the check is on the ptr++ computation, which aborts
		    once ptr leaves the object s
		what about "s.f();" afterwards?
		    cannot prevent overflows within an allocation (e.g. struct)
		    will invoke attacker-supplied s.f code pointer
		would reordering f and buf help?
		    not if there was an array of many structs (one alloc)
	whole-stack-frame bounds checking (libsafe, Snarskii, etc):
	    try to prevent buffer overflows in functions like strcpy, gets, ..
	    find if target pointer is in some stack frame
	    try to deduce the size of that stack frame
	    if we seem to be overflowing it, abort
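
    A simplified sketch of the shadow-structure check described above for
    Jones & Kelly: look up the object that the base pointer belongs to,
    then require the computed pointer to stay inside it. The table layout
    and function names are hypothetical, and this version ignores details
    such as one-past-the-end pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shadow table: one entry per allocated object. */
struct object { uintptr_t start; size_t size; };

#define MAX_OBJECTS 16
static struct object objects[MAX_OBJECTS];
static size_t num_objects;

/* Records an allocation so later pointer computations can be checked. */
static void register_object(const void *p, size_t size)
{
    assert(num_objects < MAX_OBJECTS);
    objects[num_objects].start = (uintptr_t)p;
    objects[num_objects].size = size;
    num_objects++;
}

/* Finds the object containing addr, or NULL if there is none. */
static const struct object *find_object(uintptr_t addr)
{
    for (size_t i = 0; i < num_objects; i++)
        if (addr >= objects[i].start && addr - objects[i].start < objects[i].size)
            return &objects[i];
    return NULL;
}

/* Check for the expression "base + offset": returns 1 if the result is
 * still inside the object that base points into, 0 otherwise. */
static int pointer_in_bounds(const void *base, ptrdiff_t offset)
{
    const struct object *obj = find_object((uintptr_t)base);
    if (obj == NULL)
        return 0;                       /* base is not a known object */
    uintptr_t result = (uintptr_t)base + (uintptr_t)offset;
    return result >= obj->start && result - obj->start < obj->size;
}
```

    For the struct s above, registering the whole struct as one object
    means s.buf + 256 (which lands on s.f) still passes the check, while a
    pointer past the end of s does not. This matches the limitation noted
    for overflows within one allocation.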

    code pointer integrity checking
	observation: expensive to prevent buffer overflows at all times
	    however, what's really bad is subsequent use of overflowed data
	    idea: OK to overflow, as long as we don't use resulting ptrs
	stackguard
	    place a canary on the stack when entering, check before return
		[ where does the canary go on the stack diagram? ]
	    making the canary hard to forge:
		terminator canary (null, cr, lf, -1)
		    why does this work?
		    many C functions assume these characters are special
		    might not allow overflowing a buffer past them
		random canary
	    will stackguard solve every possible buffer overflow problem?
		paper is very aggressive about claiming it's almost perfect
		what about corrupting other pointers?  (doesn't help!)
		    function pointers
		    c++ vtable
		    data pointers (can use later for arbitrary mem writes?)

			char *ptr;
			char buf[256];

			strcpy(buf, .. input ..);
			*ptr = 5;

	    can you still get around stackguard to hijack control on return?
		need to guess the canary
		might be able to obtain it?
		    perhaps remove null termination from a buffer
		    the application sends you an error about the buffer
		    then it might send you back the canary, and you can use it
		bypass without knowing the canary?
		    somewhere the authentic canary needs to be stored
		    if you can do arbitrary mem writes, you can corrupt it
		    then canary check will succeed
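
    A minimal sketch of a terminator canary, again using a byte-array
    model of the frame. Placing the canary directly above buf is an
    assumption made for this sketch; the offsets and names are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modeled frame: buf, canary, saved ebp, return address. */
enum { BUF_SIZE = 256, CANARY_OFF = 256, RET_ADDR_OFF = 264, FRAME_SIZE = 268 };

/* Terminator canary bytes: NUL, CR, LF, 0xff (-1). */
static const uint8_t TERMINATOR_CANARY[4] = { 0x00, 0x0d, 0x0a, 0xff };

/* Function entry: place the canary in the frame. */
static void frame_enter(uint8_t frame[FRAME_SIZE])
{
    memcpy(frame + CANARY_OFF, TERMINATOR_CANARY, sizeof TERMINATOR_CANARY);
}

/* Before return: 1 if the canary is intact, 0 if it was overwritten. */
static int frame_canary_ok(const uint8_t frame[FRAME_SIZE])
{
    return memcmp(frame + CANARY_OFF, TERMINATOR_CANARY,
                  sizeof TERMINATOR_CANARY) == 0;
}

/* strcpy-like copy into the modeled buf: stops at NUL in the input and
 * never checks BUF_SIZE (bounded only by the size of the model). */
static void unbounded_strcpy(uint8_t frame[FRAME_SIZE], const char *in)
{
    assert(in != NULL);
    size_t i = 0;
    for (; in[i] != '\0' && i < FRAME_SIZE - 1; i++)
        frame[i] = (uint8_t)in[i];
    frame[i] = 0;
}
```

    To reach the return address and keep the canary intact, the input
    would need an embedded NUL, which ends a strcpy-style copy right at
    the canary.
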
	pointguard
	    "canary" every pointer, not just return addresses
	    problems: space for canary, inserting code at pointer-use time
	    solving space problem: "encrypt" (XOR with secret value)
		hard for attacker to control decrypted value, likely crash
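
    A sketch of the XOR idea behind pointguard, assuming a single
    per-process secret. pg_init(), pg_encrypt() and pg_decrypt() are
    hypothetical names, and the key value used in any example is
    arbitrary; a real system would choose it randomly at startup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-process secret key. */
static uintptr_t pg_key;

static void pg_init(uintptr_t key)
{
    assert(key != 0);   /* a zero key would leave pointers unchanged */
    pg_key = key;
}

/* Applied when a pointer is stored to memory. */
static uintptr_t pg_encrypt(uintptr_t ptr)
{
    return ptr ^ pg_key;
}

/* Applied when a pointer is loaded from memory, just before use. */
static uintptr_t pg_decrypt(uintptr_t stored)
{
    return stored ^ pg_key;
}
```

    A legitimate pointer round-trips unchanged. A raw address written by
    an overflow is "decrypted" to address ^ key, which the attacker cannot
    predict without the key.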

protection mechanisms the paper doesn't mention?
    make it difficult to inject code: ASLR, stack randomization
	weakness: leak/guess addresses
	    function pointers not usually thought of as secret before
		lots of code doesn't treat them as such
		prints them in stack traces, logs, etc
		linux used to expose address space of a process!
	    32-bit machines: only a few bits of randomness
		pages are 4K each, which takes up 12 bits of addr
		in theory could get up to 20 bits of randomness
		in practice, fewer (8-16 bits)
	    more effective on a 64-bit machine
	fill address space with shell code
	    many nop's followed by shell code
	    random jump has some reasonable chance of running your code
	doesn't help with logic overflows
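
    The randomness arithmetic above, worked out as a small sketch. The
    helper names are hypothetical; the numbers are the ones in the notes.

```c
#include <assert.h>
#include <stdint.h>

/* Address bits left to randomize once the page offset is fixed. */
static unsigned max_random_bits(unsigned addr_bits, unsigned page_bits)
{
    assert(page_bits <= addr_bits);
    return addr_bits - page_bits;
}

/* Number of equally likely placements for a given number of random bits. */
static uint64_t placements(unsigned random_bits)
{
    assert(random_bits < 64);
    return (uint64_t)1 << random_bits;
}
```

    max_random_bits(32, 12) is 20, so at most 2^20 placements. With 16
    bits there are 65536 placements and with 8 bits only 256, which is a
    small search space if the attacker is able to retry.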

	randomization has been used to defend against code injection elsewhere
	    syscall # randomization
	    SQL injection: SQL language randomization
	    instruction set randomization

    prevent execution dependence on injected data: taint tracking
	works OK to a point; sort-of like code pointer integrity checking?

	hard to determine what it means to depend on injected data
	certainly want to execute different code based on inputs
	what about using injected data as an offset?
	    looks the same as using a corrupted vtable pointer at low level
	    add tainted data to untainted data and deref/call result

    control flow integrity
	statically analyze possible control flows
	dynamically enforce it
	will look at a similar paper later
	prevents some bugs but not others (arbitrary function ptr calls)

    structure your program so buffer overflows don't matter (priv sep)
	requires programmer effort to re-design application

what do you think of this paper in general?  my thoughts:
    a bit dated but still relevant
    starts out sounding like an overview but then sells stackguard?
    buffer overflows still a big problem, but not dominant
	the web "won": more SQL injection, cross-site scripting bugs now

what are other examples of similar problems?  and how do our techniques apply?
    [ didn't get to most of these ]

    double-free; heap overflows
	can overwrite heap data structures to cause arbitrary mem writes later
	heap allocator keeps chunks on a doubly-linked list (links stored in the heap)
	    when an object is freed, it might do something like
		prev->next = next;
		next->prev = prev;
	    by controlling prev & next can cause two arbitrary mem writes!
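
    A sketch of the unlink step, assuming list links are stored in-band
    inside each chunk. struct chunk and unlink_chunk() are hypothetical
    and do not reproduce any particular allocator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-band list links, stored next to user data. */
struct chunk {
    struct chunk *next;
    struct chunk *prev;
};

/* The unlink step from the notes: two pointer writes through whatever
 * values prev and next happen to hold. */
static void unlink_chunk(struct chunk *c)
{
    assert(c != NULL);
    struct chunk *prev = c->prev;
    struct chunk *next = c->next;
    prev->next = next;   /* write 1: stores next at the address in prev */
    next->prev = prev;   /* write 2: stores prev at the address in next */
}
```

    If an overflow lets the attacker choose c->prev and c->next, both the
    destination and the value of each write come from attacker data. In
    this sketch each write stores one of the two chosen values near the
    other, so both must point to writable memory.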

    integer overflows
	suppose you carefully allocate an array of N elements, 4 bytes each
	attacker says there's 2^30+1 elements; N*4 wraps around in 32-bit
	arithmetic, so you allocate 4 bytes!
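
    The wraparound worked out in code, assuming sizes are computed in
    32-bit unsigned arithmetic. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size computation as in the notes: wraps modulo 2^32. */
static uint32_t alloc_size_unchecked(uint32_t n)
{
    return n * 4u;
}

/* Checked variant: returns 0 and leaves *out untouched if n * 4 would
 * not fit in 32 bits, otherwise stores the size and returns 1. */
static int alloc_size_checked(uint32_t n, uint32_t *out)
{
    assert(out != NULL);
    if (n > UINT32_MAX / 4u)
        return 0;
    *out = n * 4u;
    return 1;
}
```

    For n = 2^30+1 the unchecked version yields (2^32 + 4) mod 2^32 = 4,
    while the checked version refuses the request.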

    format string bugs
	%n writes the number of characters output so far through the supplied pointer
	%p/%d/%.. can leak sensitive randomized pointers
	with sprintf, can overflow buffers
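
    A sketch of the usual way to avoid format string bugs: pass user data
    as an argument, never as the format itself, and bound the output.
    format_user_text() is a hypothetical wrapper around snprintf.

```c
#include <assert.h>
#include <stdio.h>

/* Writes user text into out without interpreting it as a format string.
 * Returns 1 if it fit, 0 if it was truncated or an error occurred.
 * The buggy form would be snprintf(out, out_size, user). */
static int format_user_text(char *out, size_t out_size, const char *user)
{
    assert(out != NULL && out_size > 0);
    int n = snprintf(out, out_size, "%s", user);
    return n >= 0 && (size_t)n < out_size;
}
```

    Input such as "%x %x %n" is copied literally here; as the format
    argument it would read stack values and write through a pointer taken
    from the stack.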

    other examples of control hijacking: SQL injection, shell injection
	system("mail " + emailaddr)
	    emailaddr = "; rm -rf /"
	sql("UPDATE users SET password=" + pw + " WHERE user=" + user)
	    user is authenticated, can't be arbitrary
	    user supplies pw to be "x; "
	    ";" terminates the SQL query before the WHERE clause
	    so it updates everyone's password to x
	    can also inject other SQL queries afterwards
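
    A sketch of how the concatenated query above ends up split in two.
    build_update() and first_statement_len() are hypothetical helpers; no
    real database API is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds the query by string concatenation, as in the notes (unsafe).
 * Returns 1 if the query fit in out, 0 otherwise. */
static int build_update(char *out, size_t out_size,
                        const char *pw, const char *user)
{
    assert(out != NULL && out_size > 0);
    int n = snprintf(out, out_size,
                     "UPDATE users SET password=%s WHERE user=%s", pw, user);
    return n >= 0 && (size_t)n < out_size;
}

/* Length of the text before the first ';', i.e. the first statement. */
static size_t first_statement_len(const char *query)
{
    return strcspn(query, ";");
}
```

    With pw = "x; " the first statement is "UPDATE users SET password=x"
    and the WHERE clause is left dangling after the ";". The usual remedy
    is to keep data out of the command string, e.g. with parameterized
    queries.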