Start this lecture with an actual buffer overrun attack, like in 6.033?

Outline for the next 4 OS lectures:
  OS is the TCB, so it's important to get it right.
  Need to understand attacks against the OS.
  Most OS components (kernel, libraries, system services) in
    widely-used OSes are written in C.
  Possible attacks:
    - Buffer overruns + fixes (today)
    - Integer overflows + fixes (Wed)
  OS is so big, changes fast, always has bugs.
  How to defend against bugs in general:
    - Privilege separation using UIDs and ACLs
    - Privilege separation using capabilities
  Will help you with the first 3 labs.

Buffer overrun demo
  linux vm
  http://web.mit.edu/6.033/www/assignments/handson-stack.html
  download stack.tgz
  run server
  show grades
    curl http://ubuntu:4000/app.py
  remove grades: exploit-ex.py
  look at httpd.c
  restart server
  use gdb to inspect
    set breakpoint in process_client
    launch attack
    inspect stack, reqpath
    breakpoint at httpd.c:173
    continue
    inspect stack, reqpath
    look at ebp
    look at ebp+4
    display /i *(ebp+4)

Buffer overflows, memory safety
===============================

Some bugs in the paper:
  Figure 3, optimized bounds check should probably be
    (p^p') >> table[p >> log_of_slot_size] == 0
  Figures 5 and 18, pointer arithmetic code should probably be
    char *p = &buf[i];  or  char *p = buf + i;

Recall what's going on with buffer overflows (last lecture, 6.033, ..)
  Consider the following example code from, say, a web server
  (a simplified version of the demo one):

    void read_req(void) {
      char buf[128];
      int i;
      gets(buf);
      .. parse buf ..
    }

  What does the compiler generate in terms of memory layout?
    x86 stack:
      Stack grows down.
      %esp points to the last (bottom-most) valid thing on the stack.
      %ebp points to the caller's %esp value.

                       +------------------+
      entry %ebp ----> | .. prev frame .. |
                       |                  |
                       |                  |
                       +------------------+
      entry %esp ----> |  return address  |
                       +------------------+
      new %ebp ------> |    saved %ebp    |
                       +------------------+
                       |     buf[127]     |
                       |       ...        |
                       |      buf[0]      |
                       +------------------+
      new %esp ------> |        i         |
                       +------------------+

  Caller's code (say, main):
    call read_req

  read_req's code:
    push %ebp
    mov %esp -> %ebp
    sub 168, %esp        # stack vars, etc
    ...
    mov %ebp -> %esp
    pop %ebp
    ret

What's the threat model, policy?
  Assume that the adversary can connect to the web server, supply any inputs.
    Best to assume that the attacker has the source code too,
    even though it may be unnecessary for the attack.
  Policy is a bit loose: only perform operations intended by the programmer?
  In practice: don't want the adversary to send spam, steal data, install a bot.

How does the adversary take advantage of this code?
  Supply a long input, overwrite data on the stack past the buffer.
  Interesting bit of data: the return address, which gets used by 'ret'.
  Can set the return address to the buffer itself, include some code in there.

How does the adversary know the address of the buffer?
  What if one machine has twice as much memory?
  Luckily for the adversary, virtual memory makes things more deterministic.
  For a given OS and program, addresses will often be the same.

What happens if the stack grows up, instead of down?
  Look at the stack frame for gets.

What can the adversary do once they are executing code?
  Use any privileges of the process.
  Often leverage the overflow to gain easier access into the system.
    Originally on Unix, run a shell /bin/sh (thus, "shell code").
  If the process is running as root or Administrator, can do anything.
  Even if not, can still send spam, read files (web server, database), ..
  Can attack other machines behind a firewall.

Why would programmers write such code?
  Legacy code, wasn't exposed to the internet.
  Programmers were not thinking about security.
  Many standard functions used to be unsafe (strcpy, gets, sprintf).
  Even safe versions have gotchas (strncpy does not null-terminate).

More generally, any memory error can translate into a vulnerability.
  Using memory after it has been deallocated (use-after-free).
    If writing, overwrite a new data structure, e.g. a function ptr.
    If reading, might call a corrupted function pointer.
  Freeing the same memory twice (double-free).
    Might cause malloc to later return the same memory twice.
  Decrementing the stack ptr past the end of the stack, into some other memory.
    [ http://www.invisiblethingslab.com/resources/misc-2010/xorg-large-memory-attacks.pdf ]
  Might not even need to overwrite a return address or function pointer.
    Can suffice to read sensitive data like an encryption key.
    Can suffice to change some bits (e.g. int isLoggedIn, int isRoot).

Fixing buffer overflows, plan 1: avoid bugs in C code.
  Carefully check sizes of buffers, strings, arrays, etc.
  Use functions that take buffer sizes into account (strncpy, fgets, snprintf).
    gcc warns when a program uses gets() now.
    Other potentially dangerous functions are still widespread.
  Good: prevents the problem in the first place.
  Bad: hard to ensure that code is bug-free, especially large existing code.

Fixing buffer overflows, plan 2: help programmers find bugs.
  Works well in practice; will look at static analysis in later lectures.
  "Fuzzers" that supply random inputs can be effective for some kinds of bugs.
  Generally, hard to prove the absence of bugs, esp. for unsafe code like C.
  But even partial analysis is useful (e.g. for Baggy bounds checking).

Fixing buffer overflows, plan 3: use a memory-safe language (Java, C#, Python).
  Good: does prevent memory corruption errors.
    Except that low-level bindings still need to be correct.
    E.g. the language runtime itself, or generic bindings like Python ctypes.
  Bad: still have a lot of legacy code in unsafe languages.
  Bad: might not perform as well as a fine-tuned C application?
    Used to be a bigger problem.
    Hardware & high-level languages are getting better.

All 3 above approaches are effective and widely used, but not enough.
  Large/complicated legacy code written in C is still a big problem.
  Even newly written code in C/C++ can have memory errors.

How to mitigate buffer overflows despite buggy code?
Two things going on in a "traditional" buffer overflow:
  1: Adversary gains control over execution (program counter).
  2: Adversary executes some malicious code.

What are the difficulties in these two steps?
  1: Requires overwriting a code pointer (which is later invoked).
     Common target is a return address, using a buffer on the stack.
     Any memory error could potentially work, in practice.
       Function pointers, C++ vtables, exception handlers, ..
  2: Requires some interesting code in the process's memory.
     Often easier than #1.
       Process already contains a lot of code.
       Process accepts inputs that the adversary can supply.
     (But the adversary needs to find a predictable location.)

Mitigation approach 1: canaries (StackGuard, gcc's SSP, ..)
  Idea: OK to overwrite a code ptr, as long as we catch it before invocation.
  One of the earlier systems: StackGuard.
    Place a canary on the stack upon entry, check the canary value before return.
    Usually requires source code; the compiler inserts canary checks.
    Where is the canary on the stack diagram?
  Make the canary hard to forge:
    "Terminator canary": four bytes (0, CR, LF, -1)
      Idea: many C functions treat these characters as terminators.
      As a result, if the canary matched, then further writes didn't happen.
    Random canary: much more common today (but needs good randomness!)

What kinds of vulnerabilities will a stack canary not catch?
  Overwrites of function pointer variables before the canary.
  Heap object overflows (function pointers, C++ vtables).
  Quite general are malloc/free overflows:

    int main(int argc, char **argv) {
      char *p, *q;
      p = malloc(1024);
      q = malloc(1024);
      if (argc >= 2)
        strcpy(p, argv[1]);
      free(q);
      free(p);
      return 0;
    }

    The attacker overwrites the metadata that goes along with the
      allocated buffers; the metadata sits right before and after the buffer.
    When the second free happens, the malloc library will try to merge
      the buffers.  It will write, to some address, a value that comes
      from the metadata.  Both are controlled by the attacker, so the
      attacker gets to write a memory location of his choosing:
      for example, the stack slot that holds a return address.
    http://www.win.tue.nl/~aeb/linux/hh/hh-11.html
  Overwrite a data pointer, then leverage it to do arbitrary mem writes:

    int *ptr = ...;
    char buf[128];
    gets(buf);
    *ptr = 5;

  Write directly to the return address past the canary (for some buggy code).

How could you trick the canary?
  Guess or obtain the canary value (if random).
    Maybe the application leaks random values from its memory?
    Remove the null terminator from a string; the app will read later bytes.
  Overwrite the authentic canary value?
    If the vulnerability allows arbitrary mem writes, can overwrite it.

Mitigation approach 2: bounds checking.
  Overall goal: prevent pointer misuse by checking if pointers are in range.
  Sometimes hard to tell in C what's an overflow vs. what's legitimate.
    Suppose the program allocates an integer array, int x[1024];
    The program also creates a pointer, e.g. int *y = &x[107];
    Is it OK to increment y to access subsequent elements?
      If it's meant to be used like a string buffer, maybe yes.
      If it's meant to store bank account balances of many users, no.
    Gets even more difficult with mixed structs or unions.
  Usually, the plan is to at least enforce bounds between malloc objects.
    Or variables allocated by the compiler, either global or on the stack.
  Often requires compiler changes (hard to do for unmodified, existing code).

Electric fence: a simple debugging tool from a while back.
  Align allocated memory objects with a guard page at the end or beginning.
  Use page tables to ensure that accesses to the guard page cause a crash.
  Reasonably effective as a debugging technique.
    Can prevent some buffer overflows for heap objects, but not for the stack.
  Advantage: works without source code, no compiler changes.
    Need to be able to replace the default malloc with efence's malloc.
  Problem: huge overhead (only 1 object per page, + guard pages).

Fat pointers.
  Idea: modify the pointer representation to include bounds information.
    Requires a different compiler, access to source code.
    Usually called a "fat pointer" because the pointer becomes much larger.
    E.g. a simple scheme: 4-byte pointer, 4-byte obj base, 4-byte obj end.
  Compiler generates code to abort if dereferencing a pointer whose
    address is outside of its own base..end range.
  Problem 1: can be expensive to check all pointer dereferences.
  Problem 2: incompatible with a lot of existing code.
    Cannot pass a fat pointer to an unmodified library.
    Cannot use fat pointers in fixed-size data structures.
    Fat pointer updates are not atomic (some code assumes ptr writes are).

Shadow data structures to keep track of bounds (Jones and Kelly, Baggy).
  Requires a significant amount of compiler support.
  For each allocated object, keep track of how big the object is.
    E.g. record the value passed to malloc:    char *p = malloc(256);
    Or, for static variables, determined by the compiler:    char p[256];
  For each pointer, need to interpose on two operations:
    1. pointer arithmetic:    char *q = p + 256;
    2. pointer dereferencing: char ch = *q;
  Why do we need to interpose on dereference?  (Can we do just arithmetic?)
  Why do we need to interpose on arithmetic?   (Can we do just dereference?)
    Need to ensure that out-of-bounds pointers cannot touch other data.
  Challenge 1: looking up limits information for a regular pointer.
    Naive: hash table or interval tree keyed by addr.  Slow lookup.
    Naive: array w/ limit info for each memory addr.  Memory overhead.
  Challenge 2: causing out-of-bounds pointer dereferences to fail.
    Naive: interpose on every pointer dereference.  Expensive.

What's the trick from the paper?
  1. Align and round up allocations to powers of 2: 5 bits of limit info.
  2. Store limit info in a linear array: fast lookup.
  3. Allocate memory at slot granularity: fewer array entries.
  4. Use the virtual memory system to prevent out-of-bounds derefs.

Why powers of 2?
  Can represent the limit in a small amount of space (5 bits for 32-bit ptrs).
  Easy to check:
    base = p & ~(size-1)
    ((p ^ p') >> table[p >> log_of_slot_size]) == 0

What's a buddy allocator?

What is the data structure that Baggy has in memory?
  Array at a fixed virtual address, one byte per 16-byte memory "slot".
  Uses virtual memory to allocate this array on demand.

Why don't these guys throw an error when arithmetic goes out-of-bounds?
  Applications simulate 1-indexed arrays.
  Applications want some value to represent one-past-the-end.
  Applications might compute an OOB ptr but check later if it's OK to use.
  Applications may compute p+(a-b) as (p+a)-b.

Example code:
  char *p = malloc(0x2c);   # 0x2c=44; 0x00001040 (suppose)
  char *q = p + 0x3c;       # 0x3c=60; 0x0000107c
    ok: within baggy bounds; q's slot will contain p's size
  char *r = q + 0x10;       # 0x10=16; 0x8000108c
    outside of baggy bounds and more than 1/2 slot (8): signal error
  char *s = q + 8;          # 0x80001084
    marked as OOB, because outside of bounds but less than 1/2 slot
    (so no error)
  char *t = s - 0x20;       # 0x20=32; 0x00001064
    ok, OOB cleared

Another example:
  char *a = malloc(64);
  char *b = malloc(64);
  b = a + 17;
  Will baggy bounds checking raise an exception?

What do their pointers look like?
  Normal pointer for in-bounds values.
  High bit set for out-of-bounds values (within half a slot).
    Typically, the OS kernel lives in the upper half and protects itself.
  Why half a slot for out-of-bounds?

What happens when pointer arithmetic goes out-of-bounds?
  Instrumented code jumps to a slow path, computes the OOB pointer value.
  The new pointer value points to "kernel memory", so a dereference will crash.
  If an OOB pointer is converted back to an in-bounds pointer,
    the high bit is cleared.
  In the common case (all in bounds), no extra overhead.
  So what's the answer to the homework problem?

Does Baggy instrument every memory address computation & access?
  Static analysis could prove some addrs are always safe (no details given).
  What's considered an "unsafe variable"?
    The address of the variable is taken, and not all uses can be proved safe.
  What does Baggy do with function call arguments?  Why?
    Cannot change the layout, because the x86 calling convention is fixed.
    Copy unsafe args to a separate area, which is aligned & protected.

How do they get binary compatibility with existing libraries?
  Normal pointers for in-bounds values.
  Set the bounds info to 31 (2^31 bound) for memory that was not
    allocated by instrumented code.
    Solves the problem of library code allocating its own memory.
  What can still go wrong with existing libraries?
    Can't detect out-of-bounds pointers generated in that code.
    Can't detect when an OOB pointer passed into a library goes in-bounds again.
  Why do they instrument strcpy and memcpy?  Do they need to?

How does their 64-bit scheme work?
  Can get rid of the table storing bounds information; store it in the pointer.
  Can keep track of OOB pointers that go much further out of bounds (2^16).

Does legitimate code always work with Baggy?
  Not quite; breaks in some cases.
    Unused out-of-bounds pointers.
    Temporary out-of-bounds pointers by more than slot_size/2.
    Conversion from pointers to integers and back.
    Passing out-of-bounds pointers into unchecked code.

Can Baggy be bypassed to exploit a buffer overflow?
  Could exploit a vulnerability in un-instrumented libraries.
  Could exploit temporal vulnerabilities (use-after-free).
  Mixed buffers and code pointers:

    struct my_type {
      char buf[256];
      void (*f) (void);
    };
    struct my_type s;

  An overflow of s.buf will corrupt s.f, but not flag a bounds error.
  Would re-ordering f and buf help?
    Might break applications that depend on struct layout.
    Might not help if this is an array of (struct my_type)'s.

What are the costs of bounds checking (e.g. Baggy)?
  Space overhead for limit information (fat pointer or extra table).
  Space overhead for extra padding memory used by the buddy allocator (Baggy).
  CPU overhead for pointer arithmetic and dereferencing.
  False alarms!

Mitigation approach 3: non-executable memory (AMD's NX bit, Windows DEP, W^X, ..)
  Modern hardware allows specifying read, write, and execute perms for memory.
    R and W permissions have been around a long time; execute is recent.
  Can mark the stack non-executable, so the adversary cannot run their code.
  More generally, some systems enforce "W^X", meaning all memory is either
    writable or executable, but not both.  (Of course, OK to be neither.)
  Advantage: potentially works without any application changes.
  Shortcoming: harder to dynamically generate code (esp. with W^X).
    JITs like Java runtimes and Javascript engines generate x86 on the fly.
    Can work around it by first writing, then changing to executable.
    The Java runtime used to mark all memory W+X (compatible, but not ideal).
  Can this still be exploited?
    Programs already contain a lot of code; might not need new code.
      Arc injection / return-to-libc attacks.
      Particularly useful functions: system, execl, unlink, strcpy, memcpy, ..
      General technique: "return-oriented programming".

Mitigation approach 4: randomized memory addresses (ASLR, stack randomization, ..)
  Make it difficult for the adversary to guess a valid code pointer.
  Stack randomization: move the stack to random locations, random offsets.
    Adversary doesn't know what to put for the start of buf in the return addr.
  Randomize the entire address space (Address Space Layout Randomization):
    Rely on the fact that a lot of code is relocatable.
    Dynamic loader can choose a random address for each library and program.
    Adversary doesn't know the address of system(), etc.
  Can this still be exploited?
    Adversary might guess the randomness.
      Especially on 32-bit machines, not a lot of random bits.
      32-bit address: 1 bit for kernel, 12 bits for page offset, at most 19 left.
      Most systems have more restrictions, leading to 8-16 random bits.
      More practical on 64-bit machines (easily 32 bits of randomness).
    Adversary might extract the randomness.
      Programs might print a stack trace or error message w/ a pointer.
      If the adversary can run some code, they might get real addresses.
      Cute address leak in Flash's Dictionary (hash table):
        Get the victim to visit your Flash-enabled page (e.g. buy an ad).
        The hash table internally computes the hash value of keys.
          Hash value of an integer is the integer.
          Hash value of an object is its address.
        Iterating over a hash table is done in hash bucket order.
        Can compare addresses to integers, guess object addresses.
        Now, can exploit some code pointer overwrite and bypass ASLR.
    Adversary might not care exactly where to jump.
      "Heap spraying": fill memory w/ shellcode so that a random jump is OK.
    Adversary might exploit some code that's not randomized (if it exists).
  Some other interesting uses of randomization elsewhere:
    System call randomization (each process has its own system call numbers).
    Instruction set (perhaps machine or SQL language) randomization.

Are the mitigation techniques usable?  Should we use them?
  gcc and MSVC enable stack canaries by default.
  Linux and Windows include ASLR and NX by default.
  Fuzzing / fixing bugs / using memory-safe languages is common when possible.
  Bounds checking is not as common (performance overheads, false alarms).
    Common thread in security tools: false alarms prevent adoption.
    Often, 0 false alarms w/ some misses is better than 0 misses w/ false alarms.
  Buffer overflows aren't #1 anymore; the web has "won" (SQL injection, XSS).

References:
  http://www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf