Introduction
============

Welcome to 6.858 -- Computer Systems Security.

Course structure
  Lectures will be MW 11-12:30, in 1-190.
    One paper per lecture.
    Tentative schedule online, may change.
    Read the paper before lecture, and submit by 10PM the night before:
      Answer to a short homework question (link from schedule page).
      Your own question about the paper (we will try to answer it in lecture).
    Interrupt, ask questions, point out mistakes.
  One quiz, one final exam.
    Quiz during class, final during finals week.
  Assignments: five labs.
    Defenses and/or attacks on fairly real systems.
    Not a lot of coding, but lots of non-standard thinking.
    Poke into obscure corners of x86 asm, C, Python, JavaScript, ...
    Office hours for lab/lecture help.
    Lab 1, buffer overflows, first part due next Friday.  Start early.
    Two options for Lab 5: the ordinary lab, or a final project of your choice, in groups.
    For projects:
      Presentations at the end of the semester.
      Think of projects you'd like to work on as you're reading papers.
      Both attack- and defense-oriented projects are possible.
  Two lecturers: Robert Morris, Frans Kaashoek.
  Five TAs: Steve, Haogang, Jon, Alex, Albert.
  Sign up for Piazza (link on course web site).
    Mostly questions/answers about labs.
    We will post any important announcements there.
  Warning about security work/research on MITnet (and in general).
    You will learn how to attack systems, so that you know how to defend them.
    Know the rules: http://ist.mit.edu/network/rules
    Don't mess with other people's data/computers/networks without permission.
    Ask course staff for advice if in doubt.

6.858 is about building secure computer systems.
  Secure = achieves some property despite attacks by adversaries.
  Systematic thought is required for successful defense.
  High-level plan for thinking about security:
    Policy: the goal you want to achieve.
      e.g. only Alice should read file F.
      Common goals: confidentiality, integrity, availability.
    Threat model: assumptions about what the attacker can do.
      e.g. can guess passwords, cannot physically steal our server.
    Mechanism: software/hardware that your system uses to enforce the policy.
      e.g. user accounts, passwords, file permissions, encryption.

Building secure systems is hard -- why?
  Example: the 6.858 grades file, stored on an Athena AFS server.
    Policy: only TAs should be able to read and write the grades file.
    Easy to implement the *positive* aspect of the policy:
      there just has to be one code path that allows a TA to get at the file.
    But security is a *negative* goal:
      we want no tricky way for a non-TA to get at the file.
    There are a huge number of potential attacks to consider!
      Exploit a bug in the server's code.
      Guess a TA's password.
      Steal a TA's laptop; maybe it has a local copy of the grades file.
      Intercept grades when they are sent over the network to the registrar.
      Get a job in the registrar's office, or as a 6.858 TA.
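  To make the positive/negative contrast concrete, here is a minimal, hypothetical
  C sketch.  The TA names and the file name grades.txt are invented for illustration;
  writing the one allowed code path is the easy part, and nothing in this function can
  show that no *other* path in the system reaches the file.

    /* Hypothetical sketch: the "positive" half of the grades-file policy.
       The TA list and file name are made up for illustration. */
    #include <stdio.h>
    #include <string.h>

    static const char *tas[] = { "ta1", "ta2", NULL };

    /* Easy part: one code path that decides whether a user may read the
       grades file.  The hard, *negative* part -- ensuring no other path
       (server bugs, guessed passwords, laptop copies) reaches the file --
       is invisible here. */
    int may_read_grades(const char *user)
    {
      for (int i = 0; tas[i] != NULL; i++)
        if (strcmp(user, tas[i]) == 0)
          return 1;
      return 0;
    }

    int main(void)
    {
      printf("alice: %d, ta1: %d\n",
             may_read_grades("alice"), may_read_grades("ta1"));
      return 0;
    }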
  Result:
    One cannot get policies/threats/mechanisms right on the first try.
    One must usually iterate: design, watch attacks, update understanding of threats and policies.
  Defender is often at a disadvantage in this game.
    Defender usually has limited resources, other priorities.
    Defender must balance security against convenience.
    A determined attacker can usually win!
  What's the point if we can't achieve perfect security?
    Perfect security is rarely required.
    Make the cost of an attack greater than the value of the information,
      so that perfect defenses aren't needed.
    Make our systems less attractive than other people's.
      Works well if the attacker e.g. just wants to generate spam.
    Find techniques that have a big security payoff (i.e. not merely patching holes).
      We'll look at techniques that cut off whole classes of attacks.
      Successful: popular attacks from 10 years ago are no longer very fruitful.
    Sometimes security *increases* value for the defender:
      VPNs might give employees more flexibility to work at home.
      Sandboxing (JavaScript, Native Client) might give me more confidence
        to run software I don't fully understand.

What goes wrong #1: problems with the policy.
  I.e. the system correctly enforces the policy -- but the policy is inadequate.
  Example: Sarah Palin's email account.
    [ Ref: http://en.wikipedia.org/wiki/Sarah_Palin_email_hack ]
    Yahoo email accounts have a username, password, and security questions.
    A user can log in by supplying username and password.
    If the user forgets the password, they can reset it by answering the security questions.
    Some adversary guessed Sarah Palin's high school, birthday, etc.
    The policy amounts to: can log in with either the password *or* the security questions.
      There is no way to enforce "only if the user forgets the password, then ...".
    Thus the user should ensure that the password *and* the security questions are both hard to guess.
  Example: Mat Honan's accounts at Amazon, Apple, Google, etc.
    [ Ref: http://www.wired.com/gadgetlab/2012/08/apple-amazon-mat-honan-hacking/all/ ]
    Honan was an editor at wired.com; someone wanted to break into his gmail account.
    Gmail password reset: send a verification link to a backup email address.
      Google helpfully prints part of the backup email address.
      Mat Honan's backup address was his Apple @me.com account.
    Apple password reset: need the billing address and the last 4 digits of the credit card.
      The address is easy, but how to get the 4 digits?
    Amazon's password reset e-mail includes the last 4 digits of all your registered
      credit cards; I think they required the full card # to reset.
      How to get hold of that e-mail?
    Call Amazon tech support; you can persuade them to add a new e-mail address to any
      account (add a credit card, then use that card to verify yourself).
      Now you will get a copy of Amazon's password reset e-mail with the last 4 digits.
    Now the attacker can reset the Apple password, read the gmail reset e-mail,
      and reset the gmail password.
    Lesson: attacks often assemble apparently unrelated trivia.
    Lesson: the individual policies are OK, but the combination is not.
      Apple views the last 4 digits as a secret, but many other sites do not.
    Lesson: big sites cannot hope to identify which human they are talking to;
      at best "the same person who originally created this account".
      Security questions and e-mailed reset links are examples of this.

What goes wrong #2: problems with the threat model / assumptions.
  I.e. the designer assumed an attack wasn't feasible (or didn't think of the attack).
  Example: most users are not thinking about security.
    User gets an e-mail saying "click here to renew your account", then a
      plausible-looking page asks for their password.
    Or a dialog box pops up with "Do you really want to install this program?"
    Or tech support gets a call from a convincing-sounding user asking to reset a password.
  Example: computational assumptions change over time.
    MIT's Kerberos system used 56-bit DES keys, since the mid-1980's.
    At the time, it seemed fine to assume an adversary can't check all 2^56 keys.
    No longer reasonable: now costs about $100.
      [ Ref: https://www.cloudcracker.com/dictionaries.html ]
    Several years ago, a 6.858 final project showed you can get any key in a day.
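  A back-of-the-envelope sketch of why the 2^56 assumption stopped holding.  The
  guessing rate and core counts below are assumptions chosen only to illustrate the
  scale, not measurements of any particular cracking setup.

    /* Back-of-the-envelope: time to try every 56-bit DES key.
       The rate and core counts are assumptions for illustration. */
    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
      uint64_t keyspace = 1ULL << 56;        /* 2^56 possible DES keys */
      double rate = 1e9;                     /* assumed keys/second per core */
      int cores[] = { 1, 100, 10000 };
      for (int i = 0; i < 3; i++) {
        double days = (double)keyspace / (rate * cores[i]) / 86400.0;
        printf("%5d core(s): about %.1f days to exhaust the keyspace\n",
               cores[i], days);
      }
      return 0;
    }

  The mechanism never changed; the same arithmetic at mid-1980's speeds gave a
  comfortingly large number.  The assumption is what broke.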
  Example: all SSL certificate CAs are fully trusted.
    The browser verifies the server's certificate to ensure it is talking to the right server.
    A certificate contains the server's host name and cryptographic key,
      signed by some trusted certificate authority (CA).
    The browser has the CAs' public keys built in to verify certificates.
    If an attacker compromises a CA, they can generate a fake certificate for any server name.
    Originally there were only a few CAs; it seemed unlikely that an attacker could compromise one.
    But now browsers fully trust 100s of CAs!
    In 2011, two CAs were compromised and issued fake certs for many domains
      (google, yahoo, tor, ...), apparently used in Iran (?).
      [ Ref: http://en.wikipedia.org/wiki/DigiNotar ]
      [ Ref: http://en.wikipedia.org/wiki/Comodo_Group ]
    In 2012, a CA inadvertently issued a root certificate valid for any domain.
      [ Ref: http://www.h-online.com/security/news/item/Trustwave-issued-a-man-in-the-middle-certificate-1429982.html ]
    Mistake: maybe reasonable to trust one CA, but not 100s.
  Example: assuming your hardware is trustworthy.
    If the NSA is your adversary, that turns out not to be a good assumption.
    [ Ref: https://www.schneier.com/blog/archives/2013/12/more_about_the.html ]
  Example: subverting military OS security.
    In the 80's, the military encouraged research into secure OSes.
    Surprise: successful attacks by gaining access to the development systems.
    Mistake: implicit trust in the compiler, developers, distribution, &c.

What goes wrong #3: problems with the mechanism -- bugs.
  Bugs routinely undermine security.
    Rule of thumb: one bug per 1000 lines of code.
  Bugs in the implementation of the security policy.
  But also bugs in code that may seem unrelated to security.
  Example: Apple's iCloud password-guessing rate limits.
    [ Ref: https://github.com/hackappcom/ibrute ]
    People often pick weak passwords; one can often guess a password in a few attempts (1K-1M).
    Most services, including Apple's iCloud, rate-limit login attempts.
    Apple's iCloud service has many APIs.
    One API (the "Find my iPhone" service) forgot to implement rate-limiting.
    An attacker could use that API for millions of guesses per day.
    Lesson: if many checks are required, one will be missing.
  Example: missing access control checks in Citigroup's credit card web site.
    [ Ref: http://www.nytimes.com/2011/06/14/technology/14security.html ]
    Citigroup allowed credit card users to access their accounts online.
    The login page asks for a username and password.
    If the username and password are OK, the user is redirected to the account info page.
    The URL of the account info page included some numbers, e.g. x.citi.com/id=1234.
      The numbers were (related to) the user's account number.
    An adversary tried different numbers and got different people's account info.
    The server didn't check that you were logged into that account!
    Lesson: programmers tend to think only of the intended operation.
  Example: Android's Java SecureRandom weakness leads to Bitcoin theft.
    [ Ref: https://bitcoin.org/en/alert/2013-08-11-android ]
    [ Ref: https://www.nilsschneider.net/2013/01/28/recovering-bitcoin-private-keys.html ]
    Bitcoins can be spent by anyone who knows the owner's private key.
    Many Bitcoin wallet apps on Android used Java's SecureRandom API.
    It turns out the system sometimes forgot to seed the PRNG!
    A pseudo-random number generator is deterministic after you set the seed,
      so the seed had better be random.
    As a result, some Bitcoin keys turned out to be easy to guess.
    Adversaries searched for guessable keys and spent any corresponding bitcoins.
    (Really it was the nonce in the ECDSA signature that wasn't random,
      and a repeated nonce allows the private key to be deduced.)
    Lesson: be careful about where cryptographic randomness comes from.
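  The seeding failure is easy to reproduce in miniature.  The sketch below uses C's
  rand() rather than Java's SecureRandom (so it is an analogy, not the actual Android
  code), but it shows the same failure mode: if the seed never changes, the "random"
  key is the same for every victim, and an attacker can regenerate it.

    /* Illustration of the unseeded-PRNG failure (not the actual Android code).
       Without srand(), rand() starts from a fixed default seed, so this
       "secret" key is identical on every run -- and for every attacker. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
      /* Missing: srand(<real entropy>); */
      unsigned char key[16];
      for (int i = 0; i < 16; i++)
        key[i] = rand() & 0xff;            /* predictable byte sequence */
      for (int i = 0; i < 16; i++)
        printf("%02x", key[i]);
      printf("\n");
      return 0;
    }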
  Example: Moxie Marlinspike's SSL certificate name checking bug.
    [ Ref: http://www.wired.com/2009/07/kaminsky/ ]
    Certificates use length-encoded strings, but C code is often null-terminated.
    CAs would grant a certificate for "amazon.com\0.rtm.org".
    Browsers saw the \0 and interpreted it as a cert for amazon.com.
    Lesson: parsing code is a huge source of security bugs.
  Example: buffer overflows (see below).

Case study: buffer overflows.
  An important class of security problems, for which many attacks and defenses are known.
  This is the topic of Lab 1.
  Suppose your web server has a bug in the HTTP input parsing.
    On certain inputs, it crashes.
    Should you be worried?
  Let's take a look at a simplified example.

    % cat readreq.c
    #include <stdio.h>
    #include <stdlib.h>

    char *gets(char *buf)
    {
      int c;
      while((c = getchar()) != EOF && c != '\n')
        *buf++ = c;
      *buf = '\0';
      return buf;
    }

    int read_req(void)
    {
      char buf[128];
      int i;
      gets(buf);
      i = atoi(buf);
      return i;
    }

    int main()
    {
      int x = read_req();
      printf("x = %d\n", x);
    }

    % ./readreq
    1234
    % ./readreq
    AAAAAAAAAAAA....AAAA

  Why did it crash?
  We should think "this is a bug; could an attacker exploit it?"
  Let's figure out what exactly is happening.

    % gdb ./readreq
    b read_req
    r
    info reg
    disas $eip

  Where is buf[]?
    print &buf[0]
    print $esp
    print &i
  Aha, buf[] is on the stack, followed by i.
  The "sub $0xa8, %esp" allocates space for buf[] and i.
  Let's draw a picture of what's on the stack.

                 +------------------+
                 | main()'s frame   |
                 |                  |
                 |                  |
                 +------------------+
                 | return address   |
                 +------------------+
    %ebp ------> | saved %ebp       |
                 +------------------+
                 | i                |
                 +------------------+
                 | buf[127]         |
                 | ...              |
                 | buf[0]           |
                 +------------------+
    %esp ------> | ...              |
                 +------------------+

  The x86 stack grows down in addresses.
    push == decrement $esp, then write to *$esp.
  $ebp is the "frame pointer" -- the saved stack pointer at function entry.
    x $ebp
    x $ebp+4
  Let's see what the saved return $eip refers to:
    disas 0x0804850d
  It's the instruction in main() after the call to read_req().

  OK, back to read_req(), just before gets():
    disas $eip
    next
    AAAAAAA...AAA
  What did gets() do to the stack?
    print &buf[0]
  Hmm, 156 is more than 128!  How can that be?
    x $ebp
    x $ebp+4
  The saved frame pointer and return $eip are 0x41414141!  What's 0x41?
    next
    disas
    stepi
    stepi
    disas
  Now we are about to execute read_req()'s return instruction.
    x $esp
    stepi             -- the ret
    info reg          -- note eip is 0x41414141
    stepi             -- crash, this is our seg fault

  Is this a serious problem?
    I.e. if our web server code had this bug, could an attacker exploit it
      to break into our computer?
  Is the attacker limited to jumping somewhere random?
    No: a "code injection" attack.
    How does the adversary know the address of the buffer?
  What can the adversary do once they are executing injected code?
    If the process is running as root or Administrator, they can do anything.
    Even if not, they can still send spam, read files (web server, database), ...
    They can load a bigger program from somewhere on the net.
  What happens if the stack grows up, instead of down?
    The stack frame for read_req() would have buf[] at the highest address,
      so the overflow won't reach read_req()'s return $eip.
    Can an attacker still exploit this bug?

  How to defend against buffer overflows?
    Use a language that checks array bounds automatically.
    For C: don't call gets() (see the bounds-checked sketch at the end of these notes).
    Intel lets you mark the stack as non-executable.
      Is this a 100% solution for C?
    Randomize layout, canaries, &c.
    Structure the application to limit damage from bugs (Lab 2).
  Good news: simple buffer overruns like this do not work any more.

  Buffer overflow lessons:
    Bugs are a problem in all parts of code, not just in the security mechanism.
    Policy may be irrelevant if the implementation has bugs.
    But stay tuned; there is hope for the defense.
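  As a concrete version of the "don't call gets()" advice above, here is a minimal
  sketch of read_req() rewritten with fgets().  It addresses only this one bug; the
  other defenses in the list above still matter.

    /* Sketch: read_req() from above, rewritten to avoid gets().
       fgets() writes at most sizeof(buf) bytes (including the '\0'),
       so a long request can no longer overwrite i, the saved %ebp,
       or the return address. */
    #include <stdio.h>
    #include <stdlib.h>

    int read_req(void)
    {
      char buf[128];
      if (fgets(buf, sizeof(buf), stdin) == NULL)
        return -1;                       /* EOF or read error */
      return atoi(buf);                  /* excess input is simply left unread */
    }

    int main(void)
    {
      printf("x = %d\n", read_req());
      return 0;
    }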