6.858 2020 Lecture 5: Privilege Separation, Isolation, and OKWS

The problem: what to do about bugs?

Plan A: find them, fix them, avoid making new ones.
  Huge progress here, e.g. with buffer overflows.
  Is this plan enough?

Example: traditional web server setup (Apache).
  Apache runs N identical processes, handling HTTP requests.
  Each Apache process has all application code:
    executes requests for many users;
    executes lots of different kinds of requests (log in, read e-mail, &c).
  Storage: SQL database w/ passwords, cookies, messages, &c.
  This arrangement is convenient and efficient.
  But it's secure only if the web code has no bugs!

History suggests that developers will keep inventing new bugs:
  Buffer overflow + code injection exposes whole DB to attacker.
  Memory bugs may let attacker read data from app memory.
  Missing access control checks before DB queries.
  Bugs in file handling may give access to sensitive files.
    e.g. open("/profiles/" + user)
    what about user=../etc/passwd or ../mail/rtm
  SQL injection may let attacker r/w all DB data.
    "SELECT email FROM users WHERE id = " + userid
  So: every reason to expect we'll keep seeing bugs.
  And: this "hard shell, soft inside" setup makes bugs devastating.

Plan B: build systems that are secure even if there are bugs.
  Can we do anything like this?

Big idea: privilege separation
  divide up the s/w and data to limit damage from bugs
  two related benefits:
    limit damage from successful exploit -- "least privilege"
    limit attacker's access to buggy code -- "attack surface"
  designer must choose the separation scheme(s):
    by service / type of data (friend lists vs passwords)
    by user (my e-mail vs your e-mail)
    by bugginess (image resizing vs everything else)
    by exposure to direct attack (HTTP parsing vs everything else)
    by inherent privilege (hide superuser processes; hide the DB)

Privilege separation is difficult
  Need to isolate (client/server, VMs, containers, processes, &c).
  Need to allow controlled interaction.
  Need to retain good performance.

We've seen these ideas in the Google Architecture paper.
Now we'll dig into a detailed case study, OkCupid's OKWS web server.
You'll use an OKWS-like setup in Lab 2!

How does OKWS partition the web server?
  Figure 1 in paper.

How does a request flow in this web server?
  okld starts all other processes, from a config file.
  okd -> oklogd -> pubd -> svc -> dbproxy -> DB -> oklogd

How does this design map onto physical machines?
  Many front-end machines, each with okld, okd, pubd, oklogd, svc*.
  A few DB machines, each with dbproxy, DB.

What are the different services?
  Application-specific.
  Paper (5.3) mentions matching, messaging, profile editor, photos.
  Login is probably also a separate service.

Why this privilege separation arrangement?
  Most bugs will be in svc code.
    Lots of them, relatively complex.
    Written by online dating experts, not security experts.
  Hopefully a security expert writes okld, okd, db proxies, &c.
  Each svc can only get at relevant data from the DB.
    dbproxy restricts what queries each service can use.
    Can't read/write files, or affect other components.
  Thus a buffer overflow in the profile editor won't expose passwords.
    Though it may let attacker read/write any user's profile.

What harm if each component is compromised? How vulnerable is each?
  harm == privileges
  attack surface == avenues by which attacker could tickle bugs
  okld:
    privileges: superuser access to web server machine.
    attack surface: small (no user input other than svc exit).
  okd:
    privileges: intercept all user HTTP reqs/responses, steal passwords.
    attack surface: parsing the first line of HTTP request.
  pubd:
    privileges: some file system access, could corrupt templates.
    attack surface: requests to fetch templates from okd.
  oklogd:
    privileges: change/delete log entries -- cover attacker's tracks.
    attack surface: log messages from okd, okld, svcs.
  service:
    privileges: service's data for any user, requests to dbproxy.
    attack surface: HTTP requests, DB content (!).
  dbproxy:
    privileges: access/change all data in the database it's talking to.
    attack surface:
      requests from authorized services
      requests from unauthorized services (easy to drop)

How powerful is separation by service?
  Does it prevent a successful attacker from seeing anyone else's data?
  Would it make sense to separate by user instead?
    For reading messages?
    For matching and viewing profiles?

Where should an attacker look for weaknesses?
  Probably lots of bugs in svc implementations.
    Maybe not so bad for the "friend status" service.
    Bad that a bug in the e-mail svc means I can read/write your e-mail.
    Very bad if there are bugs in the password authentication service.
    Hopefully sensitive services have few lines of code (== few bugs).
  Bugs in the OS kernel.
    Code injected into a svc might be able to exploit a kernel bug
      to become superuser and break out of chroot.
  Bugs in okd URL parsing.
  Bugs in DB proxies (SQL injection, too-permissive queries).

Why are database proxies separate? Why not let svcs talk to the DB?
  DB accepts general SQL queries, can retrieve/modify anything.
  dbproxy accepts RPCs (not SQL); dbproxy generates SQL and talks to DB.
    Thus svc code isn't subject to SQL injection.
  dbproxy knows which queries each svc is allowed to make.
    This is where the security policy lives: the svc / query matrix.
    A knowledgeable developer must maintain dbproxy.

How does a dbproxy know what svc is talking to it?
  dbproxies are on separate DB machines; svcs use TCP sockets to connect.
  You can't tell much from TCP about who connected to you.
  So:
    Each svc has a unique secret 20-byte token.
    svc supplies its token in the RPC when talking to dbproxy.
    dbproxy has a list of allowed queries for each token.

Where does the 20-byte token come from?
  okld reads it from the config file, passes it to the svc.

What if a token is disclosed (this is the homework question)?
  e.g. the e-mail service's token is disclosed.
  There is no immediate problem.
  But if any svc were then compromised, the attacker could read all
    users' e-mail, even w/o breaking into the e-mail svc.
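The dbproxy side of this protocol can be sketched as a default-deny (token, query) matrix; this is a minimal illustration, not OKWS code, and the tokens and query names below are made-up placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical svc/query matrix: each row pairs a service's secret
 * 20-byte token with one query that service may make.  Tokens and
 * query names are placeholders, not values from the paper. */
struct rule { const char *token; const char *query; };

static const struct rule allowed[] = {
    { "AAAAAAAAAAAAAAAAAAAA", "get_matches"  },   /* matching svc  */
    { "BBBBBBBBBBBBBBBBBBBB", "get_messages" },   /* messaging svc */
};

/* Default deny: an RPC runs only if its (token, query) pair is listed. */
int query_allowed(const char *token, const char *query) {
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (memcmp(allowed[i].token, token, 20) == 0 &&
            strcmp(allowed[i].query, query) == 0)
            return 1;
    return 0;
}
```

Note how this captures the homework question above: anyone holding a token can make exactly that service's queries, so a disclosed token plus any compromised svc gives away that service's data.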
What if an exploited svc tries to read tokens from okld's config file?
  Or tries to use gdb to look inside another service process?
  Or tries to read a token from an svc's core dump file?
  We need help from the O/S to enforce isolation!

What are the mechanisms for isolation and control over sharing?
  Paper uses Unix processes, user IDs (UIDs), file permissions, and fd passing.

What is setuid(uid)?
  A process can drop its privileges from root to an ordinary uid.

What is chroot(dirname)?
  Causes / to refer to dirname for this process and its descendants,
    so they can't name files outside of dirname.

What is FD passing?
  One process opens a network connection and passes the file descriptor
    for it to another process.
  For example, okld passes the file descriptor for port 80 to okd.

How does OKWS enforce isolation between components in Figure 1?
  okld runs each service with a separate UID.
    [In Lab 2, you will run each service as a separate container.]
    So services can't read/write each other's memory.
  okld uses chroot to prevent processes from seeing most files.
    Table 1: pubd and oklogd can only get at their own files.
  okld runs as root (for setuid() and to allocate TCP port 80).
    So we want it to do as little as possible!

Why is okd a separate process?
  We need a way to route HTTP requests to the right svc.
  okd sees all requests, so we don't want to do anything else in okd.
  Note okd does *not* run as superuser; okld gives it port 80.

Why is oklogd a separate process?
  We don't want a corrupt svc to delete/overwrite log files.

Why is pubd a separate process?
  Keeps file handling code out of svcs.

Table 1: why are all services and okld in the same chroot?
  We want to chroot okld -- it may have bugs too.
  okld needs to re-launch okd + services.
  So okd and services need to live somewhere in okld's chroot jail.

What are we exposing by having okld, okd, and svc share a chroot jail?
  Readable: shared libraries containing service code.
  Writable: each service can write to its own /cores/.

Where's the config file?
  /etc/okws_config; maybe okld reads it on startup, before it chroots.
  oklogd & pubd have separate chroots because they use files.
    So okld must start oklogd and pubd before it chroots itself.

Why a separate UID for each service?
  kill, ptrace, core files.

Why a separate GID for each service?
  So a svc can execute its binary but not read/write/chmod it.
  Binary owned by root, and x-only for the svc's GID: rwx--x---
  Thus a svc can't read secrets out of its executable,
    and can't modify it to persist an attack.

How do OKWS components interact?
  okld sets up socketpairs (bidirectional pipes) for each service:
    Control RPCs.
    Logging.
    HTTP connections, okd -> svc.
  Services talk to the DB proxy over TCP.
  Most state is in the DB; most interaction is via state in the DB.

UNIX process-level isolation tools are hard to use.
  Many global name-spaces: files, UIDs, PIDs, ports.
    Each may allow processes to see what others are up to.
    Each is an invitation for bugs or careless set-up.
  No idea of "default to no access".
    Thus hard for the designer to reason about what a process can do.
  No fine-grained grants of privilege.
    Can't say "process can read only these three files."
  chroot() and setuid() can only be used by the superuser.
    So non-superusers can't reduce/limit their own privilege.
    Awkward, since security suggests *not* running as superuser.

Lab 2 uses Linux containers (lxc).
  Containers didn't exist when the author built OKWS.
  Containers provide the illusion of virtual machines without using
    virtual machines, and are more efficient than virtual machines.
  A container is a Linux process, but strongly isolated:
    Limited access to the kernel name spaces.
    Limited access to system calls.
    No access to the file system.
  Containers behave like a virtual machine:
    Started from a VM image.
    Have their own IP address.
    Have their own file system.
  Lab 2 uses *unprivileged* containers.
    These containers run as non-root user processes.
    Even if the process inside the container runs as root, it still has
      limited privileges.
    More difficult to break out of a container than a chrooted process.
  Lab 2 also uses chroot/UIDs to privilege-separate processes within one
    container (profile).
    But that is the exception; Lab 2 mostly relies on containers.

Using containers for privilege separation:
  Plan: turn a single-process application into a virtual "distributed"
    application.
    Create a container for each service.
    Copy the right files into the container.
    Assign it its own IP address.
    Use RPC over TCP to communicate with other containers.
  Limit communication between containers:
    Set up firewall rules to limit communication between containers.
  Lab 2 has zookld that does this; similar to okld.
  The Google architecture paper uses physical machines to split services.
    Containers support the same idea on a single physical machine.

What has happened since 2004, when OKWS was published?
  OKWS itself is still (probably) used at OkCupid, but not elsewhere.
  C++ isn't popular for web programming.
  UNIX process-level isolation tools are still hard to use.
  Fine partitioning is hard; tension with fast development and evolution.
  OKWS partitioning not v. useful if you have only one important service.
  Privilege separation is commonly used in practice:
    Load balancer, login svc, profile svc, password DB, profile DB.
    The Google architecture paper is a good example.
      Even finer-grained than OKWS: per-user tickets.
  Some systems with OKWS-style fine-grained partitioning:
    ssh-agent process to hold crypto keys, vs ssh itself.
    Chrome runs each frame in an isolated process.
  Many new isolation tools, better than processes/chroot:
    VMs, FreeBSD jail, Linux containers (Docker), Linux seccomp, ...