6.858 2017 Lecture 5: Privilege Separation, Isolation, and OKWS

The problem, again: bugs
  No-one knows how to prevent programmers from making mistakes.
  Many bugs cause security problems.
  We need ways to limit the damage from as-yet-unknown bugs.

How does this paper's plan differ from the last paper's (Baggy bounds)?
  Baggy focuses on a specific class of bugs
  OKWS is bug-agnostic; many other bugs can arise (SQL injection, file access, ..)
  Baggy does not always prevent exploits
  OKWS never prevents exploits but does limit damage
  Baggy crashes process if it detects a problem
  OKWS runs separate process for each service, so crash affects only one service

Example: traditional web server setup (Apache).
  Apache runs N identical processes, handling HTTP requests.
  Each Apache process has all application code:
    executes requests for many users;
    executes lots of different kinds of requests (log in, read e-mail, &c).
  Storage: SQL database stores application state (passwords, cookies, messages, &c).
  This arrangement is pretty vulnerable to bugs:
    Memory bugs may let attacker read/write lots of app's data.
    Bugs in file handling may give access to sensitive files.
      e.g. open("/profiles/" + user)
      what if browser sets user=../etc/passwd or ../mail/rtm
    SQL injection may let attacker r/w all DB data.
    Buffer overflow / code injection gives access to all data. 
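
  A minimal sketch of the file-handling bug above, assuming the application
  builds the path by string concatenation as in the open() example. The
  helper names (naive_path, resolves_inside_root) are hypothetical, and
  the check is purely lexical: it does not follow symlinks.

```python
import os.path

PROFILE_ROOT = "/profiles"  # directory used in the open() example


def naive_path(user):
    # Mirrors open("/profiles/" + user): plain string concatenation.
    return PROFILE_ROOT + "/" + user


def resolves_inside_root(user):
    # Hypothetical check: collapse ".." components lexically, then require
    # that the result still lies under PROFILE_ROOT.
    resolved = os.path.normpath(naive_path(user))
    return resolved.startswith(PROFILE_ROOT + "/")


if __name__ == "__main__":
    print(os.path.normpath(naive_path("../etc/passwd")))
    print(resolves_inside_root("alice"))
    print(resolves_inside_root("../mail/rtm"))
```

  A check like this has to be remembered at every open() call, which is
  part of why the notes look for isolation that does not depend on
  application code being bug-free.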

Big idea: privilege separation
  divide up the s/w and data to limit damage from bugs
  what kinds of separation make sense?
    regular user access vs access by tech support agents?
      not helpful: same database privileges
    update user profile vs look at friend profiles
      somewhat helpful: read-only vs read-write access
    messaging vs profile access
      big separation in terms of database privileges!
  designer must choose the separation scheme(s):
    by type of data (friend lists vs passwords)
    by user (my e-mail vs your e-mail)
    by bugginess (image resizing vs everything else)
    by exposure to direct attack (HTTP parsing vs everything else)
    by inherent privilege (hide superuser processes; hide the DB)

Big idea: isolation
  construct walls between units of privilege separation
  to prevent exploits in one unit from spreading to others

These two ideas have been very successful
  e.g. client/server, virtual machines, sandboxing, linux containers, &c
  Challenges:
    Separation vs sharing.
    Separation vs performance.
    Hard to use O/S to enforce isolation and control sharing.

What are the Unix mechanisms for isolation and control over sharing?
  Unix is the context in which the OKWS paper and Lab 2 live.
  Unix actions are taken by processes.
    A process is a running program.
    Processes are the most basic Unix tool for keeping code/data separate.
    A process's user ID (UID) controls many of its privileges.
      A UID is a small integer.
      Superuser (UID=0) bypasses most checks.
      A process also has a set of group IDs (GIDs) used in file permissions.
  Sharing often depends on naming.
    If a process can name something, it can often access it.
    More important: if it *can't* name something, it usually *can't* use it.
      We can isolate a process by limiting what names it can use.
    So we want to know about the name-spaces Unix provides:
      PIDs, UIDs, memory, files, file descriptors, network connections.
  What types of objects does Unix let processes manipulate?
    I.e. what do we need to control to enforce isolation, allow precise sharing?
    Processes.
      Processes with same UID can kill and debug (ptrace) each other.
      Otherwise not much direct interaction is allowed.
      So: processes are reasonably well isolated for different UIDs.
    Process memory.
      One process cannot directly name or access memory in another process.
      Exceptions: ptrace, memory mapped files.
      So: process memory is reasonably well isolated for different UIDs.
    Files, directories.
      File operations: read, write, execute, change perms, ..
      Directory operations: lookup, create, remove, rename, change perms, ..
      Each inode has an owner user and group.
      Each inode has read, write, execute perms for user, group, others.
        E.g. rtm staff rwxr-x---
      Who can change a file's permissions?  Only its owner (process UID) or the superuser.
      Execute for directory means being able to look up names (but not ls, which needs r).
      Checks for process opening file /etc/passwd:
        Must be able to look up 'etc' in /, 'passwd' in /etc (x permission).
        Must be able to open /etc/passwd (r or w permission).
      Unix rwx scheme is simple but not very expressive;
        cannot e.g. have two owners, or permissions for specific users.
      So: can control which processes (UIDs) can access a specific file.
        But hard to control the set of files a specific process can access.
    File descriptors (FDs).
      A process has one FD per open file and open IPC/network connection.
      Processes cannot see or interfere with each others' FDs.
      Processes can pass file descriptors (via Unix domain sockets).
      So: FDs are well isolated -- process-local names, not global.
    Local IPC -- "Unix domain sockets" -- "socketpair".
      OKWS uses these for most of its inter-server communication.
      As used by OKWS, they have no names.
      A process can create a connection -- gets two FDs.
      It can then give the connection end FDs to other processes,
        either via fork/exec or by sending over existing connections.
      So: Unix domain connections are well isolated.
    TCP/IP (Internet) connections.
      Servers listen on ports -- 16-bit numbers, e.g. http is 80.
        OKWS DB server and proxies probably listen on TCP/IP ports.
        Only superuser is allowed to bind to (listen on) ports < 1024, e.g. 80.
      Anyone can try to connect to any port as a client.
        Servers can ignore; firewalls can block.
        Servers can't directly tell who the client is.
      Only the two end-points can send/receive on an existing connection.
        (not really true; bad people may snoop/inject on network)
      So: servers have to be careful who they talk to.
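
  A sketch of the socketpair and FD-passing mechanisms described above.
  It assumes Python 3.9+ on a Unix system (socket.send_fds and
  socket.recv_fds). For brevity both ends live in one process; in a real
  deployment each end would be held by a different process. The function
  name pass_fd_demo is hypothetical.

```python
import os
import socket
import tempfile


def pass_fd_demo():
    # An unnamed, connected pair of Unix domain sockets: two FDs, no
    # global name that another process could look up.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f:
        f.write(b"hello from the sender\n")
        f.flush()
        # Send the open file's descriptor as ancillary data.
        socket.send_fds(left, [b"fd"], [f.fileno()])
        msg, fds, _flags, _addr = socket.recv_fds(right, 16, 1)
    left.close()
    right.close()
    # The received FD is a new process-local number that refers to the
    # same open file, so it remains usable after the sender closed its own.
    with os.fdopen(fds[0], "rb") as received:
        received.seek(0)
        return msg, received.read()


if __name__ == "__main__":
    print(pass_fd_demo())
```

  The receiver never needs to name the file; it can use only what the
  sender chose to hand over.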

How is a process's UID set?
  Superuser (UID 0) can call setuid(uid) and setgid(gid).
  Non-superuser processes can't change their UID (other than by exec'ing a setuid binary).
  UID/GID often initially set by login, from /etc/passwd.
  UID inherited during fork(), exec().
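
  A small sketch that checks the inheritance rule above: a forked child,
  and a freshly exec'ed program, report the same UID as the parent. The
  function names are hypothetical; it assumes a Unix system with
  os.fork available.

```python
import os
import subprocess
import sys


def uid_in_forked_child():
    # The child reports its UID back to the parent over a pipe.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        os.write(w, str(os.getuid()).encode())
        os._exit(0)
    os.close(w)
    with os.fdopen(r, "rb") as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data)


def uid_after_exec():
    # subprocess.run forks and then execs a new Python interpreter.
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getuid())"],
        capture_output=True, check=True, text=True,
    )
    return int(out.stdout)


if __name__ == "__main__":
    print(os.getuid(), uid_in_forked_child(), uid_after_exec())
```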

One more Unix isolation trick: chroot()
  Problem: it is too hard to ensure that there are no
    sensitive files that a program can read, or write;
    100,000+ files in a typical Unix install; applications
    are often careless about setting permissions.
  Solution: chroot(dirname)
    causes / to refer to dirname for this process and descendants,
    so they can't name files outside of dirname.
  e.g. chroot("/var/okws/run") causes subsequent absolute pathnames
    to start at /var/okws/run, not the real /.
    Thus the program can only name files/dirs under /var/okws/run.
  chroot() is typically used to prevent a process from interacting
    at all with other processes via files, i.e. complete isolation.
  chroot() is effective only for non-root processes
    (Process running as root can reset chroot.)
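
  A sketch of how a superuser launcher might combine chroot() with
  dropping privileges, under stated assumptions: the function names and
  the UID/GID 65534 are hypothetical, and the confinement steps succeed
  only when the caller is superuser. It is not taken from the OKWS code.

```python
import os
import tempfile


def run_confined(new_root, uid, gid, fn):
    """Fork; in the child, chroot to new_root, drop to uid/gid, call fn().

    Returns the child's exit status: 0 if fn() returned a true value,
    2 if it returned a false value, 1 if any step failed (for example
    because the caller is not superuser).
    """
    pid = os.fork()
    if pid == 0:
        status = 1
        try:
            os.chroot(new_root)  # "/" now refers to new_root
            os.chdir("/")        # do not keep a working directory outside it
            os.setgroups([])     # drop supplementary groups
            os.setgid(gid)       # change GID first, while still privileged
            os.setuid(uid)       # no longer root: cannot reset the chroot
            status = 0 if fn() else 2
        except OSError:
            status = 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status)


def demo():
    jail = tempfile.mkdtemp()
    os.chmod(jail, 0o755)  # let the unprivileged UID list the new root
    # Inside the jail the child should see an empty "/".
    return run_confined(jail, 65534, 65534, lambda: os.listdir("/") == [])


if __name__ == "__main__":
    print(demo())
```

  The order matters: setuid() comes last because the earlier calls need
  superuser privilege, and chdir("/") follows chroot() so the process
  holds no directory outside the new root.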

Overall, Unix is awkward at precisely-controlled isolation+sharing:
  Many global name-spaces: files, UIDs, PIDs, ports.
    Each may allow processes to see what others are up to.
    Each is an invitation for bugs or careless set-up.
  No idea of "default to no access".
    Thus hard for designer to reason about what a process can do.
  No fine-grained grants of privilege.
    Can't say "process can read only these three files."
    Privileges are coarse-grained, via UID, or implicit, e.g. wait() for children.
  chroot() and setuid() can only be used by superuser.
    So non-superusers can't reduce/limit their own privilege.
    Awkward since security suggests *not* running as superuser.
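
  To make "coarse-grained, via UID" concrete, here is a simplified model
  of the rwx check described earlier. It is a sketch, not the kernel's
  logic: it ignores ACLs, capabilities, and special cases, and it treats
  superuser as always allowed. The numeric IDs below are hypothetical.

```python
R, W, X = 4, 2, 1  # permission bits within one rwx triple


def may_access(mode, file_uid, file_gid, proc_uid, proc_gids, want):
    """Return True if the process may perform `want` (an OR of R, W, X)."""
    if proc_uid == 0:
        return True  # simplification: superuser bypasses most checks
    if proc_uid == file_uid:
        bits = (mode >> 6) & 7   # owner triple
    elif file_gid in proc_gids:
        bits = (mode >> 3) & 7   # group triple
    else:
        bits = mode & 7          # others triple
    return (bits & want) == want


if __name__ == "__main__":
    RTM, STAFF = 1000, 50        # hypothetical UID for rtm, GID for staff
    mode = 0o750                 # rwxr-x---, as in "rtm staff rwxr-x---"
    print(may_access(mode, RTM, STAFF, RTM, {STAFF}, W))
    print(may_access(mode, RTM, STAFF, 1001, {STAFF}, R))
    print(may_access(mode, RTM, STAFF, 1001, {STAFF}, W))
    print(may_access(mode, RTM, STAFF, 1002, {60}, R))
```

  The decision depends only on the file's owner, group, and mode and on
  the process's UID and GIDs. There is nowhere to express "this process
  may read only these three files".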

How does OKWS partition the web server?
  Figure 1 in paper.
  How does a request flow in this web server?
    okld starts all other processes, from a config file.
    okd -> oklogd
        -> pubd
        -> svc -> dbproxy
               -> oklogd
  How does this design map onto physical machines?
    Probably many front-end machines (okld, okd, pubd, oklogd, svc)
    Several DB machines (dbproxy, DB)
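
  A sketch of the kind of routing decision okd has to make: look only at
  the first line of the HTTP request and pick a service. The route table,
  service names, and the exact matching rule are assumptions for
  illustration; the paper's okd may match requests differently.

```python
# Hypothetical mapping from URL path to service name.
SERVICES = {
    "/login": "login-svc",
    "/mail": "mail-svc",
}


def route(request_line):
    """Return the service name for an HTTP request line, or None."""
    parts = request_line.split()
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # malformed request line: refuse to route it
    _method, target, _version = parts
    path = target.split("?", 1)[0]  # ignore the query string
    return SERVICES.get(path)


if __name__ == "__main__":
    print(route("GET /mail?msg=3 HTTP/1.1"))
    print(route("GET /unknown HTTP/1.1"))
```

  Keeping this step tiny is the point: the dispatcher sees every request,
  so the less parsing it does, the smaller its attack surface.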

Why this privilege separation arrangement?
  Most bugs will be in svc code.
  So think "attacker has injected code into a svc; what can attacker do now?"
  High-level picture: OKWS isolates each svc so it can access only relevant data.
    E.g. buffer overflow in e-mail service won't let attacker see passwords.

How do these components interact?
  okld sets up socketpairs (bidirectional pipes) for each service.
    One socketpair for control RPC requests (e.g., "get a new log socketpair").
    One socketpair for logging (okld has to get it from oklogd first via RPC).
    For HTTP services: one socketpair for forwarding HTTP connections.
    For okd: the server-side FDs for HTTP services' socketpairs (HTTP+RPC).
  Services talk to DB proxy over TCP (connect by port number).
    Most state in DB, most interaction via state in DB.
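
  A sketch of the launcher pattern described above: the parent creates a
  socketpair, forks a service process that keeps one end, and talks to
  it over the other. Names (launch_service, demo) are hypothetical, and
  a real launcher would also chroot and change UID/GID in the child
  before running service code.

```python
import os
import socket


def launch_service(handler):
    """Fork a child that answers requests arriving on a socketpair."""
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()  # the child keeps only its own end
        status = 0
        try:
            while True:
                request = child_end.recv(4096)
                if not request:  # parent closed its end
                    break
                child_end.sendall(handler(request))
        except Exception:
            status = 1
        finally:
            os._exit(status)
    child_end.close()  # the parent keeps only its own end
    return pid, parent_end


def demo():
    pid, conn = launch_service(lambda req: b"echo:" + req)
    conn.sendall(b"ping")
    reply = conn.recv(4096)
    conn.close()
    os.waitpid(pid, 0)
    return reply


if __name__ == "__main__":
    print(demo())
```

  The connection has no name, so no other process can attach to it
  unless one of the two holders passes an FD along.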

How does OKWS enforce isolation between components in Figure 1?
  okld runs each service with a separate UID and GID.
    So services can't read/write each other's memory.
  okld uses chroot to confine each process to a separate directory (almost).
    Services can't read/write *any* files (system files, application state, &c).
    pubd and oklogd can only get at their own files.
  Why is okld a separate process?
    Must run as superuser to bind to port 80, call chroot() and setuid().
    We want as little code as possible to run as superuser.
  Why is okd a separate process?
    We need a way to route HTTP requests to the right svc.
    okd sees all requests, so we don't want to do anything else in okd.
    note okd does *not* run as superuser; okld gives it port 80.
  Why is oklogd a separate process?
    We don't want corrupt svc to delete/overwrite log files.
    More generally we don't want svcs to have any access to files (too bug-prone).
  Why is pubd a separate process?
    Keeps file handling code out of svcs.
  Why are database proxies separate? Why not let svcs talk to the DB?
    Ensures that a compromised service cannot fetch data it shouldn't see.
      DB proxy protocol defined by app developer, depending on what app requires.
      Proxy enforces overall query structure (select, update),
        but allows client to fill in query parameters.
    Where does the 20-byte token come from?  Passed as arguments to service.
    Who checks the token?  DB proxy has list of tokens (& allowed queries?)
    Who generates token?  Not clear; manual by system administrator?
    What if token disclosed (this is The Question)?
      e.g. compromised newsletter svc could issue queries as e-mail service.
        and read/write any user's e-mail.
  Table 1: why are all services and okld in the same chroot?  Is it a problem?
    How would we decide?  What are the readable, writable files there?
    Readable: shared libraries containing service code.
    Writable: each service can write to its own /cores/<uid>.
    Where's the config file?
      /etc/okws_config; maybe okld reads it on startup, before chroot.
    oklogd & pubd have separate chroots because they have important state:
      oklogd's chroot contains the log file, want to ensure it's not modified.
      pubd's chroot contains the templates, want to avoid disclosing them (?).
  Why a separate UID for each service?
  Why a separate GID for each service?
    So svc can execute its binary but not read/write/chmod it.
    Binary owned by root, and x-only for svc GID: rwx--x---
    Thus svc can't read secrets out of its executable, and can't
      modify it to persist an attack.
  Why not process per user? Or even process per user per service?
    Per-service isolation probably made sense for OkCupid given their apps.
      (i.e., perhaps they need a lot of sharing between users anyway?)
    Per-user isolation requires allocating UIDs per user, complicating okld,
      and reducing performance (though may still be OK for some use cases).
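
  A sketch of the DB proxy idea discussed above: each token unlocks a
  fixed set of query templates, and the caller supplies only parameters.
  Everything here is an assumption for illustration (token strings, table
  names, the DbProxy class, use of sqlite3); the paper does not specify
  the proxy's implementation at this level.

```python
import sqlite3

# Hypothetical tokens, each mapped to the query templates it may run.
ALLOWED = {
    "token-mail": {
        "read_inbox": "SELECT body FROM mail WHERE rcpt = ?",
    },
    "token-newsletter": {
        "list_subscribers": "SELECT addr FROM subscribers",
    },
}


class DbProxy:
    def __init__(self, db):
        self.db = db

    def call(self, token, query_name, params=()):
        templates = ALLOWED.get(token)
        if templates is None or query_name not in templates:
            raise PermissionError("query not allowed for this token")
        # The proxy fixes the query structure; parameters are bound,
        # never spliced into the SQL text.
        return self.db.execute(templates[query_name], params).fetchall()


def make_demo_proxy():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE mail (rcpt TEXT, body TEXT)")
    db.execute("CREATE TABLE subscribers (addr TEXT)")
    db.execute("INSERT INTO mail VALUES ('alice', 'hi alice')")
    db.execute("INSERT INTO subscribers VALUES ('bob@example.com')")
    return DbProxy(db)


if __name__ == "__main__":
    proxy = make_demo_proxy()
    print(proxy.call("token-mail", "read_inbox", ("alice",)))
    print(proxy.call("token-mail", "read_inbox", ("x' OR '1'='1",)))
```

  Note what this does not fix: a compromised mail service holding
  "token-mail" can still pass any rcpt value, and a leaked token lets
  another service issue the same queries, as the notes point out.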

What harm if each component compromised? How vulnerable is each?
  okld: superuser access to web server machine, but maybe not directly to DB.
    attack surface: small (no user input other than svc exit).
  okd: intercept/modify all user HTTP reqs/responses, steal passwords.
    attack surface: parsing the first line of HTTP request.
  pubd: corrupt templates, leverage to maybe exploit bug in some service?
    attack surface: requests to fetch templates from okd.
  oklogd: corrupt/ignore/remove/falsify log entries.
    attack surface: log messages from okd, okld, svcs
  service: read/write service's data for any user, send requests to dbproxy.
    attack surface: HTTP requests from users (+ control msgs from okd)
  dbproxy: access/change all user data in the database it's talking to.
    attack surface: requests from authorized services
                    requests from unauthorized services (easy to drop)

Where should an attacker look for weaknesses?
  Probably lots of bugs in svc implementations
    Maybe not so bad for the "friend status" service
    Bad that bug in e-mail svc means I can read/write your e-mail
    Very bad if bugs in the password authentication service
    Hopefully sensitive services have few lines of code (== few bugs)
  Bugs in OS kernel
    Code injected into a svc might be able to exploit a kernel bug
    to become superuser, break out of chroot.
  Bugs in okd URL parsing
  Bugs in DB proxies (SQL injection, too permissive).

How have these ideas evolved since 2004, when OKWS was published?
  OKWS itself still used at OkCupid, but not adopted elsewhere.
    C++ not a very popular language for web programming.
    Too much app-specific effort to partition services finely.
    Too much app-specific effort to build DB proxy interfaces.
  Many systems partition at coarse grain, where detailed sharing isn't needed.
    client/server, browser/server, VMs, Linux containers.
    Often partitioning is by independent service / application / web site.
  Some systems use fine-grained cooperating partitions, reminiscent of OKWS.
    ssh-agent process to hold crypto keys, vs ssh itself.
    Chrome runs each frame in an isolated process.
    Google's design, which we saw at a high level in an earlier lecture.
      okd = GFE
      Services, isolated by VMs
      Even per-user tokens, which OKWS doesn't have
    It takes a lot of work to build in this style.
  Much recent effort to make isolation both convenient and precise.
    FreeBSD jail, Linux containers (Docker), ..
    Next lecture: capabilities.