6.858 2019 Lecture 5: Privilege Separation, Isolation, and OKWS

The problem: what to do about bugs?

Plan A: find them, fix them, avoid making new ones.
  Huge progress here, e.g. with buffer overflows.
  Is this plan enough?

Example: traditional web server setup (Apache).
  Apache runs N identical processes, handling HTTP requests.
  Each Apache process has all application code:
    executes requests for many users;
    executes lots of different kinds of requests (log in, read e-mail, &c).
  Storage: SQL database w/ passwords, cookies, messages, &c.
  This arrangement is convenient and efficient.
  But it's secure only if the web code has no bugs!

History suggests that developers will keep inventing new bugs:
  Buffer overflow + code injection exposes whole DB to attacker.
  Memory bugs may let attacker read data from app memory.
  Missing access control checks before DB queries.
  Bugs in file handling may give access to sensitive files.
    e.g. open("/profiles/" + user)
    what about user=../etc/passwd or ../mail/rtm
  SQL injection may let attacker r/w all DB data.
    "SELECT email FROM users WHERE id = " + userid
So: every reason to expect we'll keep seeing bugs.
And: this "hard shell, soft inside" setup makes bugs devastating.

Plan B: build systems that are secure even if there are bugs.
  Can we do anything like this?

Big idea: privilege separation
  divide up the s/w and data to limit damage from bugs
  two related benefits:
    limit damage from successful exploit -- "least privilege"
    limit attacker's access to buggy code -- "attack surface"
  designer must choose the separation scheme(s):
    by service / type of data (friend lists vs passwords)
    by user (my e-mail vs your e-mail)
    by bugginess (image resizing vs everything else)
    by exposure to direct attack (HTTP parsing vs everything else)
    by inherent privilege (hide superuser processes; hide the DB)

Privilege separation is difficult
  Need to isolate (client/server, VMs, containers, processes, &c).
  Need to allow controlled interaction.
  Need to retain good performance.

We've seen these ideas in the Google Architecture paper.
Now we'll dig into a detailed case study, OkCupid's OKWS web server.
You'll use an OKWS-like setup in Lab 2!

How does OKWS partition the web server?
  Figure 1 in paper.

How does a request flow in this web server?
  okld starts all other processes, from a config file.
  okd -> oklogd -> pubd -> svc -> dbproxy -> DB -> oklogd

How does this design map onto physical machines?
  Many front-end machines, each with okld, okd, pubd, oklogd, svc*.
  A few DB machines, each with dbproxy, DB.

What are the different services?
  Application-specific.
  Paper (5.3) mentions matching, messaging, profile editor, photos.
  Login is probably also a separate service.

Why this privilege separation arrangement?
  Most bugs will be in svc code.
    Lots of them, relatively complex.
    Written by online dating experts, not security experts.
  Hopefully a security expert writes okld, okd, db proxies, &c.
  Each svc can only get at relevant data from the DB.
    dbproxy restricts what queries each service can use.
  Svcs can't read/write files, or affect other components.
  Thus a buffer overflow in the profile editor won't expose passwords.
    Though it may let the attacker read/write any user's profile.

What's the harm if each component is compromised? How vulnerable is each?
  harm == privileges
  attack surface == avenues by which attacker could tickle bugs
  okld:
    privileges: superuser access to web server machine.
    attack surface: small (no user input other than svc exit).
  okd:
    privileges: intercept all user HTTP reqs/responses, steal passwords.
    attack surface: parsing the first line of HTTP request.
  pubd:
    privileges: some file system access, could corrupt templates.
    attack surface: requests to fetch templates from okd.
  oklogd:
    privileges: change/delete log entries -- cover attacker's tracks.
    attack surface: log messages from okd, okld, svcs.
  service:
    privileges: service's data for any user, requests to dbproxy.
    attack surface: HTTP requests, DB content (!).
  dbproxy:
    privileges: access/change all data in the database it's talking to.
    attack surface:
      requests from authorized services
      requests from unauthorized services (easy to drop)

How powerful is separation by service?
  Does it prevent a successful attacker from seeing anyone else's data?
  Would it make sense to separate by user instead?
    For reading messages? For matching and viewing profiles?

Where should an attacker look for weaknesses?
  Probably lots of bugs in svc implementations.
    Maybe not so bad for the "friend status" service.
    Bad that a bug in the e-mail svc means I can read/write your e-mail.
    Very bad if there are bugs in the password authentication service.
    Hopefully sensitive services have few lines of code (== few bugs).
  Bugs in the OS kernel.
    Code injected into a svc might be able to exploit a kernel bug
      to become superuser, break out of chroot.
  Bugs in okd URL parsing.
  Bugs in DB proxies (SQL injection, too-permissive queries).

Why are database proxies separate? Why not let svcs talk to the DB?
  DB accepts general SQL queries, can retrieve/modify anything.
  dbproxy accepts RPCs (not SQL); dbproxy generates SQL and talks to DB.
    Thus svc code isn't subject to SQL injection.
  dbproxy knows which queries each svc is allowed to make.
    This is where the security policy lives: the svc / query matrix.
    A knowledgeable developer must maintain dbproxy.

How does a dbproxy know what svc is talking to it?
  dbproxies are on separate DB machines; svcs use TCP sockets to connect.
  You can't tell much from TCP about who connected to you.
  So:
    Each svc has a unique secret 20-byte token.
    svc supplies its token in the RPC when talking to dbproxy.
    dbproxy has a list of allowed queries for each token.
  Where does the 20-byte token come from?
    okld reads it from the config, passes it to the svc.

What if a token is disclosed (this is The Question)?
  e.g. the e-mail service's token is disclosed.
  There is no immediate problem.
  But, if any svc were then compromised, the attacker could read all
    users' e-mail, even w/o breaking into the e-mail svc.
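The dbproxy idea can be sketched in a few lines. This is a hypothetical Python sketch, not OKWS's actual C++ code: sqlite3 stands in for the real database, and the names `ALLOWED`, `QUERIES`, and `dbproxy_rpc` are made up for illustration. The point is the structure: the caller names a query and supplies its token; the proxy checks the token against its per-service policy and builds the SQL itself with bound parameters, so no service-supplied string is ever spliced into SQL.

```python
import sqlite3

# Hypothetical policy table: token -> set of allowed query names.
# (A real dbproxy would read this from its configuration.)
ALLOWED = {
    b"x" * 20: {"get_email"},   # e.g. the e-mail service's token
}

QUERIES = {
    # Parameterized SQL: the proxy, not the service, writes the SQL.
    "get_email": "SELECT email FROM users WHERE id = ?",
}

def dbproxy_rpc(db, token, query_name, args):
    # Reject callers whose token doesn't authorize this query.
    if query_name not in ALLOWED.get(token, set()):
        raise PermissionError("query not allowed for this token")
    # Arguments are bound, never concatenated, so a malicious
    # userid like "1; DROP TABLE users" stays inert data.
    return db.execute(QUERIES[query_name], args).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER, email TEXT)")
db.execute("INSERT INTO users VALUES (1, 'rtm@example.com')")
print(dbproxy_rpc(db, b"x" * 20, "get_email", (1,)))
```

Note that the svc never sends SQL at all: a compromised svc can only invoke the fixed queries its token authorizes.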
What if an exploited svc tries to read tokens from okld's config file?
  Or tries to use gdb to look inside another service process?
  Or tries to read a token from an svc's core dump file?
  We need help from the O/S to enforce isolation!

What are the Unix mechanisms for isolation and control over sharing?
  Unix is the context in which the OKWS paper and Lab 2 live.
  Unix actions are taken by processes.
    A process is a running program.
    Processes are the most basic Unix tool for keeping code/data separate.
  A process's user ID (UID) controls many of its privileges.
    A UID is a small integer.
    Superuser (UID=0) bypasses most checks.

What types of objects does Unix let processes manipulate?
  I.e. what must we control to enforce isolation, allow precise sharing?
  Processes.
    Processes with the same UID can kill and debug (ptrace) each other.
    Otherwise not much direct interaction is allowed.
    So: processes are reasonably well isolated for different UIDs.
  Process memory.
    One process cannot directly name or access memory in another process.
    Exceptions: ptrace, memory-mapped files.
    So: process memory is reasonably well isolated for different UIDs.
  Files, directories.
    Each file/directory has an owner user and group.
    Each has read, write, execute perms for user, group, others.
      E.g. rtm staff rwxr-x---
    Not very expressive -- e.g. not a full Access Control List.
    Who can change a file's permissions? Only its owner (process UID).
    So: can control which processes (UIDs) can access a specific file.
    But hard to control the set of files a specific process can access.
    And hard to ensure that every file has the right permissions!
      No notion of central policy.
    Files are a huge area of difficulty for isolation.
  File descriptors (FDs).
    A process has one FD per open file and open IPC/network connection.
    Processes cannot see or interfere with each other's FDs.
    So: FDs are well isolated -- process-local names, not global.
  Local IPC -- "Unix domain sockets" -- "socketpair".
    Similar to UNIX pipes; created as FD pairs.
    OKWS uses these for most of its inter-server communication.
    A process can give end(s) (FDs) to other processes,
      either via fork() or by sending over existing connections.
    So: well isolated, useful for setting up controlled communication.
  TCP/IP (Internet) connections.
    Servers listen on ports -- 16-bit numbers, e.g. http is 80.
    OKWS DB server and proxies probably listen on TCP/IP ports.
    Only the superuser is allowed to listen on ports < 1024, e.g. 80.
    Anyone can try to connect to any port as a client.
      Servers can ignore; firewalls can block.
    Servers can't directly tell who the client is.
    So: servers have to be careful who they talk to.
  chroot()
    Problem: it is too hard to ensure that there are no sensitive files
      that a program can read or write;
      100,000+ files in a typical Unix install;
      applications are often careless about setting permissions.
    Solution: chroot(dirname) causes / to refer to dirname for this
      process and its descendants, so they can't name files outside dirname.
    e.g. chroot("/var/okws/run") causes subsequent absolute pathnames
      to start at /var/okws/run, not the real /.
      Thus the program can only name files/dirs under /var/okws/run.
    chroot() is typically used to prevent a process from using files
      at all, or to limit use to a specific small set of files.
    chroot() is effective only for non-root processes.
      (A process running as root can reset chroot.)

Overall, Unix is awkward at precisely-controlled isolation+sharing:
  Many global name-spaces: files, UIDs, PIDs, ports.
    Each may allow processes to see what others are up to.
    Each is an invitation for bugs or careless set-up.
    No idea of "default to no access".
    Thus it's hard for the designer to reason about what a process can do.
  No fine-grained grants of privilege.
    Can't say "process can read only these three files."
  chroot() and setuid() can only be used by the superuser.
    So non-superusers can't reduce/limit their own privilege.
    Awkward, since security suggests *not* running as superuser.
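The socketpair/fork/chroot mechanisms above can be combined into an okld-style launch sketch. This is a hypothetical Python sketch, not OKWS's actual C++ code: `launch_service` is a made-up name, the "service" is reduced to an echo loop of one message, and uid/gid 65534 ("nobody" on typical Linux systems) and the temporary jail directory are placeholders. Since chroot() and setuid() require root, the privilege-drop step only runs when the sketch is executed as root; the socketpair plumbing works either way.

```python
import os
import socket
import tempfile

def launch_service(uid, gid, jail):
    # okld-style launch: make a socketpair, fork, and have the
    # child jail itself before handling requests on its end.
    parent_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        parent_end.close()
        if os.geteuid() == 0:
            # chroot() and setuid() need root, and chroot must come
            # first: after setuid we've given up the right to chroot.
            os.chroot(jail)
            os.chdir("/")     # don't keep a cwd outside the jail
            os.setgid(gid)    # drop group before user: setgid
            os.setuid(uid)    #   would fail once we're non-root
        # "Service": echo one request back over the socketpair.
        req = child_end.recv(100)
        child_end.sendall(b"handled: " + req)
        os._exit(0)
    child_end.close()
    return pid, parent_end

pid, sock = launch_service(65534, 65534, tempfile.mkdtemp())
sock.sendall(b"GET /")
print(sock.recv(100))    # b'handled: GET /'
os.waitpid(pid, 0)
```

Note how the parent keeps one socketpair end as its only channel to the child: even after the child is jailed and unprivileged, controlled communication still works, because FDs survive chroot() and setuid().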
How do OKWS components interact?
  okld sets up socketpairs (bidirectional pipes) for each service.
    Control RPCs.
    Logging.
    HTTP connections, okd -> svc.
  Services talk to the DB proxy over TCP.
  Most state is in the DB; most interaction is via state in the DB.

How does OKWS enforce isolation between components in Figure 1?
  okld runs each service with a separate UID.
    So services can't read/write each other's memory.
  okld uses chroot to prevent processes from seeing most files.
    Table 1: pubd and oklogd can only get at their own files.
  okld runs as root (for setuid() and to allocate TCP port 80).
    So we want it to do as little as possible!

Why is okd a separate process?
  We need a way to route HTTP requests to the right svc.
  okd sees all requests, so we don't want to do anything else in okd.
  Note okd does *not* run as superuser; okld gives it port 80.

Why is oklogd a separate process?
  We don't want a corrupt svc to delete/overwrite log files.

Why is pubd a separate process?
  Keeps file handling code out of svcs.

Table 1: why are all services and okld in the same chroot?
  We want to chroot okld -- it may have bugs too.
  okld needs to re-launch okd + services.
    So okd and services need to live somewhere in okld's chroot jail.
  What are we exposing by having okld, okd, and svc share a chroot jail?
    Readable: shared libraries containing service code.
    Writable: each service can write to its own /cores/.
  Where's the config file?
    /etc/okws_config; maybe okld reads it on startup, before chroot.
  oklogd & pubd have separate chroots because they use files.
    So okld must start oklogd and pubd before it chroots itself.

Why a separate UID for each service?
  kill, ptrace, core files

Why a separate GID for each service?
  So a svc can execute its binary but not read/write/chmod it.
  Binary owned by root, and x-only for svc GID: rwx--x---
  Thus a svc can't read secrets out of its executable,
    and can't modify it to persist an attack.

What has happened since 2004, when OKWS was published?
  OKWS itself is still (probably) used at OkCupid, but not elsewhere.
  C++ isn't popular for web programming.
  UNIX process-level isolation tools are hard to use.
  Fine partitioning is hard; there's tension with fast development and evolution.
  OKWS-style partitioning isn't very useful if you have only one important service.
  Many new isolation tools, better than processes/chroot:
    VMs, FreeBSD jail, Linux containers (Docker), Linux seccomp, ...
  Easier (but more expensive) to partition at coarse grain (server or VM):
    Load balancer, login svc, profile svc, password DB, profile DB.
    The Google architecture paper is a good example.
  Even finer-grained than OKWS: per-user tickets.
  Some systems with OKWS-style fine-grained partitioning:
    ssh-agent process to hold crypto keys, vs ssh itself.
    Chrome runs each frame in an isolated process.

Monday: more about UNIX and its tools for isolation.