6.858 2020 Lecture 5: Privilege Separation, Isolation, and OKWS

The problem: what to do about bugs?

Plan A: find them, fix them, avoid making new ones.
  Huge progress here, e.g. with buffer overflows.
  Is this plan enough?

Example: traditional web server setup (Apache).
  Apache runs N identical processes, handling HTTP requests.
  Each Apache process has all application code:
    executes requests for many users;
    executes lots of different kinds of requests (log in, read e-mail, &c).
  Storage: SQL database w/ passwords, cookies, messages, &c.
  This arrangement is convenient and efficient.
  But it's secure only if the web code has no bugs!

History suggests that developers will keep inventing new bugs:
  Buffer overflow + code injection exposes whole DB to attacker.
  Memory bugs may let attacker read data from app memory.
  Missing access control checks before DB queries.
  Bugs in file handling may give access to sensitive files.
    e.g. open("/profiles/" + user)
    what about user=../etc/passwd or ../mail/rtm
  SQL injection may let attacker r/w all DB data.
    "SELECT email FROM users WHERE id = " + userid
  So: every reason to expect we'll keep seeing bugs.
  And: this "hard shell, soft inside" setup makes bugs devastating.

Plan B: build systems that are secure even if there are bugs.
  Can we do anything like this?

Big idea: privilege separation
  divide up the s/w and data to limit damage from bugs
  two related benefits:
    limit damage from successful exploit -- "least privilege"
    limit attacker's access to buggy code -- "attack surface"
  designer must choose the separation scheme(s):
    by service / type of data (friend lists vs passwords)
    by user (my e-mail vs your e-mail)
    by bugginess (image resizing vs everything else)
    by exposure to direct attack (HTTP parsing vs everything else)
    by inherent privilege (hide superuser processes; hide the DB)

Privilege separation is difficult
  Need to isolate (client/server, VMs, containers, processes, &c).
  Need to allow controlled interaction.
  Need to retain good performance.

We've seen these ideas in the Google Architecture paper.
Now we'll dig into a detailed case study, OkCupid's OKWS web server.
You'll use an OKWS-like setup in Lab 2!

How does OKWS partition the web server?
  Figure 1 in paper.

How does a request flow in this web server?
  okld starts all other processes, from a config file.
  okd -> oklogd -> pubd -> svc -> dbproxy -> DB -> oklogd

How does this design map onto physical machines?
  Many front-end machines, each with okld, okd, pubd, oklogd, svc*.
  A few DB machines, each with dbproxy, DB.

What are the different services?
  Application-specific.
  Paper (5.3) mentions matching, messaging, profile editor, photos.
  Login is probably also a separate service.

Why this privilege separation arrangement?
  Most bugs will be in svc code.
    Lots of them, relatively complex.
    Written by online dating experts, not security experts.
  Hopefully a security expert writes okld, okd, db proxies, &c.
  Each svc can only get at relevant data from the DB.
    dbproxy restricts what queries each service can use.
    Can't read/write files, or affect other components.
  Thus a buffer overflow in the profile editor won't expose passwords.
    Though it may let attacker read/write any user's profile.

What harm if each component is compromised? How vulnerable is each?
  harm == privileges
  attack surface == avenues by which attacker could tickle bugs
  okld:
    privileges: superuser access to web server machine.
    attack surface: small (no user input other than svc exit).
  okd:
    privileges: intercept all user HTTP reqs/responses, steal passwords.
    attack surface: parsing the first line of HTTP request.
  pubd:
    privileges: some file system access, could corrupt templates.
    attack surface: requests to fetch templates from okd.
  oklogd:
    privileges: change/delete log entries -- cover attacker's tracks.
    attack surface: log messages from okd, okld, svcs.
  service:
    privileges: service's data for any user, requests to dbproxy.
    attack surface: HTTP requests, DB content (!).
  dbproxy:
    privileges: access/change all data in the database it's talking to.
    attack surface:
      requests from authorized services
      requests from unauthorized services (easy to drop)

How powerful is separation by service?
  Does it prevent a successful attacker from seeing anyone else's data?
  Would it make sense to separate by user instead?
    For reading messages?
    For matching and viewing profiles?

Where should an attacker look for weaknesses?
  Probably lots of bugs in svc implementations.
    Maybe not so bad for the "friend status" service.
    Bad that a bug in the e-mail svc means I can read/write your e-mail.
    Very bad if there are bugs in the password authentication service.
    Hopefully sensitive services have few lines of code (== few bugs).
  Bugs in the OS kernel.
    Code injected into a svc might be able to exploit a kernel bug
      to become superuser and break out of chroot.
  Bugs in okd URL parsing.
  Bugs in DB proxies (SQL injection, too-permissive queries).

Why are database proxies separate? Why not let svcs talk to the DB?
  DB accepts general SQL queries, can retrieve/modify anything.
  dbproxy accepts RPCs (not SQL); dbproxy generates SQL and talks to DB.
    Thus svc code isn't subject to SQL injection.
  dbproxy knows which queries each svc is allowed to make.
    This is where the security policy lives: the svc / query matrix.
    A knowledgeable developer must maintain dbproxy.

How does a dbproxy know what svc is talking to it?
  dbproxies are on separate DB machines; svcs use TCP sockets to connect.
  You can't tell much from TCP about who connected to you.
  So:
    Each svc has a unique secret 20-byte token.
    svc supplies its token in the RPC when talking to dbproxy.
    dbproxy has a list of allowed queries for each token.

Where does the 20-byte token come from?
  okld reads it from the config file, passes it to the svc.

What if a token is disclosed (this is the homework question)?
  e.g. the e-mail service's token is disclosed.
  There is no immediate problem.
  But if any svc were then compromised, the attacker could read all
    users' e-mail, even w/o breaking into the e-mail svc.
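The dbproxy side of this protocol can be sketched as a default-deny (token, query) matrix; this is a minimal illustration, not OKWS code, and the tokens and query names below are made-up placeholders.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical svc/query matrix: each row pairs a service's secret
 * 20-byte token with one query that service may make.  Tokens and
 * query names are placeholders, not values from the paper. */
struct rule { const char *token; const char *query; };

static const struct rule allowed[] = {
    { "AAAAAAAAAAAAAAAAAAAA", "get_matches"  },   /* matching svc  */
    { "BBBBBBBBBBBBBBBBBBBB", "get_messages" },   /* messaging svc */
};

/* Default deny: an RPC runs only if its (token, query) pair is listed. */
int query_allowed(const char *token, const char *query) {
    for (size_t i = 0; i < sizeof allowed / sizeof allowed[0]; i++)
        if (memcmp(allowed[i].token, token, 20) == 0 &&
            strcmp(allowed[i].query, query) == 0)
            return 1;
    return 0;
}
```

Note how this captures the homework question above: anyone holding a token can make exactly that service's queries, so a disclosed token plus any compromised svc gives away that service's data.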
What if an exploited svc tries to read tokens from okld's config file?
  Or tries to use gdb to look inside another service process?
  Or tries to read a token from an svc's core dump file?
  We need help from the O/S to enforce isolation!

What are the mechanisms for isolation and control over sharing?
  Paper uses Unix processes, user IDs (UIDs), file permissions, and fd passing.

What is setuid(uid)?
  A process can drop its privileges from root to an ordinary uid.

What is chroot(dirname)?
  Causes / to refer to dirname for this process and its descendants,
    so they can't name files outside of dirname.

What is FD passing?
  One process opens a network connection and passes the file descriptor
    for it to another process.
  For example, okld passes the file descriptor for port 80 to okd.

How does OKWS enforce isolation between components in Figure 1?
  okld runs each service with a separate UID.
    [In Lab 2, you will run each service as a separate container.]
    So services can't read/write each other's memory.
  okld uses chroot to prevent processes from seeing most files.
    Table 1: pubd and oklogd can only get at their own files.
  okld runs as root (for setuid() and to allocate TCP port 80).
    So we want it to do as little as possible!

Why is okd a separate process?
  We need a way to route HTTP requests to the right svc.
  okd sees all requests, so we don't want to do anything else in okd.
  Note okd does *not* run as superuser; okld gives it port 80.

Why is oklogd a separate process?
  We don't want a corrupt svc to delete/overwrite log files.

Why is pubd a separate process?
  Keeps file handling code out of svcs.

Table 1: why are all services and okld in the same chroot?
  We want to chroot okld -- it may have bugs too.
  okld needs to re-launch okd + services.
  So okd and services need to live somewhere in okld's chroot jail.

What are we exposing by having okld, okd, and svc share a chroot jail?
  Readable: shared libraries containing service code.
  Writable: each service can write to its own /cores/.

Where's the config file?
  /etc/okws_config; maybe okld reads it on startup, before it chroots.
  oklogd & pubd have separate chroots because they use files.
    So okld must start oklogd and pubd before it chroots itself.

Why a separate UID for each service?
  kill, ptrace, core files.

Why a separate GID for each service?
  So a svc can execute its binary but not read/write/chmod it.
  Binary owned by root, and x-only for the svc's GID: rwx--x---
  Thus a svc can't read secrets out of its executable,
    and can't modify it to persist an attack.

How do OKWS components interact?
  okld sets up socketpairs (bidirectional pipes) for each service:
    Control RPCs.
    Logging.
    HTTP connections, okd -> svc.
  Services talk to the DB proxy over TCP.
  Most state is in the DB; most interaction is via state in the DB.

UNIX process-level isolation tools are hard to use.
  Many global name-spaces: files, UIDs, PIDs, ports.
    Each may allow processes to see what others are up to.
    Each is an invitation for bugs or careless set-up.
  No idea of "default to no access".
    Thus hard for the designer to reason about what a process can do.
  No fine-grained grants of privilege.
    Can't say "process can read only these three files."
  chroot() and setuid() can only be used by the superuser.
    So non-superusers can't reduce/limit their own privilege.
    Awkward, since security suggests *not* running as superuser.

Lab 2 uses Linux containers (lxc).
  Containers didn't exist when the author built OKWS.
  Containers provide the illusion of virtual machines without using
    virtual machines, and are more efficient than virtual machines.
  A container is a Linux process, but strongly isolated:
    Limited access to the kernel name spaces.
    Limited access to system calls.
    No access to the file system.
  Containers behave like a virtual machine:
    Started from a VM image.
    Have their own IP address.
    Have their own file system.
  Lab 2 uses *unprivileged* containers.
    These containers run as non-root user processes.
    Even if the process inside the container runs as root, it still has
      limited privileges.
    More difficult to break out of a container than a chrooted process.
  Lab 2 also uses chroot/UIDs to privilege-separate processes within one
    container (profile).
    But that is the exception; Lab 2 mostly relies on containers.

Using containers for privilege separation:
  Plan: turn a single-process application into a virtual "distributed"
    application.
    Create a container for each service.
    Copy the right files into the container.
    Assign it its own IP address.
    Use RPC over TCP to communicate with other containers.
  Limit communication between containers:
    Set up firewall rules to limit communication between containers.
  Lab 2 has zookld that does this; similar to okld.
  The Google architecture paper uses physical machines to split services.
    Containers support the same idea on a single physical machine.

What has happened since 2004, when OKWS was published?
  OKWS itself is still (probably) used at OkCupid, but not elsewhere.
  C++ isn't popular for web programming.
  UNIX process-level isolation tools are still hard to use.
  Fine partitioning is hard; tension with fast development and evolution.
  OKWS partitioning not v. useful if you have only one important service.
  Privilege separation is commonly used in practice:
    Load balancer, login svc, profile svc, password DB, profile DB.
    The Google architecture paper is a good example.
      Even finer-grained than OKWS: per-user tickets.
  Some systems with OKWS-style fine-grained partitioning:
    ssh-agent process to hold crypto keys, vs ssh itself.
    Chrome runs each frame in an isolated process.
  Many new isolation tools, better than processes/chroot:
    VMs, FreeBSD jail, Linux containers (Docker), Linux seccomp, ...