6.858 Fall 2011 Lab 3: Server-side sandboxing

Handed out: Friday, September 30, 2011

Part 1 due: Friday, October 7, 2011 (11:59pm)

Parts 1 and 2 due: Friday, October 14, 2011 (11:59pm)

All parts due: Friday, October 21, 2011 (11:59pm)

Introduction

In this lab, we will extend the zoobar web application to allow users to use Python code as their profiles. Whenever someone requests a user's profile, the server will execute that user's Python code to generate the resulting profile output. This will allow users to implement a variety of features in their profiles, such as:

- A profile that greets visitors by their user name.
- A profile that keeps track of the last several visitors to that profile.
- A profile that gives a zoobar to every visitor (limit 1 per minute).

Supporting this safely requires sandboxing the profile code on the server, so that it cannot perform arbitrary operations or access arbitrary files. On the other hand, this code may need to keep track of persistent data in some files, or to access existing zoobar databases, to function properly.

To fetch the new source code for this lab, use git to commit your lab 2 solutions, fetch the latest version of the course repository, and then create a local branch called lab3 based on our lab3 branch, origin/lab3:

httpd@vm-6858:~$ cd lab 
httpd@vm-6858:~/lab$ git commit -am 'my solution to lab2' 
[lab2 f524ff8] my solution to lab2
 1 files changed, 1 insertions(+), 0 deletions(-)
httpd@vm-6858:~/lab$ git pull 
Already up-to-date.
httpd@vm-6858:~/lab$ git checkout -b lab3 origin/lab3 
Branch lab3 set up to track remote branch lab3 from origin.
Switched to a new branch 'lab3'
httpd@vm-6858:~/lab$

The new source code includes the following components, which you should familiarize yourself with:

profiles/ contains five Python-based profiles, which you will use as examples throughout this lab:
- profiles/hello-user.py is a simple profile that prints back the name of the visitor when the profile code is executed, along with the current time.
- profiles/visit-tracker.py keeps track of the last time that each visitor looked at the profile, and prints out the last visit time (if any).
- profiles/last-visits.py records the last three visitors to the profile, and prints them out.
- profiles/xfer-tracker.py prints out the last zoobar transfer between the profile owner and the visitor.
- profiles/granter.py gives the visitor one zoobar, as long as the profile owner has any zoobars left, the visitor has fewer than 20 zoobars, and it has been at least a minute since the visitor last got a free zoobar.
zoobar/proflib.py is a Python module imported by the Python-based profiles to provide an API for accessing zoobar state. For example, proflib.py provides functions to get parameters passed to the Python profile from the zoobar web application, to look up a user's zoobar balance and profile, to look up a list of transfers for a user, and to transfer zoobars.
zoobar/nullsandbox.py is a Python module that provides a run function to execute Python code and return its output. As suggested by its name, it does not provide any isolation for the code; it is the starting point for an (insecure) Python profile system.
zoobar/pypysandbox.py provides some initial code for a module that uses the PyPy interpreter to implement a secure sandbox for Python code, which you will fully implement in this lab. We will discuss the PyPy sandbox in more detail later.

To get started, verify that the lab 3 code you have checked out works: set up one user for each of the five Python profiles in the profiles/ directory, and check that each profile works properly. To run the server, follow the same steps as before to set up the /jail directory, and then run zookld:

httpd@vm-6858:~/lab$ make
cc -m32 -g -std=c99 -fno-stack-protector -Wall -Werror -D_GNU_SOURCE   -c -o zookld.o zookld.c
cc -m32 -g -std=c99 -fno-stack-protector -Wall -Werror -D_GNU_SOURCE   -c -o http.o http.c
cc -m32  zookld.o http.o  -lcrypto -o zookld
cc -m32 -g -std=c99 -fno-stack-protector -Wall -Werror -D_GNU_SOURCE   -c -o zookfs.o zookfs.c
cc -m32  zookfs.o http.o  -lcrypto -o zookfs
cc -m32 -g -std=c99 -fno-stack-protector -Wall -Werror -D_GNU_SOURCE   -c -o zookd.o zookd.c
cc -m32  zookd.o http.o  -lcrypto -o zookd
cc -m32 -g -std=c99 -fno-stack-protector -Wall -Werror -D_GNU_SOURCE   -c -o zooksvc.o zooksvc.c
cc -m32  zooksvc.o  -lcrypto -o zooksvc
httpd@vm-6858:~/lab$ sudo make setup
[sudo] password for httpd: 6858
./chroot-setup.sh
+ grep -qv uid=0
+ id
...
httpd@vm-6858:~/lab$ sudo ./zookld
zookld: Listening on port 8080
zookld: Launching zookd
...

You can run make check to run some basic tests and verify that the profile code is working properly (keep in mind that these tests are not exhaustive). At this point, the sandbox check and the /tmp check will not pass. This is expected, as the PyPy sandbox is not enabled yet.

If something doesn't seem to be working, try to figure out what went wrong, or contact the course staff, before proceeding further.

Part 1: Python profiles with privilege separation

The first part of this lab will require you to combine your privilege-separated design from lab 2 with the Python profiles from this lab. There are two main ways in which these features interact. First, the databases used to look up information are different (e.g., the zoobar balances are stored in a separate database in a privilege-separated design). Second, transferring zoobars between users in a privilege-separated design requires an authentication token for the sender.

To get started, merge your lab 2 privilege-separation solution into the lab3 branch by running:

httpd@vm-6858:~/lab$ git merge lab2 
Merge made by recursive
...
httpd@vm-6858:~/lab$

If git reports any conflicts, resolve them and commit the resolved merge before proceeding.

Exercise 1. Make Python profiles work in your privilege-separated design. Verify that the resulting system can correctly execute all five of the example profiles. In order to support the granter.py profile, which performs zoobar transfers, you may need to give the profile code an authentication token for the profile owner. Be sure that you do not create a way for an arbitrary user to get another user's authentication token. One way to avoid this is to extend the authentication service you implemented in lab 2 with an operation that runs a given user's profile with that user's current authentication token.

Run sudo make check to verify that your modified configuration passes our basic tests, except for the sandbox check and the /tmp check.
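As one illustration of the token-handling idea in Exercise 1, here is a minimal Python 3 sketch of an authentication service that runs a user's stored profile with that user's token. The class AuthService, its methods issue_token() and run_profile(), and the load_profile and run_code callables it receives are all hypothetical; your lab 2 service will have its own interface and transport.

```python
import secrets


class AuthService:
    """Hypothetical stand-in for the lab 2 authentication service."""

    def __init__(self, load_profile, run_code):
        self._tokens = {}                  # user -> current token (never returned to visitors)
        self._load_profile = load_profile  # user -> that user's stored profile code
        self._run_code = run_code          # executes code with a dict of parameters

    def issue_token(self, user):
        """Issue a fresh token; called after a successful password check (omitted here)."""
        token = secrets.token_hex(16)
        self._tokens[user] = token
        return token

    def run_profile(self, owner, visitor):
        """Run owner's *stored* profile for visitor, using owner's token.

        The caller names only whose profile to view; it cannot supply the
        code, so it cannot run arbitrary code under someone else's token.
        """
        token = self._tokens.get(owner)
        if token is None:
            raise PermissionError("profile owner has no active token")
        code = self._load_profile(owner)
        return self._run_code(code, {"self": owner, "visitor": visitor, "token": token})
```

The important choice is that run_profile() takes only user names: because the caller cannot choose the code that runs, a visitor cannot execute code of their own choosing under the profile owner's token.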

Part 2: Initial sandboxing with PyPy

At this point, your web server can run user-supplied Python profiles. However, a malicious user may supply arbitrary Python code. Since the profile code is currently executed using nullsandbox.py, it can potentially perform arbitrary actions on the server, such as reading, writing, or deleting files accessible to the user ID under which the code is running.

To provide stronger isolation guarantees, we will use the PyPy sandbox. At a high level, PyPy is a Python interpreter, just like the standard CPython interpreter (the python command) that you have likely used before. One difference is that PyPy has a "sandbox" mode of execution. In this sandbox mode, whenever the PyPy interpreter wants to perform a system call (e.g., in order to open a file when it encounters a call to the Python open() function), it does not issue the system call directly, but instead sends the system call arguments over RPC to another process. It then waits for the RPC server to interpret the system call arguments and send back the appropriate system call return values before proceeding with its execution. Thus, the RPC server is in complete control of how the sandboxed code can interact with the outside world, and can implement different sandboxing policies.

For the purposes of this lab, the PyPy interpreter is fixed, but you will be responsible for implementing parts of the RPC server that interprets and executes "system calls" issued by the sandboxed interpreter, to support operations needed by our five Python profiles.

As an aside, to ensure strong isolation, PyPy also uses a feature of the Linux kernel called seccomp. This feature prevents the sandboxed PyPy interpreter from making any system calls other than read and write on existing file descriptors, and the only open file descriptor is a pipe leading to the RPC server for the sandbox. (To be perfectly complete, we should say that seccomp also allows two other system calls, exit and sigreturn, which have little further relevance for this discussion.) Thus, even if the PyPy sandbox gets compromised by some malicious Python code it is executing, it will be unable to do anything other than issue arbitrary messages to the RPC server. As long as the RPC server has no vulnerabilities, the sandbox will remain isolated from the rest of the system.

You can read more about the PyPy sandbox here:

http://codespeak.net/pypy/dist/pypy/doc/sandbox.html

In the lab 3 source code, zoobar/pypysandbox.py implements an initial version of PyPy-based sandboxed execution in its run function. The MySandboxedProc class in pypysandbox.py implements the RPC server described above. This class inherits from existing library code for implementing such an RPC server, which you can find in /jail/zoobar/pypy-sandbox/pypy/translator/sandbox/pypy_interact.py and /jail/zoobar/pypy-sandbox/pypy/translator/sandbox/sandlib.py. To perform a system call named syscall, this library code invokes a method called do_ll_os__ll_os_syscall(), where syscall stands for the actual system call name; you can see a few examples in pypysandbox.py already. The library invokes the sandboxed PyPy interpreter binary, called pypy-c, from the /zoobar/pypy-sandbox/pypy/translator/goal directory (inside a chroot to /jail).
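The name-based dispatch described above can be pictured with the following toy model. ToySandboxedProc and its handle_message() method are hypothetical and much simpler than the real sandlib.py code; the sketch only shows how a dotted request name can map onto a do_... handler method, and how a missing handler can act as a default-deny policy.

```python
import errno


class ToySandboxedProc:
    """Simplified, hypothetical model of an RPC server's syscall dispatch.

    This is NOT the sandlib.py implementation, only an illustration of
    the do_ll_os__ll_os_<syscall>() naming pattern.
    """

    def handle_message(self, fnname, *args):
        # Reject names that already contain "__", so that every handler
        # name has to come from the dotted form below.
        if "__" in fnname:
            raise ValueError("refusing suspicious request name %r" % fnname)
        # "ll_os.ll_os_getcwd" -> "do_ll_os__ll_os_getcwd"
        handler = getattr(self, "do_" + fnname.replace(".", "__"), None)
        if handler is None:
            # Default-deny: any system call without a handler is refused.
            raise OSError(errno.ENOSYS, "syscall not supported in sandbox: " + fnname)
        return handler(*args)

    def do_ll_os__ll_os_getcwd(self):
        return "/"

    def do_ll_os__ll_os_getenv(self, name):
        # Expose an empty environment to the sandboxed code.
        return None
```

With this structure, extending the sandbox means adding methods, and anything you have not explicitly implemented stays unavailable to the sandboxed code.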

To implement system calls related to the file system, the RPC server uses a Python-based representation of a file system, which you can see in /jail/zoobar/pypy-sandbox/pypy/translator/sandbox/vfs.py. You will be extending the file system parts of the RPC server in later exercises, but for now you may want to simply familiarize yourself with this code.
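To get a feel for what a Python-based file system representation involves, here is a hypothetical toy version with in-memory directory and file nodes and a path lookup that refuses to follow "..". ToyDir, ToyFile, and lookup() are illustrative names only and are not the classes defined in vfs.py; exposing a sandboxed proflib.py in Exercise 2 amounts to making a lookup like this one find that file.

```python
import errno


class ToyFile:
    """Hypothetical read-only file node holding bytes in memory."""

    def __init__(self, data):
        self.data = data


class ToyDir:
    """Hypothetical directory node mapping entry names to child nodes."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def child(self, name):
        try:
            return self.entries[name]
        except KeyError:
            raise OSError(errno.ENOENT, "no such file or directory", name)


def lookup(root, path):
    """Resolve an absolute virtual path to a node, one component at a time."""
    if not path.startswith("/"):
        raise OSError(errno.EINVAL, "path must be absolute", path)
    node = root
    for name in path.split("/"):
        if name in ("", "."):
            continue
        if name == "..":
            # Refusing ".." keeps lookups confined to the virtual tree.
            raise OSError(errno.EPERM, "'..' not allowed", path)
        if not isinstance(node, ToyDir):
            raise OSError(errno.ENOTDIR, "not a directory", path)
        node = node.child(name)
    return node
```

Anything not reachable from the root node simply does not exist for the sandboxed code, which is what makes this representation a natural place to enforce file system policy.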

Exercise 2. Modify the zoobar web application to use pypysandbox instead of nullsandbox to execute Python profiles. For now, focus on getting basic functionality working (namely, the hello-user.py profile). We will get to supporting the other example profiles later.

To see system calls being issued by the sandboxed PyPy interpreter, set self.debug to True in the MySandboxedProc constructor __init__().

You will need to provide a different version of proflib.py for Python profiles running inside the sandbox, because the PyPy interpreter does not support some modules, such as the sqlite3 database module. For now, you only need to implement the get_param() function in the sandboxed version of proflib.py. You will also need to expose your new version of proflib.py in the sandboxed file system, perhaps by modifying pypysandbox.py. Note that the sandboxed PyPy interpreter loads all files, including proflib.py, via calls to the RPC server in pypysandbox.py.

Run sudo make check to verify that your modified configuration passes our tests for hello-user.py. The sandbox check should also pass at this time.

Submit your answers to the first two parts of the lab assignment by running make handin, and upload the resulting lab3-handin.tar.gz file to http://pdos.csail.mit.edu/cgi-bin/858handin.

Part 3: Extending the PyPy sandbox

In this part of the lab, we will extend the PyPy sandbox implemented in pypysandbox.py to support operations needed for a profile to store persistent data, and to access zoobar application state.

Exercise 3. Implement a writable and persistent /tmp directory in the PyPy sandbox. This is needed by the visit-tracker.py and last-visits.py profiles to store their persistent information. Make sure that the /tmp directory seen by each user's profile is separate from those of other users, so that profiles of different users cannot tamper with each other's files.

Ensure that the visit-tracker and last-visits profiles work correctly after you implement your changes.
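One way to keep each user's /tmp separate is for the RPC server to translate every sandbox path under /tmp into a path inside a per-user host directory. The helper host_tmp_path() and the layout of base_dir below are hypothetical. The sketch hex-encodes the user name so that unusual names (such as "..") cannot escape base_dir or collide with each other, and it rejects ".." components in the sandbox path.

```python
import os


def host_tmp_path(base_dir, owner, virtual_path):
    """Map a sandbox path under /tmp to a file in the owner's private directory.

    base_dir: host directory holding one subdirectory per user (assumed layout)
    owner: user whose profile is currently running
    virtual_path: path as seen inside the sandbox, e.g. "/tmp/visits"
    """
    if not virtual_path.startswith("/"):
        raise PermissionError("sandbox paths must be absolute: " + virtual_path)
    # Hex encoding is injective and never produces "/" or ".".
    safe_owner = owner.encode("utf-8").hex()
    parts = [p for p in virtual_path.split("/") if p not in ("", ".")]
    if not parts or parts[0] != "tmp":
        raise PermissionError("only /tmp is writable: " + virtual_path)
    rest = parts[1:]
    if ".." in rest:
        raise PermissionError("'..' not allowed: " + virtual_path)
    return os.path.join(base_dir, safe_owner, *rest)
```

The sketch assumes the sandboxed code has no way to create symbolic links in its directory; if it could (see the second challenge below), the mapping would also need to account for them.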

Exercise 4. Since the standard Python SQLite module is implemented by calling into the native SQLite C library, it is not available in the PyPy sandbox (because the native library does not know how to forward its system calls via the RPC channel). In this exercise, your job is to support the get_xfers() function from proflib.py in the sandbox. One reasonable approach is to extend the RPC server (pypysandbox.py) to perform the get_xfers() functionality on behalf of the sandboxed code. You will also need to modify proflib.py to invoke this new interface.

Hint: to create a new interface between code in the sandbox and the RPC server outside of the sandbox, such as for performing get_xfers() calls, consider overloading the file namespace by defining a special file name that corresponds to get_xfers calls. You can take a look at VirtualizedSocketProc in .../pypy-sandbox/pypy/translator/sandbox/sandlib.py to see an example of how the PyPy sandbox exposes access to TCP sockets in this manner.

Once you are finished with this exercise, xfer-tracker.py should function correctly in the sandbox.
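The special-file hint can be sketched as two pieces: a virtual file object on the RPC-server side that runs the query when written to and returns JSON when read, and the matching client code that a sandboxed get_xfers() might use. The special name /rpc/get_xfers, the class XferQueryFile, the helper get_xfers_via_file(), the newline-terminated request, and the JSON response format are all assumptions made for this illustration.

```python
import json


class XferQueryFile:
    """Hypothetical virtual file the RPC server could return when the
    sandbox opens a special name such as "/rpc/get_xfers" (assumed name).

    Sandboxed code writes a user name, then reads back a JSON list.
    """

    def __init__(self, lookup_xfers):
        self._lookup = lookup_xfers  # user -> list of transfers; runs outside the sandbox
        self._response = b""

    def write(self, data):
        user = data.decode("utf-8").strip()
        self._response = json.dumps(self._lookup(user)).encode("utf-8")
        return len(data)

    def read(self, size):
        chunk, self._response = self._response[:size], self._response[size:]
        return chunk


def get_xfers_via_file(f, user):
    """Sandbox-side half: what proflib.get_xfers() might do with the open file."""
    f.write(user.encode("utf-8") + b"\n")
    chunks = []
    while True:
        chunk = f.read(4096)
        if not chunk:  # empty read means the response is exhausted
            break
        chunks.append(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))
```

In a full implementation, the RPC server's open handler would return such an object for the special name, and the sandbox's read and write system calls on that file descriptor would be routed to it.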

Exercise 5. Implement the last remaining parts of proflib.py in the sandbox: get_user() and xfer(). Once you are done, granter.py should work from within the sandbox.
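For xfer(), a useful design is to keep the owner's authentication token entirely on the RPC-server side, so that the sandboxed code only asks for a transfer and never handles the token. The XferHandler class, its request dictionary format, and the transfer callable (standing in for whatever interface your lab 2 bank service provides) are hypothetical.

```python
class XferHandler:
    """Hypothetical RPC-server-side handler for sandboxed xfer() requests.

    One handler is created per profile run and already knows the profile
    owner and the owner's token. The sandboxed code names only a recipient
    and an amount, so it can never send from another account and never
    sees the token.
    """

    def __init__(self, owner, owner_token, transfer):
        self._owner = owner
        self._token = owner_token
        self._transfer = transfer  # (sender, recipient, amount, token) -> result

    def handle(self, request):
        recipient = request["recipient"]
        amount = int(request["amount"])
        if amount <= 0:
            raise ValueError("amount must be positive")
        # The sender is always the profile owner; any "sender" field in
        # the request is ignored.
        return self._transfer(self._owner, recipient, amount, self._token)
```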

Challenge! (optional) For extra credit, allow sandboxed code to safely manipulate sub-directories under /tmp using mkdir and rmdir, to open files in those sub-directories, and to be able to unlink and rename files and sub-directories. Write an example Python profile that uses sub-directories and renames files.

Challenge! (optional) Allow sandboxed code to safely create and use symlinks inside of its /tmp directory.

You are done! Run make handin and follow the instructions to upload the resulting file.
