6.858 Spring 2018 Lab 1: Buffer overflows

Handed out: Wednesday, February 7, 2018
Parts 1 and 2 due: Friday, February 16, 2018 (5:00pm)
All parts due: Friday, February 23, 2018 (5:00pm)


You will do a sequence of labs in 6.858. This sequence of labs will give you practical experience with common attacks and countermeasures. To make the issues concrete, you will explore the attacks and countermeasures in the context of the zoobar web application in the following ways:

Lab 1 will introduce you to buffer overflow vulnerabilities, in the context of a web server called zookws. The zookws web server is running a simple python web application, zoobar, where users transfer "zoobars" (credits) between each other. You will find buffer overflows in the zookws web server code, write exploits for the buffer overflows to inject code into the server, figure out how to bypass non-executable stack protection, and finally look for other potential problems in the web server implementation. Later labs look at other security aspects of the zoobar and zookws infrastructure.

Each lab requires you to learn a new programming language or some other piece of infrastructure. For example, in this lab you must become intimately familiar with certain aspects of the C language, x86 assembly language, gdb, etc. The labs do so because that allows you to understand attacks and defenses in realistic situations. Often you need to understand certain parts of this new infrastructure in detail; security weaknesses often show up in corner cases, and so you need to understand the details to craft exploits and design defenses for those corner cases. These two factors (new infrastructure and details) can make the labs time consuming. You should start early on the labs and work on them daily for some limited time (each lab has several exercises), instead of trying to do all exercises in a single shot before the deadline. You should also try to understand the necessary details, instead of muddling your way through the exercises. If you don't, the labs will take a lot of time. If you get stuck on a detail, post a question on Piazza.

Several labs, including this lab, ask you to design exploits. These exploits are realistic enough that you might be able to use them for a real attack, but you should not do so. The point of designing exploits is to teach you how to defend against them, not how to use them. Attacking computer systems is illegal (see MIT network rules) and can get you into serious trouble. Don't do it.

NOTE: Since we re-use the same lab assignments across years, we ask that you please do not make your lab code publicly accessible (e.g., by checking your lab solutions into a public repository on GitHub). This helps keep the labs fair and interesting for students in future years.

Lab infrastructure

Exploiting buffer overflows requires precise control over the execution environment. A small change in the compiler, environment variables, or the way the program is executed can result in slightly different memory layout and code structure, thus requiring a different exploit. For this reason, this lab uses a VMware virtual machine to run the vulnerable web server code.

To start working on this lab assignment, you should download the VMware Player, which can run virtual machines on Linux and Windows systems. For Mac users, MIT has a site license for VMware Fusion. You can download VMware Fusion from this web site.

Once you have VMware installed on your machine, you should download the course VM image, and unpack it on your computer. This virtual machine contains an installation of Ubuntu 16.04 Linux, and the following accounts have been created inside the VM.
root (password 6858): You can use the root account to install new software packages into the VM, if you find something missing, using apt-get install pkgname.
httpd (password 6858): The httpd account is used to execute the web server, and will contain the source code you will need for this lab assignment, in /home/httpd/lab (see instructions for getting the code below).

For Linux users, we've also tested running the course VM on KVM, which is built into the Linux kernel and should be much easier to get working than VMware. KVM should be available through your distribution, and is preinstalled on Athena cluster computers; on Debian or Ubuntu, try apt-get install qemu-kvm. Once installed, you should be able to run a command like kvm -m 512 -net nic -net user,hostfwd=tcp:,hostfwd=tcp: vm-6858.vmdk to run the VM and forward the relevant ports. Note that KVM leverages hardware virtualization support in your CPU. You must enable this support in your BIOS (which is often, but not always, the default). If you have another virtual machine monitor installed on your machine (e.g., VMware), that virtual machine monitor may grab the hardware virtualization support exclusively and prevent KVM from working.

You can either log into the virtual machine using its console, or you can use ssh to log into the virtual machine over the (virtual) network. To determine the virtual machine's IP address, log in as root on the console and run /sbin/ifconfig eth0. (If using KVM with the command above, then ssh -p 2222 httpd@localhost should work.)

Getting started

The files you will need for this and subsequent lab assignments in this course are distributed using the Git version control system. You can also use Git to keep track of any changes you make to the initial source code. Here's an overview of Git and the Git user's manual, which you may find useful.

The course Git repository is available at http://web.mit.edu/6858/2018/6.858-lab-2018.git. To begin with, log into the VM using the httpd account and clone the source code for lab 1 as follows.

httpd@vm-6858:~$ git clone http://web.mit.edu/6858/2018/6.858-lab-2018.git lab
Initialized empty Git repository in /home/httpd/lab/.git/
httpd@vm-6858:~$ cd lab

Before you proceed with this lab assignment, make sure you can compile the zookws web server:

httpd@vm-6858:~/lab$ make
cc zookd.c -c -o zookd.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc http.c -c -o http.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc -m32  zookd.o http.o  -lcrypto -o zookd
cp zookd zookd-exstack
execstack -s zookd-exstack
cp zookd zookd-nxstack
cc zookd.c -c -o zookd-withssp.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE
cc http.c -c -o http-withssp.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE
cc -m32  zookd-withssp.o http-withssp.o  -lcrypto -o zookd-withssp
cc -m32   -c -o shellcode.o shellcode.S
objcopy -S -O binary -j .text shellcode.o shellcode.bin
cc run-shellcode.c -c -o run-shellcode.o -m32 -g -std=c99 -Wall -Werror -D_GNU_SOURCE -fno-stack-protector
cc -m32  run-shellcode.o  -lcrypto -o run-shellcode
rm shellcode.o

The component of zookws that receives HTTP requests is zookd. It is written in C and serves static files and executes dynamic scripts. For this lab you don't have to understand the dynamic scripts; they are written in Python and the exploits in this lab apply only to C code. The HTTP-related code is in http.c. Here is a tutorial of the HTTP protocol.

There are two versions of zookd you will be using:

zookd-exstack has an executable stack, which makes it easier to inject executable code given a stack buffer overflow vulnerability.
zookd-nxstack has a non-executable stack, and you will write exploits that bypass non-executable stacks later in this lab assignment.

In order to run the web server in a predictable fashion (so that its stack and memory layout are the same every time), you will use the clean-env.sh script. This is the same way in which we will run the web server during grading, so make sure all of your exploits work on this configuration!

The reference binaries of zookd are provided in bin.tar.gz, which we will use for grading. Make sure your exploits work on those binaries.

Now, make sure you can run the zookws web server and access the zoobar web application from a browser running on your machine, as follows:

httpd@vm-6858:~/lab$ /sbin/ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:0c:29:57:90:a1
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::20c:29ff:fe57:90a1/64 Scope:Link
          RX packets:149 errors:0 dropped:0 overruns:0 frame:0
          TX packets:94 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:15235 (15.2 KB)  TX bytes:12801 (12.8 KB)
          Interrupt:19 Base address:0x2000

httpd@vm-6858:~/lab$ ./clean-env.sh ./zookd 8080

The /sbin/ifconfig command will give you the virtual machine's IP address. The ./clean-env.sh command starts zookd on port 8080. In this particular example, you would open your browser and go to the URL formed from the virtual machine's IP address and port 8080. (If you're using KVM with the command above, just access http://localhost:8080/ on your host.) If something doesn't seem to be working, try to figure out what went wrong, or contact the course staff, before proceeding further.

Part 1: Finding buffer overflows

In the first part of this lab assignment, you will find buffer overflows in the provided web server. To do this lab, you will need to understand the basics of buffer overflows. To help you get started with this, you can watch a video tutorial on lab 1 by Ben Yuan. If you want more details, you can read Aleph One's article, Smashing the Stack for Fun and Profit, as well as this paper, to figure out how buffer overflows work.

Exercise 1. Study the web server's code, and find examples of code vulnerable to memory corruption through a buffer overflow. Write down a description of each vulnerability in the file /home/httpd/lab/bugs.txt; use the format described in that file. For each vulnerability, describe the buffer which may overflow, how you would structure the input to the web server (i.e., the HTTP request) to overflow the buffer, and whether the vulnerability can be prevented using stack canaries. Locate at least 5 different vulnerabilities.

You can use the command make check-bugs to check if your bugs.txt file matches the required format, although the command will not check whether the bugs you listed are actual bugs or whether your analysis of them is correct.

Now, you will start developing exploits to take advantage of the buffer overflows you have found above. We have provided template Python code for an exploit in /home/httpd/lab/exploit-template.py, which issues an HTTP request. The exploit template takes two arguments, the server name and port number, so you might run it as follows to issue a request to zookws running on localhost:

httpd@vm-6858:~/lab$ ./clean-env.sh ./zookd-exstack 8080 &
[1] 2676
httpd@vm-6858:~/lab$ ./exploit-template.py localhost 8080
HTTP request:
GET / HTTP/1.0


You are free to use this template, or write your own exploit code from scratch. Note, however, that if you choose to write your own exploit, the exploit must run correctly inside the provided virtual machine.
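
If you do write your own script, its overall shape can be quite small. The following is a minimal sketch only, not the contents of exploit-template.py: the names build_request and send_request are hypothetical, and it assumes Python 3, which may differ from the Python version the provided template targets.

```python
import socket
import sys


def build_request(path=b"/"):
    """Return a minimal HTTP/1.0 GET request as raw bytes."""
    return b"GET " + path + b" HTTP/1.0\r\n\r\n"


def send_request(host, port, request):
    """Send raw request bytes and return whatever the server sends back."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request)
        while True:
            try:
                data = sock.recv(4096)
            except OSError:
                # A crashed child process may reset the connection or go silent.
                break
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): ./my-exploit.py localhost 8080
    req = build_request()
    print("HTTP request:")
    print(req.decode("latin-1"))
    print(send_request(sys.argv[1], int(sys.argv[2]), req).decode("latin-1"))
```

Keeping the request as bytes rather than text makes it easier to add binary values to it later.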

You may find gdb useful in building your exploits (though it is not required for you to do so). As zookd forks off many processes (one for each client), it can be difficult to debug the correct one. The easiest way to do this is to run the web server ahead of time with clean-env.sh and then attach gdb to an already-running process with the -p flag. You can find the PID of a process by using pgrep; for example, to attach to zookd-exstack, start the server and, in another shell, run

httpd@vm-6858:~/lab$ gdb -p $(pgrep zookd-exstack)
0x4001d422 in __kernel_vsyscall ()
(gdb) break your-breakpoint
Breakpoint 1 at 0x1234567: file zookd.c, line 999.
(gdb) continue

Keep in mind that a process being debugged by gdb will not get killed even if you terminate the parent zookd process using ^C. If you are having trouble restarting the web server, check for leftover processes from the previous run, or be sure to exit gdb before restarting zookd.

When a process being debugged by gdb forks, by default gdb continues to debug the parent process and does not attach to the child. Since zookd forks a child process to service each request, you may find it helpful to have gdb attach to the child on fork, using the command set follow-fork-mode child. We have added that command to /home/httpd/lab/.gdbinit, which will take effect if you start gdb in that directory.

For this and subsequent exercises, you may need to encode your attack payload in different ways, depending on which vulnerability you are exploiting. In some cases, you may need to make sure that your attack payload is URL-encoded; that is, use + instead of space and %2b instead of +. Here is a URL encoding reference and a handy conversion tool. You can also use quoting functions in the python urllib module to URL encode strings. In other cases, you may need to include binary values in your payload. The Python struct module can help you do that. For example, struct.pack("<I", x) will produce a 4-byte (32-bit) binary encoding of the integer x.
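
As a quick illustration of both encodings, here is a short sketch. It assumes Python 3, where the quoting functions live in urllib.parse (in Python 2 they are in urllib itself), and the address used is only an example value.

```python
import struct
from urllib.parse import quote, quote_plus, unquote_to_bytes

# 4-byte little-endian encoding of a 32-bit value, the byte order x86 uses in memory.
addr = 0xbffff19c  # example value only
packed = struct.pack("<I", addr)
print(packed)  # b'\x9c\xf1\xff\xbf'

# quote_plus writes a space as "+" and a literal "+" as "%2B".
print(quote_plus("a b+c"))  # a+b%2Bc

# Raw bytes can be percent-encoded too; safe="" leaves no reserved character unescaped.
encoded = quote(packed, safe="")
print(encoded)  # %9C%F1%FF%BF

# Decoding the percent-encoded form gives back the original bytes.
assert unquote_to_bytes(encoded) == packed
```

Whether a given payload needs URL encoding at all depends on whether the bytes pass through the server's URL-decoding code before reaching the buffer you are overflowing.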

Exercise 2. Pick two buffer overflows out of what you have found for later exercises (although you can change your mind later, if you find your choices are particularly difficult to exploit). The first must overwrite a return address on the stack, and the second must overwrite some other data structure that you will use to take over the control flow of the program.

Write exploits that trigger them. You do not need to inject code or do anything other than corrupt memory past the end of the buffer, at this point. Verify that your exploit actually corrupts memory, by either checking the output of dmesg | tail, using gdb, or observing that the web server crashes (i.e., it will print Child process 19219 terminated incorrectly, receiving signal 11).

Provide the code for the exploits in files called exploit-2a.py and exploit-2b.py, and indicate in answers.txt which buffer overflow each exploit triggers. If you believe some of the vulnerabilities you have identified in Exercise 1 cannot be exploited, choose a different vulnerability.

You can check whether your exploits crash the server as follows:

httpd@vm-6858:~/lab$ make check-crash

Part 2: Code injection

In this part, you will use your buffer overflow exploits to inject code into the web server. The goal of the injected code will be to unlink (remove) a sensitive file on the server, namely /home/httpd/grades.txt. Use zookd-exstack, since it has an executable stack that makes it easier to inject code. The zookws web server should be started as follows.

httpd@vm-6858:~/lab$ ./clean-env.sh ./zookd-exstack 8080

You can build the exploit in two steps. First, write the shell code that unlinks the sensitive file, namely /home/httpd/grades.txt. Second, embed the compiled shell code in an HTTP request that triggers the buffer overflow in the web server.

When writing shell code, it is often easier to use assembly language rather than a higher-level language such as C. This is because the exploit usually needs fine control over the stack layout, register values and code size. A C compiler will generate additional function preludes and perform various optimizations, which make the compiled binary code unpredictable.

We have provided Aleph One's shell code for you to use in /home/httpd/lab/shellcode.S, along with Makefile rules that produce /home/httpd/lab/shellcode.bin, a compiled version of the shell code, when you run make. Aleph One's exploit is intended to exploit setuid-root binaries, and thus it runs a shell. You will need to modify this shell code to instead unlink /home/httpd/grades.txt.

To help you develop your shell code for this exercise, we have provided a program called run-shellcode that will run your binary shell code, as if you correctly jumped to its starting point. For example, running it on Aleph One's shell code will cause the program to execve("/bin/sh"), thereby giving you another shell prompt:

httpd@vm-6858:~/lab$ ./run-shellcode shellcode.bin

Exercise 3 (warm-up). Modify shellcode.S to unlink /home/httpd/grades.txt. Your assembly code can either invoke the SYS_unlink system call, or call the unlink() library function.

To test whether the shell code does its job, run the following commands:

httpd@vm-6858:~/lab$ make
httpd@vm-6858:~/lab$ touch ~/grades.txt
httpd@vm-6858:~/lab$ ./run-shellcode shellcode.bin
# Make sure /home/httpd/grades.txt is gone
httpd@vm-6858:~/lab$ ls ~/grades.txt
ls: cannot access /home/httpd/grades.txt: No such file or directory

Next, we construct a malicious HTTP request that injects the compiled byte code into the web server and hijacks the server's control flow to run the injected code. When developing an exploit, you will have to think about what values are on the stack, so that you can modify them accordingly. For your reference, here is what the stack frame of some function foo looks like; here, foo has a local variable char buf[256]:

 STACK               +------------------+             MEMORY
                     |       ...        |                  
   |                 |  stack frame of  |               /|\
   |                 |   foo's caller   |                |
   |                 |       ...        |                |
   |                 +------------------+                |
   |                 |  return address  | (4 bytes)      |
   |                 | to foo's caller  |                |
   |                 +------------------+                |
   |    %ebp ------> |    saved %ebp    | (4 bytes)      |
   |                 +------------------+                |
   |                 |       ...        |                |
   |                 +------------------+                |
   |                 |     buf[255]     |                |
   |                 |       ...        |                |
  \|/    buf ------> |      buf[0]      |                |

Note that the stack grows down in this figure, and memory addresses are increasing up.
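
To see why writing past the end of buf matters, the figure can be modeled as a flat array of bytes. This is a toy sketch under simplifying assumptions: it treats the "..." region between buf and the saved %ebp as empty, which a real compiler does not guarantee, and all names and values in it are made up for illustration.

```python
import struct

BUF_SIZE = 256  # size of buf in the figure


def make_frame(saved_ebp, return_addr):
    """Model the frame from low to high addresses: buf, saved %ebp, return address."""
    return bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, return_addr)


def unchecked_copy(frame, data):
    """Mimic a copy with no bounds check: writing starts at buf[0] and keeps going."""
    frame[0:len(data)] = data


def read_saved_ebp(frame):
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


def read_return_address(frame):
    return struct.unpack_from("<I", frame, BUF_SIZE + 4)[0]


frame = make_frame(saved_ebp=0xbffff5a8, return_addr=0x08049000)

# 256 filler bytes fill buf; the next 8 bytes land on saved %ebp and the return address.
unchecked_copy(frame, b"A" * BUF_SIZE + b"BBBB" + struct.pack("<I", 0x41424344))

print(hex(read_saved_ebp(frame)), hex(read_return_address(frame)))
```

Because buf[0] is at the lowest address, input that runs past buf[255] moves upward in the figure, first over the saved %ebp and then over the return address.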

When you're constructing an exploit, you will often need to know the addresses of specific stack locations, or specific functions, in a particular program. One way to do this is to add printf() statements to the function in question. For example, you can use printf("Pointer: %p\n", &x); to print the address of variable x or function x. However, this approach requires some care: you need to make sure that your added statements are not themselves changing the stack layout or code layout. We (and make check) will be grading the lab without any printf statements you may have added.

A more fool-proof approach to determine addresses is to use gdb. For example, suppose you want to know the stack address of the pn[] array in the http_serve function in zookd-exstack, and the address of its saved %ebp register on the stack. You can obtain them using gdb as follows:

httpd@vm-6858:~/lab$ gdb -p $(pgrep zookd-exstack)
0x40022416 in __kernel_vsyscall ()
(gdb) break http_serve
Breakpoint 1 at 0x804977c: file http.c, line 275.
(gdb) continue

Be sure to run gdb from the ~/lab directory, so that it picks up the set follow-fork-mode child command from ~/lab/.gdbinit. Now you can issue an HTTP request to the web server, so that it triggers the breakpoint, and so that you can examine the stack of http_serve:

[New process 15177]
[Switching to process 15177]

Thread 2.1 "zookd-exstack" hit Breakpoint 1, http_serve (fd=4, 
    name=0x8053744 "/zoobar/index.cgi/users") at http.c:275
275	    void (*handler)(int, const char *) = http_serve_none;
(gdb) print &pn
$1 = (char (*)[1024]) 0xbffff19c
(gdb) info registers
eax            0x8053744	134559556
ecx            0x804a319	134521625
edx            0x8053742	134559554
ebx            0x0	0
esp            0xbffff140	0xbffff140
ebp            0xbffff5a8	0xbffff5a8
esi            0x401dc000	1075691520
edi            0x401dc000	1075691520
eip            0x804977c	0x804977c 
eflags         0x282	[ SF IF ]
cs             0x73	115
ss             0x7b	123
ds             0x7b	123
es             0x7b	123
fs             0x0	0
gs             0x33	51

From this, you can tell that, at least for this invocation of http_serve, the pn[] buffer on the stack lives at address 0xbffff19c, and the value of %ebp (which points at the saved %ebp on the stack) is 0xbffff5a8.
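
These two numbers are enough to work out how far the saved %ebp and the return address sit from the start of the buffer. The arithmetic below is a small sketch using the values from the gdb session above (the address printed for pn and the ebp line of info registers); the function name frame_offsets is hypothetical, and the numbers hold only for this particular binary and invocation.

```python
def frame_offsets(buf_addr, ebp_value, word_size=4):
    """Return byte offsets from the start of the buffer to the saved %ebp
    and to the return address stored just above it."""
    saved_ebp_offset = ebp_value - buf_addr
    return_addr_offset = saved_ebp_offset + word_size
    return saved_ebp_offset, return_addr_offset


buf_addr = 0xbffff19c   # from "print &pn"
ebp_value = 0xbffff5a8  # from "info registers"

saved_ebp_offset, return_addr_offset = frame_offsets(buf_addr, ebp_value)
print(saved_ebp_offset, return_addr_offset)  # 1036 1040
```

Since pn is 1024 bytes long, the saved %ebp starts 12 bytes past the end of the array, so there is other data in between that an overflow would also overwrite.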

Now it's your turn to develop an exploit.

Exercise 3. Starting from one of your exploits from Exercise 2, construct an exploit that hijacks control flow of the web server and unlinks /home/httpd/grades.txt. Save this exploit in a file called exploit-3.py.

Explain in answers.txt whether or not the other buffer overflow vulnerabilities you found in Exercise 1 can be exploited in this manner.

Verify that your exploit works; you will need to re-create /home/httpd/grades.txt after each successful exploit run.

Suggestion: first focus on obtaining control of the program counter. Sketch out the stack layout that you expect the program to have at the point when you overflow the buffer, and use gdb to verify that your overflow data ends up where you expect it to. Step through the execution of the function to the return instruction to make sure you can control what address the program returns to. The next, stepi, info reg, and disassemble commands in gdb should prove helpful.

Hint: Some HTTP requests might cause the function you're exploiting to return early. If necessary, sanitize your attack payload (e.g., using urllib.quote) to make sure the control flow can reach the vulnerable point that you intended to exploit.
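
One way to catch this kind of problem before sending a request is to scan the payload for bytes that the parsing code might treat specially. The sketch below is illustrative only: the set of "bad" bytes is an assumption, not something taken from the zookd source, and the right set depends on which code your payload passes through. The name find_bad_bytes is hypothetical.

```python
# Bytes assumed, for illustration only, to interfere with request parsing:
# NUL, carriage return, line feed, and space.
ASSUMED_BAD_BYTES = b"\x00\r\n "


def find_bad_bytes(payload, bad=ASSUMED_BAD_BYTES):
    """Return (offset, byte value) pairs for every byte of payload found in bad."""
    return [(i, b) for i, b in enumerate(payload) if b in bad]


sample = b"AAAA\x00BB CC"
for offset, value in find_bad_bytes(sample):
    print("offset %d: 0x%02x" % (offset, value))
```

If the scan reports anything, you can either URL-encode the payload or change the payload so that it avoids those bytes.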

Once you can reliably hijack the control flow of the program, find a suitable address that will contain the code you want to execute, and focus on placing the correct code at that address---e.g. a derivative of Aleph One's shell code.
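
The general shape of such a payload can be sketched as follows. This is one possible layout under stated assumptions, not a working exploit: it assumes the injected code sits at the very start of the overflowed buffer, that you already know the offset from the start of the buffer to the return address, and that no byte in the payload is mangled on the way in. The function name and all values are hypothetical.

```python
import struct


def build_injection_payload(shellcode, offset_to_ret, target_addr, filler=b"A"):
    """Lay out: shellcode, filler up to the return address slot, then the new return address."""
    if len(shellcode) > offset_to_ret:
        raise ValueError("shellcode does not fit below the return address")
    padding = filler * (offset_to_ret - len(shellcode))
    return shellcode + padding + struct.pack("<I", target_addr)


# Placeholder values: 8 NOP bytes stand in for real shell code, 260 is the
# offset in the simplified 256-byte figure, and the address is made up.
payload = build_injection_payload(b"\x90" * 8, 260, 0xbfffe010)
print(len(payload))  # 264
```

In practice the return address would point at wherever the shell code actually ends up in memory, which you would determine with gdb as described above.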

You can check whether your exploit works as follows:

httpd@vm-6858:~/lab$ make check-exstack

The test either prints "PASS" or fails. We will grade your exploits in this way. If you use another name for the exploit script, change Makefile accordingly.

The standard C compiler used on Linux, gcc, implements a version of stack canaries (called SSP). You can explore whether GCC's version of stack canaries would or would not prevent a given vulnerability by using the SSP-enabled version of zookd: zookd-withssp.

Submit your answers to the first two parts of this lab assignment by running make submit-a. Alternatively, run make prepare-submit-a and upload the resulting lab1a-handin.tar.gz file to the submission web site.

Part 3: Return-to-libc attacks

Many modern operating systems mark the stack non-executable in an attempt to make it more difficult to exploit buffer overflows. In this part, you will explore how this protection mechanism can be circumvented. Run the web server configured with binaries that have a non-executable stack, as follows.

httpd@vm-6858:~/lab$ ./clean-env.sh ./zookd-nxstack 8080

The key observation to exploiting buffer overflows with a non-executable stack is that you still control the program counter, after a RET instruction jumps to an address that you placed on the stack. Even though you cannot jump to the address of the overflowed buffer (it will not be executable), there's usually enough code in the vulnerable server's address space to perform the operation you want.

Thus, to bypass a non-executable stack, you need to first find the code you want to execute. This is often a function in the standard library, called libc, such as execl, system, or unlink. Then, you need to arrange for the stack to look like a call to that function with the desired arguments, such as system("/bin/sh"). Finally, you need to arrange for the RET instruction to jump to the function you found in the first step. This attack is often called a return-to-libc attack. This article contains a more detailed description of this style of attack.

In the next exercise, you will need to understand the calling convention for C functions. For your reference, consider the following simple C program:

void foo(int x, char *msg, int y)
{
  /* ... */
}

void bar(void)
{
  int a = 3;
  foo(5, "Hello, world!", 7);
}

The stack layout when bar invokes foo, just after the program counter has switched to the beginning of foo, looks like this:

        %ebp ------> |    saved %ebp    | (4 bytes)
                     |       ...        |
     bar's a ------> |        3         | (4 bytes)
                     |       ...        |
                     |        7         | (4 bytes)
                     |    pointer to    | ------>  "Hello, world!", somewhere in memory
                     |      string      | (4 bytes)
                     |        5         | (4 bytes)
                     |  return address  | (4 bytes)
        %esp ------> |     into bar     |
                     |                  |

When foo starts running, the first thing it will do is save the %ebp register on the stack, and set the %ebp register to point at this saved value on the stack, so the stack frame will look like the one shown just above Exercise 3.
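
Read from low addresses to high, the words a function expects at entry are its return address followed by its arguments in order. A return-to-libc payload reproduces that layout, preceded by the function's own address in the slot where the overwritten return address lives. The following sketch shows the packing only; the names p32 and build_call_frame are hypothetical and every address in it is a made-up placeholder.

```python
import struct


def p32(value):
    """Pack a 32-bit value in little-endian byte order."""
    return struct.pack("<I", value)


def build_call_frame(func_addr, return_addr, args):
    """Words to place starting at the overwritten return address slot:
    the function to jump to, the return address that function will see,
    and then its arguments from first to last."""
    return p32(func_addr) + p32(return_addr) + b"".join(p32(a) for a in args)


# Mimics the call foo(5, msg, 7) from the example program, with placeholder addresses.
FOO_ADDR = 0x08048000       # hypothetical address of foo
FAKE_RETURN = 0xdeadbeef    # where foo would return to; arbitrary here
MSG_ADDR = 0x0804a000       # hypothetical address of the string

frame = build_call_frame(FOO_ADDR, FAKE_RETURN, [5, MSG_ADDR, 7])
print(struct.unpack("<5I", frame))
```

When the vulnerable function executes RET, it pops the first word into the program counter, leaving %esp pointing at the fake return address, which is exactly the state shown in the figure above.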

Exercise 4. Starting from your two exploits in Exercise 2, construct two exploits that take advantage of those vulnerabilities to unlink /home/httpd/grades.txt when run on the binaries that have a non-executable stack. Name these new exploits exploit-4a.py and exploit-4b.py.

Although in principle you could use shellcode that's not located on the stack, for this exercise you should not inject any shellcode into the vulnerable process. You should use a return-to-libc (or at least a call-to-libc) attack where you vector control flow directly into code that existed before your attack.

In answers.txt, explain whether or not the other buffer overflow vulnerabilities you found in Exercise 1 can be exploited in this same manner.

You can test your exploits as follows:

httpd@vm-6858:~/lab$ make check-libc

The test either prints two "PASS" messages or fails. We will grade your exploits in this way. If you use other names for the exploit scripts, change Makefile accordingly.

Part 4: Fixing buffer overflows and other bugs

Now that you have figured out how to exploit buffer overflows, you will try to find other kinds of vulnerabilities in the same code. As with many real-world applications, the "security" of our web server is not well-defined. Thus, you will need to use your imagination to think of a plausible threat model and policy for the web server.

Exercise 5. Look through the source code and try to find more vulnerabilities that can allow an attacker to compromise the security of the web server. Describe the attacks you have found in answers.txt, along with an explanation of the limitations of the attack, what an attacker can accomplish, why it works, and how you might go about fixing or preventing it. You should ignore bugs in zoobar's code. They will be addressed in future labs.

One approach for finding vulnerabilities is to trace the flow of inputs controlled by the attacker through the server code. At each point that the attacker's input is used, consider all the possible values the attacker might have provided at that point, and what the attacker can achieve in that manner.

You should find at least two vulnerabilities for this exercise.

Finally, you will fix the vulnerabilities that you have exploited in this lab assignment.

Exercise 6. For each buffer overflow vulnerability you have exploited in Exercises 2, 3, and 4, fix the web server's code to prevent the vulnerability in the first place. Do not rely on compile-time or runtime mechanisms such as stack canaries, removing -fno-stack-protector, baggy bounds checking, etc.

Make sure your code still passes all tests using make check, and you don't see messages of the form Child process 19219 terminated incorrectly, receiving signal 11.

Make sure that your code actually stops your exploits from working. Use make check-fixed to run your exploits against your modified source code (as opposed to the staff reference binaries from bin.tar.gz). These checks should report FAIL (i.e., exploit no longer works). If they report PASS, this means the exploit still works, and you did not correctly fix the vulnerability.

Note that your submission should not make changes to Makefile and other grading scripts. We will use our unmodified version during grading.

You are done! Submit your answers to the lab assignment by running make submit. Alternatively, run make prepare-submit and upload the resulting lab1-handin.tar.gz file to the submission web site.