Introduction
============

Administrivia
  Lectures will be MW 11-12:30, in 4-237 (unless the registrar moves us again).
  Each lecture will cover a paper in systems security (except today).
    Preliminary paper list is posted online; likely to change a bit.
    If you are interested in specific topics or papers, send us email.
  Read the paper before lecture.
    Turn in answers to a short homework question before lecture.
    Send email with a question about the paper; we will try to answer it in lecture.
    We will discuss the paper in class.
    Interrupt, ask questions, point out mistakes.
  Two quizzes during the regular lecture time slot.
    No "final exam" during finals week; the second quiz is near end-of-term.
  Assignments: 6 labs + final project.
    Lab 1 is out today: buffer overflows.  Start early.
    Labs will look like real-world systems, in some respects:
      Many interacting parts written in different languages.
      Will look at / write x86 asm, C, Python, Javascript, ..
    Final project at the end of the course (groups of 2-3 people).
      Presentations during the last week of class.
      Think of projects you'd like to work on as you're reading papers.
      OK to combine with other class projects or your own research.
  Tutorial on how to get started with the VM and start writing your exploit:
    Thursday (tomorrow) 7pm, room TBD.
  Two TAs: David, Taesoo.
  Sign up for Piazza (link on course web site).
    Use it to ask questions about labs, see what others are stuck on, etc.
    We will post any important announcements there.
  Warning about security work/research on MITnet (and in general):
    Know the rules: http://ist.mit.edu/services/athena/olh/rules
    Just because something is technically possible doesn't mean it's legal.
    Ask course staff for advice if in doubt.

What is security?
  Achieving some goal in the presence of an adversary.
    Many systems are connected to the internet, which has adversaries.
    Thus, the design of many systems might need to address security,
      i.e., will the system work when there's an adversary?
  High-level plan for thinking about security:
    Policy: the goal you want to achieve.
      e.g. only Alice should read file F.
      Common goals: confidentiality, integrity, availability.
    Threat model: assumptions about what the attacker could do.
      e.g. can guess passwords, cannot physically grab the file server.
      Better to err on the side of assuming the attacker can do something.
    Mechanism: knobs that your system provides to help uphold the policy.
      e.g. user accounts, passwords, file permissions, encryption.
      (A sketch of the file-permission mechanism for the "only Alice should
       read file F" policy appears a bit further below.)
    Resulting goal: no way for an adversary within the threat model to violate the policy.
      Note that the goal says nothing about the mechanism.

Why is security hard?
  It's a negative goal.
    Need to guarantee the policy, assuming the threat model.
    Difficult to think of all possible ways an attacker might break in.
    Realistic threat models are open-ended (almost negative models).
    Contrast: easy to check that a positive goal is upheld,
      e.g., Alice can actually read file F.
  The weakest link matters.
  Iterative process: design, update the threat model as necessary, etc.

What's the point if we can't achieve perfect security?
  Every system's security rests on some assumptions.
    If the assumptions are violated, the security guarantees may be broken.
    Of course, some assumptions are stronger than others, but it's a continuum.
  In every paper we will read, something could lead to a security compromise.
    That doesn't necessarily mean the system is broken.
  In reality, must manage security risk vs. benefit.
    More secure systems mean less risk (or consequence) of some compromises.
    An insecure system may require manual auditing to check for attacks, etc.
    A higher cost of attack means more adversaries will be deterred.
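
Aside: a minimal sketch of how the policy "only Alice should read file F"
maps onto the Unix file-permission mechanism.  The path /srv/files/F and the
account name "alice" are made up for illustration, and the program must run
with enough privilege to chown the file.

      #include <stdio.h>
      #include <pwd.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <unistd.h>

      int main(void)
      {
          const char *path = "/srv/files/F";         /* hypothetical file F */
          struct passwd *alice = getpwnam("alice");  /* look up alice's uid */
          if (alice == NULL) {
              perror("getpwnam");
              return 1;
          }
          /* Make alice the owner, and allow read access for the owner only
           * (mode 0400).  The kernel's permission check is the mechanism;
           * the policy is the statement "only Alice should read F". */
          if (chown(path, alice->pw_uid, (gid_t) -1) != 0 ||
              chmod(path, S_IRUSR) != 0) {
              perror("chown/chmod");
              return 1;
          }
          return 0;
      }

Note that this mechanism upholds the policy only within a threat model: root,
or anyone who can read the disk directly, bypasses the permission check
entirely.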
Better security often makes new functionality practical and safe.
  Suppose you want to run some application on your system.
    Large companies often prohibit users from installing software that hasn't
    been approved on their desktops, partly due to security.
    Javascript in the browser is isolated, making it OK (for the most part) to
    run new code/applications without manual inspection/approval.
    (Or virtual machines, or Native Client, or better OS isolation mechanisms.)
  Similarly, VPNs make it practical to mitigate the risk of allowing employees
  to connect to a corporate network from anywhere on the Internet.

What goes wrong #1: problems with the policy.
  Example: Fairfax County, VA school system.
    Each user has a principal corresponding to them, plus files and a password.
      (Just to be clear: "principal" is the technical term, not the job of school principal.)
    A student can access only his/her own files.
    A teacher can access only the files of students in his/her class.
    The superintendent has access to everyone's files.
    Teachers can add students (principals) to their class.
    Teachers can change the password of students in their class.
    What's the worst that could happen if a student gets a teacher's password?
    Policy amounts to: teachers can do anything.
  Example: Sarah Palin's email account.
    Yahoo email accounts have a username, password, and security questions.
    A user can log in by supplying username and password.
    If the user forgets the password, they can reset it by answering the security questions.
    Security questions can sometimes be easier to guess than the password.
    Some adversary guessed Sarah Palin's high school, birthday, etc.
    Policy amounts to: can log in with either the password or the security questions.
      (No way to enforce "only if the user forgets the password, then ...".)
  Example: Amazon/Apple break-in.
    Amazon allows adding a credit card to an account, then using the last 4
    digits of a card to reset the password, ...
    http://www.wired.com/gadgetlab/2012/08/apple-amazon-mat-honan-hacking/all/
  How to solve?
    Think hard about the implications of policy statements.
    Some policy checking tools can automate this process.
      Automation requires a higher-level goal (e.g. no way for a student to do X).

What goes wrong #2: problems with the threat model / assumptions.
  Example: human factors not accounted for.
    Phishing attacks.
    User gets email asking to renew their email account, transfer money, or ...
    Tech support gets a call from a convincing-sounding user asking for a password reset.
    "Rubber-hose cryptanalysis."
  Example: all SSL certificate CAs are fully trusted.
    To connect to an SSL-enabled web site, the web browser verifies the certificate.
    A certificate is a combination of the server's host name and its cryptographic
      key, signed by some trusted certificate authority (CA).
    Long list (hundreds) of certificate authorities trusted by most browsers.
    If any CA is compromised, an adversary can intercept SSL connections with
      a "fake" certificate for any server host name.
    Last year, a Dutch CA (DigiNotar) was compromised and issued fake certs for
      many domains (google, yahoo, tor, ...), apparently used in Iran (?).
  Example: assuming good randomness for cryptography.
    Need high-quality randomness to generate keys that can't be guessed.
    Problem: embedded devices and virtual machines may not have much randomness.
    As a result, many keys are similar or susceptible to guessing attacks.
      (A sketch of the problem appears below, after the next example.)
    [ https://factorable.net/weakkeys12.extended.pdf ]
  Example: subverting military OS security.
    In the 80's, the military encouraged research into secure OS'es.
    One unexpected way in which OS'es were compromised:
      the adversary gained access to the development systems and modified the OS code.
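
  Aside: a minimal sketch of the bad-randomness problem, not code from the
  cited study.  A device that seeds its generator with a low-entropy value
  (here, just the boot time in seconds) produces a "secret" key that an
  adversary can regenerate by enumerating the few plausible seeds.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>

      /* Derive a 16-byte "key" from a seed using the C library's rand(). */
      static void make_key(unsigned int seed, unsigned char key[16])
      {
          srand(seed);                       /* low-entropy seed */
          for (int i = 0; i < 16; i++)
              key[i] = (unsigned char)(rand() & 0xff);
      }

      int main(void)
      {
          unsigned char key[16], guess[16];
          unsigned int boot_time = (unsigned int)time(NULL);

          make_key(boot_time, key);          /* device generates its key */

          /* An adversary who knows the boot time only roughly can simply
           * try every candidate seed and regenerate the same key. */
          for (unsigned int s = boot_time - 60; s <= boot_time + 60; s++) {
              make_key(s, guess);
              if (memcmp(key, guess, sizeof(key)) == 0) {
                  printf("recovered key by guessing seed %u\n", s);
                  return 0;
              }
          }
          return 1;
      }

  Real systems use better generators than rand(), but the study linked above
  found the same failure mode: too little entropy available at key-generation
  time, so many devices ended up with guessable or shared keys.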
  Example: subverting firewalls.
    Adversaries can connect to an unsecured wireless network behind the firewall.
    Adversaries can trick a user behind the firewall into disabling it.
      Might suffice just to click on a link like http://firewall/?action=disable
      Or maybe buy an ad on CNN.com pointing to that URL (effectively)?
  Example: machines disconnected from the Internet are secure?
    The Stuxnet worm spread via specially-constructed files on USB drives.
  How to solve?
    Think hard (unfortunately).
    Simpler, more general threat models.
    Better designs may eliminate or lessen reliance on certain assumptions.
      E.g., alternative trust models that don't have fully-trusted CAs.
      E.g., authentication mechanisms that aren't susceptible to phishing.

What goes wrong #3: problems with the mechanism -- bugs.
  Example: programming mistakes: buffer overflows, etc.
    A security-critical program manipulates strings in an unsafe way:

      #include <stdio.h>
      #include <stdlib.h>

      int read_num(void)
      {
          char buf[128];
          gets(buf);        /* no bounds check: input longer than buf
                               overwrites the stack */
          return atoi(buf);
      }

    An adversary can manipulate inputs to run arbitrary code in this program.
  Example: integer overflows matter, in C code (e.g., the size passed to malloc).
    (See the sketch at the end of this "what goes wrong #3" discussion.)
  Example: Moxie's SSL certificate name checking bug.
    Null byte vs. length-encoding: a certificate name like
      "paypal.com\0.attacker.com" looks like "paypal.com" to code that treats
      the null byte as a string terminator.
  Example: PayMaxx W2 form disclosure.
    Web site designed to allow users to download their tax forms online.
    Login page asks for a username and password.
    If the username and password are OK, the user is redirected to a new page.
    The link to print the W2 form was of the form:
      http://paymaxx.com/print.cgi?id=91281
    Turns out 91281 was the user's ID; print.cgi did not require a password.
    Can fetch any user's W2 form by going directly to the print.cgi URL.
    Possibly a wrong threat model: doesn't match the real world?
      System is secure if the adversary browses the web site through a browser.
      System is not secure if the adversary synthesizes new URLs on their own.
    Hard to say if the developers had the wrong threat model, or a buggy mechanism..
  Example: Debian PRNG weakness.
    Debian shipped with a library called OpenSSL for cryptography.
    Used to generate secret keys (for signing or encrypting things later).
    A secret key is generated by gathering some random numbers.
    A developer accidentally "optimized away" part of the random number generator.
    No-one noticed for a while, because it could still generate secret keys.
    Problem: many secret keys were identical, and not so secret as a result.
  Example: bugs in a sandbox (NaCl, Javascript).
    Allow an adversary to escape isolation, do operations they weren't supposed to.
  How to avoid mechanism problems?
    Use common, well-tested security mechanisms ("economy of mechanism").
    Audit these common security mechanisms (lots of incentive to do so).
    Avoid developing new, one-off mechanisms that may have bugs.
    A good mechanism supports many uses and policies (more incentive to audit).
    Examples of common mechanisms:
      - OS-level access control (but, could often be better)
      - network firewalls (but, could often be better)
      - cryptography, cryptographic protocols
    Open vs. closed design or mechanism.
      Why not make everything closed or secret (design, impl, code, ...)?
      Threat model: best to have the most conservative threat model possible.
      What if you assume your implementation or design is secret?
        If the implementation is revealed, it's hard to change it to re-gain security!
        Must re-implement or re-design the system.
        If the only assumption was that a password/key/... is secret, it can be changed.
      What if you don't make assumptions about the design/impl being secret?
        Often a good idea to publish the design/impl to get more review.
        If others use your mechanism, it will be much better tested!
        Helps ensure you don't accidentally assume the design/impl are secret.
    Rely on less mechanism, by changing the threat model when possible.
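
  Aside: a minimal sketch of the integer-overflow problem mentioned above.
  The struct, sizes, and helper names are made up for illustration; this is
  the general pattern, not code from any of the systems discussed.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <stdint.h>

      struct record { char name[64]; };

      /* The caller says how many records it wants.  On a 32-bit size_t,
       * n * sizeof(struct record) wraps around for large n, so malloc
       * returns a much smaller buffer than the caller later writes into. */
      struct record *alloc_records(size_t n)
      {
          return malloc(n * sizeof(struct record));   /* may overflow */
      }

      /* Safer pattern: reject n before multiplying (or use calloc, which
       * checks this multiplication for overflow in most modern C libraries). */
      struct record *alloc_records_checked(size_t n)
      {
          if (n > SIZE_MAX / sizeof(struct record))
              return NULL;
          return malloc(n * sizeof(struct record));
      }

      int main(void)
      {
          struct record *r = alloc_records_checked(4);
          if (r != NULL) {
              strcpy(r[0].name, "ok");
              printf("%s\n", r[0].name);
              free(r);
          }
          return 0;
      }

  The unchecked version compiles cleanly and works for small n, which is
  exactly why this class of bug tends to survive testing.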
Terminology:
  (X) is trustworthy: means that (X) is worth trusting -- good!
    (X) is not buggy, will not be compromised, will not fail.
  (X) is trusted: means that (X) _must_ be trustworthy -- bad!
    Something will fail if (X) is buggy, compromised, or fails.
  Suppose you're editing a text file on some machine.
    The OS kernel is trusted by all parties (users, apps, etc).
    The text editor is trusted by you (but maybe not by other users).
  "Trusted" is often transitive -- bad again!
    If you have to trust the text editor, and the text editor has to trust the
    developers of some library it uses, then you may have no choice but to
    trust that library too.

High-level plan:
  Reduce the amount of trusted code.
    .. by enforcing policy using a more trustworthy mechanism.
    .. e.g. the OS kernel should let this app write to only one file?
    "Principle of least privilege."
    Doing this requires well-designed, flexible security mechanisms.
  Improve the trustworthiness of trusted code.
    .. by auditing, program analysis, better languages, etc.

References
  http://catless.ncl.ac.uk/Risks/26.02.html#subj7.1
  http://en.wikipedia.org/wiki/Sarah_Palin_email_hack
  http://www.thinkcomputer.com/corporate/whitepapers/identitycrisis.pdf
  http://en.wikipedia.org/wiki/DigiNotar#Issuance_of_fraudulent_certificates
  http://www.wired.com/gadgetlab/2012/08/apple-amazon-mat-honan-hacking/all/