Introduction
============

Welcome to 6.566 -- Computer Systems Security

Course structure
  Lectures will be TR 2:30-4, in 45-230.
  One paper/reading per lecture.
    Tentative schedule online.
    Likely stable up until spring break; lectures after spring break may change.
  Read the paper before lecture, and submit before lecture:
    Answer to a short homework question (link from schedule page).
    Your own question about the paper (will try to answer in lecture).
  Some papers about production systems, others about research ideas.
    Even if the overall system described in the paper didn't pan out,
      many of the ideas and techniques in the paper are important and useful.
  Interrupt, ask questions, point out mistakes.
    Anonymous questions: https://lec.csail.mit.edu/room/6.566/2adb3e10
  Lectures recorded, available on class web site.
  One quiz, one final exam.
    Quiz during class, final during finals week.
  Assignments: five labs.
    Defenses and/or attacks on fairly real systems.
    Not a lot of coding, but lots of non-standard thinking.
    Poke into obscure corners of x86 asm, C, Python, JavaScript, ...
    Office hours for lab/lecture help.
    Due 5pm on Fridays.
    Lab 1, buffer overflows, first part due next Friday.
      Start early: setting up the VM for your labs might take a bit of time.
  Lecturer: Nickolai.
  TAs: Anna, Bill, Sanjit, Kelly, Derek.
  Sign up for Piazza (link on course web site).
    Mostly questions/answers about labs.
    We will post any important announcements there.
  Warning about security work/research on MITnet (and in general).
    You will learn how to attack systems, so that you know how to defend them.
    Know the rules: https://ist.mit.edu/network/rules
    Don't mess with other people's data/computers/networks w/o permission.
    Ask course staff for advice if in doubt.

6.566 is about building secure computer systems
  Secure = achieves some property despite attacks by adversaries.
  Systematic thought is required for successful defense.
  Details matter!

High-level plan for thinking about security:
  Goal: what your system is trying to achieve.
    e.g. only Alice should read file F.
    Categories of goals: confidentiality, integrity, availability.
      Integrity: no way for the adversary to corrupt the state of the system.
      Availability: system keeps working despite the adversary.
      Confidentiality: no way for the adversary to learn secret information.
  Threat model: assumptions about what the attacker can do.
    e.g. can guess passwords, cannot physically steal our server.
  Implementation: how to achieve your goal under your threat model.
    Often the implementation is layered; terminology for a layer:
      Policy: configuration of a layer (rules) to achieve your goal.
        e.g. set permissions on F so it's readable only by Alice's processes.
        e.g. require a password and two-factor authentication.
      Mechanism: software/hardware that the layer uses to enforce policy rules.
        e.g. user accounts, passwords, file permissions, encryption.
    Policy might include parts of the system managed by the operator
      (not the developer), at the top level.
    Policy might include human components (e.g., do not share passwords)
      that are outside the scope of the security mechanisms.
    With layers, the mechanism of one layer is often the policy of the next layer down.
  Goal defines the security property you want to achieve.
  Threat model specifies which attacks are out of scope.
  Threat model and goal are part of the "definition" of security.
    In practice, a wrong threat model or goal could lead to security problems.
    Hard to formally talk about a goal or threat model being correct.
  Implementation (policy and mechanism) is how your system tries to achieve security.
    Can talk about policy+mechanism achieving (or not) some goal / threat model.
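To make the file-F example concrete, a minimal sketch of policy vs. mechanism
using Unix file permissions (the path and the user name "alice" are made up for
illustration, and chown typically requires running as root):

    import os
    import pwd

    GRADE_FILE = "/srv/grades/F"               # hypothetical file F
    alice = pwd.getpwnam("alice")              # look up Alice's uid/gid

    os.chown(GRADE_FILE, alice.pw_uid, alice.pw_gid)   # Alice owns the file
    os.chmod(GRADE_FILE, 0o600)                        # owner may read/write, nobody else

The two configuration calls are the policy; the kernel's permission checks on
every open() are the mechanism that enforces it.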
Building secure systems is hard -- why?
  Example: 6.566 grade file, stored on an Athena AFS server.
    Policy: only TAs should be able to read and write the grades file.
  Easy to implement the *positive* aspect of the policy:
    There just has to be one code path that allows a TA to get at the file.
  But security is a *negative* goal:
    We want no tricky way for a non-TA to get at the file.
    There are a huge number of potential attacks to consider!
      Exploit a bug in the server's code.
      Guess a TA's password.
      Steal a TA's laptop, maybe it has a local copy of the grades file.
      Intercept grades when they are sent over the network to the registrar.
      Break the cryptographic scheme used to encrypt grades over the network.
      Trick the TA's computer into encrypting grades with the attacker's key.
      Get a job in the registrar's office, or as a 6.566 TA.
  Result: cannot get policies/threats/mechanisms right on the first try.
    One must usually iterate:
      Design, watch attacks, update understanding of threats and policies.
    Use well-understood components, designs, etc.
    Post-mortems are important for understanding what went wrong.
      Public databases of vulnerabilities (e.g., https://cve.mitre.org/).
      Encourage people to report vulnerabilities (e.g., bounty programs).
    Threat models change over time.
  Defender is often at a disadvantage in this game.
    Defender usually has limited resources, other priorities.
    Defender must balance security against convenience.
    A determined attacker can usually win!
    Defense in depth.
    Recovery plan (e.g., secure backups).
  Most of this lecture is about failures, to get you to start thinking in this way.

What's the point if we can't achieve perfect security?
  Perfect security is rarely required.
    Many possible defenses and threat models one could target.
    Real costs typically associated with building more secure systems.
  Make the cost of attack greater than the value of the information/system.
    So that perfect defenses aren't needed.
  Make the cost of defense less than the value of the information/system.
    So that it's feasible to apply broader defenses to a wider range of systems.
  Make our systems less attractive than other people's.
    Works well if the attacker e.g. just wants to generate spam.
  Find techniques that have a big security payoff (i.e. not merely patching holes).
    We'll look at techniques that cut off whole classes of attacks.
    Successful: popular attacks from 10 years ago are no longer very fruitful.
  Sometimes security *increases* value for the defender:
    VPNs might give employees more flexibility to work at home.
    Sandboxing (JavaScript) might give confidence to run software I don't fully understand.
  No perfect physical security either.
    But that's OK: cost, deterrence.
    One big difference in computer security: attacks are cheap.

What goes wrong #1: problems with the goal / policy.
  I.e. system correctly enforces policy -- but the policy is inadequate.

  Example: business-class airfare.
    Airlines allow business-class tickets to be changed at any time, no fees.
    Is this a good policy?
    Turns out, in some systems the ticket could be changed even AFTER boarding.
      Adversary can keep boarding the plane, then changing the ticket to the
        next flight, ad infinitum.
    Revised policy: ticket cannot be changed once the passenger has boarded the flight.
      Sometimes requires changes to the system architecture.
      Need a computer at the aircraft gate to send updates to the reservation system.
    Lesson: corner cases matter!
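A small sketch of the two policies as code (a hypothetical ticket representation,
not a real reservation system), to make the corner case explicit:

    def may_change_ticket_v1(ticket):
        # Original policy: business-class tickets can be changed at any time.
        return ticket["fare_class"] == "business"

    def may_change_ticket_v2(ticket):
        # Revised policy: no changes once the passenger has boarded.
        # Only works if the gate computer updates ticket["boarded"] in the
        # reservation system, i.e. the architecture change noted above.
        return ticket["fare_class"] == "business" and not ticket["boarded"]

The v1 check was enforced correctly; the problem was that the policy itself
allowed the bad behavior.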
  Example: verifying domain ownership for TLS certificates.
    Browser verifies the server's certificate to ensure it is talking to the right server.
      Certificate contains the server's host name and cryptographic key,
        signed by some trusted certificate authority (CA).
      Browser has the CA's public key built in to verify certificates.
      CA is in charge of ensuring that a certificate is issued only to the
        legitimate domain (hostname) owner.
    Typical approach: send email to the contact address for a domain.
      Some TLDs (like .eu) do not reveal the contact address in ASCII text.
        Most likely to prevent spam to domain owners.
        Instead, they reveal an ASCII image of the email address.
      One CA (Comodo) decided to automate this by OCR'ing the ASCII image.
      Turns out, some ASCII images are ambiguous!
        E.g., foo@a1telekom.at was mis-OCRed as foo@altelekom.at
      Adversary can register the mis-parsed domain name, get a certificate for
        someone else's domain.
    [ Ref: https://www.mail-archive.com/dev-security-policy@lists.mozilla.org/msg04654.html ]

  Example: Fairfax County, VA school system.
    [ Ref: https://catless.ncl.ac.uk/Risks/26.02.html#subj7.1 ]
    Student can access only his/her own files in the school system.
    Superintendent has access to everyone's files.
    Teachers can add new students to their class.
    Teachers can change the password of students in their class.
    What's the worst that could happen if a student gets a teacher's password?
      Student adds the superintendent to the compromised teacher's class.
      Changes the superintendent's password, since the superintendent is now
        a student in that class.
      Logs in as the superintendent and gets access to all files.
    Policy amounts to: teachers can do anything.
    Lesson: have clear security goals, separate from the bulk of the application logic.

  Example: Sarah Palin's email account.
    [ Ref: https://en.wikipedia.org/wiki/Sarah_Palin_email_hack ]
    Yahoo email accounts have a username, password, and security questions.
    User can log in by supplying username and password.
    If the user forgets the password, can reset it by answering security questions.
    Some adversary guessed Sarah Palin's high school, birthday, etc.
    Policy amounts to: can log in with either password *or* security questions.
      No way to enforce "Only if user forgets password, then ..."
    Thus the user should ensure that the password *and* the security questions
      are both hard to guess.

  Example: Mat Honan's accounts at Amazon, Apple, Google, etc.
    [ Ref: https://www.wired.com/gadgetlab/2012/08/apple-amazon-mat-honan-hacking/all/ ]
    Honan was an editor at wired.com; someone wanted to break into his gmail account.
    Gmail password reset: send a verification link to a backup email address.
      Google helpfully prints part of the backup email address.
      Mat Honan's backup address was his Apple @me.com account.
      How to get hold of that e-mail account?
    Apple password reset: need billing address, last 4 digits of credit card.
      Address is easy, but how to get the 4 digits?
    Call Amazon and ask to add a credit card to an account.
      No authentication required, presumably because this didn't seem like a
        sensitive operation.
    Call Amazon tech support again, and ask to change the email address on an account.
      Authentication required!
      Tech support accepts the full number of any credit card registered with the account.
      Can use the credit card just added to the account.
    Now go to Amazon's web site and request a password reset.
      Reset link sent to the new e-mail address.
    Now log in to the Amazon account, view saved credit cards.
      Amazon doesn't show the full number, but DOES show the last 4 digits of all cards.
      Including the account owner's original cards!
    Now the attacker can reset the Apple password, read the gmail reset e-mail,
      and reset the gmail password.
    Lesson: attacks often assemble apparently unrelated trivia.
    Lesson: individual policies OK, but the combination is not.
      Apple views the last 4 digits as a secret, but many other sites do not.
    Lesson: big sites cannot hope to identify which human they are talking to;
      at best "same person who originally created this account".
      Security questions and e-mailed reset links are examples of this.
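The recurring pattern in these examples: the effective login policy is the OR of
the primary credential and every recovery path, so it is only as strong as the
weakest path.  A tiny sketch (the booleans stand in for whole attack steps):

    def account_compromised(knows_password, knows_security_answers, controls_backup_email):
        # Any single path suffices; the paths do not compose with AND.
        return knows_password or knows_security_answers or controls_backup_email

    # Palin case: password unknown, but the security answers were guessable.
    assert account_compromised(False, True, False)
    # Honan case: only the backup e-mail account was taken over.
    assert account_compromised(False, False, True)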
  Example: account lifetime.
    Email addresses might get reused.
    Other systems assume the account still belongs to the same email address owner.
    [ Ref: https://www.gruss.cc/files/uafmail.pdf ]

  Example: insecure defaults.
    Well-known default passwords in routers.
    Public default permissions in cloud services (e.g., objects in an AWS S3 bucket).
    Secure defaults are crucial because of the "negative goal" aspect.
      Large systems are complicated, lots of components.
      Operator might forget to configure some component in their overall system.
      Important for components to be secure if the operator forgets to configure them.

  Policies typically go wrong in "management" or "maintenance" cases.
    Who can change permissions or passwords?
    Who can access audit logs?
    Who can access the backups?
    Who can upgrade the software or change the configuration?
    Who can manage the servers?
    Who revokes privileges of former admins / users / ...?

What goes wrong #2: problems with the threat model / assumptions.
  I.e. the designer assumed an attack wasn't feasible (or didn't think of the attack).

  Example: assume the design/implementation is secret.
    "Security through obscurity."
    Clipper chip.
      [ Ref: https://en.wikipedia.org/wiki/Clipper_chip ]
    Broken secret crypto functions.

  Example: users will not give their two-factor authentication codes to the adversary.
    Two-factor authentication defends against password compromises.
      E.g., authenticator app (TOTP), code sent via SMS or email, hardware token, ...
    Assumes the user will keep their codes secret.
      Only enter the code into the legitimate application or web site.
    Adversary can try to confuse / trick the user into giving out their code.
      User doesn't have a good way to tell the legitimate web site from the adversary's.
      Especially if the adversary asks over the phone rather than via a web site.
    [ Ref: https://www.vice.com/en/article/y3vz5k/booming-underground-market-bots-2fa-otp-paypal-amazon-bank-apple-venmo ]

  Example: computational assumptions change over time.
    MIT's Kerberos system used 56-bit DES keys, since the mid-1980s.
    At the time, seemed fine to assume the adversary can't check all 2^56 keys.
    No longer reasonable: now costs about $100.
      [ Ref: https://www.cloudcracker.com/dictionaries.html ]
    Several years ago, a 6.858 final project showed you can get any key in a day.
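A back-of-the-envelope calculation of why the 2^56 assumption expired (the
guessing rates below are illustrative assumptions, not measured numbers):

    keyspace = 2 ** 56                              # 56-bit DES keys

    keys_per_sec = 10 ** 7                          # assumed rate on 1980s hardware
    years = keyspace / keys_per_sec / (365 * 24 * 3600)
    print(f"{years:.0f} years at 10M keys/sec")     # a couple of centuries: looks safe

    keys_per_sec = 10 ** 12                         # assumed rate for modern FPGA/ASIC rigs
    hours = keyspace / keys_per_sec / 3600
    print(f"{hours:.0f} hours at 1T keys/sec")      # well under a day

Same system, same keys; only the assumption about the adversary changed.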
  Example: information availability changes over time.
    Used to be difficult to learn personal information about an individual.
      So, "security questions" for password reset were a reasonable thing.
    Nowadays it is easy to find information about someone online (e.g., Facebook).

  Example: assuming a particular kind of solution to the problem.
    Many services use CAPTCHAs to check if a human is registering for an account.
      Requires decoding an image of some garbled text, for instance.
    Goal is to prevent mass registration of accounts to limit spam,
      prevent a high rate of password guessing, etc.
    Assumed the adversary would try to build OCR software to solve the puzzles.
      Good plan because it's easy to change the image to break the OCR algorithm.
      Costly for the adversary to develop new OCR!
    Turns out adversaries found another way to solve the same problem:
      human CAPTCHA solvers in third-world countries.
      Human solvers are far better at solving CAPTCHAs than OCR, or even than regular users.
      Cost is very low (a fraction of a cent per CAPTCHA solved).
    [ Ref: https://www.cs.uic.edu/pub/Kanich/Publications/re.captchas.pdf ]

  Example: all TLS CAs are fully trusted.
    If an attacker compromises a CA, they can generate a fake certificate for
      any server name.
    Originally there were only a few CAs; seemed unlikely that an attacker
      could compromise a CA.
    But now browsers fully trust 100s of CAs!
    In 2011, two CAs were compromised, issued fake certs for many domains
      (google, yahoo, tor, ...), apparently used in Iran (?).
      [ Ref: https://en.wikipedia.org/wiki/DigiNotar ]
      [ Ref: https://en.wikipedia.org/wiki/Comodo_Group ]
    In 2012, a CA inadvertently issued a root certificate valid for any domain.
      [ Ref: http://www.h-online.com/security/news/item/Trustwave-issued-a-man-in-the-middle-certificate-1429982.html ]
    Several other high-profile incidents since then too.
    Mistake: maybe reasonable to trust one CA, but not 100s.

  Example: assuming your hardware is trustworthy.
    If the NSA is your adversary, turns out to not be a good assumption.
    [ Ref: https://www.schneier.com/blog/archives/2013/12/more_about_the.html ]

  Example: assuming you are running the expected software.
    1. In the 80's, the military encouraged research into secure OSes.
       Surprise: successful attacks by gaining access to the development systems.
       Mistake: implicit trust in the compiler, developers, distribution, &c.
    2. Apple's development tools for iPhone applications (Xcode) are large.
       Downloading them from China required going to Apple's servers outside of China.
         Takes a long time.
       Unofficial mirrors of the Xcode tools appeared inside China.
       Some of these mirrors contained a modified version of Xcode that injected
         malware into the resulting iOS applications.
       Found in a number of high-profile, popular iOS apps!
       [ Ref: https://en.wikipedia.org/wiki/XcodeGhost ]
    Classic paper: Reflections on Trusting Trust.

  Example: assuming users can unambiguously understand the UI.
    [ Ref: https://en.wikipedia.org/wiki/IDN_homograph_attack ]
    [ Ref: https://www.trojansource.codes/trojan-source.pdf ]

  Example: decommissioned disks.
    Many laptops, desktops, servers are thrown out without deleting sensitive data.
    One study reports large amounts of confidential data on disks bought via ebay, etc.
    [ Ref: https://simson.net/page/Real_Data_Corpus ]

  Example: software updates.
    Apple iPhone software updates vs the FBI.
      [ Ref: https://www.apple.com/customer-letter/ ]
    Chrome extensions bought by malware/adware vendors.
      [ Ref: https://arstechnica.com/security/2014/01/malware-vendors-buy-chrome-extensions-to-send-adware-filled-updates/ ]
    Node.js library updated to include code that steals Bitcoin keys.
      [ Ref: https://www.theregister.co.uk/2018/11/26/npm_repo_bitcoin_stealer/ ]

  Example: machines disconnected from the Internet are secure?
    Stuxnet worm spread via specially-constructed files on USB drives.
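A common partial defense for the "are you running the expected software?"
examples above is to check downloads against a digest published over a separate,
trusted channel.  A minimal sketch (the file name and digest are placeholders):

    import hashlib

    EXPECTED_SHA256 = "..."                 # digest published by the vendor (placeholder)

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    if sha256_of("Xcode_installer.dmg") != EXPECTED_SHA256:
        raise SystemExit("digest mismatch: do not install")

This only helps under the assumption that the published digest itself comes from
somewhere the attacker does not control -- another threat-model question.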
What to do about threat model problems?
  More explicit threat models, to understand possible weaknesses.
  Simpler, more general threat models.
    E.g., should a threat model assume that the system design is secret?
      May be incrementally useful, but hard to recover once the secret is out.
      Probably not a good foundation for security.
  Better designs may eliminate / lessen reliance on certain assumptions.
    E.g., alternative trust models that don't have fully-trusted CAs.
    E.g., authentication mechanisms that aren't susceptible to phishing.
  Defense in depth (a good idea for problems with policy and mechanism, too).
    Compensate for possibly having the wrong threat model.
    Provide different levels of security under different levels of assumptions.
    E.g., audit everything in case your enforcement threat model was wrong.
      Ideally the audit system has a simpler, more general threat model.
    E.g., enforce coarse-grained isolation between departments in a company,
      even if fine-grained permissions get misconfigured by admins.
  Learn from mistakes and case studies.
    Whole new categories of threat-model problems come along relatively rarely.

What goes wrong #3: problems with the mechanism -- bugs.
  Bugs routinely undermine security.
  Rule of thumb: one bug per 1000 lines of code.
  Bugs in the implementation of the security policy.
    But also bugs in code that may seem unrelated to security, but is not.
  Good mindset: any bug is a potential security exploit.
    Especially if there is no isolation around the bug.

  Example: data deletion.
    [ Ref: https://www.da.vidbuchanan.co.uk/blog/exploiting-acropalypse.html ]
    Android screenshot editor supported cropping.
    But forgot to truncate the image file when overwriting it.
      Passed open mode "w" (O_RDWR) and not "wt" (O_RDWR | O_TRUNC).
    Leftover bytes if the cropped image is smaller than the original image data.
    PNG file still valid, hard to notice that there's extra information.

  Example: Apple's iCloud password-guessing rate limits.
    [ Ref: https://github.com/hackappcom/ibrute ]
    People often pick weak passwords; can often guess w/ few attempts (1K-1M).
    Most services, including Apple's iCloud, rate-limit login attempts.
    Apple's iCloud service has many APIs.
    One API (the "Find my iPhone" service) forgot to implement rate-limiting.
    Attacker could use that API for millions of guesses/day.
    Lesson: if many checks are required, one will be missing.

  Example: missing access control checks in Citigroup's credit card web site.
    [ Ref: https://www.nytimes.com/2011/06/14/technology/14security.html ]
    Citigroup allowed credit card users to access their accounts online.
    Login page asks for username and password.
    If username and password OK, redirected to the account info page.
    The URL of the account info page included some numbers.
      e.g. x.citi.com/id=1234
      The numbers were (related to) the user's account number.
    Adversary tried different numbers, got different people's account info.
    The server didn't check that you were logged into that account!
    Lesson: programmers tend to think only of the intended operation.

  Example: poor randomness for cryptography.
    Need high-quality randomness to generate keys that can't be guessed.
    Debian accidentally "disabled" randomness in the OpenSSL library.
      [ Ref: https://www.debian.org/security/2008/dsa-1571 ]
      The randomness was initialized using C code that wasn't strictly correct.
        A program analysis tool flagged this as a problem.
      Debian developers fixed the warning by removing the offending lines.
        Everything worked, but it turned out that also prevented seeding the PRNG.
      A pseudo-random number generator is deterministic after you set the seed.
        So the seed had better be random!
      The API still returned "random" numbers, but they were guessable.
      Adversary can guess keys, impersonate servers, users, etc.
    Android's Java SecureRandom weakness leads to Bitcoin theft.
      [ Ref: https://bitcoin.org/en/alert/2013-08-11-android ]
      [ Ref: https://www.nilsschneider.net/2013/01/28/recovering-bitcoin-private-keys.html ]
      Bitcoins can be spent by anyone who knows the owner's private key.
      Many Bitcoin wallet apps on Android used Java's SecureRandom API.
      Turns out the system sometimes forgot to seed the PRNG!
      As a result, some Bitcoin keys turned out to be easy to guess.
      Adversaries searched for guessable keys, spent any corresponding bitcoins.
      Really it was the nonce in the ECDSA signature that wasn't random,
        and a repeated nonce allows the private key to be deduced.
      Lesson: be careful.
    Embedded devices generate predictable keys.
      Problem: embedded devices, virtual machines may not have much randomness.
      As a result, many keys are similar or susceptible to guessing attacks.
      [ Ref: https://factorable.net/weakkeys12.extended.pdf ]
    Casino slot machines.
      [ Ref: https://www.wired.com/2017/02/russians-engineer-brilliant-slot-machine-cheat-casinos-no-fix/ ]
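A minimal sketch of why a missing or guessable seed is fatal (Python's random
module is used only for illustration; real key generation should use OS
randomness, e.g. the secrets module):

    import random
    import secrets

    def make_key(seed):
        rng = random.Random(seed)          # deterministic once the seed is fixed
        return rng.getrandbits(256)

    # An attacker who can guess the seed gets exactly the same "random" key.
    assert make_key(1234) == make_key(1234)

    good_key = secrets.randbits(256)       # seeded from the operating system instead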
  Example: Moxie's SSL certificate name checking bug.
    [ Ref: https://www.wired.com/2009/07/kaminsky/ ]
    Certificates use length-encoded strings, but C code is often null-terminated.
    CAs would grant a certificate for amazon.com\0.nickolai.org
    Browsers saw the \0 and interpreted it as a cert for amazon.com
    Lesson: parsing code is a huge source of security bugs.

What goes wrong #4: combination of problems in all of the above.
  Sophisticated attacks often combine many weaknesses.
    E.g., adversary obtained Microsoft's cryptographic key for user authentication.
      Multiple steps went wrong in order to allow the adversary access to this key.
      [ Ref: https://msrc.microsoft.com/blog/2023/09/results-of-major-technical-investigations-for-storm-0558-key-acquisition/ ]
    Similarly, many bugs are often chained together by serious attacks.
      [ Ref: https://googleprojectzero.blogspot.com/2023/09/analyzing-modern-in-wild-android-exploit.html ]
      [ Ref: https://googleprojectzero.blogspot.com/2023/10/an-analysis-of-an-in-the-wild-ios-safari-sandbox-escape.html ]

How to build secure systems?
  Lots of example problems above.
  Switch gears: what to do about it?
  Rough outline of the rest of this class.

Isolation: the starting point for security.
  The goal: by default, activity X cannot affect activity Y,
    even if X is malicious,
    even if Y has bugs.
  Without isolation, there's no hope for security.
  With isolation, we can allow interaction (if desired) and control it.
  Many kinds of isolation.
    Hardware isolation: processes, containers, virtual machines.
    Software isolation: JavaScript, WebAssembly.
    Physical isolation: USB security keys, etc.
  Every isolation plan depends on some host to securely isolate.
    OS kernel, language runtime, physics.
  Next few lectures will cover isolation.

Controlled sharing: interaction between isolated domains.
  100% isolation is usually not what we want.
  We need controlled sharing/interaction as well.
  Here's a model for sharing:

                 +------------------------------+
                 |  Policy                      |
                 |     |                        |
      request    |     v                        |
    principal ---|--> GUARD -----> resource     |
                 |     |                        |
                 |     v                        |
                 |  +-----------+               |
                 |  | Audit log |               |
                 |  +-----------+               |
                 +------------------------------+
                    HOST enforcing isolation

  This model has been very influential.
  Principals: person, device, program, service.
  Resources: files, services, accounts themselves, ...
  What does the guard do?
    Authenticate: principal.
    Authorize: principal, resource -> rights.
    Audit.
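A minimal sketch of the guard model (the accounts, ACL, and resource names are
made up; real systems store password hashes, not plaintext):

    PASSWORDS = {"alice": "correct horse battery staple"}     # authentication database
    ACL = {("alice", "grades.txt"): {"read"}}                  # policy: rights per (principal, resource)
    AUDIT_LOG = []

    def guard(user, password, resource, right):
        principal = user if PASSWORDS.get(user) == password else None     # authenticate
        allowed = (principal is not None and
                   right in ACL.get((principal, resource), set()))        # authorize
        AUDIT_LOG.append((user, resource, right, allowed))                 # audit
        return allowed

    assert guard("alice", "correct horse battery staple", "grades.txt", "read")
    assert not guard("alice", "wrong password", "grades.txt", "read")
    assert not guard("alice", "correct horse battery staple", "grades.txt", "write")

Every request passes through one choke point that authenticates the principal,
authorizes the request against the policy, and records the decision for auditing.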
Privilege separation: limit damage from individual components.
  Powerful idea to deal with buggy or malicious software.
  Challenge: how to architect a useful system from isolated components?
    Need overall application to work.
    Need security guarantees if some part breaks.
    Need good performance.
  Second module of lectures looks at case studies of security architectures.

Software security.
  How to make sure you are running trustworthy code in your isolated services?
    Dealing with bugs: runtime defenses, testing, bug-finding, verification.
    Supply chain: precise dependencies, deterministic builds, etc.
    Back doors: code review, approvals, security audits.
    Deployment: systematic plan for ensuring all of the above are followed.
  Third module of lectures will talk about some of these ideas.

Distributed systems.
  Operating over the network, and over the Internet, introduces new threats.
  Big ideas: cryptography, certificates, trust.
  Fourth module will cover lots of network / distributed system security topics.
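A small taste of the fourth module: what the browser-style certificate check
from the earlier examples looks like in code.  A sketch using Python's ssl
module (example.com is just an illustrative host):

    import socket, ssl

    ctx = ssl.create_default_context()      # trusted CA roots, hostname checking enabled
    with socket.create_connection(("example.com", 443)) as sock:
        # The handshake fails (ssl.SSLCertVerificationError) if the certificate
        # doesn't chain to a trusted CA or doesn't match the hostname.
        with ctx.wrap_socket(sock, server_hostname="example.com") as tls:
            print(tls.getpeercert()["subject"])   # the validated certificate's subject

Whether those trusted CA roots deserve the trust is exactly the threat-model
question from the CA examples earlier in the lecture.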