PRIVATE BROWSING What is the goal of privacy? -Vague ideal: (activity of) a given user is indistinguishable from (activity of) many other users. -Today we'll discuss privacy in the context of web browsers. *There's no formal definition of what private browsing means, in part because web applications are so complicated and so much incriminating state can be generated. *Browsers update their implementation of private browsing according to user demand and what other browser vendors do. *As users rely on private browsing mode, they expect more from it . . . and more implementation deficiencies emerge! What do the browsers mean by "private browsing"? -Paper formalizes this as two separate threat models + attacks: 1) A local attacker who possesses your machine post-browsing session, and wants to discover which sites you've visited. 2) A web attacker who has compromised a web server that you contact, and wants to link you across private and/or normal sessions. -If the two attackers can collude, it's easier for them to identify user. *Ex: A local attacker asks the server to check for the local IP address in the server's access logs. *So, there's practical value to security against these two attacks in isolation. Threat 1: Local attacker. -Assumption: Attacker gets control of the user's machine post-session, and wants to learn what sites the user visited in private browsing mode. -Security goal: Attacker can't learn those sites! -Non-goals *Don't care about achieving privacy for *future* private browsing sessions. -Attacker could modify software on the machine (e.g., installing a keystroke logger) and track future browsing. -This is why we also assume that the attacker can't access the machine *before* private browsing starts. *Hide the fact that private browsing was used. -Often called "plausible deniability". -The paper says that this is difficult to achieve, but doesn't explain why. Later in the lecture, we'll discuss some potential reasons. What kinds of persistent client-side state can a private session leak? [By persistent, we mean "stored on the local disk."] 1) JavaScript-accessible state: Cookies, DOM storage 2) Browser cache 3) History of visited addresses 4) Configuration state: New client certificates, updates to saved password database, bookmarks 5) Downloaded files 6) New plugins/browser extensions -Private browsing implementations all try to prevent persistent leaks to 1, 2, and 3. However, 4, 5, and 6 often persist after the private session ends. -Network activity can leave persistent evidence---DNS resolution records! *To fix this, private browsing mode would need to flush the DNS cache upon session termination. However, this is tricky, because flushing the cache typically requires admin rights on your machine (do you want the browser having admin rights?) and deletes all DNS state, not the state generated by a particular user. -During private browsing, objects in RAM can also get paged out to disk! Demo Open Firefox in Private Browsing Mode Visit http://pdos.csail.mit.edu/ sudo gcore $(pgrep firefox) strings core.* | grep -i pdos //-e l: Look for string using the // character encoding 16-bit // little-endian. //-a: Scan all of the file. //-t: Print the offset within // the file. Data lifetime is a broader problem than just private browsing! -Ex: cryptographic keys or passwords might be problematic if disclosed. [Ref: http://css.csail.mit.edu/6.858/2010/readings/chow-shredding.pdf] Demo cat memclear.c cat secret.txt make memclear ./memclear & sudo gcore $(pgrep memclear) strings core.* | grep secret Where does data persist? -Process memory: heap, stack. *Terminal scrollback *I/O buffers, X event queues, DNS cache, proxy servers, ... *Language runtime makes copies (e.g., immutable strings in Python) -Files, file backups, SQLite databases -Swap file, hibernate file -Kernel memory: *IO buffers: keyboard, mouse inputs *Freed memory pages *Network packet buffers *Pipe buffers contain data sent between processes *Random number generator inputs (including keystrokes again). How could an attacker get a copy of leftover data? -Files themselves may contain multiple versions (e.g., Word used to support this feature). -Programs may leak information if they don't scrub memory on deallocation or program shutdown: *Ex: In older Linux kernels, up to 4 KB of kernel memory could be leaked to disk when a new directory was created. *Ex: If the kernel/VMM doesn't wipe memory pages, then information from process X can leak into process Y that uses X's old memory pages. -Core dumps -Direct access to the machine -Flash SSDs implement logging---they don't erase old data right away! -Stolen disks, or just disposing of old disks [Ref: http://news.cnet.com/2100-1040-980824.html] How can we deal with the data lifetime problems? -Zero out unused memory [with some performance degradation]. -Encrypt data in places where zeroing out is difficult (e.g., on an SSD). *Securely deleting the key means data cannot be decrypted anymore! *Ex: OpenBSD swap uses encryption, with a new encryption key generated at boot time. *CPU cost of encryption is modest compared to disk I/O. Threat 2: Web attacker -Assumptions: *Attacker controls the web sites that the user visits. *Attacker does not control the user's machine. *Attacker wants to detect when the user visits the site. -Security goals: *Attacker cannot identify the user. *Attacker cannot determine if the user is employing private browsing mode. Defending against a web attacker is very difficult! -What does it mean to identify a user? *Link visits by the same user from different private browsing sessions. *Link visits by user from private browsing and public browsing sessions. -Easy way to identify user: IP address. *With reasonable probability, requests from the same IP address are the same user. *Next lecture, we'll discuss Tor. Tor protects the privacy of the source of a TCP connection (i.e., user's IP). However, Tor doesn't solve other challenges with implementing private browsing. -Even if the user employs Tor, a web server can still identify her by analyzing the unique characteristics of her browser runtime! Browser fingerprinting demo. -Open Chrome, go to http://panopticlick.eff.org/ -Open the same web site in private browsing mode. Good way to think of privacy: what is the anonymity set of a user? I.e., what is the largest set of users among which some user is indistinguishable? -Panopticlick shows that this set is small for most users, because users tend to have unique local settings for fonts, plugins, etc. -How can a web attacker determine if you're using private browsing mode? *Paper describes a history sniffing attack based on link colors. -Attacker page loads a URL in an iframe, then creates a link to that URL and sees whether the link is purple (private sessions don't store history). -This attack doesn't work any more, since browsers no longer expose link color to JavaScript! [See discussion of history sniffing attacks from a few lectures ago.] *However, there may be other ways for the attacker to tell that you're using private mode. -Ex: Public-mode cookies cannot be seen by private-mode pages. So, if you visit a page in public mode, and then in private mode, the page can detect that an expected cookie is missing. How can we provide stronger guarantees for private browsing? [Let's ignore IP address privacy for now, or assume that users employ Tor.] -Approach 1: VM-level privacy Plan: -Run each private browsing session in a separate VM. -Ensure that the VM is deleted after private browsing is done. -Somehow make sure that no VM state ends up on disk [disable paging? secure deallocation?]. Advantages: -Strong guarantees against both a local attacker and a web attacker. -No changes required to application, just need secure deletion of VM. Drawbacks: -Spinning up a separate VM for private browsing is heavyweight. -Poor usability: It's harder for users to save files from private browsing, use bookmarks, etc. Inherent trade-off between usability and privacy! -Approach 2: OS-level privacy Plan: Implement similar guarantees at the OS kernel level. -A process can run in a "privacy domain", which will be deleted afterwards. Advantages over VM: Lighter-weight. Drawbacks w.r.t VM: Harder to get right, since the OS kernel manages a lot of state. Are there ways to de-anonymize a user who employs these approaches? -Maybe the VM itself is unique! So, we need to ensure that all users have similar VMs. *This limits the extent to which users can customize VMs. -Maybe the VMM or host computer introduces some uniqueness. *Ex: TCP fingerprinting: The TCP protocol allows some parameters to be set by the implementation (e.g., the initial packet size, the initial TTL). *Tools like nmap send carefully crafted packets to a remote server; can guess the remote OS with high likelihood! -The user is still shared! So, perhaps the attacker can: *Detect the user's keystroke timing. *Detect the user's writing style. This is called stylography. [Ref: http://33bits.org/2012/02/20/is-writing-style-sufficient-to-deanonymize-material-posted-online/] Why do browsers implement their own private browsing support? -Main reason is deployability: Users don't have to run their browser in a custom VM or OS. *Similar motivation for Native Client. -Another reason is usability: Some types of state generated in private mode should be able to persist after the session is finished. [ex: downloaded files]. *This is a dangerous plan! Browsers are complicated pieces of software, so it's difficult to find clean cuts in the architecture which allow some types of state (but not others) to persist. How do we categorize those types of state? The paper says that we should think about who initiated the state change (Section 2.1). 1) Initiated by web site, no user interaction: cookies, history, cache. *Stays within session, is deleted on session teardown. 2) Initiated by web site, requires user interaction: client certificates, saved passwords. *Unclear what's the best strategy; browsers tend to store this state persistently, probably because the user has to explicitly authorize the action. 3) Initiated by user: bookmarks, file downloads. *Same as above: browsers tend to store this state persistently because the user authorizes the action . . . *. . . but note that storing this state may reveal the fact that the user employed private browsing mode! -Ex: In Firefox and Chrome, bookmarks live in a SQLite database. Bookmarks generated in private browsing mode will have empty values for metadata like "last_visit_count". [Ref: http://homepages.cs.ncl.ac.uk/feng.hao/files/DPM13.pdf] 4) Unrelated to a session: Browser updates, certificate revocation list updates. *Treat as a single global state shared between public mode and private mode. What do browsers actually implement? -Each browser is, of course, different. -Moreover, some state "bleeds over" in one direction but not another! There isn't a strict partitioning between private mode and public mode state. Q: What happens if public state bleeds over into private state? A: Easier for web attacker to link private session to public session. *Ex: A client-side SSL certificate that's installed in public mode can identify the user in private mode. Q: What happens if private state bleeds over into public state? A: This helps both a web attacker and a local attacker: observing state from a public session will reveal information about private sessions! Q: What should happen to state while user remains in a private mode session? A: Most browsers allow state to persist within a private mode session (see Table 3). *A "no" entry means that a web attacker might be able to detect private mode browsing! Q: Why is it OK to allow cookies in private browsing mode? A: It's convenient for users to be able to create ephemeral sessions in private browsing mode---the browser will delete the associated cookies when the private session ends. Q: What should happen to state across private mode sessions? A: Ideally, each private session should start with a blank slate---if state carries over between multiple private sessions, this increases the likelihood that the user can be fingerprinted! However, since some kinds of state can leak from private-to-public, and some kinds of state can leak from public-to-private, some kinds of state can indeed persist across private mode sessions. [Ex: certificates, downloaded items.] *So, think of each private mode session as sharing some state with a single public mode. Browser extensions and plugins are special. -They are privileged code that can access sensitive state. -They are not subject to the same-origin policy or other browser security checks. -Also, they are often developed by someone other than the browser vendor! *Thus, they might not be aware of private mode semantics, or they might misimplement the intended policy. *However, plugins are probably going to become extinct in the near future! HTML5 offers new features which provide native support for features that used to require Flash, applets, Silverlight, etc. [Ref: http://msdn.microsoft.com/en-us/library/ie/hh968248(v=vs.85).aspx] Multimedia: