Web security
============

This lecture: isolation between sites in a web browser.
  Overall plan is called the "same-origin policy" (SOP).
  One of the best descriptions is in "The Tangled Web" (today's reading).
  Will try to cover some over-arching principles.
  Will also talk about some interesting past/present pitfalls.
  Web browsers continuously change.
    New mechanisms have come out since "The Tangled Web".
    But mostly adding onto the existing design, rather than replacing it.

How did the browser security plan come about?
  Origin: Netscape browser introduced SOP when adding support for JavaScript.
  Incremental design/development: no single coherent design.
    No one expected web browsers to be used in the ways they are today.
    Security issues patched as they were discovered, with extra rules/checks.
  Browser vendors competed (and to some extent still compete) on functionality.
    Adding new features (or even security mechanisms) before standards.
    Historically, W3C has largely been documenting what browsers already do,
      instead of proposing new standards that browsers will then implement.
  Browsers didn't always agree on overall plan, or the implementation details.
    Browser vendors concurrently implement similar features.
    Implementations get deployed before specifications are discussed or agreed on.
    Many quirks, see quirksmode.org.
    As a result, many inconsistent corner cases that can be exploited.
  Now, there's quite a bit of collaboration "behind the scenes".
    Developers of Chrome, Firefox, IE talk to each other a fair amount.
    Important issues get fixed slowly over time.
      Compatibility is a huge constraint, hard to break old sites.
        (Users will stop using your web browser!)
      Some of the fixes take place in the browser and JavaScript libraries
        (jQuery, etc).
        When possible, just a compatibility layer on top of raw browser APIs.
      Some of the improvements come through new headers.
        E.g., Content-Security-Policy
        E.g., same-site cookies
    Many of the attacks we talk about today are more difficult to pull off.
      E.g., most of lab4 attacks don't work with Chrome.

One reason why this is a complicated security story is that there's
a LOT of sharing!

In this lecture, we're going to focus on the client-side of a web application.
  In particular, how to isolate content from different providers in the
    same browser.
  Some of these details are handled by web frameworks and libraries
    (Meteor, jQuery, ...).
  Anyone building moderately complex web apps must know these details anyway.
    Need to know the limits of what the framework does and doesn't do.
    May need to extend the framework.
    May need to add a library that's not in the same framework.
    May need to interact with other web sites (Facebook like button,
      Google Analytics, etc).
    May need to handle links to/from your application.
    May need to handle embedding of your application by other web sites.

Threat model / assumptions.
  [ Are they reasonable? ]
  Attacker controls their own web site, attacker.com.
    Inevitable, with some other domain name.
  Attacker's web site is loaded in your browser.
    Advertisements, links, etc.
  Attacker cannot intercept/inject packets into the network.
    Will try to solve separately with SSL.
  Browser/server doesn't have implementation bugs (e.g., buffer overflows).
    Will try to solve separately with a wide variety of techniques.

A single web application contains several types of content from a bunch
of different principals.
  Example: http://foo.com/index.html

    +--------------------------------------------+
    |  +--------------------------------------+  |
    |  |         ad.gif from ads.com          |  |
    |  +--------------------------------------+  |
    |  +-----------------+ +------------------+  |
    |  | Analytics .js   | | jQuery.js        |  |
    |  | from google.com | | from cdn.foo.com |  |
    |  +-----------------+ +------------------+  |
    |                                            |
    |      HTML (text inputs, buttons)           |
    |                                            |
    |  +--------------------------------------+  |
    |  | Inline .js from foo.com (defines     |  |
    |  | event handlers for HTML GUI inputs)  |  |
    |  +--------------------------------------+  |
    |+------------------------------------------+|
    ||iframe: https://facebook.com/likeThis.html||
    ||                                          ||
    || +----------------------+ +--------------+||
    || | Inline .js from      | | f.jpg from   |||
    || | https://facebook.com | | https://     |||
    || |                      | | facebook.com |||
    || +----------------------+ +--------------+||
    ||                                          ||
    |+------------------------------------------+|
    |                                            |
    +--------------------------------------------+

  Q: Which pieces of JavaScript code can access which pieces of state?
  For example:
    Can the analytics code from google.com access state in the jQuery
      code from cdn.foo.com?
      Seems maybe bad since different principals wrote the code,
        but they are included in the same frame.
    Can the jQuery code from cdn.foo.com access state in the inline
      JavaScript code defined by foo.com?
      They're *almost* from the same place.
    Can the analytics code or jQuery access the HTML text inputs?
      We've got to make that content interactive somehow.
    Can JavaScript in the Facebook frame touch any state in the foo.com frame?
      Does it matter that the Facebook frame is https://, but the foo.com
        frame is regular http://?

One complication: to enforce policies, the browser must parse content correctly.
  It is difficult to identify JavaScript precisely.
    Example:
      <script>
      var x = 'UNTRUSTED';
      // Single quote breaks out of JS string
      // context into JS context
      //
      // "</script>" breaks out of JS context
      // into HTML context
      </script>
    The UNTRUSTED string could contain </script>, breaking out of JS context.
      May be unintuitive: the HTML parser finds the end of the script
        block before the JavaScript parser ever sees the string literal.
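
A minimal sketch of one way a server might embed an untrusted string in an
inline script without letting it escape either context. This is not from the
lecture; the helper name toInlineScriptLiteral and the sample attack string
are hypothetical. The idea assumed here is to build the literal with
JSON.stringify (which handles quotes and backslashes) and then \u-escape the
characters that matter to the HTML parser, so that the text "</script>" never
appears in the page source.

```javascript
// Hypothetical helper (an illustration, not the lecture's method): produce a
// JavaScript string literal intended for use inside an inline <script> block.
function toInlineScriptLiteral(untrusted) {
  // JSON.stringify yields a double-quoted literal and escapes quotes,
  // backslashes, and control characters, so the value stays inside the
  // JS string context.
  return JSON.stringify(String(untrusted)).replace(
    /[<>&\u2028\u2029]/g,
    // Rewrite characters that are significant to the HTML parser (plus the
    // two Unicode line separators) as \uXXXX escapes. The JS parser decodes
    // them back to the original characters.
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Sample input that tries both break-outs described above.
const attack = "'; alert(1); </script><script>alert(2)//";
const literal = toInlineScriptLiteral(attack);

// The line a server would place inside the script block.
console.log("var x = " + literal + ";");
```

This covers only the "string literal inside an inline script" context. Other
contexts (HTML attributes, URLs, CSS) have their own quoting rules and need
their own encoders.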
  Similar challenges for URL parsing, dealing with internationalization,
    quoting rules.

Document object model
  After parsing, the page is represented as a tree of objects, with which
    JavaScript interacts.
  HTML elements -> DOM nodes organized in a tree.
  DOM nodes are objects that can be manipulated by JavaScript.
  Global objects (window, document, XMLHttpRequest) allow additional operations.
  HTML elements / DOM nodes can invoke JavaScript via event handlers.
  JS issues HTTP requests using XMLHttpRequest or by creating DOM nodes.
  You will learn much more about JavaScript in lab 4.

Isolation challenges
  The actor in a web browser is a document loaded in a window (or iframe).
    Moral equivalent in Unix: a program running in a process.
    Most interesting is an HTML document, but can have others (e.g., PDF).
  What can a document "do"?
    Link to other pages; user might click on a link (an <a href> element).
    Include image files, style sheet files, etc (<img>, <link> elements).
    Load another document in a frame (an <iframe> element).