Web application security
========================

Historically, much of the security action was on the server.
  So far, we have been looking at the security of servers.
  E.g., OKWS (paper from 2004) worried about bugs in server-side code.
  Web applications mostly ran code on the server.
    Browser received HTML, displayed it, followed links, etc.

Modern web applications rely on client-side code running in web browsers.
  Mostly Javascript, but also Flash, Native Client (later lecture),
    and even HTML, CSS, and PDF, in various ways.
  Advantages: dynamic content, low-latency responsiveness, etc.
  Drawback: much harder to reason about and avoid security problems.

What does a browser need to do in order to isolate client-side code?
  Sandbox code: interpose on all interactions with resources.
  Controlled access to resources: decide what operations can be performed.
  Some bugs arise in sandboxing code, but many more arise in allowed operations.
    Both implementation bugs in browser code, and design-level bugs.
    (E.g., designers did not think through all implications of some API.)
  This lecture's discussion will inevitably be incomplete, possibly buggy.
    See the "Browser Security Handbook" and "The Tangled Web" for more completeness.
  Will try to cover some over-arching principles (to the extent they exist).
  Will also talk about some interesting past/present pitfalls.

How did this design come about?
  Incremental design/development: no single coherent design.
    Security issues patched as they were discovered, with extra rules/checks.
  Browser vendors competed (and to some extent still compete) on functionality.
    Adding new features (or even security mechanisms) before standards existed.
    Historically, the W3C has largely documented what browsers already do,
      instead of proposing new standards that browsers then implement.
  Browsers didn't always agree on the overall plan, or on implementation details.
    As a result, many inconsistent corner cases that can be exploited.
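The "interpose on all interactions with resources" idea above can be sketched as a toy reference monitor in Javascript, using a Proxy to mediate every property access against a policy. The resource object, the policy, and the function names here are made up for illustration; real browser sandboxes work at a much lower level.

```javascript
// Toy reference monitor: untrusted code receives a Proxy-wrapped
// resource, and every property access is checked against a policy.
function sandboxResource(resource, allowedOps) {
  return new Proxy(resource, {
    get(target, prop) {
      if (!allowedOps.has(prop)) {
        // Deny any operation the policy does not explicitly allow.
        throw new Error("operation not permitted: " + String(prop));
      }
      return target[prop];
    },
  });
}

// Hypothetical resource with two operations.
const file = {
  read: () => "contents",
  write: () => { /* mutate something */ },
};

// Policy: untrusted code may only read.
const sandboxed = sandboxResource(file, new Set(["read"]));
sandboxed.read();      // allowed by the policy
// sandboxed.write()   // would throw: not in the policy
```

The key property is complete mediation: the untrusted code never holds a direct reference to `file`, so every operation necessarily passes through the `get` trap.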
Now, there's quite a bit of collaboration "behind the scenes".
  Developers of Chrome, Firefox, IE talk to each other a fair amount.
  Important issues get fixed slowly over time.
    Compatibility is a huge constraint: hard to break old sites.
    (Users will stop using your web browser!)
  Some of the fixes take place in Javascript libraries (jQuery, etc).
    When possible, just a compatibility layer on top of raw browser APIs.

Threat model / assumptions.  [ Are they reasonable? ]
  Attacker controls his/her own web site, attacker.com.
    Inevitable, with some other domain name.
  Attacker's web site is loaded in your browser.
    Advertisements, links, etc.
  Attacker cannot intercept/inject packets into the network.
    Will try to solve separately with SSL.
  Browser/server doesn't have buffer overflows.
    Will try to solve separately with a wide variety of techniques.

Policy / goals.  [ Not complete, but at least a subset.. ]
  Isolation of code from different sites.
    One web site shouldn't be able to interfere with another web site.
    Hard to pin down: what is interference vs. what is legitimate interaction?
    Will look at what this means in various contexts.
  Allow user to identify web site.
    User should be able to tell what web site they are interacting with.
    Necessary if the user is relying on page contents, or enters confidential data.
    Phishing attacks often try to mislead the user / violate UI security.
    Note: identifying a site would be meaningless without code isolation.
  We will largely focus on code isolation for this lecture.
    UI security is quite important but is even less clear / well-defined.
  Will cover common programming mistakes (SQL injection, XSS) next lecture.

How does Javascript interact with a web page?
  HTML elements -> DOM nodes organized in a tree.
  Javascript code in <script> tags ..
  .. then amazon's web server will print "Searching for <script>..</script>".
  Attacker creates a page with an
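The echoed-search behavior described above can be sketched as a small server-side handler that builds the results page. This is a hypothetical illustration, not amazon's actual code; `searchResultsPage` and `escapeHtml` are made-up names.

```javascript
// Unsafe: the user's query is interpolated into HTML verbatim, so a
// query like "<script>..</script>" becomes executable code when the
// victim's browser renders the results page.
function searchResultsPage(query) {
  return "<html><body>Searching for " + query + "</body></html>";
}

// Safer variant: HTML-escape the query before echoing it back, so the
// browser renders it as text instead of parsing it as markup.
function escapeHtml(s) {
  return s.replace(/&/g, "&amp;")
          .replace(/</g, "&lt;")
          .replace(/>/g, "&gt;")
          .replace(/"/g, "&quot;");
}

function safeSearchResultsPage(query) {
  return "<html><body>Searching for " + escapeHtml(query) + "</body></html>";
}
```

Note that `&` must be escaped first, or the escaping step would mangle its own output (e.g. `&lt;` would become `&amp;lt;`).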