Web application defenses
========================

Overall plan.
  Last lecture, looked at basic security mechanism: same-origin policy.
  This lecture will focus on how to build secure applications.
    Relies on same-origin policy, at some level.
    Focusing on common mistakes, and techniques/tools to avoid them.
  Next lecture will focus on HTTPS/SSL, network adversaries.

What's Django?
  Moderately popular web framework, used by some large sites.
    But other frameworks are more popular: PHP, Ruby on Rails.
    In the enterprise world, Java servlets, ASP, are widely used.
  Developers have put some amount of thought into security.
    A few reasonable examples of how to deal with security in web apps.
    Django is probably better in terms of security than some alternatives
      (PHP, Ruby on Rails), though of course it depends on the details.
  Some trade-off between security and usability.
    E.g., displays nice error messages, which adversary may take advantage of.
  Some research frameworks may provide better security properties.
    E.g., Ur/Web: http://www.impredicative.com/ur/

Session management: cookies.
  [ Ref: http://pdos.csail.mit.edu/papers/webauth:sec10.pdf ]
  What should go into a cookie?
  Zoobar, Django, many web frameworks: random session ID.
    Session ID refers to an entry in some session table on the web server.
    Session cookies are sensitive: adversary can use them to impersonate user.
    Recall: cookie injection attacks with a shared domain.
      "Session fixation attack".
      Don't share a domain with sites you don't trust.
  What if we don't want to have server-side state for every logged-in user?
  Stateless cookies.
    Need to prevent user from manufacturing a cookie.
    General plan: authenticate the cookie using cryptography.
    Basic primitive: a message authentication code (MAC).
      Think of it as a keyed hash (e.g., HMAC-SHA1): H(k, m).
      Need key k both to produce the hash value and to check it.
    Strawman design: cookie contains user=name&hash=H(k, name)
      Problem: no expiration; cookie remains valid forever!
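The first strawman (user=name&hash=H(k, name)) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not any framework's actual cookie code: SECRET_KEY, mac(), make_cookie(), and check_cookie() are hypothetical names, and HMAC-SHA256 stands in for the HMAC-SHA1 given as an example above. Nothing in the cookie expires, so the sketch has the same problem as the design it illustrates.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode

# Hypothetical server-side secret (assumption); a real server would load
# this from configuration and never hard-code it.
SECRET_KEY = b"example-key-not-for-production"


def mac(key: bytes, message: str) -> str:
    """H(k, m): keyed hash of m, hex-encoded (HMAC-SHA256 here)."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).hexdigest()


def make_cookie(name: str) -> str:
    """Build the strawman cookie user=name&hash=H(k, name)."""
    return urlencode({"user": name, "hash": mac(SECRET_KEY, name)})


def check_cookie(cookie: str):
    """Return the user name if the MAC verifies, otherwise None."""
    fields = parse_qs(cookie)
    try:
        name = fields["user"][0]
        tag = fields["hash"][0]
    except KeyError:
        return None
    expected = mac(SECRET_KEY, name)
    # Constant-time comparison; compare bytes so odd input cannot raise.
    if hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return name
    return None


cookie = make_cookie("alice")
assert check_cookie(cookie) == "alice"

# Without the key, changing the name invalidates the cookie.
forged = cookie.replace("user=alice", "user=admin")
assert check_cookie(forged) is None
```

The server keeps no per-user state: only the key k is needed to check a cookie, which is the point of the stateless design.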
    Strawman design: cookie contains user=name&exp=date&hash=H(k, name+date)
      Problem: name and date could be confused in the hash!
        Hash value is the same for "alice1" "1/1/2014" and "alice" "11/1/2014".
        Adversary can take hash value for alice1 and use it in cookie for alice.
    Strawman design: encode name and date in an unambiguous way.
      E.g., fixed-length date, or explicit length, or escape separator symbol.
      Problem: cookie still remains valid after user changes the password.
        Fix: include password version# in cookie/hash.
    How do you log out with this kind of cookie design?
      Impossible, if the server is stateless.
      If server can be stateful, session IDs make this much simpler.
  Alternatives to cookies for session management.
    HTML5 local storage: implement your own authentication in Javascript.
      Some web frameworks like Meteor do this.
      Benefit: cookie is not sent over the network to the server.
      Benefit: not subject to complex same-origin policy for cookies.
    Client-side X.509 certificates.
      Expiration dates; weak revocation story (will talk about this next week).
      Cannot steal certificates, unlike cookies.
      Web applications have little control over certificates.
    No session at all: require password for important operations.
      Low usability.

Cross-site scripting.
  Example: xss-demo.cgi simply echoes back your name.
  [ demo: http://localhost/cgi-bin/xss-demo.cgi in `urxvt -fn xft:Monospace-20`,
    look at /var/www/cgi-bin/xss-demo.cgi,
    and look at page source in web browser.
      ?name=abc
      ?name= ]
  Why is this bad?
    Why would the user type in such a URL?
    The problem is an adversary that may trick user into visiting this URL.
  What can an adversary do by running code in vulnerable origin?
    Steal user's cookie: document.cookie
    [ demo: ?name=, view source ]
  How does attacker get data out of a compromised site?
    Take advantage of allowed cross-origin requests.
    Simple plan: redirect the user's browser.
      [ demo: ?name= ]
    More subtle: create a tag that loads a URL (e.g., an <img> tag).
  Why is cross-site scripting so prevalent?
    Dynamic web sites incorporate user content in HTML pages.
    Web sites host uploaded user documents.
      HTML documents can contain arbitrary Javascript code.
      Non-HTML documents may be content-sniffed as HTML by browsers.
    Javascript APIs often "eval" supplied code.
      eval(), setTimeout(), etc.

XSS defenses.
  Browser XSS filters.
    Chrome: XSSAuditor.
    Complex heuristics, but looks for matches between args & script tags.
    Danger: adversary can selectively turn off scripts on a page.
    [ demo: turn off X-XSS-Protection: 0, visit ?name= ]
    Problem: attacks that inject script in multiple pieces.
    [ demo: ?name= ]
    Most browsers have some kind of XSS filter in place for reflected XSS.
    Limitation: cannot handle stored XSS (not immediately echoed from query).
  Prevent Javascript access to cookies.
    "httponly" cookies.
    Partial defense.
      Adversary can still do damage by issuing requests with user's cookies.
  Privilege separation: another origin for untrusted content.
    Google hosts its cache, gmail attachments, etc from googleusercontent.com.
    Even if XSS is possible, injected code runs in a different origin.
    Possibly still a problem: gmail attachment data might be in this origin.
  General plan: encode user-supplied data so it's not interpreted as HTML tags.
  Django: templates.
    Targets cases when developer forgets to sanitize user input.
    Define output page by describing the "holes" in an HTML template.
    By default, the holes in a template are sanitized into HTML entities.
      Replace angle brackets, single and double quotes, and &.
      &lt;, &gt;, &quot;, etc.
    [ demo: add cgi.escape() around both variables ]
  Selectively sanitizing user inputs: if need to allow some HTML markup.
    Dangerous!  Recall how difficult it is to unambiguously parse HTML.
    Alternative plan: non-HTML markup, like Markdown.
      Easier to sanitize/check Markdown, then transform it into HTML.
  CSP: no inline Javascript, no eval-like Javascript functions.
    Programmers must load Javascript via