Web security
============

Today's topic: isolation between sites in a web browser.
  Overall plan is called the "same-origin policy" (SOP)
  A case-study of real-world security policies.
  A mixture:
    Principles.
    Compromises with compatibility, convenience.

Why is there a problem?
  Your browser follows instructions provided by attackers!
  Most of us follow links to web sites we don't know much about.
    And thus probably view malicious web sites.
  Our browsers execute HTML, JavaScript from malicious web sites.
  Good news: the browser doesn't let JavaScript read your local files &c.
    I.e. the browser runs JavaScript in a sandbox.
    So web pages can only ask the browser to do web-related things.
  Bad news: some web-related things can be pretty damaging.

What might go wrong if web browsers weren't careful?
  (These generally don't work now, but often used to).
  (Assume I'm viewing a malicious web site in my browser.)
  Read my private data from web sites, e.g. e-mail?
  Post things as me?
  Act as me on my bank web site?
  Look at web sites inside MIT's firewall?
  Look at data in other browser windows?
  Change information displayed in other browser windows?

What has made securing browsers a long and complex story?
  Initially there didn't seem to be any security problem at all!
    Web was static text and images, nothing sensitive.
    JavaScript was a big change.
    Sensitive web sites (commerce, banks, e-mail, &c) were a big change.
  Rapid evolution in uses and features:
    Security risks often not apparent until much later.
    Initial designs often hard to secure, and hard to change.
    So security often retrofitted.
  Compatibility with old web sites and old browsers is important.
    Users care more about convenience than security.
  Compatibility and late arrival of security ->
    Security is often implemented on the side, as an add-on.
    Making it explicit in JS and server code would have been better.
  Lots of browsers, weak standards mechanisms.
    Slow to get consensus about how security should work.
  Lots of sharing between web sites, so strict isolation isn't realistic.
    Mash-ups, APIs, advertisements, "Like" buttons, &c.

Threat model / assumptions.
  - Attacker controls a web site, attacker.com.
  - You visit the attacker's web site.
      e.g. attacker.com == cute-kitten-photos.com
  - You are using the browser for other things (e-mail, bank, &c).
  + Browser is trusted, so we can design it to contain attacks.
  + Browser doesn't have implementation bugs (e.g., buffer overflows).
  ? For this lecture, assume network is secure (will talk about HTTPS later).

Solution: the Same Origin Policy (SOP).
  SOP is imposed by the browser on web pages.
  Browser labels each script (HTML, JS) with an origin.
    Origin = the web server the page (or frame) came from.
    protocol + host name + port
    E.g. the origin of https://foo.com/x/y/z is https://foo.com:443
      All pages on a given server share an origin.
  Browser labels each resource with an origin also.
    Resource = network server, displayed page, JS variable, &c.
  The SOP rule:
    A script can only access a resource if they have the same origin.
  Two views of SOP:
    Enforces isolation.
    Authorizes some sharing.
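
  Sketch (Python, illustration only): computing an origin and applying
  the SOP rule above. The function names are made up for this example,
  and real browsers handle more cases than http and https URLs.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed in these notes.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return (scheme, host, port) for a URL, filling in the default port."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)

def same_origin(script_url, resource_url):
    """The SOP rule: access is allowed only if the origins are equal."""
    return origin(script_url) == origin(resource_url)
```

  Note that the path plays no part: https://foo.com/a and
  https://foo.com:443/b compare equal, while http://foo.com and
  https://foo.com do not.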

A simple view of the SOP
  Web:      |  gmail.com  |  attacker.com
                ... Internet ...
  Browser:  |  gmail      |  attacker
            |  windows    |  windows

Example: XMLHttpRequest()
  XMLHttpRequest(url) is a JavaScript call
  It fetches URL and lets JavaScript see the result.
  Often used to get at "web APIs", to fetch data for JS.
  Could attacker.com use it to steal data from gmail?
    No: browser enforces SOP, so JS can only read responses from the
        same origin as the page it is running in.

Does the SOP do what we want?
  It's mostly automatically imposed, no choice (has a MAC flavor).
    So it's important that it neatly slice the boundary between
    "always OK" and "never OK".
  It prevents attacker.com JS from talking to gmail or my bank,
    or MIT internal web sites.
    i.e. with XMLHttpRequest()
  It prevents attacker JS from looking at my other windows, or
    changing them.
  Attacker can probably trick me into clicking a link to gmail.com
    But then the browser is running HTML/JS from gmail,
    not from the attacker.

How to preserve SOP's isolation over the Internet?
  How does page know network data is really from gmail.com?
    And not from attacker's server?
    Answer: TLS + certificate with DNS name
    Will dive into these issues in lectures after spring break.
  How does gmail.com know command is really from your browser?
    And not from attacker's machine w/ hacked browser?
    Answer: cookies

Cookies
  They let servers keep state in the browser.
    For shopping cart, ad tracking, user authentication, &c.
  Web sites can tell the browser to set cookies.
    Set-Cookie: key=value
  The browser sends a server's cookies back in each request.
  Web site can specify a domain, e.g. mit.edu
    Domain has to be a (maybe full) suffix of site's DNS name.
    Browser sends matching cookies in all requests.
    E.g. a cookie w/ domain=google.com matches server mail.google.com.
  A typical setup:
    When you log in w/ password, server sends a session ID cookie.
    Set-Cookie: session=<sessionID> (a long hex string)
    When server sees requests, looks up sessionID in DB to find user.
    sessionID must be kept secret!
      Random and long so it's hard to guess.
  JavaScript can't see cookies except as allowed by SOP.
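
  Sketch (Python, illustration only) of the typical setup above: the
  server picks a long random session ID at login and later maps it back
  to a user. The in-memory table and function names are assumptions for
  this example; a real site would use a database and expire sessions.

```python
import secrets

# Hypothetical in-memory session table: session ID -> user name.
SESSIONS = {}

def log_in(user):
    """Create a fresh session and return the Set-Cookie header value."""
    session_id = secrets.token_hex(32)  # 256 random bits, as a hex string
    SESSIONS[session_id] = user
    return f"session={session_id}"

def user_for_request(cookie_header):
    """Look up the user from a 'Cookie:' header value, or return None."""
    for item in cookie_header.split(";"):
        key, _, value = item.strip().partition("=")
        if key == "session":
            return SESSIONS.get(value)
    return None
```

  The secrets module is used rather than random because the session ID
  must be unpredictable to an attacker.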

A few cookie problems.
  It's a disaster that the browser sends them automatically.
  Overwriting is a potential problem:
    Can attacker.com change a google.com cookie?
    So I'm logged into google.com as attacker, not me?
    So attacker sees my search history?
    Suffix rule helps here.
    But can't let attacker.com set a cookie for .com!
      Or any other "public suffix" that anyone can register under, e.g. co.uk.
      Browser must have a list of all such suffixes (the Public Suffix List).
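
  Sketch (Python, illustration only) of the suffix rule plus the
  top-level-domain check. The tiny suffix set and the function names are
  stand-ins invented for this example; a real browser's list is far longer.

```python
# Toy stand-in for the browser's list of suffixes nobody may claim.
PUBLIC_SUFFIXES = {"com", "edu", "uk", "co.uk"}

def is_suffix_domain(host, domain):
    """True if domain equals host or is a dot-separated suffix of it."""
    host, domain = host.lower(), domain.lower().lstrip(".")
    return host == domain or host.endswith("." + domain)

def may_set_cookie(host, domain):
    """Would a browser let a page on `host` set a cookie for `domain`?"""
    domain_norm = domain.lower().lstrip(".")
    if domain_norm in PUBLIC_SUFFIXES:
        return False  # e.g. attacker.com asking for domain=.com
    return is_suffix_domain(host, domain_norm)
```

  The match is on whole DNS labels, so notgoogle.com does not count as
  being under google.com.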

Why is strict application of SOP not the end of the story?
  Developers should be able to create "mash-up" sites that
    combine content from multiple places.
  Example: A site that combines Google Map data with real estate data.
  Example: Advertisements.
  Example: Social media widgets (e.g., the Facebook "like" button).
  Also compatibility with pre-SOP HTML.

SOP Exception: ordinary links.
  E.g. an attacker.com page can contain a link to gmail.com.
    And such a click will navigate the user to gmail.com.
  Why this exception?
    These links are a big part of how people find stuff on the web.
  Usually inter-domain links are harmless.
  BUT the browser will send cookies (if any) to gmail.com
    So the link is followed as the gmail user.
    This might be a problem if visiting the link has side-effects.

SOP Exception: IMG
  attacker.com page can contain <IMG SRC="https://foo.com/x.gif">
  Browser will fetch and display the image, despite different origins.
  Why this exception?
    Avoid lots of copies of commonly-used images.
    Allow easy incorporation of image content.
  Can an attacker.com page steal content this way?
    E.g. use IMG fetch and inspect web pages inside the MIT firewall?
    No: the browser does apply the SOP to the retrieved pixels.
    User sees the image, but the attacker.com page's JS can't read it.
  BUT the browser will send cookies for foo.com.
    So foo.com may treat the attacker's request as having my permissions.
    This is a problem!

Cross-Site Request Forgery (CSRF)
  Suppose a page from attacker.com contains
    <IMG SRC="https://bank.com/xfer?amount=500&to=attacker">
  User doesn't see anything special, maybe a little broken image.
  What if user is logged into bank.com?
  bank.com sees a transfer request with a valid session cookie!
  CSRF has been a big source of real attacks.
    One underlying flaw is the exception to the SOP.
    Another flaw is automatic sending of cookies.
      Example of "ambient authority".
  General term is "confused deputy".
    Browser sends a request to bank.com.
    Browser *should* say it's forwarding a request from attacker.com.
    But it actually sends my cookie, implying the request
      is on behalf of me or a bank web page.

How to guard against CSRF?
  bank.com sends a random token with every URL it generates.
    E.g. https://bank.com/xfer..&token=...
  bank.com records all the legitimate anti-CSRF tokens.
  bank.com accepts only a request w/ a legitimate unused token
    associated with the requesting user.
  Hopefully the attacker can't predict or steal tokens.
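
  Sketch (Python, illustration only) of the token scheme above. The
  per-user table and the function names are assumptions for this example;
  frameworks such as Django ship their own anti-CSRF machinery.

```python
import secrets

# Hypothetical server-side table: user -> set of issued, not-yet-used tokens.
ISSUED = {}

def issue_token(user):
    """Generate an unpredictable token and remember it for this user."""
    token = secrets.token_urlsafe(32)
    ISSUED.setdefault(user, set()).add(token)
    return token

def transfer_url(user, amount, to):
    """Build a URL of the kind bank.com would embed in its own pages."""
    token = issue_token(user)
    return f"https://bank.com/xfer?amount={amount}&to={to}&token={token}"

def accept_request(user, token):
    """Accept only a token issued to this user, and only once."""
    tokens = ISSUED.get(user, set())
    if token in tokens:
        tokens.remove(token)  # single use
        return True
    return False
```

  The attacker.com IMG tag can still make the browser send my cookie,
  but the attacker has no way to fill in a valid token= value.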

SOP Exception: SCRIPT
  <SCRIPT SRC="https://foo.com/lib.js"></SCRIPT>
  Loads and runs JavaScript from anywhere; SOP not imposed.
  Why this exception?
    So people can use JavaScript libraries fetched from anywhere.
  As what origin should the fetched JS run?
    As origin = foo.com?
    As origin = fetching page?
  Browsers execute with fetching page's origin.
    Intuition: just like running library code as part of your app.
    So you must be careful about where you fetch scripts from.
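
  One way to be careful is Subresource Integrity (SRI, linked near the
  end of these notes): the page pins a hash of the script it expects, and
  the browser refuses to run anything else. Sketch (Python, illustration
  only) of computing the value; the function names are made up here.

```python
import base64
import hashlib

def sri_value(script_bytes):
    """Return an integrity value of the form 'sha384-<base64 digest>'."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")

def script_tag(src, script_bytes):
    """Build a SCRIPT tag that pins the expected contents of `src`."""
    return (f'<SCRIPT SRC="{src}" INTEGRITY="{sri_value(script_bytes)}" '
            f'CROSSORIGIN="anonymous"></SCRIPT>')
```

  This only helps if the library's contents are supposed to stay fixed;
  any change to the file, legitimate or not, makes the check fail.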

SOP Exception: IFRAME
  Loads and displays a web page in a rectangle.
    The framed page can come from anywhere: SOP not imposed on fetch.
  Why this exception?
    Used for advertisements, Facebook "like" buttons, &c.
  But SOP *is* applied to the frame's actions once fetched.
    If main page and frame's origins are different,
      then SOP prevents them from interacting in most ways.
    SOP defends each against the other, making IFRAMEs fairly safe.
    And the IFRAME gets the SOP rights of its origin, so
      it can e.g. fetch data from its origin server.
  The main page can, however, navigate the frame via JS.

SOP Exception: Cross Origin Resource Sharing (CORS)
  A server can tell the browser to allow cross-origin XMLHttpRequest(url).
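  If browser sees a page's request is cross-origin, it tells the server
    the requesting origin (Origin: header), and the server says yes or no
    (Access-Control-Allow-Origin: header in the response).
    For requests that aren't "simple" (e.g. unusual methods or headers),
    the browser asks first, with a separate "preflight" OPTIONS request.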
  Why this exception?
    Some web API data is public.
    Some specific mash-ups are intentionally authorized.
  Why is the exception safe?
    The purpose of SOP is to defend the server.
    If the server explicitly doesn't want to be defended, that's OK.
    Defaults to "no" if the server doesn't understand.
  This is a nice design, since it is explicit about security.
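
  Sketch (Python, illustration only) of the server's side of the
  decision. The allow-list entry and the function name are hypothetical;
  only the header names come from the CORS standard.

```python
# Hypothetical allow-list of origins this API is willing to share with.
ALLOWED_ORIGINS = {"https://realestate.example"}

def cors_headers(request_origin, public=False):
    """Return response headers answering the browser's cross-origin question.

    No Access-Control-Allow-Origin header means "no", which is also what
    a server that has never heard of CORS ends up saying.
    """
    if public:
        return {"Access-Control-Allow-Origin": "*"}
    if request_origin in ALLOWED_ORIGINS:
        # Vary tells caches that the answer depends on the Origin header.
        return {"Access-Control-Allow-Origin": request_origin,
                "Vary": "Origin"}
    return {}
```

  The browser, not the server, does the enforcing: it withholds the
  response from the page's JS unless the returned header permits the
  page's origin.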

Here are a few attacks that work around the SOP.

Cross-site Scripting (XSS) attack.
  Consider sites that show users each others' comments (e.g., Facebook).
  Attacker posts a comment like this:
    <SCRIPT> ... </SCRIPT>
  I view the comment.
  If the site didn't prevent this:
    Now attacker's JavaScript code is running in my browser.
    Is that bad? After all the browser sandboxes JS.
  The real problem is that attacker's code is running with
    the origin of the surrounding page, e.g. facebook.com.
  Attacker's JS can see my Facebook cookie with session ID!
    Can act as me, or send my cookie to the attacker.
  You'll see this in Lab 4.

How to defend against XSS attacks?
  HttpOnly cookies -- hides cookie from all JS.
    Not a complete fix, since attacker's JS can still do other things
    as page's origin, e.g. send requests to the server as me.
  Server could strip all HTML from comments.
    But users like to include formatting, links, &c.
  Server could carefully parse comments to prohibit certain tags.
    Tricky but often done; use a good library!
  Content-Security-Policy HTTP header, a new mechanism.
    Server tells the browser to forbid inline scripts.
    So the server doesn't have to guess how browser parses.
  But what if web site allows users to upload photos?
    Adversary could upload JavaScript code as their photo.
    Hard to get precise security guarantees with retrofitted mechanisms.
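
  Sketch (Python, illustration only) of the simplest server-side
  defense: escape comments rather than strip them, so that any markup is
  displayed as text. The function name and the particular policy string
  are choices made for this example, not a recommendation for every site.

```python
import html

def render_comment(comment):
    """Escape a user comment so the browser shows it as text, not markup."""
    return "<p>" + html.escape(comment) + "</p>"

# A policy that allows scripts only from the page's own origin. Because
# 'unsafe-inline' is absent, the browser refuses inline <SCRIPT> blocks.
CSP_HEADER = ("Content-Security-Policy", "script-src 'self'")
```

  With this, a comment of <SCRIPT>alert(1)</SCRIPT> is sent to the
  browser as &lt;SCRIPT&gt;alert(1)&lt;/SCRIPT&gt; and never executes.
  The cost is the one noted above: users lose formatting and links.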

Clickjacking attack.
  The browser treats user clicks as having full authority.
  So it's vital that the user understand consequences of clicks.
    Will it buy something on Amazon? Send e-mail? "Like"?
  Main cues for user's understanding are visual.
    So it's vital that page layout make click consequences clear.
  Sadly, HTML isn't always helpful at ensuring clear visual cues.
  Example:
    attacker.com page includes IFRAME displaying amazon.com page.
      With a One-Click ordering button.
    attacker.com makes the IFRAME transparent (!)
      <iframe style="opacity:0;" ...
    attacker.com can paint anywhere on the page, including over IFRAME.
      e.g. "Click here for a free iPad!"
    A click buys the item on amazon; no free iPad.
    amazon frame invisible, so user can't see that something odd happened.
    Often used to increase Likes on Facebook.
  Defense:
    X-Frame-Options: DENY header -- prohibits page in IFRAME.
    Content-Security-Policy: frame-ancestors 'none'
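
  Sketch (Python, illustration only) of a server attaching both headers
  to a response. The function name and the dict-of-headers representation
  are assumptions; it also assumes the response carries no other
  Content-Security-Policy, which this code would overwrite.

```python
def add_framing_defense(headers, allow_same_origin=False):
    """Return a copy of `headers` with anti-framing headers added."""
    out = dict(headers)
    if allow_same_origin:
        # Permit framing only by pages from the site's own origin.
        out["X-Frame-Options"] = "SAMEORIGIN"
        out["Content-Security-Policy"] = "frame-ancestors 'self'"
    else:
        # Refuse to be displayed inside any frame at all.
        out["X-Frame-Options"] = "DENY"
        out["Content-Security-Policy"] = "frame-ancestors 'none'"
    return out
```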
    
Improvements since The Tangled Web:
  https://infosec.mozilla.org/guidelines/web_security

  Notable examples:
    SameSite cookies
    https://en.wikipedia.org/wiki/Content_Security_Policy
    https://en.wikipedia.org/wiki/Cross-origin_resource_sharing
    https://en.wikipedia.org/wiki/Strict_Transport_Security
    https://www.w3.org/TR/SRI/
    HTML5 iframe sandbox attribute
      https://developer.mozilla.org/en-US/docs/Web/HTML/Element/iframe

Why not re-design the security model from scratch?
  A1: Backwards compatibility! There's a huge amount of preexisting web
      infrastructure that people rely on.
  A2: How do we know that a new security model would be expressive enough?
      Users won't trade much convenience for security.
  A3: Any security model must evolve.

What ideas could go into an improved design?
  Separate user credentials from other cookies.
    They should obey different rules.
  Explicit indication of what principal to use.
    No ambient authority.
    Page should say how foreign JS should run.
    All fetches should explicitly indicate origin
      and user credentials to use.
  Require less parsing, less escaping.
    E.g. no mixing of JavaScript and HTML.
  Explicit notion of permissions -- access control policy.
  More visual clarity about what user is about to click on.
    Who is showing the button? Where will it go?

Is the bottom line "hopeless mess" or "tricky but adequate"?
  The SOP does prevent a big set of attacks.
  Browser maintainers are serious about fixing problems.
    And they work together quite a bit for semi-standardization.
  Frameworks like Django are helpful.
    Libraries for tricky stuff like parsing / stripping / escaping.
    Automatic deployment of protective mechanisms.

https://lchsk.com/stay-paranoid-and-trust-no-one-overview-of-common-security-vulnerabilities-in-web-applications.html