OAuth
=====

Administrivia.
  Quiz.
    Will post solutions, grades, grade distribution after lecture.
    Pick up quiz from TAs.
    Tricky questions -> low average grades; check the histogram.
  Lab 5 posted: browser-based attacks in JavaScript, etc.
  Final project.
    Your choice of topic.
    Should be security-related, but more important to find interesting project.
    Groups of 3-4 students.
    Some idea suggestions posted on the web site.
    Due: post your final project idea on Piazza by Monday.
    Due: project proposal on Monday after that.

This lecture: single sign-on (SSO) for web applications.
  The paper explores common bugs in OAuth implementations.
  Recent paper: presented last week at ACM CCS 2012.
  Somewhat overplays importance of bugs or relevance of bugs to OAuth.
    "Root cause" of several reported problems has little to do with OAuth.
    Perhaps there's some tangential way in which OAuth could mitigate problems.
    In some cases, the threat model and security goal are unclear.
    Authors seem to lean heavily towards security at all costs.
    Unclear what fundamental thing is being measured in eval (e.g., correlation).
  MIT uses another SSO system, called Shibboleth.
    Interesting final project: study security of MIT's IdP and RP equivalents.

What does OAuth provide?
  Two important aspects, unfortunately combined in OAuth.
  Problem 1: what is the identity of the user?  Authentication.
    Naive plan: user has an account on each site.
    Not great: inconvenient, users can't remember too many distinct passwords.
    SSO: trusted third party helps determine identity of user, like Kerberos.
    Traditional meaning of single sign-on: no need to sign in for each server.
  Problem 2: how to access user's resources on another site?  Authorization.
    For example, Facebook allows other sites to access user data via its API.
    How should users specify which sites can/cannot access their data?
    Naive plan: user gives their Facebook password to site that needs access.
    Insecure: site gets complete access to user's Facebook account.
  OAuth's design mostly revolves around the second problem: authorization.
    Protocol for accessing data is completely application-specific.
    OAuth is just about getting the token used in the data access protocol.
    OAuth includes (app-specific) way to specify needed resources/privileges.
    Token should be good to access only the specified resources.
  OAuth can also be used for authentication, in a slightly roundabout way.
    One of the "resources" that most data access protocols support is user ID.
    To figure out who user is, ask for access to user's identity.
    Then, given the access token, fetch user's ID to authenticate the user.
  OAuth requires servers to know about each other.
    Need to agree on data access protocol, etc.
    Annoying: each site must separately add support for Facebook, Google, etc.
  Alternative protocol: OpenID, just authentication.
    Authentication can be completely standardized.
    A site can accept authentication from any other server that user chose.
    (Just include OpenID server name in the username.)

OAuth components.
  User (human).
  User's browser.
  Relying party (RP): server that wants access to user's identity / data.
  Identity provider (IdP): server that knows your identity / holds your data.
  RP and IdP must learn about each other beforehand, exchange a secret value.
  User must have an account with IdP beforehand.
  But, user does not need to have used RP before -- this is SSO's job.

Typical OAuth workflow ("server-flow").
  Figure 1 in paper.
  i: RP's identity as registered with IdP.
    Why do we need this?
      Must ask the user who they're giving privileges to.
  p: Permissions requested by RP (e.g., read user's email, post on their wall).
    Why do we need this?
      To ask user / allow certain operations later.
  r: Redirect URL for later step.
    Why do we need this?
      Partly for convenience.
  a: Some additional state for when login completes.
    Why do we need this?
      Help redirect URL match up login attempt & reply.
  What does IdP check for in (i, p, r)?
    Permissions p are subset of ones that RP originally promised to ask for.
    Redirect URL r matches the URL pattern supplied by RP.
  User might already be logged in to IdP: use cookies to bypass login screen.
  User must decide whether to allow these specific permissions.
  c: Authorization code meant for RP.
  t: Token that gets used in app-specific data access protocol.
    Token has an expiration time.
    User can also revoke token through some IdP-specific interface.
  Why separate 'c' and 't'?
    Want to limit the damage in case 'c' is compromised.
    IdP wants to ensure only the intended recipient gets the token.
    OAuth requires RP's secret value, 's', to get the real token.
  Why include 'r' when sending 'c' to IdP in order to obtain token 't'?
    The party that's getting the token wants to know if the token was for it.
    Suppose Alice logs into malicious server with OAuth.
    That malicious server may replay its 'c' to another site that Alice uses.
    'r' helps RP and IdP check whether 'c' was issued for 'r' or not.
    Prevents one site from replaying authorization code on different site.

OAuth "client-flow".
  Why?  Some applications might be server-less, purely in JavaScript on client.
  Figure 2 in the paper.
  Difference: IdP sends token 't' directly to RP, instead of code 'c'.
    Why?  RP has no way to keep a secret value if it's running in the browser.
  How to ensure token is only sent to the appropriate RP?
    Redirect token to RP's URL.
    Include token in URL's hash fragment (http://www.rp.com/foo#token).
    Does not get sent over the network, but can be accessed from JavaScript.
  How does RP's web page talk to IdP's server?
    Same-origin policy usually prohibits cross-origin communication.
    Some recent browsers: can use CORS to allow cross-origin requests.
    Older browsers: spawn an