User authentication
===================

Problem: how to authenticate users?
  Setting: user <-> computer <-> verifier server.
  Extra components that might help with authentication:
    A trusted third party.
    User's portable device (either dedicated or app in mobile phone).
    A proxy server.
  This paper proposes a number of criteria to evaluate authentication schemes.
  The proposed criteria are reasonable, though sometimes non-orthogonal and not complete.
  Useful as a starting point to think about a new authentication scheme.

Standard authentication: passwords.
  User types in username and password.
  Server checks whether password is correct for that username.

How to store passwords?
  Server needs to be able to verify passwords.
  Naive plan: store plaintext passwords.
    Problem: if adversary compromises server, gets full list of passwords.
  Hashing: store a table of (username, H(password)).
    Can still check a password: hash the supplied string, compare with table.
    Hard for adversary to invert the hash function.
    Problem: password space is quite small.
      Top 5000 password values account for 20% of users.
      Skewed distribution towards common passwords chosen by many users.
      Yahoo password study: as a rule of thumb, passwords have 10-20 bits of entropy.
    Problem: hash functions are optimized for performance, which helps the adversary here.
      E.g., my laptop can do ~2M SHA1s (of small blocks) per second.
      Even with a reasonable password (20 bits of entropy, ~1M guesses), can crack about one account per second.
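  A minimal sketch of the offline guessing attack against an unsalted hash table, assuming SHA-1 (Python's hashlib) as H. The usernames, passwords, and dictionary are hypothetical. Because there is no salt, each guess costs one hash no matter how many accounts the stolen table holds.

```python
import hashlib

def sha1_hex(pw: str) -> str:
    """Unsalted SHA-1 of a password, hex-encoded (the weak scheme above)."""
    return hashlib.sha1(pw.encode()).hexdigest()

# Hypothetical stolen table of (username, H(password)) rows.
stolen = {
    "alice": sha1_hex("123456"),
    "bob": sha1_hex("letmein"),
    "carol": sha1_hex("correct horse battery staple"),
}

# Hypothetical guessing dictionary, most common passwords first.
dictionary = ["123456", "password", "letmein", "qwerty"]

def crack(table, guesses):
    """Return {username: password} for every row whose hash matches a guess."""
    # Index stored hashes so one hash per guess tests every account at once.
    by_hash = {}
    for user, stored in table.items():
        by_hash.setdefault(stored, []).append(user)
    cracked = {}
    for guess in guesses:
        for user in by_hash.get(sha1_hex(guess), []):
            cracked[user] = guess
    return cracked

print(crack(stolen, dictionary))  # {'alice': '123456', 'bob': 'letmein'}
```

  Carol's password survives only because it is not in the dictionary; the hash itself offers no protection.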
  Expensive key-derivation function (e.g., PBKDF2 or bcrypt).
    Replace the hash with a much more expensive hash function.
    Key-derivation functions have adjustable cost: make it arbitrarily slow.
      E.g., can make hash cost be 1 second -- O(1M) times slower than SHA1.
    Internally, often performs repeated hashing using a slow hash.
    Problem: adversary can build "rainbow tables".
      Table of password-to-hash mappings.
      Expensive to compute, but helps efficiently invert hashes afterwards.
      Only need to build this rainbow table for dictionary of common passwords.
    Roughly: with a 1-second hash and ~1M common passwords, 1M seconds (~12 days) to hash them all.
      After that, can very quickly crack common passwords in any password db.
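  A sketch of why a slow KDF alone does not stop precomputation. It uses PBKDF2 (hashlib.pbkdf2_hmac) with one constant salt shared by everyone, which is equivalent to no per-user salt; the iteration count is kept low only so the demo runs fast. This is a plain lookup table, a simplification of a real rainbow table (which chains hashes to trade time for memory), but the reuse argument is the same. All passwords and database contents are hypothetical.

```python
import hashlib

# Same constant for every user and site: equivalent to no salt at all.
CONSTANT_SALT = b"same-for-everyone"
ITERATIONS = 1000  # demo value; real deployments use far more

def slow_hash(pw: str) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", pw.encode(), CONSTANT_SALT, ITERATIONS)

# Expensive one-time step: hash every common password once.
common = ["123456", "password", "letmein", "qwerty", "iloveyou"]
precomputed = {slow_hash(pw): pw for pw in common}

def invert(db):
    """Cheap step: crack any unsalted database with dictionary lookups."""
    return {user: precomputed[h] for user, h in db.items() if h in precomputed}

# Two independently stolen databases reuse the same precomputed table.
db1 = {"alice": slow_hash("qwerty"), "bob": slow_hash("Tr0ub4dor&3")}
db2 = {"dave": slow_hash("iloveyou")}
print(invert(db1))  # {'alice': 'qwerty'}
print(invert(db2))  # {'dave': 'iloveyou'}
```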
  Better: use salting.
    Mix some additional randomness into the password hash: H(salt, pw).
    Where does the salt value come from?  Stored on server in plaintext.
    Why is this better if adversary compromises the salt too?
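      Cannot build rainbow tables in advance: a precomputed table works for only one salt value, so each account must be attacked separately.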
    Choose a long random salt.
    Choose a fresh salt each time user changes password.
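  A minimal sketch combining the two ideas above: a slow, tunable KDF plus a fresh random salt per password. PBKDF2-HMAC-SHA256 from Python's hashlib stands in for the KDF; the 100,000 iteration count and 16-byte salt are assumed values, not recommendations from the notes.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # tunable cost; raise it as hardware gets faster
SALT_BYTES = 16       # long random salt (128 bits assumed)

def make_record(password: str) -> Tuple[bytes, bytes]:
    """Create a fresh (salt, H(salt, pw)) record; call again on password change."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check(password: str, record: Tuple[bytes, bytes]) -> bool:
    """Recompute the hash using the stored plaintext salt and compare."""
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

r1 = make_record("letmein")
r2 = make_record("letmein")
print(check("letmein", r1), check("wrong", r1))  # True False
print(r1[1] == r2[1])  # False: same password, different salts, different hashes
```

  Because two users with the same password get different hashes, a table built for one salt is useless against every other account.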

How to transmit passwords?
  Poor idea: send password to the server in cleartext.
  Slightly better: send password over encrypted connection.
  Why is this bad?
    Connection may be intercepted.
    Users reuse passwords across sites, so one server can replay a user's password on another server.
  Strawman alternative: send hash of password, instead of the password.
    Not so great: hash becomes a "password equivalent", can still be resent.
  Better alternative: challenge-response scheme.
    User and server both know password.
    Server sends challenge R.
    User responds with H(R || password).
    Server checks if response is H(R || password).
    If the server knows the password, it is convinced the user knows it too (modulo MITM attacks).
    Server does not learn password if it didn't already know it.
    How to prevent server from brute-force guessing password based on H() value?
      Expensive hash + salting.
      Allow client to choose some randomness too: guard against rainbow tables.
    To avoid storing the real password on the server, use protocol like SRP.
  But challenge-response requires client-side and server-side changes.
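  A minimal sketch of the basic challenge-response exchange described above, assuming SHA-256 as H and a 16-byte random challenge. It omits the expensive hash, salting, and client randomness mentioned above, and in practice HMAC(password, R) is the more standard construction than plain concatenation. Class and function names are hypothetical.

```python
import hashlib
import hmac
import os

def respond(challenge: bytes, password: str) -> bytes:
    """Client side: H(R || password), with SHA-256 as H (an assumption)."""
    return hashlib.sha256(challenge + password.encode()).digest()

class Server:
    def __init__(self, password: str):
        self.password = password  # this simple scheme stores the password itself
        self.challenge = None

    def new_challenge(self) -> bytes:
        self.challenge = os.urandom(16)  # fresh random R for every login attempt
        return self.challenge

    def verify(self, response: bytes) -> bool:
        expected = respond(self.challenge, self.password)
        return hmac.compare_digest(expected, response)

server = Server("hunter2")
R = server.new_challenge()
old = respond(R, "hunter2")
print(server.verify(old))  # True
server.new_challenge()
print(server.verify(old))  # False: a replayed response fails under a new R
```

  The password never crosses the wire, and an eavesdropped response is useless once the server issues a new challenge.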

How to check passwords?
  Guessing attacks are a problem because of small key space.
  Kerberos v4, and v5 without preauthentication: not great, since anyone can request a ticket encrypted under the user's password-derived key and guess offline.
  Rate-limiting authentication attempts is important.
  What to do after many failed authentication attempts?
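  One possible answer, sketched in Python: per-account exponential backoff. The policy, the 1-second base delay, and the 1-hour cap are assumptions, not from the notes. Backoff slows online guessing without a hard lockout, which would let anyone lock a victim out (an availability/DoS concern); it does nothing against offline guessing.

```python
import time

class Throttle:
    """Hypothetical policy: after each consecutive failure for an account,
    double the wait before the next attempt is accepted, up to a cap.
    A successful login resets the counter."""

    def __init__(self, base_delay=1.0, max_delay=3600.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.clock = clock      # injectable so the policy can be tested
        self.failures = {}      # username -> consecutive failure count
        self.next_allowed = {}  # username -> earliest time of next attempt

    def allowed(self, user):
        return self.clock() >= self.next_allowed.get(user, float("-inf"))

    def record(self, user, success):
        if success:
            self.failures.pop(user, None)
            self.next_allowed.pop(user, None)
            return
        n = self.failures.get(user, 0) + 1
        self.failures[user] = n
        delay = min(self.base_delay * 2 ** (n - 1), self.max_delay)
        self.next_allowed[user] = self.clock() + delay

# Usage with a fake clock:
now = [0.0]
t = Throttle(clock=lambda: now[0])
t.record("alice", success=False)  # first failure: wait 1 s
print(t.allowed("alice"))  # False
now[0] = 1.0
print(t.allowed("alice"))  # True
```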

What matters in user's password choice?
  Many sites impose certain requirements on passwords (e.g., length, chars).
  In reality, what matters is entropy.
  Format requirements rarely translate into higher entropy.
  Defeats only the simplest dictionary attacks.
  Also has an unfortunate side-effect of complicating password generation.
    E.g., no single password-gen algorithm satisfies every possible web site.
    Conflicting length, symbol rules.
  Password distribution "key spaces" are quite small in practice [above].

Password recovery.
  Important part of the overall security story.
  Recall the story of Sarah Palin's Yahoo email account: an attacker reset the password by answering security questions from public information.
  Think of this as yet another authentication mechanism.
  Composing authentication mechanisms is tricky: are both or either required?
  Recovery mechanisms are typically "either".
  Sometimes composing "both" is a good idea: token/paper + password/PIN, etc.

What factors do the authors suggest are important in replacement schemes?
  Table I.
  Why are these factors important?
  What are some schemes that fail at each of the factors?
  What are some schemes that manage to achieve each of the factors?

Password managers.
  Why resilient to phishing?
  Why poor quality passwords?

Proxy-based: URRSA.
  What problem does it solve?
  How does it work?
    User has some password P.
    Proxy stores many keys K_i, generates C_i = E_{K_i}(P).
    Proxy keeps track of whether each C_i has been used (initially unused).
    User gets a printout with all C_i values.
    To log in, user sends an unused C_i to proxy.
    Then user visits target login page via proxy server.
    User submits a fake password, proxy replaces fake password with real one.
  How does it rank on the metrics?
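  A toy Python sketch of the URRSA steps listed above, for a single user. The notes leave the cipher E unspecified; this sketch substitutes a one-time pad (XOR with a random key as long as P), which is secure here because each K_i is used for exactly one code, though it does reveal the length of P. All names are hypothetical.

```python
import hmac
import secrets

class UrrsaProxy:
    """Toy URRSA-style proxy; E_K is a one-time pad (an assumption)."""

    def __init__(self, real_password: str, n_codes: int):
        p = real_password.encode()
        self.keys = [secrets.token_bytes(len(p)) for _ in range(n_codes)]
        # C_i = E_{K_i}(P)
        self.codes = [bytes(a ^ b for a, b in zip(p, k)) for k in self.keys]
        self.used = [False] * n_codes  # initially unused
        self.unlocked = None           # password recovered from the current code

    def printout(self):
        """Hex strings the user carries on paper."""
        return [c.hex() for c in self.codes]

    def login(self, code_hex: str) -> bool:
        c = bytes.fromhex(code_hex)
        for i, stored in enumerate(self.codes):
            if hmac.compare_digest(stored, c) and not self.used[i]:
                self.used[i] = True  # each C_i works only once
                self.unlocked = bytes(a ^ b for a, b in zip(c, self.keys[i])).decode()
                return True
        return False

    def rewrite_form(self, fake_password: str) -> str:
        """fake_password is ignored: the proxy substitutes the real one."""
        if self.unlocked is None:
            raise PermissionError("no valid one-time code presented")
        return self.unlocked

proxy = UrrsaProxy("real-secret", n_codes=3)
sheet = proxy.printout()
print(proxy.login(sheet[0]))         # True
print(proxy.rewrite_form("xxxx"))    # real-secret
print(proxy.login(sheet[0]))         # False: code already used
```

  The untrusted computer only ever sees a used-up code and a fake password; the real password travels only from the proxy to the target site.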

Graphical authentication scheme: PCCP.
  Sequence of 5 images, user remembers points on each image.
  Interesting design point: suggest random points to remember on each image.
  How does it rank on the metrics?

Cognitive authentication (human challenge-response): GrIDsure.
  5x5 grid, user chooses sequence of cells when registering.
  As challenge, server populates grid with random numbers.
  As response, user types in numbers from the chosen sequence of cells.
  How does it rank on the metrics?
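  A minimal sketch of the GrIDsure challenge and response, assuming a secret pattern of four (row, col) cells; the notes do not fix the pattern length, and the pattern below is hypothetical. Each digit appears in several cells, so a single observed response is consistent with many patterns, though repeated observations can narrow them down.

```python
import secrets

GRID = 5  # 5x5 grid

def new_challenge():
    """Server: fill the grid with random digits 0-9."""
    return [[secrets.randbelow(10) for _ in range(GRID)] for _ in range(GRID)]

def respond(grid, pattern):
    """User: read off the digits in the secret cells, in order."""
    return "".join(str(grid[r][c]) for r, c in pattern)

def verify(grid, pattern, response):
    return respond(grid, pattern) == response

# Hypothetical secret pattern chosen at registration.
pattern = [(0, 0), (1, 1), (2, 2), (3, 3)]
grid = new_challenge()
print(verify(grid, pattern, respond(grid, pattern)))  # True
```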

Single-signon (SSO).
  Similar in spirit to Kerberos; we will talk more about OAuth specifically next week.
  "Popular" protocols: OpenID, OAuth.  "Sign in with Facebook".
  SSO can be thought of as a meta-authentication system: composes with others.
    Security/usability depends on how user authenticates with identity provider.
    Usability costs amortized over many sites that share an identity provider.
  Many sites want to have direct control over trust relationship w/ user, etc.
    SSO makes one site dependent on another trusted third party.

One-time passwords.
  Hash chaining.
  OTPW: avoids chaining, because the last password in a chain reveals the entire chain.
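  A sketch of Lamport-style hash chaining (the idea behind S/Key), assuming SHA-256 as H. The server stores only H^n(seed); the user reveals H^(n-1)(seed), then H^(n-2)(seed), and so on, and each value must hash to the one the server currently holds. Since any H^i(seed) determines every H^j(seed) with j > i, the seed itself must stay secret. Names are hypothetical.

```python
import hashlib
from typing import List

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def chain(seed: bytes, n: int) -> List[bytes]:
    """Return [H^0(seed), H^1(seed), ..., H^n(seed)]."""
    out = [seed]
    for _ in range(n):
        out.append(H(out[-1]))
    return out

class Verifier:
    def __init__(self, top: bytes):
        self.current = top  # server stores only H^n(seed)

    def login(self, otp: bytes) -> bool:
        if H(otp) == self.current:
            self.current = otp  # next login must present the preimage of this
            return True
        return False

n = 100
c = chain(b"user secret seed", n)
server = Verifier(c[n])
print(server.login(c[n - 1]))  # True
print(server.login(c[n - 1]))  # False: replay of a used password
print(server.login(c[n - 2]))  # True
```

  A stolen server database reveals nothing useful: inverting H to get the next password is exactly what the hash is supposed to prevent.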

Hardware tokens.
  RSA SecurID.
  Google's 2-step authentication.
    Similar to hash-based challenge-response scheme; implicit challenge = time.
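  A sketch of the "implicit challenge = time" idea, assuming the TOTP construction (RFC 6238, built on HOTP from RFC 4226) that authenticator apps such as Google Authenticator implement. The token and server share a secret key; each computes an HMAC over the current 30-second time slot instead of a server-sent challenge. Servers typically also accept adjacent slots to tolerate clock skew; that is not shown here.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional

def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP: HMAC-SHA1 over an 8-byte big-endian counter, then truncate."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(key: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """TOTP: the challenge is the index of the current time slot."""
    if now is None:
        now = time.time()
    return hotp(key, int(now) // step, digits)

# RFC 6238 SHA-1 test key; at T = 59 s the 8-digit code is 94287082.
key = b"12345678901234567890"
print(totp(key, now=59, digits=8))  # 94287082
```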

Biometrics.
  How big is the keyspace?
    Fingerprints: ~13.3 bits.
    Iris scan: ~19.9 bits.
    Voice recognition: ~11.7 bits.
  Problem: hard to recover from loss of "personal" information.
  Perhaps more sensible in the presence of trusted biometric reader devices.
    But not practical for web application authentication.

Try ranking some familiar authentication schemes.
  Kerberos v4?  v5?
    Usability: renewing credentials.
  SSL client certificates?
  Challenge-response scheme?
  Bank of America "SiteKey" (recognize image, type in a word)?

How effective is scheme combining?  Which ranking criteria compose well?
  "Either" vs. "both".

What factors are difficult to achieve?  [par 4 of section V]
  Memorywise-Effortless + Nothing-to-Carry.
  Memorywise-Effortless + Resilient-to-Theft.
    Either the user remembers something, or the credential can be stolen; biometrics are the exception.
  Server-Compatible + Resilient-to-Internal-Observation.
  Server-Compatible + Resilient-to-Leaks-from-Other-Verifiers.
    Server-Compatible means the user sends a password to the server.
    Passwords can be stolen on user machine, replayed by one server to another.

What are potential answers to the homework questions?  What factors matter?
  Logging into public Athena machine?
    Resilient-to-Internal-Observation: easy to install malware on machine.
    Resilient-to-Physical-Observation: though maybe less so.
    MIT IDs could be a good thing to leverage (smartcard?).
    Biometrics?  Untrusted terminals, probably not a great plan.
    Would some proxy-based schemes work (like URRSA)?
  Checking bank balance via HTTPS from private laptop?
    Less relevant: Resilient-to-Physical-Observation, Resilient-to-Theft.
    Good idea: separate transfer operations from looking at balance.
      "Progressive authentication".
      Would be a good thing for Athena machines too.
      E.g., a browser for checking maps or printing a paper vs. access to WebSIS.
    Password managers make sense for private/trusted machines.
  Accessing Facebook from Internet cafe?
    Password managers not a good idea here.
    How sensitive is the data?
    Might be leveraged to authenticate to other sites.
    (Either "Login with facebook" or by answering personal questions.)
    Maybe authentication-via-proxy (URRSA).
  Withdrawing cash from ATM?
    Security matters highly.
      Resilient-to-Physical-Observation.
      Resilient-to-Theft.
      Unlinkability less so.
    Possibly trusted terminal: biometrics might be worth considering.
    (Although in practice bank may not want to trust the terminals.)
    One thing that matters but wasn't included in the list: authenticating the transaction.
      Adversary can re-purpose entered credentials for different operation.
      Hardware-token-based or phone-based solution could authenticate txn.
      E.g., H(challenge || pw || withdrawal-amt || atm-location).
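  A sketch of the transaction-bound formula above, H(challenge || pw || withdrawal-amt || atm-location), as a hardware token or phone might compute it after showing the user the amount. SHA-256 as H, the length-prefixed field encoding (so different field boundaries cannot produce the same input), and the ATM identifier are all assumptions.

```python
import hashlib
import hmac
import os

def txn_tag(challenge: bytes, pw: str, amount_cents: int, atm_location: str) -> bytes:
    """H(challenge || pw || withdrawal-amt || atm-location), with each field
    length-prefixed so the concatenation is unambiguous."""
    h = hashlib.sha256()
    for field in (challenge, pw.encode(), str(amount_cents).encode(), atm_location.encode()):
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return h.digest()

challenge = os.urandom(16)
# Device computes the tag over what the user actually approved.
tag = txn_tag(challenge, "1234", 2000, "ATM-42")
# Bank recomputes over the request it received; a re-purposed request fails.
print(hmac.compare_digest(tag, txn_tag(challenge, "1234", 2000, "ATM-42")))   # True
print(hmac.compare_digest(tag, txn_tag(challenge, "1234", 50000, "ATM-42")))  # False
```

  A compromised terminal that captures the tag cannot reuse it for a larger withdrawal or at another location, because any change to those fields changes the hash.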

What other factors should we worry about in user authentication?  [sec V.B]
  Continuous authentication, instead of session start.
  Migration cost from passwords / incentives for deployment (OpenID).
  Renewing credentials (Kerberos).
  Availability / DoS attacks.

Why aren't these schemes widely used?
  No single answer.
  Convenience of passwords.
  For many scenarios, security isn't important enough to justify switching cost.
    Per-user cost on the server, on the user's end, software changes, etc.
  Limited benefits of some alternative schemes.
  Often hard for an individual user to improve his/her own security.
    Perhaps partially fixed with SSO, where users can choose a better IdP.

References:
  Full tech report: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
  Yahoo password study (70M anonymized passwords): http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf

[ For next year:
  Talk less about specific authentication schemes.
  Talk more about range of issues authentication schemes might need to address.
  Talk about how to compose authentication schemes: two-factor, recovery, etc. ]