User authentication
===================

User authentication
  An important problem
    Underpinning of many security policies
    Interesting technical issues
    Easy to do wrong
  Continues to be challenging, because security isn't just a technical problem
    Users pick bad passwords
    But, passwords have other redeeming properties (easy to use, deployable)
    Less technical, more qualitative

Authentication: who is the user?
  Challenging to know for sure
    User registers some secret --- but who registers it?
    At the scale of MIT, we might be able to check the identity of the user when registering
  Typically settle for weaker guarantee
    Establish that the user who logs in has the secret that was registered
    If so, then assume it is the same user
    But, we have no guarantee that we know the true identity of the user
    For many usages that is fine
      E.g., Amazon doesn't really care who you really are as long as you pay

Problem: how to authenticate users?
  Setting: user <-> computer <-> verifier server.
  Potential extra components might help authentication:
    A trusted third party.
    User's portable device (either dedicated or app in mobile phone).
    A proxy server.
  This paper proposes a number of criteria to evaluate authentication schemes.
  Proposed criteria are reasonable; sometimes non-orthogonal, and not complete.
  Useful as a starting point to think about a new authentication scheme.

Plan:
  Passwords 
  Criteria (U, D, S)
  Schemes 
    Surprisingly, passwords are pretty good!

Starting point: passwords. 
  Need some secret between user and verifier 
    call this set of bits a "password"
  User types in username and password.
  Server checks whether password is correct for that username.
  A password is a valuable secret, so want to avoid repeated use and exposure
    Just for user authentication
    Once authenticated, use crypto keys between server/clients
      Client certificates, cookies, etc.
    Even for user authentication, confine passwords to a small role by composing them with other ideas
      Password manager, single sign-on, two-factor, etc.
      Progressive authentication
      Biometric (e.g., Apple/Android fingerprint button)
      Be careful in combining!

Passwords are difficult to get right.
  [[ slide: top password stats ]]

Several important considerations to keep in mind.
  Users choose guessable passwords
    20% of accounts use the same set of 5000 most popular passwords
    Cannot allow an adversary to make 5000 guesses at a user's password
    Cannot allow an adversary to guess "123456" as the password of each user
  Common passwords contain digits, upper and lower case, etc
    Is "1Password!" a good password?
    What matters is entropy: how common is that password?
    Format/character requirements rarely translate into higher entropy.
      Defeats only the simplest dictionary attacks.
      Also has an unfortunate side-effect of complicating password generation.
        E.g., no single password-gen algorithm satisfies every possible web site.
        Conflicting length, symbol rules.
  Passwords are often shared across sites / applications / systems
    Important for authentication protocols, password storage
  Want to encourage users to choose high-entropy passwords
    E.g., is it a good idea to frequently change passwords?
      Depends on the threats
    Benefit of new password: adversary cannot use old passwords they stole
    Benefit of new password: maybe forces the user to not reuse passwords?
      .. unless they all expire and require changing all at the same time
    Downside of new password: user might have a hard time remembering it
      User might choose a weaker password, or write it down somewhere
    No clear winning policy
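
The gap between format rules and guessability can be sketched in a few lines of Python. This is only an illustration: the rule set in meets_format_rules and the tiny COMMON_PASSWORDS sample are assumptions made up for this example, not taken from any particular site or leak.

```python
import string

# Hypothetical sample of a common-password list. A real check would load a
# much larger list built from leaked password databases.
COMMON_PASSWORDS = {"123456", "password", "1password!", "qwerty", "letmein"}


def meets_format_rules(pw: str) -> bool:
    """Assumed composition policy: length >= 8, digit, upper, lower, symbol."""
    return (
        len(pw) >= 8
        and any(c.isdigit() for c in pw)
        and any(c.isupper() for c in pw)
        and any(c.islower() for c in pw)
        and any(c in string.punctuation for c in pw)
    )


def is_common(pw: str) -> bool:
    """Case-insensitive lookup in the (hypothetical) common-password list."""
    return pw.lower() in COMMON_PASSWORDS


for candidate in ["1Password!", "correct horse battery staple"]:
    print(candidate, meets_format_rules(candidate), is_common(candidate))
```

Under these assumptions "1Password!" passes every format rule yet appears in the common list, while a long random passphrase fails the rules without being common.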

Defend against guessing
  Guessing attacks are a problem because of small key space.
    To get a sense try Telepathwords (https://telepathwords.research.microsoft.com/)
      As you type each letter of a candidate password, it tries to guess the next one
      Its guesses draw on:
        Common passwords (e.g., via leaks of password databases)
        Popular phrases from web sites
        Common user biases in selecting characters
  Password-encrypted data vulnerable to offline guessing
    No server involved in checking a guess
    [[ Semi-related: http://www.gnu.org/software/shishi/wu99realworld.pdf ]]
  Rate-limiting authentication attempts is important.
    Implement time-out periods after too many incorrect guesses.
  What to do after many failed authentication attempts?
  How to rate-limit across users?

How to store passwords?
  Server must be able to verify passwords.
  Naive plan: store plaintext passwords.
    Problem: if adversary compromises server, gets full list of passwords.
  Hashing: store a table of <username, H(password)>.
    [[ slide: hash passwords ]]
    Can still check a password: hash the supplied string, compare with table.
    Hard for adversary to invert the hash function.
    Problem: password space is quite small.
      Top 5000 password values account for 20% of users.
      Skewed distribution towards common passwords chosen by many users.
      Yahoo password study: as a rule of thumb, passwords have 10-20 bits of entropy.
        At the low end, a password is roughly equivalent to 10 random bits
        Attacker then needs to try only about 2^10 combinations to find the password
    Problem: hash functions optimized for performance -- helps adversary here.
      E.g., my laptop can do ~2M SHA1's (of small blocks) per second.
      Even with a reasonable password (20 bits of entropy, ~1M guesses), cracks an account in under a second.
  Expensive key-derivation function (e.g., PBKDF2 or BCrypt).
    Replace the hash with a much more expensive hash function.
    Key-derivation functions have adjustable cost: make it arbitrarily slow.
      E.g., can make hash cost be 1 second -- O(1M) times slower than SHA1.
    Internally, often iterates an underlying hash function many times.
    Problem: adversary can build "rainbow tables".
      Table of password-to-hash mappings.
      Expensive to compute, but helps efficiently invert hashes afterwards.
      Only need to build this rainbow table for dictionary of common passwords.
    Roughly: 1-second expensive hash x 1M common pws = 1M seconds, about 12 days.
      After that, can very quickly crack common passwords in any password db.
  Better: use salting.
    [[ slide: password hash salting ]]
    Input some additional randomness into the password hash: H(salt, pw).
    Where does the salt value come from?  Stored on server in plaintext.
    Why is this better if adversary compromises the salt too?
      Cannot build rainbow tables.
    Choose a long random salt.
    Choose a fresh salt each time user changes password.
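
The salted, expensive-hash plan above might look like the following sketch, which uses PBKDF2 from Python's standard library. The iteration count, salt length, and function names are assumptions for illustration. The cost should be tuned to the server's hardware.

```python
import hashlib
import hmac
import secrets
from typing import Tuple

ITERATIONS = 600_000  # assumed cost setting, not a recommendation from the notes


def hash_password(password: str,
                  iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Return (salt, iterations, digest) to store alongside the username."""
    salt = secrets.token_bytes(16)  # fresh random salt for every new password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                 salt, iterations)
    return salt, iterations, digest


def verify_password(password: str, salt: bytes, iterations: int,
                    digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                    salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

Because each record carries its own salt, two users with the same password end up with different stored digests, so one precomputed table cannot be reused across accounts or databases.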

How to transmit passwords?
  Poor idea: sending password to the server in cleartext.
  Slightly better: send password over encrypted connection.
  Why is sending the password still bad?
    Connection may be intercepted.
    Passwords reused across sites mean that one server can use the password on another server.
  Strawman alternative: send hash of password, instead of the password.
    Not so great: hash becomes a "password equivalent", can still be resent.
  Better alternative: challenge-response scheme.
    User and server both know password.
    Server sends challenge R.
    User responds with H(R || password).
    Server checks if response is H(R || password).
    Server (if it already knew the password) is convinced that the user knows it, modulo MITM attacks.
    Server does not learn password if it didn't already know it.
    How to prevent server from brute-force guessing password based on H() value?
      Expensive hash + salting.
      Allow client to choose some randomness too: guard against rainbow tables.
    To avoid storing the real password on the server, use protocol like SRP.
      [[ Ref: http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol ]]
    Implementing challenge/response often means changing the client and the server.
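
The basic challenge-response exchange can be sketched as below, taking H to be SHA-256 and R to be a 16-byte random nonce. Both choices, and the function names, are assumptions for illustration. Like the scheme in the notes, this sketch has the server holding the password itself. It does not include the expensive hash, the client randomness, or SRP.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """Server side: pick a fresh random challenge R for each login attempt."""
    return secrets.token_bytes(16)


def respond(challenge: bytes, password: str) -> bytes:
    """Client side: compute H(R || password)."""
    return hashlib.sha256(challenge + password.encode("utf-8")).digest()


def check_response(challenge: bytes, password: str, response: bytes) -> bool:
    """Server side: recompute H(R || password) and compare in constant time."""
    expected = respond(challenge, password)
    return hmac.compare_digest(expected, response)


r = make_challenge()
print(check_response(r, "s3cret", respond(r, "s3cret")))
```

A response captured for one challenge is useless against a later login, because the server picks a new R each time.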

Password recovery.
  Important part of the overall security story.
  Recall story with Sarah Palin's email account, etc.
  Think of this as yet another authentication mechanism.
  Composing authentication mechanisms is tricky: are both or either required?
    Recovery mechanisms are typically "either".
    Sometimes composing "both" is a good idea: token/paper + password/PIN, etc.
  Password recovery should reset password, rather than reveal user's old password.
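
One common way to reset rather than reveal is a single-use, short-lived reset token delivered over a recovery channel such as email. The sketch below assumes that design. The class name ResetTokens, the 15-minute lifetime, and the in-memory store are hypothetical.

```python
import hashlib
import hmac
import secrets
import time
from typing import Callable, Dict, Tuple


class ResetTokens:
    """Issues single-use reset tokens. Stores only a hash of each token."""

    def __init__(self, ttl_s: float = 900.0,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl_s = ttl_s
        self.clock = clock  # injectable so expiry can be tested
        self._pending: Dict[str, Tuple[bytes, float]] = {}

    @staticmethod
    def _digest(token: str) -> bytes:
        return hashlib.sha256(token.encode("utf-8")).digest()

    def issue(self, username: str) -> str:
        """Create a token to send over the recovery channel."""
        token = secrets.token_urlsafe(32)
        self._pending[username] = (self._digest(token),
                                   self.clock() + self.ttl_s)
        return token

    def redeem(self, username: str, token: str) -> bool:
        """True if the token is valid. Any attempt consumes the pending token."""
        entry = self._pending.pop(username, None)
        if entry is None:
            return False
        digest, expires = entry
        if self.clock() > expires:
            return False
        return hmac.compare_digest(digest, self._digest(token))
```

On a successful redeem the server would let the user set a new password. The old password is never shown, and in a hashed store it is not even available to show.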

Many proposals to improve on passwords.
  But not that much progress in practice; passwords still widely used.
  Paper's plan: evaluate proposed techniques on three top-level metrics.
    Usability: how hard is it for users to use?
    Deployability: how hard is it for developers to implement / set up?
    Security: what security guarantees does it provide?
  What are the specific sub-metrics, and what do they mean?
    How could an authentication scheme win or lose in each specific dimension?

Usability.
  Easy-to-Learn.
  Infrequent-Errors.
  Scalable-for-Users.
  Easy-Recovery-from-Loss.
  Nothing-to-Carry.
  Efficient-to-Use.
  Memorywise-Effortless.
  Physically-Effortless.

Deployability.
  Server-Compatible.
  Browser-Compatible.
  Accessible.
  Negligible-Cost-per-User.
  Mature.
  Non-Proprietary.

Security.
  Resilient-to-Physical-Observation.
  Resilient-to-Targeted-Impersonation.
  Resilient-to-Throttled-Guessing.
  Resilient-to-Unthrottled-Guessing.
  Resilient-to-Internal-Observation.
  Resilient-to-Leaks-from-Other-Verifiers.
  Resilient-to-Phishing.
  Resilient-to-Theft.
  No-Trusted-Third-Party.
  Requiring-Explicit-Consent.
  Unlinkable.

Password managers.
  Why resistant to phishing?
  Why poor quality passwords?

Proxy-based schemes.
  Problem: log in via untrusted device.
  Some form of one-time passwords with help of a proxy.

Federated SSO.
  Rely on one service to authenticate users.
  Good idea in general.
  Deployment challenges: service must give up some control.
  Still need to authenticate users to the SSO service.

Graphical authentication.
  Goal: encourage user to remember more bits, or higher entropy bits.

Cognitive authentication.
  Challenge-response in the user's head.
  Doesn't seem usable enough for most situations.

Paper tokens.
  One-time passwords.
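
One well-known way to produce a printable list of one-time passwords is a Lamport-style hash chain, the idea behind S/KEY. The sketch below assumes SHA-256 and hypothetical names (make_chain, ChainVerifier). It illustrates the technique rather than any specific deployed scheme.

```python
import hashlib
import hmac
from typing import List


def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()


def make_chain(seed: bytes, n: int) -> List[bytes]:
    """Return [H^1(seed), H^2(seed), ..., H^n(seed)]."""
    chain = []
    cur = seed
    for _ in range(n):
        cur = h(cur)
        chain.append(cur)
    return chain


class ChainVerifier:
    """Server side: stores only the most recently accepted chain value."""

    def __init__(self, last: bytes) -> None:
        self.current = last  # initially H^n(seed)

    def verify(self, otp: bytes) -> bool:
        # Accept otp only if hashing it once gives the stored value,
        # then step back one link so the same otp cannot be replayed.
        if hmac.compare_digest(h(otp), self.current):
            self.current = otp
            return True
        return False
```

The user's sheet would list chain[n-2], chain[n-3], and so on, used in that order, while the server starts from chain[n-1]. Someone who observes a used password cannot compute the next one without inverting the hash.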

Visual crypto.
  More resilient than paper tokens to physical observation.

Hardware tokens.
  Expensive.

Phone-based.
  Susceptible to phone compromise.

Biometrics.
  Keyspace is not that large.
    Fingerprints: ~13.3 bits.
    Iris scan: ~19.9 bits.
    Voice recognition: ~11.7 bits.
  Entropy is roughly the same as passwords.
  Works best in trusted terminal scenarios, not paper's target use case.

Recovery.
  In principle, any authentication scheme can be used for recovery.
  Knowledge-based schemes tricky: users likely to forget rarely-used secrets.

Multi-factor authentication.
  Requires users to authenticate using two or more authentication mechanisms.
  Should involve different modalities.
    Something you know (e.g., a password)
    Something you possess (e.g., a cellphone, a hardware token)
    Something you are (e.g., biometrics)
    Most commonly, combine password with cellphone or USB token.
  Less likely that adversary compromises multiple modalities.
  Interesting observation: users choose weaker passwords when using multi-factor.

How should we do account sharing?
  Should happen at another level -- not authentication!

Some sets of goals seem difficult to achieve at the same time.
  Memorywise-Effortless + Nothing-to-Carry.
  Memorywise-Effortless + Resilient-to-Theft.
    Either the user remembers something, or it can be stolen (except for biometrics).

  Server-Compatible + Resilient-to-Internal-Observation.
  Server-Compatible + Resilient-to-Leaks-from-Other-Verifiers.
    Server compatible means sending a password.
    Passwords can be stolen on user machine, replayed by one server to another.

What are potential answers to the homework questions?  What factors matter?
  Logging into public Athena machine?
    Resilient-to-Internal-Observation: easy to install malware on machine.
    Resilient-to-Physical-Observation?
    MIT IDs could be a good thing to leverage (use them as a smartcard).
    Biometrics? Untrusted terminals, probably not a great plan.
  Accessing Facebook from Internet cafe?
    Password managers not a good idea here.
    How sensitive is the data?
      Might be leveraged to authenticate to other sites!
      "Login with Facebook"
      Attacker may be able to answer personal questions to reset a password.
  Withdrawing cash from ATM?
    Different scenario than in the paper: possibly trusted terminal!
    Security matters highly.
      Resilient-to-Physical-Observation.
      Resilient-to-Theft.
    Possibly trusted terminal: biometrics might be worth considering.
      However, in practice, bank may not want to trust the terminals.
    You also might care about authenticating individual transactions!
      Prevent adversary from using stolen credentials for different,
          attacker-chosen operations.
      E.g., maybe user can examine balance using just a password,
          but if she wants to withdraw money, she uses two-factor
          authentication using her phone.
      The Duo system, used by MIT, might be a reasonable fit here.
      This is called "Progressive authentication".
  Checking bank balance via HTTPS from private laptop?
    Less relevant: Resilient-to-Physical-Observation, Resilient-to-Theft.
    "Progressive authentication" probably a good idea for this scenario too.
    Password managers make sense for private/trusted machines.

What's this CAP reader thing that seems to be doing well?
  [[ Ref: https://en.wikipedia.org/wiki/Chip_Authentication_Program ]]
  Dedicated device for authenticating credit card transactions.
  User inserts credit card into CAP reader device.
  User types PIN into the CAP reader (bypassing keyloggers on host).
  CAP reader outputs 8-digit code that user types into web site.

  Strong security, but low on usability and deployability.
  Similar to how credit card terminals work in physical stores.

What other factors should we worry about in user authentication?  [sec V.B]
  Continuous authentication, instead of session start.
  Migration cost from passwords / incentives for deployment (OpenID).
  Renewing credentials (Kerberos).
  Availability / DoS attacks.

Why aren't these schemes widely used?
  No single answer.
  Convenience of passwords.
  For many scenarios, security isn't important enough to justify switching cost.
    Per-user cost on the server, on the user's end, software changes, etc.
  Limited benefits of some alternative schemes.
  Often hard for an individual user to improve their own security.
    Perhaps partially fixed with SSO, where users can choose a better IdP.
  Since the paper was published, 2FA more widely deployed, U2F seems promising.

References: 
  Full tech report: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
  http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
  http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/
  http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html