User Authentication
===================

User authentication
  An important problem
    Underpinning of many security policies
    Interesting technical issues
    Easy to do wrong
  Continues to be challenging, because security isn't just a technical problem
    Users pick bad passwords
    But, passwords have other redeeming properties (easy to use, deployable)

Recall: where does authentication fit in?
  Guard model of computer system security
    client -> request -> server
  Server contains some resource named by request
  Server contains a guard that checks each request
    E.g., function invoked in server's code when each request is handled
  Complete mediation: all requests checked by a guard
    1. server isolation: no way to bypass interface & access resource directly
    2. guard invoked on all requests
  Challenge: how does guard authenticate a request?
    which principal originated the request?
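
  A minimal Python sketch of the guard model. All names (Request, Server, handle, _guard)
  are hypothetical. Secrets are kept in plaintext only for brevity (storage is covered
  later), and a leading underscore in Python is just a convention, so this models
  server isolation rather than enforcing it. The point is the single entry point:
  every request goes through the guard before any resource is touched.

```python
import hmac
from dataclasses import dataclass


@dataclass
class Request:
    user: str      # claimed principal
    secret: str    # credential sent along with the request
    resource: str  # name of the resource being requested


class Server:
    def __init__(self, user_secrets, acl, resources):
        self._secrets = user_secrets  # user -> secret set up at registration
        self._acl = acl              # resource name -> set of allowed users
        self._resources = resources  # resource name -> data

    def _guard(self, req: Request) -> bool:
        # Authentication: does the request carry the claimed user's secret?
        stored = self._secrets.get(req.user)
        if stored is None or not hmac.compare_digest(stored.encode(), req.secret.encode()):
            return False
        # Authorization: may that principal use this resource?
        return req.user in self._acl.get(req.resource, set())

    def handle(self, req: Request):
        # The only entry point: every request is checked (complete mediation).
        if not self._guard(req):
            raise PermissionError("request rejected by guard")
        return self._resources[req.resource]
```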

The three parts to user authentication
  Registration
    e.g., set up secret between user and guard
  Authentication check
    e.g., user sends (user, secret) along with request
    guard checks that secret matches its copy
  Recovery
    e.g., user loses secret
    often overlooked; can be an Achilles heel
  Many challenges in getting right

Challenge: intermediate principals
  Users rarely issue requests directly
    From the point of view of the final resource, it just received a TCP packet
  Request typically issued via client machine, load-balancer, app server, ..
    Typically we say these intermediate principals "speak for" the user
  Important to consider these intermediate entities as principals
    Forces us to consider the possibility that the request isn't actually from the user
  Need to be sure every intermediate principal is actually trustworthy

Example: logging into a web site
  Request comes from user device (e.g., phone/laptop)
    Device is the intermediate principal
  User types password/code into the device
  What if the attacker controls the device?
    Malware/keylogger
  Attacker can launch man-in-the-middle attack
    e.g., attacker records password and then uses it later
  Big problem in practice (more below)

Challenge: what is the user's identity?
  User registers some secret --- who is the user for real?
    At the scale of MIT, we might be able to check the identity of the user when registering
  Typically settle for weaker guarantee
    Establish that the user who logs in holds the secret set up at registration
    If so, then assume it is the same user
    But we have no guarantee that we know the true identity of the user
    For many usages that is fine
      E.g., Amazon doesn't really care who you really are as long as you pay

Registration approaches
  First-come first-served
    E.g., register for an account at gmail.com
  Bootstrap from another mechanism
    E.g., verify via email
  Created by administrator
    E.g., new employee at a company

Recovery approaches
  "Security questions": creates an OR policy (knowing the password OR the answers grants access)
  Verify via email
  Prove knowledge of credit card number, etc
  Create a new account (if it's not important to retain same principal / name)
  Call customer service: can be an escape hatch without a precise policy
    Often susceptible to social engineering attacks

Common secret: passwords
  Principal and guard share a secret set of bits
    call this set of bits a "password"
  User types in username and password.
  Guard checks whether password is correct for that username.
  Advantage: easy to use and deploy
  Disadvantage: passwords are often weak secrets

Challenge: passwords are valuable, but often weak
  Defense: use them as little as possible
    Just for user authentication
  Once authenticated, use crypto keys between server/clients
    Client certificates, cookies, etc.
  Even for user authentication, strengthen passwords by composing them with other ideas
    Password manager, single-sign on, two-factor, etc.
    Biometrics (e.g., Apple/Android fingerprint sensor)
      Be careful in combining!

Passwords: hard because of human factors
  [[ Ref: https://css.csail.mit.edu/6.858/2022/lec/l03-user-auth.pdf ]]

  1. Users choose guessable passwords
    20% of accounts use one of the 5,000 most popular passwords
    Cannot allow an adversary to make 5,000 guesses at a user's password
    Cannot allow an adversary to guess "123456" as the password of each user
  2. Even common passwords contain digits, upper and lower case, etc
    Is "1Password!" a good password?
    What matters is entropy: how common is that password?
      Character requirements not especially helpful
    Password entropy is usually expressed in terms of bits:
      A password that is already known has zero bits of entropy
      One that would be guessed on the first attempt half the time has 1 bit of entropy.
      A password with 16 bits of entropy requires 2^16 guesses to try all possibilities
  3. Passwords are often shared across sites / applications / systems
    Important when we talk about how to use and store passwords
  4. Want to encourage users to choose high-entropy passwords
    Is it a good idea to frequently change passwords?
      Depends on the threats
    Benefits of new passwords:
      Even if adversary obtained old password, it's no longer useful
      Maybe this forces the user to not reuse password across sites
    Downsides of new passwords:
      User might have a hard time remembering it
      User might choose a weaker password, or write it down somewhere
    No clear winning policy
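
  The entropy arithmetic from item 2 can be checked with a few lines of Python. This
  assumes the password is chosen uniformly at random from n options, so its entropy is
  log2(n); human-chosen passwords are far from uniform, so for them these figures
  overstate strength. The helper names are hypothetical.

```python
import math


def entropy_bits_uniform(n_choices: int) -> float:
    """Entropy of a password chosen uniformly at random from n_choices options."""
    return math.log2(n_choices)


def guesses_to_exhaust(bits: int) -> int:
    """Guesses needed to try every possibility for a password with `bits` of entropy."""
    return 2 ** bits


print(entropy_bits_uniform(2))                   # 1.0: guessed on the first try half the time
print(guesses_to_exhaust(16))                    # 65536
print(round(entropy_bits_uniform(5000), 1))      # 12.3: uniform pick among 5,000 popular passwords
print(round(12 * entropy_bits_uniform(62), 1))   # 71.5: 12 random alphanumeric characters
```

  Note how little a uniform pick from the 5,000 most popular passwords buys (about 12
  bits), regardless of which character classes those passwords contain.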

Defense: Password managers
  Users are tempted to use simple passwords
    Can remember them
    But low entropy
  Users are tempted to use same password for different sites
    Bad idea!
  Password managers: conveniently provide strong, different passwords
    Password manager picks password with high entropy
    Password manager stores different passwords for different sites
    Optionally: password manager fills password field (e.g., in a browser PM)
  User must authenticate to password manager
    User must remember one strong password
  Password manager is trusted!
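
  A sketch of how a password manager might pick high-entropy, per-site passwords,
  using Python's `secrets` module (a cryptographically secure RNG). The 94-character
  alphabet and 20-character length are assumptions, giving about 131 bits of entropy.
  The `vault` is a plain dict for illustration only; a real manager would encrypt it
  under a key derived from the user's one strong master password.

```python
import math
import secrets
import string

# Letters, digits, and punctuation: 52 + 10 + 32 = 94 symbols (assumed policy)
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20, alphabet: str = ALPHABET) -> str:
    """Pick each character independently and uniformly with a CSPRNG."""
    return "".join(secrets.choice(alphabet) for _ in range(length))


def generated_entropy_bits(length: int = 20, alphabet: str = ALPHABET) -> float:
    # Uniform, independent choices: length * log2(|alphabet|) bits
    return length * math.log2(len(alphabet))


vault = {}  # hypothetical in-memory store: site -> password (unencrypted here!)
for site in ["bank.com", "mail.com"]:
    vault[site] = generate_password()  # a different random password per site
```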

Defend against guessing
  Guessing attacks are a problem because of small key space.
    Adversary has access to lots of information about password distributions.
    Common passwords (e.g., via leaks of password databases).
    Popular phrases from web sites.
    Common user biases in selecting characters.
  Password-encrypted data vulnerable to offline guessing
    No server involved in checking a guess
    [[ Semi-related: http://www.gnu.org/software/shishi/wu99realworld.pdf ]]
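
  A sketch of offline guessing: once the adversary holds something derived from the
  password (here, assumed to be an unsalted SHA-256 hash), each guess is checked
  locally, so no server can see or rate-limit the attempts. The leaked value and the
  five-word list are made-up stand-ins for a real leak and a real cracking wordlist.

```python
import hashlib


def fast_hash(pw: str) -> bytes:
    return hashlib.sha256(pw.encode()).digest()


def offline_guess(verifier: bytes, wordlist):
    """Try each candidate locally; nothing limits how fast or how often."""
    for attempt, guess in enumerate(wordlist, start=1):
        if fast_hash(guess) == verifier:
            return guess, attempt
    return None, len(wordlist)


leaked = fast_hash("letmein")  # hypothetical leaked verifier
common = ["123456", "password", "qwerty", "letmein", "dragon"]
print(offline_guess(leaked, common))  # ('letmein', 4)
```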

Limiting authentication attempts
  Don't want to allow an adversary to guess passwords
  Important to rate-limit login attempts
    Implement time-out periods after too many incorrect guesses.
  Limiting per-user might not be enough
    Adversary can guess "123456" for every username
  CAPTCHAs?
    Economic cost of solving CAPTCHAs quite low
  Most systems have several heuristics to rate-limit password guessing
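
  A minimal sketch combining two heuristics: a sliding-window cap on failed logins per
  username, plus a global cap that catches one guess like "123456" spread across many
  usernames. The class name and thresholds are hypothetical. A global cap also lets an
  attacker lock everyone out, one reason real systems mix several heuristics.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Sliding-window limits on failed logins, per username and across all users."""

    def __init__(self, per_user=5, global_limit=1000, window=300.0, clock=time.monotonic):
        self.per_user = per_user          # failures allowed per user per window
        self.global_limit = global_limit  # failures allowed overall per window
        self.window = window              # window length in seconds
        self.clock = clock
        self.user_failures = defaultdict(deque)  # username -> failure timestamps
        self.all_failures = deque()              # failure timestamps for all users

    def _prune(self, q, now):
        # Drop failures that have aged out of the window
        while q and now - q[0] > self.window:
            q.popleft()

    def allowed(self, username) -> bool:
        now = self.clock()
        q = self.user_failures[username]
        self._prune(q, now)
        self._prune(self.all_failures, now)
        return len(q) < self.per_user and len(self.all_failures) < self.global_limit

    def record_failure(self, username):
        now = self.clock()
        self.user_failures[username].append(now)
        self.all_failures.append(now)
```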

Storing passwords
  Naive plan: store a table containing (username, password) pairs
  Risk: adversary that compromises server learns all passwords
  Problem 1: even after recovery from compromise, must reset all user passwords
  Problem 2: adversary can use same passwords to log into other services

Hashing
  Store pairs of (username, H(password))
  Can still check if supplied password matches, by hashing it
  Cryptographic hash is one-way, cannot invert
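
  A sketch of the (username, H(password)) table, assuming SHA-256 as H. This is the
  unsalted, fast version that the next two sections improve on; it is shown only to
  make the check-by-rehashing step concrete.

```python
import hashlib
import hmac


def h(password: str) -> bytes:
    return hashlib.sha256(password.encode()).digest()


table = {}  # username -> H(password); the plaintext password is never stored


def register(username: str, password: str):
    table[username] = h(password)


def check(username: str, password: str) -> bool:
    stored = table.get(username)
    # compare_digest avoids leaking how many leading bytes matched
    return stored is not None and hmac.compare_digest(stored, h(password))
```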

Salting
  Rainbow tables: can build a dictionary of hashes of all common passwords
  Solution: store (username, salt, H(salt || password))
  Can check by hashing supplied password w/ known salt
  But now the same password can correspond to many different hashes
  Expensive to build a table of all common salt+password combinations, if salt is large
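
  A sketch of storing (salt, H(salt || password)), again assuming SHA-256 and a
  16-byte random salt per account. Two users with the same password get different
  records, so a single precomputed table no longer covers everyone. The hash is still
  fast, which the next section addresses.

```python
import hashlib
import hmac
import os


def make_record(password: str, salt_len: int = 16):
    salt = os.urandom(salt_len)  # fresh random salt for each account
    return salt, hashlib.sha256(salt + password.encode()).digest()


def check(record, password: str) -> bool:
    salt, digest = record
    candidate = hashlib.sha256(salt + password.encode()).digest()
    return hmac.compare_digest(digest, candidate)


alice = make_record("123456")
bob = make_record("123456")  # same password, different salt
```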

Make hashing expensive
  Typical crypto hash functions are fast
  Adversary not rate-limited when guessing against a compromised list of password hashes
  Solution: use a purposely expensive hash function (called a key derivation function, or KDF)
  Google for bcrypt, scrypt, PBKDF2, ..
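
  A sketch using PBKDF2 from Python's standard library (`hashlib.pbkdf2_hmac`), one
  of the KDFs named above; `hashlib.scrypt` is another stdlib option. The iteration
  count is an illustrative assumption. Storing it in each record lets the work factor
  be raised later without breaking existing accounts.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune so each check is deliberately slow


def make_record(password: str):
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, ITERATIONS, dk


def check(record, password: str) -> bool:
    salt, iterations, dk = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(dk, candidate)
```

  Every guess against a stolen record now costs the adversary the same many thousands
  of HMAC computations that a legitimate login costs the server.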

Augment passwords: two-factor authentication
  Helps defend against weak passwords and password reuse
  Helps against MITM and phishing attacks
    MITM = man in the middle

  Several common variants

2FA code sent via SMS message to user's cell phone
  Server stores just the user's phone number (and recently sent code)

  Advantage: easy to start using
  Advantage: easy to recover from a lost phone, switching providers, ..
    Outsource the problem to cell phone carrier, number portability
  Advantage: server compromise does not break security
  Downside: trust cell phone network and carrier
  Downside: require user to be in range of cell phone network
  Downside: phishing attacks
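
  A sketch of the server side of SMS 2FA: it keeps only phone numbers and recently
  sent codes. The class name and 5-minute lifetime are assumptions, and actual SMS
  delivery through a carrier is out of scope, so `send_code` just returns what would be
  sent. Each code is single-use.

```python
import hmac
import secrets
import time

CODE_TTL = 300  # seconds a code stays valid (assumed value)


class SmsTwoFactor:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.phone = {}    # username -> phone number
        self.pending = {}  # username -> (code, expiry time)

    def send_code(self, username):
        code = f"{secrets.randbelow(10**6):06d}"  # uniform 6-digit code
        self.pending[username] = (code, self.clock() + CODE_TTL)
        # Delivery via the cell carrier would happen here (omitted).
        return self.phone[username], code

    def verify(self, username, submitted: str) -> bool:
        entry = self.pending.pop(username, None)  # consume: one attempt per code
        if entry is None:
            return False
        code, expiry = entry
        return self.clock() <= expiry and hmac.compare_digest(code.encode(), submitted.encode())
```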

2FA with time-based one-time passwords (TOTP)
  Server and user device agree on secret value (e.g., scan QR code)
  User device generates code = H(secret || current time step), e.g., one step per 30 seconds
  Server checks that code corresponds to current time step (allowing for small clock skew)

  Advantage: no need for cell phone network to be available
  Advantage: no need to trust cell phone carrier
  Disadvantage: user setup involves installing app, loading secret value
  Disadvantage: dealing with user changing devices (reload secret value)
  Disadvantage: server compromise breaks 2FA, need to re-register secrets
  Disadvantage: still susceptible to phishing attacks
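
  The H(secret || current time) above is a simplification. Standard TOTP (RFC 6238)
  computes HOTP (RFC 4226): HMAC-SHA1 keyed with the shared secret over the count of
  30-second steps, then "dynamic truncation" to a short decimal code. A sketch follows;
  `server_check` and its one-step skew window are illustrative choices. The server must
  store the secret too, which is why server compromise breaks this form of 2FA.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # RFC 4226: HMAC-SHA1 over the 8-byte big-endian counter
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick an offset
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(secret: bytes, now: float, step: int = 30, digits: int = 6) -> str:
    # RFC 6238: "current time" is the number of whole steps since the Unix epoch
    return hotp(secret, int(now // step), digits)


def server_check(secret: bytes, submitted: str, now: float,
                 step: int = 30, skew_steps: int = 1) -> bool:
    # Accept neighbouring steps to tolerate clock drift between device and server
    counter = int(now // step)
    return any(
        hmac.compare_digest(hotp(secret, c).encode(), submitted.encode())
        for c in range(counter - skew_steps, counter + skew_steps + 1)
        if c >= 0
    )
```

  With the RFC test secret b"12345678901234567890", `hotp(key, 1)` gives "287082" and
  `totp(key, 59, digits=8)` gives "94287082". A real server would also remember
  recently accepted codes so one cannot be replayed within its window.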

2FA with challenge-response (U2F)
  User's USB dongle has a public/secret key pair
  Server stores USB dongle's public key
  To log in, server sends random challenge string to user's computer (e.g., browser)
  Browser sends the server's challenge and identity to USB dongle
  USB dongle signs (challenge, server identity) with private key
  Server verifies signature refers to correct challenge and identity

  [[ Ref: https://developers.yubico.com/U2F/Protocol_details/Overview.html ]]

  Advantage: not susceptible to phishing attacks
  Advantage: no need for per-server setup
  Advantage: server compromise does not allow adversary to authenticate later
  Disadvantage: need special software on user computer (not just typing in code)
  Disadvantage: user needs to carry dongle

U2F protocol relies on public-key crypto; two operations in particular
  Sign(Kpriv, m) -> signature sig
  Verify(Kpub, m, sig) -> ok?

U2F state:
  Device (D): (H/Origin, Kpub, Kpriv)
  Server (S): (H/Origin, Kpub)
  Browser (B): running S's Javascript in browser
    Never sees D's Kpriv!
    Even if B is compromised, B cannot steal Kpriv

Simplified starting point for U2F protocol:
  S->B: challenge
  B->D: challenge
  D->B: Sign(Kpriv, challenge) -> signature s
  B->S: s
  S: Verify(Kpub, challenge, s) -> ok?

  Challenge is a random number, often called a "nonce"

  Why will replay attack not work?
  Consider attack steps:
    - Attacker records s when victim logs into bank.com
    - Later attacker visits bank.com, logs in as the victim
      S picks a new challenge, and sends it to attacker
    - When prompted for U2F response, attacker resends recorded s from victim
    - S will compute Verify(Kpub, new challenge, s), which will fail
      because D signed the old challenge, not the new one
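
  A sketch of the simplified protocol and the failed replay. Python's standard library
  has no signature scheme, so this uses a loudly labeled TOY: textbook RSA over a
  ~150-bit modulus built from the Mersenne primes 2^61 - 1 and 2^89 - 1. It is
  insecure and only stands in for Sign/Verify (real U2F devices use ECDSA). The
  browser just relays messages, so it is omitted; class names are hypothetical.

```python
import hashlib
import secrets

# TOY textbook RSA (insecure!): stand-in for the device's real key pair
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def _h(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N


def sign(kpriv: int, m: bytes) -> int:
    return pow(_h(m), kpriv, N)


def verify(kpub: int, m: bytes, sig: int) -> bool:
    return pow(sig, kpub, N) == _h(m)


class Device:
    def __init__(self):
        self._kpriv = D  # never leaves the device

    def respond(self, challenge: bytes) -> int:
        return sign(self._kpriv, challenge)


class Server:
    def __init__(self, kpub: int):
        self.kpub = kpub
        self.outstanding = None

    def new_challenge(self) -> bytes:
        self.outstanding = secrets.token_bytes(32)  # fresh nonce per login attempt
        return self.outstanding

    def check(self, s: int) -> bool:
        challenge, self.outstanding = self.outstanding, None  # each nonce used once
        return challenge is not None and verify(self.kpub, challenge, s)
```

  Usage: `s = device.respond(bank.new_challenge())` passes `bank.check(s)`, but after a
  second `bank.new_challenge()` the recorded `s` fails, since it signs the old nonce.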

Problem 1 with simplified protocol: phishing attack
  User visits adversary's web site (bad.com)
  Adversary (bad.com) visits a real site (bank.com), gets challenge for user login
  bad.com sends challenge to user
  User's device signs challenge
  bad.com sends the signed challenge to the real site (bank.com), login succeeds

U2F protects against phishing attacks by tying challenge to server's identity
  S->B: challenge
  B->D: CD={challenge, origin}
    where origin is really H(protocol || hostname || port)
  D->B: Sign(Kpriv, CD) -> signature s
  B->S: s, CD
  S: Verify(Kpub, CD, s), and check that CD contains expected challenge + origin

  Phishing page makes B produce a CD with bad.com's origin: bank.com's origin check fails, and rewriting CD invalidates the signature
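
  A sketch of origin binding, reusing the same insecure TOY textbook-RSA stand-in for
  the device key. The client data CD is encoded as JSON here (an assumption), and the
  origin is kept as a plain string; hashing protocol, hostname, and port first, as the
  notes describe, would not change the logic. The key point is that the browser, not
  the page's JavaScript, fills in the origin it is actually on.

```python
import hashlib
import json
import secrets

# TOY textbook RSA (insecure!): stand-in for the device's real key pair
P, Q, E = 2**61 - 1, 2**89 - 1, 65537
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))


def _h(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % N


def sign(kpriv: int, m: bytes) -> int:
    return pow(_h(m), kpriv, N)


def verify(kpub: int, m: bytes, sig: int) -> bool:
    return pow(sig, kpub, N) == _h(m)


def browser_client_data(challenge: str, page_origin: str) -> bytes:
    # B builds CD from the origin of the page the user is really on
    return json.dumps({"challenge": challenge, "origin": page_origin}).encode()


def device_sign(cd: bytes) -> int:
    return sign(D, cd)


def server_check(cd: bytes, sig: int, expected_challenge: str, expected_origin: str) -> bool:
    if not verify(E, cd, sig):
        return False  # CD was altered after signing
    fields = json.loads(cd)
    return fields["challenge"] == expected_challenge and fields["origin"] == expected_origin


challenge = secrets.token_hex(16)  # bank.com's challenge for the victim's account
```

  An honest login on https://bank.com verifies. If bad.com relays the same challenge,
  the victim's browser writes "https://bad.com" into CD, so bank.com rejects it; and
  swapping a bank.com CD in alongside that signature fails verification.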

Problem 2 with simplified protocol: can link devices across sites
  Servers can compare their Kpub values to learn which users are the same.
  Privacy problem; with passwords alone, servers could not tell that two accounts belong to the same user.

U2F protects against cross-site linking with per-site key registration.
  At registration time, device generates fresh key (Kpub, Kpriv).
  Device sends out Kpub in the clear, but sends Kpriv encrypted with Kwrap.
  Kwrap stored on device (the only key that's stored long-term).
  Encryption of Kpriv is associated with origin.
  "Handle" is just this encrypted Kpriv.
  On later challenge, device decrypts handle into Kpriv, checks origin.
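
  A sketch of per-site key registration with a wrapped handle. Only Kwrap is stored
  long-term; each site gets fresh Kpriv material that leaves the device only encrypted
  inside the handle. The stdlib has no block cipher, so this uses a TOY encrypt-then-MAC
  built from HMAC-SHA256 (counter-mode keystream plus a tag over H(origin), nonce, and
  ciphertext) as a stand-in for proper authenticated encryption. Deriving Kpub from
  Kpriv is omitted; the class name is hypothetical.

```python
import hashlib
import hmac
import os


def _prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


class ToyU2FDevice:
    def __init__(self):
        self.kwrap = os.urandom(32)             # the only long-term secret
        self._k_enc = _prf(self.kwrap, b"enc")  # subkey for encryption
        self._k_mac = _prf(self.kwrap, b"mac")  # subkey for the integrity tag

    def _xor_keystream(self, nonce: bytes, data: bytes) -> bytes:
        # Counter-mode keystream from HMAC; XOR both encrypts and decrypts
        blocks = (len(data) + 31) // 32
        stream = b"".join(_prf(self._k_enc, nonce + i.to_bytes(4, "big")) for i in range(blocks))
        return bytes(a ^ b for a, b in zip(data, stream))

    def _tag(self, origin: str, nonce: bytes, ct: bytes) -> bytes:
        # Including H(origin) ties the handle to exactly one site
        return _prf(self._k_mac, hashlib.sha256(origin.encode()).digest() + nonce + ct)

    def register(self, origin: str) -> bytes:
        kpriv = os.urandom(32)  # fresh per-site private key material
        nonce = os.urandom(16)
        ct = self._xor_keystream(nonce, kpriv)
        return nonce + ct + self._tag(origin, nonce, ct)  # the "handle"

    def unwrap(self, handle: bytes, origin: str):
        nonce, ct, tag = handle[:16], handle[16:-32], handle[-32:]
        if not hmac.compare_digest(tag, self._tag(origin, nonce, ct)):
            return None  # wrong origin or tampered handle: refuse to sign
        return self._xor_keystream(nonce, ct)
```

  Because each site sees an unrelated handle and Kpub, servers cannot link a user's
  registrations by comparing keys.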

Other problems with simplified protocol:
  TLS MITM detection.  Somewhat different threat.
    Add TLS channel ID along with origin.
    Good point though: having a secret allows authenticating both ends.
  Device cloning detection.  Again quite different kind of threat.
    Counter is monotonically increasing.  Might decrease if cloned.
  Device attestation.
    Give server assurance that user is using a trustworthy security key.
    Sort of like User-Agent for security keys, with signature.
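
  A sketch of the server side of counter-based clone detection. The device increments
  its counter on every signature, so a clone made from a copied state will eventually
  report a counter no larger than one the server has seen. The helper is hypothetical,
  and it assumes the counter has already been authenticated as part of the signed data.

```python
class CounterCheck:
    def __init__(self):
        self.last_seen = {}  # credential handle -> highest counter accepted so far

    def accept(self, handle: bytes, counter: int) -> bool:
        prev = self.last_seen.get(handle, -1)
        if counter <= prev:
            return False  # counter did not increase: possibly a cloned device
        self.last_seen[handle] = counter
        return True
```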

Other attacks:
  Attacker compromises client.
    Could steal session cookie after login.
    Could trick user into pushing security key button for login into another site.
  Attacker steals device.
    Still need user's password, if strong.
  Attacker supplies bad device to user.
    Maybe device attestation will catch this and prevent registration.

What doesn't U2F solve?
  Some privacy concerns.
    Counter, device certificate might leak information about user.
  Not quite enough to be the only means of authentication.
    Some work on FIDO CTAP2 to support fully password-less authentication.
    E.g., can store username to identify user to web site / server.
    [[ Ref: https://www.imperialviolet.org/2018/03/27/webauthn.html ]]
  Authenticates the user, but not what the user wants to do.
    Susceptible to malware on the user's PC hijacking the user's session.
    Could do better if trusted security device could sign user's operation.
    WebAuthn transaction approval (specified but not really used in practice).
    "Hardware wallet" devices for cryptocurrency transactions.

Summary
  User authentication is hard
  Passwords are a long-lasting solution
  Strengthen passwords with pw manager and 2FA
  First encounter with crypto:
    Cryptographic hash function
    Sign/Verify with public key pair

References:
  http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
  http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
  http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/
  http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html
  https://blog.acolyer.org/2017/06/21/the-password-reset-mitm-attack/
  https://tools.ietf.org/id/draft-balfanz-tls-channelid-01.html
  https://developers.yubico.com/U2F/Protocol_details/Overview.html
  https://www.yubico.com/2017/10/infineon-rsa-key-generation-issue/
  https://www.wired.com/story/chrome-yubikey-phishing-webusb/
  http://blog.dustinkirkland.com/2013/10/fingerprints-are-user-names-not.html
  https://www.allthingsauth.com/2018/02/27/sms-the-most-popular-and-lea
  https://www.dongleauth.info/
  https://www.yubico.com/products/manufacturing/
  https://blog.duszynski.eu/phishing-ng-bypassing-2fa-with-modlishka/