User Authentication
==================

User authentication
  An important problem
    Underpinning of many security policies
    Interesting technical issues
    Easy to do wrong
  Continues to be challenging, because security isn't just a technical problem
    Users pick bad passwords
    But, passwords have other redeeming properties (easy to use, deployable)

Recall: where does authentication fit in?
  Guard model of computer system security
    client -> request -> server
  Server contains some resource named by request
  Server contains a guard that checks each request
    E.g., function invoked in server's code when each request is handled
  Complete mediation: all requests checked by a guard
    1. server isolation: no way to bypass interface & access resource directly
    2. guard invoked on all requests
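
A minimal sketch of the guard model, assuming a hypothetical in-memory resource
table and access policy (RESOURCES, ALLOWED, and the function names are made up
for illustration). The point is only that handle_request is the single entry
point and calls the guard on every request.

```python
from typing import Dict, Set

# Hypothetical resource table and access policy, for illustration only.
RESOURCES: Dict[str, str] = {"grades.txt": "A, B+, A-"}
ALLOWED: Dict[str, Set[str]] = {"alice": {"grades.txt"}}


def guard(principal: str, resource: str) -> bool:
    """Decide whether `principal` may access `resource`."""
    return resource in ALLOWED.get(principal, set())


def handle_request(principal: str, resource: str) -> str:
    """The only entry point to RESOURCES; the guard runs on every call."""
    if not guard(principal, resource):
        raise PermissionError(f"{principal} may not access {resource}")
    return RESOURCES[resource]
```

Complete mediation still depends on server isolation: if other code could read
RESOURCES directly, the guard would be bypassed.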

Challenge: intermediate principals
  Users rarely issue requests directly
    From the point of view of the final resource, just got a TCP packet..
  Request typically issued via client machine, load-balancer, app server, ..
    Typically we say these intermediate principals "speak for" the user
  Important to consider these intermediate entities as principals
    Forces us to consider the possibility that the request isn't actually from the user
  Need to be sure every intermediate principal is actually trustworthy

Example: cross-site request forgery attacks
  Bank web server accepts requests to transfer money
  Requires that request be accompanied by a cookie
  Cookie issued only to user's browser
  But user can browse other web sites
  Browser's policy: any request to bank site is accompanied by cookie
    .. even if request came from another web page
  So bank should explicitly think of "cookie" as principal
  <Cookie> speaks for <user's browser>
  But <user's browser> speaks for many different <web pages>
  Undermines implicit assumption that cookie only used by user
    .. or by bank's own web page
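
The cookie policy above can be illustrated with a toy model. The Browser class,
the cookie value, and the page names below are hypothetical; the sketch only
shows that a guard which checks the cookie alone cannot tell which web page
caused the request.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Browser:
    """Toy browser: one cookie jar shared by every page the user visits."""
    cookies: Dict[str, str] = field(default_factory=dict)

    def send(self, from_page: str, to_site: str, action: str) -> dict:
        # Policy from the notes: the cookie for to_site is attached no matter
        # which page caused the request.
        cookie: Optional[str] = self.cookies.get(to_site)
        return {"site": to_site, "action": action,
                "cookie": cookie, "from_page": from_page}


def bank_guard(request: dict, valid_cookie: str) -> bool:
    """A guard that treats the cookie alone as proof the user made the request."""
    return request["cookie"] == valid_cookie


browser = Browser(cookies={"bank.com": "session-abc"})
legit = browser.send("https://bank.com/transfer", "bank.com", "pay bob $10")
forged = browser.send("https://evil.example/", "bank.com", "pay mallory $1000")
```

Both legit and forged pass bank_guard, which is exactly the broken assumption:
the cookie speaks for the browser, and the browser speaks for many pages.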

Common approach: passwords
  Need some secret between user and guard
    call this set of bits a "password"
  User types in username and password.
  Guard checks whether password is correct for that username.
  A password is a valuable secret, so want to avoid repeated use and exposure
    Just for user authentication
    Once authenticated, use crypto keys between server/clients
      Client certificates, cookies, etc.
    Even for user authentication, limit the role of passwords by composing them with other ideas
      Password manager, single-sign on, two-factor, etc.
      Progressive authentication
      Biometric (e.g., apple/android fingerprint button)
      Be careful in combining!

Challenging to know for sure who is the user for real
  User registers some secret --- but who registers it?
    At the scale of MIT, we might be able to check the identity of the user when registering
  Typically settle for weaker guarantee
    Establish that the user who logs in has the secret that was registered
    If so, then assume it is the same user
    But, we have no guarantee that we know the true identity of the user
    For many usages that is fine
      E.g., Amazon doesn't really care who you really are as long as you pay

Passwords: hard because of human factors
  [[ slide: common passwords ]]

  1. Users choose guessable passwords
    20% of accounts use the same set of 5,000 most popular passwords
    Cannot allow an adversary to make 5,000 guesses at a user's password
    Cannot allow an adversary to guess "123456" as the password of each user
  2. Common passwords contain digits, upper and lower case, etc
    Is "1Password!" a good password?
    What matters is entropy: how common is that password?
      Character requirements not especially helpful
    Password entropy is usually expressed in terms of bits:
      A password that is already known has zero bits of entropy
      One that would be guessed on the first attempt half the time has 1 bit of entropy.
      A password of 16 bits of entropy requires 2^16 guesses to try all possibilities
  3. Passwords are often shared across sites / applications / systems
    Important when we talk about how to use and store passwords
  4. Want to encourage users to choose high-entropy passwords
    Is it a good idea to frequently change passwords?
      Depends on the threats
    Benefits of new passwords:
      Even if adversary obtained old password, it's no longer useful
      Maybe this forces the user to not reuse password across sites
    Downsides of new passwords:
      User might have a hard time remembering it
      User might choose a weaker password, or write it down somewhere
    No clear winning policy
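
The entropy arithmetic in item 2 can be written out as a small sketch. The
function names are made up here; the formula is the usual log2(1/p), where p is
the probability that the password is guessed on the first attempt.

```python
import math


def entropy_bits(guess_probability: float) -> float:
    """Bits of entropy for a password guessed on the first try with this probability."""
    return math.log2(1 / guess_probability)


def guesses_to_exhaust(bits: int) -> int:
    """Number of guesses needed to try every possibility at `bits` of entropy."""
    return 2 ** bits
```

With these definitions an already-known password (p = 1) has 0 bits, a password
guessed half the time (p = 0.5) has 1 bit, and 16 bits corresponds to
2^16 = 65536 guesses.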

Defense: Password managers
  Users are tempted to use simple passwords
    Can remember them
    But low entropy
  Users are tempted to use same password for different sites
    Bad idea!
  Password managers: a convenient way to get strong, different passwords
    Password manager picks password with high entropy
    Password manager stores different passwords for different sites
    Optionally: password manager fills password field (e.g., in a browser PM)
  User must authenticate to password manager
    User must remember one strong password
  Password manager is trusted!
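
A rough sketch of the "picks password with high entropy" step, assuming a
62-symbol alphabet and a length of 20 (both arbitrary choices for illustration).
The vault dictionary is a hypothetical stand-in for the manager's storage, which
a real manager would protect with the user's one strong password.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols, an arbitrary choice


def generate_password(length: int = 20) -> str:
    """Pick each character independently and uniformly at random from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# Hypothetical in-memory vault: one independent password per site.
vault = {site: generate_password() for site in ("bank.example", "mail.example")}
```

Under these assumptions each password has 20 * log2(62), or about 119, bits of
entropy, and no two sites share a password.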

Defend against guessing
  Guessing attacks are a problem because of small key space.
    To get a sense try Telepathwords (https://telepathwords.research.microsoft.com/)
      As you type in a potential password letter, tries to guess the next letter
      Common passwords (e.g., via leaks of password databases)
      Popular phrases from web sites
      Common user biases in selecting characters
  Password-encrypted data vulnerable to offline guessing
    No server involved in checking a guess
    [[ Semi-related: http://www.gnu.org/software/shishi/wu99realworld.pdf ]]

Limiting authentication attempts
  Don't want to allow an adversary to guess passwords
  Important to rate-limit login attempts
    Implement time-out periods after too many incorrect guesses.
  Limiting per-user might not be enough
    Adversary can guess "123456" for every username
  CAPTCHAs?
    Economic cost of solving CAPTCHAs quite low
  Most systems have several heuristics to rate-limit password guessing
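
One possible shape for such a heuristic, as a sketch only. The threshold, the
window, and the LoginLimiter class are assumptions made for illustration; real
systems combine several signals.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional

MAX_FAILURES = 5        # assumed threshold
WINDOW_SECONDS = 300.0  # assumed time-out window


class LoginLimiter:
    """Counts recent failed logins per key (e.g., a username or a source address)."""

    def __init__(self) -> None:
        self.failures: Dict[str, List[float]] = defaultdict(list)

    def allowed(self, key: str, now: Optional[float] = None) -> bool:
        """True if `key` has fewer than MAX_FAILURES failures inside the window."""
        now = time.time() if now is None else now
        recent = [t for t in self.failures[key] if now - t < WINDOW_SECONDS]
        self.failures[key] = recent
        return len(recent) < MAX_FAILURES

    def record_failure(self, key: str, now: Optional[float] = None) -> None:
        """Remember one failed attempt for `key`."""
        self.failures[key].append(time.time() if now is None else now)
```

Keying only on the username does not stop an adversary who tries "123456"
against every username, so a deployment might also call the limiter with a
second key such as the client's source address (an assumption, not something
the notes prescribe).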

Storing passwords
  Naive plan: store a table containing (username, password) pairs
  Risk: adversary that compromises server learns all passwords
  Problem 1: even after recovery from compromise, must reset all user passwords
  Problem 2: adversary can use same passwords to log into other services

Hashing
  Store pairs of (username, H(password))
  Can still check if supplied password matches, by hashing it
  Cryptographic hash is one-way, cannot invert
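
A minimal sketch of the (username, H(password)) table, assuming SHA-256 as H.
The table and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict

# Hypothetical password table: username -> hex digest of H(password).
table: Dict[str, str] = {}


def H(password: str) -> str:
    """Cryptographic hash of the password (SHA-256 is an assumed choice)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    table[username] = H(password)


def check(username: str, password: str) -> bool:
    """Hash the supplied password and compare with the stored hash."""
    stored = table.get(username)
    # compare_digest avoids leaking where the first mismatch occurs.
    return stored is not None and hmac.compare_digest(stored, H(password))
```

Note that two users with the same password end up with the same stored hash.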

Salting
  Rainbow tables: can build a dictionary of hashes of all common passwords
  Solution: store (username, salt, H(salt || password))
  Can check by hashing supplied password w/ known salt
  But now the same password can correspond to many different hashes
  Expensive to build a table of all common salt+password combinations, if salt is large
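
The salted scheme can be sketched the same way. SHA-256 and a 16-byte salt are
assumptions; salted_table and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

SALT_BYTES = 16  # assumed salt size

# Hypothetical table: username -> (salt, H(salt || password)).
salted_table: Dict[str, Tuple[bytes, bytes]] = {}


def salted_hash(salt: bytes, password: str) -> bytes:
    """H(salt || password), with SHA-256 as the assumed H."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def register(username: str, password: str) -> None:
    salt = os.urandom(SALT_BYTES)  # fresh random salt per user
    salted_table[username] = (salt, salted_hash(salt, password))


def check(username: str, password: str) -> bool:
    """Hash the supplied password with the stored salt and compare."""
    entry = salted_table.get(username)
    if entry is None:
        return False
    salt, stored = entry
    return hmac.compare_digest(stored, salted_hash(salt, password))
```

Here two users who pick the same password get different stored hashes, so one
precomputed dictionary no longer covers both.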

Make hashing expensive
  Typical crypto hash functions are fast
  Adversary not rate-limited when guessing against a compromised list of password hashes
  Solution: use a purposely expensive hash function (called key derivation function, or KDF)
  Google for bcrypt, scrypt, PBKDF2, ..
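
As one example, Python's standard library exposes PBKDF2 as
hashlib.pbkdf2_hmac. The sketch below assumes an iteration count of 200,000,
which is a placeholder rather than a recommendation; make_record and verify are
hypothetical helper names.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune to the hardware in use


def slow_hash(salt: bytes, password: str) -> bytes:
    """Deliberately expensive salted hash using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def make_record(password: str) -> Tuple[bytes, bytes]:
    """Return the (salt, hash) pair to store for a new password."""
    salt = os.urandom(16)
    return salt, slow_hash(salt, password)


def verify(record: Tuple[bytes, bytes], password: str) -> bool:
    """Recompute the slow hash with the stored salt and compare."""
    salt, stored = record
    return hmac.compare_digest(stored, slow_hash(salt, password))
```

Each guess now costs the adversary ITERATIONS hash computations instead of one,
while a legitimate login pays that cost only once.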

How to transmit passwords?
  Poor idea: sending password to the server in cleartext.
  Slightly better: send password over encrypted connection.
  Why is this bad?
    Connection may be intercepted.
    Shared passwords mean that one server can use password on another server.
  Strawman alternative: send hash of password, instead of the password.
    Not so great: hash becomes a "password equivalent", can still be resent.
  Better alternative: challenge-response scheme.
    User and server both know password.
    Server sends challenge R.
    User responds with H(R || password).
    Server checks if response is H(R || password).
    If server knew the password, it is convinced the user knows it too (modulo MITM attacks).
    Server does not learn password if it didn't already know it.
    How to prevent server from brute-force guessing password based on H() value?
      Expensive hash + salting.
      Allow client to choose some randomness too: guard against rainbow tables.
    To avoid storing the real password on the server, use protocol like SRP.
      [[ Ref: http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol ]]
    Implementing challenge/response often means changing the client and the server.
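
The basic challenge-response exchange above can be sketched as follows,
assuming SHA-256 as H and a 16-byte challenge. The function names are
hypothetical, and this plain version still has the brute-force weakness
discussed above.

```python
import hashlib
import hmac
import secrets


def new_challenge() -> bytes:
    """Server side: a fresh random challenge R."""
    return secrets.token_bytes(16)


def respond(challenge: bytes, password: str) -> bytes:
    """User side: H(R || password)."""
    return hashlib.sha256(challenge + password.encode("utf-8")).digest()


def server_check(challenge: bytes, known_password: str, response: bytes) -> bool:
    """Server side: recompute H(R || password) and compare with the response."""
    return hmac.compare_digest(respond(challenge, known_password), response)
```

A response recorded for one challenge does not verify against a later
challenge, which is what distinguishes this from sending a bare hash.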

Two-factor authentication
  Helps defend against weak passwords and password reuse
  Helps against MITM and phishing attacks
    MITM = man in the middle
  
  Several common variants

  1. Code sent via SMS message to user's cell phone
    Server stores just the user's phone number (and recently sent code)

    Advantage: easy to start using
    Advantage: easy to recover from a lost phone, switching providers, ..
      Outsource the problem to cell phone carrier, number portability
    Advantage: server compromise does not break security
    Downside: trust cell phone network and carrier
    Downside: require user to be in range of cell phone network
    Downside: phishing attacks

  2. Time-based one-time passwords (TOTP)
    Server and user device agree on secret value (e.g., scan QR code)
    User device generates code = H(secret || current time)
    Server checks that code corresponds to current time

    Advantage: no need for cell phone network to be available
    Advantage: no need to trust cell phone carrier
    Disadvantage: user setup involves installing app, loading secret value
    Disadvantage: dealing with user changing devices (reload secret value)
    Disadvantage: server compromise breaks 2FA, need to re-register secrets
    Disadvantage: still susceptible to phishing attacks

  3. U2F (challenge-response)
    User's USB dongle has a public/private key pair
    Server stores USB dongle's public key
    To log in, server sends random challenge string to user's computer (e.g., browser)
    Browser sends the server's challenge and identity to USB dongle
    USB dongle signs (challenge, server identity) with private key
    Server verifies signature refers to correct challenge and identity

    Advantage: not susceptible to phishing attacks
    Advantage: no need for per-server setup
    Advantage: server compromise does not allow adversary to authenticate later
    Disadvantage: need special software on user computer (not just typing in
    code)
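
Variant 2 (TOTP) can be sketched directly from the formula in the notes,
code = H(secret || current time). The 30-second time step, the 6-digit code
length, and the use of plain SHA-256 are assumptions for illustration; the
standardized TOTP (RFC 6238) uses an HMAC-based construction rather than a
plain hash.

```python
import hashlib
import hmac
import time
from typing import Optional

STEP_SECONDS = 30  # assumed time-step size
DIGITS = 6         # assumed code length


def totp_code(secret: bytes, now: Optional[float] = None) -> str:
    """code = H(secret || current time step), shortened to DIGITS decimal digits."""
    now = time.time() if now is None else now
    step = int(now // STEP_SECONDS)
    digest = hashlib.sha256(secret + step.to_bytes(8, "big")).digest()
    number = int.from_bytes(digest[:4], "big") % (10 ** DIGITS)
    return str(number).zfill(DIGITS)


def server_check(secret: bytes, code: str, now: Optional[float] = None) -> bool:
    """Server recomputes the code for the current time step and compares."""
    return hmac.compare_digest(totp_code(secret, now), code)
```

Because the server holds the same secret as the device, a server compromise
reveals it, which is the re-registration disadvantage listed above.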

U2F protocol

  State:
    D: (H/Origin, Kpub, Kpriv)
      B never sees Kpriv!
      Even if B is compromised, B cannot steal Kpriv
    S: (H/Origin, Kpub)
    B: S's Javascript in browser

  Base protocol:
    S->B: challenge
    B->D: challenge
    D->B: s = signed challenge
    B->S: s
    S: verify s

   challenge is a random number, to ensure freshness
     response cannot be an old replay
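
A toy sketch of the base protocol and its freshness check. Python's standard
library has no public-key signatures, so ToyDevice uses an HMAC key as a
stand-in for signing with Kpriv and verifying with Kpub; that substitution is
an assumption made only to keep the example runnable, and all class names are
hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Set


class ToyDevice:
    """Stand-in for D. A real device signs with Kpriv and S verifies with Kpub;
    this toy uses one HMAC key for both steps, which is NOT how U2F works."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def sign(self, message: bytes) -> bytes:
        return hmac.new(self._key, message, hashlib.sha256).digest()

    def verify(self, message: bytes, signature: bytes) -> bool:
        return hmac.compare_digest(self.sign(message), signature)


class Server:
    def __init__(self, verify: Callable[[bytes, bytes], bool]) -> None:
        self.verify = verify                 # plays the role of checking with Kpub
        self.outstanding: Set[bytes] = set() # challenges issued and not yet used

    def new_challenge(self) -> bytes:
        challenge = secrets.token_bytes(32)
        self.outstanding.add(challenge)
        return challenge

    def login(self, challenge: bytes, s: bytes) -> bool:
        if challenge not in self.outstanding:
            return False                     # unknown or already-used challenge
        self.outstanding.discard(challenge)
        return self.verify(challenge, s)
```

Replaying an old s fails twice over: the old challenge is no longer
outstanding, and s does not verify against any newly issued challenge.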

   Man-in-the-Middle (MITM) attack
     MITM relays communication between B and S, including registration
     Different MITM attacks:
       - MITM masquerades as S, but doesn't have S's certificate (phishing)
       - MITM has S's certificate

  With MITM protection
    S->B: challenge
    B->D: CD={challenge, origin, TLS channel id}
       origin = Hash(protocol || hostname || port)
    D->B: Signed(CD)
    B->S: Signed(CD), CD
    S: check origin, channel id, and signature
    
    Origin prevents phishing
      S sees that there is MITM, because origins don't match
    What if MITM has S's certificate?
      Channel ID prevents MITM
      S sees it isn't its TLS session
    For a detailed discussion see below
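
The origin and channel ID checks can be sketched separately from the signature.
The sketch below assumes the signature over CD has already been verified, uses
SHA-256 with an assumed "|" separator for origin = Hash(protocol || hostname ||
port), and uses placeholder byte strings for the challenge and channel IDs.

```python
import hashlib
from dataclasses import dataclass


def origin_hash(protocol: str, hostname: str, port: int) -> bytes:
    """origin = Hash(protocol || hostname || port), with an assumed encoding."""
    return hashlib.sha256(f"{protocol}|{hostname}|{port}".encode("utf-8")).digest()


@dataclass(frozen=True)
class ClientData:
    challenge: bytes
    origin: bytes      # filled in by the browser from the page's address
    channel_id: bytes  # filled in by the browser from its TLS connection


def server_accepts(cd: ClientData, challenge: bytes,
                   my_origin: bytes, my_channel_id: bytes) -> bool:
    """S's checks on CD (signature verification is assumed done elsewhere)."""
    return (cd.challenge == challenge
            and cd.origin == my_origin
            and cd.channel_id == my_channel_id)


challenge = b"fresh-random-challenge"  # placeholder value
bank_origin = origin_hash("https", "bank.com", 443)
bank_channel = b"channel-id-of-session-with-S"  # placeholder value

honest = ClientData(challenge, bank_origin, bank_channel)
phished = ClientData(challenge, origin_hash("https", "bank.secure.com", 443),
                     bank_channel)
relayed = ClientData(challenge, bank_origin, b"channel-id-of-session-with-MITM")
```

Only honest passes server_accepts: phished carries the wrong origin, and
relayed carries the channel ID of the browser's session with the MITM.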
    
  With privacy for user
    S->B: challenge, handle
    B->D: keyhandle, CD={challenge, origin, TLS channel id}
       key handle contains origin
    D->B: Signed(CD)
       if key handle matches origin during registration
    B->S: CD, Signed(CD)
 
    Server must specify handle and D looks key up by handle
    Two gmail accounts, but gmail cannot tell that they are for same user.
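
A toy sketch of the key-handle lookup on D, assuming the device keeps a table
from handle to (origin, key). The random bytes stand in for a real key pair,
and ToyKeyStore and its methods are hypothetical names; actual devices may
encode this state differently.

```python
import secrets
from typing import Dict, Optional, Tuple


class ToyKeyStore:
    """Sketch of D's per-registration state: key handle -> (origin, key).

    The `key` here is random bytes standing in for a key pair."""

    def __init__(self) -> None:
        self._keys: Dict[bytes, Tuple[bytes, bytes]] = {}

    def gen_key(self, origin: bytes) -> bytes:
        """Registration: create a fresh key for this origin, return its handle."""
        handle = secrets.token_bytes(16)
        self._keys[handle] = (origin, secrets.token_bytes(32))
        return handle

    def key_for(self, handle: bytes, origin: bytes) -> Optional[bytes]:
        """Login: return the key only if the handle was registered for `origin`."""
        entry = self._keys.get(handle)
        if entry is None or entry[0] != origin:
            return None  # unknown handle, or handle registered for another origin
        return entry[1]
```

Registering twice with the same origin yields two unrelated handles and keys,
so the server sees nothing that links the two accounts to one device.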

  Registration:
    B->S: Add key to account, origin
     S: check if this is the correct user
    B->D: GenKey(origin)
     Check origin
     Check if user is present
     U->S: (H, Kpub)  

  Integrity of D:
    Attestation key pair for vendor
    Count of #signature operations
    (But: maybe bugs in the firmware)

Biometrics: when is this a useful credential for authenticating users?
  Easy for user to provide (no need to remember a password)
  Hard for user to delegate (difficult to give fingerprint to a friend)
  Easy for adversary to impersonate remotely (e.g., logging in over network)
  Hard to impersonate physically (e.g., authenticate to a door lock, phone, or ATM)
  Hard for user to change in case of compromise

Sessions: connecting principals to requests
  Important to securely associate request with a principal
  Don't want to require a password for every request
  1. Establish a session (e.g., long random session ID)
  2. Authenticate the user in that session
  3. Accept any requests associated with that session as coming from that user
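
The three steps above can be sketched as follows, assuming a hypothetical
in-memory session table and a password check that happens elsewhere (passed in
here as password_ok).

```python
import secrets
from typing import Dict, Optional

# Hypothetical session table: session ID -> authenticated username (or None).
sessions: Dict[str, Optional[str]] = {}


def establish_session() -> str:
    """Step 1: create a long random session ID that is not yet authenticated."""
    session_id = secrets.token_urlsafe(32)
    sessions[session_id] = None
    return session_id


def authenticate(session_id: str, username: str, password_ok: bool) -> bool:
    """Step 2: bind the user to the session once the password check succeeds."""
    if session_id not in sessions or not password_ok:
        return False
    sessions[session_id] = username
    return True


def principal_for(session_id: str) -> Optional[str]:
    """Step 3: map a request's session ID to its principal, if any."""
    return sessions.get(session_id)
```

Anyone who learns the session ID speaks for the user for the rest of the
session, so the ID must be unguessable and kept secret.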

Authentication: bootstrap and reset
  Important to determine how accounts are established
  Typical approaches for establishing accounts:
    First-come first-served
      E.g., register for an account at gmail.com
    Bootstrap from another mechanism
      E.g., verify via email
    Created by administrator
      E.g., new employee at a company
  Reset plans:
    "Security questions": OR policy
    Verify via email
    Prove knowledge of credit card number, etc
    Create a new account (if it's not important to retain same principal / name)
    Call customer service: can be an escape hatch without a precise policy
      Often susceptible to social engineering attacks

References:
  http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
  http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
  http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/
  http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html
  https://blog.acolyer.org/2017/06/21/the-password-reset-mitm-attack/
  https://tools.ietf.org/id/draft-balfanz-tls-channelid-01.html
  https://developers.yubico.com/U2F/Protocol_details/Overview.html
  https://www.yubico.com/2017/10/infineon-rsa-key-generation-issue/
  https://www.wired.com/story/chrome-yubikey-phishing-webusb/


---- How U2F defeats phishing/MITM attacks

First, let's see how the actual authentication handshake works, and then we can
see how it prevents attacks.

Take a look at the client data that the browser sends to the U2F device when
it's trying to authenticate. It's primarily a random challenge and the host name
of the site that is requesting the login (so the host name that the user sees in
the address bar). These values are also known by the server.

The U2F device signs the hash of this data, and sends that signature back to the
server through the browser. The server checks that signature against the public
key it has on file for the U2F device in question, and that the hash that the
device signed matches the hash of the data the server expects to be in the
"client info". If both of those are true, the service authenticates the user.

So how would an attacker compromise this scheme? Well, imagine a case where the
attacker controls everything between the browser and the server, and is trying
to authenticate as the user. The first thing the attacker might try to do is to
re-use a signed hash the client has sent in the past in order to authenticate
themselves later. The attack would go something like this:

  1. Observe that the user in the past sent the signed statement S to bank.com
     to authenticate.
  2. Visit bank.com ourselves and log in as the user.
  3. When we're prompted for the U2F response, just re-send S.

This attack won't work because of the random challenge (often referred to as a
"nonce" after "nonce words" in linguistics). It is a one-off byte-string that
the service includes along with various other client data to authenticate a
client, and that changes with every authentication attempt. Since this value is
included when computing the hash, the hash that the U2F device has to sign will
vary every time. Therefore the server will not accept S when the attacker tries
to sign in, since a different challenge was issued, and thus the hash in S does
not match the hash the server computes.

What else might the attacker do? Well, they might set up a website that looks
exactly like bank.com somewhere like bank.secure.com that an unsuspecting user
might be confused by. They then trick the user into visiting that site and
entering their login credentials. When the user has sent their username and
password, the attacker stores those values and forwards them to the real
bank.com. bank.com then issues a challenge C that the attacker wants the user's
U2F device to sign.

Let's imagine what happens if the attacker just directly forwards C to the
user's browser. The browser will see that the origin in C (bank.com) doesn't
match the hostname in the current URL (bank.secure.com), and will alert the user
that something phishy is happening. So that doesn't work.

What if the attacker modifies C's origin field to say bank.secure.com (call this
new challenge C')? The user's browser will now forward C' to the device, which
will then sign the hash of C'. The browser then returns that signature to the
attacker, who then tries to send it on to bank.com in the hopes of logging in as
the user. bank.com receives the response, and sees that the signature verifies
correctly, but then it also checks that the hash the U2F device signed matches
the hash over the C that it sent out. It doesn't! hash(C) != hash(C'). And so
the service doesn't let the login attempt proceed.

So, that's how U2F protects against MITM and phishing attacks.

The one thing to be aware of here is that this protection relies on the browser
knowing what hostname it is connected to! If the attacker can somehow manage to
make the browser think that it's talking to bank.com when it's really talking to
the attacker's server, you might worry that this breaks the protection! However,
U2F actually protects even against an adversary that can impersonate the server
in a TLS connection. To understand why, we need to look at the third (optional)
field of the client info: the "ChannelID TLS extension value". This value is
basically a public key that identifies the TLS session as the browser sees it;
the server compares it to the key for the session it believes is to the client.
If there was a MITM, they would have to establish a new session with a
different key, and so the hash the device signs still wouldn't match what the
server expects, and the authentication attempt would fail!