User Authentication
===================

User authentication
  An important problem
    Underpinning of many security policies
    Interesting technical issues
    Easy to do wrong
  Continues to be challenging, because security isn't just a technical problem
    Users pick bad passwords
    But passwords have other redeeming properties (easy to use, deployable)

Recall: where does authentication fit in?
  Guard model of computer system security
    client -> request -> server
    Server contains some resource named by the request
    Server contains a guard that checks each request
      E.g., a function invoked in the server's code when each request is handled
  Complete mediation: all requests checked by a guard
    1. Server isolation: no way to bypass the interface and access the resource directly
    2. Guard invoked on all requests
  Challenge: how does the guard authenticate a request?
    Which principal originated the request?

The three parts of user authentication
  Registration
    E.g., set up a secret shared between user and guard
  Authentication check
    E.g., user sends (user, secret) along with the request;
      guard checks that the secret matches its copy
  Recovery
    E.g., user loses the secret
    Often overlooked; can be an Achilles heel
  Many challenges in getting all three right

Challenge: intermediate principals
  Users rarely issue requests directly
    From the point of view of the final resource, it just got a TCP packet
    Request typically issued via client machine, load balancer, app server, ...
  Typically we say these intermediate principals "speak for" the user
  Important to consider these intermediate entities as principals
    Forces considering the possibility that the request isn't actually from the user
    Need to be sure every intermediate principal is actually trustworthy
  Example: logging into a web site
    Request comes from the user's device (e.g., phone/laptop)
    Device is the intermediate principal
    User types password/code into the device
    What if the attacker controls the device?
      Malware/keylogger
      Attacker can launch a man-in-the-middle attack
        E.g., attacker records the password and then uses it later
      Big problem in practice (more below)

Challenge: what is the user's identity?
  User registers some secret: but who is the user, really?
  At the scale of MIT, we might be able to check the user's identity at registration
  Typically we settle for a weaker guarantee
    Establish that the user who logs in has the secret set up at registration
    If so, assume it is the same user
    But we have no guarantee that we know the user's true identity
  For many uses that is fine
    E.g., Amazon doesn't really care who you are, as long as you pay

Registration approaches
  First-come, first-served
    E.g., register for an account at gmail.com
  Bootstrap from another mechanism
    E.g., verify via email
  Created by an administrator
    E.g., new employee at a company

Recovery approaches
  "Security questions": an OR policy (knowing the answers is as good as knowing the password)
  Verify via email
  Prove knowledge of credit card number, etc.
  Create a new account (if it's not important to retain the same principal / name)
  Call customer service: can be an escape hatch without a precise policy
    Often susceptible to social engineering attacks

Common secret: passwords
  Principal and guard share a secret set of bits
    Call this set of bits a "password"
  User types in username and password
  Guard checks whether the password is correct for that username
  Advantage: easy to use and deploy
  Disadvantage: passwords are often weak secrets

Challenge: passwords are valuable, but often weak
  Defense: use them as little as possible
    Just for user authentication
    Once authenticated, use crypto keys between server and clients
      Client certificates, cookies, etc.
  Even for user authentication, strengthen passwords by composing them with other ideas
    Password manager, single sign-on, two-factor, etc.
    Biometrics (e.g., Apple/Android fingerprint sensor)
    Be careful in combining!

Passwords: hard because of human factors
  [[ slide: common passwords ]]
  1. Users choose guessable passwords
     20% of accounts use a password from the set of 5,000 most popular passwords
     Cannot allow an adversary to make 5,000 guesses at a user's password
     Cannot allow an adversary to guess "123456" as the password of each user
  2. Common passwords contain digits, upper and lower case, etc.
     Is "1Password!" a good password?
     What matters is entropy: how common is that password?
       Character requirements are not especially helpful
     Password entropy is usually expressed in bits:
       A password that is already known has zero bits of entropy
       One that would be guessed on the first attempt half the time has 1 bit of entropy
       A password with 16 bits of entropy requires 2^16 guesses to try all possibilities
  3. Passwords are often shared across sites / applications / systems
     Important when we talk about how to use and store passwords
  4. Want to encourage users to choose high-entropy passwords
     Is it a good idea to frequently change passwords? Depends on the threats
     Benefits of new passwords:
       Even if the adversary obtained the old password, it's no longer useful
       Maybe this forces the user not to reuse passwords across sites
     Downsides of new passwords:
       User might have a hard time remembering it
       User might choose a weaker password, or write it down somewhere
     No clear winning policy

Defense: password managers
  Users are tempted to use simple passwords
    Can remember them
    But low entropy
  Users are tempted to use the same password for different sites
    Bad idea!
  Password managers make strong, different passwords convenient
    Password manager picks passwords with high entropy
    Password manager stores different passwords for different sites
    Optionally: password manager fills in the password field (e.g., in a browser PM)
  User must authenticate to the password manager
    User must remember one strong password
  Password manager is trusted!

Defend against guessing
  Guessing attacks are a problem because of the small key space
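To put numbers on key space and entropy, here is a minimal Python sketch. It assumes each character is chosen uniformly and independently at random (roughly what a password manager does), which is the one case where entropy is simply length times log2(alphabet size); human-chosen passwords have far less. The function names `entropy_bits` and `generate_password` are hypothetical, and the 62-symbol alphabet is an illustrative assumption.

```python
import math
import secrets
import string

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a password whose characters are each chosen uniformly
    and independently from an alphabet of the given size."""
    return length * math.log2(alphabet_size)

def generate_password(length: int = 16,
                      alphabet: str = string.ascii_letters + string.digits) -> str:
    """Pick each character with a CSPRNG (the secrets module), as a password
    manager might; only then does entropy_bits() describe the result."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

if __name__ == "__main__":
    # A password taken from the 5,000 most popular ones has at most
    # log2(5000) bits of entropy (exactly that only if chosen uniformly).
    print(f"popular-list password: {math.log2(5000):.1f} bits")    # 12.3 bits
    # 16 random characters from [A-Za-z0-9], i.e. 62 symbols.
    print(f"random 16-char password: {entropy_bits(62, 16):.1f} bits")  # 95.3 bits
    # 16 bits of entropy means 2^16 candidates to try exhaustively.
    print(2 ** 16)                                                  # 65536
    print(generate_password())
```

The gap between roughly 12 bits and roughly 95 bits is why limiting online guesses matters so much for human-chosen passwords, and why a manager-generated password is hard to guess even offline.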
  To get a sense, try Telepathwords (https://telepathwords.research.microsoft.com/)
    As you type a potential password, letter by letter, it tries to guess the next letter
    Uses common passwords (e.g., from leaks of password databases)
    Uses popular phrases from web sites
    Uses common user biases in selecting characters
  Password-encrypted data is vulnerable to offline guessing
    No server involved in checking a guess
    [[ Semi-related: http://www.gnu.org/software/shishi/wu99realworld.pdf ]]

Limiting authentication attempts
  Don't want to allow an adversary to guess passwords
  Important to rate-limit login attempts
    Implement time-out periods after too many incorrect guesses
  Limiting per user might not be enough
    Adversary can guess "123456" for every username
  CAPTCHAs?
    Economic cost of solving CAPTCHAs is quite low
  Most systems use several heuristics to rate-limit password guessing

Storing passwords
  Naive plan: store a table containing (username, password) pairs
    Risk: an adversary that compromises the server learns all passwords
    Problem 1: even after recovery from the compromise, must reset all user passwords
    Problem 2: adversary can use the same passwords to log into other services
  Hashing
    Store pairs of (username, H(password))
    Can still check whether a supplied password matches, by hashing it
    Cryptographic hash is one-way: cannot invert
  Salting
    Rainbow tables: adversary can precompute hashes of all common passwords
    Solution: store (username, salt, H(salt || password))
    Can check by hashing the supplied password with the known salt
    But now the same password can correspond to many different hashes
    Expensive to build a table of all common salt+password combinations, if the salt is large
  Make hashing expensive
    Typical crypto hash functions are fast
    Adversary is not rate-limited when guessing against a stolen list of password hashes
    Solution: use a purposely expensive hash function (a key derivation function, or KDF)
      E.g., bcrypt, scrypt, PBKDF2, ...
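The salted, deliberately slow scheme above can be sketched with Python's standard library, using PBKDF2 (one of the KDFs named above) via `hashlib.pbkdf2_hmac`. The salt size, iteration count, function names, and the in-memory `db` table are illustrative assumptions, not values prescribed by these notes; in practice the iteration count is tuned so one check is noticeably slow on the server's hardware.

```python
import hashlib
import hmac
import secrets

# Illustrative parameters (assumptions): 16-byte random salt,
# PBKDF2-HMAC-SHA256 with a deliberately high iteration count.
SALT_BYTES = 16
ITERATIONS = 200_000

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, key) to store alongside the username."""
    salt = secrets.token_bytes(SALT_BYTES)  # fresh random salt per entry
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def check_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(key, stored_key)

if __name__ == "__main__":
    db = {}  # hypothetical user table: username -> (salt, key)
    db["alice"] = hash_password("correct horse battery staple")
    db["bob"] = hash_password("123456")
    db["carol"] = hash_password("123456")
    print(check_password("123456", *db["bob"]))    # True
    print(check_password("letmein", *db["bob"]))   # False
    # Same password, different salts -> different stored keys,
    # so one precomputed table does not cover both users.
    print(db["bob"][1] != db["carol"][1])          # True
```

Using `hmac.compare_digest` is a common extra precaution not discussed in the notes: it keeps the comparison time from depending on how many bytes of the derived key match.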
Augment passwords: two-factor authentication (2FA)
  Helps defend against weak passwords and password reuse
  Helps against MITM and phishing attacks
    MITM = man in the middle
  Several common variants
  1. Code sent via SMS message to the user's cell phone
     Server stores just the user's phone number (and the recently sent code)
     Advantage: easy to start using
     Advantage: easy to recover from a lost phone, switching providers, ...
       Outsources the problem to the cell phone carrier (number portability)
     Advantage: server compromise does not break security
     Downside: must trust the cell phone network and carrier
     Downside: requires the user to be in range of a cell phone network
     Downside: phishing attacks
  2. Time-based one-time passwords (TOTP)
     Server and user device agree on a secret value (e.g., scan a QR code)
     User device generates code = H(secret || current time, rounded to a time step)
     Server checks that the code corresponds to the current time step
     Advantage: no need for the cell phone network to be available
     Advantage: no need to trust the cell phone carrier
     Disadvantage: user setup involves installing an app and loading the secret value
     Disadvantage: dealing with a user changing devices (reload the secret value)
     Disadvantage: server compromise breaks 2FA; need to re-register secrets
     Disadvantage: still susceptible to phishing attacks
  3. U2F (challenge-response)
     User's USB dongle has a public/private key pair
     Server stores the USB dongle's public key
     To log in, server sends a random challenge string to the user's computer (e.g., browser)
     Browser sends the server's challenge and identity to the USB dongle
     USB dongle signs (challenge, server identity) with its private key
     Server verifies that the signature covers the correct challenge and identity
     Advantage: not susceptible to phishing attacks
     Advantage: no need for per-server setup
     Advantage: server compromise does not allow the adversary to authenticate later
     Disadvantage: needs special software on the user's computer (not just typing in a code)
     Disadvantage: user needs to carry the dongle

U2F protocol
  https://developers.yubico.com/U2F/Protocol_details/Overview.html
  Relies on public-key crypto; two operations in particular
    Sign(Kpriv, m) -> signature sig
    Verify(Kpub, m, sig) -> ok?
  State (D = device/dongle, B = browser, S = server, H = key handle):
    D: (H/Origin, Kpub, Kpriv)
      B never sees Kpriv! Even if B is compromised, B cannot steal Kpriv
    S: (H/Origin, Kpub)
    B: S's JavaScript in the browser
  Base protocol:
    S->B: challenge
    B->D: challenge
    D->B: Sign(Kpriv, challenge) -> signature s
    B->S: s
    S: Verify(Kpub, challenge, s) -> ok?
    challenge is a random number, often called a *nonce*
  Why will a replay attack not work?
    Consider one: attacker records s when the victim logs into bank.com
      Later, the attacker visits bank.com and tries to log in as the victim
      S picks a new challenge and sends it to the attacker
      When prompted for the U2F response, the attacker resends the recorded s
      S computes Verify(Kpub, new challenge, s), which fails,
        because D signed the old challenge, not the new one
  The base protocol, however, is vulnerable to a man-in-the-middle (MITM) attack
    MITM relays communication between B and S, including registration
    Example:
      Attacker sets up bank.secure.com, which looks identical to bank.com
      Attacker tricks the victim into visiting bank.secure.com and entering credentials
      Attacker stores them and logs into bank.com
      bank.com sends a challenge
      Attacker forwards it to the victim
      Victim's device signs the challenge and sends it to bank.secure.com
      Attacker relays it to bank.com
      bank.com verifies the signature over the challenge, which checks out
      bank.com logs the attacker in as the victim
  With MITM protection
    S->B: challenge
    B->D: CD = {challenge, origin, TLS channel id}
      origin = Hash(protocol || hostname || port)
    D->B: Signed(CD)
    B->S: Signed(CD), CD
    S: check origin, channel id, and signature
  Does this protect against MITM?
    Different MITM attacks:
      1. MITM masquerades as S, but doesn't have S's certificate (phishing)
      2. MITM has S's certificate
    1. Attacker tricks the user into connecting to bank.secure.com for both registration and login
       Attacker connects to bank.com
       Attacker forwards the challenge to the user's browser
       The user's browser will now forward origin bank.secure.com to the device
         (if the device had been registered at bank.com, the device would reject the request)
       Browser returns the signature, and the attacker forwards it to bank.com
       bank.com receives the response and sees that the signature verifies correctly
       bank.com also checks that the CD returned from B matches its data
       The origins differ, so S doesn't log the attacker in
    2. Attacker has a certificate for S (bank.com)
       Defense above won't work: the attacker's origin is bank.com
         Browser forwards bank.com to the device
       Defense: ChannelID TLS extension value in the challenge
         ChannelID = a public key identifying the client's side of the TLS connection
           (the client proves possession of the matching private key during the handshake)
         The victim's TLS connection ends at the attacker; the attacker's own
           connection to bank.com has a different ChannelID
         So the ChannelID the device signs will not match what the server expects
         The authentication attempt will fail
  Other attacks:
    Attacker compromises the client
    Attacker steals the device
    Attacker supplies a bad device to the user
  Other parts of the protocol:
    Privacy
    Registration
    Integrity
  With privacy for the user
    S->B: challenge, handle
    B->D: keyhandle, CD = {challenge, origin, TLS channel id}
      Key handle contains the origin
    D->B: Signed(CD), if the key handle matches the origin used during registration
    B->S: Signed(CD), CD
    Server must specify the handle, and D looks up the key by handle
      D uses a separate key pair per registration, so with two Gmail accounts
        on one device, Gmail cannot tell that they belong to the same user
  Registration:
    B->S: add key to account, origin
    S: check that this is the correct user
    B->D: GenKey(origin)
      D checks the origin
      D checks that the user is present
    D->B->S: (H, Kpub)
  Integrity of D:
    Attestation key pair for the vendor
    Counter of signature operations (helps S detect a cloned device)
    (But: maybe bugs in the firmware)

Summary
  User authentication is hard
  Passwords are a long-lasting solution
  Strengthen passwords with a password manager and 2FA
  First encounter with crypto:
    Cryptographic hash function
    Sign/Verify with a public key pair

References:
  http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
  http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
  http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/
  http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html
  https://blog.acolyer.org/2017/06/21/the-password-reset-mitm-attack/
  https://tools.ietf.org/id/draft-balfanz-tls-channelid-01.html
  https://developers.yubico.com/U2F/Protocol_details/Overview.html
  https://www.yubico.com/2017/10/infineon-rsa-key-generation-issue/
  https://www.wired.com/story/chrome-yubikey-phishing-webusb/
  http://blog.dustinkirkland.com/2013/10/fingerprints-are-user-names-not.html
  https://www.allthingsauth.com/2018/02/27/sms-the-most-popular-and-lea
  https://www.dongleauth.info/
  https://www.yubico.com/products/manufacturing/
  https://blog.duszynski.eu/phishing-ng-bypassing-2fa-with-modlishka/