User authentication =================== Plan: Passwords Criteria (U, D, S) Schemes Show that passwords are good! Intro Last 2 lectures about buffer overrun One particular attack, important for lab 1 This lectures and following about broader problems and techniques Today: user authentication An import problem - Underpinning of many security policies - Interesting technical issues - Easy to do wrong Continues to be a challenging, because security isn't just a technical problem Users pick bad passwords But, passwords have other redeeming properties (easy to use, deployable) Less technical, more qualitative Authentication: who is the user? Challenging to know for sure User registers some secret --- but who registers it? At the scale of MIT, we might be check identity of user when registering Typically settle for weaker guarantee Establish that the user who logs has the secret when registering If so, then assume it is the same user But, we no guarantee that we know the true identity of the user For many usages that is fine E.g., Amazon doesn't really care who you really are as long as you pay Problem: how to authenticate users? Setting: user <-> computer <-> verifier server. Potential extra components might help authentication: A trusted third party. User's portable device (either dedicated or app in mobile phone). A proxy server. This paper proposes a number of criteria to evaluate authentication schemes. Proposed criteria are reasonable; sometimes non-orthogonal, and not complete. Useful as a starting point to think about a new authentication scheme. Starting point: passwords. Need some secret between user and verifier call this set of bits a "password" User types in username and password. Server checks whether password is correct for that username. Passwords is a valuable secret so want to avoid repetitive use and exposure Just for user authentication Once authenticated, use crypto keys between server/clients Client certificates, cookies, etc. Even for user authentication, corner passwords by composing them with other ideas Password manager, single-sign on, two-factor, etc. Progressive authentication Biometric (e.g., apple's fingerprint button) Be careful in combining! How to store passwords? Server must be able to verify passwords. Naive plan: store plaintext passwords. Problem: if adversary compromises server, gets full list of passwords. Hashing: store a table of (username, H(password). Can still check a password: hash the supplied string, compare with table. Hard for adversary to invert the hash function. Problem: password space is quite small. Top 5000 password values account for 20% of users. Skewed distribution towards common passwords chosen by many users. Yahoo password study: rule-of-thumb passwords are 10-20 bits of entropy. roughly password is equivalent to 10 random bits attacker needs try 2^10 combinations to find password Problem: hash functions optimized for performance -- helps adversary here. E.g., my laptop can do ~2M SHA1's (of small blocks) per second. Even with reasonable password (20 bits entropy), crack one account/second. Expensive key-derivation function (e.g., PBKDF2 or BCrypt). Replace the hash with a much more expensive hash function. Key-derivation functions have adjustable cost: make it arbitrarily slow. E.g., can make hash cost be 1 second -- O(1M) times slower than SHA1. Internally, often performs repeated hashing using a slow hash. Problem: adversary can build "rainbow tables". Table of password-to-hash mappings. Expensive to compute, but helps efficiently invert hashes afterwards. Only need to build this rainbow table for dictionary of common passwords. Roughly: 1-second expensive hash -> 1M seconds = 10 days to hash common pws. After that, can very quickly crack common passwords in any password db. Better: use salting. Input some additional randomness into the password hash: H(salt, pw). Where does the salt value come from? Stored on server in plaintext. Why is this better if adversary compromises the salt too? Cannot build rainbow tables. Choose a long random salt. Choose a fresh salt each time user changes password. How to transmit passwords? Poor idea: sending password to the server in cleartext. Slightly better: send password over encrypted connection. Why is this bad? Connection may be intercepted. Shared passwords mean that one server can use password on another server. Strawman alternative: send hash of password, instead of the password. Not so great: hash becomes a "password equivalent", can still be resent. Better alternative: challenge-response scheme. User and server both know password. Server sends challenge R. User responds with H(R || password). Server checks if response is H(R || password). Server convinced user knows password (modulo MITM attacks), if it knew it. Server does not learn password if it didn't already know it. How to prevent server from brute-force guessing password based on H() value? Expensive hash + salting. Allow client to choose some randomness too: guard against rainbow tables. To avoid storing the real password on the server, use protocol like SRP. http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol High-level idea: Given a security parameter g, the client computes this . . . v = g^(hash(salt, password)) . . . and sends v and the salt to the server. The client and the server can then establish an ephemeral key using g and v Difficult for the attacker to perform discrete logarithms modulo N; Implementing challenge/response often means changing the client and the server. Defend against guessing Guessing attacks are a problem because of small key space. To get a sense try Telepathwords (https://telepathwords.research.microsoft.com/) As you type in a potential password letter, tries to guess the next letter -Common passwords (e.g., via leaks of password databases) -Popular phrases from web sites -Common user biases in selecting characters E.g., using adjacent keys for adjacent password characters Kerberos v4, and v5 without preauth: vulnerable to offline guessing [http://www.gnu.org/software/shishi/wu99realworld.pdf] * Anyone could ask the KDC for a ticket encrypted with the user's password, i.e., the KDC did not authenticate requests (although the response would be encrypted with K_c). * Attacker can try brute-force to guess the user's password This is easy to parallelize. * The ticket-granting-ticket has a known format The attacker can determine when a decryption is successful. * In Kerberos v5, ticket requestor must include { timestamp }_{K_c} along with request, to prove knowledge of K_c. Rate-limiting authentication attempts is important. implement time-out periods after too many incorrect guesses. What to do after many failed authentication attempts? What matters in user's password choice? Many sites impose certain requirements on passwords (e.g., length, chars). In reality, what matters is entropy. Format requirements rarely translate into higher entropy. Defeats only the simplest dictionary attacks. Also has an unfortunate side-effect of complicating password generation. E.g., no single password-gen algorithm satisfies every possible web site. Conflicting length, symbol rules. Password distribution "key spaces" are quite small in practice [above]. Password recovery. Important part of the overall security story. Recall story with Sarah Palin's email account, etc. Think of this as yet another authentication mechanism. Composing authentication mechanisms is tricky: are both or either required? Recovery mechanisms are typically "either". Sometimes composing "both" is a good idea: token/paper + password/PIN, etc. In the reading for today, the authors propose a bunch of factors that can be used to evaluate authentication schemes (the goal is to determine whether passwords are as bad as they seem). The authors consider three high-level metrics: usability, deployability, and security. -Usability: How easy is it for users to interact with the authentication scheme? *Easy-to-Learn: "Users who don't know the scheme can figure it out and learn it without too much trouble." -This is a key reason why password schemes are so popular! *Infrequent errors: "The task that users must perform to log in usually succeeds when performed by a legitimate and honest user." -This is an important reason why users pick easy-to-guess passwords. *Scalable-for-Users: "Using the scheme for hundreds of accounts does not increase the burden on the user." -. . . explains why people often reuse passwords or create a simple per-site uniquifying scheme for a base password. *Easy recovery from loss of the authentication token -A win for passwords--they're easy to reset. *Nothing to carry -Another win for passwords. -Deployability: How easy is it to incorporate the authentication method into real systems? *Server-Compatible: "At the verifier's end, the scheme is compatible with text-based passwords. Providers don't have to change their existing authentication setup to support the scheme." *Browser-Compatible: "Users don't have to change their client to support the scheme . . . schemes fail to provide this benefit if they require the installation of plugins or any kind of software whose installation requires administrative privileges." *Accessible: "Users who can use passwords are not prevented from using the scheme by disabilities or other physical (not cognitive) conditions." *Deployability is extremely difficult: it's difficult to get users or servers to update en masse! *Passwords do well in this category by default, since the authors define "deployability" as how well a system integrates with current password infrastructure. However, passwords don't do very well in the next category . . . -Security: What kinds of attacks can the authentication scheme prevent? *Resilient-to-Physical-Observation: "An attacker cannot impersonate a user after observing them authenticate one or more times. We grant Quasi-Resilient-to- Physical-Observation if the scheme could be broken only by repeating the observation more than, say, 10--20 times. Attacks include shoulder surfing, filming the keyboard, recording keystroke sounds, or thermal imaging of keypad." -Passwords fail this test, since, e.g., they can be captured by filming the keyboard or recording keystroke sounds. *Resilient-to-Targeted-Impersonation: "It is not possible for an acquanitance (or skilled investigator) to impersonate a specific user by exploiting knowledge of personal details (birth date, names of relatives etc.). Personal knowledge questions are the canonical scheme that fails on this point." -The authors say that passwords are "quasi-resistant" b/c they couldn't find any studies saying that your friends or acquaintances can easily guess your password. *Resilient-to-Throttled-Guessing: "An attacker whose rate of guessing is constrained by the verifier cannot successfully guess the secrets of a significant fraction of users . . . Lack of this benefit is meant to penalize schemes in which it is frequent for user-chosen secrets to be selected from a small and well-known subset." -Passwords fail because they have low entropy + skewed distributions. *Resilient-to-Unthrottled-Guessing: "An attacker whose rate of guessing is constrained only by available computing resources cannot successfully guess the secrets of a significant fraction of users. We might for example grant this benefit if an attacker capable of attempting up to 2^40 or even 2^64 guesses per account could still only reach fewer than 1% of accounts. Lack of this benefit is meant to penalize schemes where the space of credentials is not large enough to withstand brute force search from a small and well-known subset." -Passwords fail because they have low entropy + skewed distributions. *Resilient-to-Internal-Observation: "An attacker cannot impersonate a user by intercepting the user's input from inside the user's device (e.g., by keylogging malware) or eavesdropping on the cleartext communication between prover and verifier (we assume that the attacker can also defeat TLS if it is used, perhaps through the CA) . . . This penalizes schemes that are not replay-resistant, whether because they send a static response or because their dynamic response countermeasure can be cracked with a few observations. This benefit assumes that general-purpose devices like software-updatable personal computers and mobile phones may contain malware, but that hardware devices dedicated exclusively to the scheme can be made malware-free." -Passwords fail because they are static tokens: once you have one, you can use it until it expires or is revoked. *Resilient-to-Phishing: "An attacker who simulates a valid verifier (including by DNS manipulation) cannot collect credentials that can later be used to impersonate the user to the actual verifier. This penalizes schemes allowing phishers to get victims to authenticate to look-alike sites and later use the harvested credentials against the genuine sites." -Passwords fail: phishing attacks are very common! *No-Trusted-Third-Party: "The scheme does not rely on a trusted third party (other than the prover and the verifier) who could, upon being attacked or otherwise becoming untrustworthy, compromise the prover's security or privacy." -This property makes an important point: a lot of authentication problems would become easier if we could just trust one party to store passwords, run the password servers, etc. However, single points of failure are bad, since attackers can focus all of their energy on that point! *Resilient-to-Leaks-from-Other-Verifiers: "Nothing that a verifier could possibly leak can help an attacker impersonate the user to another verifier. This penalizes schemes where insider fraud at one provider, or a successful attack on one back-end, endangers the user's accounts at other sites." -This property is related to No-Trusted-Third-Party. To avoid a central point of failure, we'd like to introduced some notion of distributed authentication: however, does this mean that the system is only as strong as its weakest link? [Think back to HTTPS, and how a bad certificate authority can convince a browser to accept fake certificates for arbitrary sites. Security depends on the strength of the least secure CA!] -Authors say that passwords fail because people often reuse passwords across sites. Biometrics: Leverage the unique aspects of a person's physical appearance or behavior. -How big is the keyspace? Fingerprints: ~13.3 bits. Iris scan: ~19.9 bits. Voice recognition: ~11.7 bits. So, bits of entropy are roughly the same as passwords. Passwords Biometrics Easy-to-learn: Yes Yes Infrequent errors: Quasi-yes No Scalable for users: No Yes Easy recovery: Yes No Nothing to carry: Yes Yes 3.5 vs 3 Passwords Biometrics Server-compatible: Yes No Browser-compatible: Yes No Accessible: Yes Quasi-yes (entering biometrics is error-prone) 3 vs 0.5 Passwords Biometrics Res-to-Phys-Obs: No Yes Res-to-Trgtd-Imp: Quasi-yes No (e.g., replaying voice recording, lifting fingerprints from surfaces) Res-to-Thrtld-Guess: No Yes Res-to-UnThrtld-Guess: No No (key space isn't much bigger than that of passwords) Res-to-Internal-Obv: No No (captured biometric data can be replayed) Res-to-Phishing: No No No-trusted-3rd-Party: Yes Yes Res-Other-Ver-Leaks: No No (same biometrics are used by all verifiers) 1.5 vs 3 So, final score is 8 vs 6.5. Of course, one could assign non-unity weights to each category, but the point is that it's not obvious that biometrics are "better" than passwords! Some sets of goals seem difficult to achieve at the same time. Memorywise-Effortless + Nothing-to-Carry. Memorywise-Effortless + Resilient-to-Theft. //Either the user remembers something, or //it can be stolen (except for biometrics). Server-Compatible + Resilient-to-Internal-Observation. Server-Compatible + Resilient-to-Leaks-from-Other-Verifiers. //Server compatible means sending a password. //Passwords can be stolen on user machine, //replayed by one server to another. Multi-factor authentication (MFA): defense using depth. -Requires users to authenticate themselves using two or more authentication mechanisms. -The mechanisms should involve different modalities! *Something you know (e.g., a password) *Something you possess (e.g., a cellphone, a hardware token) *Something you are (e.g., biometrics) -Idea is that an attacker must steal/subvert multiple authentication mechanisms to impersonate a user (e.g., attacker might guess a password, but lack access to a user's phone). -Example: Google's two-factor authentication requires a password plus a cellphone which can receive authorization codes via text message. -MFA is a good idea, but empirical studies show that if users are given a second authentication factor in addition to passwords, users pick much weaker passwords! What are potential answers to the homework questions? What factors matter? -Logging into public Athena machine? *Resilient-to-Internal-Observation: easy to install malware on machine. *Resilient-to-Physical-Observation? *MIT IDs could be a good thing to leverage (use them as a smartcard). *Biometrics? Untrusted terminals, probably not a great plan. -Accessing Facebook from Internet cafe? *Password managers not a good idea here. *How sensitive is the data? -Might be leveraged to authenticate to other sites! [Either "Login with Facebook" or by answering personal security questions to reset a password.] -Withdrawing cash from ATM? *Security matters highly. Resilient-to-Physical-Observation. Resilient-to-Theft. *Possibly trusted terminal: biometrics might be worth considering. [However, in practice, bank may not want to trust the terminals.] *You also might care about authenticating individual transactions! -Prevent the adversary from using stolen credentials for different, attacker-chosen operations. -Ex: Maybe user can examine balance using just a password, but if she wants to withdraw money, she uses two-factor authentication using her phone. - This is called "Progressive authentication" Conclusion of paper: There is no authentication scheme which clearly dominates passwords! For example, according to the authors, the CAP reader has perfect scores on security! -The CAP reader was designed by Mastercard to protect online banking transactions. -Usage: 1)Put your credit card into the CAP reader (which looks like a hand-held calculator). 2)Enter PIN (bypassing keyloggers!). 3)Reader talks to the card's embedded processor, outputs an 8-digit code which the user supplies to the web site. CAP reader Easy-to-learn: Yes Infrequent errors: Quasi-yes Scalable for users: No (users require card+PIN per verifier) Easy recovery: No Nothing to carry: No 1.5 CAP reader Server-compatible: No Browser-compatible: Yes Accessible: No (blind people can't read 8-digit code) 1 CAP reader Res-to-Phys-Obs: Yes\ Res-to-Trgtd-Imp: Yes \__ One-time codes! Res-to-Thrtld-Guess: Yes / Res-to-UnThrtld-Guess: Yes/ Res-to-Internal-Obv: Yes Dedicated device Res-to-Phishing: Yes One-time codes No-trusted-3rd-Party: Yes Each site is its own verifier Res-Other-Ver-Leaks: Yes One-time codes 8 -So, passwords=8 and CAP reader=10.5. However, there are reasons why CAP readers haven't taken over the world (see the low usability and deployability scores). -In practice, deployability and usability are often more important than security. *Migration costs (coding+debugging effort, user training) make developers nervous! *The less usable a scheme is, the more that users will complain (and try to pick easier authentication tokens that are more vulnerable to attackers). -Some situations may assign different weights to different evaluation metrics. *Ex: On a military base, the security benefits of a hardware-based token might outweigh the problems with usability and deployability. What other factors should we worry about in user authentication? [sec V.B] Continuous authentication, instead of session start. Migration cost from passwords / incentives for deployment (OpenID). Renewing credentials (Kerberos). Availability / DoS attacks. Why aren't these schemes widely used? No single answer. Convenience of passwords. For many scenarios, security isn't important enough to justify switching cost. Per-user cost on the server, on the user's end, software changes, etc. Limited benefits of some alternative schemes. Often hard for an individual user to improve his/her own security. Perhaps partially fixed with SSO, where users can choose a better IdP. References: Full tech report: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/ http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html