User authentication
===================

Plan:
  Passwords 
  Criteria (U, D, S)
  Schemes 
     Show that passwords are good!

Intro
  Last 2 lectures about buffer overrun
    One particular attack, important for lab 1
  This lectures and following about broader problems and techniques
  Today: user authentication
    An import problem
     - Underpinning of many security policies
     - Interesting technical issues
     - Easy to do wrong
    Continues to be a challenging, because security isn't just a technical problem
       Users pick bad passwords
       But, passwords have other redeeming properties (easy to use, deployable)
       Less technical, more qualitative

Authentication: who is the user?
  Challenging to know for sure
    User registers some secret --- but who registers it?
    At the scale of MIT, we might be check identity of user when registering
  Typically settle for weaker guarantee
    Establish that the user who logs has the secret when registering
    If so, then assume it is the same user
    But, we no guarantee that we know the true identity of the user
    For many usages that is fine
      E.g., Amazon doesn't really care who you really are as long as you pay

Problem: how to authenticate users?
  Setting: user <-> computer <-> verifier server.
  Potential extra components might help authentication:
    A trusted third party.
    User's portable device (either dedicated or app in mobile phone).
    A proxy server.
  This paper proposes a number of criteria to evaluate authentication schemes.
  Proposed criteria are reasonable; sometimes non-orthogonal, and not complete.
  Useful as a starting point to think about a new authentication scheme.

Starting point: passwords. 
  Need some secret between user and verifier 
     call this set of bits a "password"
  User types in username and password.
  Server checks whether password is correct for that username.
  Passwords is a valuable secret so want to avoid repetitive use and exposure
     Just for user authentication
     Once authenticated, use crypto keys between server/clients
        Client certificates, cookies, etc.
     Even for user authentication, corner passwords by composing them with other ideas
        Password manager, single-sign on, two-factor, etc.
	Progressive authentication
	Biometric (e.g., apple's fingerprint button)
        Be careful in combining!

How to store passwords?
  Server must be able to verify passwords.
  Naive plan: store plaintext passwords.
    Problem: if adversary compromises server, gets full list of passwords.
  Hashing: store a table of (username, H(password).
    Can still check a password: hash the supplied string, compare with table.
    Hard for adversary to invert the hash function.
    Problem: password space is quite small.
      Top 5000 password values account for 20% of users.
      Skewed distribution towards common passwords chosen by many users.
      Yahoo password study: rule-of-thumb passwords are 10-20 bits of entropy.
        roughly password is equivalent to 10 random bits
        attacker needs try 2^10 combinations to find password
    Problem: hash functions optimized for performance -- helps adversary here.
      E.g., my laptop can do ~2M SHA1's (of small blocks) per second.
      Even with reasonable password (20 bits entropy), crack one account/second.
  Expensive key-derivation function (e.g., PBKDF2 or BCrypt).
    Replace the hash with a much more expensive hash function.
    Key-derivation functions have adjustable cost: make it arbitrarily slow.
      E.g., can make hash cost be 1 second -- O(1M) times slower than SHA1.
    Internally, often performs repeated hashing using a slow hash.
    Problem: adversary can build "rainbow tables".
      Table of password-to-hash mappings.
      Expensive to compute, but helps efficiently invert hashes afterwards.
      Only need to build this rainbow table for dictionary of common passwords.
    Roughly: 1-second expensive hash -> 1M seconds = 10 days to hash common pws.
      After that, can very quickly crack common passwords in any password db.
  Better: use salting.
    Input some additional randomness into the password hash: H(salt, pw).
    Where does the salt value come from?  Stored on server in plaintext.
    Why is this better if adversary compromises the salt too?
      Cannot build rainbow tables.
    Choose a long random salt.
    Choose a fresh salt each time user changes password.

How to transmit passwords?
  Poor idea: sending password to the server in cleartext.
  Slightly better: send password over encrypted connection.
  Why is this bad?
    Connection may be intercepted.
    Shared passwords mean that one server can use password on another server.
  Strawman alternative: send hash of password, instead of the password.
    Not so great: hash becomes a "password equivalent", can still be resent.
  Better alternative: challenge-response scheme.
    User and server both know password.
    Server sends challenge R.
    User responds with H(R || password).
    Server checks if response is H(R || password).
    Server convinced user knows password (modulo MITM attacks), if it knew it.
    Server does not learn password if it didn't already know it.
    How to prevent server from brute-force guessing password based on H() value?
      Expensive hash + salting.
      Allow client to choose some randomness too: guard against rainbow tables.
    To avoid storing the real password on the server, use protocol like SRP.
      http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol
      High-level idea: Given a security parameter
         g, the client computes this . . .
              v = g^(hash(salt, password))
         . . . and sends v and the salt to the server.
	 The client and the server can then establish an ephemeral key using g and v
           Difficult for the attacker to   perform discrete logarithms modulo N;
    Implementing challenge/response often means changing the client and the server.

Defend against guessing
  Guessing attacks are a problem because of small key space.
    To get a sense try Telepathwords (https://telepathwords.research.microsoft.com/)
      As you type in a potential password letter, tries to guess the next letter
        -Common passwords (e.g., via leaks of password databases)
        -Popular phrases from web sites
	-Common user biases in selecting characters
         E.g., using adjacent keys for adjacent password characters
  Kerberos v4, and v5 without preauth: vulnerable to offline guessing
    [http://www.gnu.org/software/shishi/wu99realworld.pdf]
    * Anyone could ask the KDC for a ticket encrypted with the user's password, i.e.,
     the KDC did not authenticate requests (although the response would be encrypted with K_c).
    * Attacker can try brute-force to guess the user's password
     This is easy to parallelize.
    * The ticket-granting-ticket has a known format
      The attacker can determine when a decryption is successful.
    * In Kerberos v5, ticket requestor must include { timestamp }_{K_c} along with
      request, to prove knowledge of K_c.
  Rate-limiting authentication attempts is important.
    implement time-out periods after too many incorrect guesses.
  What to do after many failed authentication attempts?

What matters in user's password choice?
  Many sites impose certain requirements on passwords (e.g., length, chars).
  In reality, what matters is entropy.
  Format requirements rarely translate into higher entropy.
  Defeats only the simplest dictionary attacks.
  Also has an unfortunate side-effect of complicating password generation.
    E.g., no single password-gen algorithm satisfies every possible web site.
    Conflicting length, symbol rules.
  Password distribution "key spaces" are quite small in practice [above].

Password recovery.
  Important part of the overall security story.
  Recall story with Sarah Palin's email account, etc.
  Think of this as yet another authentication mechanism.
  Composing authentication mechanisms is tricky: are both or either required?
  Recovery mechanisms are typically "either".
  Sometimes composing "both" is a good idea: token/paper + password/PIN, etc.

In the reading for today, the authors propose
a bunch of factors that can be used to evaluate
authentication schemes (the goal is to determine
whether passwords are as bad as they seem). The
authors consider three high-level metrics:
usability, deployability, and security.
   -Usability: How easy is it for users to
    interact with the authentication scheme?
        *Easy-to-Learn: "Users who don't know
         the scheme can figure it out and learn
         it without too much trouble."
              -This is a key reason why password
               schemes are so popular!
        *Infrequent errors: "The task that users
         must perform to log in usually succeeds
         when performed by a legitimate and
         honest user."
              -This is an important reason why
               users pick easy-to-guess passwords.
        *Scalable-for-Users: "Using the scheme
         for hundreds of accounts does not
         increase the burden on the user."
              -. . . explains why people often
               reuse passwords or create a simple
               per-site uniquifying scheme for a
               base password.
        *Easy recovery from loss of the
         authentication token
              -A win for passwords--they're easy
               to reset.
        *Nothing to carry
              -Another win for passwords.
   -Deployability: How easy is it to incorporate
    the authentication method into real systems?
        *Server-Compatible: "At the verifier's end,
         the scheme is compatible with text-based
         passwords. Providers don't have to change
         their existing authentication setup to
         support the scheme."
        *Browser-Compatible: "Users don't have
         to change their client to support the
         scheme . . . schemes fail to provide
         this benefit if they require the
         installation of plugins or any kind of
         software whose installation requires
         administrative privileges."
        *Accessible: "Users who can use passwords
         are not prevented from using the scheme
         by disabilities or other physical (not
         cognitive) conditions."

        *Deployability is extremely difficult:
         it's difficult to get users or servers
         to update en masse!
        *Passwords do well in this category by
         default, since the authors define
         "deployability" as how well a system
         integrates with current password
         infrastructure. However, passwords
         don't do very well in the next
         category . . .
   -Security: What kinds of attacks can the
    authentication scheme prevent?
        *Resilient-to-Physical-Observation: "An
         attacker cannot impersonate a user after
         observing them authenticate one or more
         times. We grant Quasi-Resilient-to-
         Physical-Observation if the scheme could
         be broken only by repeating the observation
         more than, say, 10--20 times. Attacks
         include shoulder surfing, filming the
         keyboard, recording keystroke sounds, or
         thermal imaging of keypad."
                 -Passwords fail this test, since,
                  e.g., they can be captured by
                  filming the keyboard or recording
                  keystroke sounds.
        *Resilient-to-Targeted-Impersonation: "It
         is not possible for an acquanitance (or
         skilled investigator) to impersonate a
         specific user by exploiting knowledge of
         personal details (birth date, names of
         relatives etc.). Personal knowledge
         questions are the canonical scheme that
         fails on this point."
                 -The authors say that passwords
                  are "quasi-resistant" b/c they
                  couldn't find any studies saying
                  that your friends or acquaintances
                  can easily guess your password.
        *Resilient-to-Throttled-Guessing: "An
         attacker whose rate of guessing is
         constrained by the verifier cannot
         successfully guess the secrets of a
         significant fraction of users . . .
         Lack of this benefit is meant to
         penalize schemes in which it is frequent
         for user-chosen secrets to be selected
         from a small and well-known subset."
                 -Passwords fail because they
                  have low entropy + skewed
                  distributions.
        *Resilient-to-Unthrottled-Guessing: "An
         attacker whose rate of guessing is
         constrained only by available computing
         resources cannot successfully guess the
         secrets of a significant fraction of users.
         We might for example grant this benefit if
         an attacker capable of attempting up to 2^40
         or even 2^64 guesses per account could
         still only reach fewer than 1% of accounts.
         Lack of this benefit is meant to penalize
         schemes where the space of credentials is
         not large enough to withstand brute force
         search from a small and well-known subset."
                 -Passwords fail because they
                  have low entropy + skewed
                  distributions.
        *Resilient-to-Internal-Observation: "An
         attacker cannot impersonate a user by
         intercepting the user's input from inside
         the user's device (e.g., by keylogging
         malware) or eavesdropping on the cleartext
         communication between prover and verifier
         (we assume that the attacker can also
         defeat TLS if it is used, perhaps through
         the CA) . . . This penalizes schemes that
         are not replay-resistant, whether because
         they send a static response or because
         their dynamic response countermeasure can
	 be cracked with a few observations. This
         benefit assumes that general-purpose devices
         like software-updatable personal computers
         and mobile phones may contain malware,
         but that hardware devices dedicated
         exclusively to the scheme can be made
         malware-free."
                 -Passwords fail because they
                  are static tokens: once you
                  have one, you can use it until
                  it expires or is revoked.
        *Resilient-to-Phishing: "An attacker who
         simulates a valid verifier (including
         by DNS manipulation) cannot collect
         credentials that can later be used to
         impersonate the user to the actual verifier.
         This penalizes schemes allowing phishers
         to get victims to authenticate to look-alike
         sites and later use the harvested
         credentials against the genuine sites."
                 -Passwords fail: phishing attacks
                  are very common!
        *No-Trusted-Third-Party: "The scheme does
         not rely on a trusted third party (other
         than the prover and the verifier) who
         could, upon being attacked or otherwise
         becoming untrustworthy, compromise the
         prover's security or privacy."
              -This property makes an important
               point: a lot of authentication
               problems would become easier if
               we could just trust one party
               to store passwords, run the password
               servers, etc. However, single
               points of failure are bad, since
               attackers can focus all of their
               energy on that point!
        *Resilient-to-Leaks-from-Other-Verifiers:
         "Nothing that a verifier could possibly
         leak can help an attacker impersonate the
         user to another verifier. This penalizes
         schemes where insider fraud at one
         provider, or a successful attack on one
         back-end, endangers the user's accounts
         at other sites."
              -This property is related to
               No-Trusted-Third-Party. To avoid
               a central point of failure, we'd
               like to introduced some notion
               of distributed authentication:
               however, does this mean that the
               system is only as strong as its
               weakest link? [Think back to HTTPS,
               and how a bad certificate authority
               can convince a browser to accept
               fake certificates for arbitrary
               sites. Security depends on the
               strength of the least secure CA!]
              -Authors say that passwords fail
               because people often reuse passwords
               across sites.

Biometrics: Leverage the unique aspects of a
person's physical appearance or behavior.
   -How big is the keyspace?
        Fingerprints: ~13.3 bits.
        Iris scan: ~19.9 bits.
        Voice recognition: ~11.7 bits.
    So, bits of entropy are roughly the same
    as passwords.
    
                        Passwords        Biometrics
    Easy-to-learn:      Yes              Yes
    Infrequent errors:  Quasi-yes        No
    Scalable for users: No               Yes
    Easy recovery:      Yes              No
    Nothing to carry:   Yes              Yes
                             3.5 vs 3    

                        Passwords        Biometrics
   Server-compatible:   Yes              No
   Browser-compatible:  Yes              No
   Accessible:          Yes              Quasi-yes (entering biometrics is error-prone)    
                             3 vs 0.5
    
                        Passwords        Biometrics
Res-to-Phys-Obs:        No               Yes
Res-to-Trgtd-Imp:       Quasi-yes        No (e.g., replaying voice recording, lifting fingerprints from surfaces)
Res-to-Thrtld-Guess:    No               Yes
Res-to-UnThrtld-Guess:  No               No (key space isn't much bigger than that of passwords)
Res-to-Internal-Obv:    No               No (captured biometric data can be replayed)
Res-to-Phishing:        No               No
No-trusted-3rd-Party:   Yes              Yes
Res-Other-Ver-Leaks:    No               No (same biometrics are used by all verifiers)
                            1.5 vs 3

So, final score is 8 vs 6.5. Of course,
one could assign non-unity weights to
each category, but the point is that it's
not obvious that biometrics are "better"
than passwords!

Some sets of goals seem difficult to achieve
at the same time.
   Memorywise-Effortless + Nothing-to-Carry.
   Memorywise-Effortless + Resilient-to-Theft.
        //Either the user remembers something, or
        //it can be stolen (except for biometrics).

   Server-Compatible + Resilient-to-Internal-Observation.
   Server-Compatible + Resilient-to-Leaks-from-Other-Verifiers.
        //Server compatible means sending a password.
        //Passwords can be stolen on user machine,
        //replayed by one server to another.

Multi-factor authentication (MFA): defense using depth.
   -Requires users to authenticate themselves using two or more authentication mechanisms.
   -The mechanisms should involve different modalities!
        *Something you know (e.g., a password)
        *Something you possess (e.g., a cellphone,
         a hardware token)
        *Something you are (e.g., biometrics)
   -Idea is that an attacker must steal/subvert
    multiple authentication mechanisms to
    impersonate a user (e.g., attacker might guess
    a password, but lack access to a user's phone).
   -Example: Google's two-factor authentication
    requires a password plus a cellphone which
    can receive authorization codes via text
    message.
   -MFA is a good idea, but empirical studies
    show that if users are given a second
    authentication factor in addition to
    passwords, users pick much weaker passwords!

What are potential answers to the homework
questions? What factors matter?
   -Logging into public Athena machine?
        *Resilient-to-Internal-Observation: easy to
         install malware on machine.
        *Resilient-to-Physical-Observation?
        *MIT IDs could be a good thing to leverage
         (use them as a smartcard).
        *Biometrics? Untrusted terminals, probably
         not a great plan.
   -Accessing Facebook from Internet cafe?
        *Password managers not a good idea here.
        *How sensitive is the data?
              -Might be leveraged to authenticate
               to other sites! [Either "Login with
               Facebook" or by answering personal
               security questions to reset a
               password.]
   -Withdrawing cash from ATM?
        *Security matters highly.
               Resilient-to-Physical-Observation.
               Resilient-to-Theft.
        *Possibly trusted terminal: biometrics
         might be worth considering. [However,
         in practice, bank may not want to
         trust the terminals.]
        *You also might care about authenticating
         individual transactions!
              -Prevent the adversary from using
               stolen credentials for different,
               attacker-chosen operations.
              -Ex: Maybe user can examine balance
               using just a password, but if she
               wants to withdraw money, she uses
               two-factor authentication using
               her phone.
              - This is called "Progressive authentication"

Conclusion of paper: There is no authentication
scheme which clearly dominates passwords! For
example, according to the authors, the CAP
reader has perfect scores on security!
   -The CAP reader was designed by Mastercard
    to protect online banking transactions.
   -Usage:
        1)Put your credit card into the CAP
          reader (which looks like a hand-held
          calculator).
        2)Enter PIN (bypassing keyloggers!).
        3)Reader talks to the card's embedded
          processor, outputs an 8-digit code
          which the user supplies to the
          web site.

                        CAP reader
    Easy-to-learn:      Yes
    Infrequent errors:  Quasi-yes
    Scalable for users: No (users require card+PIN per verifier)
    Easy recovery:      No
    Nothing to carry:   No
                             1.5

                        CAP reader
   Server-compatible:   No
   Browser-compatible:  Yes
   Accessible:          No (blind people can't read 8-digit code)
                              1
    
                        CAP reader
Res-to-Phys-Obs:        Yes\
Res-to-Trgtd-Imp:       Yes \__ One-time codes!
Res-to-Thrtld-Guess:    Yes /
Res-to-UnThrtld-Guess:  Yes/
Res-to-Internal-Obv:    Yes     Dedicated device
Res-to-Phishing:        Yes     One-time codes
No-trusted-3rd-Party:   Yes     Each site is its own verifier
Res-Other-Ver-Leaks:    Yes     One-time codes
                              8
   -So, passwords=8 and CAP reader=10.5. However,
    there are reasons why CAP readers haven't
    taken over the world (see the low usability
    and deployability scores).
   -In practice, deployability and usability are
    often more important than security.
         *Migration costs (coding+debugging effort,
          user training) make developers nervous!
         *The less usable a scheme is, the more
          that users will complain (and try to
          pick easier authentication tokens that
          are more vulnerable to attackers).
   -Some situations may assign different weights
    to different evaluation metrics.
         *Ex: On a military base, the security
          benefits of a hardware-based token might
          outweigh the problems with usability
          and deployability.

What other factors should we worry about in user authentication?  [sec V.B]
  Continuous authentication, instead of session start.
  Migration cost from passwords / incentives for deployment (OpenID).
  Renewing credentials (Kerberos).
  Availability / DoS attacks.

Why aren't these schemes widely used?
  No single answer.
  Convenience of passwords.
  For many scenarios, security isn't important enough to justify switching cost.
    Per-user cost on the server, on the user's end, software changes, etc.
  Limited benefits of some alternative schemes.
  Often hard for an individual user to improve his/her own security.
    Perhaps partially fixed with SSO, where users can choose a better IdP.

References: 
  Full tech report: http://www.cl.cam.ac.uk/techreports/UCAM-CL-TR-817.pdf
  http://www.cl.cam.ac.uk/~jcb82/doc/B12-IEEESP-analyzing_70M_anonymized_passwords.pdf
  http://arstechnica.com/security/2013/10/how-the-bible-and-youtube-are-fueling-the-next-frontier-of-password-cracking/
  http://cynosureprime.blogspot.com/2015/09/how-we-cracked-millions-of-ashley.html