Certificates
============

Paper: "SoK: SSL and HTTPS: Revisiting past challenges and evaluating
certificate trust model enhancements" by Clark and van Oorschot, 2013.

Today: Web Certificates
  The basic plan
  Problems that arose
  Evolution to fix security problems

The original context
  Mid-1990s -- commercial web just getting started
  A few people thought online commerce could someday be important
  But would people trust credit card numbers to the Internet?
  The plan: SSL encryption and certificates would make people feel safe.

The starting point: an https URL
  You want to buy something from amazon.com,
    and you need to send them your credit card number.
  You visit https://amazon.com
  "https" tells your browser to use HTTP inside SSL (today, TLS)

Recall SSL (much simplified, and certificate omitted)
  [diagram]
  C is a browser, S is a web server (e.g. amazon.com).
  C --> S: connect
  C <-- S: PK_S
  C --> S: Encrypt(PK_S, freshKey)
  C <-> S: Encrypt(freshKey, data)
  From here on, ordinary HTTP/HTML inside SSL.
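
A toy sketch of the message flow above, assuming textbook RSA with tiny
primes and a SHA-256 XOR keystream as stand-ins for real cryptography.
The helper names (rsa_encrypt, stream_xor) and the sample data are made
up for illustration, and the scheme is insecure; it only shows who sends what.

```python
import hashlib
import secrets

# Toy RSA key pair for server S (tiny primes, insecure, illustration only).
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (Python 3.8+)
pk_s, sk_s = (N, E), (N, D)

def rsa_encrypt(pk, m):
    n, e = pk
    return pow(m, e, n)

def rsa_decrypt(sk, c):
    n, d = sk
    return pow(c, d, n)

def stream_xor(key, data):
    """XOR data with a SHA-256-derived keystream; applying it twice decrypts."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# C <-- S: PK_S                      (client receives pk_s, unauthenticated)
# C --> S: Encrypt(PK_S, freshKey)
fresh_key = secrets.randbelow(N - 2) + 2
wire_key = rsa_encrypt(pk_s, fresh_key)

# S recovers freshKey with its private key.
server_key = rsa_decrypt(sk_s, wire_key)

# C <-> S: Encrypt(freshKey, data)
ciphertext = stream_xor(fresh_key.to_bytes(2, "big"), b"card=4111111111111111")
plaintext = stream_xor(server_key.to_bytes(2, "big"), ciphertext)
```

Nothing in this exchange tells C whose key pk_s is, which is the gap the
next section discusses.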

What does SSL alone buy us? (without certificate)
  A private authenticated connection, which is great.
  With someone -- but whom?
  What if it was the *attacker* who sent PK_S?
    Possible: last week's paper showed that TCP/IP isn't secure.
    So now the attacker can decrypt your credit card number!
  Encryption is worth little if you're not sure who you're talking to.
  Client needs to authenticate the server.

How can the client check that PK_S is really amazon.com's public key?
  We need "key distribution".
  Key distribution is often a killer problem.
  Lots of solution approaches.
    Certificates are one style.
    Athena's Kerberos is another example, quite different.

SSL includes server's certificate to convince client of its public key.

What's in the certificate that amazon.com sends to the browser?
  Your browser will show you -- click on the lock icon.
  subject name -- the server's DNS name, e.g. "amazon.com"
  subject public key
  expiration -- typically a year or two
  issuer ID -- CA name
  issuer's signature over the above

What's a Certificate Authority (CA)?
  An organization trusted to validate DNSname -> public key bindings.
  How to obtain a certificate?
  Amazon generates public/private key pair, keeps private part secret.
  Amazon sends public key and proof of ownership to CA.
  CA validates the proof of ownership, may reject it.
    e.g. CA checks that the request came from an authorized
    employee of Amazon Inc, and that Amazon Inc owns the DNS
    name "amazon.com".
  CA generates a cert, signs it with CA's private key,
    and sends cert to amazon.com.
  amazon.com can use that cert until it expires (a year or two).

What does the browser do with a certificate from SSL?
  Many checks:
    Subject name in cert == name in URL.
    CA name is recognized, and browser knows CA's public key.
      Browser has a list of "root certificates" w/ public keys.
    Cert's signature verifies under the CA's public key.
    Cert has not expired.
    Cert has not been revoked (more on this in a bit).
    Server knows the private key corresponding to cert public key.
  If the checks succeed:
    Display the web page.
    Chrome, Firefox: show a gray lock icon next to the URL.
  If the checks don't succeed:
    Don't display the web page.
    Show the user an error, e.g. "invalid certificate".
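
The checks above can be sketched roughly as follows. This is a toy model,
not how a real browser or X.509 library works: the Cert fields, the
"ToyCA" name, the serial-number revocation set, and the small-modulus
RSA signature are all assumptions made for illustration. The last check
in the list (server knows the private key) happens in the SSL handshake
and is omitted here.

```python
import hashlib
from dataclasses import dataclass, replace

# Toy CA key pair from two Mersenne primes (far too small to be secure).
P, Q, E = 2**61 - 1, 2**31 - 1, 65537
CA_PK = (P * Q, E)
CA_SK = (P * Q, pow(E, -1, (P - 1) * (Q - 1)))

@dataclass(frozen=True)
class Cert:
    subject: str          # DNS name, e.g. "amazon.com"
    subject_pk: tuple     # subject's public key (opaque here)
    not_after: int        # expiration, seconds since the epoch
    issuer: str           # CA name
    serial: int           # used for the revocation check
    signature: int = 0    # issuer's signature over the fields above

def tbs_digest(cert, n):
    """Hash of the signed fields, reduced mod n for the toy signature."""
    fields = (cert.subject, cert.subject_pk, cert.not_after, cert.issuer, cert.serial)
    return int.from_bytes(hashlib.sha256(repr(fields).encode()).digest(), "big") % n

def ca_sign(cert, ca_sk):
    """CA side: sign with the CA's *private* key."""
    n, d = ca_sk
    return replace(cert, signature=pow(tbs_digest(cert, n), d, n))

def validate(cert, url_host, roots, revoked, now):
    """Browser side: return None if all checks pass, else the failed check."""
    if cert.subject != url_host:
        return "name mismatch"
    if cert.issuer not in roots:
        return "unknown CA"
    n, e = roots[cert.issuer]          # verify with the CA's *public* key
    if pow(cert.signature, e, n) != tbs_digest(cert, n):
        return "bad signature"
    if now > cert.not_after:
        return "expired"
    if cert.serial in revoked:
        return "revoked"
    return None

roots = {"ToyCA": CA_PK}
good = ca_sign(Cert("amazon.com", (2773, 17), 2_000_000_000, "ToyCA", 1), CA_SK)
print(validate(good, "amazon.com", roots, set(), now=1_700_000_000))    # None
print(validate(good, "attacker.com", roots, set(), now=1_700_000_000))  # name mismatch
```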

Why not ask the CA directly for the server name's public key?
  I.e., why certificates?
  I.e., why not an on-line Public Key Infrastructure (PKI) service?
  [diagram]
  An on-line PKI service would be slow, susceptible to attack:
    An extra Internet round trip.
    CA's server might be down or slow.
    Heavy load on CA's server.
    Might expose CA signing key to direct Internet attacks.
    CA server could track and record your activities.
  Certificates are neat:
    Web server can prove its own identity; self-contained.
    CA is off the critical path.

How do certificates help against attacks?
  [diagram]
  You type https://amazon.com
  The attacker diverts your packets to the attacker's server
    Attacker intercepts your packets locally, via WiFi or LAN (easy)
    Attacker causes you to use wrong DNS info (hard)
    Attacker modified internet routes, or broke into core routers (harder)
  Attacker's server can produce real amazon.com certificate
    Certificates are public!
    But attacker doesn't know amazon.com's private key
    So won't be able to decrypt the key material sent by client
    So attacker won't be able to decrypt what you send it.
    Successful defense!
  Attacker's server can produce an attacker.com certificate
    But browser will reject -- attacker.com (in cert) != amazon.com (in URL)
    Successful defense!
  What other avenues can the attacker try?
    Trick you into connecting to the wrong DNS name (phishing).
    Trick you into not using HTTPS at all, but rather HTTP.
    Trick a sloppy CA into issuing attacker an amazon.com cert.

What's the positive story for what SSL+certs guarantee?
  i.e. what does the lock icon actually mean?
  The browser displays the URL you're connected to.
  The lock means the CA believes that the server you've connected to
    is the real owner of the host name in the displayed URL.
  If you want amazon.com, and you see a lock, and you
    see the URL hostname is amazon.com, you can feel reasonably
    safe that you're looking at the real amazon.com. It's likely OK
    to type your password for amazon.com.
  What if you don't inspect the lock and URL?
    I.e. you're a typical web user.
  If you start at a web site X, and it's https, and you trust
    X to be benign and careful, and X generates only
    https links, and you click a link without inspecting it,
    then you can be pretty sure the link will take you to
    the site that X wanted you to visit.
    Helps security when navigating within your bank or e-mail site,
      or if you click on a link in a secure bookmark list.
  Otherwise, if you don't look for the lock, or you don't look at the URL's
    host name, or you're not sure what host name you intend to
    connect to, or you ignore error messages,
    then SSL encryption may not be doing you much good.
    e.g. maybe you have an encrypted connection to https://g00gle.com
  The guarantee is not "this page is safe" but the
    much narrower "you are talking to the owner of the DNS
    name in the URL."
  Further complications arise with large-scale web sites.
    What if your web site uses a CDN (e.g., Akamai or CloudFlare)?
    Does the CDN need to have your private key?
    What guarantee is the certificate providing?
  SSL+certificates are powerful but may require educated careful users.

What about revocation?
  Suppose amazon.com domain name is sold to some other company?
    E.g., Facebook purchased fb.com in 2010.
  Or someone steals amazon.com's private key? e.g. via Heartbleed bug.
  The old private key and certificate will still look valid!
    Even if a CA issues a new certificate for amazon.com.
    This is a fundamental weakness of certificates.
  Certificate expiration doesn't help much, since lifetimes are typically a year or two.
  Certificate Revocation List (CRL):
    Amazon asks the CA to revoke its certificate.
    CA maintains a list of revoked certificates -- CRL.
    CA server will tell anyone the CRL (and sign it).
    Browser should fetch and check CRL as part of certificate validation.
  This hasn't worked well in practice; will return to revocation later.

How do browsers learn the CA's public key?
  Browsers need it to verify certificates.
  There are dozens of CAs.
  Your browser (or OS) has a list of globally trusted CAs' public keys.
    "Root certificates"

The web certificate system has worked pretty well overall.
  But it has run into problems.
  Some are bugs.
  Some seem like fundamental puzzles.

Bad news: users don't understand the security model.
  Users often don't:
    ... know when to expect/demand a lock icon.
    ... look at lock or URL.
    ... know what DNS host name to expect.
    Thus the success of phishing, e.g. www.g00gle.com
  User concerns often don't match SSL+certificates guarantees.
    "Am I about to give my credit card info to an honest merchant?"
    "Can I trust this web site?"
    In such cases, the lock icon may be *decreasing* security.
  Browsers have often encouraged users to make mistakes.
    "This certificate is not valid, do you want to proceed anyway?"
    Browsers are getting better at not even asking.

EV (Extended Validation) certificates.
  An attempt to certify trustworthiness, not just domain name ownership.
  CA only grants EV certificate to a legitimate representative
    of a reputable business.
  Certificate contains company name and "EV" flag.
  Browsers check EV flag and show a green box next to URL.
    Green box shows the company name, not just the URL.
  Has not been very effective.
    Users don't seem to behave differently with/without EV.
      Difference probably not obvious to users.
    They don't know when to expect the green box.
  Also, EV certificates are expensive.

DV (Domain Validated) certificates are much more common.
  CA validation uses simple technical check for domain ownership.
  E.g. CA asks requester to put a nonce in a file on the server.
    Then the CA retrieves that file, checks nonce.
  The point: only the server owner could create such a file.
  Or: ask requester to create certain DNS records under amazon.com
  The check is often automated; fast, convenient, cheap
  Example DV CA: Let's Encrypt
    Easy to use; free; you don't have to be a real company.
  DV certificates have only low-level DNS ownership guarantee.
    "CA validated that owner of certificate owns amazon.com"
    But the guarantee closely matches what CAs can be expected to
      validate in real life, which is good.
  Standard protocol for validating domain ownership: ACME.
    [ Ref: https://tools.ietf.org/html/rfc8555#section-8.3 ]
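
A minimal simulation of the nonce-in-a-file check, in the spirit of the
ACME HTTP challenge but much simplified: real ACME also binds the token
to the requester's account key. ToyWeb, ToyCA, and the challenge path are
hypothetical names, and the "web" is just an in-memory dictionary.

```python
import secrets

CHALLENGE_PATH = "/.well-known/toy-challenge"   # hypothetical path

class ToyWeb:
    """Stand-in for the Internet: maps (hostname, path) to file contents."""
    def __init__(self):
        self.files = {}

    def put(self, host, path, body):
        self.files[(host, path)] = body

    def get(self, host, path):
        return self.files.get((host, path))

class ToyCA:
    def __init__(self, web):
        self.web = web
        self.pending = {}   # hostname -> nonce the requester must publish

    def new_challenge(self, host):
        """Step 1: CA hands the requester a fresh random nonce."""
        nonce = secrets.token_urlsafe(16)
        self.pending[host] = nonce
        return nonce

    def check_challenge(self, host):
        """Step 2: CA fetches the file from the named host and compares."""
        nonce = self.pending.pop(host, None)
        if nonce is None:
            return False
        return self.web.get(host, CHALLENGE_PATH) == nonce

web = ToyWeb()
ca = ToyCA(web)

# Legitimate owner controls amazon.com's server, so it can publish the nonce.
nonce = ca.new_challenge("amazon.com")
web.put("amazon.com", CHALLENGE_PATH, nonce)
owner_ok = ca.check_challenge("amazon.com")

# Attacker can only write files on its own server, so the check fails.
nonce = ca.new_challenge("amazon.com")
web.put("attacker.com", CHALLENGE_PATH, nonce)
attacker_ok = ca.check_challenge("amazon.com")
```

The check is only as strong as the CA's view of the network: if the CA's
fetch can be diverted to the attacker's server, the attacker passes.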

DV validation depends on DNS and IP routing working correctly
  But if DNS and IP routing work, what's the point of certificates?
    SSL and certs are intended to defend against attackers
    who *are* able to subvert DNS and IP!
  DV certificates do defend against local WiFi/LAN network tricks.
  Good DV CAs access DNS and web site from multiple vantage points.
    So attacker would have to subvert core IP routing and DNS servers.
    [ Ref: https://letsencrypt.org/2020/02/19/multi-perspective-validation.html ]
  Ultimately there's no 100% verifiable proof of identity or
    domain ownership, so sadly we have to expect potential problems.
  If the result is two certs (legitimate and attacker's),
    there are recent mechanisms to detect.
    Pinning and Certificate Transparency, below.
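
A sketch of the multiple-vantage-point idea. The rule used here (the
CA's primary fetch must succeed, plus a minimum number of remote
fetches) and the default threshold are assumptions for illustration, not
a statement of any particular CA's policy.

```python
def multi_perspective_ok(primary_view, remote_views, expected, min_remote=2):
    """Accept the challenge only if the primary fetch and at least
    min_remote of the remote fetches returned the expected nonce.
    A view is whatever that vantage point fetched (None on failure)."""
    if primary_view != expected:
        return False
    agreeing = sum(1 for view in remote_views if view == expected)
    return agreeing >= min_remote

nonce = "n0nce"

# Real owner published the nonce: every vantage point sees it.
owner = multi_perspective_ok(nonce, [nonce, nonce, nonce], nonce)

# Attacker hijacks routes near the primary only: remote fetches reach the
# real server, which does not have the nonce.
hijack = multi_perspective_ok(nonce, [None, None, None], nonce)
```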

Bad news: some CAs are not trustworthy
  Your browser trusts dozens (hundreds?) of CAs.
    Why so many? No one entity is globally trusted; competition.
  Each of these CAs can generate a valid certificate for any name!
  E.g. the real amazon.com may have a certificate from CA #1
    But CA #2 can still generate a *different* cert for amazon.com.
    The second certificate is just as valid as the first.
    Behavior is desirable if amazon.com decides to switch CAs.
  What if a CA is sloppy about validating ownership of domain names?
    So attacker can trick it?
  What if attacker steals a CA's private key?
  What if attacker can bribe or blackmail CA or its employees?

In 1996 these CA risks seemed far-fetched
  In 2011, two CAs issued malicious certificates (paper has references).
    E.g. for google.com.
  Some of the certs seem to have been used to intercept gmail traffic.
  It took a while for the bad certificates to emerge.
  Offline nature of certificates means there might be
    bad certificates out there, and even in use,
    but detecting them is hard unless they're
    widely used.
  These attacks focused attention on defenses against fake certs.

Why is it bad if a CA issues a certificate for gmail.com to me?
  E.g. if I could get a cert, properly signed by a CA,
    for subject name "gmail.com", with a public key whose
    private key I know.
  [diagram]
  If I can arrange to intercept/redirect IP traffic,
    I can pretend to be gmail.com,
    and the browser will believe it,
    and display a lock &c.
  But so what -- user will then see my site, not gmail, right?
  I can then connect to the real gmail.com to get the correct content,
    and display it to the victim users. And collect their passwords,
    forward them to the real gmail, transfer the inbox back &c.
    I get to see all this as plain text, i.e. passwords and e-mail.
    The user sees a correct gmail session.
  This is a "Man-in-the-Middle" attack.

How to cope with fake certs from corrupt or sloppy CAs?
  There are a bunch of solutions, proposed and deployed.

"Key pinning" approaches detect key changes.
  I.e. detect new bogus cert b/c different key than older cert.
  Simplest pinning: Chrome has list of CAs who can issue for google.com.
    This is what caught the DigiNotar compromise of 2011.
    But it's not general.
  Pinning via browser history
    Browser remembers each host's public key the first time you visit.
    Warns you if a host's key changes in the future.
    Would detect a bogus cert if you had previously used the site.
    The paper calls this a Trust On First Use (TOFU) scheme.
    But what if site legitimately switches to a new key?
  Pinning managed by web host server
    HPKP, via Public-Key-Pins HTTP header
    Server specifies its public key, or CA's public key,
      and a period of time the pin should last.
    Pretty good but painful if you make a mistake.
  Pinning via DNS
    DANE stores the public key in DNS directly, no CA needed.
      Or the identity of the CA allowed to issue certs for a domain.
      So a different corrupt CA can't issue bogus cert.
    DANE depends on DNSSEC.
    But DNSSEC not widely used, and most browsers don't support DANE.
    Today's Q: situation where DANE is more effective than pinning
      via client history?
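
A minimal sketch of pinning via browser history (TOFU). PinStore and its
return strings are hypothetical; a real browser would also have to
decide what to do when a site legitimately rotates its key.

```python
import hashlib

class PinStore:
    """Trust On First Use: remember the first key seen for each host."""
    def __init__(self):
        self.pins = {}   # host -> SHA-256 fingerprint of the public key bytes

    def check(self, host, public_key_bytes):
        fingerprint = hashlib.sha256(public_key_bytes).hexdigest()
        pinned = self.pins.get(host)
        if pinned is None:
            # First visit: nothing to compare against, so just remember it.
            self.pins[host] = fingerprint
            return "first-use"
        return "match" if pinned == fingerprint else "key-changed"

store = PinStore()
print(store.check("gmail.com", b"real gmail key"))    # first-use
print(store.check("gmail.com", b"real gmail key"))    # match
print(store.check("gmail.com", b"bogus cert key"))    # key-changed
```

The first visit is accepted blindly, so this only helps if you had
previously used the site over an unattacked connection.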

Certificate Transparency (CT) is a different approach
  A public log of all certificates
    Multiple copies, run by different organizations
  CAs must register new certs in the log
  Browsers ask log servers if certificate is in the log
    Reject the certificate if not
  Certificate owners check log for bogus certs for their names
  The browser checks force rogue CAs to reveal bogus certs!
    They can still issue them.
    But the real CA or cert owner will notice in the log.
  Less error-prone for server operators than HPKP pinning
  Chrome requires Certificate Transparency log entry for certificates
    [ Ref: https://github.com/chromium/ct-policy/blob/master/ct_policy.md ]
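
A toy model of the CT logic described above: browsers reject unlogged
certs, and domain owners scan the log for certs they did not request.
This is a sketch under heavy simplification. Real CT logs are Merkle
trees, and browsers typically check signed promises of inclusion carried
with the certificate rather than querying a log on every connection. The
class and function names are made up.

```python
class CTLog:
    """Append-only public list of (subject, issuer, fingerprint) entries."""
    def __init__(self):
        self._entries = []

    def append(self, subject, issuer, fingerprint):
        self._entries.append((subject, issuer, fingerprint))

    def contains(self, fingerprint):
        return any(fp == fingerprint for _, _, fp in self._entries)

    def entries_for(self, subject):
        return [entry for entry in self._entries if entry[0] == subject]

def browser_accepts(log, fingerprint):
    """Browser side: refuse any certificate that is not in the log."""
    return log.contains(fingerprint)

def monitor(log, subject, my_fingerprints):
    """Owner side: logged certs for my name that I did not request."""
    return [e for e in log.entries_for(subject) if e[2] not in my_fingerprints]

log = CTLog()
log.append("google.com", "GoodCA", "fp-real")

# A rogue CA that keeps its bogus cert out of the log gets it rejected.
unlogged_ok = browser_accepts(log, "fp-rogue")

# To make the cert usable the rogue CA must log it, which exposes it.
log.append("google.com", "RogueCA", "fp-rogue")
logged_ok = browser_accepts(log, "fp-rogue")
suspicious = monitor(log, "google.com", {"fp-real"})
```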

What if pinning or CT detects a rogue certificate?
  It should be revoked.
  But original CRL scheme doesn't work well enough.
    CA revocation servers are often not reliable.
    Browsers tend to accept cert if can't contact CRL server.
    Browser fetches complete list, which could be big.
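
The "accept if can't contact CRL server" behavior is often called
soft-fail. A small sketch of why it undermines revocation, assuming a
hypothetical fetch_crl callable that returns the set of revoked serial
numbers or raises OSError when the server is unreachable:

```python
def check_crl(serial, fetch_crl, hard_fail=False):
    """Return True if the cert should be accepted.
    fetch_crl() returns a set of revoked serials or raises OSError."""
    try:
        revoked = fetch_crl()
    except OSError:
        # Soft-fail: treat "can't reach the CRL server" as "not revoked".
        return not hard_fail
    return serial not in revoked

def reachable_crl():
    return {7, 42}            # serial 7 has been revoked

def blocked_crl():
    # An attacker on the network path can simply drop the CRL request.
    raise OSError("CRL server timed out")

print(check_crl(7, reachable_crl))                  # False
print(check_crl(7, blocked_crl))                    # True
print(check_crl(7, blocked_crl, hard_fail=True))    # False
```

Hard-fail closes the hole but makes every site depend on the CA's
revocation server being up.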

Improved revocation schemes.
  Browser vendors push blacklist updates after major breaches.
    [ Ref: https://www.imperialviolet.org/2012/02/05/crlsets.html ]
  OCSP: online certificate status protocol.
    Query whether a single certificate is valid (not the whole list).
    Still too slow and unreliable!
      If browser can't contact server, what should it do?
    And OCSP is a privacy leak.
    No longer widely implemented, in favor of...
  OCSP stapling
    OCSP responses are signed by CA.
    Server sends OCSP response in handshake instead of querying online
    Effectively a short-lived certificate.
    How does the browser know to expect a stapled OCSP response?
      Certificates can carry a "must staple" flag.
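
A sketch of stapling as "effectively a short-lived certificate": the CA
signs (serial, status, valid_until), the server passes it along, and the
browser checks it offline. The response layout, function names, and toy
small-modulus RSA signature are assumptions for illustration and do not
follow the real OCSP encoding.

```python
import hashlib

# Toy CA key pair from two Mersenne primes (far too small to be secure).
P, Q, E = 2**61 - 1, 2**31 - 1, 65537
CA_N = P * Q
CA_D = pow(E, -1, (P - 1) * (Q - 1))

def _digest(serial, status, valid_until):
    msg = f"{serial}|{status}|{valid_until}".encode()
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % CA_N

def ca_ocsp_response(serial, status, valid_until):
    """CA side: sign the status with the CA private key."""
    sig = pow(_digest(serial, status, valid_until), CA_D, CA_N)
    return {"serial": serial, "status": status,
            "valid_until": valid_until, "sig": sig}

def browser_accepts(cert_serial, must_staple, stapled, now):
    """Browser side: no network request, only a signature and freshness check."""
    if stapled is None:
        return not must_staple     # missing staple is fatal only with must-staple
    expected = _digest(stapled["serial"], stapled["status"], stapled["valid_until"])
    if pow(stapled["sig"], E, CA_N) != expected:
        return False               # not signed by the CA, or tampered with
    return (stapled["serial"] == cert_serial
            and stapled["status"] == "good"
            and now <= stapled["valid_until"])

staple = ca_ocsp_response(serial=1, status="good", valid_until=1000)
fresh = browser_accepts(1, True, staple, now=900)
stale = browser_accepts(1, True, staple, now=2000)
stripped = browser_accepts(1, True, None, now=900)
```

An attacker holding a stolen key can replay an old "good" response only
until valid_until, and with must-staple cannot simply omit it.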

Summary
  SSL+certificates are a big win
    Greatly reduce danger from snooping and DNS manipulation
  But getting web certificates right is tricky
    There's no clearly defined ground truth to certify
    Hard to enlist user participation in security
    Can't expect CAs to validate more than DNS name ownership
    CAs are not in practice 100% trustworthy
    Infinite CA scope reduces security to that of worst CA
    Revocation is critical but hard to get right

Other references:
  http://www.imperialviolet.org/2012/07/19/hope9talk.html