Messaging security
==================

Why are we reading this paper?
  Email and messaging security is important in practice.
    Many users expect some degree of security from email and messaging systems.
    Significant activity in this space, both in research and in practice.
  This paper has a good survey of a wide range of messaging schemes.
    Optional paper describes the state of email security.
  Naively seems like a simple problem.
    Just use crypto: sign and encrypt?
  Different meanings of "security" for different users.
  Many interesting techniques that are broadly applicable, not just in messaging.

Example scenario: email security.
                    Sender's org      Recipient's org
                 +---------------+   +---------------+
  Sender -> MUA -|-> MSA -> MTA -|---|-> MTA -> MDA -|-> MUA -> Recipient
                 +---------------+   +---------------+

What security mechanisms are available today?
  SMTP.
  SMTP w/ STARTTLS.
    Not clear what name to expect in certificate.
      RFC 3207 suggests checking for server's name in certificate (end of sec 4.1).
      Server name comes from MX record, which is not the same as recipient's domain.
    Not clear whether to expect a server to support STARTTLS.
      MITM: adversary can pretend that server does not support STARTTLS.
      RFC 3207 suggests outgoing servers might remember who used to support STARTTLS.
      No clear plan here yet.
  SPF: TXT DNS record for sender's domain designates allowed sender servers.
    Goal: make it more difficult for spammers to impersonate sender's address.
  DKIM: message includes signature on selected headers and body parts.
    Key selection via DNS.
    Paper errata: DKIM uses selector._domainkey.example.net (just 1 underscore).
    Requires DNSSEC to provide cryptographic authenticity of keys.
  DMARC: domain's policy tells recipients whether to expect SPF/DKIM to pass, and what to do if not.
    DNS TXT record _dmarc.example.net.
  Authenticated Received Chain (ARC): http://arc-spec.org/
    Sign "Received" headers in an email message.
    Same key discovery algorithm as DKIM.
  DNSSEC can help with all DNS-based key distribution.
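
  The DNS names used by the mechanisms above can be sketched as follows.
  This only builds the query names; it does no DNS lookup or signature
  checking. The function names and the selector "sel1" are made up for
  illustration.

```python
def spf_record_name(sender_domain: str) -> str:
    # SPF policy is a TXT record on the sender's domain itself.
    return sender_domain


def dkim_key_name(selector: str, signing_domain: str) -> str:
    # Selector and domain come from the s= and d= tags of the
    # DKIM-Signature header. Note the single underscore.
    return f"{selector}._domainkey.{signing_domain}"


def dmarc_record_name(sender_domain: str) -> str:
    # DMARC policy is a TXT record under the _dmarc label.
    return f"_dmarc.{sender_domain}"
```

  E.g., dkim_key_name("sel1", "example.net") gives sel1._domainkey.example.net.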

Potential security properties that a user might want.
  Confidentiality: headers, contents.
  Integrity: message not modified.
    Two different classes of attackers: network adversary and server adversary.
    Defending against a network adversary means encrypting in transit.
    Defending against a server adversary means end-to-end encryption (MUA-to-MUA).

  Lifetime: message destroyed after opening, or message cannot be opened
    after some time, or message cannot be opened before some time.

  Authentication: who sent the message?
  Anonymity: recipient cannot tell who sent the message.
  Transport privacy: adversary cannot tell who is sending messages to whom.

  Non-repudiation: recipient can prove sender sent specific message.
  Repudiation/deniability: recipient cannot prove sender sent specific message.

  Proof of submission: sender can prove that message was sent.
  Proof of delivery: sender can know (or prove) that message was received.

  Forward secrecy.
  Backward secrecy.

  Conversation-level properties: ordering, groups, ..

Widely-deployed "email security" achieves relatively few of these properties.
  Difficult to achieve most of these properties in the context of email.

Several "end-to-end" security designs for email.
  PGP.
    Encryption or signing is relatively straightforward:
      Message m.
      To encrypt, generate fresh key K, send E_{PK_recipient}(K) || E_K(m).
      To sign, send m along with Sign_{SK_sender}(H(m)).
    Web of trust.
    PGP key servers to discover keys.
  S/MIME.
    Certificates similar to what's used in TLS.
    Works in enterprise setting, like Kerberos: central cert directory.
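
  A toy sketch of the hybrid encryption step E_{PK_recipient}(K) || E_K(m).
  Assumptions, not PGP's actual algorithms: the public-key step is a
  hashed-ElGamal-style construction over a small toy group, and E_K is an
  XOR with a SHA-256 counter keystream. No integrity protection, and the
  parameters are far too weak for real use.

```python
import hashlib
import secrets

# Toy parameters for illustration only; not secure.
P = 2**127 - 1   # a Mersenne prime
G = 3


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher E_K: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def keygen():
    sk = secrets.randbelow(P - 3) + 2
    return sk, pow(G, sk, P)


def wrap_key(pk: int, k: bytes):
    """Stand-in for E_{PK}(K): mask K with a hash of a DH value."""
    e = secrets.randbelow(P - 3) + 2
    mask = hashlib.sha256(pow(pk, e, P).to_bytes(16, "big")).digest()
    return pow(G, e, P), bytes(a ^ b for a, b in zip(k, mask))


def unwrap_key(sk: int, wrapped) -> bytes:
    c1, masked = wrapped
    mask = hashlib.sha256(pow(c1, sk, P).to_bytes(16, "big")).digest()
    return bytes(a ^ b for a, b in zip(masked, mask))


def encrypt(pk_recipient: int, m: bytes):
    k = secrets.token_bytes(32)          # fresh key K for every message
    return wrap_key(pk_recipient, k), stream_xor(k, m)


def decrypt(sk_recipient: int, ciphertext) -> bytes:
    wrapped, body = ciphertext
    return stream_xor(unwrap_key(sk_recipient, wrapped), body)
```

  Only the short key K goes through the (slow) public-key operation; the
  message itself is encrypted with the symmetric cipher.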

Many reasons why email security is complicated.
  Open system, no account registration, anyone can send email.
    Key establishment is difficult.
    No centralized servers.
  Redirection (e.g., nz@mit.edu forwards to nickolai.zeldovich@gmail.com).
  Mailing lists that forward to many recipients, dynamic membership.
  Intermediate servers might be offline.
  Spam filtering difficult over encrypted email.
  Large existing deployment that requires compatibility.
    E.g., archive of all email messages for legal reasons.
    E.g., scan email messages for social security numbers to prevent leaks.

Messaging systems have been effective at experimenting with new designs.
  Can start with a fresh design, avoid email's existing constraints.
  Don't need to be fully general: email already exists as a fallback.
  Users willing to experiment with new messaging systems.

Main technical issues for secure messaging:
  Establish trust (key exchange).
  Conversation protocol (how to encrypt messages).
  Transport privacy (how to transmit encrypted messages).

Baseline design is like highly centralized email (e.g., Skype, Hangouts, ...).
  Clients talk to single server.
  TLS encryption over the network.
  Server sees plaintext of all messages, fully trusted.

Trust establishment (Table I).

Opportunistic.
  Don't worry about active network adversaries.

TOFU: SSH-style, assume first connection was OK.
  Easy to deploy, some degree of MITM-resistance.
  Can be difficult to change keys (e.g., new device).

Out-of-band exchange: scan QR code on friend's phone.
  Could generalize to a web-of-trust scheme like PGP.

Verify keys after initial exchange.
  Strawman: show a short 32-bit hash, H(PK_A || PK_B), to both users.
    Easy for MITM adversary to search for a PK_Adversary that yields a matching 32-bit hash.
    In effect, about 2^32 offline guesses suffice to find a good PK_Adversary.
  Typical solution: commit to some randomness first, then reveal.
    E.g., A -> B: PK_A, H(R_A)
          B -> A: PK_B, H(R_B)
          A -> B: R_A
          B -> A: R_B
    Now both user devices display H(PK_A || PK_B || R_A || R_B).
    Adversary does not learn R_A / R_B until it has already committed to its own H(R).
    So it gets one online guess (success probability 2^-32), not an offline search.
  How to check if the short hashes are the same?
    Voice authentication, for audio communication.  Paper argues not so strong.
    Compare if sitting next to each other in person.
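
  A minimal sketch of the commit-then-reveal exchange, simulating both
  devices in one process. The Device class, the length-prefixed hashing,
  and the hex display format are assumptions for illustration; the
  message flow follows the four steps above.

```python
import hashlib
import secrets


def H(*parts: bytes) -> bytes:
    """Hash several byte strings, length-prefixed to avoid ambiguity."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(4, "big") + part)
    return h.digest()


class Device:
    def __init__(self, public_key: bytes):
        self.pk = public_key
        self.r = secrets.token_bytes(32)

    def first_message(self):
        # PK, H(R): commit to R before anyone reveals anything.
        return self.pk, H(self.r)

    def reveal(self) -> bytes:
        return self.r

    def short_hash(self, peer_first, peer_r: bytes, i_am_a: bool) -> str:
        peer_pk, peer_commit = peer_first
        if H(peer_r) != peer_commit:
            raise ValueError("peer's R does not match its commitment")
        a, b = (self.pk, self.r), (peer_pk, peer_r)
        if not i_am_a:
            a, b = b, a
        # H(PK_A || PK_B || R_A || R_B), truncated to 32 bits for display.
        return H(a[0], b[0], a[1], b[1])[:4].hex()
```

  Each device refuses to display a hash if the revealed R does not match
  the earlier commitment, which is what prevents the offline search.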

Use an existing secret value (known to only A and B).
  Paper mentions zero-knowledge protocols for this.
  Similar techniques to password authentication (e.g., SRP).
  Need to ensure adversary only gets one chance to guess, as with verification above.

Central server.
  Basically a certificate authority.
  Risk: adversary can compromise server, hand out the adversary's public key.
  One possible solution: auditing / transparency.
    Server publishes a log of all keys it gave out.
    Log structure makes it difficult for server to truncate or fork this log.
    Clients make sure any key they got from a server is in the log.
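
  A simplified sketch of the log idea, using a plain hash chain. This is
  an assumption for illustration: deployed transparency logs typically
  use Merkle trees so that clients need not download every entry. How
  clients learn and compare the published head is also left out.

```python
import hashlib

EMPTY_HEAD = hashlib.sha256(b"empty log").digest()


def entry_bytes(user: str, public_key: bytes) -> bytes:
    return user.encode() + b"\x00" + public_key


def extend(head: bytes, user: str, public_key: bytes) -> bytes:
    """New head commits to the previous head and the new entry."""
    leaf = hashlib.sha256(entry_bytes(user, public_key)).digest()
    return hashlib.sha256(head + leaf).digest()


def log_head(entries) -> bytes:
    head = EMPTY_HEAD
    for user, public_key in entries:
        head = extend(head, user, public_key)
    return head


def client_check(entries, published_head, user, public_key) -> bool:
    """Accept a key only if it is in the log and the log matches the head."""
    return (user, public_key) in entries and log_head(entries) == published_head
```

  If all clients see the same head, the server cannot show a different
  key for "bob" to one client without changing the head for everyone.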

Blockchain.
  Roughly speaking, just the log technique from above, with no central server.
  Benefit: no single trusted server.
  Downside: cannot mirror existing names; only first-come-first-served.

Many combinations of the above possible in practice.

Conversation protocol (Table II).
  Main sources of complication:
    Forward secrecy.
      Compromised keys do not reveal past conversations.
    Backward secrecy.
      Compromised keys do not reveal future conversations (with passive attacker).
      Why is this useful?  Why assume a passive attacker?
      Possible scenario: adversary gets snapshot of device memory at a later time.
        E.g., swap partition, core dump, etc.
      Want to limit exposure due to disclosed keys.
    Deniability.
    Group chat makes everything harder (consistency, deniability, etc).

Forward secrecy.
  Hard to achieve non-interactively (e.g., just in an email setting).
    [ Sort-of possible using IBE; see https://eprint.iacr.org/2003/083.pdf ]
  Similar to strawman we discussed in TLS lecture.
  Recipient generates some temporary public/private key pairs.
    Private key: r
    Public key: g^r
  Publishes public-key parts (g^r) on some server.
  Sender queries server for one of these public keys, uses it to encrypt message.
    Generate a similar key pair for the sender: s, g^s.
    Compute g^(rs) = (g^r)^s, use it to encrypt message.
    Send encrypted message along with g^s.
  Recipient computes g^(rs) = (g^s)^r, decrypts message, immediately deletes corresponding private key r.
  How to authenticate these ephemeral public keys?
    One solution: sign them with user's long-term key.

       A's long-term key              B's long-term key
              |                              |
            SIGN                           SIGN
              |                              |
              V                              V
    A's conversation key <-- DH --> B's conversation key
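
  A toy sketch of the one-time key scheme above. Assumptions: a small
  toy DH group, a SHA-256 keystream as the cipher, and no signatures on
  the ephemeral keys (the standard library has no signature scheme), so
  this shows only the forward-secrecy part, not authentication.

```python
import hashlib
import secrets

# Toy parameters for illustration only; not secure.
P = 2**127 - 1
G = 3


def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def derive_key(dh_value: int) -> bytes:
    return hashlib.sha256(dh_value.to_bytes(16, "big")).digest()


def stream_xor(key: bytes, data: bytes) -> bytes:
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class Recipient:
    def __init__(self, count: int):
        self.prekeys = {}                  # g^r -> r
        for _ in range(count):
            r, g_r = keygen()
            self.prekeys[g_r] = r

    def published(self):
        return list(self.prekeys)          # what the server hands out

    def decrypt(self, g_r: int, g_s: int, ciphertext: bytes) -> bytes:
        # pop() drops r from our state; a real client must also make
        # sure the key is erased from memory and disk.
        r = self.prekeys.pop(g_r)
        return stream_xor(derive_key(pow(g_s, r, P)), ciphertext)


def send(g_r: int, message: bytes):
    s, g_s = keygen()
    return g_s, stream_xor(derive_key(pow(g_r, s, P)), message)
```

  After decrypt() returns, compromising the recipient no longer reveals
  the key for that message, since r is gone.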

Backward secrecy.
  Strawman: keep repeating the forward-secrecy algorithm.
  Users continuously generate new public/private key pairs.
  Send the public key to conversation peers.
  Using new public key ensures new message cannot be decrypted with old state.
  Old state did not contain corresponding private key (was freshly generated).
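
  A toy sketch of this strawman: every send rotates the sender's key
  pair and ships the new public key with the message. The Party class
  and toy DH group are assumptions for illustration, and this is not the
  more efficient ratchet constructions used in practice.

```python
import hashlib
import secrets

# Toy parameters for illustration only; not secure.
P = 2**127 - 1
G = 3


def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def dh_key(priv: int, peer_pub: int) -> bytes:
    return hashlib.sha256(pow(peer_pub, priv, P).to_bytes(16, "big")).digest()


class Party:
    def __init__(self):
        self.priv, self.pub = keygen()
        self.peer_pub = None

    def send(self):
        """Rotate to a fresh key pair; return (new public key, message key)."""
        self.priv, self.pub = keygen()
        return self.pub, dh_key(self.priv, self.peer_pub)

    def receive(self, new_peer_pub: int) -> bytes:
        """Adopt the peer's new public key; return the message key."""
        self.peer_pub = new_peer_pub
        return dh_key(self.priv, new_peer_pub)
```

  If a passive adversary snapshots A's private key, messages sent to A
  stay exposed until A next rotates; after that, the stolen key is
  useless because the new private key was never part of the old state.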

Paper refers to more efficient constructions for achieving forward/backward secrecy.

Deniability.
  Seemingly conflicting goals: authentication and deniability.
    B wants to know he is talking to A.
    A does not want B to have cryptographic proof that A sent a message
      (or even talked to B).
  Main idea: make it possible to manufacture any conversation transcript.
    Do not use signatures, neither on messages nor on the conversation key.
    Use MACs (symmetric-key authentication).
  [ Ref: https://whispersystems.org/blog/simplifying-otr-deniability/ ]
  "3DH" protocol.
    Compute three DH handshakes:
      (A's long-term key) * (B's conversation key)
      (B's long-term key) * (A's conversation key)
      (A's conversation key) * (B's conversation key)
    Then combine the 3 resulting keys into a single key (by hashing together).
    Messages encrypted and authenticated (with a MAC) using that key.
  What if someone wanted to fabricate a conversation between A and B?
    Choose conversation keys (so, private key is known).
    Compute the 3 DH keys (just need A's and B's long-term public keys).
    Compute the resulting key, use it to encrypt and MAC the fabricated message.
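
  A toy sketch of 3DH and of the fabrication argument. Assumptions: a
  small toy DH group, SHA-256 to combine the three values, HMAC-SHA256
  as the MAC; all function names are made up. The point is that the
  forger, who knows no long-term private key, derives the same key.

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only; not secure.
P = 2**127 - 1
G = 3


def keygen():
    priv = secrets.randbelow(P - 3) + 2
    return priv, pow(G, priv, P)


def combine(*dh_values: int) -> bytes:
    h = hashlib.sha256()
    for v in dh_values:
        h.update(v.to_bytes(16, "big"))
    return h.digest()


def session_key_as_a(a_long, a_conv, b_long_pub, b_conv_pub) -> bytes:
    return combine(pow(b_conv_pub, a_long, P),   # A long-term * B conversation
                   pow(b_long_pub, a_conv, P),   # B long-term * A conversation
                   pow(b_conv_pub, a_conv, P))   # A conversation * B conversation


def session_key_as_b(b_long, b_conv, a_long_pub, a_conv_pub) -> bytes:
    return combine(pow(a_long_pub, b_conv, P),
                   pow(a_conv_pub, b_long, P),
                   pow(a_conv_pub, b_conv, P))


def forged_session_key(a_long_pub, b_long_pub, fake_a_conv, fake_b_conv) -> bytes:
    """Forger picks both conversation private keys; needs only public long-term keys."""
    return combine(pow(a_long_pub, fake_b_conv, P),
                   pow(b_long_pub, fake_a_conv, P),
                   pow(pow(G, fake_a_conv, P), fake_b_conv, P))


def mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()
```

  Since a transcript MACed under the forged key is indistinguishable
  from a real one, a valid MAC proves nothing to a third party.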

Transport privacy.
  Sending packets over the Internet reveals communication patterns.
  What are the possible ways to achieve privacy for message transport?

Store messages on a central server (e.g., email, Skype, ..).
  If server is not compromised, some degree of privacy.
  Amenable to traffic analysis if message is immediately relayed to someone else.
  Distributed version: store messages on peer-to-peer nodes.
  Maybe harder to compromise specific peer-to-peer node (or maybe easier?).

Broadcast.
  Straightforward, strong privacy.
  High bandwidth costs.

Onion routing: Tor.
  Many relay servers, encrypt messages for each server in turn.
  Need at least one honest server in the chain.
  Hidden services for message recipients.
  Much more about Tor in next lecture.
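
  A toy sketch of layered ("onion") encryption. Assumptions: the sender
  already shares a symmetric key with each relay (real systems negotiate
  these), the cipher is a SHA-256 keystream XOR with no integrity check,
  and relay names are made up. Each relay learns only the next hop.

```python
import hashlib


def stream_xor(key: bytes, data: bytes) -> bytes:
    """Toy cipher: XOR with a SHA-256 counter keystream (not secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def wrap(route, destination: str, payload: bytes):
    """route: list of (relay_name, relay_key). Returns (first hop, onion)."""
    next_hop, data = destination, payload
    for name, key in reversed(route):
        hop = next_hop.encode()
        # Innermost layer is built first; each layer hides the next hop.
        data = stream_xor(key, bytes([len(hop)]) + hop + data)
        next_hop = name
    return next_hop, data


def peel(key: bytes, onion: bytes):
    """Run by a relay: remove one layer, learn only where to forward."""
    plain = stream_xor(key, onion)
    n = plain[0]
    return plain[1:1 + n].decode(), plain[1 + n:]
```

  The first relay sees the sender but not the destination; the last
  relay sees the destination but not the sender.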

Mixnets.
  Synchronous.
  [ Ref: http://people.csail.mit.edu/nickolai/papers/vandenhooff-vuvuzela.pdf ]

PIR.
  Cryptographic construction: private query of a database.
  [ Ref: https://www.usenix.org/conference/osdi16/technical-sessions/presentation/angel ]
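
  To give a feel for private queries, here is a sketch of a classic
  two-server XOR scheme. This is a swapped-in, simpler technique: it
  assumes two servers that hold the same database and do not collude,
  unlike single-server cryptographic PIR. All records must have equal
  length. Function names are made up.

```python
import secrets


def make_queries(n: int, index: int):
    """Two bit vectors that differ only at the wanted index."""
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = list(q1)
    q2[index] ^= 1
    return q1, q2


def server_answer(db, query) -> bytes:
    """XOR together the records selected by the query bits."""
    answer = bytes(len(db[0]))
    for record, bit in zip(db, query):
        if bit:
            answer = bytes(x ^ y for x, y in zip(answer, record))
    return answer


def reconstruct(ans1: bytes, ans2: bytes) -> bytes:
    # Every record except the wanted one is selected in both answers
    # or in neither, so it cancels.
    return bytes(x ^ y for x, y in zip(ans1, ans2))
```

  Each server on its own sees a uniformly random bit vector, so it
  learns nothing about which record the client wanted.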

Dining cryptographer networks (DC-nets).
  Simple example: N parties, someone wants to send a single bit.
  Every pair of users computes a shared secret bit, K_ij.
  Every user u broadcasts K_u1 XOR K_u2 XOR .. XOR K_uN XOR msg (skipping K_uu).
    msg is the message bit (for the one sender) or 0 (for all other users).
  Finally, compute XOR of everyone's broadcast value.
    Each K_ij appears in exactly two broadcasts, so all keys cancel.
    Result should be just the anonymous message.
  Cool construction, strong privacy, interesting problem in scaling to many users.
    [ Ref: http://dedis.cs.yale.edu/dissent/ ]
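
  The single-bit round above can be simulated in one process as a
  sanity check. This sketch assumes the pairwise secret bits are already
  shared (here they are simply generated centrally) and that exactly one
  user sends; the function name is made up.

```python
import secrets


def dc_net_round(n: int, sender: int, bit: int):
    """Simulate one round; returns (all broadcasts, recovered bit)."""
    # k[i][j] == k[j][i] is the secret bit shared by users i and j.
    k = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            k[i][j] = k[j][i] = secrets.randbelow(2)

    broadcasts = []
    for u in range(n):
        value = bit if u == sender else 0
        for v in range(n):
            if v != u:
                value ^= k[u][v]
        broadcasts.append(value)

    result = 0
    for b in broadcasts:
        result ^= b          # every k[i][j] is XORed in twice and cancels
    return broadcasts, result
```

  The recovered bit equals the sender's bit, but the individual
  broadcasts are masked by the shared secret bits.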

Summary.
  Messaging security is an active area of research and development.
  Interesting combination of cryptography, system design.
  Seems unlikely a single solution will achieve all goals.
    Conflicting requirements, trade-offs.
  Techniques are broader than just messaging (trust establishment, deniability, ..).