Messaging security
==================

Why are we reading this paper?
  Messaging security is important in practice.
  A lot is known about how to provide security.
    In part due to how constrained the problem is.
  But there are also many open problems, and many new ideas.
    Still not a solved problem.
    [ Ref: https://citizenlab.ca/2020/04/move-fast-roll-your-own-crypto-a-quick-look-at-the-confidentiality-of-zoom-meetings/ ]
  This paper has a good survey of a wide range of messaging schemes.

Main security properties:
  Suppose Alice is sending message M to Bob.
  Confidential: only Bob (and Alice) can read M.
  Authentic: Bob can be sure Alice sent M.
  (There are many more properties one might care about!)

First, what about e-mail security?
  E-mail is so old and important you might expect it to be very secure.
  E-mail paths are complex, with many trusted entities, e.g.:
    alice (user+PC)
    gmail
    list-server, with bob@ibm.com
    IBM, redirects to bob@mit.edu
    MIT, CSAIL
    spam filter
    message archive
    bob (user+PC)
  Plus DNS at each step. Often 10 or 20 hops!
  Historically there was no cryptographic security anywhere in the e-mail system.
  But security is being added, as options, bit by bit.
    Progress is hard due to the vast amount of existing software.

Deployed e-mail security is generally "hop-by-hop".
  That is, individual user/server and server/server interactions are secured.
  Example: POP or IMAP over TLS, with user password.
    Prevents snooping and theft from inboxes.
  Example: DKIM signature by the source organization.
    Allows e.g. IBM to check that e-mail really came from MIT.
    Mostly useful as a factor in spam decisions, not user authentication.

Opportunistic hop-by-hop security doesn't compose into user-level security.
  Suppose e-mail goes through 10 servers.
  Can we be sure that all hops used TLS, instead of falling back to plain SMTP?
  Each server sees the message plain-text, and perhaps could change it.
  Can the recipient trust authenticity -- that it's really from the From: line?
  Or confidentiality -- that only sender and recipient could read it?
  Are all 10 servers:
    Properly configured?
    Run by trustworthy staff?
    Latest security patches installed?
    Maybe.

"End-to-end" security is potentially much more believable.
  Designs whose security depends only on the ultimate src/dst users and devices.
  E.g. sending user signs; receiving user checks the signature.
  Can transmit via networks and servers w/o trust (other than liveness).
  Bullet-proof end-to-end security is not easy, as we'll see.

Lots of work on messaging systems with end-to-end security.
  Typically new chat/message services, not e-mail, to allow design flexibility.
  e.g. WhatsApp, Apple iMessage, Keybase chat

Paper's framework for the main technical issues in end-to-end message security:
  Trust establishment (key exchange).
  Conversation protocol (how to encrypt messages).
  Transport privacy (how to transmit encrypted messages).

First, a review of a few basic cryptographic ideas.

Cryptographic primitives:
  m = plain text message
  h = Hash(m)
  tag = MAC(key, m) -- symmetric, essentially Hash(key||m)
  c = E(key, m) -- symmetric encryption
  m = D(key, c) -- symmetric decryption
  c = E(PK_Bob, m) -- Alice encrypts with Bob's public key
  m = D(SK_Bob, c) -- Bob decrypts with his private key
  sig = Sign(SK_Alice, m) -- Alice signs with her private key
  ok = Verify(PK_Alice, m, sig) -- Bob checks signature with Alice's public key

Secure single message, Alice -> Bob, simplified PGP.
  "Secure" = confidential + authentic
  Assume they know each other's public keys (PK_Bob, PK_Alice).
  c = E(PK_Bob, m)
  s = Sign(SK_Alice, c)
  Alice sends c, s to Bob.
  Bob checks Verify(PK_Alice, c, s), then computes m = D(SK_Bob, c).
  Confidential? Authentic?
    Maybe sign m rather than c, or include Alice's name as part of m.
    What if an attacker replays?
  Puzzle: Alice and Bob need to get each other's public keys.

Often you want "synchronous" communication.
  For interactive use, voice/video, or b/c the security scheme needs back-and-forth.
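The symmetric primitives in the list above (Hash and MAC) can be sketched concretely with Python's standard library. One caveat worth noting: the notes describe MAC(key, m) as "essentially Hash(key||m)", but a naive Hash(key||m) with SHA-256 is vulnerable to length-extension attacks, so practical systems use HMAC instead.

```python
import hashlib
import hmac
import secrets

m = b"hello Bob"

# h = Hash(m): a collision-resistant digest of the message.
h = hashlib.sha256(m).hexdigest()

# tag = MAC(key, m): only holders of `key` can compute or verify it.
# HMAC, not plain Hash(key||m), to avoid SHA-256 length extension.
key = secrets.token_bytes(32)
tag = hmac.new(key, m, hashlib.sha256).digest()

# The verifier recomputes the tag and compares in constant time.
ok = hmac.compare_digest(tag, hmac.new(key, m, hashlib.sha256).digest())
```

A tag computed over a different message (or with a different key) fails the comparison, which is what makes the MAC an authenticity check between holders of the shared key.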
  Downside: synchronous communication takes a while unless both users are online at the same time.

Thus -- secure channel.

Secure channel, Alice <-> Bob, much-simplified version of TLS.
  "Secure" = confidential + authentic
  Use public keys to establish symmetric session keys.
  A -> B: E(PK_B, "A" || "B" || randomA), Sign(SK_A, ...)
  A <- B: E(PK_A, "A" || "B" || randomB), Sign(SK_B, ...)
  Now A and B share a secret: randomA || randomB
  Both can compute a set of keys Kx = Hash(randomA || randomB || x)
  A -> B: E(K1, M), MAC(K2, M)
  Confidential? Authentic?
    What if an attacker replays?
    Why include the identities "A" and "B"?
  Puzzle: Alice and Bob need to get each other's public keys.

Trust establishment (Table I).
  I.e., schemes for Alice and Bob to find each other's public keys.

Opportunistic encryption.
  Each party sends its public key over the network to the other.
  E.g. the secure channel starts with
    A -> B: PK_A
    A <- B: PK_B
    ...
  When does this help security? When doesn't it?

Trust On First Use (TOFU).
  Assume the first connection was OK, and remember ("pin") the other user's public key.
  SSH uses TOFU, as do many messaging systems.
  + Convenient -- no user action required.
  + Strong defense against a new attack, or a passive eavesdropper.
  - Not useful against a long-term active attacker.
  - What if Bob changes keys -- gets a new device, or loses an old one?
  TOFU is pretty good if you talk to a fixed small set of people,
    and if they don't change keys often.
  The easier it is to accept a changed key, the less secure TOFU is.
    An attack would likely appear as "Bob is using a different key; is that OK?"

Out-of-band exchange.
  Exchange public keys in person in advance, perhaps via QR code.
  Or verify, afterwards, that the TOFU-pinned key is correct, in person.
    Not so secure if the exchange is by phone or e-mail!
  WhatsApp/Signal supports this.
  + Secure.
  + End-to-end: security depends only on users and their devices.
  - It's a pain; people often don't bother.
  - Need to re-do whenever anyone changes keys (loses a device, gets a new one).
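Out-of-band verification usually works by comparing a short fingerprint of the pinned key, rather than the key itself. The sketch below is a minimal, illustrative fingerprint scheme; Signal's "safety numbers" are built on a similar idea, though their actual format differs, and the key bytes here are placeholders.

```python
import hashlib

def fingerprint(pk: bytes) -> str:
    # A short, human-comparable digest of a public key, suitable for
    # reading aloud in person or encoding in a QR code.
    digest = hashlib.sha256(pk).hexdigest()[:20]
    # Group into chunks of 4 so two people can compare them easily.
    return " ".join(digest[i:i + 4] for i in range(0, 20, 4))

# Alice and Bob each display the fingerprint of the key their client
# has pinned for the other; if the displayed strings match, no
# attacker substituted a key during the initial (TOFU) exchange.
pk_bob = bytes(range(64))  # placeholder bytes standing in for PK_Bob
print(fingerprint(pk_bob))
```

Truncating the digest trades security for usability: a shorter string is easier to compare by hand, but gives an attacker searching for a colliding key less work, so real systems choose the length carefully.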
  Out-of-band verification can be made mandatory, but then most people won't use the system.
  Thus it often falls back to TOFU, which is OK.

Some larger points:
  Convenience is very important.
    In practice we can demand little patience or security expertise from users.
    If 100% secure means no-one uses it, then actual security is 0%.
    Table I has both security and usability columns.
      Few schemes score well in both sets of columns!
  Recovery (lost device/key) is particularly challenging.
    An alternative plan to recover the key -- could itself be a target for attackers.
    Generate a new key -- how to convince other users the new key is legitimate?
  Usability solutions are often systems-level.
    e.g. TOFU and key pinning.
    A combination of technical ideas (crypto) and system architecture.

Can we have a central server tell Alice what Bob's public key is?
  To avoid Alice and Bob having to exchange public keys in person.
  To increase usability!

Central key server example: Apple's iMessage.
  Alice's device ; Apple server ; Bob's device.
  Registration:
    User has an ID and password (iCloud).
    Messaging app has a public/private key pair.
    User registers the public key w/ the Apple server.
    Only the device itself knows the private key.
  When Alice sends to Bob:
    Alice asks the Apple server for Bob's public key.
    Alice encrypts the msg with Bob's key, signs, sends to the Apple server.
    When Bob is active, he retrieves and decrypts.
    Bob asks Apple for Alice's public key to check her signature.
  TLS secures communication with Apple servers.
  + Convenient, little user involvement.
  + Apple doesn't know private keys, can't directly read or forge messages.
  + Handles lost/new devices well (re-register w/ password).
  - Password opens up various attacks.
  - Disaster if Apple is corrupt: could return Eve's public key, not Bob's.
  Not perfect, but the convenience is enormously important to getting
    people to actually use secure chat!

Key Transparency -- detect a corrupt key server or unauthorized key change.
  I.e., use a key service for usability, w/o losing end-to-end security.
  A recent proposal from Google.
  The basic idea:
    Require the key server to maintain a public log of all key updates.
    Key owners periodically check that their key is correct in the log.
    Key requesters check that the result matches the latest update in the log.
  The log might look like:
    Bob added PK_x.
    Alice added PK_y.
    Bob deleted PK_z.
  If a corrupt key server wants to return Eve's key when Alice asks about Bob:
    Either the server must put Bob->PK_Eve in the log,
      and Bob will see it and complain.
    Or the server doesn't change the log, and returns PK_Eve anyway,
      and Alice will see that Bob->PK_Eve is not in the log, and complain.
  Key Transparency reveals dishonesty, but it doesn't prevent it.
  Many interesting technical details:
    Efficient lookups and validity checks (linear scans of a log are too slow).
    Preserving privacy.
    Preventing the server from "equivocating" -- forking the log:
      Maintaining two different logs, showing one to Alice and a different one to Bob.
      Different users compare notes to ensure they see the same log.
    Related to blockchains. And to 6.858 Lab 5.

Keybase -- a different plan to reduce trust in a key server.
  Allow clients to check the server's name/key claims.
    To defend against a corrupt key server.
  The server holds, for each name like "Alice":
    PK_Alice (only Alice's devices have the private key).
    Identity records linking Alice to her account names on other services,
      e.g. Sign(SK_Alice, "I am alice177 on twitter").
    For each identity record, Alice must put a signed statement visible in that other account.
      E.g. tweet it (encoded in ascii), post it on github, &c.
  How do clients use this?
    Bob asks the Keybase server for info about Alice.
    Keybase sends PK_Alice and Alice's signed identity proofs.
    Bob's client fetches the proof posts from twitter &c.
    Bob's client checks that all proofs are signed with PK_Alice.
    The client presents the other names to Bob, asks "is this the Alice you meant?"
  How does this defend against Keybase returning PK_Eve instead of PK_Alice?
    Keybase can't forge identity records linking PK_Eve to alice177 on twitter.
    So it must return nothing, or identity proofs about Eve.
    Bob or his client will (hopefully) notice.
  What attack might succeed?
    Attacker gets control of Alice's twitter &c accounts (e.g. guesses passwords).
  Keybase clients pin others' public keys,
    so Bob only has to make these checks the first time he talks to Alice.
  Keybase securely updates pins:
    If Alice legitimately changes her public key (or adds a new device),
      she signs the new public key with her old SK_Alice,
      and posts a signed "add key" record to the Keybase server.
    Now Bob can ask Keybase for recent "add key" records for Alice,
      and believe them if they verify under the PK_Alice that Bob has pinned.
  Is it convenient?
    Mostly invisible to users after initial setup.
    Users have to pay attention to "is this the Alice you meant?" questions.
    Users have to revoke keys for lost devices.
    Users had better not lose control of their other "proof" accounts.
  Keybase is the most convincing key server scheme I know of.

Conversation security (Table II).
  Protecting messages once the user devices have the right keys.
  Some goals are challenging to achieve together:
    Authenticity vs. deniability.
    Confidentiality vs. reporting spam or harassment.
      [ Ref: Message franking, https://eprint.iacr.org/2017/664.pdf ]
    Many issues with group communication.
  Will look at forward secrecy and deniability.

Forward secrecy.
  We saw this briefly in the SSL lecture.
  Definition:
    If Alice has a secret chat with Bob today,
    and an attacker records the (encrypted) packets,
    and tomorrow the attacker breaks into their computers and steals their private keys,
    the attacker still can't decrypt the recorded packets.
  Do my single-message and channel protocols have forward secrecy?
  Basic plan:
    Bob wants to send Alice message M with forward secrecy.
    Alice generates a temporary public/private key pair PK_temp, SK_temp.
    A -> B: PK_temp
    A <- B: C = E(PK_temp, M)
    Alice decrypts C with SK_temp.
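The temporary key pair in the basic plan above is commonly instantiated with an ephemeral Diffie-Hellman exchange (as in TLS and Signal). Here is a toy sketch of that idea, with deliberately tiny, insecure parameters; real systems use X25519 or a ~2048-bit group.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters -- far too small to be secure,
# chosen only so the arithmetic is visible.
p = 0xFFFFFFFB  # the largest prime below 2^32
g = 5

# Each side generates a fresh ("ephemeral") secret for this session.
a = secrets.randbelow(p - 2) + 1   # Alice's temporary secret
b = secrets.randbelow(p - 2) + 1   # Bob's temporary secret
A = pow(g, a, p)                   # like PK_temp: sent in the clear
B = pow(g, b, p)

# Both sides derive the same session key from the exchanged values:
# g^(ab) mod p, hashed down to a symmetric key.
k_alice = hashlib.sha256(str(pow(B, a, p)).encode()).digest()
k_bob = hashlib.sha256(str(pow(A, b, p)).encode()).digest()
assert k_alice == k_bob

# Forward secrecy comes from discarding the ephemeral secrets after
# the session: a later break-in recovers neither a nor b, so traffic
# recorded under this session key stays unreadable.
a = b = None
```

The long-term keys PK_Alice/PK_Bob are still needed to authenticate A and B as they cross the network; the ephemeral secrets only exist for the lifetime of the session.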
    Alice discards SK_temp (overwrites all copies with junk).
  Usually combined with a secure channel scheme like TLS
    to provide properties such as authenticity,
    using the long-term public keys PK_Alice and PK_Bob.
  Why can't the attacker later decrypt a recorded E(PK_temp, M)?
    The attacker needs SK_temp.
    It's no longer on Alice's or Bob's computer, so breaking in is useless.
    And it was never sent over the network.
  Used in some TLS modes, and in WhatsApp/Signal.
  See the paper for references to more serious schemes.

Deniable authenticity.
  Seemingly conflicting goals: authentication and deniability.
    Alice wants to prove to Bob that she sent M.
    Alice wants to prevent Bob from proving to others that Alice sent M.
      Or even that Alice talked to Bob at all.
  Do my single-message (PGP) and channel (TLS) protocols have deniability?
  Main idea: Alice does not sign anything.
    Use a MAC instead -- a MAC is symmetric, so anything Alice MAC'd,
      Bob could also have MAC'd.
  A simple deniable authenticity protocol, for Alice to send M to Bob.
    [ Krawczyk's SKEME, 1996 ]
    Bob chooses a random key K.
    A <- B: E(PK_Alice, K)
    Alice decrypts to get K.
    A -> B: M, MAC(K, M)
    Alice publishes K.
    Afterwards, anyone could have produced these messages, so there's no proof it was Alice:
      Everyone now knows K.
      Anyone can produce E(PK_Alice, K).
      Anyone who knows M can also produce MAC(K, M).
  OTR (and WhatsApp/Signal) achieve deniability with a more involved scheme.

Transport privacy (Table III).
  Protecting metadata: sender, recipient.
  Costly to protect: most systems do not bother.
  Centralized server.
    Adversary observes clients communicating with the server via TLS.
    Timing of messages might reveal communication patterns.
  Onion routing: Tor.
    Relay messages through many servers.
    Somewhat better than a central server.
    A global adversary might still be able to trace message timing.
    Will talk more about Tor in 2 weeks.
  Could do better, but it's costly:
    Broadcast.
    Mixnets: process large batches of messages at the same time.

Summary.
  Messaging security is an active area of research and development.
  An interesting combination of cryptography and system design.
  It seems unlikely that a single solution will achieve all goals.
    Many potential usability / security trade-off points.
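As a closing worked example, the MAC half of the SKEME-style deniable-authenticity protocol from the conversation-security section can be sketched in a few lines. The public-key delivery of K (E(PK_Alice, K)) is elided; assume only Alice could have recovered K at the time of the conversation.

```python
import hashlib
import hmac
import secrets

# Bob chooses a fresh random key K and (in the full protocol) sends it
# to Alice encrypted under PK_Alice.
K = secrets.token_bytes(32)

# Alice authenticates M to Bob with a MAC, not a signature.
M = b"meet at noon"
tag = hmac.new(K, M, hashlib.sha256).digest()

# Bob verifies: at this moment, only someone who knew K (i.e. Alice,
# or Bob himself) could have produced this tag.
assert hmac.compare_digest(tag, hmac.new(K, M, hashlib.sha256).digest())

# Later, K is published.  Now *anyone* can produce a valid tag for any
# message, so the transcript no longer proves Alice said anything.
forged = hmac.new(K, b"some other message", hashlib.sha256).digest()
```

The asymmetry is temporal: the tag convinces Bob during the conversation, but once K is public it convinces no third party, which is exactly the deniability property.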