Distributed ledgers
===================

Powerful tool: distributed ledger
  Append-only log
  Anyone can verify entries in this log
  Anyone can verify that the log is really append-only

Large distributed ledger deployment: Certificate Transparency
  Briefly touched on CT in the certificates lecture
  Initial impetus from CA mistakes
    E.g., one CA issued "test" certs for Google servers on real CA
    Would like to make it easier to detect "mis-issued" certificates

CT parties: server operator, CA, log operator, browser
  Server operator asks CA for a certificate
  CA signs certificate, but also sends a copy to the log operator
  Log operator gives back a signed receipt (timestamp)
  Server operator gets certificate + log receipt, sends both to browser
  Browser requires valid log receipt to trust certificate
    Has a list of acceptable log operators (public keys)
    [ Ref: http://www.certificate-transparency.org/known-logs ]

Why is this useful?
  Server operator monitors log for mis-issued certificates
    Contact CA if they mis-issued certificate for that server
    If CA is not cooperative, contact browser with proof that CA is bad
  Browsers require signed receipts from a log operator
    Ensures that if some cert isn't logged, browsers won't even accept it
    Otherwise, adversary that gets a mis-issued cert will just avoid logging it
    Gives server operator confidence that log contains all interesting certs

How do we know the log operator is trustworthy?
  Don't want to just replace trusted CAs with trusted log operator..

Auditing log operators: Merkle tree of the log
  [ Ref: https://tools.ietf.org/html/rfc6962 ]
  Log operator maintains an append-only timestamped log of all certificates
  Receipt timestamp specifies the time at which the cert should appear in log
  Merkle tree used to commit to the log entries
  Example Merkle trees of logs (each log entry is cert + timestamp):
    a, b, c, d, e
    a, b, c, d, e, f, g

New party in the certificate transparency world: the monitor
  Anyone can audit any log operator
  Monitor just needs to store the latest Merkle root hash from log operator
  To check that a receipt is valid, request a Merkle audit path
    Cert w/ correct timestamp must hash up to the Merkle root
    Reasonably efficient: O(ln2(total #certs))
  To check that the log is append-only, follow along with the log
    When new certs are added, make sure that:
    1. Certs are in increasing timestamp order
      Ensures server doesn't back-date inclusion of a cert timestamp
    2. New Merkle root hash corresponds to old log + new entries
      Reasonably efficient with Merkle trees -- also O(ln2(total #certs))
  If either check fails, have cryptographic proof that log operator misbehaved
  Receipt checking is somewhat costly (and leaks what sites user is visiting)
    Most browsers don't check in real-time
    Some extensions "donate receipts" to monitor that checks on user's behalf

Keybase has a similar structure to Certificate Transparency
  Merkle tree of all statements from users, signed by Keybase
  Can be audited to make sure nothing is removed or altered
  Allows users to have more confidence Keybase server didn't misbehave

One remaining problem: forking of logs
  What if CT monitor sees one log, and server operator sees another log?
    Suppose CA issues two certs: one legit and one to an adversary
    Server operator sees a log with that server's legitimate cert
    CT monitor sees log with malicious cert that matches receipt from browser
    How will misbehavior be detected?
  CT suggests some out-of-band exchange of root hashes between monitors
    Would detect forking, if monitors know each other and can exchange hashes

Hard problem: consensus
  Ensuring there's only one head of the log
    Same problem faced by Certificate Transparency, Keybase, ..
  If we had a clear answer, could remove much more trust from servers
    Anyone could run a Certificate Transparency log operator, Keybase server, ..
  This is the problem that Bitcoin solves
  Aside 1: Keybase leverages Bitcoin to prevent forks
    Put latest Keybase Merkle tree hash in the Bitcoin log every day
    Clients that want to guard against forking can get root hash from Bitcoin
  Aside 2: Namecoin is very roughly a serverless variant of Keybase

Bitcoin: decentralized ledger
  Log of cryptocurrency transactions (mostly signed by users' private keys)
  No centralized server (like the log operator in Certificate Transparency)
  Must determine what's the head of the log/ledger

Bitcoin block header contents (most interesting parts):
  Hash of previous block header
  Root of Merkle tree of transactions in that block
  Nonce chosen so that H(block header) is small
    This allows for miners to repeatedly hash different nonces
    "Proof-of-work" part of Bitcoin's design

Determining the head of the ledger: longest chain rule.
  "Nakamoto consensus"
  Probabilistic guarantee, but a very surprising result when it came out!
    Need to wait for several blocks to increase confidence in txn confirmation
  Requires the network to be available
    Network disconnection leads to forks
  Complex interaction with incentives due to block reward (coinbase txn)
  Rough intuition: participants don't have incentive to use shorter chain
    But turns out to be not fully precise -- see paper, selfish mining

Confirming a transaction: find block header in list, then Merkle proof
  Bitcoin SPV (simplified payment verification) mode
  Client computer tracks longest chain of headers, which are relatively small
  Then request Merkle proof for transaction the client wants to confirm

Ensuring the ledger is append-only and adds correct blocks
  Only new blocks can be added; old blocks are immutable
  Need to check that transactions appended by new block are valid
  Requires some state -- all unspent transactions (UTXO)
    ~74M transactions, as of May 2019
    * 256 bytes for txid (+ 4 bytes for output ID)
    = ~20 GBytes of state
  Smaller than the full blockchain history (~200 GBytes)

Big challenge: proof-of-work is resource-intensive
  What are the alternatives to proof-of-work for consensus?
  Difficult in an open system like Bitcoin
  Strawman: collect votes for the next block?
  Not secure due to Sybil attacks: adversary can create many identities to vote
  Need to assign weights to participants in some way to prevent Sybil attacks
  Can think of Bitcoin as weighing participants by compute power

Consensus scheme: designated servers
  Pick some set of servers to be in charge of consensus on head of ledger
  Run a voting protocol between these well-known servers
    Several Byzantine fault tolerance (BFT) protocols could work
  Can evolve the list of designated servers over time
    Changing the list of servers can be one of the entries in the ledger
  Some example systems work in this way
    EOS has 21 designated servers
    Stellar's SCP can be roughly thought of as a very flexible variant of this
    Some hybrid designs that combine Nakamoto consensus w/ BFT (e.g., ByzCoin)

Consensus scheme: "bonded proof-of-stake"
  Intuition: assign weights to participants based on their money
    Particularly applicable in cryptocurrency settings that have money
  Strawman: choose a random coin, and whoever has that coin decides next block
  Q1: where does this randomness come from, in a decentralized system?
    Could be pseudo-randomness from blockchain itself (hash of blocks)
    Need to make sure coin owner can't position his coin to "get lucky"
    Usually means consensus protocol should use coin ownership from
      some time before the randomness is determined
    Possible problem: participant may have given away the coin, and doesn't
      care what happens to this blockchain anymore
  Q2: what if that coin is owned by a participant that's offline right now?
    Several participants can choose the next block, not just one participant
    Keep the longest-chain rule to resolve conflicts
    Need to work out some details to prevent frequent collisions
      Introduce something like proof-of-work but with fewer hash attempts
      Participants have a small number of hashes to try
      Target value (hash difficulty) depends on their stake (number of coins)
  Q3: what if a user with many coins forks the system?
    Many coins means relatively high probability of winning "proof-of-work"
    Possibly easy to generate two conflicting blocks at the same time
    Typical solution: require the coins to be "bonded"
    Bonded coins forfeited if participant misbehaves (proposes diff blocks)
      Of course, requires someone to catch misbehavior (differing blocks)
  Early example proof-of-stake systems:
    Peercoin (no bonding)
    Tendermint (bonding)
  Peercoin still uses proof-of-work to distribute coins to participants
    Important bootstrapping problem for cryptocurrencies
    In some way, this means everything still boils down to compute power
      because that was the initial stake distribution..

Consensus scheme: "pure proof-of-stake"
  Different way to solve Q2/Q3 from bonded proof-of-stake.
  Don't use longest-chain rule or penalize misbehaving participants
  Instead, require votes to confirm the tentative next block
  Logically, every coin gets a vote, so avoids Sybil attacks
  Impractical to collect votes from every coin, so use random sampling
    Can even do this random sampling "privately" using a VRF
  [ Aside: interesting problem w/ voting-based systems: forward security ]
  Example system based on this design: Algorand

Big challenge: joining the system requires downloading lots of data
  Pretty much every decentralized ledger requires downloading all blocks,
    and maintaining a large amount of state to verify future blocks
  Would like to join the system quickly, without downloading all blocks
  Would like to participate without maintaining a large amount of state
    Sharding could reduce security: only subset of nodes can verify txns
  Still an open problem in general; some possible approaches here and there
    [ Ref: https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_09-2_Leung_paper.pdf ]

Big challenge: finding important applications / use cases
  Many possible technical challenges to address
  Which ones matter depends on what application matters
  No clear big winning use cases yet, though

Challenge: supporting a large number of transactions
  Some workarounds, like payment channels
    [ Ref: https://lightning.network/lightning-network-paper.pdf ]
  Some form of sharding seems inevitable for high on-chain volume

Challenge: operations that span different ledgers (cross-chain transactions)
  Some clever tricks using Hashed Time-Locked Contracts (HTLC)
    https://en.bitcoin.it/wiki/Atomic_swap
  May want to have stronger support, though

Challenge: privacy for transactions
  Most distributed ledgers make all contents public
  Some work on supporting "encrypted transactions"
    [ Ref: https://z.cash/ ]

Operational challenges for decentralized ledgers
  Monitoring, availability, fixes, ..
  Agreeing on updates, governance
    Having many participants makes it time-consuming to agree on changes
    Tangentially related example: Signal not federated to speed updates

Operational challenges for end-users
  Users now in charge of managing critical cryptographic keys
  Many examples of mistakes: users have improper backups, client-side
    software is buggy, client device is compromised, users fall for
    phishing attacks, ..