Distributed ledgers =================== Powerful tool: distributed ledger Append-only log Anyone can verify entries in this log Anyone can verify that the log is really append-only Large distributed ledger deployment: Certificate Transparency Briefly touched on CT in the certificates lecture Initial impetus from CA mistakes E.g., one CA issued "test" certs for Google servers on real CA Would like to make it easier to detect "mis-issued" certificates CT parties: server operator, CA, log operator, browser Server operator asks CA for a certificate CA signs certificate, but also sends a copy to the log operator Log operator gives back a signed receipt (timestamp) Server operator gets certificate + log receipt, sends both to browser Browser requires valid log receipt to trust certificate Has a list of acceptable log operators (public keys) [ Ref: http://www.certificate-transparency.org/known-logs ] Why is this useful? Server operator monitors log for mis-issued certificates Contact CA if they mis-issued certificate for that server If CA is not cooperative, contact browser with proof that CA is bad Browsers require signed receipts from a log operator Ensures that if some cert isn't logged, browsers won't even accept it Otherwise, adversary that gets a mis-issued cert will just avoid logging it Gives server operator confidence that log contains all interesting certs How do we know the log operator is trustworthy? Don't want to just replace trusted CAs with trusted log operator.. Auditing log operators: Merkle tree of the log [ Ref: https://tools.ietf.org/html/rfc6962 ] Log operator maintains an append-only timestamped log of all certificates Receipt timestamp specifies the time at which the cert should appear in log Merkle tree used to commit to the log entries Example Merkle trees of logs (each log entry is cert + timestamp): a, b, c, d, e a, b, c, d, e, f, g New party in the certificate transparency world: the monitor Anyone can audit any log operator Monitor just needs to store the latest Merkle root hash from log operator To check that a receipt is valid, request a Merkle audit path Cert w/ correct timestamp must hash up to the Merkle root Reasonably efficient: O(ln2(total #certs)) To check that the log is append-only, follow along with the log When new certs are added, make sure that: 1. Certs are in increasing timestamp order Ensures server doesn't back-date inclusion of a cert timestamp 2. New Merkle root hash corresponds to old log + new entries Reasonably efficient with Merkle trees -- also O(ln2(total #certs)) If either check fails, have cryptographic proof that log operator misbehaved Receipt checking is somewhat costly (and leaks what sites user is visiting) Most browsers don't check in real-time Some extensions "donate receipts" to monitor that checks on user's behalf Keybase has a similar structure to Certificate Transparency Merkle tree of all statements from users, signed by Keybase Can be audited to make sure nothing is removed or altered Allows users to have more confidence Keybase server didn't misbehave One remaining problem: forking of logs What if CT monitor sees one log, and server operator sees another log? Suppose CA issues two certs: one legit and one to an adversary Server operator sees a log with that server's legitimate cert CT monitor sees log with malicious cert that matches receipt from browser How will misbehavior be detected? CT suggests some out-of-band exchange of root hashes between monitors Would detect forking, if monitors know each other and can exchange hashes Hard problem: consensus Ensuring there's only one head of the log Same problem faced by Certificate Transparency, Keybase, .. If we had a clear answer, could remove much more trust from servers Anyone could run a Certificate Transparency log operator, Keybase server, .. This is the problem that Bitcoin solves Aside 1: Keybase leverages Bitcoin to prevent forks Put latest Keybase Merkle tree hash in the Bitcoin log every day Clients that want to guard against forking can get root hash from Bitcoin Aside 2: Namecoin is very roughly a serverless variant of Keybase Bitcoin: decentralized ledger Log of cryptocurrency transactions (mostly signed by users' private keys) No centralized server (like the log operator in Certificate Transparency) Must determine what's the head of the log/ledger Bitcoin block header contents (most interesting parts): Hash of previous block header Root of Merkle tree of transactions in that block Nonce chosen so that H(block header) is small This allows for miners to repeatedly hash different nonces "Proof-of-work" part of Bitcoin's design Determining the head of the ledger: longest chain rule. "Nakamoto consensus" Probabilistic guarantee, but a very surprising result when it came out! Need to wait for several blocks to increase confidence in txn confirmation Requires the network to be available Network disconnection leads to forks Complex interaction with incentives due to block reward (coinbase txn) Rough intuition: participants don't have incentive to use shorter chain But turns out to be not fully precise -- see paper, selfish mining Confirming a transaction: find block header in list, then Merkle proof Bitcoin SPV (simplified payment verification) mode Client computer tracks longest chain of headers, which are relatively small Then request Merkle proof for transaction the client wants to confirm Ensuring the ledger is append-only and adds correct blocks Only new blocks can be added; old blocks are immutable Need to check that transactions appended by new block are valid Requires some state -- all unspent transactions (UTXO) ~74M transactions, as of May 2019 * 256 bytes for txid (+ 4 bytes for output ID) = ~20 GBytes of state Smaller than the full blockchain history (~200 GBytes) Big challenge: proof-of-work is resource-intensive What are the alternatives to proof-of-work for consensus? Difficult in an open system like Bitcoin Strawman: collect votes for the next block? Not secure due to Sybil attacks: adversary can create many identities to vote Need to assign weights to participants in some way to prevent Sybil attacks Can think of Bitcoin as weighing participants by compute power Consensus scheme: designated servers Pick some set of servers to be in charge of consensus on head of ledger Run a voting protocol between these well-known servers Several Byzantine fault tolerance (BFT) protocols could work Can evolve the list of designated servers over time Changing the list of servers can be one of the entries in the ledger Some example systems work in this way EOS has 21 designated servers Stellar's SCP can be roughly thought of as a very flexible variant of this Some hybrid designs that combine Nakamoto consensus w/ BFT (e.g., ByzCoin) Consensus scheme: "bonded proof-of-stake" Intuition: assign weights to participants based on their money Particularly applicable in cryptocurrency settings that have money Strawman: choose a random coin, and whoever has that coin decides next block Q1: where does this randomness come from, in a decentralized system? Could be pseudo-randomness from blockchain itself (hash of blocks) Need to make sure coin owner can't position his coin to "get lucky" Usually means consensus protocol should use coin ownership from some time before the randomness is determined Possible problem: participant may have given away the coin, and doesn't care what happens to this blockchain anymore Q2: what if that coin is owned by a participant that's offline right now? Several participants can choose the next block, not just one participant Keep the longest-chain rule to resolve conflicts Need to work out some details to prevent frequent collisions Introduce something like proof-of-work but with fewer hash attempts Participants have a small number of hashes to try Target value (hash difficulty) depends on their stake (number of coins) Q3: what if a user with many coins forks the system? Many coins means relatively high probability of winning "proof-of-work" Possibly easy to generate two conflicting blocks at the same time Typical solution: require the coins to be "bonded" Bonded coins forfeited if participant misbehaves (proposes diff blocks) Of course, requires someone to catch misbehavior (differing blocks) Early example proof-of-stake systems: Peercoin (no bonding) Tendermint (bonding) Peercoin still uses proof-of-work to distribute coins to participants Important bootstrapping problem for cryptocurrencies In some way, this means everything still boils down to compute power because that was the initial stake distribution.. Consensus scheme: "pure proof-of-stake" Different way to solve Q2/Q3 from bonded proof-of-stake. Don't use longest-chain rule or penalize misbehaving participants Instead, require votes to confirm the tentative next block Logically, every coin gets a vote, so avoids Sybil attacks Impractical to collect votes from every coin, so use random sampling Can even do this random sampling "privately" using a VRF [ Aside: interesting problem w/ voting-based systems: forward security ] Example system based on this design: Algorand Big challenge: joining the system requires downloading lots of data Pretty much every decentralized ledger requires downloading all blocks, and maintaining a large amount of state to verify future blocks Would like to join the system quickly, without downloading all blocks Would like to participate without maintaining a large amount of state Sharding could reduce security: only subset of nodes can verify txns Still an open problem in general; some possible approaches here and there [ Ref: https://www.ndss-symposium.org/wp-content/uploads/2019/02/ndss2019_09-2_Leung_paper.pdf ] Big challenge: finding important applications / use cases Many possible technical challenges to address Which ones matter depends on what application matters No clear big winning use cases yet, though Challenge: supporting a large number of transactions Some workarounds, like payment channels [ Ref: https://lightning.network/lightning-network-paper.pdf ] Some form of sharding seems inevitable for high on-chain volume Challenge: operations that span different ledgers (cross-chain transactions) Some clever tricks using Hashed Time-Locked Contracts (HTLC) https://en.bitcoin.it/wiki/Atomic_swap May want to have stronger support, though Challenge: privacy for transactions Most distributed ledgers make all contents public Some work on supporting "encrypted transactions" [ Ref: https://z.cash/ ] Operational challenges for decentralized ledgers Monitoring, availability, fixes, .. Agreeing on updates, governance Having many participants makes it time-consuming to agree on changes Tangentially related example: Signal not federated to speed updates Operational challenges for end-users Users now in charge of managing critical cryptographic keys Many examples of mistakes: users have improper backups, client-side software is buggy, client device is compromised, users fall for phishing attacks, ..