Tor
===

aside: interesting SSL (TLS) vulnerability recently discovered

what's the goal of the paper?
    anonymity for clients, which want to connect to servers on the internet
    anonymity for servers, which want to service requests from users

what's the main idea for how to achieve anonymity?
    intermediate node in the network relays communication

why do we need more than one node?
    scalability: but could have many relay nodes running independently
    compromise: attacker learns identity of everyone using that relay node
	if many independent nodes, compromises only a fraction of traffic
	if using onion routing, attacker must compromise all nodes in the chain!
    traffic analysis:
	attacker can correlate traffic timing or volume on input/output
	need to chain in order to prevent timing/volume attacks
	can attacker still succeed?  yes, if they observe enough nodes
	    e.g. observe user's connection and all Tor exit nodes?
	attacker can also inject timing info (by delaying packets)
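
to see why timing/volume correlation is a threat, here is a minimal sketch of
what a passive attacker might compute: bucket the packets seen on a user's
link and on each candidate exit flow into time windows, then correlate the
counts.  the 1-second window, the sample timestamps, and the function names
are all made up for illustration; real attacks are more sophisticated.

```python
import math

def bucket_counts(timestamps, window, n_buckets):
    """Count packets per time window (timestamps in seconds)."""
    counts = [0] * n_buckets
    for t in timestamps:
        i = int(t // window)
        if 0 <= i < n_buckets:
            counts[i] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length count vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat flow carries no timing signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical observations: one user's entry traffic and two exit flows.
user_in = [0.1, 0.2, 0.3, 2.1, 2.2, 5.0, 5.1, 5.2, 5.3]
exit_x = [t + 0.4 for t in user_in]            # same pattern, 0.4 s later
exit_y = [1.0, 1.5, 3.1, 3.2, 3.3, 4.0, 6.5]   # unrelated flow

a = bucket_counts(user_in, 1.0, 8)
score_x = pearson(a, bucket_counts(exit_x, 1.0, 8))  # close to 1
score_y = pearson(a, bucket_counts(exit_y, 1.0, 8))  # negative here
```

the flow that merely adds latency still lines up with the user's bursts, which
is why relaying alone does not hide who is talking to whom.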

how does onion routing work?
    at a high level, there's a mesh of relay nodes in the network
    assume client knows the public keys of all relay nodes
    client picks some path through this network
    encrypts message with the public key of each node in path, last hop first
    sends message to first node in path, which decrypts & relays, etc
    each node only knows previous & next hop, not ultimate src & dst
    "exit node" (last in path) sends the data out into the real network

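the layering can be sketched as follows.  this is a toy, not Tor's code: one
symmetric key per hop stands in for "encrypt to that node's public key", and
the cipher is a SHA-256 keystream XORed into the data (an assumption chosen so
the example needs only the standard library).  with XOR the layer order does
not actually matter; with a real cipher the client must wrap for the last hop
first so that the first hop can peel the outermost layer.

```python
import hashlib

def keystream(key, n):
    """Toy keystream: SHA-256 in counter mode (NOT a vetted cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(key, data):
    """Add or remove one layer (XOR is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def wrap(path_keys, message):
    """Client: add one layer per hop, innermost layer for the last hop."""
    for key in reversed(path_keys):
        message = xor_layer(key, message)
    return message

def peel_along_path(path_keys, onion):
    """Return what each hop holds after removing its own layer."""
    seen = []
    for key in path_keys:
        onion = xor_layer(key, onion)
        seen.append(onion)
    return seen

# Hypothetical 3-hop path.
keys = [b"entry-key", b"middle-key", b"exit-key"]
msg = b"GET / HTTP/1.0"
views = peel_along_path(keys, wrap(keys, msg))
```

only the last entry of `views` (what the exit node holds) equals the original
message; the entry and middle nodes see data that is still encrypted.
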
at what level should we relay things?  i.e., what's in these messages?
    could do any level -- IP packets, TCP connections, application-level (HTTP)
    what's the advantage / disadvantage?
	lower-level (IP): more general, fewer app changes, works with more apps
	higher-level (TCP, HTTP): more efficient, more anonymous
    what does Tor do?  TCP-level relaying, via SOCKS proxy (or wrapper that intercepts libc calls)
	examples of efficiency?  no need to do TCP flow-control, rexmit thru Tor
	examples of lost generality?  UDP doesn't work, can't traceroute, ..
	how does DNS work with Tor, if no UDP support?
	    SOCKS can capture the destination's hostname, not just IP address
	    exit node performs DNS lookup, establishes TCP connection
	examples of anonymity that's lost at lower layers?
	    if we did IP, would leak lots of TCP info (seq#, timestamp)
	    if we did TCP, would leak all kinds of HTTP headers and cookies
	    if we did HTTP, can still get bitten by javascript code etc
		turns out lots of very identifiable features in JS environment
		browser version, history sniffing, local network addrs/servers..

Tor design
    mesh of ORs (onion routers)
    every OR has an open SSL/TLS connection to every other OR
    every OR's "identity key" (public key) globally known
    OR uses an onion key to interact with users
	why not the identity key?  forward secrecy
    OR signs its current onion key with its identity key

why does Tor need a directory?
    someone needs to approve OR nodes (otherwise attacker can inject lots)
    does a directory compromise anonymity?  no, don't need to query it online
    what if a directory is compromised?  clients require majority of directories
    what if many directories are compromised?  attacker can inject many ORs
    what if directories are out-of-sync?
	attacker might be able to narrow down user's identity based on dir info
	user that saw one set of directory messages will use certain ORs..
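
a minimal sketch of the "require a majority of directories" rule, assuming
each directory simply publishes the set of ORs it approves.  the function name
and the sample data are hypothetical.

```python
from collections import Counter

def trusted_ors(directory_views):
    """Keep only ORs listed by a strict majority of directories."""
    votes = Counter(or_id for view in directory_views for or_id in set(view))
    needed = len(directory_views) // 2 + 1
    return {or_id for or_id, n in votes.items() if n >= needed}

# One compromised directory tries to inject an attacker-run OR.
views = [{"A", "B"}, {"A", "B", "EVIL"}, {"A"}]
accepted = trusted_ors(views)
```

a single bad directory cannot get "EVIL" accepted, but a compromised majority
could.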

Tor relaying
    two new terms: circuit and stream
    circuit is a path through ORs that a client builds up
	circuits stick around for a while (perhaps a few minutes)
	new circuits opened periodically to foil attacks on long-lived streams
    stream is a TCP connection
	many streams run over the same circuit

how does the client build up a circuit?
    picks the sequence of ORs it wants to use for its circuit
    connects to the first one, issues a create operation to create circuit
	create performs DH key-exchange to set up circuit encryption key
	how do we authenticate each end in this DH key exchange?
	    client authenticates server by knowing its public key
	    gets back a hash of the negotiated key, which proves server decrypted DH message
	    server does not authenticate client -- anonymity!
    ask OR at the end of current circuit to extend by adding another OR
	currently-last OR sends a create message to newly-added OR
	user supplies DH key-exchange message
    why is everything in fixed-size cells?  making traffic analysis harder
    nice property:
	ORs don't know if they're relaying data, building up circuit, etc
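
sketch of the one-way-authenticated DH exchange in a create operation.
everything numeric here is a toy assumption: the DH group, the tiny textbook
RSA key standing in for the OR's onion key, and the function names.  the point
is only the shape of the exchange: client sends its DH half encrypted to the
OR, and the OR answers with its DH half plus a hash of the resulting key.

```python
import hashlib
import secrets

# Toy parameters, far too small and structured for real use.
P = 2**127 - 1                       # DH modulus (a Mersenne prime)
G = 3                                # DH base
RSA_N = (2**61 - 1) * (2**89 - 1)    # stand-in "onion key" modulus
RSA_E = 65537
RSA_D = pow(RSA_E, -1, (2**61 - 2) * (2**89 - 2))  # known only to the OR

def key_hash(shared):
    """Hash of the negotiated key, sent back as proof."""
    return hashlib.sha256(shared.to_bytes(16, "big") + b"handshake").digest()

def client_create():
    """Client: pick x, send g^x encrypted to the OR's onion key."""
    x = secrets.randbelow(P - 2) + 1
    return x, pow(pow(G, x, P), RSA_E, RSA_N)

def or_created(encrypted_gx):
    """OR: decrypt g^x, pick y, reply with g^y and a hash of the key."""
    gx = pow(encrypted_gx, RSA_D, RSA_N)
    y = secrets.randbelow(P - 2) + 1
    return pow(G, y, P), key_hash(pow(gx, y, P))

def client_finish(x, gy, proof):
    """Client: derive the key and check the OR's proof."""
    shared = pow(gy, x, P)
    if key_hash(shared) != proof:
        raise ValueError("OR did not prove knowledge of the key")
    return shared

x, create_msg = client_create()
gy, proof = or_created(create_msg)
circuit_key = client_finish(x, gy, proof)
```

only the holder of the onion private key can recover g^x and so compute a
matching hash; the client proves nothing about itself.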

what state does each OR keep for each circuit that passes through it?
    circuit ID and neighbor OR for two directions in the circuit (to/from OP)
    shared key with OP for this circuit and this OR
    what does the OR do with data packets passing through it?
	if it's coming from OP's direction, decrypt and forward away from OP
	if it's coming from the other direction, encrypt and forward towards OP
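
the per-circuit state could be organized roughly like this.  the class names,
the (neighbor, circuit ID) tuples, and the assumption that circuit IDs are
local to each link are illustrative choices, not taken from Tor's source.

```python
from dataclasses import dataclass

@dataclass
class CircuitEntry:
    toward_op: tuple        # (neighbor, circuit ID) on the OP side
    away_from_op: tuple     # (neighbor, circuit ID) on the far side
    key: bytes              # key shared between this OR and the OP

class CircuitTable:
    def __init__(self):
        self._by_link = {}

    def add(self, entry):
        # Cells arriving from the OP side get a layer removed;
        # cells arriving from the far side get a layer added.
        self._by_link[entry.toward_op] = ("decrypt", entry)
        self._by_link[entry.away_from_op] = ("encrypt", entry)

    def route(self, neighbor, circ_id):
        """Return (crypto action, outgoing link, key) for an incoming cell."""
        action, entry = self._by_link[(neighbor, circ_id)]
        out = entry.away_from_op if action == "decrypt" else entry.toward_op
        return action, out, entry.key

table = CircuitTable()
table.add(CircuitEntry(("OR1", 7), ("OR3", 12), b"k"))
```

here `table.route("OR1", 7)` says to decrypt and send on to ("OR3", 12), and
`table.route("OR3", 12)` says to encrypt and send back to ("OR1", 7).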

can we avoid storing all of this state in the network?
    not without having to provide a variable-length path descriptor in each cell
    the exit node would likewise need a path descriptor to know how to send back

how does a node know if a message is meant for it, or if it should forward it?
    uses checksum!  if checksum matches, meant for me; if not, keep forwarding
    nice property: no need to encode path length, next hop, recipient in pkt
	in fact, no one other than the ultimate destination should know recipient
    another nice property: packet size independent of path length
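
a toy sketch of recognition by checksum.  assumptions: a 64-byte payload, an
8-byte keyed SHA-256 digest, and an XOR keystream as the per-hop cipher; none
of these sizes or names come from Tor.  the OP computes the digest with the
key of the hop it is addressing, then adds layers for that hop and every hop
before it.  each OR removes one layer and checks whether the digest now
verifies under its own key.

```python
import hashlib

CELL_PAYLOAD = 64   # toy payload size
TAG_LEN = 8         # toy digest length

def keystream(key, n):
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_layer(key, data):
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def digest(key, payload):
    return hashlib.sha256(b"digest" + key + payload).digest()[:TAG_LEN]

def op_build_cell(hop_keys, target, data):
    """OP: address hop number `target`; cell size never depends on it."""
    if len(data) > CELL_PAYLOAD:
        raise ValueError("data does not fit in one cell")
    payload = data.ljust(CELL_PAYLOAD, b"\0")
    cell = digest(hop_keys[target], payload) + payload
    for key in reversed(hop_keys[:target + 1]):
        cell = xor_layer(key, cell)
    return cell

def or_process(key, cell):
    """OR: peel one layer; a matching digest means the cell is for me."""
    cell = xor_layer(key, cell)
    tag, payload = cell[:TAG_LEN], cell[TAG_LEN:]
    if tag == digest(key, payload):
        return "recognized", payload.rstrip(b"\0")
    return "forward", cell

def send_along(hop_keys, cell):
    """Walk the circuit until some hop recognizes the cell."""
    for i, key in enumerate(hop_keys):
        verdict, cell = or_process(key, cell)
        if verdict == "recognized":
            return i, cell
    return None, cell

keys = [b"hop0", b"hop1", b"hop2"]
hop, data = send_along(keys, op_build_cell(keys, 1, b"hello"))
```

the cell addressed to hop 1 is forwarded by hop 0 (digest looks random there)
and recognized at hop 1, and it has the same length as a cell for any other hop.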

why does Tor need exit policies?  how do they work?
    preventing abuse
    exit policy published in directory along with other node info
    what happens when OP wants to establish a new stream (TCP conn)?
	finds last node in current circuit with acceptable exit policy
	picks a stream ID (which will be unique between OP & exit node)
	computes checksum for that particular exit node
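
sketch of how an OP might pick the exit for a new stream.  the rule format
(action, network, port), the first-match-wins semantics, and the
default-reject fallback are assumptions made for this example.

```python
import ipaddress

def allows(policy, addr, port):
    """First matching rule decides; port None means any port."""
    ip = ipaddress.ip_address(addr)
    for action, network, rule_port in policy:
        if ip in ipaddress.ip_network(network) and rule_port in (None, port):
            return action == "accept"
    return False  # assumed default when no rule matches

def pick_exit(circuit, policies, addr, port):
    """Last node in the circuit whose policy accepts the destination."""
    for node in reversed(circuit):
        if allows(policies[node], addr, port):
            return node
    return None

# Hypothetical policies, as they might appear in directory info.
policies = {
    "or1": [("reject", "0.0.0.0/0", None)],
    "or2": [("accept", "0.0.0.0/0", 80), ("reject", "0.0.0.0/0", None)],
    "or3": [("reject", "0.0.0.0/0", 25), ("accept", "0.0.0.0/0", None)],
}
circuit = ["or1", "or2", "or3"]
web_exit = pick_exit(circuit, policies, "93.184.216.34", 80)
mail_exit = pick_exit(circuit, policies, "93.184.216.34", 25)
```

the web stream exits at or3; no node on this circuit will carry the port-25
stream, so the OP would need a different circuit.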

what if Tor didn't do integrity checking?
    need integrity to prevent a tagging attack
    attacker compromises internal node, corrupts data packets
    corrupted packets will eventually get sent out, can watch where they go

how does Tor prevent replays?
    each checksum is actually a checksum of all previous cells between OP & OR
    checksum for same data sent again would be different
    works well because underlying transport is reliable (SSL/TLS over TCP)
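
the running checksum can be sketched with an incrementally updated hash.  the
class names, the 8-byte tag, and the seed are assumptions for illustration.

```python
import hashlib

TAG_LEN = 8  # toy tag length

class Sender:
    def __init__(self, seed):
        self._h = hashlib.sha256(seed)

    def tag(self, payload):
        """Digest over every payload sent so far, including this one."""
        self._h.update(payload)
        return self._h.digest()[:TAG_LEN]

class Receiver:
    def __init__(self, seed):
        self._h = hashlib.sha256(seed)

    def accept(self, payload, tag):
        """Accept only if the tag matches the history plus this payload."""
        trial = self._h.copy()
        trial.update(payload)
        if trial.digest()[:TAG_LEN] != tag:
            return False          # state is left unchanged on rejection
        self._h = trial
        return True

tx, rx = Sender(b"shared-seed"), Receiver(b"shared-seed")
t1 = tx.tag(b"data")
first = rx.accept(b"data", t1)     # fresh cell
replayed = rx.accept(b"data", t1)  # same cell again
t2 = tx.tag(b"data")               # legitimately sending the same bytes twice
second = rx.accept(b"data", t2)
```

the replayed cell is rejected because the receiver's history has moved on,
while a genuine second send of the same bytes gets a different tag and passes.
this relies on in-order, lossless delivery, as the notes point out.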

how do anonymous servers work?
    introduction point
    rendezvous point
    why the split between introduction and rendezvous point?
	avoid placing traffic load on introduction points
	another worry: a node might not want to be seen serving known-illegal data
	introduction point does not relay data
	rendezvous point doesn't know what data it's relaying
    why does Bob connect back to Alice?
	admission control, spread load over many rendezvous points
    what's the rendezvous cookie?  lets Bob prove to Alice's RP that it's Bob
    what's the authorization cookie?  (name: cookie.pubkey.onion)
	something that might compel Bob to reply, when he otherwise wouldn't
	maybe a secret word most people don't know
	as a result, attackers can't DoS Bob's server much (can only send lots of cookies)
    end picture: two circuits bridged at the RP
	bridged data is encrypted using key shared between Alice & Bob (DH)
	each can control their own level of anonymity
	neither knows the full path of the other circuit
	RP decrypts messages from one circuit, encrypts them,
	    and sends them into the other circuit

what are the potential pitfalls when using Tor?
    application-level leaks (javascript, HTTP headers, etc -- use an app proxy)
    fingerprinting based on Tor client behavior (how often to open new circuit)
    timing/volume analysis: partial defense is to run your own Tor OR?
    fingerprinting web sites: number of requests & file sizes of popular sites
	quantization from fixed-size cells helps a bit
    malicious ORs: join network, advertise lots of bandwidth or open exit policy
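
on the web-site fingerprinting point above: a small sketch of how fixed-size
cells quantize observable sizes.  the 498-byte usable payload per cell is an
assumption for this example, and the response sizes are made up.

```python
CELL_PAYLOAD = 498  # assumed usable bytes per cell

def cells_for(n_bytes):
    """Number of cells an observer would see for a transfer of n_bytes."""
    return -(-n_bytes // CELL_PAYLOAD)  # ceiling division

sizes = [1000, 1400, 1500]
observed = [cells_for(n) for n in sizes]
```

the 1000-byte and 1400-byte responses both show up as 3 cells, so they look
alike on the wire, but the 1500-byte one needs 4; quantization blurs sizes
without hiding them.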