Tor
===

aside: interesting SSL (TLS) vulnerability recently discovered

what's the goal of the paper?
  anonymity for clients, which want to connect to servers on the internet
  anonymity for servers, which want to service requests from users

what's the main idea for how to achieve anonymity?
  intermediate node in the network relays communication

why do we need more than one node?
  scalability: but could have many relay nodes running independently
  compromise: attacker learns identity of everyone using that relay node
    if many independent nodes, compromises only a fraction of traffic
    if using onion routing, attacker must compromise all nodes in the chain!
  traffic analysis: attacker can correlate traffic timing or volume on input/output
    need to chain in order to prevent timing/volume attacks
    can attacker still succeed?  yes, if they observe enough nodes
      e.g. observe user's connection and all Tor exit nodes?
    attacker can also inject timing info (by delaying packets)

how does onion routing work?
  at a high level, there's a mesh of relay nodes in the network
  assume client knows the public keys of all relay nodes
  client picks some path through this network
  encrypts message in public key of each node in path in turn
  sends message to first node in path, which decrypts & relays, etc
  each node only knows previous & next hop, not ultimate src & dst
  "exit node" (last in path) sends the data out into the real network

at what level should we relay things?  i.e., what's in these messages?
  could do any level -- IP packets, TCP connections, application-level (HTTP)
  what's the advantage / disadvantage?
    lower-level (IP): more general, fewer app changes, works with more apps
    higher-level (TCP, HTTP): more efficient, more anonymous
  what does Tor do?
    TCP-level relaying, using SOCKS (intercepts libc calls)
  examples of efficiency?
    no need to do TCP flow-control, rexmit thru Tor
  examples of lost generality?
    UDP doesn't work, can't traceroute, ..
  how does DNS work with Tor, if no UDP support?
    SOCKS can capture the destination's hostname, not just IP address
    exit node performs DNS lookup, establishes TCP connection
  examples of anonymity that's lost at lower layers?
    if we did IP, would leak lots of TCP info (seq#, timestamp)
    if we did TCP, would leak all kinds of HTTP headers and cookies
    if we did HTTP, can still get bitten by javascript code etc
      turns out lots of very identifiable features in JS environment
      browser version, history sniffing, local network addrs/servers..

Tor design
  mesh of ORs (onion routers)
  every OR has an open SSL/TLS connection to every other OR
  every OR's "identity key" (public key) globally known
  OR uses an onion key to interact with users
    why not the identity key?  forward secrecy
    OR signs its current onion key with its identity key

why does Tor need a directory?
  someone needs to approve OR nodes (otherwise attacker can inject lots)
  does a directory compromise anonymity?
    no, don't need to query it online
  what if a directory is compromised?
    clients require majority of directories
  what if many directories are compromised?
    attacker can inject many ORs
  what if directories are out-of-sync?
    attacker might be able to narrow down user's identity based on dir info
    user that saw one set of directory messages will use certain ORs..

Tor relaying
  two new terms: circuit and stream
  circuit is a path through ORs that a client builds up
    circuits stick around for a while (perhaps a few minutes)
    new circuits opened periodically to foil attacks on long-lived streams
  stream is a TCP connection
    many streams run over the same circuit

how does the client build up a circuit?
  picks the sequence of ORs it wants to use for its circuit
  connects to the first one, issues a create operation to create circuit
  create performs DH key-exchange to set up circuit encryption key
  how do we authenticate each end in this DH key exchange?
    client authenticates the OR by knowing its public key
      gets back a hash of the negotiated key, which proves the OR decrypted the DH message
    OR does not authenticate client -- anonymity!
  ask OR at the end of current circuit to extend by adding another OR
    currently-last OR sends a create message to newly-added OR
    user supplies DH key-exchange message

why is everything in fixed-size cells?
  making traffic analysis harder
  nice property: ORs don't know if they're relaying data, building up circuit, etc

what state does each OR keep for each circuit that passes through it?
  circuit ID and neighbor OR for two directions in the circuit (to/from OP)
    (OP = onion proxy, i.e. the client's Tor software)
  shared key with OP for this circuit and this OR

what does the OR do with data packets passing through it?
  if it's coming from OP's direction, decrypt and forward away from OP
  if it's coming from the other direction, encrypt and forward towards OP

can we avoid storing all of this state in the network?
  not without having to provide a variable-length path descriptor in each cell
  the exit node would likewise need a path descriptor to know how to send back

how does a node know if a message is meant for it, or if it should forward it?
  uses checksum!  if checksum matches, meant for me; if not, keep forwarding
  nice property: no need to encode path length, next hop, recipient in pkt
    in fact, no one other than the ultimate destination should know recipient
  another nice property: packet size independent of path length

why does Tor need exit policies?  how do they work?
  preventing abuse
  exit policy published in directory along with other node info

what happens when OP wants to establish a new stream (TCP conn)?
  finds last node in current circuit with acceptable exit policy
  picks a stream ID (which will be unique between OP & exit node)
  computes checksum for that particular exit node

what if Tor didn't do integrity checking?
  need integrity to prevent a tagging attack
  attacker compromises internal node, corrupts data packets
  corrupted packets will eventually get sent out, can watch where they go

how does Tor prevent replays?
  each checksum is actually checksum of all previous cells between OP & OR
  checksum for same data sent again would be different
  works well because underlying transport is reliable (SSL/TLS over TCP)

how do anonymous servers work?
  introduction point
  rendezvous point

why the split between introduction and rendezvous point?
  avoid placing traffic load on introduction points
  another worry: might worry about serving known-illegal data
    introduction point does not relay data
    rendezvous point doesn't know what data it's relaying

why does Bob connect back to Alice?
  admission control, spread load over many rendezvous points

what's the rendezvous cookie?
  lets Bob prove to Alice's RP that it's Bob

what's the authorization cookie?  (name: cookie.pubkey.onion)
  something that might compel Bob to reply, when he otherwise wouldn't
  maybe a secret word most people don't know
  as a result they can't DoS Bob's server much (just send lots of cookies)

end picture: two circuits bridged at the RP
  bridged data is encrypted using key shared between Alice & Bob (DH)
  each can control their own level of anonymity
  neither knows the full path of the other circuit
  RP decrypts messages from one circuit, encrypts them, and sends them into the other circuit

what are the potential pitfalls when using Tor?
  application-level leaks (javascript, HTTP headers, etc -- use an app proxy)
  fingerprinting based on Tor client behavior (how often to open new circuit)
  timing/volume analysis: partial defense is to run your own Tor OR?
  fingerprinting web sites: number of requests & file sizes of popular sites
    quantization from fixed-size cells helps a bit
  malicious ORs: join network, advertise lots of bandwidth or open exit policy
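
appendix: toy sketch of layered cells and the running checksum

The relaying notes above (per-hop shared keys, "if checksum matches, meant for me", and replay prevention via a checksum over all previous cells) can be sketched roughly as follows. This is a minimal illustration under loud assumptions, not Tor's actual cell format or cryptography: the "cipher" is a SHA-256-based XOR pad standing in for a real stream cipher, the per-hop keys are assumed to already exist (no DH handshake is shown), and the names HopState, op_build_cell, or_process, relay, and the sizes BODY_LEN and DIGEST_LEN are all hypothetical.

```python
import hashlib

BODY_LEN = 60      # hypothetical fixed body size, so every cell has equal length
DIGEST_LEN = 4     # hypothetical truncated-checksum size
CELL_LEN = DIGEST_LEN + BODY_LEN


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class HopState:
    """State shared by the OP and one OR for one circuit (toy version)."""

    def __init__(self, key: bytes):
        self.key = key
        self.running = hashlib.sha256(key)  # digest over all recognized cells so far
        self.cells_seen = 0                 # selects a fresh pad for each cell

    def next_pad(self, length: int) -> bytes:
        # Toy keystream: NOT a real cipher, only a stand-in for illustration.
        out, block = b"", 0
        while len(out) < length:
            seed = self.key + self.cells_seen.to_bytes(8, "big") + block.to_bytes(8, "big")
            out += hashlib.sha256(seed).digest()
            block += 1
        self.cells_seen += 1
        return out[:length]


def op_build_cell(op_hops, target: int, data: bytes) -> bytes:
    """OP side: checksum for the target hop, then one layer per hop up to it."""
    assert len(data) <= BODY_LEN
    body = data.ljust(BODY_LEN, b"\0")
    hop = op_hops[target]
    hop.running.update(body)
    cell = hop.running.digest()[:DIGEST_LEN] + body
    for h in reversed(op_hops[: target + 1]):
        cell = xor(cell, h.next_pad(len(cell)))
    return cell


def or_process(hop: HopState, cell: bytes):
    """OR side: peel one layer; if the checksum matches, the cell is for us."""
    plain = xor(cell, hop.next_pad(len(cell)))
    tag, body = plain[:DIGEST_LEN], plain[DIGEST_LEN:]
    trial = hop.running.copy()
    trial.update(body)
    if trial.digest()[:DIGEST_LEN] == tag:
        hop.running = trial        # commit the running checksum only on a match
        return True, body
    return False, plain            # not ours: forward the peeled cell onward


def relay(or_hops, cell: bytes):
    """Pass a cell along the circuit until some OR recognizes it."""
    for i, hop in enumerate(or_hops):
        recognized, cell = or_process(hop, cell)
        if recognized:
            return i, cell.rstrip(b"\0")
    return None, None              # nobody recognized it (e.g. replayed or corrupted)


if __name__ == "__main__":
    keys = [b"key-or1", b"key-or2", b"key-or3"]   # assumed output of circuit setup
    op_side = [HopState(k) for k in keys]
    or_side = [HopState(k) for k in keys]
    print(relay(or_side, op_build_cell(op_side, 2, b"begin example.com:80")))
```

Note that the cell carries no path length, next hop, or recipient field: each OR just peels its layer and checks the checksum, and the cell size is the same whatever the path length. Because the checksum and the pad both depend on everything sent earlier, the same data sent twice yields different cells, and a cell replayed later is not recognized by any hop. This only works if cells arrive reliably and in order, which matches the note about the underlying transport being SSL/TLS over TCP.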