Denial of Service ================= what kind of DoS attacks might an attacker mount? try to exhaust any resource the server might have network bandwidth CPU time (e.g. expensive server-side image processing, searches, ..) disk bandwidth (e.g. database queries) disk space memory space try to take advantage of insufficient authentication 802.11 "deauthenticate" packets not authenticated attacker can disconnect users from their wireless AP BGP performs no authentication on updates: pakistan announces youtube network bandwidth DoS attacks: all about amplification attacker on his own unlikely to have much more resources than server original trick: smurf, send src-spoofed ping to broadcast address more practical today: src-spoofed UDP DNS queries higher-level amplification: botnets CPU time attacks complex web apps often do quite a bit of computation for some requests e.g. SSL handshake, generate PDF, find airline tickets, .. attackers have actually figured this out for various sites and used it disk bandwidth attacks disk is often the slowest part of the system (100 seeks per second?) systems optimized to avoid disk whenever possible, using caches, etc but what if an attacker enters the picture? can ask unpopular queries all of a sudden, a cache is not so effective, many queries hit disk hard to control / predict state-exhaustion attacks (disk space, memory) once the user is authenticated, quotas are plausible (esp. for disk space) before authentication, one approach is to ask client to maintain state cryptographically seal it, so that server can verify it at a later time (one worry is freshness -- client might be able to supply an old state) is DoS a big problem? extortion is common for certain classes of sites high-value, time-sensitive apps like online betting are victims how do we solve DoS? accountability to track down attackers becoming less practical with botnets, Tor, .. require authentication to access service but lowest level (IP) doesn't provide it by default require clients to prove they spent some resources too captchas cryptographic puzzles given challenge C, find response R such that SHA1(C+R)'s low N bits are zero this gives an advantage to fast CPUs (bad for mobile devices) there's more memory-intensive puzzles that may be more "fair" determine when to use challenge (and parameter N) based on load money (micropayments?) some proposals of this form to avoid spam email ("e-stamps") bandwidth (speakup) why is accountability (IP traceback) hard? IP source address not authenticated, can be spoofed what kinds of things can an attacker spoof in this manner? simple IP packets (go for network bandwidth/capacity exhaustion) UDP/ICMP packets (smurf, DNS amplification attacks earlier) what about TCP connections? not quite, requires guessing sequence number some TCP stacks used to have predictable sequence #s (bad!) however, server keeps state for a half-open TCP connection was the basis for a "syn flood" attack, using up lots of server RAM cute solution: TCP cookies: sequence number is a MAC of connection data MAC(client and server addresses and ports, timestamp) + timestamp send SYN+ACK back with sequence number computed as above can verify when the third packet (ACK) comes back, without state slight problem: if third packet lost, client will not re-xmit! (assumes, rightfully, that server should retransmit 2nd packet) problem: given lots of packets flowing towards you, how to find the source? typical constraints (in increasing order): routers are hard to change routers can't do a lot of processing per packet end-hosts are hard to change packet formats are nearly impossible to change common at the time: input debugging operators find some pattern in packets (e.g. dst addr) run something like tcpdump on their router links looking for pattern repeat once figured out what other router packets are coming from requires no protocol support, but slow, tedious, not scalable, etc potential alternative: controlled flooding instead of doing input debugging, flood specific network links see which links cause attack to fluctuate (i.e. attack is on that link) clever but causes its own problems (flooding), not great for DDoS other alternatives: logging, ICMP traceback logging requires too much storage and runtime overhead ICMP traceback messages reliable, hard to authenticate, easy to spoof this paper's approach: mark packets as they're going through the network requires some support in routers, but not a lot of work per packet big challenge is how to do this without changing IP packet format what can we hope to get from packet marking? figure 1 no public-key infrastructure between routers at best, can hope to get a suffix of the attack path i.e. attacker can make up his own "fake" routers how should packet marking work? ideal: append router's node ID at each hop limitation: requires unbounded space in packet node sampling: each router stamps its ID into fixed field with some prob. is this actually enough? surprisingly, maybe yes why p>0.5? limitation: only works for single attack sources edge sampling: mark packet with IDs of router pairs across a link + distance can reconstruct all links taken by various packets with some (small) probability, overwrite marking info with this router's otherwise attacker can pre-mark all packets to try to avoid marking if router sees incoming packet with distance 0, appends its ID limitation: every router has to increment distance value i.e. must change every router before this is really useful why do two routers have to mark the edge? can we just have one router do it? router knows where it's about to forward the packet problem is, the next hop (neighbor) might have many IP addresses one for each interface, and the next interface will likely have diff addr makes it hard for victim to correlate traces don't know when two IPs belong to the same router how to make edge sampling work in practice? big problem: encoding edge info into IP packet ideal edge info would be 32-bit hop1, 32-bit hop2, 8-bit distance all we have is the fragment ID field -- 16 bits trick 1: edge-IDs XOR edge node IDs to compute an edge ID can reverse the XOR process if you know the node IDs of your neighbors continue XOR'ing for larger distances trick 2: error-correction scheme to know when you XOR'ed the right things node ID is bit-interleaved with a known hash function of same node ID doubles the node ID size (32 bits -> 64 bits) but given a candidate node ID, can verify if it's a valid one trick 3: break up edge-IDs (64 bits) into fragments (e.g. 8 bits) mark packets with some fragment and that fragment's offset (3 bit) victim reconstructs edge IDs from fragments (potentially time-consuming) trick 4: what happens if the fragment-ID field is in use? observation: fragments not common, let's treat it as a separate case to mark fragmented packets, drop them with some small probability but if we dropped a fragmented packet, send an entire edge to dst avoid all the 3 previous tricks, so that fewer pkts needed aside: fragments pose interesting problems for firewalls fragmentation happens at IP level, firewalls might care about TCP packets suppose rule is, cannot open connections from outside to particular port requires looking at destination port & TCP flags (SYN) at once clever attacker can fragment packet: dst port in first, flags in second good firewalls must perform fragment re-assembly again requires keeping expensive state! DoS candidate victim limitation: every router must increment distance! possible alternative: use the TTL field, snapshot it in distance field? how much should we trust the traceback info? can't trust any data from further away than closest attacker attacker can synthesize arbitrary edge fragment; only distance is controlled if one attack node is near victim, will be difficult to trace other sources