Denial of Service
=================

what kind of DoS attacks might an attacker mount?
    try to exhaust any resource the server might have
	network bandwidth
	CPU time (e.g. expensive server-side image processing, searches, ..)
	disk bandwidth (e.g. database queries)
	disk space
	memory space
    try to take advantage of insufficient authentication
	802.11 "deauthenticate" packets not authenticated
	attacker can disconnect users from their wireless AP
	BGP performs no authentication on updates: pakistan announces youtube

network bandwidth DoS attacks: all about amplification
    attacker on his own unlikely to have much more resources than server
    original trick: smurf, send src-spoofed ping to broadcast address
    more practical today: src-spoofed UDP DNS queries
    higher-level amplification: botnets

CPU time attacks
    complex web apps often do quite a bit of computation for some requests
	e.g. SSL handshake, generate PDF, find airline tickets, ..
    attackers have actually figured this out for various sites and used it

disk bandwidth attacks
    disk is often the slowest part of the system (100 seeks per second?)
    systems optimized to avoid disk whenever possible, using caches, etc
    but what if an attacker enters the picture?  can ask unpopular queries
    all of a sudden, a cache is not so effective, many queries hit disk
    hard to control / predict

state-exhaustion attacks (disk space, memory)
    once the user is authenticated, quotas are plausible (esp. for disk space)
    before authentication, one approach is to ask client to maintain state
    cryptographically seal it, so that server can verify it at a later time
    (one worry is freshness -- client might be able to supply an old state)

is DoS a big problem?
    extortion is common for certain classes of sites
    high-value, time-sensitive apps like online betting are victims

how do we solve DoS?
    accountability to track down attackers
	becoming less practical with botnets, Tor, ..
    require authentication to access service
	but lowest level (IP) doesn't provide it by default
    require clients to prove they spent some resources too
	captchas
	cryptographic puzzles
	    given challenge C, find response R such that
		SHA1(C+R)'s low N bits are zero
	    this gives an advantage to fast CPUs (bad for mobile devices)
	    there's more memory-intensive puzzles that may be more "fair"
	    determine when to use challenge (and parameter N) based on load
	money (micropayments?)
	    some proposals of this form to avoid spam email ("e-stamps")
	bandwidth (speakup)

why is accountability (IP traceback) hard?
    IP source address not authenticated, can be spoofed
    what kinds of things can an attacker spoof in this manner?
	simple IP packets (go for network bandwidth/capacity exhaustion)
	UDP/ICMP packets (smurf, DNS amplification attacks earlier)
    what about TCP connections?
	not quite, requires guessing sequence number
	    some TCP stacks used to have predictable sequence #s (bad!)
	however, server keeps state for a half-open TCP connection
	was the basis for a "syn flood" attack, using up lots of server RAM
	cute solution: TCP cookies: sequence number is a MAC of connection data
	    MAC(client and server addresses and ports, timestamp) + timestamp
	    send SYN+ACK back with sequence number computed as above
	    can verify when the third packet (ACK) comes back, without state
	    slight problem: if third packet lost, client will not re-xmit!
	    (assumes, rightfully, that server should retransmit 2nd packet)

problem: given lots of packets flowing towards you, how to find the source?
    typical constraints (in increasing order):
	routers are hard to change
	routers can't do a lot of processing per packet
	end-hosts are hard to change
	packet formats are nearly impossible to change
    common at the time: input debugging
	operators find some pattern in packets (e.g. dst addr)
	run something like tcpdump on their router links looking for pattern
	repeat once figured out what other router packets are coming from
	requires no protocol support, but slow, tedious, not scalable, etc
    potential alternative: controlled flooding
	instead of doing input debugging, flood specific network links
	see which links cause attack to fluctuate (i.e. attack is on that link)
	clever but causes its own problems (flooding), not great for DDoS
    other alternatives: logging, ICMP traceback
	logging requires too much storage and runtime overhead
	ICMP traceback messages reliable, hard to authenticate, easy to spoof
    this paper's approach: mark packets as they're going through the network
	requires some support in routers, but not a lot of work per packet
	big challenge is how to do this without changing IP packet format

what can we hope to get from packet marking?
    figure 1
    no public-key infrastructure between routers
    at best, can hope to get a suffix of the attack path
	i.e. attacker can make up his own "fake" routers

how should packet marking work?
    ideal: append router's node ID at each hop
	limitation: requires unbounded space in packet
    node sampling: each router stamps its ID into fixed field with some prob.
	is this actually enough?  surprisingly, maybe yes
	why p>0.5?
	limitation: only works for single attack sources
    edge sampling: mark packet with IDs of router pairs across a link + distance
	can reconstruct all links taken by various packets
	with some (small) probability, overwrite marking info with this router's
	    otherwise attacker can pre-mark all packets to try to avoid marking
	if router sees incoming packet with distance 0, appends its ID
	limitation: every router has to increment distance value
	    i.e. must change every router before this is really useful

why do two routers have to mark the edge?  can we just have one router do it?
    router knows where it's about to forward the packet
    problem is, the next hop (neighbor) might have many IP addresses
    one for each interface, and the next interface will likely have diff addr
    makes it hard for victim to correlate traces
    don't know when two IPs belong to the same router

how to make edge sampling work in practice?
    big problem: encoding edge info into IP packet
	ideal edge info would be 32-bit hop1, 32-bit hop2, 8-bit distance
	all we have is the fragment ID field -- 16 bits
    trick 1: edge-IDs
	XOR edge node IDs to compute an edge ID
	can reverse the XOR process if you know the node IDs of your neighbors
	continue XOR'ing for larger distances
    trick 2: error-correction scheme to know when you XOR'ed the right things
	node ID is bit-interleaved with a known hash function of same node ID
	doubles the node ID size (32 bits -> 64 bits)
	but given a candidate node ID, can verify if it's a valid one
    trick 3: break up edge-IDs (64 bits) into fragments (e.g. 8 bits)
	mark packets with some fragment and that fragment's offset (3 bit)
	victim reconstructs edge IDs from fragments (potentially time-consuming)
    trick 4: what happens if the fragment-ID field is in use?
	observation: fragments not common, let's treat it as a separate case
	to mark fragmented packets, drop them with some small probability
	but if we dropped a fragmented packet, send an entire edge to dst
	    avoid all the 3 previous tricks, so that fewer pkts needed

aside: fragments pose interesting problems for firewalls
    fragmentation happens at IP level, firewalls might care about TCP packets
    suppose rule is, cannot open connections from outside to particular port
    requires looking at destination port & TCP flags (SYN) at once
    clever attacker can fragment packet: dst port in first, flags in second
    good firewalls must perform fragment re-assembly
	again requires keeping expensive state!  DoS candidate victim

limitation: every router must increment distance!
    possible alternative: use the TTL field, snapshot it in distance field?

how much should we trust the traceback info?
    can't trust any data from further away than closest attacker
    attacker can synthesize arbitrary edge fragment; only distance is controlled
    if one attack node is near victim, will be difficult to trace other sources