TCP/IP security
===============

next few lectures -- network security
  a big open network (the Internet) invites many attacks
  * authentication
  * liveness
  * privacy
  both host/host and inside network (routing, dns, arp, &c)
  today: attacks on network protocols
  later: host/host cryptographic solutions

why are we looking at old attacks?
  core Internet protocols were designed in late 1970s / 1980s
    network was small; stakes were low; cryptography was expensive
  surely old attacks on ancient protocols are no longer relevant?
    surely modern protocols are vastly more secure?
  no-one knows how to do fundamentally better than TCP/IP
  much progress in secure higher layers: Kerberos, SSH, SSL/TLS
  lots of fixes for specific low-level problems
  but the basic network-level security properties haven't changed much
    so you have to understand them

example application: remote login ca. 1980
  in 1980, TCP but no cryptography -- like many applications today
  telnet -- just opens a TCP connection to login program
    what can an attacker do?
    * steal the password &c by snooping on the network
    * modify the data in flight
    * inject false data
    * re-direct entire conversation via routing
    all would have been hard on the ARPANET,
    but the advent of Ethernet made password sniffing a real danger
  rlogin -- don't send password
    destination host has a list of trusted host names (.rhosts file)
    lets user log in w/o password if source host is on trusted list

why did rlogin seem OK?
  authors would not have claimed "secure" -- but perhaps "pretty good"
  big potential problem: attacker could put trusted client's IP
    address in the source address field
  but TCP communication involves *both* directions
    if attacker lies about source, then server's replies will
    not go back to the attacker, so the attacker won't be able
    to execute TCP correctly.

let's look at the details of TCP connection setup:
  Standard handshake (figure on the right side of page 2):
    C: SRC=C, DST=S, SYN(SNc)
    S: SRC=S, DST=C, SYN(SNs), ACK(SNc)
    C: SRC=C, DST=S, ACK(SNs)
    C: SRC=C, DST=S, data(SNc), ACK(SNs)
  The main point: set up initial sequence numbers for data packets.
  Why might one think the server can know it is talking to C?
    Only C should have been able to receive the second message.
    Thus, only C should know SNs.
    Server only accepts third message if it has the expected SNs.

TCP sequence number attack.
  Suppose adversary A wants to simulate a connection to S from C.
    (Assume A knows C's IP address -- usually not a big deal in practice.)
    A: SRC=C, DST=S, SYN(SNc)
    S: SRC=S, DST=C, SYN(SNs), ACK(SNc)
    A: SRC=C, DST=S, ACK(SNs)   -- but how to guess SNs?
    A: SRC=C, DST=S, data(SNc)
  How could the adversary guess SNs?
    Many hosts kept ISN variable, for use by next connection.
      Increment by 128 each second, 64 after each new connection.
      Helps avoid old packets from interfering with new connection.
      [ Ref: RFC 1185 appendix ]
    (ISN is "initial sequence number".)
    Adversary can make an ordinary connection to find out current ISN,
      then guess next one by adding 64.
  What happens to the real packet that S sends to C (second pkt)?
    C would assume the packet is from an old conn, send RST in response.
    Even if that RST was sent, adversary could try to race before RST arrives.
    Turns out attacker can suppress C; will get to that later.
  But why do sequence number attacks turn into a security problem?
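
Sketch of the ISN guessing step above, as a toy Python model. The increments
(128 per second, 64 per connection) are from the notes; the class and function
names are hypothetical, and no packets are sent. A minimal illustration, not a
model of any particular TCP stack.

```python
"""Toy model of old-style ISN generation and the attacker's guess."""

ISN_PER_SECOND = 128      # increment per second (from the notes)
ISN_PER_CONNECTION = 64   # increment per new connection (from the notes)
MOD = 2**32               # TCP sequence numbers are 32 bits


class OldStyleServer:
    """Hypothetical server that keeps one global ISN counter."""

    def __init__(self, start_isn=1000):
        self.isn = start_isn % MOD

    def tick(self, seconds):
        """Advance the counter for elapsed wall-clock time."""
        self.isn = (self.isn + ISN_PER_SECOND * seconds) % MOD

    def accept_syn(self):
        """Return SNs for a new connection, then bump the counter."""
        sns = self.isn
        self.isn = (self.isn + ISN_PER_CONNECTION) % MOD
        return sns


def predict_next_isn(observed_sns, elapsed_seconds=0):
    """Attacker's guess for the SNs of the next connection."""
    return (observed_sns + ISN_PER_CONNECTION
            + ISN_PER_SECOND * elapsed_seconds) % MOD


if __name__ == "__main__":
    server = OldStyleServer()
    observed = server.accept_syn()      # A's ordinary connection, as itself
    guess = predict_next_isn(observed)  # A's guess for the spoofed connection
    actual = server.accept_syn()        # SNs for SRC=C; the reply goes to C, not A
    print("guess matches:", guess == actual)
```

The attacker never sees the second handshake packet; it only needs the counter
to be predictable between its own connection and the spoofed one.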

1. Forging the IP source address to services that authenticate based on IP address.
  Attacker can pretend to be a host in rlogin trusted list, send commands
    without needing to know a password.
  rlogin made a bad assumption about what the TCP layer provided.
    Assumed TCP conn from an IP address meant it really came from that host.
  Actually rlogin authentication was even worse:
    rlogin server used reverse DNS lookup to get host name of connection source.
    Owner of reverse domain can set *any* host name for an IP address!
    Can make a slight improvement: check that host name resolves back to same addr.
  IP-based authentication seems like a bad plan!
    No longer used for remote login.
    But still used in other situations, since better security is complex.

2. Denial of service attack: connection reset.
  If we can guess SNc, can send a RST packet.
  Worse yet: server will accept a RST packet for any SNc value within window.
  With a large window (~32K=2^15), only need 2^32/2^15 = 2^17 guesses.
  How bad is a connection reset?
    One target of such attacks was the TCP connections between BGP routers.
    Causes routers to assume link failure, could affect traffic for minutes.
    Solutions:
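      TTL hack: peers send with TTL=255 and drop packets arriving with a lower TTL,
        so only a directly connected sender can get a packet accepted.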
      MD5 header authentication (very specialized for router-to-router links).
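
The guess count above, worked as a few lines of Python. The function names are
hypothetical, and the sketch assumes the attacker already knows the connection's
addresses and ports; the packet rate in the example is an arbitrary assumption.

```python
SEQ_SPACE = 2**32  # size of the TCP sequence number space


def rst_guesses_needed(window_bytes):
    """Evenly spaced guesses that guarantee one lands inside the window."""
    return -(-SEQ_SPACE // window_bytes)  # ceiling division


def seconds_to_reset(window_bytes, packets_per_second):
    """Worst-case time to sweep the whole sequence space at a given rate."""
    return rst_guesses_needed(window_bytes) / packets_per_second


if __name__ == "__main__":
    window = 2**15  # ~32K window, as in the notes
    print("guesses:", rst_guesses_needed(window))
    print("seconds at 10,000 pkts/s:", seconds_to_reset(window, 10_000))
```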

3. Hijack existing connections.
  If you can guess seq #s, can inject data into an existing connection.
  I.e. wait for someone to log in, then take over the connection.
  [ Ref: Blind TCP/IP hijacking is still alive, by lkm@phrack.org, 2007 ]

How to mitigate attacks that forge IP source addresses?
  Some applications now have end-to-end cryptographic authentication.
    E.g. SSH, SSL/TLS, Kerberos.
    Next lecture: Kerberos.
  ISPs can filter packets with obviously forged IP source addresses.
    Often done today for small customers.
    Not straightforward for customers with complex networks, multihoming, ...

How to harden TCP against forged IP source addresses?
  Make it harder for attacker to guess next ISN.
  Can't choose ISNs in a completely random way, without violating TCP spec.
    Need to avoid recently used sequence numbers for same host/port pair.
  Random increments?
    Can't increment too quickly; don't want to wrap very often.
    So not a huge amount of randomness (say, low 8 bits per increment).
  Aside: must be careful about how we generate random numbers!
    Common PRNG: linear congruential generator: R_k = A*R_{k-1}+B mod N.
    Not secure: given one output (and known A, B, N), can compute the next one!
    Lots of better cryptographically secure PRNGs are available.
      Ideally, use your kernel's built-in PRNG (/dev/random, /dev/urandom)
    [ Ref: http://en.wikipedia.org/wiki/Fortuna_(PRNG), or any stream cipher
      like http://en.wikipedia.org/wiki/RC4 ]
  However, SN values for different src/dst pairs never interact!
  So, can choose the ISN using a random offset for each src/dst pair.
    ISN = ISN_oldstyle + SHA1(srcip, srcport, dstip, dstport, secret)
    Requires no extra state to keep track of per-connection ISNs.
    The point: attacker can no longer make an ordinary connection in order
      to guess current ISN for a different client.
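
A minimal Python sketch of the per-connection offset formula above. SECRET and
the function names are hypothetical; the byte layout fed to SHA1 is an
assumption, since the notes only list which fields go into the hash.

```python
import hashlib
import ipaddress
import os
import struct

MOD = 2**32
SECRET = os.urandom(16)  # hypothetical per-boot secret, never sent on the wire


def connection_offset(src_ip, src_port, dst_ip, dst_port, secret=SECRET):
    """32-bit offset derived from the connection 4-tuple and the secret."""
    data = (ipaddress.ip_address(src_ip).packed + struct.pack("!H", src_port)
            + ipaddress.ip_address(dst_ip).packed + struct.pack("!H", dst_port)
            + secret)
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big")


def isn(isn_oldstyle, src_ip, src_port, dst_ip, dst_port, secret=SECRET):
    """ISN = ISN_oldstyle + SHA1(srcip, srcport, dstip, dstport, secret)."""
    offset = connection_offset(src_ip, src_port, dst_ip, dst_port, secret)
    return (isn_oldstyle + offset) % MOD


if __name__ == "__main__":
    counter = 1000  # stand-in for the old-style global counter
    print("ISN seen by A:", isn(counter, "10.0.0.66", 5000, "10.0.0.2", 513))
    print("ISN for C:    ", isn(counter + 64, "10.0.0.1", 1023, "10.0.0.2", 513))
```

Within one 4-tuple the ISN still advances with the old-style counter, so the
TCP requirement is kept; across 4-tuples the offsets differ by a value the
attacker cannot compute without the secret.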

Are forged source IP address attacks still relevant?
  Most operating systems implement the above per-connection ISN scheme.
    [ Ref: Linux secure_tcp_sequence_number in net/core/secure_seq.c ]
  But other protocols suffer from similar problems -- e.g., DNS.
    DNS runs over UDP, no seq numbers, just ports, and dst port fixed (53).
    Client does basic sanity checks on reply packet.
    If adversary knows client is making a query, can fake a response.
      Just need to guess client port and query ID, often predictable.
    Popular attack starting in 2008.
      [ Ref: http://cr.yp.to/djbdns/forgery.html ]
      [ Ref: http://unixwiz.net/techtips/iguide-kaminsky-dns-vuln.html ]
    Solution: carefully take advantage of all possible randomness!
      DNS queries contain 16-bit query ID, and can randomize ~16 bit src port.
    Solution: DNSSEC (signed DNS records, including missing records).
      Problem: key distribution (who is allowed to sign each domain?)
      Problem: name enumeration (signed "no such name" answers reveal existing names).
        Partially mitigated by NSEC3: http://tools.ietf.org/html/rfc5155
      Slow adoption, not much incentive to upgrade, non-trivial costs.
      Costs include both performance and administrative (key/cert management).
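
Rough arithmetic for the "use all possible randomness" defense, in Python. The
function names are hypothetical, and the model is simplified: it assumes the
attacker sends distinct guesses and that each race window is independent.

```python
def spoof_success_probability(forged_replies, id_bits=16, port_bits=0):
    """Chance that one of n distinct forged replies matches ID (and port)."""
    space = 2 ** (id_bits + port_bits)
    return min(1.0, forged_replies / space)


def expected_races(forged_replies, id_bits=16, port_bits=0):
    """Average number of query races before a forged reply is accepted."""
    return 1.0 / spoof_success_probability(forged_replies, id_bits, port_bits)


if __name__ == "__main__":
    n = 1000  # assumed number of forged replies that fit in one race window
    print("ID only:        ", spoof_success_probability(n, 16, 0))
    print("ID + ~16b port: ", spoof_success_probability(n, 16, 16))
```

With a predictable port the attacker faces 2^16 possibilities; randomizing the
source port as well pushes this toward 2^32.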

Liveness is another big problem area for the network layer.
  Even when there are no authentication problems,
    we still rely on network protocols to actually deliver the data!
  "Denial of Service" (DoS) can be annoying, or part of blackmail,
    or an ingredient in a larger attack.

SYN flooding -- the first high-profile DoS attack.
  Server must be able to check client's ACK(SNs) in 3rd packet.
    Original implementation kept state for each "half-open" connection.
    Kept it for minutes in case client is slow, or network lossy.
    Only willing to remember e.g. 50 half-open connections, to avoid running out of memory.
    Silently ignored new connections if already had 50 waiting.
  The attack:
    Attacker sends SYN packets with forged, random source IP addresses.
      Most of the forged addresses don't respond,
      so server never gets 3rd packet.
    Fills up server's 50 half-open slots.
    Now server ignores legitimate connection requests!
  Hard to track down:
    Forged random source addresses.
    Low rate -- attacker only needed to send a few SYN packets per
      second, since servers kept half-open connections for minutes.
  These attacks appeared in 1996 and were a big problem for a while.
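
A toy Python model of the half-open table, to show why a low rate suffices.
The 50-slot limit is from the notes; the 180-second timeout is an assumed
stand-in for "minutes", and the class and function names are hypothetical.

```python
from collections import deque


class HalfOpenTable:
    """Fixed-size table of half-open connections with a timeout."""

    def __init__(self, capacity=50, timeout=180.0):
        self.capacity = capacity
        self.timeout = timeout
        self.expiry = deque()  # expiry times, oldest first

    def on_syn(self, now):
        """Return True if the SYN gets a slot, False if silently dropped."""
        while self.expiry and self.expiry[0] <= now:
            self.expiry.popleft()  # forget timed-out half-open entries
        if len(self.expiry) >= self.capacity:
            return False
        self.expiry.append(now + self.timeout)
        return True


def min_attack_rate(capacity=50, timeout=180.0):
    """Forged SYNs per second needed to keep every slot occupied."""
    return capacity / timeout


if __name__ == "__main__":
    table = HalfOpenTable()
    flood = [table.on_syn(t * 0.1) for t in range(60)]  # 60 forged SYNs in 6 s
    print("forged SYNs given a slot:", sum(flood))
    print("legitimate SYN at t=10s accepted:", table.on_syn(10.0))
    print("rate needed to hold the table full:", min_attack_rate())
```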

Defense against SYN flooding: SYN cookies.
  Idea: make the server stateless, until it receives that third packet (ACK).
    Then server won't have half-open connections, and thus won't run out.
  Why is this tricky?
    Half-open state helped ensure source IP address wasn't forged,
    by checking that 3rd packet had the right ACK.
  Use a bit of cryptography so server doesn't have to keep state.
  Encode server-side state into sequence number.
    ISNs = SNc + (timestamp || SHA1(src/dst addr+port, secret, timestamp))
    Timestamp is coarse-grained (e.g., minutes).
    ISNs wraps around slowly assuming legitimate client choice of SNc.
    ISNs per-client, so attacker can't guess for a forged IP address.
    ISNs hash part changes, so not useful for long if one is stolen.
    [ Detailed ref: http://cr.yp.to/syncookies.html ]
  Server computes seq as above when sending SYN-ACK response.
  Server can verify state is intact by verifying hash on ACK's seq.
  SYN cookies have successfully blunted low-rate SYN-flooding DoS attacks.
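
A minimal Python sketch of the cookie formula above. The 5-bit timestamp /
27-bit hash split, the byte layout fed to SHA1, and all function names are
assumptions for illustration; real implementations also pack other fields
(e.g. an MSS encoding), which are omitted here.

```python
import hashlib
import ipaddress
import struct

MOD = 2**32
TS_BITS = 5                # assumed width of the coarse timestamp
HASH_BITS = 32 - TS_BITS   # remaining bits carry the truncated hash


def _hash_bits(src_ip, src_port, dst_ip, dst_port, secret, ts):
    """Truncated SHA1 over the 4-tuple, the secret, and the timestamp."""
    data = (ipaddress.ip_address(src_ip).packed + struct.pack("!H", src_port)
            + ipaddress.ip_address(dst_ip).packed + struct.pack("!H", dst_port)
            + secret + struct.pack("!B", ts))
    return int.from_bytes(hashlib.sha1(data).digest()[:4], "big") >> TS_BITS


def make_cookie(snc, conn, secret, now_minutes):
    """SNs = SNc + (timestamp || hash); the server stores nothing."""
    ts = now_minutes % (1 << TS_BITS)
    cookie = (ts << HASH_BITS) | _hash_bits(*conn, secret, ts)
    return (snc + cookie) % MOD


def check_ack(seq, ack, conn, secret, now_minutes, max_age=1):
    """Validate the third packet: seq is SNc+1, ack is SNs+1."""
    snc = (seq - 1) % MOD
    sns = (ack - 1) % MOD
    cookie = (sns - snc) % MOD
    ts = cookie >> HASH_BITS
    age = (now_minutes - ts) % (1 << TS_BITS)
    if age > max_age:
        return False  # cookie too old (or timestamp bits are garbage)
    expected = _hash_bits(*conn, secret, ts)
    return (cookie & ((1 << HASH_BITS) - 1)) == expected


if __name__ == "__main__":
    conn = ("10.0.0.1", 40000, "10.0.0.2", 80)
    secret = b"server-secret-16"
    sns = make_cookie(12345, conn, secret, now_minutes=100)
    print("valid ACK accepted:", check_ack(12346, (sns + 1) % MOD, conn, secret, 100))
```

The server recomputes the hash from fields in the ACK itself, so a sender that
never saw the SYN-ACK has to guess the hash bits for its forged address.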

Another type of DoS attack: bandwidth amplification.
  Attacker's goal is to overwhelm server or link,
    so that legitimate traffic is discarded.
  Send ICMP echo request (ping) packets to the broadcast address of a network.
    E.g., 18.26.7.255.
    Used to be that you'd get an ICMP echo reply from all machines on network.
    What if you fake a packet from victim's address?  Victim gets all replies.
    Find a subnet with 100 machines on a fast network: 100x amplification!
    [ Ref: http://en.wikipedia.org/wiki/Smurf_attack ]
  Can we fix this?
    Routers now block "directed broadcast" (packets sent to broadcast address).
  Modern-day variant: DNS amplification.
    DNS is also a request-response service.
    With a small query, server might send back a large response.
    With DNSSEC, responses contain lots of signatures, so they're even larger!
    Since DNS runs over UDP, source address is completely unverified.
    [ http://blog.cloudflare.com/deep-inside-a-dns-amplification-ddos-attack ]
  Can we fix the DNS attack?
    Perhaps by fixing DNS servers to only respond to legitimate clients.
    Hard: many name servers must respond to open-ended set of clients.
    E.g. laptops off MIT campus, but configured with MIT DNS servers.
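
The amplification arithmetic for both variants, in Python. Function names are
hypothetical, and the DNS packet sizes are assumed round numbers chosen for
illustration, not measurements.

```python
def amplification_factor(request_bytes, response_bytes, responders=1):
    """Bytes delivered to the victim per byte the attacker sends."""
    return responders * response_bytes / request_bytes


def victim_bits_per_second(attacker_bps, factor):
    """Traffic arriving at the victim for a given attacker send rate."""
    return attacker_bps * factor


if __name__ == "__main__":
    smurf = amplification_factor(64, 64, responders=100)  # 100 hosts echo back
    dns = amplification_factor(64, 3200)                  # assumed sizes
    print("smurf factor:", smurf)
    print("DNS factor:", dns)
    print("victim load from 10 Mbps:", victim_bits_per_second(10e6, dns))
```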

Routing protocols: overly-trusting of participants.
  ARP: within a single Ethernet network.
    To send IP packet, need the Ethernet MAC address of router / next hop.
    Address Resolution Protocol (ARP): broadcast a request for target's MAC.
    Anyone can listen to broadcast, send a reply; no authentication.
    Adversary can impersonate router, intercept packets, even on switched net.

  DHCP: again, within a single Ethernet network.
    Client asks for IP address by sending a broadcast request.
    Server responds, no authentication (some specs exist but not widely used).
      If you just plugged into a network, might not know what to expect.
    Lots of fields: IP address, router address, DNS server, DNS domain list, ...
    Adversary can impersonate DHCP server to new clients on the network.
      Can choose their DNS servers, DNS domains, router, etc.

  BGP: Internet-wide (similar to RIP attacks described in paper).
    BGP routing system is huge; attackers can control some ISPs and BGP routers.
    Any BGP participant router can announce route to any IP address.
    Attack: announce you have a path to MIT, people route through you,
      you can inspect/modify traffic, and then forward to MIT.
    Attack: spammer announces unused address, sends spam, then goes away.
      Gets around IP-level blacklisting of spam senders: choose almost any IP!
    How to fix? S-BGP, RPKI, BGPsec.
      Sign original announcements.
      Trusted database of who is allowed to announce what IP prefixes.
      Sign paths, so others can verify length.
      Getting some traction but still not widely deployed.
      Database of what is allowed is a weak point.
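
A sketch of the "trusted database" check for BGP announcements, in Python. The
table contents, the AS numbers, and the function name are hypothetical
placeholders; only the idea (look up who may originate a prefix) is from the
notes. It says nothing about forged paths, which is what path signing is for.

```python
import ipaddress

# Hypothetical database: prefix -> (authorized origin AS, max prefix length).
ALLOWED = {
    ipaddress.ip_network("18.0.0.0/8"): (64500, 16),  # placeholder values
}


def validate_origin(prefix, origin_as, allowed=ALLOWED):
    """Classify an announcement as 'valid', 'invalid', or 'unknown'."""
    net = ipaddress.ip_network(prefix)
    covered = False
    for entry_net, (asn, max_len) in allowed.items():
        if net.version == entry_net.version and net.subnet_of(entry_net):
            covered = True
            if origin_as == asn and net.prefixlen <= max_len:
                return "valid"
    # Covered by an entry but not matching it: someone else's address space.
    return "invalid" if covered else "unknown"


if __name__ == "__main__":
    print(validate_origin("18.0.0.0/8", 64500))    # authorized origin
    print(validate_origin("18.26.0.0/16", 64666))  # wrong origin AS
    print(validate_origin("192.0.2.0/24", 64501))  # no entry in the database
```

Everything depends on the database being correct and complete, which is the
weak point noted above: prefixes without an entry can only be "unknown".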

The open Internet makes it easy for attackers to gather useful info.
  Which hosts are running vulnerable software / protocols?
    Probing:
      Check if a system is listening on a well-known port.
      Protocols / systems often send an initial banner message.
    nmap can guess OS by measuring various impl-specific details.
      [ Ref: http://nmap.org/book/man-os-detection.html ]
    Use DNS to look up the hostname for an IP address; may give hints.
  Which hosts exist, e.g. to explore indirect attacks,
      or to gather botnets?
    traceroute to find routers along the way, for BGP attacks.
    Can also just scan the entire Internet: only 2^32 addresses.
      1 Gbps (~100 MB/s) network link, 64 byte minimum packets.
      ~1.5M packets per second.
      2^32=4B packets in ~2750 seconds, or about 45 minutes.
      zmap: implementation of this [ Ref: https://zmap.io/ ]
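
The scan-time estimate above as a small Python calculation. The function name
is hypothetical; the inputs (about 100 MB/s of usable bandwidth, 64-byte
packets, one packet per address) are the notes' round numbers.

```python
def scan_seconds(link_bytes_per_sec=100e6, packet_bytes=64, addresses=2**32):
    """Time to send one minimum-size probe to every IPv4 address."""
    packets_per_second = link_bytes_per_sec / packet_bytes
    return addresses / packets_per_second


if __name__ == "__main__":
    secs = scan_seconds()
    print(f"{secs:.0f} seconds, about {secs / 60:.0f} minutes")
```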

Could one design Internet protocols that are "secure"?
  All packets have cryptographically verified source IP address?
  Track down DoS sources with these IP addresses?
  Require all uses of TCP to use cryptography?

How to improve security?
  Protocol-compatible fixes to TCP implementations.
  Firewalls.
    Partial fix, but widely used.
    Issue: adversary may be within firewalled network.
    Issue: hard to determine if packet is "malicious" or not.
    Issue: even for fields that are present (src/dst), hard to authenticate.
    TCP/IP's design not a good match for firewall-like filtering techniques.
    E.g., IP packet fragmentation: TCP ports in one packet, payload in another.
  Cryptographic security on top of TCP/IP: SSL/TLS, Kerberos, SSH, etc.
    A hard problem: protocol design, key distribution, trust, etc.
    Will talk about this more in next lecture on Kerberos.
  Some kinds of security hard to provide on top: DoS-resistance, routing.