XFI
===

questions on lab 2?

what's this paper trying to achieve?
    run legacy x86 binary code safely
    "safely" defined by P1..P7 in paper

what are the potential uses?
    plugins (device drivers, media codecs, browser plugins, ..)
    hardening existing applications (httpd from lab 1?)
    running untrusted code downloaded from some web site
	what sorts of things would or would not XFI solve?
    "mobile code", as in last lecture
	x86 instead of javascript in the browser?
	app code in the kernel (video codecs, packet filters)

what do we want to prevent the untrusted module from doing?
    corrupting memory that doesn't belong to it
    read secret data that doesn't belong to it
    invoke system calls
    invoke other code that it shouldn't be able to call
    .. or have exploits like buffer overflow that lead to the above

what should the module be able to do?
    read/write its own memory
    execute its own code
    call certain approved external functions

why XFI?
    use hardware protection?
	might be too expensive, or not available (inside kernel)
    use a high-level language?
	not practical for legacy code; not practical in kernel w/o runtime
    use a restricted language?
	works in some cases: packet filtering language for tcpdump (BPF)

so what's the plan?
    instrument calls, memory accesses, privileged instructions (eg syscall)
    make sure that all uses conform to our policy

what's CFI and why do we need it?
    direct: want to make sure the XFI module doesn't call arbitrary code
	might not want the module calling system() or making syscalls
    indirect: need it to make sure we have memory access checks in all places
    indirect: need it to make sure malicious code doesn't jump past checks
    problem: x86 disassembly is tricky
	25 CD 80 00 00    (AND %eax, $0x80cd)
	jump to second byte (CD 80) to invoke linux syscall
    cannot verify every possible offset (likely false positives)
    CFI ensures reliable disassembly and thus reliable software guards

how does CFI work?
    plan:
	disassemble the module linearly
	ensure entry points are on legal instruction boundaries that we saw
	ensure internal jumps go to legal instruction boundaries that we saw

    construct a call graph ahead of time (program analysis)
	for each call site, figure out what might be called
	sometimes hard to tell: calling a function pointer
	conservative answer: any function whose address is ever computed

    ensure that each call goes to one of the possible call sites
	simple for static cases
	need to have a runtime check for computed jumps

    is this good enough?
	should be able to prevent module from directly calling system
	    (i.e. will enforce external callers as we wanted)
	should make disassembly reliable
	what if the call graph allows arbitrary calls within the module?
	    might jump around in strange ways internally -- what could happen?
	    preserves reliable disassembly and external functions called
	    still need to ensure it doesn't bypass software guards we'll insert
	    (soln: CFI only allows arcs to function start)

    how do they implement it?
	assign each arc in the CFG a random ID
	place this random ID at the start of a function (in a prefetchnta)
	check the presence at the call site

    figure 2: impl sketch
	what prevents the attacker from jumping directly to "call EBX"?
	can an attacker synthesize a valid-looking target?
	    avoid ID in the check instruction itself
	    non-executable data
	    what if attacker can load another module later?

    what about returns?
	return addresses saved on a special stack

memory protection
    inline checks: figure 3
	relies on CFI's nice property
	not just jumps to instruction boundaries, but jumps to function start
	prevents jumps to memory reference after the mrguard
    fastpath vs slowpath memory
	fastpath: contiguous range of memory for private use by module
	    can potentially have a different fastpath for each memory ref
	    if we can guess which range of memory it's likely to be
	slowpath: other regions that program may have access to
	    e.g. stack, code (read-only), arguments passed in from the outside

why do they need two stacks?
    need to protect special values (return addr, frame ptrs) on the stack
    but memory protection can only protect contiguous regions, not what's in it
    so place all stack allocations accessed via pointers on one stack
	protected as a single memory range
    static analysis protects individual values on the other "scoped" stack
	XFI keeps track of the use of each element (e.g. return value unchanged)
    what happens when you grow the stack?
	ASP: in theory, mrguard should be enough
	    can we put it in fastpath memory?
	    probably not since stack is allocated at runtime
	SSP: mrguard would not allow (not accessible by pointer)
	    need a separate "stack bottom" check when scoped stack is grown

would XFI prevent exploits of buffer overflows in httpd?
    simple buffer overflow, clobbering return address
    corrupting a function pointer on the stack
	code injection
	return-to-libc
	what would the attacker need to do/know to successfully exploit?
    corrupting a data structure containing the file to read/execute

how does XFI avoid privileged instructions (e.g. page table changing)?
    verifier statically makes sure there are no such instructions

how does their verifier work?
    figure 4
    verification states
    static checks for immediate memory references or immediate jumps
    requires a proof for indirect/computed memory accesses or jumps
    can do one memory check for an entire basic block (ie no branches)
    verification states keep track of where the return address is on the SSP
	how does it keep track of where the next return address is?
    verifier ensures that stack pointers are preserved across function call
	origASP=ASP, origSSP=SSP[+4] at return instruction
    can an attacker do a jump to instruction 1 from elsewhere?
	that would be a static jump, needs no CFI runtime check, in-bounds
	bypasses mrguard
    how does the verifier chain together basic blocks?
	compute all possible transitions between basic blocks (static + CFI)
	make sure verif. states at parent block imply verif. states at children

what happens on a fault?
    presumably some existing error mechanism
    hope the caller knows what to do with errors
    maybe throw an exception?

what's all the stuff the rewriter has to do?
    works on unmodified binaries without source code access
	requires access to some debug information, though
	needs to figure out how the stack is being used, etc
    CFI: compute the control flow graph
	 insert labels and label checks at all jump targets and jump sites
    memory: move stack allocations that are accessed by pointer to diff stack
	    generate any needed verification states
	    insert mrguard calls as necessary to make the verification go thru

how does the XFI module interface with the rest of the world?
    stubs that set up stacks on incoming calls, copy args or set slowpath perms
	revoke perms on return
    stubs that call out (external code doesn't have CFI labels)

what's in the final TCB?
    verifier is trusted
    rewriter not trusted
    rest of the app code is trusted
	better not be trickable by clever invocations by malicious code!
	e.g. if there's a sort() function in libc that takes a function ptr
	almost like the "luring attack" from java
    stubs going in/out of XFI module are trusted
	set up stacks, add/remove slowpath permissions to arg memory
    error handlers are trusted

evaluating security/protection of XFI
    prevents some buffer overflows, heap overflow
    even prevents some data overwrites because of separate stacks
    does not prevent everything (nimda exploited backdoors, "luring" code)
    had to make some changes for windows drivers (avoid misrepresentation)
	doesn't seem that bad: just need to provide custom stubs?

what are the tricks to make things perform well?
    as much static analysis as possible
	clump multiple mrguard's together using verification states if possible
    large chunk of fastpath memory
	need not be fully allocated ahead of time; just virtual memory
    copy arguments to fastpath memory rather than access the ptrs directly
    special cases for stack growth
	SSP: stack bottom kept by windows in a convenient place
	ASP: use page address bits?
    slowpath data structure impl
	very simple: linear array of start+end addrs
	why does it perform OK?  only a few different ranges

what's the performance of XFI?
    seems OK
    main factors: read-protection (expensive!), arg passing (expensive!)

is XFI too strict for some code?
    JITs that generate x86
    might not perform well for code accessing many shared data structures

what doesn't XFI prevent?
    drivers: DMA attacks
    denial of service, liveness failures
    exploit unsafe assumptions that callers make (or other called functions)
	for windows drivers, had to ensure driver "identity" didn't change

where could you use XFI?
    OKWS?
    would you use it to confine Java code?
	might look like a capability design in terms of functions it can invoke
	hard to pass around complex Java objects across prot. domains