Securing interfaces: device drivers
===================================

Goal: sandboxing buggy code in a large application (Linux kernel drivers).
  Drivers often written by hardware manufacturers, not expert kernel developers.
  Many different device drivers (67% of Linux kernel source code!)
  Device drivers often have bugs.
  No isolation within the kernel: driver bug can corrupt the entire kernel.
  Extreme version of the privilege-separation problem you have in lab 2.

This paper: state-of-the-art research in driver isolation.
  Sophisticated system: isolates real, complex Linux drivers with good performance.
  Builds on significant prior work.
    Interface definitions: https://www.usenix.org/system/files/atc19-narayanan.pdf
    Isolation machinery: https://www.cse.psu.edu/~trj1/papers/vee20.pdf
    Program analysis: https://www.cse.psu.edu/~gxt29/papers/ptrsplit.pdf
  May eventually influence how deployed systems isolate drivers, but not there yet.

New problem we're seeing in this paper: securing the interface.
  We already know how to isolate code pretty well.
    Processes, VMs, SFI / WebAssembly.
    A previously-solved problem as far as this paper is concerned.
  A big focus in this paper is on the interface between the kernel and isolated driver.

Background: dealing with buggy device drivers.
  Has long been a particular focus for security, reliability.
    Many Windows crashes used to be caused by buggy device drivers.
    Similar story in Linux as well.
    Lots of interest in the research community on solving this problem.
    Significant interest from OS developers in adopting practical solutions.
  Threat model is usually similar to that of Baggy Bounds Checking.
    Assume that the device driver is buggy but not outright malicious.
    Make it less likely that the device driver could be exploited,
      less likely that attack can cause a problem in the rest of the kernel.
    Outright-malicious driver can probably do significant damage.
      E.g., malicious disk driver can give us fake data.
      E.g., malicious network driver can spoof / monitor traffic, etc.

One approach: run drivers in user-space.
  Isolation enforced by OS process isolation.
  Windows User-mode driver framework (UMDF): API for running drivers in userspace.
    E.g., USB devices.
  Linux also has some support for user-space USB drivers.
  Good, but doesn't cover the cases that still call for in-kernel drivers.
    Handling interrupts.
    Interacting with other kernel subsystems (network, storage).
    Overhead of context-switching to user-space process for each driver call.

Isolation inside the kernel is tricky: no processes or VMs to rely on there.
  One solution: SFI.  Could use WebAssembly or similar techniques in-kernel.
  Another solution: implement some kind of page-table-based isolation in kernel.
  Paper uses page-table approach, using hardware virtualization support.

What is the interface problem with device drivers?
  Threat: device driver has memory corruption bug, scribbles over random memory.
  We can isolate the driver running in the kernel, which might help, except that:
    Kernel needs to be able to invoke the driver (e.g., send packet).
    Driver needs to interact with the kernel (e.g., give incoming packets to kernel).
    Kernel and driver need to access shared memory (e.g., packets, device state).
  Even if the driver is isolated, it could misbehave in how it interacts with kernel.

Why is the interface such a challenge?
  To some degree it's always going to be tricky: kernel and driver depend on each other.
    Reducing dependence requires careful design; not obvious what to do.
    E.g., what if your disk driver refuses to access the disk?  Or returns garbage?
    Or what if your keyboard driver is giving you fake keyboard input?
  But to a large degree this problem arises because we're re-purposing an existing boundary.
    Driver interface was not originally designed to be untrusted.
    Kernel may want to avoid giving driver too much data.
    Kernel may want to limit what the driver can do.
  Big challenge: shared-memory interface.
    For isolation, want RPCs that explicitly send call arguments / return results.
    But drivers assume they're sharing memory with the kernel.
    Functions pass pointers, assume driver will be able to access relevant memory.
  KSplit's goal: turn shared-memory interface into an RPC-style interface.
    As if explicitly sending data between kernel and driver.

Related challenge: synchronizing kernel and driver copies of the same data structure.
  Want to run driver in isolation from kernel.
  One approach: somehow allow driver to access specific locations in kernel memory.
    Would require expensive checks on every memory access.
    The set of allowed locations could change often (packet allocation, etc).
    Permissions might differ for each function depending on what the function does.
  Paper's approach (from earlier work): two separate memories.
    Driver has its own memory, with its own copy of the relevant data structures.
    Copy data from kernel to driver and back as needed.
    Insight: at any given time, only one side (kernel or driver) should be
      accessing any particular field; call that side the field's owner.
    When ownership moves from kernel to driver, copy from kernel to driver.
    Similarly, when ownership moves from driver to kernel, copy from driver to kernel.
  Two-memory plan means fewer checks, but still need to know what to synchronize.
  Two-memory plan also has advantage of easier checking of the value being written.
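
Sketch: the two-memory plan in miniature.
  Illustrative Python only, not KSplit code; Domain and transfer are made-up names.
  Each side has its own memory; only the fields whose ownership moves get copied.

```python
# Hypothetical model of the two-memory plan; names are illustrative, not KSplit's.
from dataclasses import dataclass, field


@dataclass
class Domain:
    name: str
    memory: dict = field(default_factory=dict)  # field name -> value


def transfer(src, dst, fields):
    """Ownership of `fields` moves from src to dst: copy only those fields."""
    for f in fields:
        dst.memory[f] = src.memory[f]


kernel = Domain("kernel", {"len": 64, "data": b"payload", "secret": "kernel-only"})
driver = Domain("driver")

# Kernel calls into the driver: the driver now owns len and data.
transfer(kernel, driver, ["len", "data"])
driver.memory["len"] = 60  # driver updates its private copy
# Driver returns: ownership of len moves back, so only len is copied back.
transfer(driver, kernel, ["len"])
```

  The driver never sees "secret", and the kernel sees only the write to len.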

Main problem for KSplit: what memory locations should the driver have access to?
  If we allow access to (or synchronize) all kernel memory,
    driver bug can easily cause kernel crash.
  But we need to give the driver access to some kernel memory to do its job.

Complex shared-memory interaction between kernel and driver.
  Kernel and driver need to access the same data structures.
    E.g., packets (struct sk_buff), network interfaces (struct net_device).
  Data structure might be shared at a field level.
    E.g., some sk_buff fields used by driver, others used by kernel.
  Data structure ownership can change through synchronization.
    E.g., grabbing a lock could transfer ownership to driver or kernel.
    Or concurrent use: e.g., driver does atomic writes, kernel does atomic reads.
  Data structures are dynamically allocated and freed.
    E.g., driver might call alloc_skb() or kfree_skb().
  Data structures contain pointers, function pointers.
    E.g., sk_buff contains a pointer to the end of the packet.
    E.g., network device struct contains function pointer to call for sending packet.
    Driver can try to corrupt some data pointer or function pointer.
    What if the kernel accesses or invokes that pointer?

Interface Definition Language (IDL) for specifying boundary.
  Clearly define what is shared across the interface between kernel and driver.
  Tool to generate code for going across the interface boundary.
    Wrappers for calling driver funcs from kernel, and kernel funcs from driver.
    Wrappers take care of allocating shadow copies of data structures.
    Wrappers maintain correspondence between kernel and driver shadow copies.
    Synchronize relevant data between kernel and driver copies of the memory.
  Tool to infer what the interface definition should be, based on source code.
  [[ Example: https://github.com/ksplit/idlc/blob/dev_ksplit/examples/ixgbe.idl ]]
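
Sketch: what a generated wrapper roughly has to do.
  Illustrative Python, assuming objects are dicts; ShadowTable, call_driver,
    and driver_xmit are hypothetical names, not the IDL compiler's output.
  Wrapper finds (or allocates) the shadow copy, syncs the declared fields in,
    calls the driver, and syncs the declared fields out.

```python
# Hypothetical glue-code sketch; class and function names are made up.
class ShadowTable:
    """Maps a kernel object to the driver's private shadow copy of it."""

    def __init__(self):
        self.shadows = {}

    def get_or_alloc(self, kernel_obj):
        # id() is a stable key only while kernel_obj is alive (fine for a sketch).
        return self.shadows.setdefault(id(kernel_obj), {})


def call_driver(table, driver_func, kernel_obj, reads, writes):
    shadow = table.get_or_alloc(kernel_obj)
    for f in reads:  # sync kernel -> driver on entry
        shadow[f] = kernel_obj[f]
    result = driver_func(shadow)
    for f in writes:  # sync driver -> kernel on exit
        kernel_obj[f] = shadow[f]
    return result


def driver_xmit(skb):  # stand-in for a driver entry point
    skb["len"] = len(skb["data"])
    skb["priority"] = 99  # not listed in `writes`, so it never reaches the kernel
    return 0


table = ShadowTable()
skb = {"data": b"hello", "len": 0, "priority": 1}
rc = call_driver(table, driver_xmit, skb, reads=["data"], writes=["len"])
```

  The stray write to priority stays in the driver's shadow copy.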

KSplit workflow.
  KSplit analyzes the kernel and driver to figure out what's shared.
  KSplit generates an IDL file that specifies the sharing rules for this interface.
  Developer looks at the IDL file, fixes warnings and other issues.
  IDL compiler produces "glue code": wrappers for calling across the interface.
    This glue code synchronizes state and performs checks at boundary crossings.
  Developer builds kernel code, driver code, isolation machinery, and glue code.
  Resulting kernel will enforce IDL rules for kernel-driver interactions.

Step 1: Section 4.2: figure out what might be shared.
  Analyze the source code of the kernel and the driver.
  Look at struct fields accessed by each side.
  If only one side accesses that field, nothing more to do.
  If the same fields are accessed in both, need to figure out how ownership shifts.
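
Sketch: step 1 as set operations.
  Illustrative Python; the access sets below are made up, whereas the real ones
    come from static analysis of the kernel and driver source.

```python
# Illustrative only: (struct, field) pairs each side was found to access.
kernel_access = {("sk_buff", "len"), ("sk_buff", "data"), ("sk_buff", "sk")}
driver_access = {("sk_buff", "len"), ("sk_buff", "data"), ("ixgbe_ring", "next_to_use")}

shared = kernel_access & driver_access          # needs ownership analysis
kernel_private = kernel_access - driver_access  # never copied to the driver
driver_private = driver_access - kernel_access  # never copied to the kernel
```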

Step 2: Section 4.3: figure out which fields need to be copied at boundary crossings.
  Start with the arguments and return values for a function call.
  Recursively scan through that function and anything else it calls.
  If that function reads or writes a shared field from the original function's
    parameter tree, add it as a read or write.
  Synchronize all of the fields found (reads sync on entry, writes sync on exit).
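
Sketch: step 2 as a call-graph walk.
  Illustrative Python; the call graph, per-function accesses, and shared_fields
    are hypothetical inputs, and real analysis also has to track which object
    in the parameter tree each access refers to.

```python
# Hypothetical call graph and per-function field accesses.
calls = {"xmit": ["map_buf", "update_stats"], "map_buf": [], "update_stats": ["xmit"]}
accesses = {
    "xmit": {"reads": {"len"}, "writes": set()},
    "map_buf": {"reads": {"data"}, "writes": {"dma_addr"}},
    "update_stats": {"reads": set(), "writes": {"tx_packets"}},
}
shared_fields = {"len", "data", "tx_packets"}  # assumed output of step 1


def sync_sets(entry):
    """Collect shared fields read/written by `entry` and everything it calls."""
    reads, writes, seen, stack = set(), set(), set(), [entry]
    while stack:
        fn = stack.pop()
        if fn in seen:  # handles recursion and repeated callees
            continue
        seen.add(fn)
        reads |= accesses[fn]["reads"] & shared_fields
        writes |= accesses[fn]["writes"] & shared_fields
        stack.extend(calls[fn])
    return reads, writes


reads, writes = sync_sets("xmit")  # reads sync on entry, writes sync on exit
```

  dma_addr is written but not shared, so it is never synchronized.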

Step 3: Section 4.4: treat synchronization (e.g., lock acquire/release) as boundary crossings.
  No parameter trees involved, since we don't know what's passed in/out.
  Just look for any shared fields that are read/written inside locked section.
  Synchronize on acquire and release.
  Empirically, doesn't seem like there's a lot of sharing through locked regions.
    One example: ixgbe driver updates packet stats, kernel reads them directly.
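
Sketch: synchronizing shared fields at lock acquire and release.
  Illustrative Python; SyncedLock is a hypothetical wrapper, not a kernel or
    KSplit API. It models the driver side taking a lock shared with the kernel.

```python
# Hypothetical sketch: a lock whose acquire/release also sync shared fields.
import threading


class SyncedLock:
    def __init__(self, kernel_mem, driver_mem, fields):
        self.lock = threading.Lock()
        self.kernel_mem = kernel_mem
        self.driver_mem = driver_mem
        self.fields = fields  # shared fields touched inside the locked region

    def __enter__(self):  # driver acquires: pull the latest kernel values
        self.lock.acquire()
        for f in self.fields:
            self.driver_mem[f] = self.kernel_mem[f]
        return self.driver_mem

    def __exit__(self, *exc):  # driver releases: push its updates back
        for f in self.fields:
            self.kernel_mem[f] = self.driver_mem[f]
        self.lock.release()
        return False


kernel_mem = {"tx_packets": 10, "other": "kernel-only"}
driver_mem = {}
stats_lock = SyncedLock(kernel_mem, driver_mem, ["tx_packets"])
with stats_lock as mem:
    mem["tx_packets"] += 1
```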

How could a compromised driver try to escape from KSplit's sandbox?

What if driver tries to set some function pointer?
  Typically initialized upfront in some registration function.
    Not commonly changed in steady-state execution.
    So, most functions would not have IDL annotation that lets them modify a func ptr.
  IDL constrains the pointer value: e.g., see pci_register_driver in ixgbe.idl.
    Copying will check that the pointer is a valid function with that signature.
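
Sketch: checking a function pointer as it crosses the boundary.
  Illustrative Python; the exported table, signature strings (simplified), and
    check_func_ptr are assumptions about the general idea, not KSplit's mechanism.

```python
# Hypothetical check; the table and (simplified) signature strings are made up.
def ixgbe_probe(dev):
    return 0


def ixgbe_remove(dev):
    return None


# Built at load time: driver functions that may be exported, with signatures.
exported = {ixgbe_probe: "int(pci_dev*)", ixgbe_remove: "void(pci_dev*)"}


def check_func_ptr(ptr, expected_sig):
    """Accept ptr only if it is a known driver function of the expected type."""
    if exported.get(ptr) != expected_sig:
        raise ValueError("function pointer rejected at boundary")
    return ptr


def register_driver(ops):
    # Only the registration call is allowed to install function pointers.
    return {
        "probe": check_func_ptr(ops["probe"], "int(pci_dev*)"),
        "remove": check_func_ptr(ops["remove"], "void(pci_dev*)"),
    }


registered = register_driver({"probe": ixgbe_probe, "remove": ixgbe_remove})
```

  A pointer to an unknown function, or to a function of the wrong type, is rejected.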

What if driver sets some data pointer or offset to be out-of-bounds?
  Example: sk_buff has tail and end pointers.
  KSplit analysis is able to figure out that these are pointers within a bigger array.
  But need manual effort to annotate IDL and specify what the constraints are.
    "within" attribute.
    KSplit seems to generate a "may_within" attribute, for developer to fix up.
  Listing 1 in the paper.
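
Sketch: the kind of check a "within" constraint implies.
  Illustrative Python with pointers modeled as integers; check_within and
    sync_skb_from_driver are hypothetical, not generated KSplit code.
  Assumes the usual sk_buff invariant head <= data <= tail <= end, with
    head and end taken from the kernel's own copy.

```python
# Hypothetical "within"-style check on sk_buff-like pointers.
def check_within(value, lo, hi, name):
    if not (lo <= value <= hi):
        raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return value


def sync_skb_from_driver(kernel_skb, driver_skb):
    head, end = kernel_skb["head"], kernel_skb["end"]  # kernel's trusted bounds
    data = check_within(driver_skb["data"], head, end, "data")
    tail = check_within(driver_skb["tail"], data, end, "tail")
    # Commit only after both values passed their checks.
    kernel_skb["data"], kernel_skb["tail"] = data, tail


kernel_skb = {"head": 1000, "end": 3000, "data": 1000, "tail": 1000}
sync_skb_from_driver(kernel_skb, {"data": 1064, "tail": 2000})
```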

What if the driver mis-allocates some data structure?
  IDL has explicit annotations when it's expecting to pass pointer to new allocation.
  Will invoke the allocator on the other side (e.g., in the kernel).
  Even if driver's allocator misbehaved, kernel will properly allocate new memory.

What if driver de-allocates memory that's still in-use?
  IDL has dealloc annotations to specify when an object should be deallocated.
  Driver probably won't be able to synchronize with the deallocated object anymore.
    IDL-generated wrappers keep track of correspondence between shadow copies.
    Mapping gets deleted on deallocation.
  But it might be that the kernel is still using the object, depending on the API.
  If kernel writes to deallocated object, could have memory corruption.
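
Sketch: deleting the shadow mapping on deallocation.
  Illustrative Python; ShadowMap and integer handles are hypothetical.
  Shows why the driver can't sync to a freed object; does nothing about the
    kernel itself still holding a pointer to it.

```python
# Hypothetical mapping from driver-visible handles to live kernel objects.
class ShadowMap:
    def __init__(self):
        self.by_handle = {}

    def alloc(self, handle, kernel_obj):
        self.by_handle[handle] = kernel_obj

    def dealloc(self, handle):
        del self.by_handle[handle]  # mapping is gone after deallocation

    def sync_from_driver(self, handle, field, value):
        if handle not in self.by_handle:
            raise KeyError("no live mapping: stale or forged handle")
        self.by_handle[handle][field] = value


shadow_map = ShadowMap()
skb = {"len": 0}
shadow_map.alloc(7, skb)
shadow_map.sync_from_driver(7, "len", 64)
shadow_map.dealloc(7)
```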

What if the driver was malicious?
  Malicious driver developer could write arbitrary driver code.
  KSplit would infer lots of state is needed by the driver.
  Maybe the kernel developer that's using KSplit would notice there's too much state.
  Otherwise, malicious driver would be able to corrupt / compromise the kernel.

Rough assumptions being made by KSplit:
  Driver developer is well-meaning; not trying to modify fields they shouldn't.
  Kernel developer using KSplit should look at the IDL to sanity-check it,
    resolving any warnings or missing information (e.g., within for sk_buff).
  Particularly subtle issues: memory allocation, deallocation, pointers.

How well does KSplit work?
  Seems like manual interface analysis would be a lot of work.
    2000 functions for ixgbe.
  KSplit seems to generate pretty accurate IDL.
    Few manual changes required: 53 lines out of 2476 for ixgbe.
    Driver seems to work (reasonable coverage during stress-test execution).
  KSplit seems to be not too far from "least privilege".
    For ixgbe, >90% of the fields marked as shared in the IDL were actually shared.
    Remaining <10% could be over-conservative analysis or incomplete workload.

How is the performance after sandboxing with KSplit?
  Some overhead: memcached throughput drops by 5.4-18.7%.
  Pretty aggressive benchmark: lots of boundary crossings for ixgbe driver.

Are there interface problems that arise if we're talking over the network?
  E.g., RPCs between OKWS components, Google services, etc.
  Need to be careful about decoding response.
  Need to validate data.
    E.g., status code or response from RPC should be a valid one.
  Need to be careful about assumptions across calls.
    E.g., assuming that next callback should advance some pointer, etc.
    E.g., passing pointers or offsets to data that was sent earlier.
    Less of a problem if RPC is just one round-trip.
    But could have a similar problem if doing streaming RPCs, repeated calls, etc.
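
Sketch: validating an RPC response before using it.
  Illustrative Python; the JSON format and the status/offset field names are
    assumptions, not any particular RPC system's wire format.

```python
# Hypothetical response format: {"status": <str>, "offset": <int>}.
import json

VALID_STATUS = {"ok", "not_found", "error"}


def decode_response(raw, sent_len):
    """Decode and validate a response about `sent_len` bytes we sent earlier."""
    msg = json.loads(raw)  # malformed JSON raises a ValueError subclass
    if not isinstance(msg, dict):
        raise ValueError("response must be an object")
    status, offset = msg.get("status"), msg.get("offset")
    if not isinstance(status, str) or status not in VALID_STATUS:
        raise ValueError("unknown status")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(offset, int) or isinstance(offset, bool):
        raise ValueError("offset must be an integer")
    if not (0 <= offset <= sent_len):
        raise ValueError("offset outside the data we sent")
    return status, offset
```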

Another context where these problems arise: enclaves / untrusted OS kernels.
  Separate line of work: put the OS kernel in a different isolation domain from the app.
  Kernel might have bugs, but even if adversary takes over kernel, cannot compromise app.
  Isolation is relatively well defined, and perhaps doable.
  Challenge: how to safely use the syscall interface of a compromised OS?
    Many of the same interface problems that KSplit is facing, at an even larger scale.
    E.g., we invoked open("foo.txt") and got back file descriptor 5; is that OK?
      Should check that no other file we had open is using fd 5.
      Otherwise might get confused.
    E.g., we called mmap() to allocate memory and got some address back; is it OK?
      Should make sure this isn't the address of some existing data structure.
      Otherwise we might be tricked into writing over some existing memory!
    Perhaps suggests this isn't a great interface for a security boundary.
  [[ Ref: https://hovav.net/ucsd/dist/iago.pdf ]]
  [[ Ref: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/asplos2011-drawbridge.pdf ]]
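
Sketch: application-side checks on syscall results from an untrusted kernel.
  Illustrative Python; SyscallChecker is hypothetical, and a real defense needs
    far more than these two checks. Assumes only successful return values are
    passed in (error returns handled elsewhere).

```python
# Hypothetical checks against a possibly compromised kernel.
class SyscallChecker:
    def __init__(self):
        self.open_fds = set()
        self.regions = []  # half-open (start, end) ranges the app already uses

    def check_open(self, fd):
        if fd in self.open_fds:
            raise RuntimeError("kernel returned an fd that is already open")
        self.open_fds.add(fd)
        return fd

    def check_close(self, fd):
        self.open_fds.discard(fd)

    def check_mmap(self, addr, length):
        end = addr + length
        for start, stop in self.regions:
            if addr < stop and start < end:  # ranges overlap
                raise RuntimeError("mmap result overlaps existing memory")
        self.regions.append((addr, end))
        return addr


checker = SyscallChecker()
checker.check_open(5)
checker.check_mmap(0x10000, 0x1000)
```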

Related practical tool: Linux user pointer checking.
  [[ Ref: https://www.usenix.org/legacy/publications/library/proceedings/sec04/tech/full_papers/johnson/johnson_html/cquk.html ]]
  [[ Ref: https://sparse.docs.kernel.org/en/latest/ ]]

Summary.
  Interesting case study of isolation inside the kernel.
  Tricky to safely interact between isolated components.
  Particularly hard to retrofit existing interfaces into security boundaries.
  Useful idea: make the interface definition explicit (IDL).