Capabilities and other protection mechanisms
============================================

Today's lecture: precise access control to build sandboxes that control
  potentially buggy programs, yet let them do what they need to.
Core problem: very hard to make both precise AND convenient.
  People have tried many ideas!

What sorts of applications need precise access control?
  OKWS -- to set up privilege-separated sandboxes, yet allow needed sharing.
  Programs that deal with network input:
    Parsing code should not have much privilege -- bugs.
    But rest of app may need to read and write various files.
  Programs that manipulate potentially untrusted file content:
    (gzip, media codecs, &c)
    Restrict access privileges of e.g. image manipulation code.
    Allow rest of application to read and write user's files.
  Untrusted software downloaded from the network:
    e.g. JavaScript and extensions in browsers.
    Needs to talk to main browser to display things.
    Better not read/write my files -- even though browser can.
  System utilities:
    (Can't trust time-sharing users; "user" might be attacker.)
    Suppose e-mail program can write any user's inbox file.
    So it must have more privilege than I do.
    Can I trick it into using that privilege to e-mail me any file?

Confused Deputy paper explains a major reason why access control is hard (1988).
  Their system had a Fortran compiler, /sysx/fort (in Unix filename syntax).
  They wanted the Fortran compiler to record usage statistics, but where?
    Created a special statistics file, /sysx/stat.
    Gave /sysx/fort "home files license" (kind-of like setuid w.r.t. /sysx).
  What goes wrong?
    User can invoke the compiler asking it to write output to /sysx/stat.
      e.g. /sysx/fort /my/code.f -o /sysx/stat
    Compiler opens supplied path name, and succeeds, because of its license.
    User alone couldn't have written to that /sysx/stat file.
  Why isn't the /sysx/fort thing just a bug in the compiler?
    Could, in principle, solve this by adding checks all over the place.
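The either-principal check at the heart of the confused deputy can be
sketched as a toy model (all names and permission sets here are
hypothetical, not the actual system's):

```python
# Toy model of the confused deputy: the O/S grants access if EITHER the
# deputy's principal or the caller's principal may write the file.
DEPUTY_PERMS = {"/sysx/stat"}            # compiler's "home files license"
USER_PERMS   = {"/my/code.f", "/my/out"}  # hypothetical user-owned files

def os_allows_write(path, principals):
    """Access check: any principal in the set may write the path."""
    return any(path in perms for perms in principals)

def compiler(output_path):
    # The compiler opens whatever output path the user supplied,
    # implicitly wielding BOTH principals -- the confused deputy.
    return os_allows_write(output_path, [DEPUTY_PERMS, USER_PERMS])

# The user alone cannot write /sysx/stat ...
assert not os_allows_write("/sysx/stat", [USER_PERMS])
# ... but invoking the compiler with that output path succeeds.
assert compiler("/sysx/stat")
```

The bug is not in any single line: each check is individually "correct,"
but the deputy has no way to ask "which principal is this open for?"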
    Problem: need to add checks virtually everywhere files are opened.

So what's the "confused deputy"?
  The compiler is running on behalf of two principals:
  - the user principal (to open user's files)
  - the compiler principal (to open compiler's files)
  O/S lets the compiler access a file if *either* principal is allowed.
  Convenient, but does the wrong thing here.

Confused deputy arises in subtle ways.
  Not just programs that explicitly run as two principals.
  Web server: fetching files for remote browser vs reading its own
    configuration files.
  Web browser: running untrusted JavaScript vs saving PDF files for user.
  Text editor: macros in edited file vs responding to user commands.

Two ways to think about the confused deputy problem:
  1. Ambient authority: privileges that are automatically used by a process
     are the problem here.  It's risky for privilege to be used implicitly!
  2. Complex permission checks: it's hard for programs to implement their
     own permission checks correctly.
     A good access control scheme could help a lot!

Today we'll look at some O/S-level access control techniques.
  They operate on O/S abstractions: processes, UIDs, files, FDs.
  That's great if privilege boundaries can be made to align well with
    process boundaries, and if file granularity matches desired
    granularity of permissions.
There are non-O/S sandboxing techniques as well.
  They often operate with finer granularity, e.g. threads, language objects.
  e.g. JavaScript, Native Client.
  Will look at these in more detail in later lectures.

Plan 0: Virtualize everything (e.g., virtual machines).
  Run untrustworthy code inside of a virtualized environment.
    Has its own file system, PIDs, UIDs, network ports, &c.
    Either complete "guest" O/S, or a "container" system within host O/S.
  Almost a different category of mechanism: strict isolation.
  Advantage: sandboxed code inside VM has almost no interactions with outside.
  Advantage: can sandbox unmodified code that's not expecting to be isolated.
  Advantage: some VMs can be started by arbitrary users (e.g., qemu).
  Disadvantage: hard to allow some sharing: no shared processes, pipes, files.
  Disadvantage: virtualizing everything often makes VMs relatively heavyweight.
    Non-trivial CPU/memory overheads for each sandbox.

Plan 1: Discretionary Access Control (DAC).
  Each object has a set of permissions (an access control list).
    E.g., Unix files, with rwx permission bits.
    "Discretionary" means applications set permissions on objects (e.g., chmod).
  Each program runs with privileges of some principals.
    E.g., Unix user/group IDs.
  When program accesses an object, check the program's privileges to decide.
  "Ambient privilege": process's privileges used implicitly for each access.

      Name         Process privileges
       |               |
       V               V
     Object -> Permissions -> Allow?

  DAC is well-suited to time-sharing, where users own their own files,
    sometimes need to share, and programs unambiguously execute as a
    single specific user.
  What if code might be malicious, or exploited via buffer overflow?
    Don't want to run it with my full permissions!
  What if program acts as multiple principals, e.g. web server that uses
    its own files as well as fetching files for browsers?
  Problem: ambient authority makes it too hard to constrain malicious/buggy code.
    Too easy for some files to have the wrong permissions.
  Problem: ambient authority makes confused deputies all too likely.
  Problem: only root can create new principals, on most DAC systems.
    E.g., Unix, Windows.
  Problem: some objects might not have a clear configurable access control list.
    Unix: processes, network, ...

Plan 2: Mandatory Access Control (MAC).
  MAC enforces a set of policies (== rules) on an application.
    Rules set up by application writer, or administrator, or user.
    O/S enforces policy; program can't change them.
  Example policies:
    Can only access specified files, no others.
    Can access any file except specified files/directories.
    Cannot use the network.
    Can connect to host X over the network.
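A toy model of MAC-style enforcement -- a per-application policy, keyed
by system call name and arguments, that the program itself cannot change.
The policy format and names here are hypothetical, just to show the shape:

```python
# Toy MAC policy: maps a syscall name to a predicate on its arguments.
# Anything not explicitly listed is denied (default-deny).
POLICY = {
    "open":    lambda path: path.startswith("/tmp/"),
    "connect": lambda host, port: host == "example.com" and port == 443,
}

def allowed(syscall, *args):
    """Enforcement point: consulted on every intercepted system call."""
    check = POLICY.get(syscall)
    return check is not None and check(*args)

assert allowed("open", "/tmp/scratch")
assert not allowed("open", "/etc/passwd")       # outside allowed prefix
assert not allowed("unlink", "/tmp/scratch")    # not in policy -> denied
```

Real systems (Seatbelt, seccomp-bpf) differ in policy language and
mechanism, but share this structure: intercept, match, allow or deny.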
The goal of MAC is to make it much harder for applications to make
  mistakes about what files &c they access, or to be tricked into
  making mistakes.
"Mandatory" in the sense that applications can't change this policy.

      Name    Operation + caller process
       |         |
       V         V
     Object --------> Allow?
                 ^
                 |
     Policy -----+

  MAC usually implemented by intercepting every system call.
    Policies keyed by system call name and arguments.
  Each application has a policy file.
    Supplied by application vendor if more or less trusted.
    Supplied by admin or user otherwise.
  Example: Mac OS X sandbox ("Seatbelt").
  Pro: any user can sandbox an arbitrary piece of code, finally!
  Pro: can be applied to existing applications (with some work).
  Pro: can be very restrictive, yet allow precise sharing.
  Con: some operations can only be filtered at coarse granularity.
    E.g. shared memory can only be allowed or forbidden; not specific sharing.
  Con: can be difficult to determine security impact of syscall based on args.
    What does a pathname refer to?  Symlinks, hard links, race conditions, ...
  Con: programmer must separately write the policy + application code.
  Con: static -- not a programming tool.  Hard for program itself to use
    to set up sandboxes with dynamically-determined privileges.
  Is it a good idea to separate policy from application code?
    Depends on overall goal.
    Good if user/admin wants to look at or change policy.
    Awkward if app developer needs to maintain both code and policy.
    For app developers, might help clarify policy.

Plan 3: Capabilities.
  Different plan for access control: capabilities.
  If process has a handle for some object ("capability"), can access it.

      Capability --> Object

  Characteristics of capability systems:
    The only access logic is "does the process have the capability?"
    There is no ambient authority, and thus no global name spaces.
    All resources are uniformly accessible via capabilities.
    Capabilities can't be forged.
    A process can give a capability to another process.
    Holding a capability automatically grants access to the corresponding object.
  Why is this attractive?
    A sandbox can be set up with exactly the capabilities it needs.
      Including any capabilities needed to share with other processes.
    No capability -> no access.
    There is only one access/permission scheme for all kinds of objects.
    No ambient authority, so no confused deputy.

  Capabilities are really an idea for a totally different O/S design!
    Some O/S's have *only* capabilities (e.g. KeyKOS); interesting but hard.
    Capsicum adds capabilities to a name+DAC system.
    Doing either right is a difficult design/research problem.

  Unix file descriptors are a not-very-secure form of capability.
    An FD refers to a specific file/socket/&c (not a name, which might change).
    Holding an FD to a file allows process to write it, regardless of permissions.
    FDs can't be forged.
    FDs can be passed to other processes (inherited by fork, sent by sendmsg).
  Why aren't Unix FDs enough for sandboxing?
    Unix FDs allow operations like fchmod() that must be protected.
    Some Unix resources aren't addressed via FDs, e.g. processes.
    Many Unix system calls don't involve FDs, but must be protected.

Capsicum -- precise access control for Unix using capabilities.
  A process can be in normal mode, or in "capability mode".
    cap_enter() call switches to capability mode.
    Cannot exit capability mode!
    All children/descendants inherit capability mode.
  In capability mode:
    Access *only* allowed via capabilities.
    Capability is a kind of file descriptor, with some flags
      indicating allowed ops (read, write, seek, &c).
    Lots of new system calls to allow access via capabilities.
      openat(fd, name, ...)
      unlinkat(fd, name, ...)
      fd = pdfork(); pdwait(fd); pdkill(fd);
        thus ability to pdkill() can be restricted, given away
    No root directory or current directory.
  General Capsicum philosophy: no global namespaces.
  Why are the authors so fascinated with eliminating global namespaces?
  Global namespaces require some access control story (e.g., ambient privs).
  Hard to control access to objects in global namespaces.

Using Capsicum in applications.
  General plan:
    Some setup in non-capability mode -- open needed directories &c.
    Then switch to capability mode.
    From then on, application must use openat() &c.
    -> applications need to be modified to intentionally constrain
       themselves with Capsicum, and to use capabilities.
  tcpdump.
    tcpdump snoops on LAN, parses packets w/ complex code; juicy target!
    Needs superuser to open "pcap" -- then should have almost no privileges!
    2-line version (Figure 6): just cap_enter() after opening all FDs.
      Used procstat to look at resulting capabilities.
    8-line version (Figure 7): also restrict stdin/stdout/stderr.
      Why?  E.g., avoid reading stderr log, changing terminal settings, ...
    The point: now tcpdump can't do *anything* other than read packets,
      compute, write stdout.
  dhclient.
    dhclient sends/receives "raw" network packets, and then configures
      network interfaces, so it needs to retain privilege.
    But it also parses DHCP packets from whoever, so risk of e.g. buffer overflow.
    So use privilege separation: fork().
      Parent opens raw socket, cap_enter(), send/recv packets, notify child.
      Child runs as root, waits for info from parent, configures network interface.
  gzip.
    Compression/decompression code may have exploitable bugs;
      people often decompress files from untrusted sources.
    Fork/exec sandboxed child process, send it file capabilities over a pipe.
    Child in cap mode, has no other capabilities, thus can't see any files &c.
    Substantial changes, mostly to marshal/unmarshal data for RPC: 409 LoC.
    Interesting bug: forgot to propagate compression level at first.
  Chromium.
    Want to render HTML, run JS, &c in separate sandboxed processes.
    They talk back to main browser process.
    Already privilege-separated on other platforms (but not on FreeBSD).
    ~100 LoC to wrap file descriptors for sandboxed processes.
  OKWS.
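The fd-passing idiom underlying the gzip design -- a privileged side
opens the file by name, a sandboxed side receives only the descriptor --
can be sketched on Unix with SCM_RIGHTS (here via Python 3.9+'s
socket.send_fds/recv_fds; a portable sketch, not Capsicum itself):

```python
import os
import socket
import tempfile

# Privileged side and sandboxed side, connected by an AF_UNIX socketpair.
parent, child = socket.socketpair()

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"compress me")
    path = f.name

# Privileged side: open by name, then pass only the open fd (SCM_RIGHTS).
fd = os.open(path, os.O_RDONLY)
socket.send_fds(parent, [b"x"], [fd])

# Sandboxed side: never sees a pathname, just receives a usable fd.
msg, fds, flags, addr = socket.recv_fds(child, 1, 1)
data = os.read(fds[0], 100)
assert data == b"compress me"

os.close(fd); os.close(fds[0])
parent.close(); child.close()
os.unlink(path)
```

In the real gzip port the two sides are separate processes (fork/exec),
and the 409 LoC went mostly into marshalling the request/response RPC
around this fd transfer.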
What are the answers to the homework question?

Does Capsicum achieve its goals?
  How hard/easy is it to use?
    Using Capsicum in an application almost always requires app changes.
      To open files with openat(), &c.
    Suggested plan: sandbox and see what breaks.
      Might be subtle: gzip compression level bug.
  What are the security guarantees it provides?
    Guarantees provided to app developers: sandbox can operate only on open FDs.
    Implications depend on how app developer partitions application, FDs.
    User/admin doesn't get any direct guarantees from Capsicum.
      Unlike MAC schemes, where user/admin directly specifies policy.
    Guarantees assume no bugs in FreeBSD kernel (lots of code),
      and that the Capsicum developers caught all ways to access
      a resource not via FDs.
  What are the performance overheads?  (CPU, memory)
    Minor overheads for accessing a file descriptor.
    Setting up a sandbox using fork/exec takes O(1msec), non-trivial.
    Privilege separation can require RPC / message-passing, perhaps noticeable.
  Adoption?
    In FreeBSD's kernel now, enabled by default (as of FreeBSD 10).
    A handful of applications have been modified to use Capsicum.
      dhclient, tcpdump, and a few more since the paper was written.
      [ Ref: http://www.cl.cam.ac.uk/research/security/capsicum/freebsd.html ]
    Casper daemon to help applications perform non-capability operations.
      E.g., DNS lookups, look up entries in /etc/passwd, etc.
      [ Ref: http://people.freebsd.org/~pjd/pubs/Capsicum_and_Casper.pdf ]

What applications wouldn't be a good fit for Capsicum?
  Apps that need to handle human-oriented file/directory names.
    Names seem to require ambient authority; they fit badly with
      capabilities' direct references.
  Apps that need to control access to non-kernel-managed objects.
    E.g.: X server resources (windows &c).
    Capsicum treats pipe to a user-level server (e.g., X server) as one cap.
  Apps that need to connect to specific TCP/UDP addresses/ports from sandbox.
    Capsicum works by only allowing operations on existing open FDs.
    Need some other mechanism to control what FDs can be opened.
    Possible solution: helper program can run outside of capability mode,
      open TCP/UDP sockets for sandboxed programs based on policy.

References:
  http://reverse.put.as/wp-content/uploads/2011/09/Apple-Sandbox-Guide-v1.0.pdf
  http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/prctl/seccomp_filter.txt;hb=HEAD
  http://en.wikipedia.org/wiki/Mandatory_Integrity_Control