Native Client
=============

Today's lecture: sandboxing
  Sometimes we use isolation to keep attackers out.
  Sometimes to keep attackers in!
  Isolation to contain suspect code is often called a "sandbox".
  A sandbox allows code to use its own memory, and to communicate in
    restricted ways, but nothing else.

Why do people need sandboxes?
  To allow untrusted (perhaps malicious) code to run on a sensitive computer.
  E.g. JavaScript from a web site, in my browser, on my laptop.

Some sandbox mechanisms:
  Virtual machines (Google App Engine runs customer code on VMs).
  Language restrictions (JavaScript in your browser).
  UNIX processes.
  chroot jail (as in OKWS).
  This paper: Software Fault Isolation (SFI).

Why so many sandbox schemes?
  Tough tradeoffs mean the best design depends on the situation:
    light-weight vs heavy-weight (process vs VM).
    compatible vs new environment (VM vs chroot jail).
    slow vs fast (interpreted JavaScript vs machine code).
    integrated vs arms-length (JavaScript in browser vs VM).
    bugs vs malice (UNIX processes vs VM).
  Today's paper: light-weight, new environment, very fast, arms-length,
    and designed to handle malice, not just bugs.

Bugs in sandbox isolation / sharing mechanisms are a big worry.
  The sandbox must monitor and understand the sandboxed code,
    to decide which actions are allowed and which are forbidden.
  Monitoring may be complex, thus potentially buggy.
  Sandboxed code must also communicate with the outside world.
  That communication is another opportunity for bugs.

What's the overall picture for Native Client?
  Suppose there's a web-based game that needs lots of CPU to draw images.
  Idea: run the game software on the user's machine.
    Maybe using JavaScript in the browser? But JavaScript is slow!
  Idea: write it in C, send a (fast!) executable to the user to run.
  So:
    A web site.
    A compiled executable.
    A user with a laptop.
    The user connects to the web site, downloads the executable,
      and runs the executable on the laptop.
  What's wrong with this picture?

Native Client solves a very ambitious problem:
  run machine code supplied by the attacker, but safely!

Quick demo, based on the Google NaCl tutorial.
  % cd ~/6.858-web/spring19/web/lec/nacl-demo
  % vi hello.cc
  % vi index.html
  % make
  % make serve
  visit http://localhost:5103/
  uncomment memset(buf, 'A', 65536);
  % make
  % make serve
  re-load the page
  view the JavaScript console to see "NativeClient: NaCl module crashed"
  The browser did not crash or mysteriously malfunction!

What are some options for safely running x86 code?

Approach 0: trust the code developer.
  Browser plug-ins, ActiveX, etc.
  The developer signs the code with a private key.
  The browser asks the user to decide whether to trust code from some developer.
  Users are bad at making such decisions.
  Works for known developers (e.g., Windows Update code, signed by MS).
  Unclear how to answer for unknown web applications (other than "no").
  No protection if the code turns out to be buggy or malicious.

Approach 1: O/S mechanisms (process, chroot, containers, &c).
  OKWS takes this approach.
  Run untrusted code as a regular user-space program, but
    restrict what system calls the untrusted code can invoke.
    Linux: seccomp (see the sketch just below).  MacOSX: Seatbelt.
  Native Client uses these techniques as its "outer sandbox".
  Why not use OS sandboxing alone?
    The mechanisms are quite different per O/S, and not always adequate.
    Sandboxed code can still make some system calls, so it could exploit a kernel bug.
    Some sandbox mechanisms require root: we don't want to run Chrome as root.
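[Aside, not from the paper: a minimal Linux-only sketch of the strict seccomp
 mode mentioned above.  Once strict mode is enabled, the kernel allows only
 read, write, _exit, and sigreturn; any other system call kills the process.]

  /* seccomp-demo.c: minimal sketch of Linux strict-mode seccomp.
   * Not from the paper; illustrates the "restrict system calls" idea. */
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/prctl.h>
  #include <sys/syscall.h>
  #include <linux/seccomp.h>

  int main(void) {
      printf("before seccomp\n");

      /* From here on, only read/write/_exit/sigreturn are allowed. */
      if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0) {
          perror("prctl");
          return 1;
      }

      /* write() is still allowed... */
      const char msg[] = "inside the sandbox\n";
      write(1, msg, sizeof(msg) - 1);

      /* ...but this is a forbidden system call, so the kernel
       * kills the process right here (SIGKILL). */
      syscall(SYS_getpid);

      write(1, "never reached\n", 14);
      return 0;
  }

  (Compile with "cc seccomp-demo.c" on Linux and run it; the shell
   reports that the process was killed.)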
Approach 2: Software Fault Isolation (SFI) -- the Native Client approach.
  Given an x86 binary to run in Native Client, check that it's safe.
  Validation checks each instruction in the binary:
    Some instructions are always safe: allow.
    Some instructions are never safe, e.g. system calls: prohibit.
    Some instructions are sometimes safe, depending on operands.
      Require a check before these.
      The compiler is expected to insert the check instruction(s).
      The validator ensures the check is present.
      Another option: insert the check through binary rewriting.
        Hard with x86, but more doable with a higher-level language.
  After checking, the code is safe to run in the same process as trusted code.

The Native Client environment:
  The web site provides an x86 binary.
  Main browser process.
  Separate NaCl process:
    Validator.
    Runtime support for the module.
    IPC to the main browser process, to talk to JavaScript.
    NaCl module (the web site's binary).

What does safety mean for a Native Client module?
  Goal #1: executes only allowed instructions.
    No system calls.
    No instructions that undermine validation.
  Goal #2: accesses only its own memory.
    The module has a dedicated code+data memory area.
    No writes of trusted runtime data.
    No jumps to unexpected targets in the runtime's code.

What does the validator need to do?
  Look at every instruction.
  Check for forbidden instructions (INT, &c).
  Check that load/store/jump target addresses are inside the module.
  (It does all this before module execution starts.)

Challenges:
  Variable-length instructions.
    The validator must correctly identify instruction boundaries.
  Indirect memory references, both jumps and loads/stores.
    E.g. when the address is in a register, or pulled from memory.
    The validator doesn't know what address will be used at run-time.

Why are variable-length instructions a problem?
  x86 instructions can be anywhere from 1 to 15 bytes.
  Suppose the program's code contains the following bytes:
    25 CD 80 00 00
  Starting from 25, it is a 5-byte instruction:
    AND $0x000080cd, %eax
  Starting from CD, it's a 2-byte instruction:
    INT $0x80    # Linux system call -- forbidden!
  It's not useful to prohibit forbidden instructions at every byte offset:
    real instructions often accidentally contain forbidden bytes.

How to ensure code executes only instructions the validator knows about?
  Scan forward through all instructions, starting at the beginning.
    Compute instruction lengths as if executing straight through.
    This is Table 1's "fall-through disassembly".
    Check for forbidden instructions.
    Remember every instruction boundary.
  Note that this only sees a certain subset of possible instructions.
  Then look at every JMP (or branch, call, etc).
    Require that it target a remembered boundary.
  Thus: the validator examines only a subset of possible instructions,
    but enforces that code only jumps into that subset.

What about indirect jumps?
  Example: a C++ virtual method call.
  Example: a function return.
  The problem:
    The program computes the target address at run-time.
    Perhaps it points at an INT in the middle of an instruction!
    The validator can't tell!
  Idea: require the compiler to help.
  For an indirect jump via %eax, NaCl requires the following code:
    AND $0xffffffe0, %eax
    JMP *%eax
  The AND clears the low 5 bits of every indirect jump target
    (0xffffffe0 is ~31, so the result is a multiple of 32).
  So it ensures indirect jumps go only to multiples of 32 bytes.
  And NaCl requires that no instruction spans a 32-byte boundary.

The compiler must follow these rules:
  Replace every indirect jump with the two-instruction sequence above.
  Add NOPs to prevent instructions from spanning a 32-byte boundary.
  Add NOPs to pad to a 32-byte boundary if the next instruction is
    an indirect jump target.

The validator checks the rules (see the validator sketch just below):
  No instruction spans a 32-byte boundary.
  An indirect jump is always immediately preceded by the AND.
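[Aside, not the paper's code: a sketch of the validator's two passes, to make
 the checks above concrete.  The decode helpers (insn_length, is_forbidden,
 is_indirect_jump, is_masking_and, direct_jump_target) are hypothetical
 stand-ins for a real x86 decoder; only the structure of the checks matters.]

  /* validate.c: NaCl-style fall-through validation, sketched in C.
   * The decode helpers are hypothetical; no real x86 decoder here. */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdlib.h>

  size_t insn_length(const uint8_t *p);        /* length of the insn at p */
  bool   is_forbidden(const uint8_t *p);       /* INT, SYSENTER, MOV to %ds, ... */
  bool   is_indirect_jump(const uint8_t *p);   /* JMP/CALL through a register */
  bool   is_masking_and(const uint8_t *p);     /* AND $0xffffffe0, %reg */
  bool   direct_jump_target(const uint8_t *p, size_t off, size_t *target);

  bool validate(const uint8_t *code, size_t len) {
      bool ok = false;
      bool *boundary = calloc(len, sizeof(bool));  /* remembered instruction starts */
      size_t prev = (size_t)-1;                    /* offset of the previous instruction */
      if (!boundary) return false;

      /* Pass 1: fall-through disassembly. */
      for (size_t off = 0; off < len; ) {
          size_t n = insn_length(code + off);
          if (n == 0 || off + n > len) goto out;
          if (is_forbidden(code + off)) goto out;
          if (off / 32 != (off + n - 1) / 32) goto out;  /* spans a 32-byte boundary */
          if (is_indirect_jump(code + off)) {
              /* Must be immediately preceded by the masking AND, and the
               * AND+JMP pair must fit within a single 32-byte block.
               * The JMP itself is not a valid jump target: only the AND
               * (the start of the pair) is remembered as a boundary. */
              if (prev == (size_t)-1 || !is_masking_and(code + prev)) goto out;
              if (prev / 32 != (off + n - 1) / 32) goto out;
          } else {
              boundary[off] = true;
          }
          prev = off;
          off += n;
      }

      /* Pass 2: every direct jump/call/branch targets a remembered boundary. */
      ok = true;
      for (size_t off = 0; off < len && ok; off += insn_length(code + off)) {
          size_t target;
          if (direct_jump_target(code + off, off, &target))
              ok = (target < len && boundary[target]);
      }
  out:
      free(boundary);
      return ok;
  }

  (This mirrors the shape of the Table 1 checks; a real validator also handles
   NOP and HLT padding and the many details of x86 decoding.)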
What prevents the module from jumping past the AND, directly to the JMP?
  Can't happen for a direct jump: the validator treats the AND+JMP pair
    as a unit, and allows direct jumps only to the AND.
  Can't happen for an indirect jump: the AND+JMP pair never spans a
    32-byte boundary, so the JMP itself never sits at a 32-byte multiple,
    and indirect jumps can only reach 32-byte multiples.

What about RET?
  RET is effectively an indirect jump via the stack.
  The validator prohibits RET.
  The compiler must generate an explicit POP plus the masked
    indirect-jump sequence instead.
  Thus functions return to 32-byte boundaries.

Note that the compiler must cooperate:
  generate AND+JMP, pad to 32-byte boundaries, no RET, &c.
  So Native Client requires a specially modified compiler;
    it can't run just any x86 machine code.
  Does this require us to trust the *attacker's* compiler?
    No: the validator will catch a cheating or broken compiler.

The rules from Table 1 (constraints the validator enforces):
  C1: executable code in memory is not writable.
  C2: the binary is statically linked at address zero, with code starting at 64K.
  C3: all indirect jumps use the two-instruction sequence above.
  C4: the binary is padded to a page boundary with one or more HLT instructions.
  C5: no instruction, nor the AND/JMP pair, may span a 32-byte boundary.
  C6/C7: direct jumps target only instructions seen in fall-through disassembly.

Homework Q: what happens if the validator gets some instruction length wrong?
  Can this lead to an exploit?  Or just to the validator rejecting the code?

How to prevent indirect jumps outside the module's code?
  E.g. into the module's data, or outside the module altogether?
  (The AND+JMP sequence only constrains the low 5 bits of the target.)
Segmentation.
  x86-32 MMU hardware provides "segments".
  Diagram: CPU -> MMU -> memory.
  Each memory access is made with respect to some "segment".
  A segment specifies a base + size.
  Segments are named by a segment selector: a pointer into a segment table.
    %cs, %ds, %ss, %es, %fs, %gs
  Each instruction can specify which segment to use for accessing memory.
  Code is always fetched using the %cs segment.
  Translation: (segment selector, addr) -> (segbase + addr % segsize).
    (In reality the hardware faults if addr exceeds the limit rather than
     wrapping, but either way the access stays within the segment.)
  Usually base=0 and size=max, so segmentation is a no-op.
  To change the segment table: on Linux, the modify_ldt() system call.
  To change segment selectors: "MOV %ds", etc.

Limiting code/data to the module's size:
  Add a new data segment with base=0, size=256MB.
  Add a new code segment with base=0, size=codesize.
  The validator rejects instructions that change segment selectors.
  When the runtime enters module code, it sets all the selectors;
    the code that does this is the springboard.
  Now loads/stores are restricted to the module's memory,
    and jumps are restricted to the module's code.
  The trampoline is *unchecked* code the module can jump to that sets the
    segment selectors back to their normal (unrestricted) values.
  NaCl write-protects the code pages as well.

What if the CPU has no segment hardware?
  E.g. modern AMD/Intel CPUs running in 64-bit mode, which largely
    ignores segment limits.
  One possibility: run in 32-bit mode.
    AMD/Intel CPUs still enforce segment limits in 32-bit mode.
    Can run in 32-bit mode even on a 64-bit OS.
  Or use 64-bit mode, and require the compiler to generate more checks
    (see the sketch just below):
    limit indirect jumps to the code size.
    limit loads/stores to the 256 MB module region.
    Look up the Google paper on 64-bit NaCl (see the references below).
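[Aside, illustrative only: when there is no segment hardware, the compiler
 must emit an explicit check before each load/store.  Here is a C-level
 picture of such a check; module_base and MODULE_SIZE are made-up names,
 and real 64-bit NaCl uses address masking plus a reserved base register
 (see the Sehr et al. paper in the references) rather than a branch.]

  /* Compiler-inserted software bounds check on a store, standing in for
   * the hardware segment-limit check.  Names here are hypothetical. */
  #include <stdint.h>
  #include <stdlib.h>

  #define MODULE_SIZE (256u << 20)        /* 256 MB module region */
  static uint8_t *module_base;            /* start of the module's memory */

  static inline void checked_store32(uint64_t offset, uint32_t value) {
      if (offset > MODULE_SIZE - sizeof(uint32_t))
          abort();                        /* would escape the sandbox: stop */
      *(uint32_t *)(module_base + offset) = value;
  }

  (The 32-bit hardware segment check achieves the same effect with no
   extra instructions on each access.)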
What would happen if the NaCl module's code had a buffer overflow?
  Suppose it overwrote a return address or a function pointer.
  Any exploit must then use an indirect jump.
  Indirect jumps are limited by the code segment and by the AND,
    so the exploit can only jump to validated module code.
  And the code pages are write-protected, so there is no code injection.
  Still, a clever attacker will likely find useful existing code to jump to.
  However, the attacker can't escape NaCl's sandbox,
    since an exploit can only execute checked instructions.
  Does "can't escape from the sandbox" mean "no security problem"?

Invoking trusted code from the sandbox.
  The runtime injects helper sequences into [4KB..64KB) of the module's memory.
  This code does not have to follow the validator rules!
  Module code can jump to 32-byte boundaries here.
  Trampoline: undoes the sandbox and enters trusted code.
    Starts at a 32-byte multiple boundary.
    Loads unrestricted segments into the %cs and %ds segment selectors.
    Jumps to trusted code that lives above 256MB.
  Springboard: (re-)enters the sandbox, on return from a trusted call
      or at initial start.
    Re-sets the segment selectors, jumps to a particular address in the NaCl module.
    Springboard slots (at 32-byte multiples) start with HLT,
      to prevent indirect jumps into the springboard by module code.
    The springboard must be inside NaCl memory because it briefly runs
      with the restricted code segment selector.

What's provided by the service runtime?
  NaCl's "system call" equivalent.
  Memory allocation: sbrk/mmap.
  Thread operations: create, etc.
  IPC with the JavaScript code on the page that started this NaCl program.
    The JavaScript runs in a different process,
    so only messages, nothing more direct.

What are likely attack avenues?
  Inner sandbox: the validator has to be correct (it had some tricky bugs!).
    The validator has to understand every detail of the instruction set.
  Outer sandbox: an OS-dependent plan to forbid most system calls.
    E.g. Linux seccomp, or chroot, &c, depending on the O/S.
  Why do they need the outer sandbox?
    Bugs in the IPC machinery or runtime, exploitable from inside the sandbox.
    Bugs in the inner sandbox.
  What could an adversary do after compromising the inner sandbox?
    Exploit OS bugs via system calls, though the outer sandbox limits this.
    Exploit bugs in the main browser via IPC.

How well does Native Client perform?
  CPU overhead seems to be dominated by NaCl's code alignment requirements,
    which enlarge the instruction-cache footprint.
  But for some applications, NaCl's alignment works better than gcc's.
  Minimal overhead for the added checks on indirect jumps.
  Call-into-service-runtime performance seems comparable to Linux syscalls.

How hard is it to port code to NaCl?
  For computational code it seems straightforward: a 20-line change for H.264.
  Code that interacts with the system (syscalls, etc) needs those
    interactions changed.
    E.g., the Bullet physics simulator (section 4.4).

What happened after the paper was published?
  Integrated into the Chrome web browser.
  Maybe Google App Engine uses NaCl to sandbox customer code.
  But Google says Chrome support will soon be dropped.
  Why not more widely used?
    Tied to Chrome -- no other browsers.
    Initially x86-only, though a portable version (PNaCl) came later.
    The separate process is awkward: it requires IPC to reach the DOM and JavaScript.
    JavaScript is faster now, eliminating some of the pressure for more speed.
    WebAssembly is a portable and widely supported replacement.
      It uses a similar software sandboxing approach!

Additional references.
  A report about the security contest:
    https://www.nccgroup.trust/us/about-us/newsroom-and-events/blog/2009/august/the-security-implications-of-google-native-client/
  Report from one of the security contest teams:
    https://www.lateralsecurity.com/downloads/hawkes_HAR_2009_exploiting_native_client.pdf
  Native Client for 64-bit x86 and for ARM:
    http://static.usenix.org/events/sec10/tech/full_papers/Sehr.pdf
  Native Client for runtime-generated code (JIT):
    http://research.google.com/pubs/archive/37204.pdf
  Native Client without hardware dependence:
    http://css.csail.mit.edu/6.858/2012/readings/pnacl.pdf
  Other software fault isolation systems with fine-grained memory access control:
    http://css.csail.mit.edu/6.858/2012/readings/xfi.pdf
    http://research.microsoft.com/pubs/101332/bgi-sosp.pdf
  Formally verifying the validator:
    http://www.cse.psu.edu/~gxt29/papers/rocksalt.pdf