Software fault isolation: WebAssembly
=====================================

This paper: research exploration of building a secure WebAssembly runtime.
  Formally reasoning about correctness for vWasm helps clarify the isolation plan.
  Neat trick with rWasm: translating wasm code to Rust, which enforces isolation.

Original paper that proposed WebAssembly:
  https://css.csail.mit.edu/6.858/2022/readings/wasm.pdf

Goal: run fast code in web applications.
  Javascript is universally supported but not a great fit in some cases.
    High-performance code.
    Applications written in a language that's not JS.
  Challenge: code must be isolated.
    Cannot have code from one website tamper with data from another site.
    Cannot have code from a website tamper with data on user's computer.

We've seen a number of isolation mechanisms already.  Why yet another one?
  Some require special privileges.
    Need to be root in order to set up chroot, UIDs, etc.
  Some work only on specific platforms.
    Firecracker runs only on Linux.
      But then what about browsers on Windows, MacOS, ...?
    VMs work well on CPUs that support hardware virtualization.
      But then what about devices without direct hardware VM support?
  Some require the system to be pre-configured by an administrator.
    Containers require Docker / LXC / etc to be installed first.
    Administrator must create virtual network interfaces.
    Administrator must set up UID namespaces.

Approach: software fault isolation.
  Powerful idea for achieving isolation.
    Does not rely on hardware or OS support.
    Does not require special privileges to deploy.
    Can provide isolation even in the face of some hw/os bugs.
  But requires cooperation between developers and devices that run their code.
    Cannot take an existing binary and run it in isolation.
    Contrast with containers or VMs: can run existing Linux applications.
  Modern SFI system: WebAssembly.

[[ Demo: wasm/README ]]
  lib.c has some code.
  Can run it on the command line:
    wasmtime main.wasm "test Hello world"
  Can also run it in the browser:
    python -m http.server 8080
    load http://localhost:8080/demo.html
  Corrupting memory in lib.c gets caught.
    uncomment memset
    recompile (make)
    wasmtime main.wasm test, catches error
    shift-reload browser, catches error as well

WebAssembly module.
  Functions.  Define all of the code in the module.
  Globals.  Global variables.
  Tables.  Targets of function pointers.
  Memory.  Contiguous memory from 0 to sz.
  Imports and exports.
    E.g., can mark a function as exported, or import a function.
  Demo: wasm2wat lib.wasm

WebAssembly workflow.
  [[ Application code (e.g., C) ]]
    -> C compiler that can generate WebAssembly ->
  [[ WebAssembly module / file ]]
    -> Runtime (vWasm, rWasm, ...) ->
  [[ Native code that can execute the WebAssembly module ]]

What are the security guarantees for running Wasm code?
  Threat model: Section 3.
    Adversary can run any of the functions in the Wasm module.
    Should not be able to access memory outside of what's assigned to the sandbox.

Challenge: performance + isolation.
  Could write an interpreter for whatever code we want to run.
    Javascript sort of works this way.
    Could even be x86: just run qemu in the browser to emulate x86.
    Interpreter will not allow code to access outside of the isolation boundary.
    But that's slow; goal is high performance.
  Why would it be difficult to just isolate arbitrary x86 running natively?
    Ultimately want to prevent:
      Syscalls.
      Accessing memory outside of the isolated module.
    But hard to control what the code does through validation.
      Hard to determine what memory addresses will be accessed.
      Hard to tell if code will jump somewhere else, outside of the module.
      Perhaps hard to tell if there are syscall instructions even.
        Variable-length instructions in x86.
      At runtime, code uses computed jumps and computed memory accesses.
        Without hardware support, cannot control what the addresses will be.

Overall approach: make sure all memory accesses are checked.
  Naive variant of this plan: add a bounds check just before every memory access
    (see the sketch below).
  Performance challenge: that might be a lot of bounds checks!
  Security challenge: what if the code bypasses the check, jumping past it?
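  A minimal sketch of that naive per-access check, in Rust for concreteness
    (the Sandbox struct and sandbox_store name are made up for illustration;
    a real SFI system emits this check as a few native instructions):

    // Sandbox linear memory is a flat byte array; every store is guarded
    // by an explicit range check before it touches memory.
    struct Sandbox {
        mem: Vec<u8>, // addresses 0..mem.len() belong to the sandbox
    }

    impl Sandbox {
        // Store an 8-byte value at address `a`, trapping if any of the
        // bytes a..a+8 would fall outside the sandbox's memory.
        fn sandbox_store(&mut self, a: usize, val: u64) {
            let end = a.checked_add(8).expect("address overflow");
            if end > self.mem.len() {
                panic!("out-of-bounds store"); // trap instead of corrupting
            }
            self.mem[a..end].copy_from_slice(&val.to_le_bytes());
        }
    }

    fn main() {
        let mut sb = Sandbox { mem: vec![0u8; 65536] };
        sb.sandbox_store(100, 42);      // in bounds: fine
        // sb.sandbox_store(65530, 42); // would trap: 8 bytes don't fit
    }
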
Why is WebAssembly better than x86?
  Clever design enables simple (and thus secure) high-performance interpreter/compiler.
  Split off structured data (code, stack, variables) from unstructured data (memory).
    Code not accessible as data at all.
    Can avoid bounds-checking for accesses to stack, variable locations.
    Only need bounds-checking for pointer accesses to memory.
    Wasm stack is not quite the same as the C stack: cannot access by pointer.
      If any data structures are accessed by pointer, compiler must put them in memory.
  Structured control flow so there's no possibility to jump past the check.
    No way to refer to the address of an arbitrary instruction.
    Structured control flow (scoped blocks); jumps only via pre-defined tables.
    Again, compiler's job is to ensure any pointer jump targets are registered.
  Possible to do SFI for x86, x86-64, ARM, etc.
    [[ Ref: https://css.csail.mit.edu/6.858/2020/readings/nacl.pdf ]]
    Quite a bit more complex than WebAssembly, perhaps slower, not portable.

High performance plan: translate WebAssembly into native code (e.g., x86, x86-64).
  Basic operations like arithmetic should be more-or-less one-to-one.
    That is, one WebAssembly operation can compile into one x86-64 instruction.
    One of the reasons why NaN floating-point ops are non-deterministic.
      Different hardware has different bit-level behavior when NaNs are involved.
      Want to emit a single floating-point hardware instruction, so allow non-determinism.
  But need to be more careful about state and control flow.
    State: shouldn't be able to access anything outside the module.
    Control flow: shouldn't jump to any code that's not "properly translated."

What's the plan for ensuring safety of the executed code?  (vWasm style)
  WebAssembly instructions that access state should not escape the sandbox.
    Need to make sure we're accessing legal local var / global var / heap memory.
    Add checks to the generated native code as needed to make this safe.
  Ensure that the generated code can only jump to other generated code.
    Cannot jump to code outside of sandbox.
    Cannot jump to mis-aligned code.
    Cannot jump to code in sandbox bypassing the check.
    Typically this is called "control flow integrity" or CFI.
  This plan is summarized in the vWasm proof strategy (end of Section 4.3).

What state can the WebAssembly code access?
  Stack values.
  Local variables.
  Global variables.
  Heap memory, grows from 0 upwards.

How to bounds-check global var accesses?
  Known ahead of time how many global vars are in a module.
  Can check global var access at compile time.
    Emit code that is always safe to run.
    Just access globals[off] where off is already a checked constant.
    Runtime maintains some way to access the base of the globals.
  Accessing in-bounds globals is safe.
    WebAssembly code could write arbitrary values to a global variable.
    But that's OK: isolation doesn't depend on specific values.
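  A sketch of why in-bounds global accesses need no runtime check (Rust for
    concreteness; the Globals layout and set_global_2 name are made up, not
    the paper's generated code):

    // Hypothetical generated code for a module that declares 3 globals.
    // The compiler already checked that index 2 < 3 when it translated
    // `global.set 2`, so the emitted access uses a constant, in-range index.
    struct Globals {
        g: [u64; 3], // one slot per declared global
    }

    fn set_global_2(globals: &mut Globals, v: u64) {
        // Constant index, known in range at compile time; the value itself
        // can be arbitrary -- isolation doesn't depend on it.
        globals.g[2] = v;
    }

    fn main() {
        let mut g = Globals { g: [0; 3] };
        set_global_2(&mut g, 7);
        println!("{:?}", g.g);
    }
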
How to bounds-check local var accesses?
  Every function has a well-known (constant) number of local variables.
    Local variables include arguments to a function call.
  As with globals, can check the offset is in-range at compile time.
  Slight complication: how do we access the local variables?
    Same function could be invoked many times recursively.
    Want to access the local variables for the current invocation.
    At runtime, these local variables are stored on the (native) stack.
      Intermingled with (WebAssembly) stack values.  See next.

How to bounds-check stack locations?
  Big question: is this stack offset valid to access?
  How are WebAssembly local variables + stack implemented in terms of a native stack?
    (Some WebAssembly stack values are never materialized: stored in registers.)
    Real stack stores WebAssembly local vars, stack values (that aren't omitted),
      plus return addresses, saved registers when calling.
    Need to make sure return addr doesn't get corrupted by storing to local var.
  WebAssembly's structured control flow helps check stack accesses.
    At every point in a function, it's known how many values are on the runtime stack.
    Control flow within a function must always agree on stack contents.
      E.g., converging control flow must agree on how many stack items there are.
  Benefit: compile-time check for how far up the stack code can access.
    Compiler knows the stack depth at every instruction in WebAssembly code.
  Benefit: compiler knows where the local variables are located.
    Accessing a local variable requires skipping WebAssembly stack values on the stack.
    But since we know the stack depth, compiler can easily add that offset.
  Benefit: runtime can check if we're out of stack space.
    When entering a function, runtime knows how much stack space will be needed.

Bounds-checking memory operations.
  E.g., i64.store stores a 64-bit value (8 bytes) at some address A.
    Need to make sure the resulting code cannot corrupt state outside of the sandbox.
    In particular, addresses A through A+7 must be within the memory region.
  What should the compiled code look like?
    if a+8 >= memsz { panic }
    do the store
  Compiler can potentially eliminate repeated bounds checks.
    uint64 a[4];
    a[0] = ..; a[1] = ..; a[2] = ..; a[3] = ..;
    Suffices to check once:
      if a+4*8 >= memsz { panic }
      do the 4 stores
  Clever use of virtual memory for bounds checks is possible.
    WebAssembly specification says memory is at most 4GB in size.
    Address used for i64.store is a 32-bit unsigned address.
    Instruction can also specify a static offset.
      Common to access arrays, which means two offsets.  a[5] = 6;
    So, full memory store operation is: *(membase + address + offset) = value
    WebAssembly specification requires both address and offset to be 32-bit values.
      Thus, resulting address will be at most 8GB from membase.
    Trick: virtual memory reserves an 8GB region of memory for all possible addresses.
      Worst case, will hit a page fault if access is to memory beyond allocated size.
    Small trade-off with VM-based bounds checking.
      Not quite deterministic anymore.
      Depending on runtime, program might or might not stop on out-of-bounds store,
        if store is slightly out-of-bounds.
  Yet another trade-off suggested by the paper: possible corruption on OOB write.
    Mask the address (cut off the high bits).
    Out-of-bounds write gets the address truncated, overwrites some in-bounds memory.
    Safe; turns unsafe access into a correctness problem.
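  A sketch of the two store-compilation strategies above, in Rust (memory
    size and function names are made up for illustration; a real runtime
    emits these as a handful of native instructions):

    const MEM_SIZE: u64 = 1 << 16; // 64 KiB linear memory, power of two

    // Bounds-checked variant: address and static offset are both 32-bit
    // per the spec, so the effective address fits well within 64 bits.
    fn store_checked(mem: &mut [u8], address: u32, offset: u32, val: u64) {
        let ea = address as u64 + offset as u64; // at most ~8GB past base
        if ea + 8 > mem.len() as u64 {
            panic!("out-of-bounds store"); // deterministic trap
        }
        mem[ea as usize..ea as usize + 8].copy_from_slice(&val.to_le_bytes());
    }

    // Masking variant: cut off high bits so the store always lands inside
    // memory.  An out-of-bounds store silently hits some in-bounds bytes:
    // still isolated, but now a correctness bug inside the module.
    // (Masking with MEM_SIZE - 8 also forces 8-byte alignment; a real
    //  runtime would be more careful.  Assumes mem.len() == MEM_SIZE.)
    fn store_masked(mem: &mut [u8], address: u32, offset: u32, val: u64) {
        let ea = (address as u64 + offset as u64) & (MEM_SIZE - 8);
        mem[ea as usize..ea as usize + 8].copy_from_slice(&val.to_le_bytes());
    }

    fn main() {
        let mut mem = vec![0u8; MEM_SIZE as usize];
        store_checked(&mut mem, 16, 8, 0xdead_beef);
        store_masked(&mut mem, 0xffff_0000, 0, 1); // would trap in checked variant
    }
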
No memory safety within a module.
  Entirely possible for a C program compiled to WebAssembly to have a buffer overflow.
    E.g., corrupt a heap-allocated memory region.
  WebAssembly doesn't prevent overflow, so the C program could misbehave arbitrarily.
  But it does ensure that the WebAssembly code can't affect the rest of the system.

Why is value type checking important?  (i32 vs i64)
  Compiler may want to know if a value might be limited to 32 bits.
  Allows some optimizations like loading/storing memory locations.

Why is function type checking important?  (How many arguments does a function take?)
  A WebAssembly function can access (and modify) its arguments.
    They are just like local vars.
  Thus, a wasm function can modify some number of runtime stack locations,
    depending on how many arguments it declares itself as taking.
  Need to make sure the caller gave us enough arguments on the stack.
    Otherwise the function can read/write other items on the runtime stack!
    Could be some native return address, which would be bad.

How to achieve control flow integrity?
  Direct jumps: compiler ensures the native jump goes to proper compiled code.
  Function calls: set up the stack in the way the target function expects.
    Namely, args + return address should be at the bottom of the stack after call.
    Type safety for function types ensures we put the right number of arguments.
  Indirect function calls: table of jump targets.
    Compiler ensures that all jump targets are valid functions.
    WebAssembly call_indirect emits native code to jump to some function from the table.

rWasm runtime: compile the wasm module to Rust code.
  Demo:
    git clone https://github.com/secure-foundations/rWasm
    cd rWasm
    cargo run ~/demo/lib.wasm
    wasm2wat ~/demo/lib.wasm
    cat generated/src/lib.rs
    [[ look at the implementation for add: func_2 ]]
    [[ look at the implementation for memwrite: func_5 ]]
    [[ look at memory_accessors: Rust's slice &[u8] checks bounds in get/get_mut ]]
    cargo run -- --wasi-executable ~/demo/main.wasm
    ( cd generated && cargo run )
  Cool trick of using Rust guarantees (sketch below).
    All generated code is "safe" Rust, meaning it should be impossible to have OOB accesses.
    Compiler should ensure the code is sandboxed properly.
  Nice to be able to see how the wasm runtime is working, by examining the generated Rust.
  Compilation times are prohibitively slow compared to other runtimes.
  Is Rust special in some way?
    Could have compiled to some other language too, if we can ensure memory isolation.
    E.g., could compile to Javascript or Python too.  Slow but probably isolated.
    E.g., could compile to Go too.  Could probably get reasonable performance.
    Rust has a strong type system that catches many issues at compile time.
      Things like checking the type of stack vars would be done at runtime in JS/Python.
      [[ Ref: https://github.com/secure-foundations/rWasm/blob/main/templates-for-generation/tagged_value_definitions.rs ]]
    Rust has a convenient way of checking if a library is using unsafe code.
      #![forbid(unsafe_code)]
      Reduces the amount of code that has to be trusted (assumed correct).
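  A simplified sketch of the trick above (not rWasm's actual generated
    lib.rs; the WasmModule and store_u32 names are made up): all memory
    accesses go through safe slice methods, so Rust itself inserts the
    bounds checks that keep the module inside its sandbox.

    #![forbid(unsafe_code)] // compiler rejects any `unsafe` in this crate

    // Linear memory is an ordinary Vec<u8>; get_mut() returns None for an
    // out-of-range access, so an OOB wasm store becomes a clean trap.
    struct WasmModule {
        memory: Vec<u8>,
    }

    impl WasmModule {
        fn store_u32(&mut self, addr: usize, val: u32) -> Result<(), ()> {
            let dst = self.memory.get_mut(addr..addr + 4).ok_or(())?;
            dst.copy_from_slice(&val.to_le_bytes());
            Ok(())
        }
    }

    fn main() {
        let mut m = WasmModule { memory: vec![0; 1024] };
        assert!(m.store_u32(100, 7).is_ok());
        assert!(m.store_u32(5000, 7).is_err()); // out of bounds, caught safely
    }
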
WebAssembly is widely supported in browsers now.
  Firefox, Chrome, Safari, mobile variants, Edge, ...
  Used in serious applications, like Adobe Photoshop, Figma, Google Earth.
    [[ Ref: https://web.dev/ps-on-the-web/ ]]
    [[ Ref: https://www.figma.com/blog/webassembly-cut-figmas-load-time-by-3x/ ]]
    [[ Ref: https://web.dev/earth-webassembly/ ]]
  Quite a bit of activity, new features.
    [[ Ref: https://github.com/WebAssembly/proposals ]]
    E.g., work on adding support for threads, vector instructions, etc.

Also used outside of web browsers.
  E.g., Fastly and Cloudflare CDNs use WebAssembly.
  Goal: run code on a server near the requesting user.
    Similar goals to AWS Lambda, perhaps even more focus on low-overhead starts.
    Run the compilation step once, then the binary is fast to start running.
  [[ Ref: https://docs.fastly.com/products/compute-at-edge ]]
  [[ Ref: https://blog.cloudflare.com/webassembly-on-cloudflare-workers/ ]]
  Cloudflare also uses Javascript-level sandboxing.
    [[ Ref: https://blog.cloudflare.com/cloud-computing-without-containers/ ]]
  Ongoing work to define a standard interface to the system outside of a module.
    [[ Ref: https://github.com/WebAssembly/WASI ]]

What do security vulnerabilities in WebAssembly sandboxes look like?
  [[ Ref: https://www.fastly.com/blog/defense-in-depth-stopping-a-wasm-compiler-bug-before-it-became-a-problem ]]
  Compiler relied on 32-bit type optimization for memory accesses.
    Knows that certain values are 32 bits only.
    If accessing memory with a known-32-bit offset, omit truncation.
    Otherwise, emit an instruction that force-zeroes the upper 32 bits of the 64-bit value.
  Compiler stored values in CPU registers (rather than stack locations) for performance.
    When it ran out of registers, stored these registers on the runtime stack.
  Bug: when reloading a 32-bit value from the stack, accidentally sign-extended it.
    I.e., 0x80000000 was loaded as 0xffffffff80000000 instead of 0x0000000080000000.
    Just a wrong instruction used to restore the 32-bit value from stack to register.
    Now it's not safe to add this (supposedly known-32-bit) value for a memory access.
      Ends up subtracting from the memory base instead of adding to it.

Possible to reduce TCB to rely on a verifier rather than the entire compiler.
  Add a verifier that accepts the binary result of the compiler and outputs yes/no.
  Verifier can be much smaller than the compiler.
  E.g., VeriWasm for Lucet (now Wasmtime).
    [[ Ref: https://cseweb.ucsd.edu/~dstefan/pubs/johnson:2021:veriwasm.pdf ]]
    [[ Ref: https://github.com/bytecodealliance/wasmtime ]]

Possible to reduce the cost of transitions between isolation domains even further.
  [[ Ref: https://cseweb.ucsd.edu/~dstefan/pubs/kolosick:2022:isolation.pdf ]]

Summary.
  Language-level isolation / software fault isolation.
  Powerful idea for building strong, OS/HW-independent sandboxes.
  Can be even lower overhead than OS/HW isolation: lower context-switching costs.
  Correct compiler / runtime is critical to achieving security.
  WebAssembly designed to make the compiler/runtime job easier in terms of security.