Cloud Terminal
==============

Hard problem: secure applications despite malware on the client device!
  Common situation: malware on the user's Windows or Mac machine.
  Difficult to prevent malware in the first place.
    Users install many applications.
    Little isolation between apps on desktop OSes.
  To some degree, also a problem on mobile phones.
    Bugs in the Linux kernel on Android can allow malicious apps to get root.
    Will talk about Android in more detail next week.

Possible solution: use virtual machines (or even separate devices).
  Separate VM per secure application.
  "Default" VM running the user's usual desktop environment, likely compromised.
  VMM provides isolation between apps.
  Actually used in practice: QubesOS.
    [ Ref: https://www.qubes-os.org/ ]
  Upsides:
    Pretty strong isolation.
    Compatibility -- anything can run in a separate VM.
    Good performance.
  Downsides:
    Sharing.  (But that's almost inevitable, and not solved by Cloud Terminal either.)
    Management of these VMs.
    User interface -- who controls the UI, how to combine trusted + untrusted?
    Large TCB.
      General-purpose hypervisor is a pretty large piece of software.
      Need to emulate a variety of devices.
      Need some kind of window manager or UI switcher.
  Missing pieces:
    How can the user tell a secure app from a malicious app impersonating it?
    How can the user tell if the machine is running this setup in the first place?
    How can some server (e.g., Google Docs) tell if the user's machine is running
      this setup, or if the user screwed something up?

Cloud Terminal plan:
  Small hypervisor on the client manages just one "VM": the user's existing OS.
    Hypervisor uses special hardware (TPM) to initialize securely.
    Hypervisor includes a VNC client.
    "STT", secure thin terminal, in the paper.
  Each app operator provides servers to run that app in a VM.
    "CRE", cloud rendering engine.
  When the user wants to run an app, the VNC client connects to the corresponding CRE.
  Maintains the upsides (performance seems acceptable for target applications).
  Addresses the downsides of the VM approach:
    Management of VMs is shifted to the app operator, which is probably good.
      Centralized and expert.
    User interface is simple: just one app active at a time (or the default OS).
    TCB is smaller.
      No need to support general-purpose VMs.
      No need to share between multiple VMs.
      No need for a hybrid window manager combining multiple apps.
  New risk, though: VMs now running on some server.
    Attacker could break into such a VM.
    Your data lives in plaintext on that VM.
    Would this be a good fit for end-to-end encrypted messaging like Signal?
      Unclear; depends on what you expect the attacks to be.
      Client insecurity is likely the easiest way to bypass Signal.
      Then the target would shift to these server VMs.
      Perhaps they would be more secure because they would be better managed.
        Admins can reconfigure / update the VM at any time
          (but the same is true of client software updates).
  Some answers for the missing pieces, as we will discuss.

Background: TPM ("Trusted Platform Module").
  Separate physical chip present in many desktop machines.
  Connected to the CPU.
    The CPU knows about the TPM chip and interacts with it.
  TPM chip provides several interesting security-relevant functions:
    Keep a cryptographic key that's not accessible to the main CPU.
    Cooperate with the CPU and/or BIOS to keep track of what software is running.
    Sign a statement of what software is running, so that other machines can verify it.

TPM chip's state (simplified):
  Ephemeral set of PCR ("Platform Configuration Register") registers: PCR0, PCR1, ...
    When the TPM resets, the registers get reset to well-known values.
      Which get reset to what is slightly complicated & not super relevant.
      Most of them get reset to all-zeroes (PCR0-15, 23).
      Some PCRs get set to all-ones (PCR17-22).
    PCR register value is intended to reflect what is currently running on the main CPU.
      "What" is not precisely defined by the TPM.
  Private key known only to the TPM.
  A certificate signed by the TPM manufacturer's private key:
    "We manufactured this TPM chip, with public key XXX."

TPM chip's operations (simplified):
  TPM_extend(n, m): extend a PCR register, PCRn = SHA1(PCRn || m)
  TPM_quote(n, m): generate a signature of (PCRn, m) with the TPM's key
  TPM_seal(n, PCR_value, plaintext): return ciphertext
  TPM_unseal(ciphertext): return plaintext, if PCRn matches PCR_value
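To make these four operations concrete, here is a toy Python sketch of their semantics.  This is only an illustration, not the real TPM command interface: the made-up MockTPM class stands in for the chip, an HMAC over a per-chip secret stands in for the certified signing key, and seal uses a throwaway XOR cipher.

import hashlib, hmac, os

class MockTPM:
    NUM_PCRS = 24

    def __init__(self):
        # Per-chip secret, never revealed to the main CPU.  A real TPM holds an
        # asymmetric key whose public half is certified by the manufacturer; an
        # HMAC key and a toy XOR cipher are used here only to keep this short.
        self._key = os.urandom(32)
        self.pcrs = [b"\x00" * 20 for _ in range(self.NUM_PCRS)]

    def extend(self, n, m):
        # TPM_extend: PCRn = SHA1(PCRn || m).  Order matters; cannot be undone.
        self.pcrs[n] = hashlib.sha1(self.pcrs[n] + m).digest()

    def quote(self, n, m):
        # TPM_quote: "sign" (PCRn, m) so a remote party can check what ran.
        return hmac.new(self._key, self.pcrs[n] + m, hashlib.sha256).digest()

    def seal(self, n, pcr_value, plaintext):
        # TPM_seal: encrypt under a key derived from the chip secret and the
        # target PCR value; only this chip, in the right state, can unseal.
        k = hashlib.sha256(self._key + bytes([n]) + pcr_value).digest()
        ct = bytes(b ^ k[i % 32] for i, b in enumerate(plaintext))
        return (n, pcr_value, ct)

    def unseal(self, blob):
        # TPM_unseal: refuse unless PCRn currently matches the sealed-to value.
        n, pcr_value, ct = blob
        if self.pcrs[n] != pcr_value:
            raise ValueError("PCR mismatch: different software is running")
        k = hashlib.sha256(self._key + bytes([n]) + pcr_value).digest()
        return bytes(b ^ k[i % 32] for i, b in enumerate(ct))

The property to notice is that extend is one-way and order-sensitive, so a single PCR value summarizes an entire chain of measurements.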
How does the TPM keep track of what's running on the host CPU?
  When the CPU resets, it sends a special signal to the TPM to reset the PCRs.
    Signal cannot be spoofed by code running on the CPU.
    PCR values reset == CPU has just reset.
  After reset, the CPU starts running some initial bootstrapping code.
    This code will decide what to run next (the BIOS).
    But before running the BIOS, it calls: TPM_extend(n, ".. BIOS code ..")
    Now, the PCR value reflects a hash of the BIOS code running on the main CPU.
  Same sequence takes place for subsequent loading.
    BIOS calls TPM_extend() with the boot loader it reads from disk.
    Boot loader calls TPM_extend() with the kernel it reads from disk.
  Can incorporate configuration state as well.
    E.g., maybe it matters what options the boot loader passed to the kernel.

What can we infer if some PCRn corresponds to a particular chain of hashes?
  Intended: could be that the right software chain was loaded.
  Or some software along the way had a bug, was exploited, and the adversary
    issued their own extends from that point forward in the chain.
  Or the CPU did not start with the BIOS code in the first place.
  Or the TPM hardware did not reset synchronously with the CPU.
    [ Turned out to be "easy" on some motherboards: just short out a pin. ]

What does this allow us to do?
  Can prove to others over the network that you're running some software.
    Use TPM_quote() to get the TPM to sign a message on your behalf.
    Assumption: the remote party trusts your TPM (but not you directly).
      TPM has its own secret key; the HW manufacturer signs the public key and
      stores the certificate on the TPM.
  Can encrypt data in a way that's only accessible to specific software.
    Use TPM_seal, TPM_unseal.
    Sealed data can be decrypted only by the chosen recipient (PCR value).
    Each TPM has its own randomly-generated key for encryption.
  Assumption: the adversary does not tamper with the CPU, the TPM, or their link.

Late launch for TPMs.
  Awkward to measure everything from physical boot-up.
    Remote party doesn't know if the BIOS, kernel, etc. you are running are good.
  CPUs also support partially resetting the TPM without resetting the CPU.
    AMD instruction: SKINIT; Intel instruction: SENTER.
    Caller supplies a chunk of code to run after the partial reset.
    CPU stops other cores, blocks DMA from devices, etc.
    CPU tells the TPM to reset several special PCR registers (PCR17-22).
    CPU tells the TPM to extend one of these PCRs with the caller-supplied code.
    CPU jumps to the caller-supplied code.
  Important: the CPU must guarantee isolated execution of the initial code!
    If the adversary can tamper with that code's execution, TPM attestation is useless.
    For the initial boot, this holds assuming no malicious device firmware.
    For late launch, the CPU needs to disable other cores, disable DMA, ...
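Putting extend and quote together: here is a small standalone Python sketch of what a remote verifier does with a quoted PCR value (an illustration only; function names and measurement strings are made up).  The same recomputation works for a late-launch chain, except that the expected chain starts from the reset value of the special PCR and contains just the caller-supplied code.

import hashlib

def expected_pcr(measurements, initial=b"\x00" * 20):
    # Replay the extend chain an honest boot should have produced:
    # PCR = SHA1(PCR || m) for each measurement, in order.
    v = initial
    for m in measurements:
        v = hashlib.sha1(v + m).digest()
    return v

def check_quote(quoted_pcr, quoted_nonce, sent_nonce, known_good_chains):
    # Assumes the quote's signature was already verified against the TPM
    # public key certified by the manufacturer.
    if quoted_nonce != sent_nonce:
        return False                      # stale or replayed quote
    return any(quoted_pcr == expected_pcr(chain) for chain in known_good_chains)

# Honest chain: BIOS, then boot loader, then kernel (+ boot options).
good = [b"BIOS v1.2", b"boot loader v3", b"kernel 5.4 + boot options"]
bad  = [b"BIOS v1.2", b"boot loader v3", b"rootkit kernel"]
nonce = b"fresh-random-nonce"
assert check_quote(expected_pcr(good), nonce, nonce, [good])
# If an exploited boot loader extends a different kernel, every later PCR
# value diverges, so the check fails.
assert not check_quote(expected_pcr(bad), nonce, nonce, [good])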
Interesting way of initializing Cloud Terminal's "microvisor".
  Paper doesn't say explicitly how this works, so guessing based on their hints.
  User runs the Cloud Terminal software on their untrusted OS.
    Seems like it needs to run in the kernel to start with.
  Cloud Terminal allocates some memory that it will use.
  Cloud Terminal suspends all OS kernel threads except itself.
  Creates a virtual machine descriptor (for x86 hardware virtualization) that gives
    access to all physical memory except what it allocated for itself.
  Sets up the virtual machine descriptor to match the one remaining OS kernel thread.
  Jumps into the microvisor.
  Microvisor resumes the VM descriptor, effectively resuming the OS.
    But now the OS is running inside of the microvisor's VM.
  This happens each time the untrusted OS boots up.
    Avoids the need for the microvisor to know how to boot the machine, etc.

Few differences between running the OS on raw hardware vs. on the microvisor.
  No access to Cloud Terminal's memory.
  No direct access to keyboard/mouse input.
    Instead, input is emulated by the microvisor.
  Switched access to video output, controlled by the microvisor.
    Sometimes the OS gets direct access to video memory, sometimes it sees a shadow copy.

What does the microvisor do?
  Monitors keyboard input for the special key combination (Control+F12).
  When pressed, stops forwarding input to the OS, disconnects video from the OS.
  Runs the STT code instead.
  When the STT says it's done (e.g., the user pressed the key again), resumes the OS.
    Need to clear video memory, to make sure nothing leaks to the untrusted OS.
    Resume forwarding keyboard/mouse input.

How does the microvisor / STT interact with the network and file system?
  Need to send data to/from the CRE.
  Need to store a small amount of data persistently.
  Don't want network/disk/file system drivers in the microvisor.
  Solution: a helper process in the untrusted OS.
    Forwards messages between the microvisor and the untrusted-OS-managed network.
    OK because messages are encrypted and authenticated.
    Same for files (file contents had better be encrypted and authenticated).

How does the user know the microvisor got correctly installed the first time?
  Malicious OS could install a different microvisor!
  Verification service gets an attestation: Sig_UserTPMKey( H(microvisor code) || ... )
    Probably need to include something that binds the attestation to the current
      session, e.g., a public key that the microvisor just generated.
  Verification service sends a nonce (a random code) to the microvisor.
    Probably encrypted with the freshly-generated public key.
  Microvisor is supposed to show this nonce to the user.
  User calls the verification service, enters the nonce (and some part of the hash
    of their TPM public key, printed on the computer's case?).
  Verification service checks that it sent that nonce in response to an attestation
    from a TPM whose public key matches that hash.
  If so, tells the user "OK".

Why is this secure?  Rough argument:
  "OK" means that the code was sent encrypted with a public key appearing in a
    TPM attestation from the user's computer.
  A TPM attestation from the user's computer means that the user's computer was
    running the correct microvisor, with the corresponding private key.
  The correct microvisor would not disclose the private key to anyone, and would
    not disclose the decrypted code to anyone but the user.
  The verification service generated a random code that's hard to guess.
  So, the only way the user knew the code is if it was displayed to them by the
    legitimate microvisor running on their machine.

Would this be secure without entering some part of the TPM public key?
  No: the adversary can initialize Cloud Terminal on their own physical machine,
    and just relay the code to the victim's computer running the malicious OS.
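The paper doesn't spell out the exact messages, so the following Python sketch is a guess at the verification service's bookkeeping for the protocol above; the field names, the six-digit code, and the fingerprint format are made up, and the certificate, signature, and encryption checks are stubbed out as comments.  Note the TPM-key-fingerprint check, which is what the relay attack just described tries to get around.

import hashlib, secrets

KNOWN_GOOD_MICROVISOR_HASH = hashlib.sha256(b"microvisor build 1.0").hexdigest()

class VerificationService:
    def __init__(self):
        self.pending = {}       # code -> fingerprint of the attesting TPM's key

    def handle_attestation(self, att):
        # att: {"tpm_pubkey": ..., "microvisor_hash": ..., "session_pubkey": ...}
        # (1) Check the manufacturer's certificate on att["tpm_pubkey"] and the
        #     TPM's signature over (microvisor hash || session pubkey) -- stubbed.
        # (2) Check that the quoted hash is a known-good microvisor build.
        if att["microvisor_hash"] != KNOWN_GOOD_MICROVISOR_HASH:
            return None
        # (3) Pick a hard-to-guess code; remember which TPM it is tied to.
        code = f"{secrets.randbelow(10**6):06d}"
        fingerprint = hashlib.sha256(att["tpm_pubkey"]).hexdigest()[:8]
        self.pending[code] = fingerprint
        # (4) In the real protocol, return the code encrypted under
        #     att["session_pubkey"], so only the attested microvisor can show it.
        return code

    def handle_phone_call(self, code, fingerprint_from_case):
        # The user reads back the displayed code plus the TPM-key fingerprint
        # printed on their machine; both must match, and each code is usable once.
        return self.pending.pop(code, None) == fingerprint_from_case

svc = VerificationService()
att = {"tpm_pubkey": b"user's TPM public key",
       "microvisor_hash": KNOWN_GOOD_MICROVISOR_HASH,
       "session_pubkey": b"fresh microvisor key"}
code = svc.handle_attestation(att)
assert svc.handle_phone_call(code, hashlib.sha256(att["tpm_pubkey"]).hexdigest()[:8])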
"Cuckoo attack" from the paper's footnote 3. Paper proposes several defenses. One is to enter hash. (Somewhat awkward.) Another is to allow registering each TPM exactly once. (How to reinstall?) No great solution -- bootstrapping trust in hardware is hard! How does the user know they got the right microvisor on subsequent runs? "Reverse password": secret background image. First-time install: user remembers some image. The image gets TPM_seal()ed with microvisor's PCR value, and stored in the untrusted OS file system. On resume, microvisor tries to get this file and TPM_unseal() it. Only the correct microvisor would be able to TPM_unseal() it. User needs to watch for this secret background when launching STT. Similar to "SiteKey" used by banks, but stronger guarantees. SiteKey can be obtained by querying the bank's server. STT's image cannot be. Can an adversary guess this background image? Probably a relatively small number of choices (hundreds? thousands?) Adversary likely cannot try many times: user will notice something is wrong. Adversary may be able to try across users, and get lucky with a few. Compromised 1M machines, guess right 1/1000, trick 1000 users. What can the user do once they're securely running the STT? Pick an application from an "app store" managed by the "directory service". Kind-of like a CA for Cloud Terminal apps. Connect to that application's CRE. CRE sends nonce to the STT. STT sends back attestation containing STT's PCR value from the TPM, nonce, and hash of CRE's public key. Why do we need the nonce? Freshness. Why do we need the CRE's public key? Relay attack with a malicious CRE. Why does the STT send an attestation to the CRE? Seems like a weak story: few possible reasons, none super convincing. 1. Strong 2FA. But kind-of overkill, if it's just 2FA; don't need to involve TPM for 2FA. Could keep a private key in STT, used to authenticate connection to the SRE. 2. Prevent adversaries from connecting without using an STT. But adversary can install an STT on their own computer. Maybe raises the bar: harder to automate many attempts, etc? 3. Prevent user from logging in from an incorrect STT client? Assumes user didn't correctly install STT / check background image. But if client is controlled by adversary, user has already lost.. So this is only good for some kind of "not very malicious" incorrect STT. How would this work with many applications? Likely malicious apps would sneak in. Hard to tell which one you should launch. Phishing attacks could arise ("Bank of AAmerica" vs "Bank of America")? Is this a good design? Real and significant problem. Plausible solution, where one might assume it's just hopeless. Interesting use of the TPM. Not clear how well this would scale to many apps, in terms of security. Clunky UI, but necessary in this design. Maybe too much inconvenience for casual users, and for serious users they just carry two devices or run multiple VMs managed by their IT staff? Will see another take on the TPM next lecture: Intel's SGX.