Side-channel attacks and Spectre ================================ Side channel attacks: historically worried about EM signals leaking. [ Ref: http://cryptome.org/nsa-tempest.pdf ] Broadly, systems may need to worry about many unexpected ways in which information can be revealed. Many information leaks have been looked at to extract crypto keys How long it takes to decrypt. How decryption affects shared resources (cache, TLB, branch predictor). Emissions from the CPU itself (RF, audio, power consumption, etc). Many side-channel attacks are not crypto-related. Operation time relates to which character of password was incorrect (see Tenex below) Or time related to how many common friends you + some user have on Facebook. If attacker want to find out how many shared friends Or how long it takes to load a page in browser (depends if it was cached). If attacker wants to find out if someone looked a particular page Or recovering printed text based on sound from dot-matrix printer. [ Ref: https://www.usenix.org/conference/usenixsecurity10/acoustic-side-channel-attacks-printers ] Or network traffic timing / analysis attacks. Even when data is encrypted, its ciphertext size remains ~same as plaintext. Recent papers show can infer a lot about SSL/VPN traffic by sizes, timing. E.g., Fidelity lets customers manage stocks through an SSL web site. Web site displays some kind of pie chart image for each stock. User's browser requests images for all of the user's stocks. Adversary can enumerate all stock pie chart images, knows sizes. Can tell what stocks a user has, based on sizes of data transfers. Similar to CRIME attack mentioned in Paul Youn's lecture earlier this term. Or https://blog.appcanary.com/2016/encrypt-or-compress.html But attacks on passwords or keys are usually the most damaging. Famous password timing attack (Tenex operating system) Time page-faults for password guessing. Suppose the kernel provides a system call to check user's password. Checks the password one byte at a time, returns error when finds mismatch. (Tenex stored the password as plaintext.) Adversary aligns password, so that first byte is at the end of a page, rest of password is on next page. Somehow arrange for the second page to be swapped out to disk. Or just unmap the next page entirely (using equivalent of mmap). Measure time to return an error when guessing password. If it took a long time, kernel had to read in the second page from disk. [ Or, if unmapped, if crashed, then kernel tried to read second page. ] Means first character was right! Can guess an N-character password in 256*N tries, rather than 256^N. [ Ref: http://xeroxalto.computerhistory.org/_cd8_/pup/.plfsup.mac!2.html ] Not quite the original code, but similar: [chkpsw] calls [stcmp] to compare. What's Spectre's contribution? Spectre caused people to adjust their threat model Side channels were assumed away Now micro-architectural side channels are real $ grep . /sys/devices/system/cpu/vulnerabilities/* Exploit using speculation execution Standard processor feature to get high performance Speculative execution has observable side effects Brings values into cache, even on miss Breaks process isolation Attacker probes cache High bandwidth side channel Architectural side channel Not easy to address Several workarounds Danger: exploitable remotely through Javascript Processor features exploited Both features to obtain high performance Speculative execution if cond { ... } Processors guess branch and speculative execute instructions On today's processor it may be many instructions On failed speculation clean up processor state registers, etc. Caching Processors have several levels of caches Last level is shared On failed speculation, data stays in cache Three attack variants 1. Bound check bypass (Spectre) 2. Branch target injection (Spectre) 3. Rogue data cache load (Meltdown) 4. ... any other micro-architectural side channel ... Outline of exploit of variant 1 Kernel has a secret byte E.g., offset for Address Space Randomizaton Kernel has a code fragment ("gadget") as follows if (offset_from_attacker < sz) { // attacker arrange miss on sz // trains the branch predictor to speculatively executes this branch // arrange that array1 is in cache so that the following // instruction runs quickly and there is time to start next one v = array1[offset_from_attacker] // read secret byte // attacker cannot read v, but arranges array2 is not in the cache and uses v v1 = array2[v] } Once speculation fails, processor squashes internal processor state But doesn't remove array2[v] from cache External to the processor Attacker measures time to read of array2[0] array2[1] if one them is fast, then attacker knows value of v Concrete example: appendix A [show code, compile wo optimizations, and run] Illustrative Not a real attack Victim and attacker code in same address space Victim code Gadget is present Secret why unused1? unused2? (force size to be in different cache line from array1) Main function Try to read secret a byte at the time Using readMemoryByte malicious_x is the offset_from_attacker Uses cache measurements to read byte ReadMemoryByte Does 1,000 tries -- why? But breaks earlier Flush caches why 256? (a byte has 256 possible values) why 512? (prefetch?) Train branch predictor Why complicated way of computing x Invoke victim's gadget Time every possible value Count cache hits Bandwidth of side channel 10KB/s To make attack real: Find gadget in victim process Attacker process must be able to access array2 Evict sz in victim process Evict array2 in victim process Train predictor in victim process "Real" attack: steal secret from victim process on Windows Exploit the presence of DLL (dynamically-loaded library) Mapped in each process address space Which can be read and clflushed'd by any process Can even be written "Real" attack: steal secret from Linux kernel [Ref: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html] Exploit eBFP in kernel Two filters: 1. to train BP and find address of "array2" 2. to leak secret data Measure array2 from user space User space is mapped in kernel address space Discussion Attack breaks process isolation Serious attack! Attacker can be remote Attack can be performed from JavaScript Any micro-architecural state maybe a side channel Branch predictor, write-buffer, cache-coherence directory, ... Variant 2 Kernel code: if (off < sz) { v = array1[off] (*f)() // an f that read memory depending on v } Processor may speculative executes f important for object-oriented code Attacker chooses convenient f in kernel Like a return-to-libc attack Requires reverse engineering of branch predictor To train it to invoke the chosen f Variant 3 User-space code: mov rax, [some_kernel_address] and rax, 1 mov rbx, [rax+some_user_mode_address] First instruction is several micro operations check and load Processor execute instructions out of order first load, then check memory isn't committed, but address is in cache use flush+reload probe to measure if address is in cache Works because one address space with kernel and use Kernel-code is present in user process's page table Kernel code is mapped not readable by user code Mediation Variant 1. Linux kernel: Code analysis to avoid vulnerable code modify code to constraint index in gadget code https://lwn.net/Articles/752408/ include/linux/nospec.h For example: if (index < ARRAY_SIZE) // array_index_nospec() blocks speculation index = array_index_nospec(index, ARRAY_SIZE); return array[index]; } Browsers: avoid generating vulnerable code https://v8.dev/blog/spec Variant 2. Microcode update and kernel patches (e.g., retpoline) a return trampoline (https://support.google.com/faqs/answer/7625886) has an infinite loop that is never executed prevents the CPU from speculating on the target of an indirect jump slows down all indirect jumps but another plan (optpoline) resolves indirect jumps at runtime and replaces them with direct jumps https://lwn.net/Articles/774743/ Variant 3. Kernel page table isolation (KPTI) [Ref: https://en.wikipedia.org/wiki/Kernel_page-table_isolation] Developed before Meltdown was known Two page tables: one for kernel and one for user Harden browser No precise timing info Other observable side effects of speculative execution Power consumption Bus traffic ... Other types of timing attacks. Cache analysis attacks: processor's cache shared by all processes. E.g.: accessing one of the sliding-window multiples brings it in cache. Necessarily evicts something else in the cache. Malicious process could fill cache with large array, watch what's evicted. Guess parts of exponent (d) based on offsets being evicted. Cache attacks are potentially problematic with "mobile code". NaCl modules, Javascript, Flash, etc running on your desktop or phone Mobile code is untrusted, and may come from an attacker It is sandboxed, but it still may be able to measure side channels How to stop side channels? Remove side channels E.g., redesign processor to not commit to cache But every shared resource could be a side channel Reduce shared resources when processing secret data Process confidential data on a separate core With its own cache, etc. Requires refactoring of software --- not easy Measure channel and make them low bandwidth Hard to exploit References: https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html https://cyber.wtf/2017/07/28/negative-result-reading-kernel-memory-from-user-mode/ https://eprint.iacr.org/2013/448.pdf http://css.csail.mit.edu/6.858/2014/readings/ht-cache.pdf http://www.tau.ac.il/~tromer/papers/cache-joc-20090619.pdf http://www.tau.ac.il/~tromer/papers/handsoff-20140731.pdf http://www.cs.unc.edu/~reiter/papers/2012/CCS.pdf http://ed25519.cr.yp.to/ http://cseweb.ucsd.edu/classes/fa05/cse240a/bp.pdf Side-channel attacks on RSA [ Ref: https://crypto.stanford.edu/~dabo/papers/ssl-timing.pdf] Victim Apache HTTPS web server using OpenSSL, has private key in memory. Goal: extract private key The basic plan is like Tenex system. Guess a bit at the time. But operations on server are noisy and so attack more complicated. Connected to Stanford's campus network. Adversary controls some client machine on campus network. Adversary sends specially-constructed ciphertext in msg to server. Server decrypts ciphertext, finds garbage padding, returns an error. Client measures response time to get error message. Uses the response time to guess bits of q. Overall response time is on the order of 5 msec. Time difference between requests can be around 10 usec. What causes time variations? Optimizations to implement RSA efficiently Karatsuba vs normal; extra reductions. About 1M queries seem enough to obtain 512-bit p and q for 1024-bit key. p and q are the two primes that RSA uses to construct public and secret key Only need to guess the top 256 bits of p and q, then use another algorithm. Counter measures are needed (e.g., RSA blinding) How worried should we be about RSA timing attacks? Relatively tricky to develop an exploit (but that's a one-time problem). Possible to notice attack on server (many connection requests). Though maybe not so easy on a busy web server cluster? Adversary has to be close by, in terms of network. Not that big of a problem for adversary. Can average over more queries, co-locate nearby (Amazon EC2), run on a nearby bot or browser, etc. Adversary may need to know the version, optimization flags, etc of OpenSSL. Is it a good idea to rely on such a defense? How big of an impediment is this? If adversary mounts attack, effects are quite bad (key leaked).