Flume
=====

background: mandatory access control
    traditional OS threat model: protect users from each other
        file permissions: only nickolai can read files in nickolai's home dir
    what if the user was compromised, and had access to sensitive data?
        military example: cat attack-plans.txt | mail mahmoud@mail.ir
    Unix is sometimes called discretionary access control, or DAC
        if user has access to something, then protecting it
          (upholding policy) is at his discretion
    alternative: mandatory access control, or MAC
        even if user has "access" to something, cannot violate policy
        mandatory enforcement of security policies

military policies: classified data
    classification: unclassified, secret, top-secret
    categories: nuclear, crypto, ..
    policy: classified data should not be disclosed to anyone
      without the proper security clearance

how to enforce such a policy? (Bell-LaPadula model)
    assign every subject and object in the system a label: (c, s)
        c - classification level
        s - category set
    label describes the classification of data in the object
        intuitively, will help prevent disclosure of classified data
    (c1, s1) dominates (c2, s2) if:
        c1 >= c2                              [higher classification]
        s1 is a (non-strict) superset of s2
        intuition: (c1, s1) can see all data that (c2, s2) can
    labels and the "dominates" relation have a number of nice properties:
        transitive
            L1 dominates L2, L2 dominates L3 => L1 dominates L3
        lattice
            can always find an upper bound of two labels
            how to compute upper bound for (c1,s1) and (c2,s2)?
                (max(c1,c2), s1 union s2)

    diagram: lattice and the dominates relation
                  (top-secret, {crypto, nuclear})
        (top-secret, {crypto})      (top-secret, {nuclear})
        (secret, {crypto})          (secret, {nuclear})
                          (secret, {})
                       (unclassified, {})

ok, so how does this help us enforce mandatory access control?
    standard access-control matrix: b: (S, O) -> {read, append, read+write}
    simple security ("ss-property"), akin to Unix protection:
        if b(S, O) is read or read+write, then L(S) must dominate L(O)
        => cannot observe documents that you are not authorized to see
    star security ("*-property"):
        if b(S, O1) is read or read+write, and
           b(S, O2) is append or read+write, then:
            L(O2) must dominate L(O1)
        => cannot copy secret data into unclassified files
    how to achieve the star property?
        analyzing the entire access-control matrix is cumbersome
        rely on the fact that dominates is transitive
        if b(S, O) is read, then L(S) must dominate L(O)    ("no-read-up")
        if b(S, O) is append, then L(O) must dominate L(S)  ("no-write-down")
        if b(S, O) is read+write, then L(S) = L(O)
        intuition: data only flows up in the lattice according to these rules
    annoying: can you write anything if you're authorized to see top-secret?
        refinement: subjects have current-label and max-label (clearance)
        current-label cannot exceed max-label
        current-label cannot be lowered

strawman: MAC for Unix
    label for every user, process, file
    at login, process gets lowest current-label and user's max-label
    current-label adjusted dynamically at runtime
        when process reads file, current-label set to LUB of file, process
    security measures:
        if process label exceeds max-label, kill process
        if process writes to a file whose label does not dominate
          the process's current-label, kill process

security hole: storage channels
    attacker starts with process with lowest current-label
    starts another process that will read secret file and send data back
    covert channels:
        process listing (ps)
        exit status
        amount of free disk space
        labels themselves can be a covert channel
            to communicate a byte, start 256 procs
            send secret message to the Nth process (which will raise label)

security hole: timing channels
    even if we avoid all storage channels, holes still abound
    CPU utilization: use 100% to send "1", use 0% to send "0"
    cache use, disk throughput, ..

how to avoid covert channels?
    covert channels caused by sharing
        shared process table, shared CPU, shared cache, shared disk, ..
    strictly partition resources between labels
    if sharing needed, re-allocate at coarse granularity
        re-allocation leaks data but hopefully not much
    only works well with a small number of labels
    no great solutions; various techniques to reduce covert channel bandwidth

how do you get data out?
    not specified in the model
    usually a "security officer" performs "declassification" (privilege)
    labels don't correspond to privilege themselves
        higher labels might be able to read more, but can write less
        lower labels can write more, but read less
    in general hard to know what's safe to declassify
        NSA: declassifying paper is generally safer: less hidden data

another variant of MAC: integrity (Biba model)
    concerned about the integrity of data: trojaned text editor
    symmetrically opposite of secrecy: "no-write-up, no-read-down"
    add an extra element to the label: (c, s, i)
    (c1, s1, i1) can-flow-to (c2, s2, i2) if:    [opposite of dominates]
        c1 <= c2                [flows to higher classification]
        s1 is a subset of s2    [flows to a superset of categories]
        i1 >= i2                [flows to at most as high integrity]

Flume's DIFC: decentralized information flow control
    no single "security officer" doing declassification or specifying policy
        anyone can create a new category (tag)
    each subject, object has two labels (S, I)
        only two classification levels
          (either tag is in the S/I set, or it's not)
    declassification can be done by anyone with the right capability
        third label for each subject (O)
        creator gets these capabilities at first, can grant to anyone
    safety rules (definitions 1, 2, 3)
        as long as each step maintains security, then the end result is secure
    why do we need both t- and t+?  can we just have t- for secrecy tags?
        t+ still interesting -- it controls "clearance"
        might not want arbitrary processes reading top-secret data

one-way vs two-way communication
    why are pipes two-way?
    what's an endpoint?
        S,I labels associated with a particular file descriptor
        useful when a process can take on many labels (i.e. has capabilities)

implicit vs explicit privileges
    when are privileges "exercised"?
        before: whenever a process sends/receives a message
        with endpoints: whenever assigning/changing endpoint label
    definition 4: exercising privileges to change endpoint labels
    definition 5: sending messages does not involve privileges

how is flume implemented?
    reference monitor (RM)
    3 types of processes: flume-oblivious, unconfined, and confined
    control socket opened to RM
    modified libc, "system call forwarding"
    flume-oblivious: does not talk to RM
    unconfined: talks to the RM, but can also do anything else
    confined: can talk ONLY to the RM

what's e_\bot for?
    a way to model channels outside of flume's control
    unconfined processes can talk to the network, send/recv arbitrary data
    e_\bot ensures the flume RM is "aware" of that fact

label checking schemes
    IPC: check-at-send
    files: check-at-open
    what's the difference?  can we use check-at-open for IPC?

why random IDs (tag IDs, process IDs, etc)?
    covert channel through ID values

why file server?
    need to perform label checks as opposed to Unix permission checks
    passes open FD back to client once checks are OK

what permission checks does the file server perform?
    standard S/I labels, plus write-protect set
    why not just use the integrity set for write access control?
        write-protect set requires any of the privileges
        integrity set requires all of the privileges
        integrity must also be lower for files down the directory tree,
          but might want more protected files than the parent directory?

why must secrecy increase/integrity decrease down the FS hierarchy?
    not strictly necessary -- an optimization
    when opening a file, need to have read access to entire path from /
    expensive to check explicitly, so instead just check file itself
      and this rule ensures file check is sufficient

does the file server need to track how clients use their FDs?
    open the FD in read-only mode for RO access, kernel will prevent writes
    actually, will it?  atime updated on read!
        others will be able to look at atime using fstat()
        maybe they mount the filesystem with -o noatime, or open(O_NOATIME)
    how does the RM know if you've closed the file, once it gives you the FD?
        in this impl, does not: assumes you have the FD forever (until exit)

system call filtering (figure 5)
    why is pipe allowed?
        doesn't this create a communication channel outside the RM?
    why is bind for network sockets forbidden?
    why is bind for Unix domain sockets OK?

why spawn and not fork/exec?
    need to tightly control communication
    fork/exec allows processes to inherit a pipe outside of RM's control
    would never be able to isolate that process

privilege persistence
    why doesn't Unix have this problem?
        (it does, but anything interesting runs as root and can change uid)
    what's their mechanism?
        if you have a capability, you can turn it into a string
        given the string, you can recover the capability at a later time

setlabel facility
    sort-of like setuid
    is this any more likely to be secure than Unix setuid?
        spawn offers a more controlled environment than fork/exec
        min. integrity label may provide some control over callers

how does FlumeWiki work?
    wikilaunch, wiki.py, pmgr.py
    what can wikilaunch do, if compromised?  wiki.py?  pmgr.py?

end-to-end integrity
    why do we need the filters?
    can we mount a "return-to-libc" kind of attack (no low-integrity code?)
        unlikely: flume tracks all inputs, not just code inputs
    are we convinced that a page was only influenced by vendor code?
        maybe not: influenced by request from browser (note: not user!)
        malicious plug-in could have generated a link to update the page
        user inadvertently clicked on it (or browser auto-executed code)
        browser doesn't do label tracking, so no integrity information

is FlumeWiki secure?
    prevented two subsequently-discovered bugs in wiki.py
    added wikilauncher and all Flume trusted code

would you use Flume?
    +: can provide guarantees for malicious code handling private data
    -: for some apps this threat model is too strong (hard to use)
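
sketch: Flume-style label checks
    The notes cite the safety rules only by number, so here is a rough
    Python sketch of what a check-at-send rule and a capability-gated label
    change might look like. The rules are stated as best recalled from the
    Flume paper (send requires the sender's secrecy to be a subset of the
    receiver's and the receiver's integrity to be a subset of the sender's;
    adding tag t needs t+, dropping it needs t-). Everything named here
    (Proc, plus, minus, safe_message, safe_label_change) is hypothetical and
    is not the Flume API. The sketch ignores endpoints and any globally
    shared capabilities.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """Illustrative stand-in for a Flume process (hypothetical, not the real API)."""
    S: frozenset = frozenset()      # secrecy label: set of tags
    I: frozenset = frozenset()      # integrity label: set of tags
    plus: frozenset = frozenset()   # tags t for which the process holds t+
    minus: frozenset = frozenset()  # tags t for which the process holds t-


def safe_label_change(old, new, plus, minus):
    """Assumed rule: every added tag needs t+, every dropped tag needs t-."""
    return (new - old) <= plus and (old - new) <= minus


def safe_message(p, q):
    """Assumed check-at-send rule with no privileges exercised:
    secrecy can only grow along the flow, integrity can only shrink."""
    return p.S <= q.S and q.I <= p.I


# Tags are plain strings here; the notes say real tag IDs are random.
editor = Proc(S=frozenset({"alice"}))             # has read alice's data
network = Proc()                                  # empty labels, like e_bot
declassifier = Proc(S=frozenset({"alice"}),
                    minus=frozenset({"alice"}))   # holds alice-

print(safe_message(editor, network))   # tainted process sending to the network
print(safe_message(network, editor))   # untainted data flowing to the editor
print(safe_label_change(editor.S, frozenset(), editor.plus, editor.minus))
print(safe_label_change(declassifier.S, frozenset(),
                        declassifier.plus, declassifier.minus))
```

    In this toy setup only the process holding alice- can drop the tag and
    then talk to the network, which is the "declassification can be done by
    anyone with the right capability" point. The t+ check plays the
    "clearance" role: a process without alice+ cannot raise its secrecy
    label to read alice's data.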