Koi
===

What problem is this paper trying to address?
  User location information is potentially sensitive.
  Applications may need access to the user's location information.
  An application could inappropriately disclose the user's location.
  How to build applications where this is less likely to happen?
  Can think of this as a privilege-separation story.
    Hard question: how to design interfaces between separated components?
  Note: base protection mechanisms in Android provide just access control.
    An application either has access to location, or does not have access.
    Not a particularly helpful mechanism for applications that need some access.

Many problems have the same shape / fall in this category.
  Applications often require some access to sensitive data or resources.
  Naive plan: directly give full access to the data or resource.
    Many things in Android are like this, though not all.
    Can lead to security problems: too many privileges.
  Often applications don't need to perform arbitrary operations.
  Careful interface / architecture design can reduce trust / limit app damage.
    One example: send an SMS message in Android, instead of full access to GSM.
    Maybe another: allow an app to dial the phone, but not to record/play audio.
  But relatively few good examples having to do with sensitive data.

How could Android developers build a more secure location-based app?
  Example from last lecture's paper.
  One application has a service that tracks locations for you and your friends.
    Requires access to precise location from the user's phone and from friends.
    Exchanges location information over the network with friends' devices.
    Probably some server on the Internet holds every user's current location.
  Another application might want to know when friends are nearby.
    Requests to receive intents, using a broadcast receiver.
    Service can send out intents when friends are nearby.
    Service could avoid disclosing the exact location of user or friends.
    This application doesn't need direct access to user's or friends' locations.
    Can enforce access control for broadcasts by labeling the broadcast intent.
  Two significant reasons why this approach might be insufficient:
    Other applications are restricted to existing kinds of matching (friends nearby).
      What if an application wants to find the nearest Starbucks or gas station or ...?
      Most interesting use case in the paper: driving directions.
    The server is fully trusted with everyone's location information.
      In fact, in the strawman above, clients get information about friend locations.
      Could fix this by having the server do the location matching.
      But this paper's goal is to avoid fully trusting a single server.

Koi's overall plan for addressing these problems:
  Introduce an interface for specifying richer queries for location data.
    Applications register location-based items.
    Applications issue queries to search for items based on location.
    Application's trigger gets invoked on a query match.
  Still have a service on the phone for collecting user's location data.
    But other users' locations need not be sent to other users' phones.
    And other applications don't need direct access to location information.
  Still have an online service, but split it into two components.
    Matcher and combiner.
    Run by some well-known entities, rather than by individual app developers.
      E.g., Google, Microsoft, etc. might be more trustworthy than random apps.
  Could be implemented on top of Android; no platform changes required.
  Figure 1(a) in the paper.

Threat model / goal:
  Application is non-malicious.
  Servers are "honest but curious" (HbC): do not violate the specified protocol.
  Servers do not collude with one another.
  Locations are not sensitive, but linkage to user identity is sensitive.

Why the matcher-combiner split design?
  A typical application today would use its own server.
  Relying on a third-party server increases the app's TCB: not so compelling.
  Splitting trust between two components slightly mitigates the risk.

Who would run this cloud server?
  Unclear deployment plan from the paper's description.
  One possibility: platform provider (e.g., Google for Android, Apple for iOS).
    Platform providers interested in getting more apps, improving security, ..
  Perhaps other privacy-oriented organizations might run combiner pieces.
    Paper suggests EFF, ACLU, ..

How deployable is something like this?
  Should be possible for each application to start using this in isolation.
  Just replace the application's own location-tracking servers with Koi's.
  No need to interact with other applications' data (like on private servers).

Why does Koi require the non-malicious application assumption?
  Application could register fine-grained points all over the map.
  Application trigger gets called for the specific map point where the user is.

What attacks does this threat model capture?
  Helps well-meaning application developers avoid storing sensitive data.
  Attacks on the application's online servers are mitigated.
  Attacks on application code are somewhat mitigated.
  Requires more effort for an adversary to compromise two servers instead of one.
  Not great for sensitive locations: know where the guards are in a building?
    In some situations, the actual location matters, not the linkage.

Why is this threat model reasonable?
  Many applications are well-meaning, but may inadvertently leak data.
    Either mistakes in application code or compromises of the app's servers.
    Developers might want help in avoiding data disclosure, or want to
      reduce the cost of running secure servers.
  Users may be less trusting of apps that don't use Koi, if it's widely adopted.
  With Koi, an adversary has to compromise Koi servers, which might be better-run.
  HbC may make unilateral misbehavior by one server easier to detect.
    Active attack that changes the interaction vs. passive "curious" observation.
  Collusion: users may be able to choose matcher vs. combiner.
    Paper describes a plan for running multiple combiners.
    Users could choose a combiner they're more comfortable with trusting.
    (Paper suggests a combiner run by the EFF, etc.)

What does Koi's application interface look like?
  Figure 1(b) in the paper.
  Interface talks about two kinds of things: items and triggers.
  Each item has a piece of data associated with it.
    Could be user's email address, a photo, an IP address/port for the app, ..
  Each item also has attributes: can be anything (integer, string, location).
    E.g., { tourguide: True, loc: <my location>, ... }.
  For location attributes, can specify a symbolic "my location" value.
    Application doesn't know what it is, but the current location will be used.
    Auto-update: item attribute can be continuously updated w/ user's location.
  Triggers also have attributes, which get matched against existing items.
  When a match is found, the trigger gets run with the matching item's data.
  Logical table view:

    Content   Attribute
    -------   ---------
    Alice     tourguide:True
    Alice     loc:Austin
    Bob       tourguide:True
    Bob       loc:Boston
    Chuck     tourguide:True?
    Chuck     loc:lat/lon?
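A minimal sketch of how an application might use this interface, in Python;
the names here (KoiClient, register_item, register_trigger, MY_LOCATION) are
hypothetical illustrations, not the paper's actual API from Figure 1(b):

  # Symbolic location value: the Koi service on the phone substitutes the
  # real (auto-updated) location; the application never sees it.
  MY_LOCATION = "loc:self"

  class KoiClient:
      def register_item(self, data, attrs):
          """Register an item: opaque data plus matchable attributes."""
          ...

      def register_trigger(self, attrs, callback):
          """Register a query; callback runs with each matching item's data."""
          ...

  koi = KoiClient()

  # Alice advertises herself as a tour guide at her (auto-updated) location.
  koi.register_item(data="alice@example.com",
                    attrs={"tourguide": True, "loc": MY_LOCATION})

  # Chuck asks to hear about tour guides near wherever he currently is.
  koi.register_trigger(attrs={"tourguide": True, "loc": MY_LOCATION},
                       callback=lambda item_data: print("guide:", item_data))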
Servers are "honest but curious" (HbC): do not violate specified protocol. Servers do not collude with one another. Locations not sensitive, but linkage to user identity is sensitive. Why matcher-combiner split design? A typical application today would use its own server. Relying on a third-party server increases app's TCB: not so compelling. Splitting trust between two components slightly mitigates risk. Who would run this cloud server? Unclear deployment plan from paper's description. One possibility: platform provider (e.g., Google for Android, Apple for iOS). Platform providers interested in getting more apps, improving security, .. Perhaps other privacy-oriented organizations might run combiner pieces. Paper suggests EFF, ACLU, .. How deployable is something like this? Should be possible for each application to start using this in isolation. Just replace application's own location-tracking servers with Koi's. No need to interact with other applicatons' data (like on private servers). Why does Koi require non-malicious application assumption? Application could register fine-grained points all over the map. Application trigger gets called for the specific map point where user is. What attacks does this threat model capture? Helps well-meaning application developers avoid storing sensitive data. Attacks on the application's online servers are mitigated. Attacks on application code are somewhat mitigated. Requires more effort for adversary to compromise two servers instead of one. Not great for sensitive locations: know where guards are in a building? In some siatuations, actual location matters, not the linkage. Why is this threat model reasonable? Many applications are well-meaning, but may inadvertently leak data. Either mistakes in application code or compromises of app's servers. Developers might want help in avoiding data disclosure, or reduce the cost of running secure servers. Users may be less trusting of apps that don't use Koi, if it's widely adopted. With Koi, adversary has to compromise Koi servers, which might be better-run. HbC may make unilateral misbehavior by one server easier to detect. Active attack that changes interaction vs. passive "curious" observation. Collusion: users may be able to choose matcher vs. combiner. Paper describes a plan for running multiple combiners. Users could choose a combiner they're more comfortable with trusting. (Paper suggests a combiner run by the EFF, etc.) What does Koi's application interface look like? Figure 1(b) in the paper. Interface talks about two kinds of things: items and triggers. Each item has a piece of data associated with it. Could be user's email address, a photo, an IP address/port for app, .. Each item also has attributes: can be anything (integer, string, location). E.g., { tourguide: True, loc: , ... }. For location attributes, can specify symbolic "my location" value. Application doesn't know what it is, but current location will be used. Auto-update: item attribute can be continuously updated w/ user's location. Triggers also have attributes, which get matched against existing items. When a match is found, trigger gets run with the matching item's data. Logical table view: Content Attribute Alice tourguide:True Alice loc:Austin Bob tourguide:True Bob loc:Boston Chuck tourguide:True? Chuck loc:lat/lon? How does Koi's matcher / combiner work? Don't want any single server to store the logical table view from above. 
How does Koi's network protocol work?
  Three sub-protocols: registration, matching, and combining.
  Uses simple public-key cryptography.
    Public-key avoids sharing a separate secret key between the servers and each user.
    Assume everyone knows the public keys for the matcher and combiner.
    With public-key crypto, a user can encrypt a message to a server, but not decrypt.

Registration protocol.
  User wants to add a new item/trigger to the table.
  Goal: add the item without revealing the relation between the new rows.
  User's device creates an item: Chuck's ID, { tourguide:True?, loc:xxx? }.
    Note: device knows the location, but the app just specified "loc:self".
  Ultimately, want matcher M to learn the attributes.
    Encrypt each attribute with M's key.
  Only want attributes revealed to the matcher once anonymized by the combiner.
    Thus, encrypt the ciphertext again with C's key.
  But sending this to the combiner will directly leak our identity!
    So, send the whole message to the matcher first.
    { Chuck's ID, Enc_C [ { Enc_M[attr], Enc_M[attr], .. } ] }
  Matcher creates an entry in the Content/RegID table with a fresh RegID.
    Sends the whole blob to the combiner with the new RegID.
  Combiner decrypts one layer, chooses fresh AttrIDs for each attribute.
    Sends the attrs to the matcher (still encrypted with M's key) with the new AttrIDs.
  [ Can visualize as adding table entries left-to-right in the split table. ]
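A minimal sketch of the registration message's layered encryption, assuming
hypothetical helper names.  Fernet (symmetric) stands in here for the
public-key encryption the protocol actually uses; only the layering, and who
can strip which layer, are the point:

  from cryptography.fernet import Fernet

  enc_m = Fernet(Fernet.generate_key())  # stand-in for matcher M's key
  enc_c = Fernet(Fernet.generate_key())  # stand-in for combiner C's key

  # User's device: encrypt each attribute for M, wrap the list for C, and
  # send the result with the user's ID to the matcher.
  attrs = [b"tourguide:True?", b"loc:42.36/-71.09?"]
  inner = [enc_m.encrypt(a) for a in attrs]           # { Enc_M[attr], .. }
  blob = enc_c.encrypt(b"|".join(inner))              # Enc_C[ { Enc_M[attr], .. } ]
  registration = {"content": "Chuck", "blob": blob}   # -> matcher

  # Matcher: records (Content, fresh RegID) and forwards the blob to the
  # combiner.  It can't see the attributes: they're still under C's layer.
  to_combiner = {"reg_id": 3, "blob": registration["blob"]}

  # Combiner: strips its layer and assigns fresh AttrIDs.  It never sees who
  # registered, nor the attribute plaintext (still under M's key).
  inner_again = enc_c.decrypt(to_combiner["blob"]).split(b"|")
  to_matcher = dict(zip(["E", "F"], inner_again))     # AttrID -> Enc_M[attr]

  # Matcher: finally decrypts the attributes, now carrying AttrIDs rather
  # than a RegID (though, as discussed next, timing can still link them).
  table = {aid: enc_m.decrypt(ct) for aid, ct in to_matcher.items()}
  print(table)  # {'E': b'tourguide:True?', 'F': b'loc:42.36/-71.09?'}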
How well does the registration protocol protect linkage privacy?
  Could infer linkage between attributes based on message order/timing.
  Paper claims the design relies on "cover traffic".
    May need to generate quite a bit of cover traffic to mask linkage.
  How could this be fixed?
    Maybe split the matcher into two pieces, based on the two tables it has.
    Still susceptible to analysis of attributes registered at the same time.
    But not susceptible to matching nearby Content and Attribute registrations.

Matching protocol.
  Follows the registration protocol, once a trigger is present.
  Matcher looks for matches to individual attributes.
    E.g., what attributes match tourguide:True?
    E.g., what attributes match loc:xxx?
  Then the AttrIDs of the matching attributes are sent to the combiner.
  How does the matcher know what attributes to match at any one point?
    Combiner initiates the matching protocol.
    Picks a particular trigger that should be refreshed.
    Looks up the AttrIDs for the query attributes for that trigger.
    Sends those AttrIDs to the matcher.
  Is this secure?
    Again, susceptible to traffic analysis attacks.  Requires cover traffic.
    Over time, may be able to perform intersections to narrow down the linkage.

Combining protocol.
  Matcher found the AttrIDs matching each part of the trigger query.
  Combiner translates these matched AttrIDs into RegIDs.
    Then takes the intersection of these RegIDs to find matching items.
  Almost done: we know the set of matching items.
    But the combiner doesn't know the identity of the trigger's owner.
    Solution: send the matching RegIDs (and the trigger's RegID) to the matcher.
  Matcher looks up the item data for each matching RegID.
  Matcher looks up the item data (callback info) for the trigger's RegID.
  Matcher sends the item data to the user's device for the callback.

How secure is the combining protocol?
  Again, traffic analysis.
  But another problem shows up: the matcher learns the plaintext trigger + match IDs.
  Paper proposes using commutative encryption to address this problem.
    Suppose c = E_ka(E_kb(x)).
    Traditional encryption requires decryption in the opposite order of encryption.
      I.e., x = D_kb(D_ka(c)).
    With commutative encryption, can decrypt in any order.
      I.e., x = D_ka(D_kb(c)).
    Cute construction: Section 5.4.
  Why does commutative encryption help?
    Koi must take data from one user (item) and send it to another user (trigger).
    Don't want to require users to share keys a priori, or know who will match.
    Thus, even if the data was originally encrypted with one key when registered,
      it must get encrypted w/ a different key when it's sent to the trigger.
    With traditional encryption, must go down to plaintext to re-encrypt.
    With commutative encryption, can "re-order" layers: data is never plaintext.
    As before, trust each party (matcher, combiner) to be non-colluding.
      Cannot unilaterally decrypt the data.
      But if one fails to do the right encryption, someone else can now get the data.
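A worked example of the commutative property, using the classic SRA-style
construction (E_k(x) = x^k mod p for a shared prime p); this is illustrative
and may differ in details from the paper's Section 5.4 construction:

  import math, random

  P = 2**127 - 1  # a Mersenne prime, shared by all parties

  def keygen():
      # The exponent must be coprime to P-1 so a decryption inverse exists.
      while True:
          k = random.randrange(3, P - 1)
          if math.gcd(k, P - 1) == 1:
              return k

  def encrypt(k, x):
      return pow(x, k, P)

  def decrypt(k, c):
      # Decryption exponent: the inverse of k modulo P-1.
      return pow(c, pow(k, -1, P - 1), P)

  ka, kb = keygen(), keygen()
  x = 123456789

  c = encrypt(ka, encrypt(kb, x))          # c = E_ka(E_kb(x))
  assert decrypt(kb, decrypt(ka, c)) == x  # traditional order: D_kb(D_ka(c))
  assert decrypt(ka, decrypt(kb, c)) == x  # commutative order: D_ka(D_kb(c))

Note that this textbook construction is deterministic (equal plaintexts give
equal ciphertexts), which is exactly the sort of linkage leak the ProVerif
discussion below brings up; a real protocol would need randomized padding.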
Multiple combiners.  Section 5.4.
  Some minor changes required:
    User needs to indicate their choice of combiner.
    Combiner-selected IDs need to be named using tuples, so the matcher can
      tell apart two identical IDs from different combiners.
  Weakens traffic-analysis resistance -- less cover traffic!

What are the performance characteristics of Koi?
  End-user requirements are pretty similar.
    Slightly more CPU to encrypt/decrypt, perhaps.
    Similar bandwidth requirements to talk to the server.
  Combiner/matcher need some more CPU to decrypt messages.
  Combiner/matcher need significant bandwidth for intermediate result sets.

What does it mean that they proved Koi's network protocol secure?
  What matters is how individual nodes respond to messages.
    Only individual nodes have cryptographic keys of interest to the system.
    If a given message comes in, what messages get generated as a result?
  ProVerif tries all possible operations on messages, etc.
    Tries to prove that no operations can violate a particular property.
    E.g., prove non-linkability: messages connected by being in the same packet.
  Limitation: does not take into account timing proximity, implementation, etc.
  Limitation: assumes the ProVerif specification is correct.
  Example of what the analysis could find: encrypting deterministically.
    Why does deterministic encryption violate their security?
    (Equal plaintexts produce equal ciphertexts, so repeated attributes
      become linkable.)
  What can one conclude from this?
    Less likely to find protocol-level errors like replay, reflection, etc.

What location-based applications can Koi support?
  Social network / friends nearby.
  Driving directions.  (Surprising, since not originally in the motivating example.)
    Matcher implements the application logic (e.g., driving directions).
    Communicate over attributes: put current location / driving target in attrs.
    Matcher puts the "response" (e.g., where to turn) in the trigger response data.
    Combiner figures out which user should get which trigger response.
  How about plotting things on a map?
    Might be OK with a trusted "mapper" component.
  What about offline applications?
    The Koi service could cache whatever data the user wants to access offline.

What location-based applications are not supported by Koi?

Do you think Koi might get deployed in practice?
  Requires modifications to existing apps: can't "auto-Koi-ify" an application.
  Higher-level APIs for developing location-based applications seem useful.
  Having platform-provided servers for tracking locations seems useful.
  Non-collusion seems less critical but potentially nice.
  Protocol seems inefficient; requires significant matcher-combiner bandwidth.
  Users already fully trust the platform provider (e.g., Android, Apple, MSFT).
    Might be OK with trusting Google/Apple/Microsoft servers.
  There's quite a bit of value in linking user information: ads, etc.

Other candidates for protecting sensitive data in applications?

Can Koi deal with malicious applications?
  One idea: rate-limiting.
    Could limit how often applications create new items in Koi.
    Would this work well?  Could Koi provide guarantees?  Hard..
  A part of the problem: requires strong identity.
    Common problem for many systems that give out resources.
    Resources could be some storage space, network bandwidth, private data, ..

Can Koi be fixed to deal with colluding (or malicious) servers?
  Seems strictly harder than removing the non-malicious app assumption.
    Our ideas for fixing the non-malicious app assumption rely on servers.
  Some cryptographic techniques can help (e.g., PrivStats in the references),
    but only for limited kinds of computation (location-based aggregates).

Other ways to preserve privacy using richer platform features?
  Differential privacy: stronger guarantees for noisy aggregates.
  Access patterns: PIR, ORAM.  (Slightly different problem.)

References:
  http://nms.csail.mit.edu/projects/privacy/privstats-ccs.pdf
  http://www.cis.upenn.edu/~ahae/papers/fuzz-sec2011.pdf