Cloud computing aims to revolutionise traditional ways of service delivery. Security considerations, however, are a practical obstacle to its adoption. Cloud providers consolidate data from multiple services, which may result in widespread data disclosure when their security is compromised. While cloud tenants can be isolated through virtualisation, virtual networks and compartmentalised storage, the implementations of these features may themselves contain vulnerabilities that breach data confinement. The problem of enforcing confinement of sensitive data is even more challenging in federated clouds, i.e. when a cloud provider uses another provider for some of its services. This is common in a Software-as-a-Service (SaaS) model, in which a provider offers a high-level service that can be reused by other providers. For example, the Dropbox file synchronisation service internally uses Amazon's S3 storage cloud. This means that sensitive client data may flow to various cloud providers without client control, e.g. when Dropbox decides to change its storage provider. At the same time, both clients and cloud providers have an incentive to control the propagation of sensitive data. Clients are often legally responsible for data protection, and cloud providers want to avoid hosting sensitive data so that they are not exposed to liability claims after security incidents.
The CloudFilter project aims to explore novel methods for exercising control over sensitive data propagation across multiple cloud providers. The expected outcome is a practical solution that allows clients and cloud providers to control the sensitivity of data that is transferred across their systems and to prevent user actions that would violate data dissemination policies. For such a solution to be practical, it must be compatible with today's applications and cloud platforms and integrate with current approaches to authentication and access control.
Our key idea is to provide application-level proxies that transparently monitor data propagation from clients to cloud providers and between cloud providers. These proxies employ a data labelling scheme inspired by decentralised information flow control (DIFC) models, in which security classes express the sensitivity of transferred data. In contrast to mandatory access control models, DIFC enables the decentralised creation and management of security classes at runtime, which is needed in federated cloud environments. When data crosses a domain boundary, labels are attached to it automatically based on data dissemination policies. Proxies verify labels against domain policies to detect and prevent unauthorised data propagation.
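The label check performed by such a proxy can be illustrated with a minimal sketch. It assumes that a label is a set of tags and that a domain's policy is the set of tags it is cleared to receive, so that data may flow only if its label is a subset of the destination's clearance. The names Domain, LabelledData, attach_label, may_flow and forward are hypothetical and chosen for illustration; they are not the CloudFilter interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A provider or client domain (hypothetical model)."""
    name: str
    clearance: frozenset  # tags this domain is permitted to receive


@dataclass(frozen=True)
class LabelledData:
    payload: bytes
    label: frozenset  # tags expressing the sensitivity of the payload


def attach_label(payload: bytes, origin: str, policy: dict) -> LabelledData:
    """Label data automatically as it leaves its origin domain.

    `policy` maps an origin name to the tags its data must carry.
    An unknown origin raises KeyError rather than yielding unlabelled data.
    """
    return LabelledData(payload, frozenset(policy[origin]))


def may_flow(data: LabelledData, destination: Domain) -> bool:
    """Assumed flow rule: every tag on the data must be covered by the clearance."""
    return data.label <= destination.clearance


def forward(data: LabelledData, destination: Domain) -> bytes:
    """Proxy step: release the payload only if the flow is permitted."""
    if not may_flow(data, destination):
        raise PermissionError(
            f"flow to {destination.name} blocked for tags "
            f"{sorted(data.label - destination.clearance)}"
        )
    return data.payload
```

Because tags are plain values rather than entries in a fixed lattice, a domain can introduce a new tag at runtime simply by adding it to its dissemination policy, which is the property that distinguishes this style of labelling from a centrally administered scheme.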
Web applications lead to a loss of control over sensitive data, which is a major concern for enterprises. Typically, untrusted externally-hosted applications, such as Google Docs, and trusted internally-hosted ones, such as content management systems, are accessed side-by-side in web browsers. This makes it easy for employees to share text between them, accidentally violating corporate restrictions on data propagation. There is a need to help employees comply with data propagation policies by alerting them about data disclosure to untrusted web applications. A challenge is to identify sensitive data without burdening employees or requiring modifications to web applications.
We developed BrowserFlow, a browser plug-in that tracks the propagation of text data across web applications. BrowserFlow associates new paragraphs in a document with security tags that denote their sensitivity based on the application context in which they were created. When a paragraph appears in another application, BrowserFlow associates tags with it based on text similarity, thus automatically inferring the security requirements of the text. It uses the assigned tags to decide whether the text is permitted to flow to a given application. If not, BrowserFlow alerts the user and transparently encrypts the text before permitting its upload to an untrusted server. Our experiments show that BrowserFlow can robustly track sensitive text across updates and manage tags for many documents with little impact on browser performance.
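The idea of inferring tags from text similarity can be sketched as follows. The description above does not specify the similarity measure, so this sketch assumes Jaccard similarity over three-word shingles with a threshold of 0.5; both are illustrative choices, and the names TagStore, infer_tags and may_upload are hypothetical rather than part of BrowserFlow.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word windows in the text (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two shingle sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class TagStore:
    """Remembers tagged paragraphs and infers tags for similar text."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # assumed cut-off, not taken from the source
        self._entries = []          # list of (shingle set, tags)

    def register(self, paragraph: str, tags) -> None:
        """Record a paragraph with the tags of the application it was created in."""
        self._entries.append((shingles(paragraph), frozenset(tags)))

    def infer_tags(self, paragraph: str) -> frozenset:
        """Union of the tags of all known paragraphs that are similar enough."""
        probe = shingles(paragraph)
        tags = set()
        for known, known_tags in self._entries:
            if jaccard(probe, known) >= self.threshold:
                tags |= known_tags
        return frozenset(tags)


def may_upload(tags: frozenset, allowed_tags: frozenset) -> bool:
    """Text may flow to an application only if all its tags are allowed there."""
    return tags <= allowed_tags
```

A paragraph that is lightly edited after being copied still shares most of its shingles with the original and therefore inherits its tags, whereas unrelated text inherits none. When may_upload returns False, the plug-in would alert the user and encrypt the text before upload, as described above.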