With the advent of data-intensive applications that generate large volumes of real-time data, distributed stream processing systems (DSPS) become increasingly important in domains such as social networking and web analytics. In practice, DSPSs must handle highly variable workloads caused by unpredictable changes in stream rates. Cloud computing offers an elastic infrastructure that DSPSs can use to obtain resources on-demand, but an open problem is to decide on the correct resource allocation when deploying DSPSs in the cloud.
This paper proposes an adaptive approach for provisioning virtual machines (VMs) for the use of a DSPS in the cloud. We initially perform a set of benchmarks across performance metrics such as network latency and jitter to explore the feasibility of cloud-based DSPS deployments. Based on these results, we propose an algorithm for VM provisioning for DSPSs that reacts to changes in the stream workload. Through a prototype implementation on Amazon EC2, we show that our approach can achieve low-latency stream processing when VMs are not overloaded, while adjusting resources dynamically with workload changes.