Neptune: Scheduling Suspendable Tasks for Unified Stream/Batch Applications | LSDS - Large-Scale Data & Systems Group, Imperial College London

Distributed dataflow systems allow users to express a wide range of computations, including batch, streaming, and machine learning. A recent trend is to unify different computation types as part of a single stream/batch application that combines latency-sensitive ("stream") and latency-tolerant ("batch") jobs. Because these jobs belong to one application, they can share state and logic, which simplifies application development. Examples include machine learning applications that perform batch training and low-latency inference, and data analytics applications that combine batch data transformations with low-latency querying. Existing execution engines, however, were not designed for unified stream/batch applications. As we show, they fail to schedule and execute such applications efficiently while respecting their diverse requirements.

We present Neptune, an execution framework for stream/batch applications that dynamically prioritizes tasks to achieve low latency for stream jobs. Neptune employs coroutines as a lightweight mechanism for suspending tasks without losing task progress. It couples this fine-grained control over CPU resources with a locality- and memory-aware (LMA) scheduling policy to determine which tasks to suspend and when, thereby sharing executors among heterogeneous jobs. We evaluate our open-source Spark-based implementation of Neptune on a 75-node Azure cluster. Neptune achieves up to 3x lower end-to-end processing latencies for latency-sensitive jobs of a stream/batch application, while minimally impacting the throughput of batch jobs.
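The idea of suspending a task without losing its progress can be illustrated with a small sketch. This is not Neptune's code (Neptune is Spark-based) and it does not implement the LMA policy. It only shows, under simplifying assumptions, how coroutines let a scheduler preempt a batch task in favour of a stream task and later resume it where it left off. The names `record_task`, `Executor`, `STREAM`, and `BATCH` are hypothetical. Python generators stand in for coroutines, each task yields after every record, and a single-slot executor always resumes the highest-priority runnable task. Locality and memory awareness are omitted.

```python
import heapq
import itertools
from typing import Generator, List, Tuple

# Hypothetical priority levels: a lower number means a higher priority.
STREAM, BATCH = 0, 1


def record_task(name: str, records: int, log: List[str]) -> Generator[None, None, None]:
    """Process `records` items one at a time.

    Each `yield` is a suspension point. The generator keeps its local state
    (here the loop counter `i`) while suspended, so resuming it continues
    from the next record instead of restarting the task.
    """
    for i in range(records):
        log.append(f"{name}:{i}")
        yield


class Executor:
    """Single-slot executor that always runs the highest-priority runnable task."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, str, Generator[None, None, None]]] = []
        self._seq = itertools.count()  # FIFO tie-break within a priority level

    def submit(self, priority: int, name: str, task: Generator[None, None, None]) -> None:
        heapq.heappush(self._queue, (priority, next(self._seq), name, task))

    def step(self) -> bool:
        """Run one slice of the highest-priority task; return False once no tasks remain."""
        if not self._queue:
            return False
        priority, seq, name, task = heapq.heappop(self._queue)
        try:
            next(task)  # resume the coroutine until its next suspension point
        except StopIteration:
            return True  # task finished; do not re-queue it
        # Re-queue with the original sequence number so the task keeps its place.
        heapq.heappush(self._queue, (priority, seq, name, task))
        return True


log: List[str] = []
ex = Executor()
ex.submit(BATCH, "batch", record_task("batch", 4, log))
ex.step()
ex.step()  # the batch task has processed records 0 and 1
ex.submit(STREAM, "stream", record_task("stream", 2, log))  # a stream task arrives
while ex.step():
    pass
print(log)
# ['batch:0', 'batch:1', 'stream:0', 'stream:1', 'batch:2', 'batch:3']
```

The batch task is suspended as soon as the stream task arrives and resumes at record 2 afterwards, so no batch work is repeated. In a real engine, the choice of which task to suspend would come from a richer policy, such as the locality- and memory-aware one described above, rather than from a fixed two-level priority.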
