Modern data management systems increasingly abandon monolithic architectures in favor of compositions of specialized components. Storage layers such as Parquet and Arrow are combined with kernels such as Velox and RocksDB, optimizers such as Apache Calcite or Orca, and other specialized components to build systems optimized for a specific domain, execution environment, or even application. Unfortunately, the architecture of data management systems and the interfaces between their components are the same as they were 30 years ago: highly efficient but rigid. This rigidity obstructs the adoption of novel ideas and techniques such as hardware acceleration, adaptive processing, learned optimization, and serverless execution in real-world systems.
BOSS addresses this impasse through a novel approach to data management system composition, inspired by two principles from compiler-construction research: a homoiconic representation of data and code, and the partial evaluation of queries by components. BOSS achieves a fully composable design that effectively combines different data models, hardware platforms, and processing engines. It implements features such as GPU acceleration of relational queries and generative data cleaning without measurable overhead compared to a monolithic design.
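The following Python sketch illustrates, under our own assumptions, how the two principles can fit together. It is not BOSS's actual representation or API: the tuple-based expression format and the names `rewrite`, `arithmetic_engine`, `relational_engine`, and `evaluate` are hypothetical. Tables and queries share one representation (homoiconicity), and each component evaluates only the sub-expressions it understands. It returns the rest unchanged as a residual expression for the next component (partial evaluation).

```python
from typing import Any, Callable, List, Optional

# Hypothetical homoiconic form: an expression is an atom (int, str, ...) or a
# tuple (head, *args). Data and code share it: a table is ("Table", ("Row", ...), ...)
# and a query is ("Select", table, predicate).
Expr = Any
Component = Callable[[Expr], Expr]

def is_call(e: Expr, head: Optional[str] = None) -> bool:
    return isinstance(e, tuple) and len(e) > 0 and (head is None or e[0] == head)

def rewrite(e: Expr, rule: Component) -> Expr:
    """Bottom-up traversal: apply `rule` to each sub-expression after its children."""
    if is_call(e):
        e = (e[0],) + tuple(rewrite(a, rule) for a in e[1:])
    return rule(e)

def arithmetic_engine(e: Expr) -> Expr:
    """Hypothetical component: folds Plus over integer literals, leaves all else as-is."""
    if is_call(e, "Plus") and all(isinstance(a, int) for a in e[1:]):
        return sum(e[1:])
    return e

def relational_engine(e: Expr) -> Expr:
    """Hypothetical component: evaluates ("Select", table, ("Greater", col, bound))
    only once `col` and `bound` are integer literals; otherwise returns e unchanged."""
    if (is_call(e, "Select") and len(e) == 3 and is_call(e[1], "Table")
            and is_call(e[2], "Greater") and len(e[2]) == 3):
        _, col, bound = e[2]
        if isinstance(col, int) and isinstance(bound, int):
            rows = [r for r in e[1][1:] if r[1 + col] > bound]  # r[0] is "Row"
            return ("Table", *rows)
    return e

def evaluate(query: Expr, components: List[Component]) -> Expr:
    """Each component partially evaluates the query and passes on the residual."""
    for component in components:
        query = rewrite(query, component)
    return query

table = ("Table", ("Row", 1, 10), ("Row", 2, 20), ("Row", 3, 30))
query = ("Select", table, ("Greater", 1, ("Plus", 5, 10)))

# The relational engine alone cannot evaluate the predicate; the query is unchanged.
print(evaluate(query, [relational_engine]) == query)  # True
# Folding the arithmetic first lets the relational engine finish the query.
print(evaluate(query, [arithmetic_engine, relational_engine]))
# ('Table', ('Row', 2, 20), ('Row', 3, 30))
```

Because every component consumes and produces the same expression form, components can be reordered, added, or swapped without changing their interfaces. In this sketch, running the relational engine before the arithmetic engine simply leaves a residual `Select` whose predicate bound has been folded to `15`.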