This paper explores the challenges that modern chiplet-based CPU architectures pose for sorting algorithms, a fundamental component of many computer science applications. We show how the heterogeneity introduced by chiplet-based processors, including non-uniform access times to partitioned L3 caches, varying inter-core latencies, and differing bandwidths, can lead to suboptimal performance when traditional sorting algorithms assume uniform memory access and consistent processor performance.
To address these issues, we propose a set of chiplet-aware optimizations that improve the efficiency of memory-intensive sorting algorithms on these architectures. Our approach comprises four key strategies: (1) partitioning input data at chiplet granularity to minimize inter-chiplet communication and balance the computational load, (2) extending the memory-hierarchy phase of the algorithms to account for distinct L3 cache partitions, (3) scheduling tasks based on data size relative to local and combined L3 cache capacities, and (4) avoiding expensive data shuffling. We provide a comprehensive analysis of chiplet architectures and detail chiplet-aware implementations of LSB radix sort and comparison sort. Our evaluation shows that chiplet-conscious sorting algorithms can improve performance by up to 4.5× compared to NUMA-aware approaches.
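To make strategies (1) and (3) more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hypothetical `ChipletTopology` description in which every chiplet owns an equal, private L3 partition. `partition_by_chiplet` splits the input into contiguous, near-equal ranges, one per chiplet. `choose_schedule` picks a hypothetical placement policy by comparing the data size with a single chiplet's L3 and with the combined L3 of all chiplets. The policy names and thresholds are illustrative assumptions only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ChipletTopology:
    # Hypothetical topology description; the field values are illustrative assumptions.
    num_chiplets: int
    cores_per_chiplet: int
    l3_bytes_per_chiplet: int  # size of each chiplet's private L3 partition

    @property
    def combined_l3_bytes(self) -> int:
        # Total L3 capacity across all chiplets.
        return self.num_chiplets * self.l3_bytes_per_chiplet


def partition_by_chiplet(n: int, topo: ChipletTopology) -> list[range]:
    """Split n keys into contiguous, near-equal index ranges, one per chiplet.

    Each chiplet then works on its own range, which keeps the load balanced and
    limits traffic between chiplets.
    """
    base, extra = divmod(n, topo.num_chiplets)
    ranges, start = [], 0
    for c in range(topo.num_chiplets):
        size = base + (1 if c < extra else 0)  # spread the remainder over the first chiplets
        ranges.append(range(start, start + size))
        start += size
    return ranges


def choose_schedule(num_bytes: int, topo: ChipletTopology) -> str:
    """Pick a hypothetical placement policy from data size and L3 capacities."""
    if num_bytes <= topo.l3_bytes_per_chiplet:
        # Fits in one chiplet's L3: keep all work local to one chiplet.
        return "single-chiplet"
    if num_bytes <= topo.combined_l3_bytes:
        # Fits in the combined L3: spread it so each partition stays in its local L3.
        return "all-chiplets-in-cache"
    # Exceeds the combined L3: the sort is bound by main-memory bandwidth.
    return "all-chiplets-out-of-cache"
```

As a usage example, `partition_by_chiplet(10, ChipletTopology(4, 8, 32 * 2**20))` yields ranges of sizes 3, 3, 2 and 2. With that same topology, a 100 MiB input falls between the per-chiplet L3 (32 MiB) and the combined L3 (128 MiB), so it is classified as `"all-chiplets-in-cache"`.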