
Predicate pushdown for data science pipelines
Cong Yan, Microsoft Research Redmond
Predicate pushdown is a widely adopted query optimization. Existing systems and prior work mostly use pattern-matching rules to decide when a predicate can be pushed down. However, data science pipelines are hard to optimize this way because they rely heavily on non-relational operators and user-defined functions (UDFs) that existing rules fail to cover. In this talk, I'll present MagicPush, which pushes down predicates using a search-verification approach. The main contribution lies in verifying pushdown correctness for a wide range of UDFs. Our evaluation shows that MagicPush discovers many more pushdown opportunities than prior rules on real-world data science pipelines.
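To make the idea concrete, here is a minimal sketch of a pushdown candidate being checked against the original pipeline. It is an illustration only and not MagicPush's algorithm: the row format, the function names (`add_total`, `add_running_qty`, `is_west`), and the sample data are all hypothetical, and comparing outputs on one sample input is a test, not the correctness verification the abstract describes.

```python
# Toy pipeline over rows stored as plain dicts. All names are hypothetical.

def filter_rows(rows, pred):
    """Keep only the rows that satisfy the predicate."""
    return [r for r in rows if pred(r)]

def is_west(row):
    """The predicate we would like to push down."""
    return row["region"] == "west"

def add_total(rows):
    """Row-wise UDF: each output row depends only on its own input row."""
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]

def add_running_qty(rows):
    """Non-row-wise UDF: each output row depends on all earlier rows."""
    out, running = [], 0
    for r in rows:
        running += r["qty"]
        out.append({**r, "running_qty": running})
    return out

sample = [
    {"region": "west", "price": 2.0, "qty": 3},
    {"region": "east", "price": 5.0, "qty": 1},
    {"region": "west", "price": 1.5, "qty": 4},
]

# Candidate 1: push the filter below the row-wise UDF.
original_total = filter_rows(add_total(sample), is_west)
pushed_total = add_total(filter_rows(sample, is_west))
same_for_row_wise = original_total == pushed_total

# Candidate 2: push the filter below the running-sum UDF.
original_running = filter_rows(add_running_qty(sample), is_west)
pushed_running = add_running_qty(filter_rows(sample, is_west))
same_for_running = original_running == pushed_running

print(same_for_row_wise, same_for_running)
```

On this sample the first candidate agrees with the original pipeline and the second does not, because dropping the "east" row early changes the running sum. A pattern-matching rule has no way to tell these two UDFs apart without inspecting their bodies, which is the gap the abstract points to.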
About the speaker
Cong Yan is a senior researcher at Microsoft Research Redmond. Her research uses formal methods and other programming language techniques to improve the performance and usability of data management systems, with a particular focus on data science workloads written in Python. She received the SIGMOD Best Paper Award and the VLDB Best Paper Award in 2023.
Date & Time
Thursday, January 11, 2024 - 14:00