Type: Research Highlight
Title: Wander Join and XDB: Online Aggregation via Random Walks
Feifei Li , Bin Wu, Ke Yi, Zhuoyue Zhao
Available in: PDF
Joins are expensive, and online aggregation is an e↵ective approach to explore the tradeo↵ between query e”ciency and accuracy in a continuous, online fashion. However, the stateof-the-art approach, in both internal and external memory, is based on ripple join, which is still very expensive and needs strong assumptions (e.g., the tuples in a table are stored in random order). This paper proposes a new approach, the wander join algorithm, to the online aggregation problem by performing random walks over the underlying join graph. We also design an optimizer that chooses the optimal plan for conducting the random walks without having to collect any statistics a priori. Selection predicates and group-by clauses can be handled as well. We have developed an online engine called XDB by integrating wander join in the latest version of PostgreSQL. Extensive experiments using the TPC-H benchmark have shown the superior performance of wander join. The XDB implementation has demonstrated its practicality in a full-fledged database system.
Bio
Feifei Li is an associate professor of computer science at the University of Utah. He does research on large-scale data management systems. He was a recipient for an NSF career award, a Google Faculty award, the IEEE ICDE best paper award and 10+ years most influential paper award in 2004 and 2014 respectively, the Best Demonstration Award in SIGMOD 2015, and the ACM SIGMOD 2016 Best Paper Award. He is an associate editor for IEEE TKDE and ACM TODS, and a member of SIGMOD Jim Gray Dissertation Award Selection Committee in 2017.
Bin Wu received his Bachelor’s degree from School of Computer Science, Fudan University in 2012. He is currently a PhD student in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He is expected to graduate in July 2017, and his dissertation is on approximate join processing by random sampling and its applications in large-scale database systems. He has received a SIGMOD Best Paper Award (2016).
Ke Yi is an Associate Professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. He obtained his Bachelor’s degree from Tsinghua University (2001) and PhD from Duke University (2006), both in computer science. His research spans theoretical computer science and database systems. He has received a Google Faculty Research Award (2010), the Young Investigator Research Award from HKUST (2012), a SIGMOD Best Demonstration Award (2015), and the SIGMOD Best Paper Award (2016). He currently serves as an Associate Editor of ACM Transactions on Database Systems and IEEE Transactions on Knowledge and Data Engineering.
Zhuoyue Zhao is a Ph.D. student in the School of Computing at University of Utah since August 2016. His advisor is Prof. Feifei Li. He is a member of the Data Group. His current research interest is in large-scale data management systems; in particular, approximate query processing and query optimization, OLTP and OLAP engines, and graph analytics and graph query processing systems.