Type: Research Highlight
Title: Data partitioning for single-round multi-join evaluation in massively parallel system
Tom J. Ameloot, Gaetano Geck, Bas Ketsman, Frank Neven, Thomas Schwentick
Available in: PDF
A dominant cost for query evaluation in modern massively distributed systems is the number of communication rounds. For this reason, there is a growing interest in single-round multiway join algorithms where data is first reshuffled over many servers and then evaluated in a parallel but communication-free way. The reshuffling itself is specified as a distribution policy. We introduce a correctness condition, called parallel-correctness, for the evaluation of queries w.r.t. a distribution policy. We provide a semantical characterization for when conjunctive queries (and extensions thereof) are parallel-correct and give matching complexity bounds for the associated decision problem. Motivated by scenarios for workload optimization, we further consider the problem of parallel-correctness transfer from a query Q to a query Q0, that is, whether Q0 is parallelcorrect for all distribution policies for which Q is parallelcorrect. In this case, Q0 can always be evaluated after Q without repartitioning the data. We provide a semantical characterization for parallel-correctness transfer and provide matching complexity bounds for the associated decision problem for conjunctive queries (and extensions). Finally, we investigate restrictions of queries and families of distribution policies with better complexities, including, for instance, the Hypercube distributions.
Tom J. Ameloot is currently a Postdoctoral Fellow in the Databases and Theoretical Computer Science research group at Hasselt University (Belgium). Tom obtained his PhD on the topic of declarative networking under the supervision of Jan Van den Bussche. In his research, Tom likes to make theoretical insights for various research topics.
Gaetano Geck is a PhD student at TU Dortmund University, where he received his master’s degree in computer science. He is interested in foundational theoretical aspects of distributed databases and distributed query evaluation in particular.
Bas Ketsman is a PhD student at Hasselt University (Belgium) under the supervision of Prof. Frank Neven. He is a PhD Fellow of the Research Foundation – Flanders (FWO) and a member of the research group Databases and Theoretical Computer Science. His research interests include database theory and big data computations.
Frank Neven is a full professor at Hasselt University (Belgium) where he also received his PhD in 1999 under the supervision of Jan Van den Bussche. His research interests include the theory and practice of databases with a strong interest in automata and logic. He received three PODS best paper awards and currently is the editor of the Database Principles Column of SIGMOD Record.
Thomas Schwentick works in several areas where Mathematical Logic meets Computer Science, particularly in Database Theory. He often studies questions that concern the expressiveness of logical languages together with the computational complexity of related algorithmic problems.
He is Professor of Theoretical Computer Science at TU Dortmund University (Germany). He graduated in Mathematics at the Johannes Gutenberg University in Mainz (Germany), where he also finished his PhD and his habilitation. Before he moved to his current position, he had professor positions in Jena and Marburg. He was PC chair of PODS 2011 and PC co-chair of ICDT 2007, STACS 2010 and STACS 2011. He is a member of the ICDTCouncil, the EATCS Council, and the Academia Europaeae.