Type: Research Highlight
Title: Juggling Functions Inside a Database
Mahmoud Abo Khamis, Hung Q. Ngo, Atri Rudra
Available in: PDF
We define and study the Functional Aggregate Query (FAQ) problem, which captures common computational tasks across a very wide range of domains including relational databases, logic, matrix and tensor computation, probabilistic graphical models, constraint satisfaction, and signal processing. Simply put, an FAQ is a declarative way of defining a new function from a database of input functions. We present InsideOut, a dynamic programming algorithm, to evaluate an FAQ. The algorithm rewrites the input query into a set of easier-to-compute FAQ sub-queries. Each sub-query is then evaluated using a worst-case optimal relational join algorithm. The topic of designing algorithms to optimally evaluate the classic multiway join problem has seen exciting developments in the past few years. Our framework tightly connects these new ideas in database theory with a vast number of application areas in a coherent manner, showing potentially that – with the ight abstraction, blurring the distinction between data and computation – a good database engine can be a general purpose constraint solver, relational data store, graphical model inference engine, and matrix/tensor computation processor all at once. The InsideOut algorithm is very simple, as shall be described in this paper. Yet, in spite of solving an extremely general problem, its runtime either is as good as or improves upon the best known algorithm for the applications that FAQ specializes to. These corollaries include computational tasks in graphical model inference, matrix/tensor operations, relational joins, and logic. Better yet, InsideOut can be used within any database engine, because it is basically a principled way of rewriting queries. Indeed, it is already part of the LogicBlox database engine, helping efficiently answer traditional database queries, graphical model inference queries, and train a large class of machine learning models inside the database itself.
Mahmoud Abo Khamis is currently a Computer Scientist at LogicBlox. He received his M.S. and Ph.D. degrees in Computer Science and Engineering from the State University of New York at Buffalo in 2012 and 2016 respectively. His research interests include join algorithms, query optimization, information theory, proof systems, and beyond worst-case analysis. He received PODS best paper award (2016) and UB CSE best dissertation award (2016).
Hung Q. Ngo was an assistant, associate, and full professor of Computer Science and Engineering at the State University of New York at Buffalo from 2001 to 2017. Since 2015, he has worked as a computer scientist at LogicBlox, designing and implementing in-database computation algorithms for solving database, logic, and machine learning problems. He received three best paper awards, at COCOON 2008, PODS 2012, and PODS 2016.
Atri Rudra is an Associate Professor of Computer Science and Engineering at SUNY Buffalo, State University of New York, Buffalo. Atri received his Bachelor’s degree from the Indian Institute of Technology, Kharagpur, India in 2000 and his PhD from University of Washington in 2007. From 2000 to 2002, he was a Research Staff Member at IBM India Research Lab, New Delhi, India. His research interests lie in theoretical computer science and in particular, theory of error-correcting codes, database algorithms, sub-linear algorithms, linear algebra, finite field theory and applications. He is a recipient of an NSF CAREER award (2009), HP Labs Innovation Research Award (2010), ESA best paper award (2010), UB Exceptional Scholars – Young Investigator award (2011), PODS best paper awards (2012 and 2016), IBM Faculty Award (2013) and UB SEAS Senior Faculty Teacher of the Year (2015).