Type: Research Highlight
Title: A Scalable Execution Engine for Package Queries
Matteo Brucato, Azza Abouzied, Alexandra Meliou
Available in: PDF
Many modern applications and real-world problems involve the design of item collections, or packages: from planning your daily meals all the way to mapping the universe. Despite the pervasive need for packages, traditional data management does not offer support for their definition and computation. This is because traditional database queries follow a powerful, but very simple model: a query defines constraints that each tuple in the result must satisfy. However, a system tasked with the design of packages cannot consider items independently; rather, the system needs to determine if a set of items collectively satisfy given criteria.
In this paper, we present package queries, a new query model that extends traditional database queries to handle complex constraints and preferences over answer sets. We develop a full-fledged package query system, implemented on top of a traditional database engine. Our work makes several contributions. First, we design PaQL, a SQL-based query language that supports the declarative specification of package queries. Second, we present a fundamental strategy for evaluating package queries that combines the capabilities of databases and constraint optimization solvers. The core of our approach is a set of translation rules that transform a package query to an integer linear program. Third, we introduce an offline data partitioning strategy allowing query evaluation to scale to large data sizes. Fourth, we introduce SKETCHREFINE, an efficient and scalable algorithm for package evaluation, which offers strong approximation guarantees. Finally, we present extensive experiments over real-world data. Our results demonstrate that SKETCHREFINE is effective at deriving high-quality package results, and achieves runtime performance that is an order of magnitude faster than directly using ILP solvers over large datasets.
Matteo Brucato is a PhD candidate at the University of Massachusetts, Amherst, advised by Professor Alexandra Meliou, and remotely co-advised by Professor Azza Abouzied (NYU Abu Dhabi). His research interests include Data Management, Information Retrieval, and Natural Language Processing, with a focus on query languages and algorithms for scalable query processing. His paper on scalable package queries was among the best papers at VLDB 2016. He interned at IBM Watson in summer 2016 and Microsoft Research in summer 2017. Prior to joining UMass, he received his M.S. and B.S. in Computer Science from the University of Bologna, Italy.
Azza Abouzied is an assistant professor at New York University, Abu Dhabi. Her research work focuses on designing intuitive and innovative data querying tools. Today’s technologies are helping people collect and produce data at phenomenal rates. Despite the abundance of data, it remains largely inaccessible due to the skill required to explore, query and analyze it in a non-trivial fashion. While many users know exactly what they are looking for, they have trouble expressing sophisticated queries in interfaces that require knowledge of a programming language or a query language. Azza designs novel interfaces such as example-driven query tools, that simplify data querying and analysis, and new database query paradigms such as the package query paradigm. Her research work combines techniques from various research fields such as UI-design, machine learning and databases. Azza Abouzied received her doctoral degree from Yale in 2013. She spent a year as a visiting scholar at UC Berkeley. She is also one of the co-founders of ÊHadaptÊÐ a Big Data analytics platform.
Alexandra Meliou is an Assistant Professor at the University of Massachusetts, Amherst. Before that, she was a Postdoctoral Research Associate at the University of Washington, working with Dan Suciu. Her research focuses on issues of data provenance, data quality, explanations, and causality. She is particularly interested in developing technologies that enhance our interactions with and understanding of data. She has served on many program committees, including as a PC Track Chair for SIGMOD 2016. She is the recipient of an NSF CAREER Award, a Google Faculty Research Award, and a Siebel Scholarship. She received her MS and PhD degrees from the University of California, Berkeley, and her BS degree from the National Technical University of Athens.