Type: Research Highlight
Title: k-Shape: Efficient and Accurate Clustering of Time Series
John Paparrizos, Luis Gravano
Available in: PDF
The proliferation and ubiquity of temporal data across many disciplines have generated substantial interest in the analysis and mining of time series. Clustering is one of the most popular data mining methods, not only due to its exploratory power, but also as a preprocessing step or subroutine for other techniques. In this paper, we describe k-Shape, a novel algorithm for time-series clustering. k-Shape relies on a scalable iterative refinement procedure, which creates homogeneous and well-separated clusters. As its distance measure, k-Shape uses a normalized version of the cross-correlation measure in order to consider the shapes of time series while comparing them. Based on the properties of that distance measure, we develop a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters. An extensive experimental evaluation against partitional, hierarchical, and spectral clustering methods, with the most competitive distance measures, showed the robustness of k-Shape. Overall, k-Shape emerges as a domain-independent, highly accurate, and efficient clustering approach for time series with broad applications.
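To make the idea of a shape-oriented distance concrete, the following is a minimal sketch in Python with NumPy of a distance built on normalized cross-correlation. It is an illustration under stated assumptions, not the authors' implementation: the function names `z_normalize` and `shape_distance` are hypothetical, the z-normalization step is assumed, and the choice to divide by the product of the two norms and take one minus the maximum over all shifts is one plausible reading of "a normalized version of the cross-correlation measure".

```python
import numpy as np


def z_normalize(x):
    """Rescale a series to zero mean and unit variance (assumed preprocessing)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    if std == 0:
        # A constant series has no shape; map it to all zeros.
        return np.zeros_like(x)
    return (x - x.mean()) / std


def shape_distance(x, y):
    """Hypothetical shape distance: 1 - max normalized cross-correlation.

    The cross-correlation is evaluated at every relative shift of the two
    series, so two series with the same shape at different offsets can
    still be judged close. The result lies in [0, 2].
    """
    x = z_normalize(x)
    y = z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        # At least one series is constant, so no correlation is defined.
        return 1.0
    # mode="full" returns the correlation at all possible lags.
    cc = np.correlate(x, y, mode="full")
    return 1.0 - cc.max() / denom


if __name__ == "__main__":
    pulse = [0, 0, 1, 2, 1, 0, 0, 0]
    shifted = [0, 0, 0, 0, 1, 2, 1, 0]
    print("same shape, shifted:", shape_distance(pulse, shifted))
    print("identical series:   ", shape_distance(pulse, pulse))
```

Because the maximum is taken over all shifts, the shifted pulse scores much closer to the original than a point-by-point comparison at zero shift would suggest. This sketch uses direct correlation for clarity; for long series, an FFT-based cross-correlation would be the usual way to keep the computation efficient.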
John Paparrizos is a Computer Science Ph.D. candidate at Columbia University. John received a B.S. degree in Computer Science from Aristotle University of Thessaloniki (AUTH), Greece, in 2009, and an M.S. degree in Computer Science from École Polytechnique Fédérale de Lausanne (EPFL), Switzerland, in 2011. His research interests are in data management, data mining, and information retrieval. John has done internships at Microsoft Research, Yahoo Labs, and Logitech, and he has received multiple fellowships, most recently the Alexander S. Onassis Foundation Fellowship.
Luis Gravano is a Professor of Computer Science at Columbia University. Luis joined Columbia in 1997. In 2001, Luis was a Senior Research Scientist at Google (on leave from Columbia). He received his Ph.D. degree in Computer Science from Stanford University in 1997. He also received an M.S. degree from Stanford in 1994 and a B.S. degree from the Escuela Superior Latinoamericana de Informática (ESLAI), Argentina, in 1991. His research interests are in databases, information retrieval, and web search. Luis is a recipient of an NSF CAREER award. He has received multiple best paper awards, including at the ACM SIGMOD 2006 and IEEE ICDE 2005 conferences.