research-article

Dynamic Density Based Clustering

Authors:
Junhao Gan

University of Queensland, Brisbane, Australia

University of Queensland, Brisbane, Australia
View Profile

,
Yufei Tao

University of Queensland, Brisbane, Australia

University of Queensland, Brisbane, Australia
View Profile

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of DataMay 2017Pages 1493–1507https://doi.org/10.1145/3035918.3064050

Published:09 May 2017Publication History

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

Pages 1493–1507

ABSTRACT

Dynamic clustering---how to efficiently maintain data clusters along with updates in the underlying dataset---is a difficult topic. This is especially true for density-based clustering, where objects are aggregated based on transitivity of proximity, under which deciding the cluster(s) of an object may require the inspection of numerous other objects. The phenomenon is unfortunate, given the popular usage of this clustering approach in many applications demanding data updates.

Motivated by the above, we investigate the algorithmic principles for dynamic clustering by DBSCAN, a successful representative of density-based clustering, and ρ-approximate DBSCAN, proposed to bring down the computational hardness of the former on static data. Surprisingly, we prove that the ρ-approximate version suffers from the very same hardness when the dataset is fully dynamic, namely, when both insertions and deletions are allowed. We also show that this issue goes away as soon as tiny further relaxation is applied, yet still ensuring the same quality---known as the ``sandwich guarantee''---of ρ-approximate DBSCAN. Our algorithms guarantee near-constant update processing, and outperform existing approaches by a factor over two orders of magnitude.

References

M. Ankerst, M. M. Breunig, H. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering structure. In SIGMOD, pages 49--60, 1999. Google ScholarDigital Library
S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. JACM, 45(6):891--923, 1998. Google ScholarDigital Library
N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990. Google ScholarDigital Library
F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In ICDM, pages 328--339, 2006.Google ScholarCross Ref
T. M. Chan. A dynamic data structure for 3-d convex hulls and 2-d nearest neighbor queries. JACM, 57(3), 2010. Google ScholarDigital Library
J. Erickson. On the relative complexities of some geometric problems. In CCCG, pages 85--90, 1995.Google Scholar
J. Erickson. New lower bounds for Hopcroft's problem. Disc. & Comp. Geo., 16(4):389--418, 1996.Google ScholarDigital Library
M. Ester, H. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In VLDB, pages 323--333, 1998. Google ScholarDigital Library
M. Ester, H. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In SIGKDD, pages 226--231, 1996. Google ScholarDigital Library
J. Gan and Y. Tao. DBSCAN revisited: Mis-claim, un-fixability, and approximation. In SIGMOD, pages 519--530, 2015. Google ScholarDigital Library
A. Gunawan. A faster algorithm for DBSCAN. Master's thesis, Technische University Eindhoven, March 2013.Google Scholar
A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984. Google ScholarDigital Library
J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2012. Google ScholarDigital Library
J. Holm, K. de Lichtenberg, and M. Thorup. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. JACM, 48(4):723--760, 2001. Google ScholarDigital Library
S. Lühr and M. Lazarescu. Incremental clustering of dynamic data streams using connectivity based representative points. 68(1):1--27, 2009. Google ScholarDigital Library
D. M. Mount and E. Park. A dynamic data structure for approximate range searching. In SoCG, pages 247--256, 2010. Google ScholarDigital Library
S. Nassar, J. Sander, and C. Cheng. Incremental and effective data summarization for dynamic hierarchical clustering. In SIGMOD, pages 467--478, 2004. Google ScholarDigital Library
S. Nittel, K. T. Leung, and A. Braverman. Scaling clustering algorithms for massive data sets using data streams. In ICDE, page 830, 2004. Google ScholarDigital Library
I. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, and H. Kriegel. Density-based projected clustering over high dimensional data streams. In ICDM, pages 987--998, 2012.Google ScholarCross Ref
R. G. Pensa, D. Ienco, and R. Meo. Hierarchical co-clustering: off-line and incremental approaches. Data Min. Knowl. Discov., 28(1):31--64, 2014. Google ScholarDigital Library
S. Singh and A. Awekar. Incremental shared nearest neighbor density-based clustering. In CIKM, pages 1533--1536, 2013. Google ScholarDigital Library
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson, 2006. Google ScholarDigital Library
R. E. Tarjan. Efficiency of a good but not linear set union algorithm. JACM, 22(2):215--225, 1975. Google ScholarDigital Library

Index Terms

Dynamic Density Based Clustering
1. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Data structures and algorithms for data management

Recommendations

Integration of particle swarm optimization and genetic algorithm for dynamic clustering

Although the algorithms for cluster analysis are continually improving, most clustering algorithms still need to set the number of clusters. Thus, this study proposes a novel dynamic clustering approach based on particle swarm optimization (PSO) and ...
Read More
Density-based semi-supervised clustering

Semi-supervised clustering methods guide the data partitioning and grouping process by exploiting background knowledge, among else in the form of constraints. In this study, we propose a semi-supervised density-based clustering method. Density-based ...
Read More
Dynamic clustering using combinatorial particle swarm optimization

Combinatorial Particle Swarm Optimization (CPSO) is a relatively recent technique for solving combinatorial optimization problems. CPSO has been used in different applications, e.g., partitional clustering and project scheduling problems, and it has ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN:9781450341974
DOI:10.1145/3035918
General Chairs:
Rada Chirkova
North Carolina State University, USA
,
Jun Yang
Duke University, USA
,
Program Chair:
Dan Suciu
University of Washington, USA
Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 9 May 2017
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
algorithms
approximate dbscan
dynamic clustering
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 23
  Total Citations
  View Citations
- 1,019
  Total Downloads
- Downloads (Last 12 months)34
- Downloads (Last 6 weeks)5
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Dynamic Density Based Clustering

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Integration of particle swarm optimization and genetic algorithm for dynamic clustering

Density-based semi-supervised clustering

Dynamic clustering using combinatorial particle swarm optimization