skip to main content
10.1145/3035918.3064050acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Dynamic Density Based Clustering

Published:09 May 2017Publication History

ABSTRACT

Dynamic clustering---how to efficiently maintain data clusters along with updates in the underlying dataset---is a difficult topic. This is especially true for density-based clustering, where objects are aggregated based on transitivity of proximity, under which deciding the cluster(s) of an object may require the inspection of numerous other objects. The phenomenon is unfortunate, given the popular usage of this clustering approach in many applications demanding data updates.

Motivated by the above, we investigate the algorithmic principles for dynamic clustering by DBSCAN, a successful representative of density-based clustering, and ρ-approximate DBSCAN, proposed to bring down the computational hardness of the former on static data. Surprisingly, we prove that the ρ-approximate version suffers from the very same hardness when the dataset is fully dynamic, namely, when both insertions and deletions are allowed. We also show that this issue goes away as soon as tiny further relaxation is applied, yet still ensuring the same quality---known as the ``sandwich guarantee''---of ρ-approximate DBSCAN. Our algorithms guarantee near-constant update processing, and outperform existing approaches by a factor over two orders of magnitude.

References

  1. M. Ankerst, M. M. Breunig, H. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering structure. In SIGMOD, pages 49--60, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Arya, D. M. Mount, N. S. Netanyahu, R. Silverman, and A. Y. Wu. An optimal algorithm for approximate nearest neighbor searching fixed dimensions. JACM, 45(6):891--923, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. N. Beckmann, H. Kriegel, R. Schneider, and B. Seeger. The R*-tree: An efficient and robust access method for points and rectangles. In SIGMOD, pages 322--331, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. Cao, M. Ester, W. Qian, and A. Zhou. Density-based clustering over an evolving data stream with noise. In ICDM, pages 328--339, 2006.Google ScholarGoogle ScholarCross RefCross Ref
  5. T. M. Chan. A dynamic data structure for 3-d convex hulls and 2-d nearest neighbor queries. JACM, 57(3), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. J. Erickson. On the relative complexities of some geometric problems. In CCCG, pages 85--90, 1995.Google ScholarGoogle Scholar
  7. J. Erickson. New lower bounds for Hopcroft's problem. Disc. & Comp. Geo., 16(4):389--418, 1996.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Ester, H. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In VLDB, pages 323--333, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Ester, H. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In SIGKDD, pages 226--231, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. J. Gan and Y. Tao. DBSCAN revisited: Mis-claim, un-fixability, and approximation. In SIGMOD, pages 519--530, 2015. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Gunawan. A faster algorithm for DBSCAN. Master's thesis, Technische University Eindhoven, March 2013.Google ScholarGoogle Scholar
  12. A. Guttman. R-trees: a dynamic index structure for spatial searching. In SIGMOD, pages 47--57, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. J. Han, M. Kamber, and J. Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Holm, K. de Lichtenberg, and M. Thorup. Poly-logarithmic deterministic fully-dynamic algorithms for connectivity, minimum spanning tree, 2-edge, and biconnectivity. JACM, 48(4):723--760, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Lühr and M. Lazarescu. Incremental clustering of dynamic data streams using connectivity based representative points. 68(1):1--27, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. D. M. Mount and E. Park. A dynamic data structure for approximate range searching. In SoCG, pages 247--256, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. S. Nassar, J. Sander, and C. Cheng. Incremental and effective data summarization for dynamic hierarchical clustering. In SIGMOD, pages 467--478, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. S. Nittel, K. T. Leung, and A. Braverman. Scaling clustering algorithms for massive data sets using data streams. In ICDE, page 830, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. I. Ntoutsi, A. Zimek, T. Palpanas, P. Kröger, and H. Kriegel. Density-based projected clustering over high dimensional data streams. In ICDM, pages 987--998, 2012.Google ScholarGoogle ScholarCross RefCross Ref
  20. R. G. Pensa, D. Ienco, and R. Meo. Hierarchical co-clustering: off-line and incremental approaches. Data Min. Knowl. Discov., 28(1):31--64, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. S. Singh and A. Awekar. Incremental shared nearest neighbor density-based clustering. In CIKM, pages 1533--1536, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. R. E. Tarjan. Efficiency of a good but not linear set union algorithm. JACM, 22(2):215--225, 1975. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Dynamic Density Based Clustering

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
      May 2017
      1810 pages
      ISBN:9781450341974
      DOI:10.1145/3035918

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 May 2017

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader