research-article · DOI: 10.1145/2003653.2003659

Precise anytime clustering of noisy sensor data with logarithmic complexity

Published: 21 August 2011

ABSTRACT

Clustering streaming sensor data aims at providing an online summary of the observed stream, a task that must typically be performed under limited processing and storage resources. The speed of the sensed stream (data items per unit time) is therefore a critical constraint when designing stream clustering algorithms. Moreover, varying stream speed is a natural characteristic of sensor data, e.g. when the sampling rate is changed upon detecting an event or for a certain period. In such cases, most clustering algorithms must severely restrict their model size so that they can handle the minimal time allowance. Recently, the first anytime stream clustering algorithm was proposed; it flexibly uses all available time and dynamically adapts its model size. However, that method was not designed to precisely cluster sensor data, which are usually noisy and highly evolving. In this paper we detail the LiarTree algorithm, which provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model, as opposed to the linear time complexity often observed in previous approaches. In an extensive experimental evaluation on synthetic and real sensor datasets, we demonstrate that the LiarTree outperforms competing approaches in the quality of the resulting summaries while exhibiting only logarithmic time complexity.
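The abstract does not spell out the data structure, but the logarithmic claim rests on a hierarchy of micro-cluster summaries: each arriving point descends one path from the root toward its closest micro-cluster, updating summaries along the way, and the descent can be cut short whenever the time allowance runs out. The sketch below is a minimal, hypothetical illustration of that idea (a ClusTree-style hierarchy of cluster features storing count, linear sum and squared sum); the class and function names are my own, not taken from the paper.

```python
import math

class ClusterFeature:
    """Micro-cluster summary: count, linear sum and squared sum per dimension."""
    def __init__(self, dim):
        self.n = 0.0
        self.ls = [0.0] * dim
        self.ss = [0.0] * dim

    def insert(self, x):
        self.n += 1.0
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def center(self):
        return [s / self.n for s in self.ls]

class Node:
    def __init__(self, dim, children=None):
        self.cf = ClusterFeature(dim)
        self.children = children or []   # empty list -> leaf

def distance(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def anytime_insert(root, x, time_left):
    """Descend toward the closest micro-cluster, updating summaries on the way.

    Each level costs one distance scan over a node's children, so for a
    (roughly) balanced tree the full descent is logarithmic in the number
    of maintained micro-clusters. If `time_left` signals that the allowance
    is exhausted, the point is simply absorbed at the current level --
    the anytime property: more time yields a more precise placement.
    """
    node = root
    depth = 0
    while node.children and time_left(depth):
        node.cf.insert(x)  # update the summary on the path
        node = min(node.children,
                   key=lambda c: distance(c.cf.center(), x))
        depth += 1
    node.cf.insert(x)
    return node
```

Under this reading, a faster stream simply interrupts the descent earlier, so the model degrades gracefully instead of being capped at a fixed size; the paper's actual contribution adds noise buffers and drift/novelty handling on top of such a descent.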


Published in

SensorKDD '11: Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data
August 2011, 69 pages
ISBN: 9781450308328
DOI: 10.1145/2003653

Copyright © 2011 ACM. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States
