Precise anytime clustering of noisy sensor data with logarithmic complexity

Authors:
Marwan Hassani, RWTH Aachen University, Germany
Philipp Kranen, RWTH Aachen University, Germany
Thomas Seidl, RWTH Aachen University, Germany
SensorKDD '11: Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data, August 2011, Pages 52–60. https://doi.org/10.1145/2003653.2003659

Published: 21 August 2011

ABSTRACT

Clustering of streaming sensor data aims at providing online summaries of the observed stream. This task is mostly performed under limited processing and storage resources, which makes the speed of the sensed stream (data per time) a critical constraint when designing stream clustering algorithms. Additionally, varying stream speed is a natural characteristic of sensor data, e.g. when the sampling rate changes upon detecting an event or for a certain period of time. In such cases, most clustering algorithms have to heavily restrict their model size so that they can handle the minimal time allowance. Recently, the first anytime stream clustering algorithm was proposed, which flexibly uses all available time and dynamically adapts its model size. However, that method was not designed to precisely cluster sensor data, which are usually noisy and strongly evolving. In this paper we detail the LiarTree algorithm, which provides precise stream summaries and effectively handles noise, drift and novelty. We prove that the runtime of the LiarTree is logarithmic in the size of the maintained model, as opposed to the linear time complexity often observed in previous approaches. In an extensive experimental evaluation on synthetic and real sensor datasets, we demonstrate that the LiarTree outperforms competing approaches in terms of the quality of the resulting summaries and exhibits only logarithmic time complexity.


Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
SensorKDD '11: Proceedings of the Fifth International Workshop on Knowledge Discovery from Sensor Data
August 2011
69 pages
ISBN: 9781450308328
DOI: 10.1145/2003653
Conference Chairs:
Varun Chandola, Oak Ridge National Laboratory, TN
Olufemi A. Omitaomu, Oak Ridge National Laboratory, TN
Karsten Steinhaeuser, University of Minnesota, MN
Auroop R. Ganguly, Oak Ridge National Laboratory, TN
Joao Gama, University of Porto, Portugal
Ranga Raju Vatsavai, Oak Ridge National Laboratory, TN
Nitesh V. Chawla, University of Notre Dame, IN
Mohamed Medhat Gaber, Monash University, Australia
Copyright © 2011 ACM
Publisher
Association for Computing Machinery
New York, NY, United States