A dimensionality reduction technique for efficient similarity analysis of time series databases
Source: Conference on Information and Knowledge Management
Proceedings of the thirteenth ACM international conference on Information and knowledge management
Washington, D.C., USA
POSTER SESSION: Posters P-1
Pages: 160 - 161  
Year of Publication: 2004
ISBN:1-58113-874-1
Authors
Vasileios Megalooikonomou  Temple University, Philadelphia, PA
Guo Li  Temple University, Philadelphia, PA
Qiang Wang  Temple University, Philadelphia, PA
Sponsors
SIGIR: ACM Special Interest Group on Information Retrieval
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1031171.1031203

ABSTRACT

Efficiently searching for similarities among time series and discovering interesting patterns is an important and non-trivial problem with applications in many domains. The high dimensionality of the data makes this analysis very challenging, and many dimensionality reduction methods have been proposed to address it. PCA (Piecewise Constant Approximation) and its variants have been shown to be efficient for time series indexing and similarity retrieval. However, in certain applications, the many false alarms introduced by the approximation can reduce overall performance dramatically. In this paper, we introduce a new piecewise dimensionality reduction technique based on vector quantization. The new technique, PVQA (Piecewise Vector Quantized Approximation), partitions each sequence into equal-length segments and uses vector quantization to represent each segment by the closest codeword (under a distance metric) from a codebook of key sequences. The significantly lower dimensionality of the new representation improves the efficiency of calculations. We demonstrate the utility and efficiency of the proposed technique on real and simulated datasets. By exploiting prior knowledge about the data, the proposed technique generally outperforms PCA and its variants in similarity searches.
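
The segment-and-quantize idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (split_segments, train_codebook, encode), the squared Euclidean distance, and the Lloyd-style k-means used to build the codebook are all assumptions chosen for illustration, and the toy data is made up.

```python
from typing import List, Sequence

Vector = List[float]


def split_segments(series: Sequence[float], seg_len: int) -> List[Vector]:
    """Partition a series into consecutive equal-length segments."""
    if seg_len <= 0 or len(series) % seg_len != 0:
        raise ValueError("series length must be a positive multiple of seg_len")
    return [list(series[i:i + seg_len]) for i in range(0, len(series), seg_len)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed choice of distance metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(segment: Sequence[float], codebook: List[Vector]) -> int:
    """Index of the codeword closest to the segment."""
    return min(range(len(codebook)), key=lambda j: sq_dist(segment, codebook[j]))


def train_codebook(segments: List[Vector], k: int, iters: int = 20) -> List[Vector]:
    """Build k codewords with Lloyd-style k-means iterations (assumed training method)."""
    # Deterministic initialization: the first k distinct training segments.
    codebook: List[Vector] = []
    for s in segments:
        if s not in codebook:
            codebook.append(list(s))
        if len(codebook) == k:
            break
    if len(codebook) < k:
        raise ValueError("not enough distinct segments to build the codebook")
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for s in segments:
            groups[nearest(s, codebook)].append(s)
        for j, group in enumerate(groups):
            if group:  # keep the old codeword if no segment was assigned to it
                codebook[j] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


def encode(series: Sequence[float], codebook: List[Vector], seg_len: int) -> List[int]:
    """Replace each segment of the series by the index of its closest codeword."""
    return [nearest(s, codebook) for s in split_segments(series, seg_len)]


# Toy data: three training series of length 8, split into segments of length 4.
training = [
    [0, 0, 0, 0, 5, 5, 5, 5],
    [0.2, 0, 0.1, 0, 5, 5.1, 4.9, 5],
    [5, 5, 5, 5, 0, 0, 0, 0],
]
segments = [s for series in training for s in split_segments(series, 4)]
codebook = train_codebook(segments, k=2)
codes = encode([0.1, 0, 0, 0.1, 4.8, 5, 5, 5.2], codebook, 4)
print(codes)  # [0, 1]
```

In this sketch a series of length 8 is reduced to two codeword indices, which is where the lower dimensionality comes from. How the encoded sequences are then compared in a similarity search is left out, since the abstract does not specify it.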



