DOI: 10.1145/1167350.1167384
Article

A study of the effects of bias in criterion functions for temporal data clustering

Published: 18 March 2005

Abstract

In this paper, we study the bias associated with the modeling methods and criterion functions used in temporal data clustering. In particular, we experimentally compare two approaches to clustering discrete-valued univariate temporal data. The first approach uses Markov chain models to capture the temporal relations encoded in the data; the similarity between two sequences is computed as the average sequence-to-model likelihood. The second approach is distance-based: the Levenshtein string edit distance is computed between two sequences. Experiments are performed using both approaches on web user data and on CS student online lab performance data. The characteristics of the clustering results obtained from the two approaches are analyzed, and recommendations about suitable applications for each approach are given.
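
To make the two similarity measures concrete, the following Python sketch illustrates one plausible reading of each. It is not the authors' implementation. The first-order chain, the add-one smoothing (`alpha`), the per-transition normalization, and the symmetric averaging in `markov_similarity` are all assumptions introduced here for illustration; the abstract specifies only that similarity is an average sequence-to-model likelihood and that the second approach uses the Levenshtein edit distance.

```python
import math


def levenshtein(a, b):
    """Edit distance between sequences a and b with unit-cost
    insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,              # delete x
                curr[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),   # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def fit_markov_chain(sequences, alphabet, alpha=1.0):
    """Estimate first-order transition probabilities.
    Add-alpha smoothing is an assumption; it avoids log(0) later."""
    counts = {s: {t: alpha for t in alphabet} for s in alphabet}
    for seq in sequences:
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    model = {}
    for s in alphabet:
        total = sum(counts[s].values())
        model[s] = {t: counts[s][t] / total for t in alphabet}
    return model


def avg_log_likelihood(seq, model):
    """Mean log-probability per transition of seq under model.
    The initial-state probability is ignored in this sketch."""
    transitions = list(zip(seq, seq[1:]))
    if not transitions:
        return 0.0
    return sum(math.log(model[s][t]) for s, t in transitions) / len(transitions)


def markov_similarity(seq_a, seq_b, alphabet):
    """Hypothetical symmetric similarity: score each sequence under the
    chain fitted to the other, then average the two scores."""
    model_a = fit_markov_chain([seq_a], alphabet)
    model_b = fit_markov_chain([seq_b], alphabet)
    return 0.5 * (avg_log_likelihood(seq_a, model_b)
                  + avg_log_likelihood(seq_b, model_a))
```

Under these assumptions, `levenshtein` compares sequences position by position, so it is sensitive to where symbols occur and to sequence length, while `markov_similarity` depends only on transition frequencies and therefore treats two sequences with the same dynamics as similar even when their lengths differ. This difference in what each measure rewards is the kind of bias the abstract refers to.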


Cited By

  • (2006) Modeling student online learning using clustering. Proceedings of the 44th annual ACM Southeast Conference, 186-191. DOI: 10.1145/1185448.1185490. Online publication date: 10-Mar-2006

    Published In

    ACMSE '05 vol 1: Proceedings of the 43rd annual ACM Southeast Conference - Volume 1
    March 2005
    408 pages
    ISBN:1595930590
    DOI:10.1145/1167350
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Levenshtein distance
    2. Markov chain clustering
    3. clustering analysis
    4. clustering applications
    5. similarity measures
    6. temporal data clustering

    Qualifiers

    • Article

    Conference

    ACM SE05: ACM Southeast Regional Conference 2005
    March 18 - 20, 2005
    Kennesaw, Georgia

    Acceptance Rates

    Overall Acceptance Rate 502 of 1,023 submissions, 49%

    Article Metrics

    • Downloads (Last 12 months): 1
    • Downloads (Last 6 weeks): 0
    Reflects downloads up to 01 Mar 2025
