skip to main content
10.1145/2566486.2568044acmotherconferencesArticle/Chapter ViewAbstractPublication PageswwwConference Proceedingsconference-collections
research-article

Finding progression stages in time-evolving event sequences

Published:07 April 2014Publication History

ABSTRACT

Event sequences, such as patients' medical histories or users' sequences of product reviews, trace how individuals progress over time. Identifying common patterns, or progression stages, in such event sequences is a challenging task because not every individual follows the same evolutionary pattern, stages may have very different lengths, and individuals may progress at different rates. In this paper, we develop a model-based method for discovering common progression stages in general event sequences. We develop a generative model in which each sequence belongs to a class, and sequences from a given class pass through a common set of stages, where each sequence evolves at its own rate. We then develop a scalable algorithm to infer classes of sequences, while also segmenting each sequence into a set of stages. We evaluate our method on event sequences, ranging from patients' medical histories to online news and navigational traces from the Web. The evaluation shows that our methodology can predict future events in a sequence, while also accurately inferring meaningful progression stages, and effectively grouping sequences based on common progression patterns. More generally, our methodology allows us to reason about how event sequences progress over time, by discovering patterns and categories of temporal evolution in large-scale datasets of events.

References

  1. Y. Ahn, J. Bagrow, and S. Lehmann. Link communities reveal multi-scale complexity in networks. Nature, 2010.Google ScholarGoogle Scholar
  2. I. Batal, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In KDD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In String Processing and Information Retrieval, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Reserach, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. E. Coviello, A. B. Chan, and G. R. G. Lanckriet. The variational hierarchical em algorithm for clustering hidden markov models. In NIPS, 2012.Google ScholarGoogle Scholar
  6. C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, and C. Potts. No country for old members: user lifecycle and linguistic change in online communities. In WWW, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Felzenszwalb, D. Huttenlocher, and J. Kleinberg. Fast algorithms for large state space HMMs with applications to web usage analysis. In NIPS, 2003.Google ScholarGoogle Scholar
  8. J. Kiernan and E. Terzi. Constructing comprehensive summaries of large event sequences. ACM Transaction on Knowledge Discovery from Data, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Y. Koren. Collaborative filtering with temporal dynamics. Communications of the ACM, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. H. Kramer, Q. Nguyen, G. Curhan, and C. Hsu. Renal insufficiency in the absence of albuminuria and retinopathy among adults with type 2 diabetes mellitus. The Journal of the American Medical Association, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  11. S. Laxman, V. Tankasali, and R. White. Stream prediction using a generative model based on frequent episodes in event sequences. In KDD, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. N. Leeper, A. Bauer-Mehren, S. Iyer, P. LePendu, C. Olson, and N. Shah. Practice-based evidence: Profiling the safety of cilostazol by text-mining of clinical notes. PLoS ONE, 2013.Google ScholarGoogle ScholarCross RefCross Ref
  13. M. Liu and J. Qu. Mining high utility itemsets without candidate generation. In CIKM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. H. P. Lowe H., Ferris T. and W. S. STRIDE-an integrated standards-based translational research informatics platform. AMIA, 2009.Google ScholarGoogle Scholar
  15. Y. Matsubara, Y. Sakurai, C. Faloutsos, T. Iwata, and M. Yoshikawa. Fast mining and forecasting of complex time-stamped events. In KDD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In WWW, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Patnaik, S. Laxman, B. Chandramouli, and N. Ramakrishnan. Efficient episode mining of dynamic event streams. In ICDM, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. Pei, H. Wang, J. Liu, K. Wang, J. Wang, and P. Yu. Discovering frequent closed partial orders from strings. IEEE Transactions on Knowledge and Data Engineering, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. L. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. In Proceedings of the IEEE, 1989.Google ScholarGoogle ScholarCross RefCross Ref
  20. S. Scott. Bayesian methods for hidden markov models. Journal of the American Statistical Association, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  21. D. Shahaf, J. Yang, C. Suen, J. Jacobs, H. Wang, and J. Leskovec. Information cartography: creating zoomable, large-scale maps of information. In KDD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. P. Smyth. Clustering sequences with hidden markov models. In NIPS, 1997.Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C. Suen, S. Huang, C. Eksombatchai, R. Sosic, and J. Leskovec. Nifty: a system for large scale information flow tracking and clustering. In WWW, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. N. Tatti and J. Vreeken. The long and the short of it: summarising event sequences with serial episodes. In KDD, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. West and J. Leskovec. Human wayfinding in information networks. In WWW, 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. C.-W. Wu, Y.-F. Lin, P. Yu, and V. Tseng. Mining high utility episodes in complex event sequences. In KDD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Finding progression stages in time-evolving event sequences

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Other conferences
      WWW '14: Proceedings of the 23rd international conference on World wide web
      April 2014
      926 pages
      ISBN:9781450327442
      DOI:10.1145/2566486

      Copyright © 2014 Copyright is held by the International World Wide Web Conference Committee (IW3C2).

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 7 April 2014

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      WWW '14 Paper Acceptance Rate84of645submissions,13%Overall Acceptance Rate1,899of8,196submissions,23%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader