ABSTRACT
Event sequences, such as patients' medical histories or users' sequences of product reviews, trace how individuals progress over time. Identifying common patterns, or progression stages, in such event sequences is a challenging task because not every individual follows the same evolutionary pattern, stages may have very different lengths, and individuals may progress at different rates. In this paper, we develop a model-based method for discovering common progression stages in general event sequences. We develop a generative model in which each sequence belongs to a class, and sequences from a given class pass through a common set of stages, where each sequence evolves at its own rate. We then develop a scalable algorithm to infer classes of sequences, while also segmenting each sequence into a set of stages. We evaluate our method on event sequences, ranging from patients' medical histories to online news and navigational traces from the Web. The evaluation shows that our methodology can predict future events in a sequence, while also accurately inferring meaningful progression stages, and effectively grouping sequences based on common progression patterns. More generally, our methodology allows us to reason about how event sequences progress over time, by discovering patterns and categories of temporal evolution in large-scale datasets of events.
- Y. Ahn, J. Bagrow, and S. Lehmann. Link communities reveal multi-scale complexity in networks. Nature, 2010.Google Scholar
- I. Batal, D. Fradkin, J. Harrison, F. Moerchen, and M. Hauskrecht. Mining recent temporal patterns for event detection in multivariate time series data. In KDD, 2012. Google ScholarDigital Library
- L. Bergroth, H. Hakonen, and T. Raita. A survey of longest common subsequence algorithms. In String Processing and Information Retrieval, 2000. Google ScholarDigital Library
- D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Reserach, 2003. Google ScholarDigital Library
- E. Coviello, A. B. Chan, and G. R. G. Lanckriet. The variational hierarchical em algorithm for clustering hidden markov models. In NIPS, 2012.Google Scholar
- C. Danescu-Niculescu-Mizil, R. West, D. Jurafsky, J. Leskovec, and C. Potts. No country for old members: user lifecycle and linguistic change in online communities. In WWW, 2013. Google ScholarDigital Library
- P. Felzenszwalb, D. Huttenlocher, and J. Kleinberg. Fast algorithms for large state space HMMs with applications to web usage analysis. In NIPS, 2003.Google Scholar
- J. Kiernan and E. Terzi. Constructing comprehensive summaries of large event sequences. ACM Transaction on Knowledge Discovery from Data, 2009. Google ScholarDigital Library
- Y. Koren. Collaborative filtering with temporal dynamics. Communications of the ACM, 2010. Google ScholarDigital Library
- H. Kramer, Q. Nguyen, G. Curhan, and C. Hsu. Renal insufficiency in the absence of albuminuria and retinopathy among adults with type 2 diabetes mellitus. The Journal of the American Medical Association, 2003.Google ScholarCross Ref
- S. Laxman, V. Tankasali, and R. White. Stream prediction using a generative model based on frequent episodes in event sequences. In KDD, 2008. Google ScholarDigital Library
- N. Leeper, A. Bauer-Mehren, S. Iyer, P. LePendu, C. Olson, and N. Shah. Practice-based evidence: Profiling the safety of cilostazol by text-mining of clinical notes. PLoS ONE, 2013.Google ScholarCross Ref
- M. Liu and J. Qu. Mining high utility itemsets without candidate generation. In CIKM, 2012. Google ScholarDigital Library
- H. P. Lowe H., Ferris T. and W. S. STRIDE-an integrated standards-based translational research informatics platform. AMIA, 2009.Google Scholar
- Y. Matsubara, Y. Sakurai, C. Faloutsos, T. Iwata, and M. Yoshikawa. Fast mining and forecasting of complex time-stamped events. In KDD, 2012. Google ScholarDigital Library
- J. McAuley and J. Leskovec. From amateurs to connoisseurs: modeling the evolution of user expertise through online reviews. In WWW, 2013. Google ScholarDigital Library
- D. Patnaik, S. Laxman, B. Chandramouli, and N. Ramakrishnan. Efficient episode mining of dynamic event streams. In ICDM, 2012. Google ScholarDigital Library
- J. Pei, H. Wang, J. Liu, K. Wang, J. Wang, and P. Yu. Discovering frequent closed partial orders from strings. IEEE Transactions on Knowledge and Data Engineering, 2006. Google ScholarDigital Library
- L. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. In Proceedings of the IEEE, 1989.Google ScholarCross Ref
- S. Scott. Bayesian methods for hidden markov models. Journal of the American Statistical Association, 2002.Google ScholarCross Ref
- D. Shahaf, J. Yang, C. Suen, J. Jacobs, H. Wang, and J. Leskovec. Information cartography: creating zoomable, large-scale maps of information. In KDD, 2013. Google ScholarDigital Library
- P. Smyth. Clustering sequences with hidden markov models. In NIPS, 1997.Google ScholarDigital Library
- C. Suen, S. Huang, C. Eksombatchai, R. Sosic, and J. Leskovec. Nifty: a system for large scale information flow tracking and clustering. In WWW, 2013. Google ScholarDigital Library
- N. Tatti and J. Vreeken. The long and the short of it: summarising event sequences with serial episodes. In KDD, 2012. Google ScholarDigital Library
- B. West and J. Leskovec. Human wayfinding in information networks. In WWW, 2012. Google ScholarDigital Library
- C.-W. Wu, Y.-F. Lin, P. Yu, and V. Tseng. Mining high utility episodes in complex event sequences. In KDD, 2013. Google ScholarDigital Library
Index Terms
- Finding progression stages in time-evolving event sequences
Recommendations
Discovery of Frequent Episodes in Event Sequences
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering ...
SSRDVis: Interactive visualization for event sequences summarization and rare detection
AbstractThis paper presents SSRDVis, a visual approach to effectively summarize event sequences and interactively detect rare behaviors. SSRDVis is mainly composed of three components: (1) a sequence embedding module for learning effective feature vectors ...
Constructing comprehensive summaries of large event sequences
KDD '08: Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data miningEvent sequences capture system and user activity over time. Prior research on sequence mining has mostly focused on discovering local patterns. Though interesting, these patterns reveal local associations and fail to give a comprehensive summary of the ...
Comments