skip to main content
10.1145/1645953.1646000acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
research-article

Efficient itemset generator discovery over a stream sliding window

Published: 02 November 2009 Publication History

Abstract

Mining generator patterns has raised great research interest in recent years. The main purpose of mining itemset generators is that they can form equivalence classes together with closed itemsets, and can be used to generate simple classification rules according to the MDL principle. In this paper, we devise an efficient algorithm called StreamGen to mine frequent itemset generators over a stream sliding window. We adopt a novel enumeration tree structure to help keep the information of mined generators and the border between generators and non-generators, and propose some optimization techniques to speed up the mining process. We further extend the algorithm to directly mine a set of high quality classification rules over stream sliding windows while keeping high performance. The extensive performance study shows that our algorithm outperforms other state-of-the-art algorithms which perform similar tasks in terms of both runtime and memory usage efficiency, and has high utility in terms of classification.

References

[1]
R. Agrawal, T. Imielinski, and A. N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pages 207--216, Washington, D.C., 1993. ACM Press.
[2]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In Proceedings of 20th International Conference on Very Large Data Bases, pages 487--499, Santiago de Chile, Chile, 1994. Morgan Kaufmann.
[3]
M.-L. Antonie and O. R. Zaıane. Text document categorization by term association. In Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), pages 19--26, Maebashi City, Japan, 2002. IEEE Computer Society.
[4]
Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-redundant association rules using frequent closed itemsets. In Proceedings of the First International Conference on Computational Logic, pages 972--986, London, UK.
[5]
H. Cheng, X. Yan, J. Han, and C.-W. Hsu. Discriminative frequent pattern analysis for effective classification. In Proceedings of the 23rd International Conference on Data Engineering, pages 716---725, Istanbul, Turkey, 2007. IEEE.
[6]
H. Cheng, X. Yan, J. Han, and P. S. Yu. Direct discriminative pattern mining for effective classification. In Proceedings of the 24th International Conference on Data Engineering, pages 169--178, Cancún, México, 2008. IEEE.
[7]
Y. Chi, H. Wang, P. S. Yu, and R. R. Muntz. Catch the moment: maintaining closed frequent itemsets over a data stream sliding window. Knowl. Inf. Syst., 10(3):265--294, 2006.
[8]
G. Cong, K.-L. Tan, A. K. H. Tung, and X. Xu. Mining top-k covering rule groups for gene expression data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 670--681, Baltimore, Maryland, USA, 2005. ACM.
[9]
G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the Fifteen ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43--52, 1999.
[10]
C. Gao, J. Wang, Y. He, and L. Zhou. Efficient mining of frequent sequence generators. In Proceedings of the 17th International Conference on World Wide Web, pages 1051--1052, Beijing, China, 2008. ACM.
[11]
G. Grahne and J. Zhu. Efficiently using prefix-trees in mining frequent itemsets. In Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, Melbourne, Florida, USA, 2003. CEUR-WS.org.
[12]
J. Han, J. Pei, Y. Yin, and R. Mao. Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Min. Knowl. Discov., 8(1):53--87, 2004.
[13]
N. Jiang and L. Gruenwald. Cfi-stream: mining closed frequent itemsets in data streams. In Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 592--597, Philadelphia, PA, USA, 2006. ACM.
[14]
J. Li, G. Dong, and K. Ramamohanarao. Making use of the most expressive jumping emerging patterns for classification. Knowl. Inf. Syst., 3(2):131---145, 2001.
[15]
J. Li, H. Li, L. Wong, J. Pei, and G. Dong. Minimum description length principle: Generators are preferable to closed patterns. In Proceedings of the Twenty-First National Conference on Artificial Intelligence, Boston, Massachusetts, USA, 2006. AAAI Press.
[16]
J. Li, G. Liu, and L. Wong. Mining statistically important equivalence classes and delta-discriminative emerging patterns. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 430--439, San Jose, California, USA, 2007.
[17]
W. Li, J. Han, and J. Pei. Cmar: Accurate and efficient classification based on multiple class-association rules. In Proceedings of the 2001 IEEE International Conference on Data Mining, pages 369--376, San Jose, California, USA, 2001. IEEE Computer Society.
[18]
B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proceedings of the Fourteen ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 80--86, 1998.
[19]
G. Liu, H. Lu, W. Lou, and J. X. Yu. On computing, storing and querying frequent patterns. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 607--612, Washington, DC, USA, 2003. ACM.
[20]
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proceedings of the 7th International Conference on Database Theory, pages 398--416, Jerusalem, Israel, 1999. Springer.
[21]
J. Pei, J. Han, and R. Mao. Closet: An efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pages 21--30, 2000.
[22]
J. R. Quinlan and R. M. Cameron-Jones. Foil: A midterm report. In Machine Learning: ECML-93, European Conference on Machine Learning, pages 3--20, Vienna, Austria, 1993. Springer.
[23]
A. Veloso, W. M. Jr., M. de Carvalho, B. Pôssas, S. Parthasarathy, and M. J. Zaki. Mining frequent itemsets in evolving databases. In Proceedings of the 2002 SIAM International Conference on Data Mining, Arlington, VA, USA, 2002. SIAM.
[24]
A. Veloso, W. M. Jr., and M. J. Zaki. Lazy associative classification. In Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), pages 645--654, Hong Kong, China, 2006. IEEE Computer Society.
[25]
J. Wang, J. Han, and J. Pei. Closet: searching for the best strategies for mining frequent closed itemsets. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 236--245, Washington, DC, USA, 2003. ACM.
[26]
J. Wang and G. Karypis. On mining instance-centric classification rules. IEEE Trans. Knowl. Data Eng., 18(11):1497--1511, 2006.
[27]
L. Xu and K. Xie. An incremental algorithm for mining generators representation. In PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 701--708, Porto, Portugal, 2005. Springer.
[28]
J. X. Yu, Z. Chong, H. Lu, and A. Zhou. False positive or false negative: Mining frequent itemsets from high speed transactional data streams. In Proceedings of the Thirtieth International Conference on Very Large Data Bases, pages 204--215, Toronto, Canada, 2004. Morgan Kaufmann.
[29]
M. J. Zaki and C.-J. Hsiao. Charm: An efficient algorithm for closed itemset mining. In Proceedings of the 2002 SIAM International Conference on Data Mining, Arlington, VA, USA, 2002. SIAM.

Cited By

View all
  • (2023)Frequent Subgraph Mining in Dynamic Databases2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386736(5733-5742)Online publication date: 15-Dec-2023
  • (2023)Mining frequent generators and closures in data streams with FGC-StreamKnowledge and Information Systems10.1007/s10115-023-01852-365:8(3295-3335)Online publication date: 3-Apr-2023
  • (2021)FGC-Stream: A novel joint miner for frequent generators and closed itemsets in data streams2021 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM51629.2021.00053(419-428)Online publication date: Dec-2021
  • Show More Cited By

Index Terms

  1. Efficient itemset generator discovery over a stream sliding window

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
    November 2009
    2162 pages
    ISBN:9781605585123
    DOI:10.1145/1645953
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 November 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. classification
    2. feature selection
    3. itemset generator
    4. sliding window
    5. stream data

    Qualifiers

    • Research-article

    Conference

    CIKM '09
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 18 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2023)Frequent Subgraph Mining in Dynamic Databases2023 IEEE International Conference on Big Data (BigData)10.1109/BigData59044.2023.10386736(5733-5742)Online publication date: 15-Dec-2023
    • (2023)Mining frequent generators and closures in data streams with FGC-StreamKnowledge and Information Systems10.1007/s10115-023-01852-365:8(3295-3335)Online publication date: 3-Apr-2023
    • (2021)FGC-Stream: A novel joint miner for frequent generators and closed itemsets in data streams2021 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM51629.2021.00053(419-428)Online publication date: Dec-2021
    • (2020)CICLAD: A Fast and Memory-efficient Closed Itemset Miner for StreamsProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining10.1145/3394486.3403232(1810-1818)Online publication date: 23-Aug-2020
    • (2018)Key roles of closed sets and minimal generators in concise representations of frequent patternsIntelligent Data Analysis10.5555/2595513.259551716:4(581-631)Online publication date: 27-Dec-2018
    • (2017)Incremental Frequent Subgraph Mining on Large Evolving GraphsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2017.274307529:12(2710-2723)Online publication date: 1-Dec-2017
    • (2015)Fast Rule-Based Prediction of Data Streams Using Associative Classification Mining2015 5th International Conference on IT Convergence and Security (ICITCS)10.1109/ICITCS.2015.7292983(1-5)Online publication date: Aug-2015
    • (2015)Reliable Early Classification on Multivariate Time Series with Numerical and Categorical AttributesAdvances in Knowledge Discovery and Data Mining10.1007/978-3-319-18038-0_16(199-211)Online publication date: 17-Apr-2015
    • (2013)Mining Frequent Rooted Ordered Tree Generators EfficientlyProceedings of the 2013 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery10.1109/CyberC.2013.29(132-139)Online publication date: 10-Oct-2013
    • (2011)Efficient Mining of Closed Sequential Patterns on Stream Sliding WindowProceedings of the 2011 IEEE 11th International Conference on Data Mining10.1109/ICDM.2011.61(1044-1049)Online publication date: 11-Dec-2011
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media