skip to main content
10.1145/2534645.2534653acmconferencesArticle/Chapter ViewAbstractPublication PagesscConference Proceedingsconference-collections
research-article

Novel parallel method for mining frequent patterns on multi-core shared memory systems

Authors Info & Claims
Published:18 November 2013Publication History

ABSTRACT

Frequent pattern mining is an important problem in data mining with many practical applications. Current parallel methods for mining frequent patterns unstably perform for different database types and under-utilize the benefits of multi-core shared memory machines. We present ShaFEM, a novel parallel frequent pattern mining method, to address these issues. Our method can dynamically adapt to the data characteristics to efficiently perform on both sparse and dense databases. Its parallel mining lock free approach minimizes the synchronization needs and maximizes the data independence to enhance the scalability. Its structure lends itself well for dynamic job scheduling resulting in well-balanced load on new multi-core shared memory architectures. We evaluate ShaFEM on a 12-core multi-socket server and find that our method runs 2.1--5.8 times faster than the state-of-the-art parallel method. For some test cases, we have shown that ShaFEM saves 4.9 days and 12.8 hours of execution time over the compared method.

References

  1. Agrawal, R., Srikant, R. 1994. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th Int. Conf. on Very Large Databases (1994). 487--499. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Han, J., Cheng, H., Xin, D., Yan, X. 2007. Frequent Pattern Mining: Current Status and Future Directions. In Journal of Data Mining and Knowledge Discovery (Aug. 2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Burdick, D., Calimlim, M., Flannick, J., Gehrke, J., Yiu, T. 2005. MAFIA: A Maximal Frequent Itemset Algorithm. In IEEE Transaction on Knowledge and Data Engineering (Nov. 2005). vol. 17, no. 11, 1490--1504. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Li, W., Mozes, A. 2004. Computing Frequent Itemsets Inside Oracle 10g. In Proc. of the 30th Int. Conf. on Very Large Databases (2004). 1253--1256. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Utley, C. 2005. Introduction to SQL Server 2005 Data Mining. Microsoft SQL Server 9.0 technical articles (Jun. 2005). Available at: http://technet.microsoft.com/en-us/library/ms345131.aspx.Google ScholarGoogle Scholar
  6. Yoshizawa, T., Pramudiono, I., Kitsuregawa, M. 2000. SQL Based Association Rule Mining Using Commercial RDBMS (IBM db2 UBD EEE). In Proc. of the 2nd Int. Conf. on Data Warehousing and Knowledge Discovery (2000). 301--306. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Han, J., Pei, J., Yin Y. 2000. Mining Frequent Patterns without Candidate Generation. In Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data (Jun. 2000), vol. 29, issue 2, 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Borgelt C. 2005. An Implementation of the FP-growth Algorithm. In Proc. of the 1st Workshop on OSDM: Frequent Pattern Mining Implementations (Aug. 2005). 1--5. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Grahne, G., Zhu, J. 2003. Efficiently Using Prefix-trees in Mining Frequent Itemsets. In Proc. of the 2003 Workshop on Frequent Pattern Mining Implementations (2003). 123--132.Google ScholarGoogle Scholar
  10. Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D. 2001. Hmine: Hyper-structure Mining of Frequent Patterns in Large Databases. In Proc. of the IEEE Int. Conf. on Data Mining (Nov. 2001). 441--448. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Racz, B. 2004. Nonordfp: An FP-growth Variation Without Rebuilding the FP-tree. In Proc. of the IEEE Workshop on Frequent Itemset Mining Implementations (FIMI) (Nov. 2004).Google ScholarGoogle Scholar
  12. Shporer, S. 2004. AIM2: Improved Implementation of AIM. In Proc. of the IEEE Workshop on FIMI (Nov. 2004).Google ScholarGoogle Scholar
  13. Schmidt-Thieme, L. 2004. Algorithmic Features of Eclat. In Proc. of the IEEE Workshop on FIMI (Nov. 2004).Google ScholarGoogle Scholar
  14. Agrawal, R., Imielinski T., Swami, A. 1993. Mining Association Rules between Sets of Items in Large Databases. In Proc of the 1993 ACM SIGMOD Int. Conf on Management of Data (Jun. 1993). vol. 22, issue 2, 207--216. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E. 2008. PFP: Parallel FP-Growth for Query Recommendation. In Proc. of the 2008 ACM Conf. on Recommender systems (2008). 107--114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Zaki, M. J. 1999. Parallel and Distributed Association Mining: A Survey. In IEEE Concurrency Journal (Oct-Dec 1999). vol. 7, issue 4, 14--45. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Garg, R., Mishra, P. K. 2009. Some Observations of Sequential, Parallel and Distributed Association Rule Mining Algorithms. In the IEEE Proc of the 2009 Int. Conf. on Computer and Automation Engineering (March 2009). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Moonesinghe, H. D. K., Chung, M. J., Tan P. N. Fast Parallel Mining of Frequent Itemsets, Michigan State University.Google ScholarGoogle Scholar
  19. Tanbeer, S. K., Ahmed, C. F., Jeong, B. S. 2009. Parallel and Distributed Frequent Pattern Mining in Large Databases. In Proc. of the 11th IEEE Int. Conf. on High Performance Computing and Communications (2009). 407--414. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Li, J., Liu, Y., Liao, W., Choudhary, A. 2006. Parallel Data Mining Algorithms for Association Rules and Clustering. In CRC Press (2006), 3--5.Google ScholarGoogle Scholar
  21. Zaki, M., Parthasarathy, S., Ogihara, M., Li, W. 1997. New algorithms for fast discovery of association rules. In Proc. of Knowledge Discovery and Data Mining (1997). 283--286.Google ScholarGoogle Scholar
  22. Liu, L., Li, E., Zhang, Y., Tang, Z. 2007. Optimization of Frequent Itemset Mining on Multiple-core Processor. In Proc. of the 33rd Int. Conf. on Very Large Databases (2007). 1275--1285. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Chen, D., Lai, C., Hu W., Chen, W., Zhang, Y., Zheng, W. 2006. Tree partition based parallel frequent pattern mining on shared memory systems. In Proc. of the 20th Int. Conf. on Parallel and distributed processing (2006), 313--320. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Zaiane, O. R., El-Hajj, M., Lu, P. 2001. Fast parallel association rule mining without candidacy generation. In Proc. of the IEEE 2001 Int. Conf. on Data Mining (ICDM, 2001), 665--668. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Bienia, C. 2008. The PARSEC Benchmark Suite: Characterization and Architectural Implications (PARSEC - freqmine). Princeton University Technical Report TR-811-08 (Jan. 2008). Available at http://parsec.cs.princeton.edu.Google ScholarGoogle Scholar
  26. Frequent Itemset Mining Implementations Repository, Available at http://fimi.ua.ac.beGoogle ScholarGoogle Scholar
  27. Vu, L., Alaghband, G. 2012. Mining Frequent Patterns Based on Data Characteristics. In Proc. of the 2012 Int. Conf. on Information and Knowledge Engineering (Jul. 2012). 369--375.Google ScholarGoogle Scholar
  28. Uno, T., Kiyomi, M., Arimura, H. 2004. LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In Proc. of ICDM Workshop on FIMI, (2004).Google ScholarGoogle Scholar

Index Terms

  1. Novel parallel method for mining frequent patterns on multi-core shared memory systems

                Recommendations

                Comments

                Login options

                Check if you have access through your login credentials or your institution to get full access on this article.

                Sign in
                • Published in

                  cover image ACM Conferences
                  DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
                  November 2013
                  66 pages
                  ISBN:9781450325066
                  DOI:10.1145/2534645

                  Copyright © 2013 ACM

                  Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                  Publisher

                  Association for Computing Machinery

                  New York, NY, United States

                  Publication History

                  • Published: 18 November 2013

                  Permissions

                  Request permissions about this article.

                  Request Permissions

                  Check for updates

                  Qualifiers

                  • research-article

                  Acceptance Rates

                  DISCS-2013 Paper Acceptance Rate10of19submissions,53%Overall Acceptance Rate19of34submissions,56%

                PDF Format

                View or Download as a PDF file.

                PDF

                eReader

                View online with eReader.

                eReader