ABSTRACT
Frequent pattern mining is an important problem in data mining with many practical applications. Current parallel methods for mining frequent patterns unstably perform for different database types and under-utilize the benefits of multi-core shared memory machines. We present ShaFEM, a novel parallel frequent pattern mining method, to address these issues. Our method can dynamically adapt to the data characteristics to efficiently perform on both sparse and dense databases. Its parallel mining lock free approach minimizes the synchronization needs and maximizes the data independence to enhance the scalability. Its structure lends itself well for dynamic job scheduling resulting in well-balanced load on new multi-core shared memory architectures. We evaluate ShaFEM on a 12-core multi-socket server and find that our method runs 2.1--5.8 times faster than the state-of-the-art parallel method. For some test cases, we have shown that ShaFEM saves 4.9 days and 12.8 hours of execution time over the compared method.
- Agrawal, R., Srikant, R. 1994. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th Int. Conf. on Very Large Databases (1994). 487--499. Google ScholarDigital Library
- Han, J., Cheng, H., Xin, D., Yan, X. 2007. Frequent Pattern Mining: Current Status and Future Directions. In Journal of Data Mining and Knowledge Discovery (Aug. 2007). Google ScholarDigital Library
- Burdick, D., Calimlim, M., Flannick, J., Gehrke, J., Yiu, T. 2005. MAFIA: A Maximal Frequent Itemset Algorithm. In IEEE Transaction on Knowledge and Data Engineering (Nov. 2005). vol. 17, no. 11, 1490--1504. Google ScholarDigital Library
- Li, W., Mozes, A. 2004. Computing Frequent Itemsets Inside Oracle 10g. In Proc. of the 30th Int. Conf. on Very Large Databases (2004). 1253--1256. Google ScholarDigital Library
- Utley, C. 2005. Introduction to SQL Server 2005 Data Mining. Microsoft SQL Server 9.0 technical articles (Jun. 2005). Available at: http://technet.microsoft.com/en-us/library/ms345131.aspx.Google Scholar
- Yoshizawa, T., Pramudiono, I., Kitsuregawa, M. 2000. SQL Based Association Rule Mining Using Commercial RDBMS (IBM db2 UBD EEE). In Proc. of the 2nd Int. Conf. on Data Warehousing and Knowledge Discovery (2000). 301--306. Google ScholarDigital Library
- Han, J., Pei, J., Yin Y. 2000. Mining Frequent Patterns without Candidate Generation. In Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data (Jun. 2000), vol. 29, issue 2, 1--12. Google ScholarDigital Library
- Borgelt C. 2005. An Implementation of the FP-growth Algorithm. In Proc. of the 1st Workshop on OSDM: Frequent Pattern Mining Implementations (Aug. 2005). 1--5. Google ScholarDigital Library
- Grahne, G., Zhu, J. 2003. Efficiently Using Prefix-trees in Mining Frequent Itemsets. In Proc. of the 2003 Workshop on Frequent Pattern Mining Implementations (2003). 123--132.Google Scholar
- Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D. 2001. Hmine: Hyper-structure Mining of Frequent Patterns in Large Databases. In Proc. of the IEEE Int. Conf. on Data Mining (Nov. 2001). 441--448. Google ScholarDigital Library
- Racz, B. 2004. Nonordfp: An FP-growth Variation Without Rebuilding the FP-tree. In Proc. of the IEEE Workshop on Frequent Itemset Mining Implementations (FIMI) (Nov. 2004).Google Scholar
- Shporer, S. 2004. AIM2: Improved Implementation of AIM. In Proc. of the IEEE Workshop on FIMI (Nov. 2004).Google Scholar
- Schmidt-Thieme, L. 2004. Algorithmic Features of Eclat. In Proc. of the IEEE Workshop on FIMI (Nov. 2004).Google Scholar
- Agrawal, R., Imielinski T., Swami, A. 1993. Mining Association Rules between Sets of Items in Large Databases. In Proc of the 1993 ACM SIGMOD Int. Conf on Management of Data (Jun. 1993). vol. 22, issue 2, 207--216. Google ScholarDigital Library
- Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E. 2008. PFP: Parallel FP-Growth for Query Recommendation. In Proc. of the 2008 ACM Conf. on Recommender systems (2008). 107--114. Google ScholarDigital Library
- Zaki, M. J. 1999. Parallel and Distributed Association Mining: A Survey. In IEEE Concurrency Journal (Oct-Dec 1999). vol. 7, issue 4, 14--45. Google ScholarDigital Library
- Garg, R., Mishra, P. K. 2009. Some Observations of Sequential, Parallel and Distributed Association Rule Mining Algorithms. In the IEEE Proc of the 2009 Int. Conf. on Computer and Automation Engineering (March 2009). Google ScholarDigital Library
- Moonesinghe, H. D. K., Chung, M. J., Tan P. N. Fast Parallel Mining of Frequent Itemsets, Michigan State University.Google Scholar
- Tanbeer, S. K., Ahmed, C. F., Jeong, B. S. 2009. Parallel and Distributed Frequent Pattern Mining in Large Databases. In Proc. of the 11th IEEE Int. Conf. on High Performance Computing and Communications (2009). 407--414. Google ScholarDigital Library
- Li, J., Liu, Y., Liao, W., Choudhary, A. 2006. Parallel Data Mining Algorithms for Association Rules and Clustering. In CRC Press (2006), 3--5.Google Scholar
- Zaki, M., Parthasarathy, S., Ogihara, M., Li, W. 1997. New algorithms for fast discovery of association rules. In Proc. of Knowledge Discovery and Data Mining (1997). 283--286.Google Scholar
- Liu, L., Li, E., Zhang, Y., Tang, Z. 2007. Optimization of Frequent Itemset Mining on Multiple-core Processor. In Proc. of the 33rd Int. Conf. on Very Large Databases (2007). 1275--1285. Google ScholarDigital Library
- Chen, D., Lai, C., Hu W., Chen, W., Zhang, Y., Zheng, W. 2006. Tree partition based parallel frequent pattern mining on shared memory systems. In Proc. of the 20th Int. Conf. on Parallel and distributed processing (2006), 313--320. Google ScholarDigital Library
- Zaiane, O. R., El-Hajj, M., Lu, P. 2001. Fast parallel association rule mining without candidacy generation. In Proc. of the IEEE 2001 Int. Conf. on Data Mining (ICDM, 2001), 665--668. Google ScholarDigital Library
- Bienia, C. 2008. The PARSEC Benchmark Suite: Characterization and Architectural Implications (PARSEC - freqmine). Princeton University Technical Report TR-811-08 (Jan. 2008). Available at http://parsec.cs.princeton.edu.Google Scholar
- Frequent Itemset Mining Implementations Repository, Available at http://fimi.ua.ac.beGoogle Scholar
- Vu, L., Alaghband, G. 2012. Mining Frequent Patterns Based on Data Characteristics. In Proc. of the 2012 Int. Conf. on Information and Knowledge Engineering (Jul. 2012). 369--375.Google Scholar
- Uno, T., Kiyomi, M., Arimura, H. 2004. LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In Proc. of ICDM Workshop on FIMI, (2004).Google Scholar
Index Terms
- Novel parallel method for mining frequent patterns on multi-core shared memory systems
Recommendations
Novel parallel method for association rule mining on multi-core shared memory systems
ShaFEM: a novel association rule mining method for multi-core shared memory systems.ShaFEM self-adapts to data characteristic to run fast on sparse and dense databases.ShaFEM uses two mining strategies and dynamically switching between them.ShaFEM ...
Mining of Frequent Itemsets from Streams of Uncertain Data
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data EngineeringFrequent itemset mining plays an essential role in the mining of various patterns and is in demand in many real-life applications. Hence, mining of frequent itemsets has been the subject of numerous studies since its introduction. Generally, most of ...
Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures
Frequent pattern mining has been playing an essential role in knowledge discovery and data mining tasks that try to find usable patterns from databases. Efficiency is especially crucial for an algorithm in order to find frequent itemsets from a large ...
Comments