research-article

Novel parallel method for mining frequent patterns on multi-core shared memory systems

Authors:
Lan Vu

University of Colorado Denver, Lawrence St. Denver, CO

University of Colorado Denver, Lawrence St. Denver, CO
View Profile

,
Gita Alaghband

University of Colorado Denver, Lawrence St. Denver, CO

University of Colorado Denver, Lawrence St. Denver, CO
View Profile

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing SystemsNovember 2013Pages 49–54https://doi.org/10.1145/2534645.2534653

Published:18 November 2013Publication History

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

Pages 49–54

ABSTRACT

Frequent pattern mining is an important problem in data mining with many practical applications. Current parallel methods for mining frequent patterns unstably perform for different database types and under-utilize the benefits of multi-core shared memory machines. We present ShaFEM, a novel parallel frequent pattern mining method, to address these issues. Our method can dynamically adapt to the data characteristics to efficiently perform on both sparse and dense databases. Its parallel mining lock free approach minimizes the synchronization needs and maximizes the data independence to enhance the scalability. Its structure lends itself well for dynamic job scheduling resulting in well-balanced load on new multi-core shared memory architectures. We evaluate ShaFEM on a 12-core multi-socket server and find that our method runs 2.1--5.8 times faster than the state-of-the-art parallel method. For some test cases, we have shown that ShaFEM saves 4.9 days and 12.8 hours of execution time over the compared method.

References

Agrawal, R., Srikant, R. 1994. Fast Algorithms for Mining Association Rules. In Proceedings of the 20th Int. Conf. on Very Large Databases (1994). 487--499. Google ScholarDigital Library
Han, J., Cheng, H., Xin, D., Yan, X. 2007. Frequent Pattern Mining: Current Status and Future Directions. In Journal of Data Mining and Knowledge Discovery (Aug. 2007). Google ScholarDigital Library
Burdick, D., Calimlim, M., Flannick, J., Gehrke, J., Yiu, T. 2005. MAFIA: A Maximal Frequent Itemset Algorithm. In IEEE Transaction on Knowledge and Data Engineering (Nov. 2005). vol. 17, no. 11, 1490--1504. Google ScholarDigital Library
Li, W., Mozes, A. 2004. Computing Frequent Itemsets Inside Oracle 10g. In Proc. of the 30th Int. Conf. on Very Large Databases (2004). 1253--1256. Google ScholarDigital Library
Utley, C. 2005. Introduction to SQL Server 2005 Data Mining. Microsoft SQL Server 9.0 technical articles (Jun. 2005). Available at: http://technet.microsoft.com/en-us/library/ms345131.aspx.Google Scholar
Yoshizawa, T., Pramudiono, I., Kitsuregawa, M. 2000. SQL Based Association Rule Mining Using Commercial RDBMS (IBM db2 UBD EEE). In Proc. of the 2^nd Int. Conf. on Data Warehousing and Knowledge Discovery (2000). 301--306. Google ScholarDigital Library
Han, J., Pei, J., Yin Y. 2000. Mining Frequent Patterns without Candidate Generation. In Proc. of the 2000 ACM SIGMOD Int. Conf. on Management of Data (Jun. 2000), vol. 29, issue 2, 1--12. Google ScholarDigital Library
Borgelt C. 2005. An Implementation of the FP-growth Algorithm. In Proc. of the 1^st Workshop on OSDM: Frequent Pattern Mining Implementations (Aug. 2005). 1--5. Google ScholarDigital Library
Grahne, G., Zhu, J. 2003. Efficiently Using Prefix-trees in Mining Frequent Itemsets. In Proc. of the 2003 Workshop on Frequent Pattern Mining Implementations (2003). 123--132.Google Scholar
Pei, J., Han, J., Lu, H., Nishio, S., Tang, S., Yang, D. 2001. Hmine: Hyper-structure Mining of Frequent Patterns in Large Databases. In Proc. of the IEEE Int. Conf. on Data Mining (Nov. 2001). 441--448. Google ScholarDigital Library
Racz, B. 2004. Nonordfp: An FP-growth Variation Without Rebuilding the FP-tree. In Proc. of the IEEE Workshop on Frequent Itemset Mining Implementations (FIMI) (Nov. 2004).Google Scholar
Shporer, S. 2004. AIM2: Improved Implementation of AIM. In Proc. of the IEEE Workshop on FIMI (Nov. 2004).Google Scholar
Schmidt-Thieme, L. 2004. Algorithmic Features of Eclat. In Proc. of the IEEE Workshop on FIMI (Nov. 2004).Google Scholar
Agrawal, R., Imielinski T., Swami, A. 1993. Mining Association Rules between Sets of Items in Large Databases. In Proc of the 1993 ACM SIGMOD Int. Conf on Management of Data (Jun. 1993). vol. 22, issue 2, 207--216. Google ScholarDigital Library
Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E. 2008. PFP: Parallel FP-Growth for Query Recommendation. In Proc. of the 2008 ACM Conf. on Recommender systems (2008). 107--114. Google ScholarDigital Library
Zaki, M. J. 1999. Parallel and Distributed Association Mining: A Survey. In IEEE Concurrency Journal (Oct-Dec 1999). vol. 7, issue 4, 14--45. Google ScholarDigital Library
Garg, R., Mishra, P. K. 2009. Some Observations of Sequential, Parallel and Distributed Association Rule Mining Algorithms. In the IEEE Proc of the 2009 Int. Conf. on Computer and Automation Engineering (March 2009). Google ScholarDigital Library
Moonesinghe, H. D. K., Chung, M. J., Tan P. N. Fast Parallel Mining of Frequent Itemsets, Michigan State University.Google Scholar
Tanbeer, S. K., Ahmed, C. F., Jeong, B. S. 2009. Parallel and Distributed Frequent Pattern Mining in Large Databases. In Proc. of the 11th IEEE Int. Conf. on High Performance Computing and Communications (2009). 407--414. Google ScholarDigital Library
Li, J., Liu, Y., Liao, W., Choudhary, A. 2006. Parallel Data Mining Algorithms for Association Rules and Clustering. In CRC Press (2006), 3--5.Google Scholar
Zaki, M., Parthasarathy, S., Ogihara, M., Li, W. 1997. New algorithms for fast discovery of association rules. In Proc. of Knowledge Discovery and Data Mining (1997). 283--286.Google Scholar
Liu, L., Li, E., Zhang, Y., Tang, Z. 2007. Optimization of Frequent Itemset Mining on Multiple-core Processor. In Proc. of the 33rd Int. Conf. on Very Large Databases (2007). 1275--1285. Google ScholarDigital Library
Chen, D., Lai, C., Hu W., Chen, W., Zhang, Y., Zheng, W. 2006. Tree partition based parallel frequent pattern mining on shared memory systems. In Proc. of the 20th Int. Conf. on Parallel and distributed processing (2006), 313--320. Google ScholarDigital Library
Zaiane, O. R., El-Hajj, M., Lu, P. 2001. Fast parallel association rule mining without candidacy generation. In Proc. of the IEEE 2001 Int. Conf. on Data Mining (ICDM, 2001), 665--668. Google ScholarDigital Library
Bienia, C. 2008. The PARSEC Benchmark Suite: Characterization and Architectural Implications (PARSEC - freqmine). Princeton University Technical Report TR-811-08 (Jan. 2008). Available at http://parsec.cs.princeton.edu.Google Scholar
Frequent Itemset Mining Implementations Repository, Available at http://fimi.ua.ac.beGoogle Scholar
Vu, L., Alaghband, G. 2012. Mining Frequent Patterns Based on Data Characteristics. In Proc. of the 2012 Int. Conf. on Information and Knowledge Engineering (Jul. 2012). 369--375.Google Scholar
Uno, T., Kiyomi, M., Arimura, H. 2004. LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets. In Proc. of ICDM Workshop on FIMI, (2004).Google Scholar

Index Terms

Recommendations

Novel parallel method for association rule mining on multi-core shared memory systems

ShaFEM: a novel association rule mining method for multi-core shared memory systems.ShaFEM self-adapts to data characteristic to run fast on sparse and dense databases.ShaFEM uses two mining strategies and dynamically switching between them.ShaFEM ...
Read More
Mining of Frequent Itemsets from Streams of Uncertain Data
ICDE '09: Proceedings of the 2009 IEEE International Conference on Data Engineering

Frequent itemset mining plays an essential role in the mining of various patterns and is in demand in many real-life applications. Hence, mining of frequent itemsets has been the subject of numerous studies since its introduction. Generally, most of ...
Read More
Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

Frequent pattern mining has been playing an essential role in knowledge discovery and data mining tasks that try to find usable patterns from databases. Efficiency is especially crucial for an algorithm in order to find frequent itemsets from a large ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
November 2013
66 pages
ISBN:9781450325066
DOI:10.1145/2534645
General Chair:
Xian-He Sun
Illinois Institute of Technology
,
Program Chairs:
Yong Chen
Texas Tech University
,
Philip C. Roth
Oak Ridge National Laboratory
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 18 November 2013
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
association rule mining
databases
frequent pattern mining
multi-core
parallel algorithm
shared memory
Qualifiers
- research-article
Conference

Acceptance Rates
DISCS-2013 Paper Acceptance Rate10of19submissions,53%Overall Acceptance Rate19of34submissions,56%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 10
  Total Citations
  View Citations
- 203
  Total Downloads
- Downloads (Last 12 months)4
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Novel parallel method for mining frequent patterns on multi-core shared memory systems

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Novel parallel method for association rule mining on multi-core shared memory systems

Mining of Frequent Itemsets from Streams of Uncertain Data

Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Novel parallel method for mining frequent patterns on multi-core shared memory systems

DISCS-2013: Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems

ABSTRACT

References

Cited By

Index Terms

Recommendations

Novel parallel method for association rule mining on multi-core shared memory systems

Mining of Frequent Itemsets from Streams of Uncertain Data

Apriori-based High Efficiency Load Balancing Parallel Data Mining Algorithms on Multi-core Architectures

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Acceptance Rates

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media