skip to main content
10.1145/1454008.1454027acmconferencesArticle/Chapter ViewAbstractPublication PagesrecsysConference Proceedingsconference-collections
research-article

Pfp: parallel fp-growth for query recommendation

Published: 23 October 2008 Publication History

Abstract

Frequent itemset mining (FIM) is a useful tool for discovering frequently co-occurrent items. Since its inception, a number of significant FIM algorithms have been developed to speed up mining performance. Unfortunately, when the dataset size is huge, both the memory use and computational cost can still be prohibitively expensive. In this work, we propose to parallelize the FP-Growth algorithm (we call our parallel algorithm PFP) on distributed machines. PFP partitions computation in such a way that each machine executes an independent group of mining tasks. Such partitioning eliminates computational dependencies between machines, and thereby communication between them. Through empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, we demonstrate that PFP can achieve virtually linear speedup. Besides scalability, the empirical study demonstrates that PFP to be promising for supporting query recommendation for search engines.

References

[1]
Lamine M. Aouad, Nhien-An Le-Khac, and Tahar M. Kechadi. Distributed frequent itemsets mining in heterogeneous platforms. Engineering, Computing and Archtecture, 1, 2007.
[2]
A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509--512, 1999.
[3]
Gregory Buehrer, Srinivasan Parthasarathy, Shirish Tatikonda, Tahsin Kurc, and Joel Saltz. Toward terabyte pattern mining: An architecture-conscious solution. In PPOPP, 2007.
[4]
Jeffrey Dean and Sanjay Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137--150, 2004.
[5]
Mohammad El-Hajj and Osmar R. Zaiane. Parallel leap: Large-scale maximal pattern mining in a distributed environment. In ICPADS, 2006.
[6]
Jiawei Han, Jian Pei, and Yiwen Yin. Mining frequent patterns without candidate generation. In SIGMOD, 2000.
[7]
Li Liu, Eric Li, Yimin Zhang, and Zhizhong Tang. Optimization of frequent itemset mining on multiple-core processor. In VLDB, 2007.
[8]
Iko Pramudiono and Masaru Kitsuregawa. Parallel FP-Growth on PC cluster. In PAKDD, 2003.
[9]
Agrawal Rakesh and Ramakrishnan Srikant. Fast algorithms for mining association rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB, 1994.
[10]
Osmar R. Zaïane, Mohammad El-Hajj, and Paul Lu. Fast parallel association rule mining without candidacy generation. In ICDM, 2001.

Cited By

View all
  • (2025)Crowd Distance Induced Multi Objective Binary Salp Swarm Optimization Algorithm for Mining High Frequency and Utility ItemsetsSN Computer Science10.1007/s42979-025-03725-86:2Online publication date: 14-Feb-2025
  • (2024)DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset MiningMathematics10.3390/math1224393012:24(3930)Online publication date: 13-Dec-2024
  • (2024)Temporal Association Rule Mining: Race-Based Patterns of Treatment-Adverse Events in Breast Cancer Patients Using SEER–Medicare DatasetBiomedicines10.3390/biomedicines1206121312:6(1213)Online publication date: 29-May-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
RecSys '08: Proceedings of the 2008 ACM conference on Recommender systems
October 2008
348 pages
ISBN:9781605580937
DOI:10.1145/1454008
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 23 October 2008

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. data mining
  2. frequent itemset mining
  3. parallel fp-growth

Qualifiers

  • Research-article

Conference

RecSys08: ACM Conference on Recommender Systems
October 23 - 25, 2008
Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 254 of 1,295 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)138
  • Downloads (Last 6 weeks)12
Reflects downloads up to 17 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Crowd Distance Induced Multi Objective Binary Salp Swarm Optimization Algorithm for Mining High Frequency and Utility ItemsetsSN Computer Science10.1007/s42979-025-03725-86:2Online publication date: 14-Feb-2025
  • (2024)DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset MiningMathematics10.3390/math1224393012:24(3930)Online publication date: 13-Dec-2024
  • (2024)Temporal Association Rule Mining: Race-Based Patterns of Treatment-Adverse Events in Breast Cancer Patients Using SEER–Medicare DatasetBiomedicines10.3390/biomedicines1206121312:6(1213)Online publication date: 29-May-2024
  • (2024)Market-aware Long-term Job Skill Recommendation with Explainable Deep Reinforcement LearningACM Transactions on Information Systems10.1145/3704998Online publication date: 21-Nov-2024
  • (2024)Operating Optimization of Steam Turbine Unit Based on Big Data Parallel Association Rule MiningIEEE Transactions on Automation Science and Engineering10.1109/TASE.2023.329384321:3(4226-4236)Online publication date: Jul-2024
  • (2024)An Improved FP-Growth Algorithm with Time Decay Factor and Element Attention Weight2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA)10.1109/ICPECA60615.2024.10470947(860-865)Online publication date: 26-Jan-2024
  • (2024)Vedalytics: Concepts as Knowledge from Rules of the Ancient Indian Scriptures2024 IEEE International Conference on Big Data (BigData)10.1109/BigData62323.2024.10825723(6967-6976)Online publication date: 15-Dec-2024
  • (2024)An exploration of descriptive machine learning approaches for antimicrobial resistance: Multidrug resistance patterns in Salmonella entericaPreventive Veterinary Medicine10.1016/j.prevetmed.2024.106261230(106261)Online publication date: Sep-2024
  • (2024)A scalable and flexible basket analysis system for big transaction data in SparkInformation Processing & Management10.1016/j.ipm.2023.10357761:2(103577)Online publication date: Mar-2024
  • (2024)Colossal Trajectory MiningExpert Systems with Applications: An International Journal10.1016/j.eswa.2023.122055238:PDOnline publication date: 15-Mar-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media