| Efficient mining of association rules in text databases |
| Full text |
Pdf
(1.09 MB)
|
| Source
|
Conference on Information and Knowledge Management
archive
Proceedings of the eighth international conference on Information and knowledge management
table of contents
Kansas City, Missouri, United States
Pages: 234 - 242
Year of Publication: 1999
ISBN:1-58113-146-1
|
|
Authors
|
|
John D. Holt
|
Department of Computer Science and Engineering, Wright State University, Dayton, Ohio
|
|
Soon M. Chung
|
Department of Computer Science and Engineering, Wright State University, Dayton, Ohio
|
|
| Sponsors |
|
| Publisher |
|
| Bibliometrics |
Downloads (6 Weeks): 10, Downloads (12 Months): 73, Citation Count: 1
|
|
|
ABSTRACT
In this paper, we propose two new algorithms for mining association rules between words in text databases. The characteristics of text databases are quite different from those of retail transaction databases, and existing mining algorithms cannot handle text databases efficiently because of the large number of itemsets (i.e., words) that need to be counted. Two well-known mining algorithms, Apriori algorithm and Direct Hashing and Pruning (DHP) algorithm, are evaluated in the context of mining text databases, and are compared with the new proposed algorithms named Multipass-Apriori (M-Apriori) and Multipass-DHP (M-DHP). It has been shown that the proposed algorithms have better performance for large text databases.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
|
 |
2
|
Sergey Brin , Rajeev Motwani , Jeffrey D. Ullman , Shalom Tsur, Dynamic itemset counting and implication rules for market basket data, Proceedings of the 1997 ACM SIGMOD international conference on Management of data, p.255-264, May 11-15, 1997, Tucson, Arizona, United States
|
| |
3
|
|
| |
4
|
|
| |
5
|
|
| |
6
|
|
| |
7
|
|
| |
8
|
|
| |
9
|
E.M. Voorhee~ and D.K. Harmon (editors), The Fifth Text Retrieval Conference, National Institute of Standards and Technology, 1997.
|
| |
10
|
|
Peer to Peer - Readers of this Article have also read:
-
M4: a metamodel for data preprocessing
Proceedings of the 4th ACM international workshop on Data warehousing and OLAP
Anca Vaduva
, Jörg-Uwe Kietz
, Regina Zücker
-
Data structures for quadtree approximation and compression
Communications of the ACM
28, 9
Hanan Samet
-
A hierarchical single-key-lock access control using the Chinese remainder theorem
Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing
Kim S. Lee
, Huizhu Lu
, D. D. Fisher
-
The GemStone object database management system
Communications of the ACM
34, 10
Paul Butterworth
, Allen Otis
, Jacob Stein
-
Putting innovation to work: adoption strategies for multimedia communication systems
Communications of the ACM
34, 12
Ellen Francik
, Susan Ehrlich Rudman
, Donna Cooper
, Stephen Levine
|