ABSTRACT
We present an approach to document clustering based on winnowing fingerprints that achieves good effectiveness with considerable savings in memory space and computation time.
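For readers unfamiliar with the term, the sketch below illustrates how winnowing fingerprints can be computed and compared. It follows the general winnowing scheme described by Schleimer, Wilkerson, and Aiken (hash every k-gram, then keep the minimum hash in each sliding window, preferring the rightmost on ties); it is not the implementation used in this paper. The function names, the CRC32 hash, the text normalization, the parameter defaults `k=5` and `w=4`, and the use of Jaccard similarity as a document-to-document measure are all assumptions made for illustration.

```python
import zlib


def kgram_hashes(text, k):
    """Hash every k-gram of the normalized text (assumed normalization:
    lowercase, alphanumeric characters only)."""
    norm = "".join(ch.lower() for ch in text if ch.isalnum())
    return [
        zlib.crc32(norm[i:i + k].encode("utf-8"))
        for i in range(len(norm) - k + 1)
    ]


def winnow(hashes, w):
    """Select (hash, position) pairs: the minimum of each window of w
    consecutive hashes, taking the rightmost occurrence on ties."""
    selected = set()
    if not hashes:
        return selected
    # If there are fewer than w hashes, treat the whole list as one window.
    window_count = max(len(hashes) - w + 1, 1)
    for start in range(window_count):
        window = hashes[start:start + w]
        smallest = min(window)
        # Index of the rightmost occurrence of the minimum in this window.
        offset = len(window) - 1 - window[::-1].index(smallest)
        selected.add((smallest, start + offset))
    return selected


def fingerprint(text, k=5, w=4):
    """Return the set of selected hash values (positions dropped)."""
    return {h for h, _ in winnow(kgram_hashes(text, k), w)}


def jaccard(a, b):
    """Jaccard similarity between two fingerprint sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = "Document clustering groups similar documents together."
    doc_b = "Clustering of documents groups similar texts together."
    print(jaccard(fingerprint(doc_a), fingerprint(doc_b)))
```

Because each document is reduced to a small set of selected hashes rather than its full term vector, a pairwise similarity such as the one above can be computed on much smaller representations, which is the kind of space and time saving the abstract refers to. How the resulting similarities are fed into a clustering algorithm is not specified here and is left open in this sketch.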
Index Terms
- Winnowing-based text clustering