Article

ReCoM: reinforcement clustering of multi-type interrelated data objects

Authors:
Jidong Wang, Microsoft Research Asia, Beijing, P.R. China
Huajun Zeng, Microsoft Research Asia, Beijing, P.R. China
Zheng Chen, Microsoft Research Asia, Beijing, P.R. China
Hongjun Lu, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong
Li Tao, Microsoft Research Asia, Beijing, P.R. China
Wei-Ying Ma, Microsoft Research Asia, Beijing, P.R. China

SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, July 2003, Pages 274–281, https://doi.org/10.1145/860435.860486

Published: 28 July 2003

ABSTRACT

Most existing clustering algorithms cluster highly related data objects, such as Web pages and Web users, separately. The interrelation among different types of data objects is either ignored or represented by a static feature space and treated in the same way as the objects' other attributes. In this paper, we propose ReCoM (Reinforcement Clustering of Multi-type Interrelated data objects), a novel approach for clustering multi-type interrelated data objects. Under this approach, relationships among data objects are used to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. At the same time, the link structure derived from these relationships is used to differentiate the importance of objects, and the learned importance is used in the clustering process to further improve the results. Experimental results show that the proposed approach not only effectively overcomes the data sparseness caused by the high-dimensional relationship space but also significantly improves clustering accuracy.
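The iterative reinforcement idea can be illustrated with a small sketch. This is not the authors' algorithm: it assumes just two object types (say, users and pages) connected by a link matrix `rel`, uses plain k-means as a stand-in clustering step, and leaves out the link-based importance weighting entirely. The names `kmeans`, `aggregate`, and `reinforce` are hypothetical and chosen only for this illustration. Each round re-clusters one type using its links summed per cluster of the other type, which replaces the sparse, high-dimensional link vector with a short, dense one.

```python
import random

def nearest(v, centers):
    """Index of the center closest to v (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))

def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns one cluster label per vector."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [nearest(v, centers) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def aggregate(rel, col_labels, k):
    """Sum each row's links per column cluster, giving a k-dimensional vector."""
    out = []
    for row in rel:
        feats = [0.0] * k
        for value, label in zip(row, col_labels):
            feats[label] += value
        out.append(feats)
    return out

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def reinforce(rel, k_rows, k_cols, rounds=5):
    """Alternately cluster row objects and column objects of a link matrix."""
    # Assumption: start from a clustering of the columns on their raw link vectors.
    col_labels = kmeans(transpose(rel), k_cols)
    row_labels = []
    for _ in range(rounds):
        # Cluster rows on links aggregated over the current column clusters.
        row_labels = kmeans(aggregate(rel, col_labels, k_cols), k_rows)
        # Cluster columns on links aggregated over the updated row clusters.
        col_labels = kmeans(aggregate(transpose(rel), row_labels, k_rows), k_cols)
    return row_labels, col_labels

# Toy link matrix (rows: 6 users, columns: 4 pages) with two obvious groups.
rel = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
user_clusters, page_clusters = reinforce(rel, k_rows=2, k_cols=2)
```

On this toy matrix the first three users end up in one cluster and the last three in the other, and the pages split the same way. A fixed number of rounds is used here for simplicity; stopping once the labels no longer change would be a natural alternative.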

Index Terms

Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis


Published in
SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
July 2003
490 pages
ISBN: 1581136463
DOI: 10.1145/860435
General Chairs:
Charles Clarke, University of Waterloo, Canada
Gordon Cormack, University of Waterloo, Canada
Program Chairs:
Jamie Callan, Carnegie Mellon University, Pittsburgh, PA
David Hawking, Australian National University, Australia
Alan Smeaton, Dublin City University, Ireland
Copyright © 2003 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
interrelated
multi-type
reinforcement
Qualifiers
- Article
Conference

Acceptance Rates
SIGIR '03 Paper Acceptance Rate: 46 of 266 submissions, 17%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%
Article Metrics
- Total Citations: 75
- Total Downloads: 1,381
- Downloads (Last 12 months): 16
- Downloads (Last 6 weeks): 1