DOI: 10.1145/860435.860486
Article

ReCoM: reinforcement clustering of multi-type interrelated data objects

Published: 28 July 2003

ABSTRACT

Most existing clustering algorithms cluster highly related data objects, such as Web pages and Web users, separately. The interrelation among different types of data objects is either not considered, or is represented by a static feature space and treated in the same way as other attributes of the objects. In this paper, we propose a novel approach for clustering multi-type interrelated data objects, ReCoM (Reinforcement Clustering of Multi-type Interrelated data objects). Under this approach, relationships among data objects are used to improve the cluster quality of interrelated data objects through an iterative reinforcement clustering process. At the same time, the link structure derived from the relationships among the interrelated data objects is used to differentiate the importance of objects, and the learned importance is also used in the clustering process to further improve the clustering results. Experimental results show that the proposed approach not only effectively overcomes the problem of data sparseness caused by the high-dimensional relationship space but also significantly improves the clustering accuracy.
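The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of iterative reinforcement clustering between two object types, not the authors' ReCoM implementation. It assumes a user-by-page relationship matrix, uses plain k-means as the base clusterer, and omits the link-based importance weighting entirely. The names `kmeans`, `project`, and `reinforce_cluster`, and the toy data, are hypothetical. Each round re-describes objects of one type by their links to the current clusters of the other type, which replaces a sparse high-dimensional link vector with a short dense one.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def kmeans(points: Matrix, k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-point seeding (an assumption)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center with the point farthest from all current centers.
        far = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def project(rel: Matrix, other_labels: List[int], k: int) -> Matrix:
    """Collapse per-object link weights into per-cluster link weights."""
    out = []
    for row in rel:
        vec = [0.0] * k
        for j, weight in enumerate(row):
            vec[other_labels[j]] += weight
        out.append(vec)
    return out


def reinforce_cluster(rel: Matrix, k_rows: int, k_cols: int,
                      rounds: int = 5) -> Tuple[List[int], List[int]]:
    """Alternate clustering of rows and columns until the labels stop changing."""
    rel_t = [list(col) for col in zip(*rel)]  # column objects as rows
    row_labels = kmeans(rel, k_rows)          # initial pass on raw link vectors
    col_labels = kmeans(rel_t, k_cols)
    for _ in range(rounds):
        new_cols = kmeans(project(rel_t, row_labels, k_rows), k_cols)
        new_rows = kmeans(project(rel, new_cols, k_cols), k_rows)
        converged = new_rows == row_labels and new_cols == col_labels
        row_labels, col_labels = new_rows, new_cols
        if converged:
            break
    return row_labels, col_labels


# Toy data (hypothetical): rows are users, columns are pages, 1 = visited.
visits = [
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
user_labels, page_labels = reinforce_cluster(visits, k_rows=2, k_cols=2)
print(user_labels, page_labels)  # [0, 0, 1, 1] [0, 0, 1, 1]
```

With real data the projected vectors have only as many dimensions as there are clusters of the other type, which is one plausible way to read the abstract's claim about overcoming sparseness in the relationship space.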


    • Published in

      SIGIR '03: Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
      July 2003
      490 pages
      ISBN:1581136463
      DOI:10.1145/860435

      Copyright © 2003 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • Article

      Acceptance Rates

      SIGIR '03 Paper Acceptance Rate: 46 of 266 submissions, 17%
      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
