DOI: 10.1145/1321440.1321463

Diva: a variance-based clustering approach for multi-type relational data

Published: 06 November 2007

Abstract

Clustering is a common unsupervised learning technique for extracting knowledge from a dataset. In contrast to classical propositional approaches, which handle only simple, flat datasets, relational clustering works directly on multi-type interrelated data objects and exploits the semantic information hidden in the linkage structure to improve the clustering result. However, exploring linkage information greatly reduces the scalability of relational clustering. Moreover, some properties of vector data spaces that are used to accelerate propositional clustering no longer hold in relational data spaces. These two disadvantages keep relational clustering techniques from being applied to very large datasets or to time-critical tasks such as online recommender systems. In this paper we propose a new variance-based clustering algorithm to address these difficulties. Our algorithm combines the advantages of the divisive and agglomerative clustering paradigms to improve the quality of the clustering results. By adopting the idea of a Representative Object, it runs in linear time. Experimental results show that our algorithm achieves high accuracy, efficiency and robustness in comparison with some well-known relational clustering approaches.
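
The abstract names three ingredients: a variance criterion, a divisive phase followed by an agglomerative phase, and a representative object standing in for each cluster. The following Python sketch illustrates how these ideas can fit together in general. It is not the algorithm from the paper: the function names, the `max_variance` threshold, the choice of the first object as the initial representative, and the farthest-object split rule are all assumptions made for illustration. The sketch also makes no attempt to reproduce the linear time bound claimed above. It only requires a distance function with `dist(x, x) == 0`, so it does not depend on a vector space.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]
Cluster = Tuple[List[T], T]  # (members, representative object)


def variance(members: Sequence[T], rep: T, dist: Distance) -> float:
    """Mean squared distance from the members to the representative object."""
    return sum(dist(m, rep) ** 2 for m in members) / len(members)


def split(members: Sequence[T], rep: T, dist: Distance) -> Tuple[Cluster, Cluster]:
    """Split a cluster around its representative and the member farthest from it."""
    far = max(members, key=lambda m: dist(m, rep))
    near_group: List[T] = []
    far_group: List[T] = []
    for m in members:
        # Each member joins whichever of the two seeds is closer.
        (near_group if dist(m, rep) <= dist(m, far) else far_group).append(m)
    return (near_group, rep), (far_group, far)


def cluster(objects: Sequence[T], dist: Distance, max_variance: float) -> List[List[T]]:
    """Hypothetical divisive-then-agglomerative clustering (illustration only)."""
    if not objects:
        return []

    # Divisive phase: keep splitting clusters whose variance is too high.
    pending: List[Cluster] = [(list(objects), objects[0])]
    done: List[Cluster] = []
    while pending:
        members, rep = pending.pop()
        if len(members) < 2 or variance(members, rep, dist) <= max_variance:
            done.append((members, rep))
        else:
            pending.extend(split(members, rep, dist))

    # Agglomerative phase: greedily merge a cluster into an earlier one
    # when the merged variance still satisfies the threshold.
    merged: List[Cluster] = []
    for members, rep in done:
        for i, (m_members, m_rep) in enumerate(merged):
            union = m_members + members
            if variance(union, m_rep, dist) <= max_variance:
                merged[i] = (union, m_rep)
                break
        else:
            merged.append((members, rep))
    return [members for members, _ in merged]
```

Calling `cluster(objects, dist, max_variance)` with any symmetric distance, for example a Jaccard distance over the sets of objects each item links to, returns a list of member lists. Because every comparison goes through the representative rather than through all pairs, the cost of evaluating one cluster is proportional to its size.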

    Published In

    CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
    November 2007
    1048 pages
    ISBN:9781595938039
    DOI:10.1145/1321440

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. clustering
    2. multi-type
    3. relational

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2021) Sequence Contained Heterogeneous Graph Neural Network. 2021 International Joint Conference on Neural Networks (IJCNN), pp. 1-8. DOI: 10.1109/IJCNN52387.2021.9533391. Online publication date: 2021.
    • (2020) An efficient generic approach for automatic taxonomy generation using HMMs. Pattern Analysis and Applications. DOI: 10.1007/s10044-020-00918-0. Online publication date: 18-Sep-2020.
    • (2016) Route planning for locations based on trajectory segments. Proceedings of the 2nd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics, pp. 1-8. DOI: 10.1145/3007540.3007546. Online publication date: 31-Oct-2016.
    • (2016) LVC: Local Variance-based Clustering. 2016 International Joint Conference on Neural Networks (IJCNN), pp. 2992-2999. DOI: 10.1109/IJCNN.2016.7727579. Online publication date: Jul-2016.
    • (2013) A data partitioning approach for hierarchical clustering. Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication, pp. 1-4. DOI: 10.1145/2448556.2448628. Online publication date: 17-Jan-2013.
    • (2013) Interlinking educational resources and the web of data. Program, 47:1, pp. 60-91. DOI: 10.1108/00330331211296312. Online publication date: 8-Feb-2013.
    • (2012) Discovering multirelational structure in social media streams. ACM Transactions on Multimedia Computing, Communications, and Applications, 8:1, pp. 1-28. DOI: 10.1145/2071396.2071400. Online publication date: 3-Feb-2012.
    • (2011) Efficient link-based clustering in a large scaled blog network. Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication, pp. 1-5. DOI: 10.1145/1968613.1968699. Online publication date: 21-Feb-2011.
    • (2010) Application of Linkclus in blogosphere. 2010 2nd IEEE International Conference on Network Infrastructure and Digital Content, pp. 95-99. DOI: 10.1109/ICNIDC.2010.5657906. Online publication date: Sep-2010.
    • (2010) A New Data Clustering Algorithm. Proceedings of the 2010 Fifth International Conference on Internet Computing for Science and Engineering, pp. 106-111. DOI: 10.1109/ICICSE.2010.16. Online publication date: 1-Nov-2010.
