Diva: a variance-based clustering approach for multi-type relational data

Authors:

Sarabjot Singh Anand

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

Pages 147 - 156

https://doi.org/10.1145/1321440.1321463

Published: 06 November 2007

Abstract

Clustering is a common technique used to extract knowledge from a dataset in unsupervised learning. In contrast to classical propositional approaches that only focus on simple and flat datasets, relational clustering can handle multi-type interrelated data objects directly and adopt semantic information hidden in the linkage structure to improve the clustering result. However, exploring linkage information will greatly reduce the scalability of relational clustering. Moreover, some characteristics of vector data space utilized to accelerate the propositional clustering procedure are no longer valid in relational data space. These two disadvantages restrain the relational clustering techniques from being applied to very large datasets or in time-critical tasks, such as online recommender systems. In this paper we propose a new variance-based clustering algorithm to address the above difficulties. Our algorithm combines the advantages of divisive and agglomerative clustering paradigms to improve the quality of cluster results. By adopting the idea of Representative Object, it can be executed with linear time complexity. Experimental results show our algorithm achieves high accuracy, efficiency and robustness in comparison with some well-known relational clustering approaches.
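
The abstract names two ingredients without giving details: a variance criterion and a Representative Object that stands in for the centroid, which relational data does not have. The following is a minimal sketch of that general idea, not the paper's algorithm. It covers only a divisive step (no agglomerative merging), and every name in it (`variance_about`, `pick_representative`, `divisive_split`, `max_variance`, `n_candidates`) is a hypothetical chosen for illustration. It assumes a symmetric distance function with `dist(x, x) == 0`.

```python
import random
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
Distance = Callable[[T, T], float]


def variance_about(rep: T, cluster: Sequence[T], dist: Distance) -> float:
    """Mean squared distance from each object to one representative (one pass)."""
    return sum(dist(rep, x) ** 2 for x in cluster) / len(cluster)


def pick_representative(cluster: Sequence[T], dist: Distance,
                        n_candidates: int = 5,
                        rng: Optional[random.Random] = None) -> T:
    """Pick the best of a few sampled candidates instead of the exact medoid.

    A fixed number of candidates keeps the cost linear in the cluster size;
    an exact medoid would need all pairwise distances.
    """
    rng = rng or random.Random(0)
    candidates = rng.sample(list(cluster), min(n_candidates, len(cluster)))
    return min(candidates, key=lambda c: variance_about(c, cluster, dist))


def divisive_split(objects: Sequence[T], dist: Distance, max_variance: float,
                   rng: Optional[random.Random] = None) -> List[List[T]]:
    """Split clusters in two until each has variance <= max_variance."""
    rng = rng or random.Random(0)
    pending: List[List[T]] = [list(objects)]
    done: List[List[T]] = []
    while pending:
        cluster = pending.pop()
        rep = pick_representative(cluster, dist, rng=rng)
        if len(cluster) < 2 or variance_about(rep, cluster, dist) <= max_variance:
            done.append(cluster)
            continue
        # Two seeds: the object farthest from the representative,
        # then the object farthest from that first seed.
        seed_a = max(cluster, key=lambda x: dist(rep, x))
        seed_b = max(cluster, key=lambda x: dist(seed_a, x))
        near_a = [x for x in cluster if dist(seed_a, x) <= dist(seed_b, x)]
        near_b = [x for x in cluster if dist(seed_a, x) > dist(seed_b, x)]
        if not near_a or not near_b:
            done.append(cluster)  # degenerate split, keep the cluster as-is
            continue
        pending.extend([near_a, near_b])
    return done
```

Because the only access to the data is through `dist`, the same code would run on relational objects given a relational distance measure; with numbers and `lambda a, b: abs(a - b)` it separates two well-spaced groups of points.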

Index Terms

Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis

Published In

CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management

November 2007

1048 pages

ISBN:9781595938039

DOI:10.1145/1321440

Co-chair: Alberto H. F. Laender
Conference Chairs: André O. Falcão (Universidade de Lisboa, Portugal), Øystein Haug Olsen
General Chair: Mário J. Silva (Universidade de Lisboa, Portugal)
Program Chairs: Ricardo Baeza-Yates, Deborah L. McGuinness, Bjorn Olstad

Copyright © 2007 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM07: Conference on Information and Knowledge Management
November 6 - 10, 2007
Lisbon, Portugal

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics

Total Citations: 10
Total Downloads: 424
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 0

Reflects downloads up to 17 Feb 2025

Citations

Cited By

Zhang S, Huang T, Wang D (2021). Sequence Contained Heterogeneous Graph Neural Network. 2021 International Joint Conference on Neural Networks (IJCNN), pages 1-8. Online publication date: 2021.
https://doi.org/10.1109/IJCNN52387.2021.9533391

Iloga S, Romain O, Tchuenté M (2020). An efficient generic approach for automatic taxonomy generation using HMMs. Pattern Analysis and Applications. Online publication date: 18-Sep-2020.
https://doi.org/10.1007/s10044-020-00918-0

Jiang J, Xu C, Xu J, Xu M, Zheng N, Kong K, Vo H (2016). Route planning for locations based on trajectory segments. Proceedings of the 2nd ACM SIGSPATIAL Workshop on Smart Cities and Urban Analytics, pages 1-8. Online publication date: 31-Oct-2016.
https://dl.acm.org/doi/10.1145/3007540.3007546

Ibrahim R, Elbagoury A, Kamel M, Karray F (2016). LVC: Local Variance-based Clustering. 2016 International Joint Conference on Neural Networks (IJCNN), pages 2992-2999. Online publication date: Jul-2016.
https://doi.org/10.1109/IJCNN.2016.7727579

Yoon S, Song S, Lee S, Jeong K, Kim S, Kang S, Choi Y, Cha J, Ryu M, Jeong B (2013). A data partitioning approach for hierarchical clustering. Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication, pages 1-4. Online publication date: 17-Jan-2013.
https://dl.acm.org/doi/10.1145/2448556.2448628

Dietze S, Sanchez-Alonso S, Ebner H, Qing Yu H, Giordano D, Marenzi I, Pereira Nunes B (2013). Interlinking educational resources and the web of data. Program, 47:1, pages 60-91. Online publication date: 8-Feb-2013.
https://doi.org/10.1108/00330331211296312

Lin Y, Sundaram H, De Choudhury M, Kelliher A (2012). Discovering multirelational structure in social media streams. ACM Transactions on Multimedia Computing, Communications, and Applications, 8:1, pages 1-28. Online publication date: 3-Feb-2012.
https://dl.acm.org/doi/10.1145/2071396.2071400

Yoon S, Song S, Kim S, Lee S, Hanzo L, Chung M, Lee S, Cho K (2011). Efficient link-based clustering in a large scaled blog network. Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication, pages 1-5. Online publication date: 21-Feb-2011.
https://dl.acm.org/doi/10.1145/1968613.1968699

Yoon S, Song S, Kim S, Ha J, Lee J, Kim H (2010). Application of Linkclus in blogosphere. 2010 2nd IEEE International Conference on Network Infrastructure and Digital Content, pages 95-99. Online publication date: Sep-2010.
https://doi.org/10.1109/ICNIDC.2010.5657906

Cheng Y, Huang S, Lv T, Liu G (2010). A New Data Clustering Algorithm. Proceedings of the 2010 Fifth International Conference on Internet Computing for Science and Engineering, pages 106-111. Online publication date: 1-Nov-2010.
https://dl.acm.org/doi/10.1109/ICICSE.2010.16
