A unified framework for image database clustering and content-based retrieval

Published: 13 November 2004


With the proliferation of image data, the need to search and retrieve images efficiently and accurately from a large image database or a collection of image databases has increased drastically. To address this demand, a unified framework called Markov Model Mediators (MMMs) is proposed in this paper to facilitate conceptual database clustering and to improve query processing performance by analyzing the summarized knowledge. The distinguishing characteristic of MMMs is that they can explore the affinity relations among the images at the database level and among the databases at the cluster level, using an effective data mining process. At the database level, each database is modeled by an intra-database MMM, which enables accurate image retrieval within the database. Conceptual database clustering is then performed, and cluster-level knowledge is summarized to reduce the cost of retrieving images across the databases. The framework has been tested on a set of image databases that contain various numbers of images with different dimensions and concept categories. The experimental results demonstrate that, with minimal extra effort, inter-cluster retrieval achieves better retrieval accuracy than intra-cluster retrieval.


Fazli Can

The authors propose a unified framework to support image database clustering and content-based image retrieval. For this purpose, they use the Markov model mediator (MMM) mechanism to facilitate conceptual clustering. An MMM (triple M) is a stochastic finite state machine. Each state of an MMM is attached to a stochastic output process that describes the probability of occurrence of the output symbols (states). In the study, MMMs are used to explore affinity relations among the images at the database and cluster levels. The authors consider both effectiveness and efficiency, providing some reasonable results for the former and a short discussion of the latter. In the efficiency discussion, they ignore the off-line cost of clustering and related activities, on the assumption that these will be performed annually or semi-annually. This seems quite unrealistic in dynamic environments, and needs further attention. The authors seek to address problems that involve large databases, and, in their experiments, they use 12 databases, with a total of 18,700 images. This is a reasonable experimental size, but it leaves a lot of ground to cover in terms of scalability. In their discussion, the authors use the term "distributed databases" when they actually mean "autonomous databases," that is, databases that involve no interrelated integrity constraints. They do state what they mean by "distributed databases" in their text. However, for clarity, they should have used either "autonomous databases" or "multidatabases" instead of "distributed databases" throughout the paper. The material looks interesting, but it is hard to appreciate without additional reading. The general introduction is nicely written but long; it could have been shortened, and the unused half-page at the end of this conference paper could have been used to provide more information about the authors' related earlier work.

Online Computing Reviews Service

