DOI: 10.1145/1217935.1217966
Article

Ferret: a toolkit for content-based similarity search of feature-rich data

Published: 18 April 2006

ABSTRACT

Building content-based search tools for feature-rich data has been a challenging problem because feature-rich data such as audio recordings, digital images, and sensor data are inherently noisy and high dimensional. Comparing noisy data requires comparisons based on similarity instead of exact matches, and thus searching for noisy data requires similarity search instead of exact search.

The Ferret toolkit is designed to help system builders quickly construct content-based similarity search systems for feature-rich data types. The key component of the toolkit is a content-based similarity search engine for generic, multi-feature object representations. To solve the similarity search problem in high-dimensional spaces, we have developed approximation methods inspired by recent theoretical results on dimension reduction. The search engine constructs sketches from feature vectors as highly compact data structures for matching, filtering and ranking data objects. The toolkit also includes several other components to help system builders address search system infrastructure issues. We have implemented the toolkit and used it to successfully construct content-based similarity search systems for four data types: audio recordings, digital photos, 3D shape models and genomic microarray data.
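The abstract does not say how Ferret builds its sketches, so the following is only an illustrative sketch of the general idea of filtering with compact bit vectors and then ranking the survivors exactly. It assumes random-hyperplane sign sketches compared by Hamming distance, with cosine similarity as the exact measure; the function names (`make_projections`, `sketch`, `search`) and all parameters are hypothetical and are not part of the Ferret API.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def sketch(vec, projections):
    """Pack the signs of vec's projections into a single integer bit vector."""
    bits = 0
    for direction in projections:
        dot = sum(v * d for v, d in zip(vec, direction))
        bits = (bits << 1) | (1 if dot >= 0.0 else 0)
    return bits


def hamming(a, b):
    """Count the bits in which two sketches differ."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    """Exact cosine similarity; assumes both vectors are non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)


def search(query, vectors, sketches, projections, n_candidates, k):
    """Filter by sketch Hamming distance, then rank survivors by exact cosine."""
    q = sketch(query, projections)
    by_sketch = sorted(range(len(vectors)), key=lambda i: hamming(q, sketches[i]))
    candidates = by_sketch[:n_candidates]
    ranked = sorted(candidates, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]


# Toy data set: four 4-dimensional feature vectors (made up for illustration).
vectors = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [-1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
projections = make_projections(dim=4, n_bits=32, seed=1)
sketches = [sketch(v, projections) for v in vectors]
top = search([1.0, 0.0, 0.0, 0.0], vectors, sketches, projections, n_candidates=2, k=1)
```

Each object is reduced here to a 32-bit integer, so the filtering pass touches far less data than the original feature vectors; only the few candidates that survive the filter are compared with the exact measure. The number of bits and the number of candidates trade accuracy against space and time, and a real system would tune both per data type.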

Published in

EuroSys '06: Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
April 2006
420 pages
ISBN: 1595933220
DOI: 10.1145/1217935

ACM SIGOPS Operating Systems Review, Volume 40, Issue 4
Proceedings of the 2006 EuroSys conference
October 2006
383 pages
ISSN: 0163-5980
DOI: 10.1145/1218063

Copyright © 2006 Authors

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers

Article

Acceptance Rates

Overall acceptance rate: 241 of 1,308 submissions, 18%
