DOI: 10.1145/133160.133214

Scatter/Gather: a cluster-based approach to browsing large document collections

Published: 1 June 1992

ABSTRACT

Document clustering has not been well received as an information retrieval tool. Objections to its use fall into two main categories: first, that clustering is too slow for large corpora (with running time often quadratic in the number of documents); and second, that clustering does not appreciably improve retrieval.

We argue that these problems arise only when clustering is used in an attempt to improve conventional search techniques. However, looking at clustering as an information access tool in its own right obviates these objections, and provides a powerful new access paradigm. We present a document browsing technique that employs document clustering as its primary operation. We also present fast (linear time) clustering algorithms which support this interactive browsing paradigm.
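
The abstract names the browsing technique but does not spell out its steps or its clustering algorithms, so the following is only a minimal sketch under stated assumptions. It treats browsing as a loop of two operations: scatter the current documents into a few groups, then gather the groups the user picks into a smaller working set. The function names (`vectorize`, `cosine`, `scatter`, `gather`), the bag-of-words representation, and the evenly spaced seed selection are all hypothetical choices made for illustration; the naive one-pass assignment merely stands in for the paper's linear-time algorithms and is not a reconstruction of them.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for one document (a deliberately crude model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scatter(docs, k):
    """Partition docs into at most k groups using k * n similarity computations.

    Assumption: the seeds are k evenly spaced documents, and each document is
    assigned once to its most similar seed. This is an illustrative stand-in,
    not the clustering procedure from the paper.
    """
    k = min(k, len(docs))
    if k == 0:
        return []
    vectors = [vectorize(d) for d in docs]
    step = len(docs) / k
    seeds = [vectors[int(i * step)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for doc, vec in zip(docs, vectors):
        # Ties go to the lowest-numbered seed because max() keeps the first maximum.
        best = max(range(k), key=lambda i: cosine(vec, seeds[i]))
        clusters[best].append(doc)
    return [c for c in clusters if c]


def gather(clusters, chosen):
    """Merge the clusters the user selected into a new, smaller working set."""
    return [doc for i in chosen for doc in clusters[i]]
```

A browsing session would alternate the two calls: `scatter(docs, k)`, inspect the groups, `gather(clusters, chosen)`, then scatter the result again. Because every round clusters the reduced set from scratch, the per-round cost matters, which is why the abstract stresses linear-time clustering for interactive use.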

Published in

SIGIR '92: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
June 1992
352 pages
ISBN: 0897915232
DOI: 10.1145/133160

Copyright © 1992 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall acceptance rate: 792 of 3,983 submissions (20%)
