skip to main content
10.1145/253168.253200acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
Article
Free Access

Hierarchic document classification using Ward's clustering method

Authors Info & Claims
Published:01 September 1986Publication History

ABSTRACT

In this paper, we discuss the application of a recent hierarchic clustering algorithm to the automatic classification of files of documents. Whereas most hierarchic clustering algorithms involve the generation and updating of an inter-object dissimilarity matrix, this new algorithm is based upon a series of nearest neighbor searches. Such an approach is appropriate to several clustering methods, including Ward's method which has been shown to perform well in experimental studies of hierarchic document clustering. A description is given of heuristics which can increase the efficiency of the new algorithm when it is used to cluster three document collections by Ward's method.

References

  1. BENT80.Bentley, J.L., Wetde, B.W. and YAK), A.C. (1980), "OptLma# ExpeQ#ed-Ttme A1Borithms for Closest Point FYoble#s ," ACH Transaatlons on Ha them at iQ al Soft,are, 6, 563-580. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. CROF77.Croft, W.B. (1977), "Clustering Large Files of Documents Usln8 the 5i,81 e-L#nk Hethod," Journal of the #merlcan Soetety for Infoneaf.:Lon S#tence, 28, 3" 1-3#q.Google ScholarGoogle Scholar
  3. CROF80.Croft, W.B. (1980), "A Hodel of (:luster Searching Using Classsi~1cation," information S st.#,,.#_#, 5, 189-1 95.Google ScholarGoogle Scholar
  4. DAYE84.Day, W.B.E and Edelabrunner, H. (198#), "Efftolent Algorithms for #glomerattve Hlerarchioal ClusterlnE Hethoda,e Journal of Classification, 1, 7-2q.Google ScholarGoogle Scholar
  5. DEFA77.} De lays, O. (1977), "# F#ft~ lent A1gorlthm for a Ccmplete Link Hethod," Journa______11, 20, 93-95.Google ScholarGoogle Scholar
  6. FRIE77.Friedman, J.H., Bentley, J.L. and Finkel, R.A. (I 977), "b ~1gorlth# for Fird1.i Bes# Hatches In Lo#arlthmt~ Expected Time," A(;M Transaetlons on l#thel#a1#2C#L# #oft#lare. 3, 209-- 226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. GOFF69.Goff#en, W. (1969), "# Indlreot Hethod of Znfomation Retrieval," #nformatlon $toral#e and Retr i ev al, q, 363-3 73.Google ScholarGoogle Scholar
  8. GRIF84.Grifflths, A. , Robinson, L.A. and WilIett, P. (198JI), "Hierarehle Agglomerative C1 ustering #Mthods for #utomatto #eument Classification ," Journal of Documentation, qO, 175-205.Google ScholarGoogle Scholar
  9. GRIF86.Grlffiths, A. , Luekhurst, H.C. and W111ett, P. (1 986), "Uslng Inter-I#eument Similarity Information in Ibeum ent Retrieval Systems," Journal of the American Soelety for Information Science, 37, 3-i I. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. JARD71a.Jard ine, N. and Sib son, R. (1971), Ha%hematlcal Taxonomy New York: Wiley.Google ScholarGoogle Scholar
  11. JARD71b.Jardine, tl. and Van RlJsbergen, C.J. (1971), "The Use of Hierarchic Clustering in Information Retrieval," Information #orage and Retrieval, 7, 217-2q0.Google ScholarGoogle ScholarCross RefCross Ref
  12. KITT78.Kittler, J. (1978), "A Method for Determining K-Nearest NeIEhbotrs ," K ybernetes, 7, 313-315.Google ScholarGoogle ScholarCross RefCross Ref
  13. LANC67.Lance, G.N. and Williams, W.T. (1967), "A General Theory of Classificatory Sorting Strat#les. I. Hierarchical Systems," Jo ur na__#l, 9, 373-3 80.Google ScholarGoogle Scholar
  14. MURT82.HurtaKh, F. (1982), "A Very Fast, Exaat Nearest NelEhbour Algorithm for Use in Information Retrieval," Information Technology: Research and Development, I, 275-283.Google ScholarGoogle Scholar
  15. MURT83.Murtagh, F. (1983), "A SUrvey of Peoent Advances in Hierarchical Clustering Algorithms," Com#uter Journal, 26, 354-359.Google ScholarGoogle ScholarCross RefCross Ref
  16. MURT84a.rlurtagh, F. (198q), "Complexities of Hierarchic Clustering Algorithms: State of the Art," Computational Statistlcs Quarterly, 1, 101-1 13.Google ScholarGoogle Scholar
  17. MURT84b.MurtsEh, F. (1984), "A Revlew of Fast Techniques for Nearest Neighbour Searching," in ~(#F3TAT 198q, Vienna: Physloa-Verl#.Google ScholarGoogle Scholar
  18. NORE77.Moreault, T., F#I1, H. and MeGill, H.J. (1977), "Autcmatle Ranked Output from Boolean Searches in SIRE," Journal of the #eriean Soelety for Information Science, 28, 333#339.Google ScholarGoogle Scholar
  19. PERR83.Perry, S.A. and Willett, P. (1983), "A Review of the Use of Inverted Files for Best Match Retr lev al in In form ation Retr iev al Systems," Journal of Infomatlon Science 6, 59- 66.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. POGU84.Pogue, C.A. and Willett, P. (198q),-An Evaluatt.on of Doeumen# Retrieval from Serial Files Using the ICL I# sir Ib u# ed #ray Processor ," Online Review 8, 569-58q.Google ScholarGoogle Scholar
  21. SALT71.Salton, G. (1971), The S1ART Retrieval System, New Jersey: Prentlee-4lall.Google ScholarGoogle Scholar
  22. SALT81.Salton, G. and Bergmark, D. (i 981 ), "Parall el Computations in Information Retrleval," I#ture Motes in Computer Selenee, 111, 328-3 q2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. SALT83.Salton, G. and #GilI, H.J. (1983), XntroduQtlon to t#dern Information Retrleval, #ew Yet'k: +#raw-mill. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. SIBS73.Sibson, R. (1973), "BLINK: an Optimally Efflclent A#gorlthm for the Slngle-bink C#usber Method ," Computer Journal, 1 6, #0-3#.Google ScholarGoogle Scholar
  25. SMEA81.Smeaton, A.F. and Van RIJ#bergen, C.J. (1981), "The Nearem# Neighbour Problem in Information Retrieval. An Algorithm Uslr# Upperbourds," ACH S.IGIR Forum, 16, 83-8?. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. VANR74.Van R#j#beraen, C.J. (1974), "Further Experiments with Hierarehie Clustering in Document Retrieval ," Information Storage and Retrieval, 10, 1-1#.Google ScholarGoogle ScholarCross RefCross Ref
  27. VANR75.Van Pdjsbergen, C.J. and Croft, W.B. (1975), "Document Clustering: an Evaluation of some Experiments with the Cran field 1gO0 Collection," Information StorsEe and Retrieval, 11, 171-182.Google ScholarGoogle Scholar
  28. VANR79.Van Rijsbergen, c.a. (1979), Information Retrieval London: BUtteraprth. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. WEIS81.Weiss, S.F. (1981), "A Probablllstie AIEorithm for Nearest Nei#hbouc Searching," in Oddy, R.N, et al. (eds.) Information Retrieval Research# London: Butterwprth. Google ScholarGoogle ScholarDigital LibraryDigital Library
  1. Hierarchic document classification using Ward's clustering method

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGIR '86: Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval
      September 1986
      283 pages
      ISBN:0897911873
      DOI:10.1145/253168

      Copyright © 1986 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 September 1986

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate792of3,983submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader