DOI: 10.1145/1099554.1099703
Article

Taxonomies by the numbers: building high-performance taxonomies

Published: 31 October 2005

ABSTRACT

In this paper, we describe a system for constructing taxonomies that yield high accuracy with automated categorization systems, even on Web and intranet documents. In particular, we describe how measuring five key features of the system can be used to predict when categories are sufficiently well defined to yield high-accuracy categorization. We describe the use of this system to construct a large (8800-category) general-purpose taxonomy and categorization system.


Published in

CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management
October 2005, 854 pages
ISBN: 1595931406
DOI: 10.1145/1099554

Copyright © 2005 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '05 paper acceptance rate: 77 of 425 submissions, 18%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
