DOI: 10.1145/1363686.1364053
research-article

Knowledge-free discovery of domain-specific multiword units

Published: 16 March 2008

ABSTRACT

The discovery of multiword units is one of the key steps in the preprocessing of raw text. In this paper, we propose a knowledge-free approach for the discovery of such entities. It not only outperforms state-of-the-art approaches, but is also fully unsupervised. Furthermore, it does not require setting any threshold, making it suitable for use by non-experts. The proposed approach is evaluated against five other metrics on a medical corpus.


    • Published in

      SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
      March 2008
      2586 pages
      ISBN: 9781595937537
      DOI: 10.1145/1363686

      Copyright © 2008 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
