research-article

Knowledge-free discovery of domain-specific multiword units

Author:
Axel-Cyrille Ngonga Ngomo

Institute of Computer Sciences, Leipzig, Germany


SAC '08: Proceedings of the 2008 ACM symposium on Applied computing, March 2008, pages 1561–1565, https://doi.org/10.1145/1363686.1364053

Published: 16 March 2008

ABSTRACT

The discovery of multiword units is one of the key steps in the preprocessing of raw text. In this paper, we propose a knowledge-free approach for the discovery of such entities. It not only outperforms state-of-the-art approaches but is also fully unsupervised. Furthermore, it does not require setting any threshold, which makes it appropriate for use by non-experts. The proposed approach is evaluated against five other metrics on a medical corpus.
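The abstract does not spell out the proposed measure, so the following sketch does not reproduce it. For orientation only, it shows the kind of statistical baseline such approaches are commonly compared against: ranking adjacent word pairs in a tokenized corpus by the Dice coefficient, 2·f(xy) / (f(x) + f(y)). Whether Dice is among the five metrics used in the paper is not stated here; the function name `dice_bigrams` and the toy input are hypothetical.

```python
from collections import Counter


def dice_bigrams(tokens):
    """Rank adjacent word pairs (x, y) by the Dice coefficient
    2 * f(xy) / (f(x) + f(y)), where f counts occurrences in `tokens`.

    Illustrative baseline only; not the method proposed in the paper.
    """
    unigram = Counter(tokens)                 # f(x) for every token
    bigram = Counter(zip(tokens, tokens[1:]))  # f(xy) for adjacent pairs
    scores = {
        (x, y): 2 * f_xy / (unigram[x] + unigram[y])
        for (x, y), f_xy in bigram.items()
    }
    # Highest score first; ties broken alphabetically by the pair.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    toy = "blood pressure is high blood pressure".split()  # hypothetical input
    for pair, score in dice_bigrams(toy):
        print(pair, round(score, 3))
```

On this toy input, the pair ("is", "high"), seen only once, also reaches the maximum score of 1.0 alongside ("blood", "pressure"). This is one reason rankings based on a fixed association measure are usually paired with frequency cutoffs or score thresholds; avoiding such settings is what the abstract claims for the proposed approach.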

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Published in
SAC '08: Proceedings of the 2008 ACM symposium on Applied computing
March 2008
2586 pages
ISBN: 9781595937537
DOI: 10.1145/1363686
Conference Chairs: Roger L. Wainwright (University of Tulsa), Hisham M. Haddad (Kennesaw State University)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
knowledge-free algorithms
multiword units
natural language processing
Acceptance Rates
Overall acceptance rate: 1,650 of 6,669 submissions (25%)
Article Metrics
- Total Citations: 2
- Total Downloads: 126
- Downloads (last 12 months): 1
- Downloads (last 6 weeks): 0