
Algorithms for the verification of the semantic relation between a compound and a given lexeme

Authors:
Gudrun Kellner, Vienna University of Technology, Austria
Johannes Grünauer, Vienna University of Technology, Austria

i-KNOW '12: Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies, September 2012, Article No. 5, Pages 1–8. https://doi.org/10.1145/2362456.2362463

Published: 05 September 2012

ABSTRACT

Text mining on a lexical basis is quite well developed for the English language. In compounding languages, however, lexicalized words are often a combination of two or more semantic units. New words can be built easily by concatenating existing ones, without any white space in between.

That poses a problem for existing search algorithms: such compounds could be of high interest for a search request, but how can one determine whether a compound comprises a given lexeme? A string match can be taken as an indication, but it does not prove a semantic relation. The same problem arises in lexicon-based approaches, where signal words are defined as lexemes only and need to be identified in all forms of appearance, and hence also as components of a compound. This paper explores the characteristics of compounds and their constituent elements for German, and compares seven algorithms with regard to runtime and error rates. The results of this study are relevant to query analysis and term weighting approaches in information retrieval system design.
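
The gap between a string match and an actual constituent can be illustrated with a minimal Python sketch. It is not one of the seven algorithms compared in the paper, which the abstract does not describe. The function names, the toy lexicon, and the example words are assumptions chosen for illustration, and the sketch ignores linking elements and inflection.

```python
def string_match(compound, lexeme):
    """Naive check: does the lexeme occur anywhere inside the compound?"""
    return lexeme.lower() in compound.lower()


def segmentations(word, lexicon):
    """Return every way to split `word` into consecutive lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # one valid segmentation of the empty string
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results


def is_constituent(compound, lexeme, lexicon):
    """True if the lexeme is a full constituent in at least one segmentation."""
    return any(lexeme.lower() in parts
               for parts in segmentations(compound, lexicon))


# Hypothetical toy lexicon: Eis (ice), Berg (mountain), Reis (rice), Feld (field).
TOY_LEXICON = {"eis", "berg", "reis", "feld"}

for compound in ("Eisberg", "Reisfeld"):
    print(compound,
          string_match(compound, "Eis"),
          is_constituent(compound, "Eis", TOY_LEXICON))
```

Both "Eisberg" and "Reisfeld" contain the string "eis", but only "Eisberg" segments into constituents that include "Eis". Even a successful segmentation is only stronger evidence than a substring hit; ambiguous splits would still need further disambiguation.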

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Published in
i-KNOW '12: Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
September 2012, 244 pages
ISBN: 9781450312424
DOI: 10.1145/2362456
Conference Chairs: Stefanie Lindstaedt (Graz University of Technology, Austria), Michael Granitzer (University Passau, Germany)
Copyright © 2012 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
compound
information retrieval
semantic relation
stemming
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 77 of 238 submissions, 32%