DOI: 10.1145/2362456.2362463
research-article

Algorithms for the verification of the semantic relation between a compound and a given lexeme

Published: 05 September 2012

ABSTRACT

Text mining on a lexical basis is quite well developed for the English language. In compounding languages, however, lexicalized words are often a combination of two or more semantic units. New words can be built easily by concatenating existing ones, without inserting any whitespace between them.

That poses a problem for existing search algorithms: such compounds could be of high interest for a search request, but how can one examine whether a compound comprises a given lexeme? A string match can be considered an indication, but it does not prove a semantic relation. The same problem arises in lexicon-based approaches, where signal words are defined as lexemes only and need to be identified in all forms of appearance, and hence also as a component of a compound. This paper explores the characteristics of compounds and their constituent elements for German, and compares seven algorithms with regard to runtime and error rates. The results of this study are relevant to query analysis and term weighting approaches in information retrieval system design.
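To illustrate why a string match is only an indication, the following minimal Python sketch contrasts a plain substring test with a lexicon-driven segmentation check. It is not one of the seven algorithms compared in the paper, which are not described in this abstract. The function names and the four-word lexicon are hypothetical, and linking elements and inflection are ignored.

```python
def naive_contains(compound: str, lexeme: str) -> bool:
    """Plain string match: an indication, not proof of a semantic relation."""
    return lexeme.lower() in compound.lower()


def splits(word: str, lexicon: set[str]) -> list[list[str]]:
    """Return every segmentation of `word` into entries of `lexicon`."""
    word = word.lower()
    if not word:
        return [[]]  # one way to segment the empty string: no parts
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in splits(word[i:], lexicon):
                results.append([head] + rest)
    return results


def is_constituent(compound: str, lexeme: str, lexicon: set[str]) -> bool:
    """True if `lexeme` is a whole part in at least one full segmentation."""
    return any(lexeme.lower() in parts for parts in splits(compound, lexicon))


# Hypothetical toy lexicon (assumption, for illustration only).
LEXICON = {"ei", "eisen", "bahn", "dotter"}

# "Ei" (egg) string-matches "Eisenbahn" (railway) but is not a constituent.
print(naive_contains("Eisenbahn", "Ei"))           # True
print(is_constituent("Eisenbahn", "Ei", LEXICON))  # False
# In "Eidotter" (egg yolk) it is a genuine constituent.
print(is_constituent("Eidotter", "Ei", LEXICON))   # True
```

A word with more than one valid segmentation would still be ambiguous under this check, so it narrows the candidates rather than settling the semantic relation.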

    • Published in

      ACM Other conferences
      i-KNOW '12: Proceedings of the 12th International Conference on Knowledge Management and Knowledge Technologies
      September 2012
      244 pages
      ISBN:9781450312424
      DOI:10.1145/2362456

      Copyright © 2012 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 77 of 238 submissions, 32%