ABSTRACT
In this paper, we present an experiment in the automatic indexing of Greek documents. In particular, we describe an attempt to use JEX, the JRC-developed indexing tool, to assign EuroVoc descriptors to a collection of Greek open data. We discuss the results and limitations of this approach and propose solutions that take into account the particularities of the Greek language.
Index Terms
- A Preliminary Investigation into the Automatic EuroVoc Indexing of Greek Documents