Retrieval in text collections with historic spelling using linguistic and spelling variants
Source
International Conference on Digital Libraries
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Vancouver, BC, Canada
SESSION: Information retrieval and extraction 2
Pages: 333 - 341
Year of Publication: 2007
ISBN: 978-1-59593-644-8
Authors
Andrea Ernst-Gerlach  University of Duisburg-Essen, Duisburg, Germany
Norbert Fuhr  University of Duisburg-Essen, Duisburg, Germany
Sponsors
ACM: Association for Computing Machinery
SIGIR: ACM Special Interest Group on Information Retrieval
SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1255175.1255242

ABSTRACT

We present a new approach to retrieving texts with non-standard spelling, which matters for historic texts, e.g., in English or German. This paper describes the overall architecture of our system and then its evaluation. Given a search term as a lemma, we use a dictionary of contemporary German to find all inflected and derived forms of the lemma. We then apply transformation rules (derived from training data) to generate historic spelling variants. We evaluate the approach by the retrieval quality it achieves. The experimental results show that we can improve the retrieval quality for historic collections substantially.
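
The two expansion steps named in the abstract (dictionary lookup of modern word forms, then rule-based generation of historic variants) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy dictionary `MODERN_FORMS`, the two hand-picked rewrite rules in `RULES`, and the function names are hypothetical and are not taken from the paper, whose rules are derived from training data.

```python
from itertools import combinations

# Hypothetical toy dictionary: lemma -> contemporary inflected/derived forms.
MODERN_FORMS = {"teil": ["teil", "teile"]}

# Hypothetical rewrite rules (modern substring -> historic substring),
# hand-picked for illustration rather than learned from training data.
RULES = [("t", "th"), ("ei", "ey")]


def historic_variants(word, rules):
    """Return the word plus every variant produced by applying a subset of rules."""
    variants = {word}
    for size in range(1, len(rules) + 1):
        for subset in combinations(rules, size):
            candidate = word
            # Each chosen rule rewrites all occurrences of its modern substring.
            for modern, historic in subset:
                candidate = candidate.replace(modern, historic)
            variants.add(candidate)
    return variants


def expand_query(lemma, dictionary, rules):
    """Expand a lemma into modern forms, then into historic spelling variants."""
    forms = dictionary.get(lemma, [lemma])  # fall back to the lemma itself
    expanded = set()
    for form in forms:
        expanded |= historic_variants(form, rules)
    return sorted(expanded)


if __name__ == "__main__":
    print(expand_query("teil", MODERN_FORMS, RULES))
```

The expanded term set would then be issued as a disjunctive query against the historic collection. Applying every subset of rules is a simplification that over-generates; a real system would presumably restrict or rank the variants.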


