ABSTRACT
We present a new approach to retrieving texts with non-standard spelling, a problem that matters for historic texts, e.g. in English or German. In this paper, we describe the overall architecture of our system and then evaluate it. Given a search term as a lemma, we use a dictionary of contemporary German to find all inflected and derived forms of the lemma. We then apply transformation rules (derived from training data) to generate historic spelling variants. The evaluation measures the resulting retrieval quality. The experimental results show that the approach substantially improves retrieval quality for historic collections.
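The two-step expansion described above (lemma to contemporary word forms, then word forms to historic spelling variants) can be sketched roughly as follows. This is a minimal illustration, not the system's implementation: the tiny `FORMS` dictionary, the two rewrite rules in `RULES`, and all function names are assumptions made for the example, whereas the actual system derives its rules from training data and uses a full dictionary of contemporary German.

```python
# Toy stand-in for a contemporary German dictionary (assumption):
# maps a lemma to its inflected and derived forms.
FORMS = {"Teil": ["Teil", "Teile", "Teilen", "Teils"]}

# Hypothetical transformation rules (modern substring -> historic substring).
# In the described approach such rules would be learned from training data.
RULES = [("T", "Th"), ("ei", "ey")]


def expand_lemma(lemma):
    """Return all known forms of the lemma, or the lemma itself if unknown."""
    return FORMS.get(lemma, [lemma])


def historic_variants(word, rules=RULES):
    """Generate spelling variants by applying each rule to the variants so far.

    Each rule rewrites one occurrence at a time, and the unmodified
    spelling is always kept, so every rule is optional.
    """
    variants = {word}
    for modern, historic in rules:
        new = set()
        for v in variants:
            start = v.find(modern)
            while start != -1:
                new.add(v[:start] + historic + v[start + len(modern):])
                start = v.find(modern, start + 1)
        variants |= new
    return variants


def search_terms(lemma):
    """Expand a lemma into the full set of terms sent to the retrieval engine."""
    terms = set()
    for form in expand_lemma(lemma):
        terms |= historic_variants(form)
    return terms


if __name__ == "__main__":
    print(sorted(search_terms("Teil")))
```

Keeping the unmodified form alongside every rewritten one means the expanded query still matches documents in contemporary spelling; the cost is that the number of generated terms grows with each applicable rule.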