Abstract
In this article, we present an algorithmic system for determining the proper correspondence between place markers and their labels in historical maps. We assume that the locations of place markers (usually pictographs) and labels (pieces of text) have already been determined -- either algorithmically or by hand -- and we want to match the labels to the markers. This time-consuming step in the digitization process of historical maps is nontrivial even for humans but provides valuable metadata (e.g., when subsequently georeferencing the map). To speed up this process, we model the problem in terms of combinatorial optimization, solve that problem efficiently, and show how user interaction can be used to improve the quality of the results. We also consider a version of the model where we are given label fragments and additionally have to decide which fragments go together. We show that this problem is NP-hard. However, we give a polynomial-time algorithm for a restricted version of this fragment assignment problem. We have implemented the algorithm for the main problem and tested it on a manually extracted ground truth for eight historical maps with a combined total of more than 12,800 markers and labels. On average, the algorithm correctly matches 96% of the labels and is robust against noisy input. It furthermore performs a sensitivity analysis and in this way computes a measure of confidence for each of the matches. We use this as the basis for an interactive system where the user’s effort is directed to checking those parts of the map where the algorithm is unsure; any corrections the user makes are propagated by the algorithm. We discuss a prototype of this system and statistically confirm that it successfully locates those areas on the map where the algorithm needs help.
- Mauricio Giraldo Arteaga. 2013. Historical map polygon and feature extractor. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on MapInteraction (MapInteract’13). ACM, 66--71. Google ScholarDigital Library
- Benedikt Budig and Thomas C. van Dijk. 2015. Active learning for classifying template matches in historical maps. In Proceedings of the 18th International Conference on Discovery Science (DS’15) (Lecture Notes in Computer Science), Vol. 9356. Springer, 33--47.Google Scholar
- Yao-Yi Chiang. 2015. Querying historical maps as a unified, structured, and linked spatiotemporal source. In Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL’15). ACM, 270--273. Google ScholarDigital Library
- Yao-Yi Chiang and Craig A. Knoblock. 2015. Recognizing text in raster maps. GeoInformatica 19, 1 (2015), 1--27. Google ScholarDigital Library
- Yao-Yi Chiang, Stefan Leyk, and Craig A. Knoblock. 2014. A survey of digital map processing techniques. ACM Computing Surveys (CSUR) 47, 1, Article 1 (2014), 44 pages. Google ScholarDigital Library
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). MIT Press. Google ScholarDigital Library
- Boris Epshtein, Eyal Ofek, and Yonatan Wexler. 2010. Detecting text in natural scenes with stroke width transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’10). IEEE, 2963--2970.Google ScholarCross Ref
- Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96). The AAAI Press, Menlo Park, California, 226--231. Google ScholarDigital Library
- Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 8 (2006), 861--874. Google ScholarDigital Library
- Christopher Fleet, Kimberly C. Kowal, and Petr Pridal. 2012. Georeferencer: Crowdsourced georeferencing for map library collections. D-Lib Magazine 18, 11/12 (2012).Google ScholarCross Ref
- Andrew V. Goldberg. 1997. An efficient implementation of a scaling minimum-cost flow algorithm. Journal of Algorithms 22 (1997), 1--29. Google ScholarDigital Library
- Winfried Höhn. 2013. Detecting arbitrarily oriented text labels in early maps. In Proceedings of the 6th Iberian Conference on Pattern Recognition and Image Analysis (LNCS), Vol. 7887. Springer, 424--432.Google ScholarCross Ref
- Winfried Höhn, Hans-Günter Schmidt, and Hendrik Schöneberg. 2013. Semiautomatic recognition and georeferencing of places in early maps. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13). ACM, 335--338. Google ScholarDigital Library
- David W. Hosmer Jr. and Stanley Lemeshow. 2004. Applied Logistic Regression. John Wiley 8 Sons.Google Scholar
- Paul Jaccard. 1912. The distribution of the flora in the alpine zone. New Phytologist 11, 2 (1912), 37--50.Google ScholarCross Ref
- Bernhard Jenny and Lorenz Hurni. 2011. Cultural heritage: Studying cartographic heritage: Analysis and visualization of geometric distortions. Computers 8 Graphics 35, 2 (2011), 402--411. Google ScholarDigital Library
- Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Springer, 85--103.Google Scholar
- Stefan Leyk, Ruedi Boesch, and Robert Weibel. 2006. Saliency and semantic processing: Extracting forest cover from historical topographic maps. Pattern Recognition 39, 5 (2006), 953--968. Google ScholarDigital Library
- Carlos A. B. Mello, Diogo C. Costa, and T. J. dos Santos. 2012. Automatic image segmentation of old topographic maps and floor plans. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC’12). IEEE, 132--137.Google ScholarCross Ref
- Reza Farrahi Moghaddam and Mohamed Cheriet. 2009. Application of multi-level classifiers and clustering for automatic word spotting in historical document images. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR’09). IEEE, 511--515. Google ScholarDigital Library
- Toni M. Rath and Raghavan Manmatha. 2003. Word image matching using dynamic time warping. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’03), Vol. 2. IEEE, II--521--II--527.Google Scholar
- Toni M. Rath and Raghavan Manmatha. 2007. Word spotting for historical documents. International Journal of Document Analysis and Recognition (IJDAR’07) 9, 2--4 (2007), 139--152. Google ScholarDigital Library
- Hendrik Schöneberg, Hans-Günter Schmidt, and Winfried Höhn. 2013. A scalable, distributed and dynamic workflow system for digitization processes. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13). ACM New York, NY, USA, 359--362. Google ScholarDigital Library
- Alexander Schrijver. 2003. Combinatorial Optimization: Polyhedra and Efficiency. Springer.Google Scholar
- Tenzing Shaw and Peter Bajcsy. 2011. Automation of digital historical map analyses. In Proceedings of the IS8T/SPIE Electronic Imaging 2011, Vol. 7869. SPIE.Google Scholar
- Rainer Simon, Elton Barker, Leif Isaksen, and Pau de Soto Cañamares. 2015. Linking early geospatial documents, one place at a time: Annotation of geographic documents with recogito. e-Perimetron 10, 2 (2015), 49--59.Google Scholar
- Rainer Simon, Bernhard Haslhofer, Werner Robitza, and Elaheh Momeni. 2011. Semantically augmented annotations in digitized map collections. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL’11). ACM New York, NY, USA, 199--202. Google ScholarDigital Library
- Rainer Simon, Peter Pilgerstorfer, Leif Isaksen, and Elton Barker. 2014. Towards semi-automatic annotation of toponyms on old maps. e-Perimetron 9, 3 (2014), 105--112.Google Scholar
- Kai Wang, Boris Babenko, and Serge Belongie. 2011. End-to-end scene text recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’11). IEEE, 1457--1464. Google ScholarDigital Library
- Jerod Weinman. 2013. Toponym recognition in historical maps by gazetteer alignment. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR’13). IEEE, 1044--1048. Google ScholarDigital Library
Index Terms
Matching Labels and Markers in Historical Maps: An Algorithm with Interactive Postprocessing
Recommendations
Querying historical maps as a unified, structured, and linked spatiotemporal source: vision paper
SIGSPATIAL '15: Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information SystemsHistorical spatiotemporal datasets are important for a variety of studies such as cancer and environmental epidemiology, urbanization, and landscape ecology. However, existing data sources typically contain only contemporary datasets. Historical maps ...
Matching labels and markers in historical maps: an algorithm with interactive postprocessing
MapInteract '14: Proceedings of the 2nd ACM SIGSPATIAL International Workshop on Interacting with MapsIn this paper we present an algorithmic system for determining the proper correspondence between place markers and their labels in historical maps. We assume that the locations of place markers (usually pictographs) and labels (pieces of text) have ...
Synthetic Map Generation to Provide Unlimited Training Data for Historical Map Text Detection
GEOAI '21: Proceedings of the 4th ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge DiscoveryMany historical map sheets are publicly available for studies that require long-term historical geographic data. The cartographic design of these maps includes a combination of map symbols and text labels. Automatically reading text labels from map ...
Comments