skip to main content
research-article

Matching Labels and Markers in Historical Maps: An Algorithm with Interactive Postprocessing

Published:19 November 2016Publication History
Skip Abstract Section

Abstract

In this article, we present an algorithmic system for determining the proper correspondence between place markers and their labels in historical maps. We assume that the locations of place markers (usually pictographs) and labels (pieces of text) have already been determined -- either algorithmically or by hand -- and we want to match the labels to the markers. This time-consuming step in the digitization process of historical maps is nontrivial even for humans but provides valuable metadata (e.g., when subsequently georeferencing the map). To speed up this process, we model the problem in terms of combinatorial optimization, solve that problem efficiently, and show how user interaction can be used to improve the quality of the results. We also consider a version of the model where we are given label fragments and additionally have to decide which fragments go together. We show that this problem is NP-hard. However, we give a polynomial-time algorithm for a restricted version of this fragment assignment problem. We have implemented the algorithm for the main problem and tested it on a manually extracted ground truth for eight historical maps with a combined total of more than 12,800 markers and labels. On average, the algorithm correctly matches 96% of the labels and is robust against noisy input. It furthermore performs a sensitivity analysis and in this way computes a measure of confidence for each of the matches. We use this as the basis for an interactive system where the user’s effort is directed to checking those parts of the map where the algorithm is unsure; any corrections the user makes are propagated by the algorithm. We discuss a prototype of this system and statistically confirm that it successfully locates those areas on the map where the algorithm needs help.

References

  1. Mauricio Giraldo Arteaga. 2013. Historical map polygon and feature extractor. In Proceedings of the 1st ACM SIGSPATIAL International Workshop on MapInteraction (MapInteract’13). ACM, 66--71. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Benedikt Budig and Thomas C. van Dijk. 2015. Active learning for classifying template matches in historical maps. In Proceedings of the 18th International Conference on Discovery Science (DS’15) (Lecture Notes in Computer Science), Vol. 9356. Springer, 33--47.Google ScholarGoogle Scholar
  3. Yao-Yi Chiang. 2015. Querying historical maps as a unified, structured, and linked spatiotemporal source. In Proceedings of the 23rd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL’15). ACM, 270--273. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Yao-Yi Chiang and Craig A. Knoblock. 2015. Recognizing text in raster maps. GeoInformatica 19, 1 (2015), 1--27. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Yao-Yi Chiang, Stefan Leyk, and Craig A. Knoblock. 2014. A survey of digital map processing techniques. ACM Computing Surveys (CSUR) 47, 1, Article 1 (2014), 44 pages. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). MIT Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Boris Epshtein, Eyal Ofek, and Yonatan Wexler. 2010. Detecting text in natural scenes with stroke width transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’10). IEEE, 2963--2970.Google ScholarGoogle ScholarCross RefCross Ref
  8. Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96). The AAAI Press, Menlo Park, California, 226--231. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 8 (2006), 861--874. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Christopher Fleet, Kimberly C. Kowal, and Petr Pridal. 2012. Georeferencer: Crowdsourced georeferencing for map library collections. D-Lib Magazine 18, 11/12 (2012).Google ScholarGoogle ScholarCross RefCross Ref
  11. Andrew V. Goldberg. 1997. An efficient implementation of a scaling minimum-cost flow algorithm. Journal of Algorithms 22 (1997), 1--29. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Winfried Höhn. 2013. Detecting arbitrarily oriented text labels in early maps. In Proceedings of the 6th Iberian Conference on Pattern Recognition and Image Analysis (LNCS), Vol. 7887. Springer, 424--432.Google ScholarGoogle ScholarCross RefCross Ref
  13. Winfried Höhn, Hans-Günter Schmidt, and Hendrik Schöneberg. 2013. Semiautomatic recognition and georeferencing of places in early maps. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13). ACM, 335--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. David W. Hosmer Jr. and Stanley Lemeshow. 2004. Applied Logistic Regression. John Wiley 8 Sons.Google ScholarGoogle Scholar
  15. Paul Jaccard. 1912. The distribution of the flora in the alpine zone. New Phytologist 11, 2 (1912), 37--50.Google ScholarGoogle ScholarCross RefCross Ref
  16. Bernhard Jenny and Lorenz Hurni. 2011. Cultural heritage: Studying cartographic heritage: Analysis and visualization of geometric distortions. Computers 8 Graphics 35, 2 (2011), 402--411. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Richard M. Karp. 1972. Reducibility among combinatorial problems. In Complexity of Computer Computations. Springer, 85--103.Google ScholarGoogle Scholar
  18. Stefan Leyk, Ruedi Boesch, and Robert Weibel. 2006. Saliency and semantic processing: Extracting forest cover from historical topographic maps. Pattern Recognition 39, 5 (2006), 953--968. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Carlos A. B. Mello, Diogo C. Costa, and T. J. dos Santos. 2012. Automatic image segmentation of old topographic maps and floor plans. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC’12). IEEE, 132--137.Google ScholarGoogle ScholarCross RefCross Ref
  20. Reza Farrahi Moghaddam and Mohamed Cheriet. 2009. Application of multi-level classifiers and clustering for automatic word spotting in historical document images. In Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR’09). IEEE, 511--515. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Toni M. Rath and Raghavan Manmatha. 2003. Word image matching using dynamic time warping. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’03), Vol. 2. IEEE, II--521--II--527.Google ScholarGoogle Scholar
  22. Toni M. Rath and Raghavan Manmatha. 2007. Word spotting for historical documents. International Journal of Document Analysis and Recognition (IJDAR’07) 9, 2--4 (2007), 139--152. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Hendrik Schöneberg, Hans-Günter Schmidt, and Winfried Höhn. 2013. A scalable, distributed and dynamic workflow system for digitization processes. In Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’13). ACM New York, NY, USA, 359--362. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Alexander Schrijver. 2003. Combinatorial Optimization: Polyhedra and Efficiency. Springer.Google ScholarGoogle Scholar
  25. Tenzing Shaw and Peter Bajcsy. 2011. Automation of digital historical map analyses. In Proceedings of the IS8T/SPIE Electronic Imaging 2011, Vol. 7869. SPIE.Google ScholarGoogle Scholar
  26. Rainer Simon, Elton Barker, Leif Isaksen, and Pau de Soto Cañamares. 2015. Linking early geospatial documents, one place at a time: Annotation of geographic documents with recogito. e-Perimetron 10, 2 (2015), 49--59.Google ScholarGoogle Scholar
  27. Rainer Simon, Bernhard Haslhofer, Werner Robitza, and Elaheh Momeni. 2011. Semantically augmented annotations in digitized map collections. In Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (JCDL’11). ACM New York, NY, USA, 199--202. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Rainer Simon, Peter Pilgerstorfer, Leif Isaksen, and Elton Barker. 2014. Towards semi-automatic annotation of toponyms on old maps. e-Perimetron 9, 3 (2014), 105--112.Google ScholarGoogle Scholar
  29. Kai Wang, Boris Babenko, and Serge Belongie. 2011. End-to-end scene text recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV’11). IEEE, 1457--1464. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Jerod Weinman. 2013. Toponym recognition in historical maps by gazetteer alignment. In Proceedings of the 12th International Conference on Document Analysis and Recognition (ICDAR’13). IEEE, 1044--1048. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Matching Labels and Markers in Historical Maps: An Algorithm with Interactive Postprocessing

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on Spatial Algorithms and Systems
          ACM Transactions on Spatial Algorithms and Systems  Volume 2, Issue 4
          Regular Papers and SIGSPATIAL Paper
          November 2016
          116 pages
          ISSN:2374-0353
          EISSN:2374-0361
          DOI:10.1145/3014433
          • Editor:
          • Hanan Samet
          Issue’s Table of Contents

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 19 November 2016
          • Accepted: 1 September 2016
          • Revised: 1 June 2016
          • Received: 1 December 2015
          Published in tsas Volume 2, Issue 4

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader