DOI: 10.1145/2835776.2835820
research-article

Improving IP Geolocation using Query Logs

Published: 08 February 2016

ABSTRACT

IP geolocation databases map IP addresses to their geographical locations. These databases are important for several applications, such as local search engine relevance, credit card fraud protection, geotargeted advertising, and online content delivery. While they are the most popular method of geolocation, they can have low accuracy at the city level. In this paper, we evaluate and improve IP geolocation databases using data collected from search engine logs. We generate a large ground-truth dataset using real-time global positioning data extracted from search engine logs. We show that incorrect geolocation information can have a negative impact on implicit user metrics. Using this dataset, we measure the accuracy of three state-of-the-art commercial IP geolocation databases. We then introduce a technique to improve existing geolocation databases by mining explicit locations from query logs. We show significant accuracy gains in 44 to 49 out of the top 50 countries, depending on the IP geolocation database. Finally, we validate the approach with a large-scale A/B experiment that shows improvements in several user metrics.
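As an illustration of what measuring a database against GPS ground truth can look like, here is a minimal sketch. It is not the paper's method: the abstract does not state the error metric, so the great-circle (haversine) distance, the 50 km threshold, the function names, and the dictionary layout are all assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(db_locations, gps_locations, threshold_km=50.0):
    """Fraction of IPs whose database location is within threshold_km of the GPS fix.

    Both arguments are hypothetical dicts mapping an IP string to (lat, lon).
    Only IPs present in both dicts are scored. The 50 km default is an assumed
    stand-in for "city level", not a value taken from the paper.
    """
    shared = [ip for ip in gps_locations if ip in db_locations]
    if not shared:
        return 0.0
    hits = sum(
        1 for ip in shared
        if haversine_km(*db_locations[ip], *gps_locations[ip]) <= threshold_km
    )
    return hits / len(shared)


# Toy data using documentation-range IPs (RFC 5737); coordinates are approximate.
SEATTLE = (47.6062, -122.3321)
PORTLAND = (45.5152, -122.6784)
db = {"192.0.2.1": SEATTLE, "192.0.2.2": PORTLAND}
gps = {"192.0.2.1": SEATTLE, "192.0.2.2": SEATTLE}
score = accuracy_within(db, gps)
```

In this toy data the second IP is placed in Portland by the database while its GPS fix is in Seattle, so only one of the two IPs counts as correct at the assumed threshold.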


      Published in

        WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining
        February 2016
        746 pages
        ISBN: 9781450337168
        DOI: 10.1145/2835776

        Copyright © 2016 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        WSDM '16 paper acceptance rate: 67 of 368 submissions (18%). Overall acceptance rate: 498 of 2,863 submissions (17%).
