ABSTRACT
IP geolocation databases map IP addresses to their geographical locations. These databases are important for several applications, such as local search engine relevance, credit card fraud protection, geotargeted advertising, and online content delivery. Although they are the most popular method of geolocation, they can have low accuracy at the city level. In this paper we evaluate and improve IP geolocation databases using data collected from search engine logs. We generate a large ground-truth dataset from real-time global positioning data extracted from these logs, and we show that incorrect geolocation information can have a negative impact on implicit user metrics. Using the dataset, we measure the accuracy of three state-of-the-art commercial IP geolocation databases. We then introduce a technique to improve existing geolocation databases by mining explicit locations from query logs. We show significant accuracy gains in 44 to 49 of the top 50 countries, depending on the IP geolocation database. Finally, we validate the approach with a large-scale A/B experiment that shows improvements in several user metrics.
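As an illustration of what measuring database accuracy against GPS ground truth can look like, the following is a minimal sketch and not the paper's actual evaluation code. It assumes that error is measured as the great-circle distance between the database's coordinates and the GPS coordinates, and that a lookup counts as correct when it falls within a threshold. The 50 km threshold, the function names, and the sample IP addresses (taken from documentation-reserved ranges) are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(db_lookup, ground_truth, threshold_km=50.0):
    """Fraction of ground-truth IPs that the database places within threshold_km.

    db_lookup:    dict mapping IP -> (lat, lon) as reported by the database
    ground_truth: dict mapping IP -> (lat, lon) from GPS
    IPs missing from the database are counted as misses (an assumption).
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for ip, (true_lat, true_lon) in ground_truth.items():
        predicted = db_lookup.get(ip)
        if predicted is None:
            continue
        if haversine_km(true_lat, true_lon, predicted[0], predicted[1]) <= threshold_km:
            hits += 1
    return hits / len(ground_truth)


# Hypothetical sample data for illustration only.
ground_truth = {
    "192.0.2.1": (47.6062, -122.3321),     # GPS fix in Seattle
    "198.51.100.7": (45.5152, -122.6784),  # GPS fix in Portland
    "203.0.113.9": (49.2827, -123.1207),   # GPS fix in Vancouver
}
db_lookup = {
    "192.0.2.1": (47.61, -122.33),     # database agrees with the GPS fix
    "198.51.100.7": (47.61, -122.33),  # database places the IP in the wrong city
    # "203.0.113.9" is absent from the database
}

city_level_accuracy = accuracy_within(db_lookup, ground_truth)
```

Running the same function once per database and per country would give the kind of per-country comparison the abstract describes, under the assumptions stated above.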