Abstract
This article investigates the problem of geosocial similarity among users of online social networks, based on the locations of their activities (e.g., posting messages or photographs). Finding pairs of geosocially similar users, or detecting that two sets of activity locations belong to the same user, has important applications in privacy protection, recommendation systems, urban planning, and public health, among others. The article explains and shows empirically that common distance measures between sets of locations are inadequate for determining geosocial similarity. It introduces two novel distance measures between sets of locations. The first is the mutually nearest distance, which is based on computing a matching between the two sets. The second uses a quad-tree index; it is highly scalable but incurs the overhead of creating and maintaining the index. Algorithms with optimization techniques are developed for computing the two distance measures and for finding the k most similar users of a given user. Extensive experiments using geotagged messages from Twitter show that the new distance measures are both more accurate and more efficient than existing ones.
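To make the contrast between a common set distance and a matching-based one concrete, here is a minimal Python sketch. It is an illustration only, not the article's definition: the abstract does not specify how the mutually nearest distance is computed, so the pairing rule (two points are matched when each is the other's nearest neighbor) and the aggregation (mean distance over matched pairs) are assumptions, as are all function names. Planar Euclidean distance is used for brevity; real latitude/longitude data would need a geodesic distance.

```python
import math


def nearest_index(p, points):
    """Index of the point in `points` closest to p (planar Euclidean distance)."""
    return min(range(len(points)), key=lambda i: math.dist(p, points[i]))


def mutually_nearest_pairs(a, b):
    """Pairs (i, j) such that b[j] is the nearest point to a[i] and vice versa.

    Hypothetical pairing rule chosen for illustration; the article may
    define its matching differently.
    """
    pairs = []
    for i, p in enumerate(a):
        j = nearest_index(p, b)
        if nearest_index(b[j], a) == i:
            pairs.append((i, j))
    return pairs


def mean_matched_distance(a, b):
    """Mean distance over mutually nearest pairs (assumed aggregation)."""
    pairs = mutually_nearest_pairs(a, b)
    if not pairs:
        return math.inf
    return sum(math.dist(a[i], b[j]) for i, j in pairs) / len(pairs)


def hausdorff(a, b):
    """Classic symmetric Hausdorff distance between two finite point sets."""
    d_ab = max(min(math.dist(p, q) for q in b) for p in a)
    d_ba = max(min(math.dist(p, q) for p in a) for q in b)
    return max(d_ab, d_ba)


# Two activity sets that coincide except for one far-away post in user_a.
user_a = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
user_b = [(0.0, 0.1), (1.0, 0.1)]

print(mutually_nearest_pairs(user_a, user_b))
print(mean_matched_distance(user_a, user_b))
print(hausdorff(user_a, user_b))
```

In this toy input the single distant point dominates the Hausdorff distance, while the matching-based value reflects only the two pairs of nearby locations. That sensitivity to outliers is one plausible reason a common measure could misjudge geosocial similarity; the article's own argument and evidence may differ.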