DOI: 10.1145/3025453.3026015
research-article
Open Access

The Effect of Population and "Structural" Biases on Social Media-based Algorithms: A Case Study in Geolocation Inference Across the Urban-Rural Spectrum

Published: 02 May 2017

ABSTRACT

Much research has shown that social media platforms have substantial population biases. However, very little is known about how these population biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms exhibit significantly worse performance for underrepresented populations (i.e. rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. However, we also show that some of this bias can be attributed to the design of algorithms themselves rather than population biases in the underlying data sources. For instance, in some cases, algorithms perform badly for rural users even when we substantially overcorrect for population biases by training exclusively on rural data. We discuss the implications of our findings for the design and study of social media-based algorithms.

    • Published in
      CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
      May 2017
      7138 pages
      ISBN:9781450346559
      DOI:10.1145/3025453

      Copyright © 2017 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
      Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%