ABSTRACT
Much research has shown that social media platforms have substantial population biases. However, very little is known about how these population biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms exhibit significantly worse performance for underrepresented populations (i.e., rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. However, we also show that some of this bias can be attributed to the design of the algorithms themselves rather than to population biases in the underlying data sources. For instance, in some cases, algorithms perform poorly for rural users even when we substantially overcorrect for population biases by training exclusively on rural data. We discuss the implications of our findings for the design and study of social media-based algorithms.
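To make the kind of comparison described above concrete, the following is a minimal sketch of measuring a geolocation algorithm's accuracy separately for urban and rural users. It is not the authors' evaluation code. The records are made-up toy data, the two group labels stand in for whatever urban-rural classification is used, and the 100 km threshold and the names `haversine_km` and `accuracy_by_group` are assumptions introduced only for illustration.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def accuracy_by_group(records, threshold_km=100.0):
    """Share of users per group whose predicted location falls within
    threshold_km of the true location.

    records: iterable of (group, (true_lat, true_lon), (pred_lat, pred_lon)).
    threshold_km is a hypothetical cutoff chosen for this sketch.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, true_loc, pred_loc in records:
        totals[group] += 1
        if haversine_km(*true_loc, *pred_loc) <= threshold_km:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Toy records (hypothetical): group label, true location, predicted location.
records = [
    ("urban", (41.88, -87.63), (41.90, -87.65)),
    ("urban", (40.71, -74.01), (40.73, -73.99)),
    ("rural", (44.37, -100.35), (41.88, -87.63)),   # predicted far away
    ("rural", (46.60, -112.04), (46.59, -112.02)),
]

result = accuracy_by_group(records)
print(result)
```

Comparing the per-group values, rather than a single overall accuracy, is what exposes a gap between well-represented and underrepresented users. The same function could be run on a model trained only on rural data to check whether the gap persists.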
Index Terms
- The Effect of Population and "Structural" Biases on Social Media-based Algorithms: A Case Study in Geolocation Inference Across the Urban-Rural Spectrum