research-article

Open Access

The Effect of Population and "Structural" Biases on Social Media-based Algorithms: A Case Study in Geolocation Inference Across the Urban-Rural Spectrum

Authors:
Isaac Johnson, Northwestern University, Evanston, USA
Connor McMahon, University of Minnesota, Minneapolis, MN, USA
Johannes Schöning, University of Bremen, Bremen, Germany
Brent Hecht, Northwestern University, Evanston, IL, USA

CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, May 2017, Pages 1167–1178. https://doi.org/10.1145/3025453.3026015

Published: 02 May 2017

ABSTRACT

Much research has shown that social media platforms have substantial population biases. However, very little is known about how these population biases affect the many algorithms that rely on social media data. Focusing on the case study of geolocation inference algorithms and their performance across the urban-rural spectrum, we establish that these algorithms exhibit significantly worse performance for underrepresented populations (i.e. rural users). We further establish that this finding is robust across both text- and network-based algorithm designs. However, we also show that some of this bias can be attributed to the design of algorithms themselves rather than population biases in the underlying data sources. For instance, in some cases, algorithms perform badly for rural users even when we substantially overcorrect for population biases by training exclusively on rural data. We discuss the implications of our findings for the design and study of social media-based algorithms.
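
One way to make the abstract's central comparison concrete is a stratified evaluation: score a geolocation inference algorithm separately for urban and rural users instead of reporting a single pooled number. The sketch below is illustrative only and is not the authors' code. The function names (`haversine_km`, `accuracy_by_group`), the 161 km (roughly 100 miles) hit threshold, the two-way urban/rural labels, and the toy records are all assumptions made for this example, not values from the paper.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def accuracy_by_group(records, threshold_km=161.0):
    """Fraction of predictions within threshold_km of the truth, per group.

    records: iterable of (group, (true_lat, true_lon), (pred_lat, pred_lon)).
    threshold_km: assumed hit radius (hypothetical default, about 100 miles).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, true_loc, pred_loc in records:
        totals[group] += 1
        if haversine_km(*true_loc, *pred_loc) <= threshold_km:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Made-up records for illustration; these are not data from the paper.
toy_records = [
    ("urban", (41.88, -87.63), (41.90, -87.65)),    # prediction a few km away
    ("urban", (40.71, -74.01), (40.73, -73.99)),    # prediction a few km away
    ("rural", (44.00, -100.00), (44.05, -100.10)),  # prediction a few km away
    ("rural", (46.00, -105.00), (41.88, -87.63)),   # prediction far from truth
]

result = accuracy_by_group(toy_records)
print(result)
```

A pooled accuracy over the same four toy records would be 0.75, which hides the gap between the two groups; reporting per-group figures is what surfaces it.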

Index Terms

Human-centered computing

Published in
CHI '17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems
May 2017
7138 pages
ISBN:9781450346559
DOI:10.1145/3025453
General Chairs: Gloria Mark (University of California Irvine), Susan Fussell (Cornell University)
Program Chairs: Cliff Lampe (University of Michigan), m.c. schraefel (University of Southampton), Juan Pablo Hourcade (University of Iowa), Caroline Appert (Université Paris-Sud), Daniel Wigdor (University of Toronto)
Copyright © 2017 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
algorithmic accountability
geolocation inference
population bias
social media
Acceptance Rates
CHI '17 Paper Acceptance Rate: 600 of 2,400 submissions, 25%
Overall Acceptance Rate: 6,199 of 26,314 submissions, 24%
Article Metrics
- Total Citations: 39
- Total Downloads: 2,555
- Downloads (Last 12 months): 399
- Downloads (Last 6 weeks): 56