DOI: 10.1145/3038912.3052665
research-article

Sampling from Social Networks with Attributes

Published: 03 April 2017

Abstract

Sampling from large networks is a fundamental challenge for social network research. In this paper, we explore how sensitive different sampling techniques (node sampling, edge sampling, random walk sampling, and snowball sampling) are when applied to social networks with attributes. We consider the special case of networks (i) that have one attribute with two values (e.g., male and female in the case of gender), (ii) in which the two groups differ in size (e.g., a male majority and a female minority), and (iii) in which nodes with the same or different attribute values attract or repel each other (i.e., homophilic or heterophilic behavior). We evaluate the sampling techniques with respect to how well they conserve the position of nodes and the visibility of groups in such networks. Experiments are conducted on both synthetic and empirical social networks. Our results provide evidence that network sampling techniques are highly sensitive with regard to capturing the expected centrality of nodes, and that their accuracy depends on relative group size differences and on the level of homophily observed in the network. We conclude that uninformed sampling from social networks with attributes can significantly impair the ability of researchers to draw valid conclusions about the centrality of nodes and the visibility or invisibility of groups in social networks.
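
To make the four sampling techniques named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The toy graph generator, the homophily parameter `h`, the group labels, and all function names are assumptions introduced here for illustration only. Each sampler returns a set of sampled nodes, and `minority_share` reports the fraction of minority nodes in that set, so the samples can be compared against the full network.

```python
import random
from collections import deque

def toy_graph(n=300, minority=0.2, h=0.8, m=2, seed=1):
    """Hypothetical two-group graph: a new node accepts a same-group
    candidate with probability h and a cross-group one with 1 - h."""
    rng = random.Random(seed)
    group = {v: int(rng.random() < minority) for v in range(n)}  # 1 = minority
    adj = {v: set() for v in range(n)}
    for v in range(1, n):
        targets = set()
        for _ in range(50 * m):  # bounded number of attempts
            if len(targets) == min(m, v):
                break
            u = rng.randrange(v)
            accept = h if group[u] == group[v] else 1 - h
            if rng.random() < accept:
                targets.add(u)
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
    return adj, group

def node_sample(adj, k, rng):
    """Pick k nodes uniformly at random."""
    return set(rng.sample(sorted(adj), k))

def edge_sample(adj, k, rng):
    """Pick random edges and keep their endpoints until about k nodes are seen."""
    edges = [(u, v) for u in sorted(adj) for v in sorted(adj[u]) if u < v]
    rng.shuffle(edges)
    nodes = set()
    for u, v in edges:
        if len(nodes) >= k:
            break
        nodes.update((u, v))
    return nodes  # may contain k + 1 nodes

def random_walk_sample(adj, k, rng, max_steps=100_000):
    """Walk from a random start, collecting visited nodes."""
    cur = rng.choice(sorted(adj))
    nodes = {cur}
    for _ in range(max_steps):
        if len(nodes) >= k:
            break
        nbrs = sorted(adj[cur])
        # Jump to a random node if the walker is on an isolated node.
        cur = rng.choice(nbrs) if nbrs else rng.choice(sorted(adj))
        nodes.add(cur)
    return nodes

def snowball_sample(adj, k, rng):
    """Breadth-first expansion from a single random seed node."""
    start = rng.choice(sorted(adj))
    nodes, queue = {start}, deque([start])
    while queue and len(nodes) < k:
        cur = queue.popleft()
        for u in sorted(adj[cur]):
            if u not in nodes and len(nodes) < k:
                nodes.add(u)
                queue.append(u)
    return nodes

def minority_share(nodes, group):
    """Fraction of the given nodes that belong to the minority group."""
    return sum(group[v] for v in nodes) / len(nodes)

adj, group = toy_graph()
rng = random.Random(7)
k = 60
samples = {
    "node": node_sample(adj, k, rng),
    "edge": edge_sample(adj, k, rng),
    "random walk": random_walk_sample(adj, k, rng),
    "snowball": snowball_sample(adj, k, rng),
}
print("full network", len(adj), round(minority_share(adj, group), 3))
for name, nodes in samples.items():
    print(name, len(nodes), round(minority_share(nodes, group), 3))
```

Varying `h` and `minority` in this sketch is one way to see how a sample's minority share can drift from the share in the full network. It does not reproduce the centrality and ranking measures evaluated in the paper.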

      Published In

      WWW '17: Proceedings of the 26th International Conference on World Wide Web
      April 2017
      1678 pages
      ISBN:9781450349130

      Sponsors

      • IW3C2: International World Wide Web Conference Committee

      Publisher

      International World Wide Web Conferences Steering Committee

      Republic and Canton of Geneva, Switzerland

      Author Tags

      1. homophily
      2. sampling bias
      3. sampling methods
      4. social networks

      Acceptance Rates

      WWW '17 Paper Acceptance Rate 164 of 966 submissions, 17%;
      Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
