ABSTRACT
We investigate sampling techniques in unbalanced heterogeneous bipartite graphs (UHBGs), which have wide applications in real world web-scale social networks. We propose random walked-based link sampling and stratified sampling for UHBGs and show that they have advantages over generic random walk samplers. In addition, each sampler's node degree distribution parameter estimator statistic is analytically derived to be used as a quality indicator. In the experiments, we apply the two sampling techniques, with a baseline node sampling method, to both synthetic and real Facebook data. The experimental results show that random walk-based stratified sampler has significant advantage over node sampler and link sampler on UHBGs.
- M. Gjoka, M. Kurant, C. Butts, and A. Markopoulou. Walking in facebook: A case study of unbiased sampling of osns. In INFOCOM, March 2010. Google ScholarDigital Library
- J. Leskovec and C. Faloutsos. Sampling from large graphs. KDD '06, pages 631--636. ACM, 2006. Google ScholarDigital Library
- L. Liu, J. Tang, J. Han, and S. Yang. Learning influence from heterogeneous social networks. Data Min. Knowl. Discov., 25(3):511--544, 2012.Google ScholarCross Ref
- L. Lovasz. Random walks on graphs: A survey, 1993.Google Scholar
- N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculations by fast computing machines. The Journal of Chemical Physics, 21(6):1087--1092, 1953.Google ScholarCross Ref
- M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 64:026118, Jul 2001.Google ScholarCross Ref
- M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. PNAS, 99(Suppl 1):2566--2572, 2002.Google ScholarCross Ref
- Y. Xie, Z. Chen, K. Zhang, Y. Cheng, A. Agrawal, W. keng Liao, and A. Choudhary. Detecting and tracking disease outbreaks in real-time through social media. In IJCAI, 2013. Google ScholarDigital Library
- K. Zhang, Z. Chen, Y. Cheng, Y. Xie, D. Downey, A. Agrawal, W. keng Liao, and A. Choudhary. A probabilistic graphical model for brand reputation assessment in social networks. In ASONAM '13, 2013.Google ScholarDigital Library
Index Terms
- Random walk-based graphical sampling in unbalanced heterogeneous bipartite social graphs
Recommendations
Sampling online social networks by random walk with indirect jumps
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random ...
Sampling hypergraphs via joint unbiased random walk
AbstractHypergraphs are instrumental in modeling complex relational systems that encompass a wide spectrum of high-order interactions among components. One prevalent analysis task is the properties estimation of large-scale hypergraphs, which involves ...
Network Sampling Using k-hop Random Walks for Heterogeneous Network Embedding
CODS-COMAD '19: Proceedings of the ACM India Joint International Conference on Data Science and Management of DataCapturing neighborhood information by generating node sequences or node samples is an important prerequisite step for many of the neural network embedding approaches. Majority of the recent studies on neural network embedding exploit random walk as a ...
Comments