ABSTRACT
Internet device graphs identify relationships between user-centric internet connected devices such as desktops, laptops, smartphones, tablets, gaming consoles, TV's, etc. The ability to create such graphs is compelling for online advertising, content customization, recommendation systems, security, and operations. We begin by describing an algorithm for generating a device graph based on IP-colocation, and then apply the algorithm to a corpus of over 2.5 trillion internet events collected over the period of six weeks in the United States. The resulting graph exhibits immense scale with greater than 7.3 billion edges (pair-wise relationships) between more than 1.2 billion nodes (devices), accounting for the vast majority of internet connected devices in the US. Next, we apply community detection algorithms to the graph resulting in a partitioning of internet devices into 100 million small communities representing physical households. We validate this partition with a unique ground truth dataset. We report on the characteristics of the graph and the communities. Lastly, we discuss the important issues of ethics and privacy that must be considered when creating and studying device graphs, and suggest further opportunities for device graph enrichment and application.
- Apache Pig. https://pig.apache.org. Accessed: 2016-02-10.Google Scholar
- comScore Privacy Policy. https://www.scorecardresearch.com/privacy.aspx. Accessed: 2017-02-17.Google Scholar
- Drawbridge. http://www.drawbridge.com/c/graph/. Accessed: 2017-02-17.Google Scholar
- The Koblenz Network Collection. http://konect.uni-koblenz.de/networks/. Accessed: 2017-02--15.Google Scholar
- Lotame Cross Device. http://www.lotamecrossdevice.com. Accessed: 2017-02-17.Google Scholar
- Prevalence of Network Address Port Translation (NAPT). https://www.ietf.org/proceedings/93/slides/slides-93-hopsrg-1.pdf. Accessed: 2016-09-19.Google Scholar
- Stanford network analysis project. http://snap.stanford.edu/data/. Accessed: 2017-02-15.Google Scholar
- The Tapad Device Graph. http://www.tapad.com/device-graph/, 2016.Google Scholar
- Ager, B., Chatzis, N., Feldmann, A., Sarrar, N., Uhlig, S., and Willinger, W. Anatomy of a Large European IXP. In Proceedings of ACM SIGCOMM conference (2012). Google ScholarDigital Library
- Augustin, B., Cuvellier, X., Orgogozo, B., Viger, F., Friedman, T., Latapy, M., Magnien, C., and Teixeira, R. Avoiding Traceroute Anomalies with Paris Traceroute. In Proceedings of ACM SIGCOMM Internet measurement conference (2006). Google ScholarDigital Library
- Augustin, B., Krishnamurthy, B., and Willinger, W. IXPs: Mapped? In Proceedings of ACM Internet measurement conference (2009). Google ScholarDigital Library
- Backstrom, L., Boldi, P., Rosa, M., Ugander, J., and Vigna, S. Four degrees of separation. In Proceedings of the 4th Annual ACM Web Science Conference (2012), ACM, pp. 33--42. Google ScholarDigital Library
- Backstrom, L., Huttenlocher, D., Kleinberg, J., and Lan, X. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proceedings of ACM SIGKDD (2006). Google ScholarDigital Library
- Bellovin, S. A Technique for Counting NATted Hosts. In Proceedings of ACM Internet Measurement Workshop (November 2002). Google ScholarDigital Library
- Blondel, V., Guillaume, J.-L., and Aynaud, T. http://sourceforge.net/projects/louvain/.Google Scholar
- Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. Fast Unfolding of Communities in Large Networks. Journal of Statistical Mechanics: Theory and Experiment 2008, 10 (2008), P10008. Google ScholarCross Ref
- Cahn, A., Alfeld, S., Barford, P., and Muthukrishnan, S. An Empirical Study of Web Cookies. In Proceedings of the 25th International Conference on World Wide Web (2016), International World Wide Web Conferences Steering Committee, pp. 891--901. Google ScholarDigital Library
- Campigotto, R., Céspedes, P. C., and Guillaume, J.-L. A generalized and adaptive method for community detection. arXiv preprint arXiv:1406.2518 (2014).Google Scholar
- Casado, M., and Freedman, M. Peering Through the Shroud: The Effect of Edge Opacity on IP-based Client Identification. In Proceedings of the USENIX NSDI (April 2007).Google Scholar
- Clauset, A., Newman, M. E., and Moore, C. Finding Community Structure in Very Large Networks. Physical review E 70, 6 (2004), 066111.Google Scholar
- D. Spielman, S. T. Nearly-linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems. In Proceedings of ACM Symposium on Theory of Computing (2004). Google ScholarDigital Library
- DiCioccio, L., Teixeira, R., and Rosenberg, C. Measuring Home Networks with HomeNet Profiler. In Proceedings of the Passive and Active Measurement Conference (March 2013). Google ScholarDigital Library
- Dittrich, D., and Kenneally, E. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. In US Department of Homeland Security Tech Report (August 2012).Google Scholar
- Durairajan, R., Sommers, J., and Barford, P. Layer 1-Informed Internet Topology Measurement. In Proceedings of ACM Internet Measurement Conference (November 2014). Google ScholarDigital Library
- Durumeric, Z., Wustrow, E., and Halderman, A. ZMap: Fast Internet-Wide Scanning and its Security Applications. In Proceedings of the USENIX Security Symposium (August 2013).Google Scholar
- Eriksson, B., Barford, P., Sommers, J., and Nowak, R. Inferring Unseen Components of the Internet Core. IEEE Journal on Selected Areas in Communications (2011). Google ScholarCross Ref
- Flake, G., Lawrence, S., Giles, C., and Coetzee, F. Self-organization and Identification of Web Communities. IEEE Computer 35, 3 (August 2002). Google ScholarDigital Library
- Fortunato, S. Community Detection in Graphs. Physics Reports 486, 3 (February 2010). Google ScholarCross Ref
- Fortunato, S., and Barthelemy, M. Resolution Limit in Community Detection. Proceedings of the National Academy of Sciences 104, 1 (2007), 36--41. Google ScholarCross Ref
- Greene, D., Doyle, D., and Cunningham, P. Tracking the Evolution of Communities in Dynamic Social Networks. In Advances in social networks analysis and mining (ASONAM), 2010 international conference on (2010), IEEE, pp. 176--183. Google ScholarDigital Library
- Harenberg, S., Bello, G., Gjeltema, L., Ranshous, S., Harlalka, J., Seay, R., Padmanabhan, K., and Samatova, N. Community Detection in Large-scale Networks: a Survey and Empirical Evaluation. Wiley Interdisciplinary Reviews: Computational Statistics 6, 6 (2014), 426--439. Google ScholarDigital Library
- Haynes, J., and Perisic, I. Mapping Search Relevance to Social Networks. In Proceedings of the 3rd Workshop on Social Network Mining and Analysis (2009), ACM, p. 2. Google ScholarDigital Library
- Heidemann, J., Pradkin, Y., Govindan, R., Papadopoulos, C., Bartlett, G., and Bannister, J. Census and Survey of the Visible Internet. In Proceedings of ACM Internet Measurement Conference (October 2008). Google ScholarDigital Library
- Kenneally, E., and Bailey, M. Cyber-security Research Ethics Dialogue and Strategy Workshop. ACM SIGCOMM Computer Communication Review 4, 2 (2013).Google Scholar
- Kohno, Y., Broido, A., and Claffy, K. Remote Physical Device Fingerprinting. IEEE Transactions on Dependable and Secure Computing 2, 2 (April 2005). Google ScholarDigital Library
- Krishnamurthy, B., and Wills, C. Privacy Diffusion on the Web: A Longitudinal Perspective. In Proceedings of the 18th International Conference on World Wide Web (New York, NY, USA, 2009), WWW '09, ACM, pp. 541--550. Google ScholarDigital Library
- Kumar, R., Novak, J., and Tomkind, A. Structure and Evolution of Online Social Networks. Springer, 2010. Google ScholarCross Ref
- Leskovec, J., and Sosivc, R. Snap: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1.Google ScholarDigital Library
- Maier, G., Schneider, F., and Feldmann, A. NAT Usage in Residential Broadband Networks. In Proceedings of the Passive and Active Measurement Conference (March 2011). Google ScholarCross Ref
- Mislove, A., Viswanath, B., Bummadi, K., and Druschel, P. You are Who You Know: Inferring User Profiles in Online Social Networks. In Proceedings of ACM International Conference on Web Search and Data Mining (February 2010). Google ScholarDigital Library
- Murtagh, F. A Survey of Recent Advances in Hierarchical Clustering Algorithms. Computer Journal 26, 4 (1983). Google ScholarCross Ref
- Newman, M. E. J. Fast Algorithm for Detecting Community Structure in Networks. Phys. Rev. E 69 (Jun 2004), 066133. Google ScholarCross Ref
- Pujol, J. M., Erramilli, V., and Rodriguez, P. Divide and Conquer: Partitioning Online Social Networks. arXiv preprint arXiv:0905.4918 (2009).Google Scholar
- Riedy, E. J., Meyerhenke, H., Ediger, D., and Bader, D. A. Parallel Community Detection for Massive Graphs. In International Conference on Parallel Processing and Applied Mathematics (2011), Springer, pp. 286--296.Google Scholar
- Roesner, F., Kohno, T., and Wetherall, D. Detecting and Defending Against Third-party Tracking on the Web. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2012), NSDI'12, USENIX Association, pp. 12--12.Google ScholarDigital Library
- Roughan, M., Tuke, S. J., and Maennel, O. Bigfoot, Sasquatch, the Yeti and other Missing Links: what we don't know about the AS Graph. In Proceedings of ACM Internet measurement conference (2008). Google ScholarDigital Library
- Sherwood, R., Bender, A., and Spring, N. Discarte: a Disjunctive Internet Cartographer. In Proceedings of ACM SIGCOMM conference (2008). Google ScholarDigital Library
- Sherwood, R., and Spring, N. Touring the Internet in a TCP Sidecar. In Proceedings of ACM Internet Measurement Conference (2006). Google ScholarDigital Library
- Spinelli, L., Crovella, M., and Eriksson, B. AliasCluster: A Lightweight Approach to Interface Disambiguation. In Proceedings of the Global Internet Symposium (2013).Google Scholar
- Spring, N., Mahajan, R., and Wetherall, D. Measuring ISP Topologies with Rocketfuel. ACM SIGCOMM conference (2002).Google Scholar
- Traud, A., Mucha, P., and Porter, M. Social Structure of Facebook Networks. Physica A: Statistical Mechanics and its Applications 391, 16 (August 2012). Google ScholarCross Ref
- Ugander, J., Karrer, B., Backstrom, L., and Marlow, C. The anatomy of the facebook social graph. arXiv preprint arXiv:1111.4503 (2011).Google Scholar
- von Luxburg, U. A Tutorial on Spectral Clustering. Statistics and Computing 17, 4 (December 2007). Google ScholarDigital Library
- Xie, Y., Yu, F., Achan, K., Gillum, E., Goldszmidt, M., and Wobber, T. How Dynamic are IP Addresses? In ACM SIGCOMM Computer Communication Review (2007), vol. 37, ACM, pp. 301--312. Google ScholarDigital Library
- Zhang, Y., Oliveira, R., Wang, Y., Su, S., Zhang, B., Bi, J., Zhang, H., and Zhang, L. A Framework to Quantify the Pitfalls of using Traceroute in AS-level Topology Measurement. IEEE Journal on Selected Areas in Communications (2011). Google ScholarCross Ref
Recommendations
Comparability Graphs Among Cover-Incomparability Graphs
Algorithms and Discrete Applied MathematicsAbstractComparability graphs and cover-incomparability graphs (C-I graphs) are two interesting classes of graphs from posets. Comparability graph of a poset is a graph with vertex set V and two vertices u and v are adjacent in V if u and v are ...
Collapsible graphs and Hamiltonian connectedness of line graphs
Thomassen conjectured that every 4-connected line graph is Hamiltonian. Chen and Lai [Z.-H. Chen, H.-J. Lai, Reduction techniques for super-Eulerian graphs and related topics-an update, in: Ku Tung-Hsin (Ed.), Combinatorics and Graph Theory, vol. 95, ...
Equimatchable Graphs on Surfaces
A graph G is equimatchable if each matching in G is a subset of a maximum-size matching and it is factor critical if G-v has a perfect matching for each vertex v of G. It is known that any 2-connected equimatchable graph is either bipartite or factor ...
Comments