skip to main content
10.1145/3097983.3098114acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Internet Device Graphs

Published:13 August 2017Publication History

ABSTRACT

Internet device graphs identify relationships between user-centric internet connected devices such as desktops, laptops, smartphones, tablets, gaming consoles, TV's, etc. The ability to create such graphs is compelling for online advertising, content customization, recommendation systems, security, and operations. We begin by describing an algorithm for generating a device graph based on IP-colocation, and then apply the algorithm to a corpus of over 2.5 trillion internet events collected over the period of six weeks in the United States. The resulting graph exhibits immense scale with greater than 7.3 billion edges (pair-wise relationships) between more than 1.2 billion nodes (devices), accounting for the vast majority of internet connected devices in the US. Next, we apply community detection algorithms to the graph resulting in a partitioning of internet devices into 100 million small communities representing physical households. We validate this partition with a unique ground truth dataset. We report on the characteristics of the graph and the communities. Lastly, we discuss the important issues of ethics and privacy that must be considered when creating and studying device graphs, and suggest further opportunities for device graph enrichment and application.

References

  1. Apache Pig. https://pig.apache.org. Accessed: 2016-02-10.Google ScholarGoogle Scholar
  2. comScore Privacy Policy. https://www.scorecardresearch.com/privacy.aspx. Accessed: 2017-02-17.Google ScholarGoogle Scholar
  3. Drawbridge. http://www.drawbridge.com/c/graph/. Accessed: 2017-02-17.Google ScholarGoogle Scholar
  4. The Koblenz Network Collection. http://konect.uni-koblenz.de/networks/. Accessed: 2017-02--15.Google ScholarGoogle Scholar
  5. Lotame Cross Device. http://www.lotamecrossdevice.com. Accessed: 2017-02-17.Google ScholarGoogle Scholar
  6. Prevalence of Network Address Port Translation (NAPT). https://www.ietf.org/proceedings/93/slides/slides-93-hopsrg-1.pdf. Accessed: 2016-09-19.Google ScholarGoogle Scholar
  7. Stanford network analysis project. http://snap.stanford.edu/data/. Accessed: 2017-02-15.Google ScholarGoogle Scholar
  8. The Tapad Device Graph. http://www.tapad.com/device-graph/, 2016.Google ScholarGoogle Scholar
  9. Ager, B., Chatzis, N., Feldmann, A., Sarrar, N., Uhlig, S., and Willinger, W. Anatomy of a Large European IXP. In Proceedings of ACM SIGCOMM conference (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Augustin, B., Cuvellier, X., Orgogozo, B., Viger, F., Friedman, T., Latapy, M., Magnien, C., and Teixeira, R. Avoiding Traceroute Anomalies with Paris Traceroute. In Proceedings of ACM SIGCOMM Internet measurement conference (2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Augustin, B., Krishnamurthy, B., and Willinger, W. IXPs: Mapped? In Proceedings of ACM Internet measurement conference (2009). Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Backstrom, L., Boldi, P., Rosa, M., Ugander, J., and Vigna, S. Four degrees of separation. In Proceedings of the 4th Annual ACM Web Science Conference (2012), ACM, pp. 33--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Backstrom, L., Huttenlocher, D., Kleinberg, J., and Lan, X. Group Formation in Large Social Networks: Membership, Growth, and Evolution. In Proceedings of ACM SIGKDD (2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Bellovin, S. A Technique for Counting NATted Hosts. In Proceedings of ACM Internet Measurement Workshop (November 2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Blondel, V., Guillaume, J.-L., and Aynaud, T. http://sourceforge.net/projects/louvain/.Google ScholarGoogle Scholar
  16. Blondel, V. D., Guillaume, J.-L., Lambiotte, R., and Lefebvre, E. Fast Unfolding of Communities in Large Networks. Journal of Statistical Mechanics: Theory and Experiment 2008, 10 (2008), P10008. Google ScholarGoogle ScholarCross RefCross Ref
  17. Cahn, A., Alfeld, S., Barford, P., and Muthukrishnan, S. An Empirical Study of Web Cookies. In Proceedings of the 25th International Conference on World Wide Web (2016), International World Wide Web Conferences Steering Committee, pp. 891--901. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Campigotto, R., Céspedes, P. C., and Guillaume, J.-L. A generalized and adaptive method for community detection. arXiv preprint arXiv:1406.2518 (2014).Google ScholarGoogle Scholar
  19. Casado, M., and Freedman, M. Peering Through the Shroud: The Effect of Edge Opacity on IP-based Client Identification. In Proceedings of the USENIX NSDI (April 2007).Google ScholarGoogle Scholar
  20. Clauset, A., Newman, M. E., and Moore, C. Finding Community Structure in Very Large Networks. Physical review E 70, 6 (2004), 066111.Google ScholarGoogle Scholar
  21. D. Spielman, S. T. Nearly-linear Time Algorithms for Graph Partitioning, Graph Sparsification, and Solving Linear Systems. In Proceedings of ACM Symposium on Theory of Computing (2004). Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. DiCioccio, L., Teixeira, R., and Rosenberg, C. Measuring Home Networks with HomeNet Profiler. In Proceedings of the Passive and Active Measurement Conference (March 2013). Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Dittrich, D., and Kenneally, E. The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research. In US Department of Homeland Security Tech Report (August 2012).Google ScholarGoogle Scholar
  24. Durairajan, R., Sommers, J., and Barford, P. Layer 1-Informed Internet Topology Measurement. In Proceedings of ACM Internet Measurement Conference (November 2014). Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Durumeric, Z., Wustrow, E., and Halderman, A. ZMap: Fast Internet-Wide Scanning and its Security Applications. In Proceedings of the USENIX Security Symposium (August 2013).Google ScholarGoogle Scholar
  26. Eriksson, B., Barford, P., Sommers, J., and Nowak, R. Inferring Unseen Components of the Internet Core. IEEE Journal on Selected Areas in Communications (2011). Google ScholarGoogle ScholarCross RefCross Ref
  27. Flake, G., Lawrence, S., Giles, C., and Coetzee, F. Self-organization and Identification of Web Communities. IEEE Computer 35, 3 (August 2002). Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Fortunato, S. Community Detection in Graphs. Physics Reports 486, 3 (February 2010). Google ScholarGoogle ScholarCross RefCross Ref
  29. Fortunato, S., and Barthelemy, M. Resolution Limit in Community Detection. Proceedings of the National Academy of Sciences 104, 1 (2007), 36--41. Google ScholarGoogle ScholarCross RefCross Ref
  30. Greene, D., Doyle, D., and Cunningham, P. Tracking the Evolution of Communities in Dynamic Social Networks. In Advances in social networks analysis and mining (ASONAM), 2010 international conference on (2010), IEEE, pp. 176--183. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Harenberg, S., Bello, G., Gjeltema, L., Ranshous, S., Harlalka, J., Seay, R., Padmanabhan, K., and Samatova, N. Community Detection in Large-scale Networks: a Survey and Empirical Evaluation. Wiley Interdisciplinary Reviews: Computational Statistics 6, 6 (2014), 426--439. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Haynes, J., and Perisic, I. Mapping Search Relevance to Social Networks. In Proceedings of the 3rd Workshop on Social Network Mining and Analysis (2009), ACM, p. 2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Heidemann, J., Pradkin, Y., Govindan, R., Papadopoulos, C., Bartlett, G., and Bannister, J. Census and Survey of the Visible Internet. In Proceedings of ACM Internet Measurement Conference (October 2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Kenneally, E., and Bailey, M. Cyber-security Research Ethics Dialogue and Strategy Workshop. ACM SIGCOMM Computer Communication Review 4, 2 (2013).Google ScholarGoogle Scholar
  35. Kohno, Y., Broido, A., and Claffy, K. Remote Physical Device Fingerprinting. IEEE Transactions on Dependable and Secure Computing 2, 2 (April 2005). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Krishnamurthy, B., and Wills, C. Privacy Diffusion on the Web: A Longitudinal Perspective. In Proceedings of the 18th International Conference on World Wide Web (New York, NY, USA, 2009), WWW '09, ACM, pp. 541--550. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Kumar, R., Novak, J., and Tomkind, A. Structure and Evolution of Online Social Networks. Springer, 2010. Google ScholarGoogle ScholarCross RefCross Ref
  38. Leskovec, J., and Sosivc, R. Snap: A general-purpose network analysis and graph-mining library. ACM Transactions on Intelligent Systems and Technology (TIST) 8, 1 (2016), 1.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Maier, G., Schneider, F., and Feldmann, A. NAT Usage in Residential Broadband Networks. In Proceedings of the Passive and Active Measurement Conference (March 2011). Google ScholarGoogle ScholarCross RefCross Ref
  40. Mislove, A., Viswanath, B., Bummadi, K., and Druschel, P. You are Who You Know: Inferring User Profiles in Online Social Networks. In Proceedings of ACM International Conference on Web Search and Data Mining (February 2010). Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Murtagh, F. A Survey of Recent Advances in Hierarchical Clustering Algorithms. Computer Journal 26, 4 (1983). Google ScholarGoogle ScholarCross RefCross Ref
  42. Newman, M. E. J. Fast Algorithm for Detecting Community Structure in Networks. Phys. Rev. E 69 (Jun 2004), 066133. Google ScholarGoogle ScholarCross RefCross Ref
  43. Pujol, J. M., Erramilli, V., and Rodriguez, P. Divide and Conquer: Partitioning Online Social Networks. arXiv preprint arXiv:0905.4918 (2009).Google ScholarGoogle Scholar
  44. Riedy, E. J., Meyerhenke, H., Ediger, D., and Bader, D. A. Parallel Community Detection for Massive Graphs. In International Conference on Parallel Processing and Applied Mathematics (2011), Springer, pp. 286--296.Google ScholarGoogle Scholar
  45. Roesner, F., Kohno, T., and Wetherall, D. Detecting and Defending Against Third-party Tracking on the Web. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (Berkeley, CA, USA, 2012), NSDI'12, USENIX Association, pp. 12--12.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Roughan, M., Tuke, S. J., and Maennel, O. Bigfoot, Sasquatch, the Yeti and other Missing Links: what we don't know about the AS Graph. In Proceedings of ACM Internet measurement conference (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Sherwood, R., Bender, A., and Spring, N. Discarte: a Disjunctive Internet Cartographer. In Proceedings of ACM SIGCOMM conference (2008). Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Sherwood, R., and Spring, N. Touring the Internet in a TCP Sidecar. In Proceedings of ACM Internet Measurement Conference (2006). Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Spinelli, L., Crovella, M., and Eriksson, B. AliasCluster: A Lightweight Approach to Interface Disambiguation. In Proceedings of the Global Internet Symposium (2013).Google ScholarGoogle Scholar
  50. Spring, N., Mahajan, R., and Wetherall, D. Measuring ISP Topologies with Rocketfuel. ACM SIGCOMM conference (2002).Google ScholarGoogle Scholar
  51. Traud, A., Mucha, P., and Porter, M. Social Structure of Facebook Networks. Physica A: Statistical Mechanics and its Applications 391, 16 (August 2012). Google ScholarGoogle ScholarCross RefCross Ref
  52. Ugander, J., Karrer, B., Backstrom, L., and Marlow, C. The anatomy of the facebook social graph. arXiv preprint arXiv:1111.4503 (2011).Google ScholarGoogle Scholar
  53. von Luxburg, U. A Tutorial on Spectral Clustering. Statistics and Computing 17, 4 (December 2007). Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Xie, Y., Yu, F., Achan, K., Gillum, E., Goldszmidt, M., and Wobber, T. How Dynamic are IP Addresses? In ACM SIGCOMM Computer Communication Review (2007), vol. 37, ACM, pp. 301--312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. Zhang, Y., Oliveira, R., Wang, Y., Su, S., Zhang, B., Bi, J., Zhang, H., and Zhang, L. A Framework to Quantify the Pitfalls of using Traceroute in AS-level Topology Measurement. IEEE Journal on Selected Areas in Communications (2011). Google ScholarGoogle ScholarCross RefCross Ref

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2017
    2240 pages
    ISBN:9781450348874
    DOI:10.1145/3097983

    Copyright © 2017 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 13 August 2017

    Permissions

    Request permissions about this article.

    Request Permissions

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    KDD '17 Paper Acceptance Rate64of748submissions,9%Overall Acceptance Rate1,133of8,635submissions,13%

    Upcoming Conference

    KDD '24

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader