DOI: 10.1145/2806416.2806582
research-article

Who With Whom And How?: Extracting Large Social Networks Using Search Engines

Published: 17 October 2015

Abstract

Social network analysis is leveraged in a variety of applications such as identifying influential entities, detecting communities with special interests, and determining the flow of information and innovations. However, existing approaches for extracting social networks from unstructured Web content do not scale well and are only feasible for small graphs. In this paper, we introduce novel methodologies for query-based search engine mining, enabling efficient extraction of social networks from large amounts of Web data. To this end, we use patterns in phrase queries for retrieving entity connections, and employ a bootstrapping approach for iteratively expanding the pattern set. Our experimental evaluation in different domains demonstrates that our algorithms provide high-quality results and allow for scalable and efficient construction of social graphs.
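
The two ideas named in the abstract, pattern-based phrase queries and bootstrapped pattern expansion, can be illustrated with a minimal sketch. This is not the paper's algorithm: the in-memory `SNIPPETS` list stands in for a search engine, and the function names, the name regex, and the `min_support` filter are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical snippets standing in for search engine results (assumption).
SNIPPETS = [
    "Alice Smith met with Bob Jones in Berlin.",
    "Alice Smith criticized Carol White on Monday.",
    "Bob Jones met with Carol White last week.",
    "Alice Smith criticized Bob Jones again.",
    "Dave Brown praised Carol White yesterday.",
]

# Crude stand-in for entity recognition: two capitalized words.
NAME = r"[A-Z][a-z]+ [A-Z][a-z]+"


def phrase_search(entity, pattern):
    """Simulate the phrase query "<entity> <pattern>"; return names that follow it."""
    rx = re.compile(re.escape(f"{entity} {pattern} ") + f"({NAME})")
    return [m.group(1) for s in SNIPPETS for m in rx.finditer(s)]


def connector_phrases(a, b):
    """Return the text found between two already-connected entities."""
    rx = re.compile(re.escape(a) + r" (.+?) " + re.escape(b))
    return [m.group(1) for s in SNIPPETS for m in rx.finditer(s)]


def extract_network(seed_entities, seed_patterns, rounds=2, min_support=1):
    entities, patterns, edges = set(seed_entities), set(seed_patterns), set()
    for _ in range(rounds):
        # Step 1: issue one phrase query per (entity, pattern) combination.
        for e in sorted(entities):
            for p in sorted(patterns):
                for other in phrase_search(e, p):
                    edges.add((e, p, other))
        entities |= {target for _, _, target in edges}
        # Step 2: bootstrap new patterns from phrases linking known pairs.
        counts = Counter()
        for a, b in {(a, b) for a, _, b in edges}:
            counts.update(connector_phrases(a, b))
        patterns |= {p for p, c in counts.items() if c >= min_support}
    return edges, patterns


if __name__ == "__main__":
    edges, patterns = extract_network({"Alice Smith"}, {"met with"})
    for source, pattern, target in sorted(edges):
        print(f"{source} --[{pattern}]--> {target}")
    print("patterns:", sorted(patterns))
```

Starting from one seed entity and the seed pattern "met with", the first round finds a single connection and learns "criticized" as a second pattern from the snippets that mention the same pair; the second round reuses both patterns to reach further entities. A real system would replace `phrase_search` with calls to a search engine and would need a far stricter pattern filter than a raw support count.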


Cited By

  • (2017) Timeline Summarization for Event-Related Discussions on a Chinese Social Media Platform. Advances in Artificial Intelligence: From Theory to Practice, pages 579-594. DOI: 10.1007/978-3-319-60042-0_64. Online publication date: 4-Jun-2017

      Published In

      CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management
      October 2015
      1998 pages
      ISBN:9781450337946
      DOI:10.1145/2806416
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. pattern based queries
      2. social network extraction

      Qualifiers

      • Research-article

      Funding Sources

      • European Commission FP7 under "QualiMaster"
      • European Research Council under "Alexandria"

      Conference

      CIKM'15

      Acceptance Rates

      CIKM '15 Paper Acceptance Rate 165 of 646 submissions, 26%;
      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

      Article Metrics

      • Downloads (last 12 months): 5
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 19 Feb 2025
