tutorial

Emails as graph: relation discovery in email archive

Published: 16 April 2012

Abstract

In this paper, we present an approach for representing an email archive as a network that captures both the communication among users and the relations among entities extracted from the text of the email messages. We showcase the method on the Enron email corpus, from which we extract various entities and a social network. The extracted named entities (NEs), such as people, email addresses and telephone numbers, are organized in a graph along with the emails in which they were found. The edges in the graph indicate relations between NEs and represent co-occurrence within the same email part, paragraph, sentence or composite NE. We study the mathematical properties of the resulting graphs and describe our hands-on experience with processing such structures. The Enron Graph corpus contains a few million nodes and is large enough for experimenting with various graph-querying techniques, e.g. graph traversal or spread of activation. Because of its size, using traditional graph-processing libraries may be problematic, as they keep the whole structure in memory. We describe our experience with managing such data and with relation discovery among the extracted entities. This experience may be valuable to practitioners and highlights several research challenges.
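The graph construction and the spread-of-activation querying mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the regular expressions, the function names (`extract_entities`, `build_graph`, `spread_activation`), the sentence-level co-occurrence rule and the `decay` and `steps` parameters are all assumptions made for illustration, and the whole graph is held in memory, which is exactly what the abstract warns may not scale to a few million nodes.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately simple patterns (assumptions for this sketch only).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def extract_entities(text):
    """Return the set of (type, value) named entities found in text."""
    entities = {("email", m.lower()) for m in EMAIL_RE.findall(text)}
    entities |= {("phone", m) for m in PHONE_RE.findall(text)}
    return entities


def build_graph(emails):
    """Build an undirected graph as an adjacency dict of sets.

    Every message becomes a node linked to each entity found in it, and
    entities that co-occur in the same sentence are linked to each other.
    """
    graph = defaultdict(set)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for message_id, body in emails.items():
        message_node = ("message", message_id)
        graph[message_node]  # make sure the message node exists
        # Naive sentence split: break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", body):
            found = sorted(extract_entities(sentence))
            for entity in found:
                link(message_node, entity)
            for i, a in enumerate(found):
                for b in found[i + 1:]:
                    link(a, b)
    return graph


def spread_activation(graph, seeds, decay=0.5, steps=2):
    """Propagate activation from seed nodes for a fixed number of steps.

    At each step a node passes its incoming energy, scaled by decay, in
    equal shares to its neighbours. Returns accumulated activation per node.
    """
    activation = defaultdict(float)
    for seed in seeds:
        activation[seed] = 1.0
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, energy in frontier.items():
            neighbours = graph.get(node, ())
            if not neighbours:
                continue
            share = energy * decay / len(neighbours)
            for neighbour in neighbours:
                next_frontier[neighbour] += share
        for node, energy in next_frontier.items():
            activation[node] += energy
        frontier = next_frontier
    return dict(activation)


if __name__ == "__main__":
    # Made-up sample messages, not taken from the Enron corpus.
    sample = {
        "m1": "Call john@enron.com at 713-853-1234. Thanks.",
        "m2": "Forward this to john@enron.com and mary@enron.com today.",
    }
    g = build_graph(sample)
    scores = spread_activation(g, [("email", "mary@enron.com")])
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 4))
```

Ranking nodes by accumulated activation gives a simple form of relation discovery: entities that are closely connected to the seed, directly or through shared messages, score higher than distant ones.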


Published In

WWW '12 Companion: Proceedings of the 21st International Conference on World Wide Web
April 2012
1250 pages
ISBN:9781450312301
DOI:10.1145/2187980

Sponsors

  • Univ. de Lyon: Universite de Lyon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. email
  2. enron corpus
  3. graph data
  4. relation discovery
  5. search
  6. social networks

Qualifiers

  • Tutorial

Conference

WWW 2012
Sponsor:
  • Univ. de Lyon
WWW 2012: 21st World Wide Web Conference 2012
April 16 - 20, 2012
Lyon, France

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Cited By

  • (2021) EMCODIST: A Context-based Search Tool for Email Archives. 2021 IEEE International Conference on Big Data (Big Data), pp. 2281-2290. DOI: 10.1109/BigData52589.2021.9671832. Online publication date: 15-Dec-2021
  • (2021) Finding light in dark archives: using AI to connect context and content in email. AI & SOCIETY 37:3, pp. 859-872. DOI: 10.1007/s00146-021-01369-9. Online publication date: 31-Dec-2021
  • (2020) How Impactful Is Presentation in Email? The Effect of Avatars and Signatures. ACM Transactions on Interactive Intelligent Systems 10:3, pp. 1-26. DOI: 10.1145/3345641. Online publication date: 13-Nov-2020
  • (2017) An email attachment is worth a thousand words, or is it? Proceedings of the 1st International Conference on Internet of Things and Machine Learning, pp. 1-10. DOI: 10.1145/3109761.3109765. Online publication date: 17-Oct-2017
  • (2015) Interactive and universal relationship discovery in semantic graph Data. 2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES), pp. 417-420. DOI: 10.1109/INES.2015.7329745. Online publication date: Sep-2015
  • (2015) Lightweight Semantic approach for enterprise interoperability issues. 2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES), pp. 395-400. DOI: 10.1109/INES.2015.7329741. Online publication date: Sep-2015
  • (2015) A forensic analysis solution of the email network based on email contents. 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pp. 1613-1619. DOI: 10.1109/FSKD.2015.7382186. Online publication date: Aug-2015
  • (2015) Improving Collaboration Between Large and Small Enterprises Using Networked Services. Risks and Resilience of Collaborative Networks, pp. 201-208. DOI: 10.1007/978-3-319-24141-8_18. Online publication date: 10-Dec-2015
  • (2015) Relationship Discovery and Navigation in Big Graphs. Intelligent Systems in Science and Information 2014, pp. 109-123. DOI: 10.1007/978-3-319-14654-6_7. Online publication date: 14-Feb-2015
  • (2014) Support for Collaboration between Large and Small & Medium Enterprises. Proceedings of the 2014 ACM International Conference on Supporting Group Work, pp. 288-290. DOI: 10.1145/2660398.2663773. Online publication date: 9-Nov-2014
