research-article

Mining social networks for personalized email prioritization

Authors:
Shinjae Yoo (Carnegie Mellon University, Pittsburgh, PA, USA)
Yiming Yang (Carnegie Mellon University, Pittsburgh, PA, USA)
Frank Lin (Carnegie Mellon University, Pittsburgh, PA, USA)
Il-Chul Moon (KAIST, Daejeon, South Korea)

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, June 2009, Pages 967–976. https://doi.org/10.1145/1557019.1557124

Published: 28 June 2009


ABSTRACT

Email is one of the most prevalent communication tools today, and solving the email overload problem is urgent. A good way to alleviate email overload is to automatically prioritize received messages according to the priorities of each user. However, research on statistical learning methods for fully personalized email prioritization (PEP) has been sparse due to privacy issues, since people are reluctant to share personal messages and importance judgments with the research community. It is therefore important to develop and evaluate PEP methods under the assumption that only limited training examples are available, and that the system has access only to the personal email data of each user during the training and testing of the model for that user. This paper presents the first study (to the best of our knowledge) under such an assumption. Specifically, we focus on the analysis of personal social networks to capture user groups and to obtain rich features that represent social roles from the viewpoint of a particular user. We also developed a novel semi-supervised (transductive) learning algorithm that propagates importance labels from training examples to test examples through message and user nodes in a personal email network. Together, these methods yield an enriched vector representation of each new email message, consisting of both standard features of the message (such as words in the title or body, and sender and receiver IDs) and the induced social features of its sender and receivers. This enriched vector representation is used as the input to SVM classifiers that predict the importance level of each test message.
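To make the idea of propagating importance labels through message and user nodes more concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes a bipartite graph of (message, user) edges, clamps the labeled training messages, and repeatedly averages scores across neighbors. The function name `propagate_importance`, the `damping` parameter, and the toy data are all hypothetical.

```python
def propagate_importance(edges, seed_labels, n_iter=20, damping=0.5):
    """Illustrative label propagation on a bipartite message-user graph.

    edges:       list of (message_id, user_id) pairs (hypothetical input format)
    seed_labels: {message_id: importance score} for labeled training messages
    Returns (message_scores, user_scores).
    """
    if not seed_labels:
        raise ValueError("at least one labeled message is required")

    # Build adjacency in both directions.
    msg_users, user_msgs = {}, {}
    for m, u in edges:
        msg_users.setdefault(m, set()).add(u)
        user_msgs.setdefault(u, set()).add(m)

    # Unlabeled messages start at the mean of the training labels.
    prior = sum(seed_labels.values()) / len(seed_labels)
    msg_score = {m: seed_labels.get(m, prior) for m in msg_users}
    user_score = {}

    for _ in range(n_iter):
        # A user's score is the mean score of the messages they appear on.
        for u, msgs in user_msgs.items():
            user_score[u] = sum(msg_score[m] for m in msgs) / len(msgs)
        # An unlabeled message mixes the prior with its users' mean score.
        for m, users in msg_users.items():
            if m in seed_labels:
                continue  # training labels stay clamped
            neighbor_mean = sum(user_score[u] for u in users) / len(users)
            msg_score[m] = damping * prior + (1 - damping) * neighbor_mean
    return msg_score, user_score


# Toy data (made up): m1 and m2 are labeled, m3 and m4 are test messages.
edges = [("m1", "alice"), ("m3", "alice"), ("m2", "bob"), ("m4", "bob")]
msg_score, user_score = propagate_importance(edges, {"m1": 5.0, "m2": 1.0})
```

In this toy graph the unlabeled message that shares a sender with the important training message ends up with the higher score. The resulting user scores are the kind of quantity that could be appended to a message's feature vector as an induced social feature.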
We obtained a significant performance improvement over the baseline system (without induced social features) in our experiments on a multi-user data collection: the relative error reduction in mean absolute error (MAE) was 31% in micro-averaging and 14% in macro-averaging.
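The difference between micro- and macro-averaged MAE can be sketched as follows. This assumes the usual reading, in which micro-averaging pools all messages across users and macro-averaging averages the per-user MAE values. The abstract does not spell out the exact definitions, so treat this as an assumption. All function names and numbers below are hypothetical and are not results from the paper.

```python
def mae(pairs):
    """Mean absolute error over (true_level, predicted_level) pairs."""
    return sum(abs(t - p) for t, p in pairs) / len(pairs)


def micro_macro_mae(per_user):
    """per_user: {user_id: [(true_level, predicted_level), ...]}."""
    # Micro: pool every message from every user, then average once.
    pooled = [pair for pairs in per_user.values() for pair in pairs]
    micro = mae(pooled)
    # Macro: compute one MAE per user, then average those values.
    macro = sum(mae(pairs) for pairs in per_user.values()) / len(per_user)
    return micro, macro


def relative_error_reduction(baseline_mae, system_mae):
    """Fraction of the baseline error removed by the system."""
    return (baseline_mae - system_mae) / baseline_mae


# Made-up judgments: u1 has four messages, u2 has two.
per_user = {
    "u1": [(5, 4), (3, 3), (1, 2), (4, 4)],
    "u2": [(2, 4), (5, 5)],
}
micro, macro = micro_macro_mae(per_user)
reduction = relative_error_reduction(0.8, 0.6)
```

Micro-averaging is dominated by users with many messages, while macro-averaging weights every user equally. That is one reason the two relative reductions can differ.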



Index Terms

Mining social networks for personalized email prioritization
1. Applied computing
  1. Document management and text processing
2. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis


Published in
KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
June 2009
1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019
General Chairs: John Elder (Elder Research, Inc., USA), Françoise Soulié Fogelman (KXEN, France)
Program Chairs: Peter Flach (University of Bristol, UK), Mohammed Zaki (RPI, USA)
Copyright © 2009 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
email prioritization
social network
text mining

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%