skip to main content
10.1145/3097983.3098101acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
research-article

Let's See Your Digits: Anomalous-State Detection using Benford's Law

Published: 04 August 2017 Publication History

Abstract

Benford's Law explains a curious phenomenon in which the leading digits of "naturally-occurring" numerical data are distributed in a precise fashion. In this paper we begin by showing that system metrics generated by many modern information systems like Twitter, Wikipedia, YouTube and GitHub obey this law. We then propose a novel unsupervised approach called BenFound that exploits this property to detect anomalous system events. BenFound tracks the "Benfordness" of key system metrics, like the follower counts of tweeting Twitter users or the change deltas in Wikipedia page edits. It then applies a novel Benford-conformity test in real-time to identify "non-Benford events". We investigate a variety of such events, showing that they correspond to unnatural and often undesirable system interactions like spamming, hashtag-hijacking and denial-of-service attacks. The result is a technically-uncomplicated and effective "red flagging" technique that can be used to complement existing anomaly-detection approaches. Although not without its limitations, it is highly efficient and requires neither obscure parameters, nor text streams, nor natural-language processing.

References

[1]
J. Alexander, "Remarks on the use of Benford's Law", 2009 (available at SSRN 1505147).
[2]
É. Antoine, A. Jatowt, S. Wakamiya, Y. Kawai, and T. Akiyama, "Portraying collective spatial attention in twitter", KDD 2015.
[3]
A.N. Asadi, "An approach for detecting anomalies by assessing the inter-arrival time of UDP packets and flows using Benford's law", in Knowledge-Based Engineering and Innovation (KBEI) 2015.
[4]
F. Benford, "The law of anomalous numbers", in Proceedings of the American Philosophical Society, vol. 78, 1938, pp. 551--572.
[5]
A. Berger, and T.P. Hill. "Benford's Law strikes back: No simple explanation in sight", in The Mathematical Intelligencer, vol. 33, 2011, pp. 85--91.
[6]
A. Nigrini, and T.P. Hill, An Introduction to Benford's Law. Princeton University Press, 2015.
[7]
A. Clauset, C.R. Shalizi, and M.E.J. Newman, Power-law distributions in empirical data. in SIAM review, vol. 51, no. 4, 2009, pp. 661--703.
[8]
A. Edwards and L. Cavalli-Sforza, "A Method for Cluster Analysis", in Biometrics, vol. 21, no. 2, 1965.
[9]
K. Hayashi, T. Maehara, M. Toyoda and K. Kawarabayashi, "Real-time topic detection on twitter with topic hijack filtering", KDD 2015.
[10]
A. Iorliam, A.T.S. Ho, N. Poh, S. Tirunagari, and P. Bours, "Data forensic techniques using Benford's law and Zipf's law for keystroke dynamics", International Workshop in Biometrics and Forensics (IWBF), 2015.
[11]
NA. James, AK. Kejariwal and DS. Matteson, "Leveraging Cloud Data to Mitigate User Experience from Breaking Bad", arXiv preprint arXiv:1411.7955, 2014.
[12]
A. Kolmogorov. "Sulla determinazione empirica di una legge di distribuzione". Italian Actuarial Journal, 1933.
[13]
N. Laptev, S. Amizadeh, and I. Flint, "Generic and scalable framework for automated time-series anomaly detection", KDD 2015.
[14]
D. Matteson and N. James. "A Nonparametric Approach for Multiple Change Point Analysis of Multivariate Data", in Journal of the American Statistical Association, vol. 109, no. 505, 2014, pp.334--345.
[15]
M.J. Nigrini, Benford's Law: Applic. for Forensic Acc., Auditing and Fraud Det. John Wiley & Sons, 2012.
[16]
J. Golbeck. "Benford's Law Applies to Online Social Networks", in PloS one, vol. 10, 2015.
[17]
G. Rigaill, "Pruned Dynamic Programming for Optimal Multiple Change-Point Detection", arXiv:1004.0887.
[18]
A. Ritter, Mausam, O. Etzioni, S. Clark, "Open Domain Event Extraction from Twitter", KDD 2012.
[19]
A. Scott and M. Knott, "A Cluster Analysis Method for Grouping Means in the Analysis of Variance", in Biometrics, vol. 30, no. 3, 1974, pp. 507--512.
[20]
Twitter Inc. "Twitter Terms of Service". Online Resource. https://www.twitter.com/tos.
[21]
O. Vallis, J. Hochenbaum and A. Kejariwal, "A novel technique for long-term anomaly detection in the cloud", in 6th USENIX (HotCloud 2014).

Cited By

View all
  • (2021)Methods and Challenges in Social Bots Detection: A Systematic ReviewProceedings of the XVII Brazilian Symposium on Information Systems10.1145/3466933.3466973(1-8)Online publication date: 7-Jun-2021
  • (2021)Group Anomaly Detection: Past Notions, Present Insights, and Future ProspectsSN Computer Science10.1007/s42979-021-00603-x2:3Online publication date: 16-Apr-2021
  • (2019)Real-Time Attack Detection on Robot Cameras: A Self-Driving Car Application2019 Third IEEE International Conference on Robotic Computing (IRC)10.1109/IRC.2019.00023(102-109)Online publication date: Feb-2019

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN:9781450348874
DOI:10.1145/3097983
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 August 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. anomaly detection
  2. benford's law
  3. data streams
  4. nonparametric statistical tests
  5. time series data

Qualifiers

  • Research-article

Conference

KDD '17
Sponsor:

Acceptance Rates

KDD '17 Paper Acceptance Rate 64 of 748 submissions, 9%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Upcoming Conference

KDD '25

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)3
Reflects downloads up to 17 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2021)Methods and Challenges in Social Bots Detection: A Systematic ReviewProceedings of the XVII Brazilian Symposium on Information Systems10.1145/3466933.3466973(1-8)Online publication date: 7-Jun-2021
  • (2021)Group Anomaly Detection: Past Notions, Present Insights, and Future ProspectsSN Computer Science10.1007/s42979-021-00603-x2:3Online publication date: 16-Apr-2021
  • (2019)Real-Time Attack Detection on Robot Cameras: A Self-Driving Car Application2019 Third IEEE International Conference on Robotic Computing (IRC)10.1109/IRC.2019.00023(102-109)Online publication date: Feb-2019

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media