research-article

LogMine: Fast Pattern Recognition for Log Analytics

Authors:
Hossein Hamooni, University of New Mexico, Albuquerque, NM, USA
Biplob Debnath, NEC Laboratories America, Princeton, NJ, USA
Jianwu Xu, NEC Laboratories America, Princeton, NJ, USA
Hui Zhang, NEC Laboratories America, Princeton, NJ, USA
Guofei Jiang, NEC Laboratories America, Princeton, NJ, USA
Abdullah Mueen, University of New Mexico, Albuquerque, NM, USA

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, October 2016, Pages 1573–1582, https://doi.org/10.1145/2983323.2983358

Published: 24 October 2016

ABSTRACT

Modern engineering incorporates smart technologies in all aspects of our lives. Smart technologies are generating terabytes of log messages every day to report their status. It is crucial to analyze these log messages and present usable information (e.g., patterns) to administrators, so that they can manage and monitor these technologies. Patterns minimally represent large groups of log messages and enable administrators to do further analysis, such as anomaly detection and event prediction. Although patterns commonly exist in automated log messages, recognizing them in a massive set of log messages from heterogeneous sources without any prior information is a significant undertaking. We propose a method, named LogMine, that extracts high-quality patterns for a given set of log messages. Our method is fast, memory efficient, accurate, and scalable. LogMine is implemented in the map-reduce framework for distributed platforms to process millions of log messages in seconds. LogMine is a robust method that works for heterogeneous log messages generated in a wide variety of systems. Our method exploits algorithmic techniques to minimize the computational overhead based on the fact that log messages are always automatically generated. We evaluate the performance of LogMine on massive sets of log messages generated in industrial applications. LogMine has successfully generated patterns which are as good as the patterns generated by an exact and unscalable method, while achieving a 500× speedup. Finally, we describe three applications of the patterns generated by LogMine in monitoring large-scale industrial systems.
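
To make the notion of a pattern that "minimally represents" a group of log messages concrete, the following is a minimal sketch and not the LogMine algorithm itself, which the abstract does not detail. It assumes whitespace tokenization, groups messages crudely by token count, and marks positions that differ with a wildcard. The names `WILDCARD`, `merge`, and `extract_patterns`, and the sample log lines, are hypothetical.

```python
from collections import defaultdict

WILDCARD = "*"  # hypothetical placeholder for a variable field


def merge(pattern, tokens):
    """Keep tokens that agree; replace disagreeing positions with WILDCARD."""
    return [p if p == t else WILDCARD for p, t in zip(pattern, tokens)]


def extract_patterns(messages):
    """Group messages by token count, then merge each group into one pattern.

    Assumption: grouping by token count stands in for a real clustering step.
    """
    groups = defaultdict(list)
    for message in messages:
        tokens = message.split()
        groups[len(tokens)].append(tokens)

    patterns = []
    for token_lists in groups.values():
        pattern = token_lists[0]
        for tokens in token_lists[1:]:
            pattern = merge(pattern, tokens)
        patterns.append(" ".join(pattern))
    return patterns


# Made-up log lines, for illustration only.
logs = [
    "user alice logged in from 10.0.0.1",
    "user bob logged in from 10.0.0.7",
    "disk sda1 usage 91%",
    "disk sdb2 usage 47%",
]
print(extract_patterns(logs))
# prints ['user * logged in from *', 'disk * usage *']
```

Here four messages collapse into two patterns, and the wildcard positions are the fields an administrator might later monitor for anomalies. A realistic system would need a more careful similarity measure than token count to separate unrelated messages of equal length.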

Index Terms

LogMine: Fast Pattern Recognition for Log Analytics
1. Computing methodologies
  1. Parallel computing methodologies
    1. Parallel algorithms
      1. MapReduce algorithms
2. Information systems
  1. Information systems applications
    1. Data mining
      1. Clustering

Published in
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323
General Chairs: Snehasis Mukhopadhyay (Indiana University Purdue University Indianapolis, USA), ChengXiang Zhai (University of Illinois at Urbana-Champaign, USA)
Program Chairs: Elisa Bertino (Purdue University), Fabio Crestani (University of Lugano), Javed Mostafa (University of North Carolina), Jie Tang (Tsinghua University), Luo Si (Alibaba Group Inc & Purdue University), Xiaofang Zhou (University of Queensland), Yi Chang (Yahoo Research), Yunyao Li (IBM Research - Almaden), Parikshit Sondhi (WalmartLabs)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
log analysis
map-reduce
pattern recognition
Qualifiers
- research-article
Acceptance Rates
CIKM '16 Paper Acceptance Rate: 160 of 701 submissions, 23%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%