DOI: 10.1145/2983323.2983358
research-article

LogMine: Fast Pattern Recognition for Log Analytics

Published: 24 October 2016

ABSTRACT

Modern engineering incorporates smart technologies in all aspects of our lives. Smart technologies generate terabytes of log messages every day to report their status. It is crucial to analyze these log messages and present usable information (e.g., patterns) to administrators, so that they can manage and monitor these technologies. Patterns minimally represent large groups of log messages and enable administrators to do further analysis, such as anomaly detection and event prediction. Although patterns are common in automated log messages, recognizing them in a massive set of log messages from heterogeneous sources without any prior information is a significant undertaking. We propose a method, named LogMine, that extracts high-quality patterns from a given set of log messages. Our method is fast, memory efficient, accurate, and scalable. LogMine is implemented in the map-reduce framework for distributed platforms and processes millions of log messages in seconds. LogMine is a robust method that works for heterogeneous log messages generated in a wide variety of systems. Our method exploits algorithmic techniques to minimize the computational overhead, based on the fact that log messages are always automatically generated. We evaluate the performance of LogMine on massive sets of log messages generated in industrial applications. LogMine successfully generates patterns that are as good as the patterns generated by an exact but unscalable method, while achieving a 500× speedup. Finally, we describe three applications of the patterns generated by LogMine in monitoring large-scale industrial systems.
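
To make the idea of a pattern that "minimally represents" a group of log messages concrete, here is a minimal Python sketch. It is not LogMine's algorithm, which the abstract does not specify. It assumes whitespace tokenization, messages with the same number of tokens, and a hypothetical `*` wildcard for fields that vary. The sample log lines are invented for illustration.

```python
from typing import List

WILDCARD = "*"  # hypothetical placeholder for a variable field


def merge_into_pattern(messages: List[str]) -> str:
    """Collapse log messages with equal token counts into one pattern.

    Tokens that agree across all messages are kept; any position where
    the messages differ is replaced by WILDCARD.
    """
    if not messages:
        raise ValueError("need at least one message")
    token_rows = [m.split() for m in messages]
    width = len(token_rows[0])
    if any(len(row) != width for row in token_rows):
        raise ValueError("this sketch only handles equal token counts")
    pattern = []
    # zip(*token_rows) walks the messages column by column.
    for column in zip(*token_rows):
        pattern.append(column[0] if len(set(column)) == 1 else WILDCARD)
    return " ".join(pattern)


# Invented sample messages, for illustration only.
logs = [
    "2016-10-24 connect user=alice status=ok",
    "2016-10-24 connect user=bob status=ok",
    "2016-10-25 connect user=carol status=ok",
]
print(merge_into_pattern(logs))  # prints "* connect * status=ok"
```

A single pattern line stands in for all three messages. A real system would also need to group similar messages first and handle messages of different lengths, which this sketch deliberately leaves out.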

Published in

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN: 9781450340731
DOI: 10.1145/2983323

Copyright © 2016 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

CIKM '16 Paper Acceptance Rate: 160 of 701 submissions, 23%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
