DOI: 10.1145/3307681.3325404
research-article
Public Access

Semantic-aware Workflow Construction and Analysis for Distributed Data Analytics Systems

Published: 17 June 2019

ABSTRACT

Logging is a universal approach to recording important events in the workflows of distributed systems. Current log analysis tools ignore the semantic knowledge that is key to workflow construction and analysis, and they focus on infrastructure-level distributed systems. Because the logs of distributed data analytics systems differ fundamentally in their features, these tools are ineffective for such systems. This paper proposes IntelLog, a semantic-aware, non-intrusive workflow reconstruction tool for distributed data analytics systems. From the logs generated by a target system, IntelLog builds hierarchical relationships between components and events with little or no domain knowledge. Leveraging natural language processing, it automatically extracts and formats the semantic information in each log message, including system events, identifiers, locality information, and metric values. It then builds a graph that represents the hierarchical relationships among the target system's components, inferred from their nomenclature conventions. We implement IntelLog for Hadoop MapReduce, Spark, and Tez. Evaluation results show that IntelLog provides a fine-grained, semantically annotated view of system workflows. It outperforms existing tools in automatically detecting anomalies caused by real-world problems, misconfigurations, and system bugs. Users can query the formatted semantic knowledge to understand the systems and further troubleshoot them.
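The abstract states that IntelLog recovers the component hierarchy from nomenclature conventions. The Python sketch below illustrates that idea under stated assumptions; it is not IntelLog's implementation. IntelLog uses natural language processing, whereas this sketch swaps in a plain regular expression and a hand-written rule table for Hadoop/YARN-style identifiers (application, container, job, task, and task-attempt IDs). The regex, the `PARENT_RULES` table, the helper names, and the sample log lines are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pattern for Hadoop/YARN-style identifiers such as
# application_1547000000000_0001, container_1547000000000_0001_01_000002,
# job_1547000000000_0001, task_1547000000000_0001_m_000003 and
# attempt_1547000000000_0001_m_000003_0. Requiring a digit right after the
# prefix keeps words like "task_tracker" from matching.
ID_PATTERN = re.compile(
    r"\b(?:application|container|job|task|attempt)_\d+(?:_[0-9a-z]+)+\b"
)

# Assumed naming rules: identifier kind -> (parent kind, number of
# underscore-separated fields the parent shares with the child).
PARENT_RULES = {
    "attempt": ("task", 4),           # attempt_<ts>_<job>_<m|r>_<task>_<n>
    "task": ("job", 2),               # task_<ts>_<job>_<m|r>_<task>
    "job": ("application", 2),        # MapReduce job_<ts>_<n> mirrors application_<ts>_<n>
    "container": ("application", 2),  # container_<ts>_<app>_<attempt>_<n>
}

# Made-up log lines in the style of YARN/MapReduce daemons.
SAMPLE_LOGS = [
    "INFO RMAppImpl: application_1547000000000_0001 State change from NEW to NEW_SAVING",
    "INFO ContainerImpl: Container container_1547000000000_0001_01_000002 transitioned from NEW to LOCALIZING",
    "INFO TaskAttemptImpl: attempt_1547000000000_0001_m_000000_0 TaskAttempt Transitioned from NEW to UNASSIGNED",
    "INFO TaskImpl: task_1547000000000_0001_r_000000 Task Transitioned from NEW to SCHEDULED",
]


def extract_ids(line):
    """Return every Hadoop/YARN-style identifier found in one log line."""
    return [m.group(0) for m in ID_PATTERN.finditer(line)]


def parent_of(identifier):
    """Derive the parent identifier from the naming convention, or None."""
    kind, _, rest = identifier.partition("_")
    rule = PARENT_RULES.get(kind)
    if rule is None:
        return None  # e.g. an application is a root
    parent_kind, shared = rule
    fields = rest.split("_")
    if len(fields) < shared:
        return None
    return parent_kind + "_" + "_".join(fields[:shared])


def build_hierarchy(lines):
    """Map each parent identifier to the set of children implied by the logs."""
    children = defaultdict(set)
    for line in lines:
        for ident in extract_ids(line):
            child, parent = ident, parent_of(ident)
            # Walk upward so ancestors that never appear in a log line
            # (here, the job) are still linked into the graph.
            while parent is not None:
                children[parent].add(child)
                child, parent = parent, parent_of(parent)
    return children


def roots(children):
    """Identifiers that have children but are nobody's child in the graph."""
    all_children = set().union(*children.values())
    return sorted(set(children) - all_children)


def print_tree(children, node, depth=0):
    """Print the hierarchy below node as an indented tree."""
    print("  " * depth + node)
    for child in sorted(children.get(node, ())):
        print_tree(children, child, depth + 1)


if __name__ == "__main__":
    graph = build_hierarchy(SAMPLE_LOGS)
    for root in roots(graph):
        print_tree(graph, root)
```

Running the script prints the application at the root, with its container and job beneath it and the two tasks and the task attempt nested under the job. Hard-coding `PARENT_RULES` requires exactly the per-system domain knowledge that the abstract says IntelLog largely avoids, so treat the table as a simplification.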


Published in: HPDC '19: Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, June 2019, 278 pages
ISBN: 9781450366700
DOI: 10.1145/3307681

Copyright © 2019 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: HPDC '19 paper acceptance rate 22 of 106 submissions (21%); overall acceptance rate 166 of 966 submissions (17%).
