skip to main content
10.1145/3106237.3106261acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
research-article

Toward full elasticity in distributed static analysis: the case of callgraph analysis

Published: 21 August 2017 Publication History

Abstract

In this paper we present the design and implementation of a distributed, whole-program static analysis framework that is designed to scale with the size of the input. Our approach is based on the actor programming model and is deployed in the cloud. Our reliance on a cloud cluster provides a degree of elasticity for CPU, memory, and storage resources. To demonstrate the potential of our technique, we show how a typical call graph analysis can be implemented in a distributed setting. The vision that motivates this work is that every large-scale software repository such as GitHub, BitBucket, or Visual Studio Online will be able to perform static analysis on a large scale. We experimentally validate our implementation of the distributed call graph analysis using a combination of both synthetic and real benchmarks. To show scalability, we demonstrate how the analysis presented in this paper is able to handle inputs that are almost 10 million lines of code (LOC) in size, without running out of memory. Our results show that the analysis scales well in terms of memory pressure independently of the input size, as we add more virtual machines (VMs). As the number of worker VMs increases, we observe that the analysis time generally improves as well. Lastly, we demonstrate that querying the results can be performed with a median latency of 15 ms.

References

[1]
Representational state transfer. https://en.wikipedia.org/wiki/ Representational_state_transfer, 2015.
[2]
G. Agha. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, 1986.
[3]
A. Albarghouthi, R. Kumar, A. V. Nori, and S. K. Rajamani. Parallelizing topdown interprocedural analyses. In Proceedings of the Conference on Programming Language Design and Implementation, 2012.
[4]
L. O. Andersen. Program analysis and specialization for the C programming language. PhD thesis, University of Cophenhagen, 1994.
[5]
D. F. Bacon and P. F. Sweeney. Fast static analysis of C++ virtual function calls. In Proceedings of the Conference on Object-oriented Programming, Systems, Languages, and Applications, 1996.
[6]
P. A. Bernstein, S. Bykov, A. Geller, G. Kliot, and J. Thelin. Orleans: Distributed virtual actors for programmability and scalability. Technical Report MSR-TR- 2014-41, Microsoft Research, 2014.
[7]
J. Bornholt and E. Torlak. Scaling program synthesis by exploiting existing code. Machine Learning for Programming Languages, 2015.
[8]
C. Calcagno, D. Distefano, J. Dubreil, D. Gabi, P. Hooimeijer, M. Luca, P. OâĂŹ-Hearn, I. Papakonstantinou, J. Purbrick, and D. Rodriguez. Moving fast with software veriication. In NASA Formal Methods Symposium, pages 3ś11. Springer, 2015.
[9]
P. Cousot. Asynchronous iterative methods for solving a ixed point system of monotone equations in a complete lattice. Technical report, Laboratoire IMAG, Université scientiique et médicale de Grenoble, 1977.
[10]
R. Dyer, H. A. Nguyen, H. Rajan, and T. N. Nguyen. Boa: A language and infrastructure for analyzing ultra-large-scale software repositories. In Proceedings of the International Conference on Software Engineering. IEEE Press, 2013.
[11]
R. Dyer, H. Rajan, H. A. Nguyen, and T. N. Nguyen. Mining billions of ast nodes to study actual and potential usage of Java language features. In Proceedings of the International Conference on Software Engineering, 2014.
[12]
R. Dyer, H. Rajan, and T. N. Nguyen. Declarative visitors to ease ine-grained source code mining with full history on billions of ast nodes. In ACM SIGPLAN Notices. ACM, 2013.
[13]
D. Grove and C. Chambers. A framework for call graph construction algorithms. ACM Transactions on Programming Languages and Systems, 2001.
[14]
D. Grove, G. DeFouw, J. Dean, and C. Chambers. Call graph construction in object-oriented languages. ACM SIGPLAN Notices, 1997.
[15]
B. Hardekopf and C. Lin. The ant and the grasshopper: Fast and accurate pointer analysis for millions of lines of code. In Proceedings of the Conference on Programming Language Design and Implementation, 2007.
[16]
B. Hardekopf and C. Lin. Flow-sensitive pointer analysis for millions of lines of code. In Proceedings of the International Symposium on Code Generation and Optimization, 2011.
[17]
M. S. Lam, J. Whaley, B. Livshits, M. C. Martin, D. Avots, M. Carbin, and C. Unkel. Context-sensitive program analysis as database queries. In Proceedings of the Symposium on Principles of Database Systems, June 2005.
[18]
J.-L. Lassez, V. Nguyen, and E. Sonenberg. Fixed point theorems and semantics: A folk tale. Information Processing Letters, 1982.
[19]
Y. Y. Lee, S. Harwell, S. Khurshid, and D. Marinov. Temporal code completion and navigation. In Proceedings of the International Conference on Software Engineering, 2013.
[20]
O. Lhoták and L. Hendren. Context-sensitive points-to analysis: Is it worth it? In Proceedings of the International Conference on Compiler Construction, 2006.
[21]
B. Livshits, M. Sridharan, Y. Smaragdakis, O. Lhoták, J. N. Amaral, B.-Y. E. Chang, S. Z. Guyer, U. P. Khedker, A. Mùller, and D. Vardoulakis. In defense of soundiness: A manifesto. Communications of the ACM, 2015.
[22]
N. P. Lopes and A. Rybalchenko. Distributed and predictable software model checking. In Proceedings of the International Conference on Veriication, Model Checking, and Abstract Interpretation, 2011.
[23]
M. Madsen, B. Livshits, and M. Fanning. Practical static analysis of JavaScript applications in the presence of frameworks and libraries. In Proceedings of the International Symposium on the Foundations of Software Engineering, 2013.
[24]
G. Malewicz, M. H. Austern, A. J. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: A system for large-scale graph processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD, 2010.
[25]
S. McPeak, C.-H. Gros, and M. K. Ramanathan. Scalable and incremental software bug detection. In Proceedings of the Symposium on the Foundations of Software Engineering, 2013.
[26]
M. Mendez-Lojo, A. Mathew, and K. Pingali. Parallel inclusion-based points-to analysis. In Conference on Object Oriented Programming Systems Languages and Applications, 2010.
[27]
Microsoft Corporation. .NET Compiler Platform ("Roslyn"). https://roslyn. codeplex.com/, 2015.
[28]
Mozilla. LXR. https://en.wikipedia.org/wiki/LXR_Cross_Referencer, 2015.
[29]
K. Murray, J. P. Bigham, et al. Beyond autocomplete: Automatic function deinition. In Proceedings of the Visual Languages and Human-Centric Computing Symposium, 2011.
[30]
C. Omar, Y. Yoon, T. D. LaToza, and B. A. Myers. Active code completion. In Proceedings of the International Conference on Software Engineering, 2012.
[31]
J. Rodriguez and O. Lhoták. Actor-Based Parallel Datalow Analysis. 2011.
[32]
C. Sadowski, J. van Gogh, C. Jaspan, E. Soederberg, and C. Winter. Tricorder: Building a program analysis ecosystem. In International Conference on Software Engineering (ICSE), 2015.
[33]
M. Sridharan, D. Gopan, L. Shan, and R. Bodík. Demand-driven points-to analysis for java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Objectoriented Programming, Systems, Languages, and Applications, OOPSLA ’05, pages ESEC/FSE’17, September 04-08, 2017, Paderborn, Germany Diego Garbervetsky, Edgardo Zoppi, and Benjamin Livshits 59ś76, New York, NY, USA, 2005. ACM.
[34]
B. Steensgaard. Points-to analysis in almost linear time. In Proceedings of the Symposium on Principles of Programming Languages. ACM, 1996.
[35]
V. Sundaresan, L. Hendren, C. Razaimahefa, R. Vallée-Rai, P. Lam, E. Gagnon, and C. Godin. Practical virtual method call resolution for Java. In Proceedings of the Conference on Object-oriented Programming, Systems, Languages, and Applications, 2000.
[36]
F. Tip and J. Palsberg. Scalable propagation-based call graph construction algorithms. In Proceedings of the Conference on Object-oriented Programming, Systems, Languages, and Applications, 2000.
[37]
J. W. Voung, R. Jhala, and S. Lerner. Relay: Static race detection on millions of lines of code. In Proceedings of the Symposium on the Foundations of Software Engineering, 2007.
[38]
K. Wang, A. Hussain, Z. Zuo, G. Xu, and A. Amiri Sani. Graspan: A single-machine disk-based graph system for interprocedural static analyses of large-scale systems code. In Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems, 2017.
[39]
J. Whaley and M. S. Lam. Cloning-based context-sensitive pointer alias analysis using binary decision diagrams. In ACM SIGPLAN Notices, volume 39, 2004.
[40]
Y. Xie and A. Aiken. Saturn: A scalable framework for error detection using boolean satisiability. ACM Trans. Program. Lang. Syst., 29(3), May 2007.
[41]
H. Yu, J. Xue, W. Huo, X. Feng, and Z. Zhang. Level by level: Making low- and context-sensitive pointer analysis scalable for millions of lines of code. In Proceedings of the International Symposium on Code Generation and Optimization, 2010.

Cited By

View all
  • (2024)Accelerating Static Null Pointer Dereference Detection with Parallel ComputingProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3671385(135-144)Online publication date: 24-Jul-2024
  • (2023)BigDataflow: A Distributed Interprocedural Dataflow Analysis FrameworkProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616348(1431-1443)Online publication date: 30-Nov-2023
  • (2023)A Parallel Memory Defect Detection Method based on Sparse-Value-Flow Graph2023 IEEE International Conference on Joint Cloud Computing (JCC)10.1109/JCC59055.2023.00014(57-65)Online publication date: Jul-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
ESEC/FSE 2017: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering
August 2017
1073 pages
ISBN:9781450351058
DOI:10.1145/3106237
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 August 2017

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Development environments and tools
  2. Parallel
  3. Performance and scalability
  4. Program analysis
  5. Program comprehension and visualization
  6. and concurrent systems
  7. distributed

Qualifiers

  • Research-article

Conference

ESEC/FSE'17
Sponsor:

Acceptance Rates

Overall Acceptance Rate 112 of 543 submissions, 21%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)17
  • Downloads (Last 6 weeks)1
Reflects downloads up to 08 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Accelerating Static Null Pointer Dereference Detection with Parallel ComputingProceedings of the 15th Asia-Pacific Symposium on Internetware10.1145/3671016.3671385(135-144)Online publication date: 24-Jul-2024
  • (2023)BigDataflow: A Distributed Interprocedural Dataflow Analysis FrameworkProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3611643.3616348(1431-1443)Online publication date: 30-Nov-2023
  • (2023)A Parallel Memory Defect Detection Method based on Sparse-Value-Flow Graph2023 IEEE International Conference on Joint Cloud Computing (JCC)10.1109/JCC59055.2023.00014(57-65)Online publication date: Jul-2023
  • (2023)Incremental Call Graph Construction in Industrial PracticeProceedings of the 45th International Conference on Software Engineering: Software Engineering in Practice10.1109/ICSE-SEIP58684.2023.00048(471-482)Online publication date: 17-May-2023
  • (2023)P-DATA: A Task-Level Parallel Framework for Dependency-Aware Value Flow Taint Analysis2023 30th Asia-Pacific Software Engineering Conference (APSEC)10.1109/APSEC60848.2023.00035(249-258)Online publication date: 4-Dec-2023
  • (2022)Call Graph Evolution Analytics over a Version Series of an Evolving Software SystemProceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering10.1145/3551349.3559573(1-5)Online publication date: 10-Oct-2022
  • (2022)Input splitting for cloud-based static application security testing platformsProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering10.1145/3540250.3558944(1367-1378)Online publication date: 7-Nov-2022
  • (2021)Chianina: an evolving graph system for flow- and context-sensitive analyses of million lines of C codeProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3453483.3454085(914-929)Online publication date: 19-Jun-2021
  • (2020)Pipelining bottom-up data flow analysisProceedings of the ACM/IEEE 42nd International Conference on Software Engineering10.1145/3377811.3380425(835-847)Online publication date: 27-Jun-2020
  • (2020)Static Analysis Framework Based on Multi - Agent System2020 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS)10.1109/IEMTRONICS51293.2020.9216382(1-9)Online publication date: Sep-2020
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media