Abstract
A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph where the edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, that include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows:
First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with n nodes and m edges, we present: (i) an algorithm with worst-case running time O(m + n · α(n)), where α(n) is the inverse Ackermann function, improving the previously known O(n2) time bound; (ii) a matching lower bound that shows that our algorithm is optimal wrt to worst-case complexity; and (iii) an optimal average-case upper bound of O(m) time, improving the previously known O(m · logn) bound.
Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is only linear, and only wrt the number of call sites.
Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth.
Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks.
Supplemental Material
- 2003. T. J. Watson Libraries for Analysis (WALA). https://github.com.Google Scholar
- 2008. GitHub Home. https://github.com.Google Scholar
- 2008. SPECjvm2008 Benchmark Suit. http://www.spec.org/jvm2008/.Google Scholar
- Amir Abboud and Virginia Vassilevska Williams. 2014. Popular Conjectures Imply Strong Lower Bounds for Dynamic Problems. In FOCS. 434–443. Google ScholarDigital Library
- Stefan Arnborg and Andrzej Proskurowski. 1989. Linear time algorithms for NP-hard problems restricted to partial k-trees . Discrete Appl Math (1989).Google Scholar
- Robert S. Arnold. 1996. Software Change Impact Analysis. IEEE Computer Society Press, Los Alamitos, CA, USA.Google ScholarDigital Library
- Lech Banachowski. 1980. A complement to Tarjan’s result about the lower bound on the complexity of the set union problem. Inform. Process. Lett. 11, 2 (1980), 59 – 65. Google ScholarCross Ref
- M.W Bern, E.L Lawler, and A.L Wong. 1987. Linear-time computation of optimal subgraphs of decomposable graphs. J Algorithm (1987).Google Scholar
- Umberto Bertele and Francesco Brioschi. 1972. Nonserial Dynamic Programming. Academic Press, Inc., Orlando, FL, USA.Google ScholarDigital Library
- Stephen M. et al. Blackburn. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA.Google Scholar
- Eric Bodden. 2012. Inter-procedural Data-flow Analysis with IFDS/IDE and Soot. In SOAP. ACM, New York, NY, USA. Google ScholarDigital Library
- HansL. Bodlaender and Torben Hagerup. 1995. Parallel algorithms with optimal speedup for bounded treewidth. Vol. 27. 1725–1746. Google ScholarCross Ref
- Hans L. Bodlaender. 1988. Dynamic programming on graphs with bounded treewidth. In ICALP. Springer. Google ScholarCross Ref
- Hans L. Bodlaender. 1998. A partial k-arboretum of graphs with bounded treewidth. TCS (1998).Google Scholar
- Krishnendu Chatterjee, Bhavya Choudhary, and Andreas Pavlogiannis. 2017. Optimal Dyck Reachability for Data-dependence and Alias Analysis. Technical Report. IST Austria. https://repository.ist.ac.at/id/eprint/870Google Scholar
- Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016a. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In POPL. 733–747. Google ScholarDigital Library
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Prateesh Goyal, and Andreas Pavlogiannis. 2015b. Faster Algorithms for Algebraic Path Properties in Recursive State Machines with Constant Treewidth. In POPL. Google ScholarDigital Library
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2015a. Faster Algorithms for Quantitative Verification in Constant Treewidth Graphs. In CAV. Google ScholarCross Ref
- Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016b. Optimal Reachability and a Space-Time Tradeoff for Distance Queries in Constant-Treewidth Graphs. In 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark. 28:1–28:17.Google Scholar
- Swarat Chaudhuri. 2008. Subcubic Algorithms for Recursive State Machines. In POPL. ACM, New York, NY, USA. Google ScholarDigital Library
- Shiva Chaudhuri and Christos D. Zaroliagis. 1995. Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms. Algorithmica (1995).Google Scholar
- Jong-Deok Choi, Michael Burke, and Paul Carini. 1993. Efficient Flow-sensitive Interprocedural Computation of Pointerinduced Aliases and Side Effects. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’93). ACM, 232–245. Google ScholarDigital Library
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. 2001. Introduction To Algorithms. MIT Press.Google Scholar
- Jon Doyle and Ronald L. Rivest. 1976. Linear expected time of a simple union-find algorithm. Inform. Process. Lett. 5, 5 (1976), 146 – 148. Google ScholarCross Ref
- Harold N. Gabow and Robert Endre Tarjan. 1985. A linear-time algorithm for a special case of disjoint set union. J. Comput. System Sci. 30, 2 (1985), 209 – 221. Google ScholarCross Ref
- Zvi Galil and Giuseppe F. Italiano. 1991. Data Structures and Algorithms for Disjoint Set Union Problems. ACM Comput. Surv. 23, 3 (1991), 319–344. Google ScholarDigital Library
- Jens Gustedt, OleA. Mæhle, and JanArne Telle. 2002. The Treewidth of Java Programs. In Algorithm Engineering and Experiments. Springer.Google Scholar
- Nevin Heintze and David McAllester. 1997. On the Cubic Bottleneck in Subtyping and Flow Analysis. In Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science (LICS ’97). IEEE Computer Society, Washington, DC, USA, 342–. http://dl.acm.org/citation.cfm?id=788019.788876 Google ScholarCross Ref
- Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. 2015. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In STOC. 21–30. Google ScholarDigital Library
- Michael Hind. 2001. Pointer Analysis: Haven’T We Solved This Problem Yet?. In Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE ’01). ACM, 54–61. Google ScholarDigital Library
- Susan Horwitz. 1997. Precise Flow-insensitive May-alias Analysis is NP-hard. ACM Trans. Program. Lang. Syst. 19, 1 (1997), 1–6. Google ScholarDigital Library
- D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe. 1981. Dependence Graphs and Compiler Optimizations. In Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 207–218. Google ScholarDigital Library
- William Landi and Barbara G. Ryder. 1992. A Safe Approximate Algorithm for Interprocedural Aliasing. In Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation (PLDI ’92). ACM, 235–248. Google ScholarDigital Library
- François Le Gall. 2014. Powers of Tensors and Fast Matrix Multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC). 296–303. Google ScholarDigital Library
- Lillian Lee. 2002. Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication. J. ACM 49, 1 (2002), 1–15. Google ScholarDigital Library
- Ondřej Lhoták and Laurie Hendren. 2006. Context-Sensitive Points-to Analysis: Is It Worth It?. In Proceedings of the 15th International Conference on Compiler Construction (CC). 47–64. Google ScholarDigital Library
- Vijay Krishna Palepu, Guoqing Xu, and James A. Jones. 2017. Dynamic Dependence Summaries. ACM Trans. Softw. Eng. Methodol. 25, 4 (2017), 30:1–30:41.Google ScholarDigital Library
- G. Ramalingam. 1994. The Undecidability of Aliasing. ACM Trans. Program. Lang. Syst. 16, 5 (1994), 1467–1471. Google ScholarDigital Library
- Jakob Rehof and Manuel Fähndrich. 2001. Type-base Flow Analysis: From Polymorphic Subtyping to CFL-reachability. In Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 54–66. Google ScholarDigital Library
- Thomas Reps. 1995. Shape Analysis As a Generalized Path Problem. In Proceedings of the 1995 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM ’95). ACM, 1–11. Google ScholarDigital Library
- Thomas Reps. 1997. Program Analysis via Graph Reachability. In Proceedings of the 1997 International Symposium on Logic Programming (ILPS). 5–19.Google ScholarDigital Library
- Thomas Reps. 2000. Undecidability of Context-sensitive Data-dependence Analysis. ACM Trans. Program. Lang. Syst. 22, 1 (2000), 162–186. Google ScholarDigital Library
- Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise Interprocedural Dataflow Analysis via Graph Reachability. In Proceedings of the 22Nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’95). ACM, 49–61. Google ScholarDigital Library
- Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. 1994. Speeding Up Slicing. SIGSOFT Softw. Eng. Notes 19, 5 (1994), 11–20. Google ScholarDigital Library
- Neil Robertson and P.D Seymour. 1984. Graph minors. III. Planar tree-width. Journal of Combinatorial Theory, Series B (1984).Google Scholar
- Lei Shang, Xinwei Xie, and Jingling Xue. 2012. On-demand Dynamic Summary-based Points-to Analysis. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO ’12). ACM, 264–274. Google ScholarDigital Library
- Manu Sridharan and Rastislav Bodík. 2006. Refinement-based Context-sensitive Points-to Analysis for Java. SIGPLAN Not. 41, 6 (2006), 387–400. Google ScholarDigital Library
- Manu Sridharan, Satish Chandra, Julian Dolby, Stephen J. Fink, and Eran Yahav. 2013. Aliasing in Object-Oriented Programming. Chapter Alias Analysis for Object-oriented Programs, 196–232.Google Scholar
- Manu Sridharan, Denis Gopan, Lexin Shan, and Rastislav Bodík. 2005. Demand-driven Points-to Analysis for Java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’05). ACM, 59–76. Google ScholarDigital Library
- Volker Strassen. 1969. Gaussian Elimination is Not Optimal. Numer. Math. 13, 4 (1969), 354–356. Google ScholarDigital Library
- Hao Tang, Xiaoyin Wang, Lingming Zhang, Bing Xie, Lu Zhang, and Hong Mei. 2015. Summary-Based Context-Sensitive Data-Dependence Analysis in Presence of Callbacks. In Proceedings of the 42Nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 83–95. Google ScholarDigital Library
- Robert Endre Tarjan. 1975. Efficiency of a Good But Not Linear Set Union Algorithm. J. ACM 22, 2 (April 1975), 215–225. Google ScholarDigital Library
- Robert Endre Tarjan. 1979. A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. System Sci. 18, 2 (1979), 110 – 127. Google ScholarCross Ref
- Mikkel Thorup. 1998. All Structured Programs Have Small Tree Width and Good Register Allocation. Information and Computation (1998).Google Scholar
- Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In CASCON ’99. IBM Press.Google Scholar
- Thomas van Dijk, Jan-Pieter van den Heuvel, and Wouter Slob. 2006. Computing treewidth with LibTW. Technical Report. University of Utrecht.Google Scholar
- Virginia Vassilevska Williams and Ryan Williams. 2010. Subcubic Equivalences between Path, Matrix and Triangle Problems. In FOCS. 645–654.Google Scholar
- Guoqing Xu, Nick Mitchell, Matthew Arnold, Atanas Rountev, Edith Schonberg, and Gary Sevitsky. 2010. Finding Low-utility Data Structures. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 174–186. Google ScholarDigital Library
- Guoqing Xu, Atanas Rountev, and Manu Sridharan. 2009. Scaling CFL-Reachability-Based Points-To Analysis Using ContextSensitive Must-Not-Alias Analysis. Springer Berlin Heidelberg, 98–122.Google Scholar
- Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011a. Demand-driven Context-sensitive Alias Analysis for Java. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA ’11). ACM, 155–165.Google ScholarDigital Library
- Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011b. Demand-driven Context-sensitive Alias Analysis for Java. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA). 155–165. Google ScholarDigital Library
- Mihalis Yannakakis. 1990. Graph-theoretic Methods in Database Theory. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 230–242. Google ScholarDigital Library
- Andrew C. Yao. 1985. On the Expected Performance of Path Compression Algorithms. SIAM J. Comput. 14, 1 (1985), 129–133. Google ScholarDigital Library
- Hao Yuan and Patrick Eugster. 2009. An Efficient Algorithm for Solving the Dyck-CFL Reachability Problem on Trees. In Proceedings of the 18th European Symposium on Programming Languages and Systems: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (ESOP). 175–189. Google ScholarDigital Library
- Qirun Zhang, Michael R. Lyu, Hao Yuan, and Zhendong Su. 2013. Fast Algorithms for Dyck-CFL-reachability with Applications to Alias Analysis (PLDI). ACM.Google Scholar
- Qirun Zhang and Zhendong Su. 2017. Context-sensitive Data-dependence Analysis via Linear Conjunctive Language Reachability. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). 344–358. Google ScholarDigital Library
- Xin Zheng and Radu Rugina. 2008. Demand-driven Alias Analysis for C. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’08). ACM, 197–208. Google ScholarDigital Library
Index Terms
- Optimal Dyck reachability for data-dependence and alias analysis
Recommendations
Fast algorithms for Dyck-CFL-reachability with applications to alias analysis
PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and ImplementationThe context-free language (CFL) reachability problem is a well-known fundamental formulation in program analysis. In practice, many program analyses, especially pointer analyses, adopt a restricted version of CFL-reachability, Dyck-CFL-reachability, and ...
Demand-driven alias analysis for C
POPL '08This paper presents a demand-driven, flow-insensitive analysisalgorithm for answering may-alias queries. We formulate thecomputation of alias queries as a CFL-reachability problem, and use this formulation to derive a demand-driven analysis algorithm. ...
Demand-driven alias analysis for C
POPL '08: Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languagesThis paper presents a demand-driven, flow-insensitive analysisalgorithm for answering may-alias queries. We formulate thecomputation of alias queries as a CFL-reachability problem, and use this formulation to derive a demand-driven analysis algorithm. ...
Comments