skip to main content
research-article
Open Access
Artifacts Evaluated & Functional

Optimal Dyck reachability for data-dependence and alias analysis

Published:27 December 2017Publication History
Skip Abstract Section

Abstract

A fundamental algorithmic problem at the heart of static analysis is Dyck reachability. The input is a graph where the edges are labeled with different types of opening and closing parentheses, and the reachability information is computed via paths whose parentheses are properly matched. We present new results for Dyck reachability problems with applications to alias analysis and data-dependence analysis. Our main contributions, that include improved upper bounds as well as lower bounds that establish optimality guarantees, are as follows:

First, we consider Dyck reachability on bidirected graphs, which is the standard way of performing field-sensitive points-to analysis. Given a bidirected graph with n nodes and m edges, we present: (i) an algorithm with worst-case running time O(m + n · α(n)), where α(n) is the inverse Ackermann function, improving the previously known O(n2) time bound; (ii) a matching lower bound that shows that our algorithm is optimal wrt to worst-case complexity; and (iii) an optimal average-case upper bound of O(m) time, improving the previously known O(m · logn) bound.

Second, we consider the problem of context-sensitive data-dependence analysis, where the task is to obtain analysis summaries of library code in the presence of callbacks. Our algorithm preprocesses libraries in almost linear time, after which the contribution of the library in the complexity of the client analysis is only linear, and only wrt the number of call sites.

Third, we prove that combinatorial algorithms for Dyck reachability on general graphs with truly sub-cubic bounds cannot be obtained without obtaining sub-cubic combinatorial algorithms for Boolean Matrix Multiplication, which is a long-standing open problem. Thus we establish that the existing combinatorial algorithms for Dyck reachability are (conditionally) optimal for general graphs. We also show that the same hardness holds for graphs of constant treewidth.

Finally, we provide a prototype implementation of our algorithms for both alias analysis and data-dependence analysis. Our experimental evaluation demonstrates that the new algorithms significantly outperform all existing methods on the two problems, over real-world benchmarks.

Skip Supplemental Material Section

Supplemental Material

optimaldyckreachability.webm

webm

88.4 MB

References

  1. 2003. T. J. Watson Libraries for Analysis (WALA). https://github.com.Google ScholarGoogle Scholar
  2. 2008. GitHub Home. https://github.com.Google ScholarGoogle Scholar
  3. 2008. SPECjvm2008 Benchmark Suit. http://www.spec.org/jvm2008/.Google ScholarGoogle Scholar
  4. Amir Abboud and Virginia Vassilevska Williams. 2014. Popular Conjectures Imply Strong Lower Bounds for Dynamic Problems. In FOCS. 434–443. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Stefan Arnborg and Andrzej Proskurowski. 1989. Linear time algorithms for NP-hard problems restricted to partial k-trees . Discrete Appl Math (1989).Google ScholarGoogle Scholar
  6. Robert S. Arnold. 1996. Software Change Impact Analysis. IEEE Computer Society Press, Los Alamitos, CA, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Lech Banachowski. 1980. A complement to Tarjan’s result about the lower bound on the complexity of the set union problem. Inform. Process. Lett. 11, 2 (1980), 59 – 65. Google ScholarGoogle ScholarCross RefCross Ref
  8. M.W Bern, E.L Lawler, and A.L Wong. 1987. Linear-time computation of optimal subgraphs of decomposable graphs. J Algorithm (1987).Google ScholarGoogle Scholar
  9. Umberto Bertele and Francesco Brioschi. 1972. Nonserial Dynamic Programming. Academic Press, Inc., Orlando, FL, USA.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Stephen M. et al. Blackburn. 2006. The DaCapo Benchmarks: Java Benchmarking Development and Analysis. In OOPSLA.Google ScholarGoogle Scholar
  11. Eric Bodden. 2012. Inter-procedural Data-flow Analysis with IFDS/IDE and Soot. In SOAP. ACM, New York, NY, USA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. HansL. Bodlaender and Torben Hagerup. 1995. Parallel algorithms with optimal speedup for bounded treewidth. Vol. 27. 1725–1746. Google ScholarGoogle ScholarCross RefCross Ref
  13. Hans L. Bodlaender. 1988. Dynamic programming on graphs with bounded treewidth. In ICALP. Springer. Google ScholarGoogle ScholarCross RefCross Ref
  14. Hans L. Bodlaender. 1998. A partial k-arboretum of graphs with bounded treewidth. TCS (1998).Google ScholarGoogle Scholar
  15. Krishnendu Chatterjee, Bhavya Choudhary, and Andreas Pavlogiannis. 2017. Optimal Dyck Reachability for Data-dependence and Alias Analysis. Technical Report. IST Austria. https://repository.ist.ac.at/id/eprint/870Google ScholarGoogle Scholar
  16. Krishnendu Chatterjee, Amir Kafshdar Goharshady, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016a. Algorithms for algebraic path properties in concurrent systems of constant treewidth components. In POPL. 733–747. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, Prateesh Goyal, and Andreas Pavlogiannis. 2015b. Faster Algorithms for Algebraic Path Properties in Recursive State Machines with Constant Treewidth. In POPL. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2015a. Faster Algorithms for Quantitative Verification in Constant Treewidth Graphs. In CAV. Google ScholarGoogle ScholarCross RefCross Ref
  19. Krishnendu Chatterjee, Rasmus Ibsen-Jensen, and Andreas Pavlogiannis. 2016b. Optimal Reachability and a Space-Time Tradeoff for Distance Queries in Constant-Treewidth Graphs. In 24th Annual European Symposium on Algorithms, ESA 2016, August 22-24, 2016, Aarhus, Denmark. 28:1–28:17.Google ScholarGoogle Scholar
  20. Swarat Chaudhuri. 2008. Subcubic Algorithms for Recursive State Machines. In POPL. ACM, New York, NY, USA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Shiva Chaudhuri and Christos D. Zaroliagis. 1995. Shortest Paths in Digraphs of Small Treewidth. Part I: Sequential Algorithms. Algorithmica (1995).Google ScholarGoogle Scholar
  22. Jong-Deok Choi, Michael Burke, and Paul Carini. 1993. Efficient Flow-sensitive Interprocedural Computation of Pointerinduced Aliases and Side Effects. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’93). ACM, 232–245. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. 2001. Introduction To Algorithms. MIT Press.Google ScholarGoogle Scholar
  24. Jon Doyle and Ronald L. Rivest. 1976. Linear expected time of a simple union-find algorithm. Inform. Process. Lett. 5, 5 (1976), 146 – 148. Google ScholarGoogle ScholarCross RefCross Ref
  25. Harold N. Gabow and Robert Endre Tarjan. 1985. A linear-time algorithm for a special case of disjoint set union. J. Comput. System Sci. 30, 2 (1985), 209 – 221. Google ScholarGoogle ScholarCross RefCross Ref
  26. Zvi Galil and Giuseppe F. Italiano. 1991. Data Structures and Algorithms for Disjoint Set Union Problems. ACM Comput. Surv. 23, 3 (1991), 319–344. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Jens Gustedt, OleA. Mæhle, and JanArne Telle. 2002. The Treewidth of Java Programs. In Algorithm Engineering and Experiments. Springer.Google ScholarGoogle Scholar
  28. Nevin Heintze and David McAllester. 1997. On the Cubic Bottleneck in Subtyping and Flow Analysis. In Proceedings of the 12th Annual IEEE Symposium on Logic in Computer Science (LICS ’97). IEEE Computer Society, Washington, DC, USA, 342–. http://dl.acm.org/citation.cfm?id=788019.788876 Google ScholarGoogle ScholarCross RefCross Ref
  29. Monika Henzinger, Sebastian Krinninger, Danupon Nanongkai, and Thatchaphol Saranurak. 2015. Unifying and Strengthening Hardness for Dynamic Problems via the Online Matrix-Vector Multiplication Conjecture. In STOC. 21–30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Michael Hind. 2001. Pointer Analysis: Haven’T We Solved This Problem Yet?. In Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering (PASTE ’01). ACM, 54–61. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Susan Horwitz. 1997. Precise Flow-insensitive May-alias Analysis is NP-hard. ACM Trans. Program. Lang. Syst. 19, 1 (1997), 1–6. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. D. J. Kuck, R. H. Kuhn, D. A. Padua, B. Leasure, and M. Wolfe. 1981. Dependence Graphs and Compiler Optimizations. In Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 207–218. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. William Landi and Barbara G. Ryder. 1992. A Safe Approximate Algorithm for Interprocedural Aliasing. In Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation (PLDI ’92). ACM, 235–248. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. François Le Gall. 2014. Powers of Tensors and Fast Matrix Multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC). 296–303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Lillian Lee. 2002. Fast Context-free Grammar Parsing Requires Fast Boolean Matrix Multiplication. J. ACM 49, 1 (2002), 1–15. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Ondřej Lhoták and Laurie Hendren. 2006. Context-Sensitive Points-to Analysis: Is It Worth It?. In Proceedings of the 15th International Conference on Compiler Construction (CC). 47–64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Vijay Krishna Palepu, Guoqing Xu, and James A. Jones. 2017. Dynamic Dependence Summaries. ACM Trans. Softw. Eng. Methodol. 25, 4 (2017), 30:1–30:41.Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. G. Ramalingam. 1994. The Undecidability of Aliasing. ACM Trans. Program. Lang. Syst. 16, 5 (1994), 1467–1471. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Jakob Rehof and Manuel Fähndrich. 2001. Type-base Flow Analysis: From Polymorphic Subtyping to CFL-reachability. In Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 54–66. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Thomas Reps. 1995. Shape Analysis As a Generalized Path Problem. In Proceedings of the 1995 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (PEPM ’95). ACM, 1–11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Thomas Reps. 1997. Program Analysis via Graph Reachability. In Proceedings of the 1997 International Symposium on Logic Programming (ILPS). 5–19.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Thomas Reps. 2000. Undecidability of Context-sensitive Data-dependence Analysis. ACM Trans. Program. Lang. Syst. 22, 1 (2000), 162–186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Thomas Reps, Susan Horwitz, and Mooly Sagiv. 1995. Precise Interprocedural Dataflow Analysis via Graph Reachability. In Proceedings of the 22Nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’95). ACM, 49–61. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Thomas Reps, Susan Horwitz, Mooly Sagiv, and Genevieve Rosay. 1994. Speeding Up Slicing. SIGSOFT Softw. Eng. Notes 19, 5 (1994), 11–20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Neil Robertson and P.D Seymour. 1984. Graph minors. III. Planar tree-width. Journal of Combinatorial Theory, Series B (1984).Google ScholarGoogle Scholar
  46. Lei Shang, Xinwei Xie, and Jingling Xue. 2012. On-demand Dynamic Summary-based Points-to Analysis. In Proceedings of the Tenth International Symposium on Code Generation and Optimization (CGO ’12). ACM, 264–274. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Manu Sridharan and Rastislav Bodík. 2006. Refinement-based Context-sensitive Points-to Analysis for Java. SIGPLAN Not. 41, 6 (2006), 387–400. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Manu Sridharan, Satish Chandra, Julian Dolby, Stephen J. Fink, and Eran Yahav. 2013. Aliasing in Object-Oriented Programming. Chapter Alias Analysis for Object-oriented Programs, 196–232.Google ScholarGoogle Scholar
  49. Manu Sridharan, Denis Gopan, Lexin Shan, and Rastislav Bodík. 2005. Demand-driven Points-to Analysis for Java. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications (OOPSLA ’05). ACM, 59–76. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Volker Strassen. 1969. Gaussian Elimination is Not Optimal. Numer. Math. 13, 4 (1969), 354–356. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Hao Tang, Xiaoyin Wang, Lingming Zhang, Bing Xie, Lu Zhang, and Hong Mei. 2015. Summary-Based Context-Sensitive Data-Dependence Analysis in Presence of Callbacks. In Proceedings of the 42Nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL). 83–95. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Robert Endre Tarjan. 1975. Efficiency of a Good But Not Linear Set Union Algorithm. J. ACM 22, 2 (April 1975), 215–225. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Robert Endre Tarjan. 1979. A class of algorithms which require nonlinear time to maintain disjoint sets. J. Comput. System Sci. 18, 2 (1979), 110 – 127. Google ScholarGoogle ScholarCross RefCross Ref
  54. Mikkel Thorup. 1998. All Structured Programs Have Small Tree Width and Good Register Allocation. Information and Computation (1998).Google ScholarGoogle Scholar
  55. Raja Vallée-Rai, Phong Co, Etienne Gagnon, Laurie Hendren, Patrick Lam, and Vijay Sundaresan. 1999. Soot - a Java bytecode optimization framework. In CASCON ’99. IBM Press.Google ScholarGoogle Scholar
  56. Thomas van Dijk, Jan-Pieter van den Heuvel, and Wouter Slob. 2006. Computing treewidth with LibTW. Technical Report. University of Utrecht.Google ScholarGoogle Scholar
  57. Virginia Vassilevska Williams and Ryan Williams. 2010. Subcubic Equivalences between Path, Matrix and Triangle Problems. In FOCS. 645–654.Google ScholarGoogle Scholar
  58. Guoqing Xu, Nick Mitchell, Matthew Arnold, Atanas Rountev, Edith Schonberg, and Gary Sevitsky. 2010. Finding Low-utility Data Structures. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 174–186. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Guoqing Xu, Atanas Rountev, and Manu Sridharan. 2009. Scaling CFL-Reachability-Based Points-To Analysis Using ContextSensitive Must-Not-Alias Analysis. Springer Berlin Heidelberg, 98–122.Google ScholarGoogle Scholar
  60. Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011a. Demand-driven Context-sensitive Alias Analysis for Java. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA ’11). ACM, 155–165.Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Dacong Yan, Guoqing Xu, and Atanas Rountev. 2011b. Demand-driven Context-sensitive Alias Analysis for Java. In Proceedings of the 2011 International Symposium on Software Testing and Analysis (ISSTA). 155–165. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Mihalis Yannakakis. 1990. Graph-theoretic Methods in Database Theory. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS). 230–242. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Andrew C. Yao. 1985. On the Expected Performance of Path Compression Algorithms. SIAM J. Comput. 14, 1 (1985), 129–133. Google ScholarGoogle ScholarDigital LibraryDigital Library
  64. Hao Yuan and Patrick Eugster. 2009. An Efficient Algorithm for Solving the Dyck-CFL Reachability Problem on Trees. In Proceedings of the 18th European Symposium on Programming Languages and Systems: Held As Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009 (ESOP). 175–189. Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Qirun Zhang, Michael R. Lyu, Hao Yuan, and Zhendong Su. 2013. Fast Algorithms for Dyck-CFL-reachability with Applications to Alias Analysis (PLDI). ACM.Google ScholarGoogle Scholar
  66. Qirun Zhang and Zhendong Su. 2017. Context-sensitive Data-dependence Analysis via Linear Conjunctive Language Reachability. In Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages (POPL). 344–358. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Xin Zheng and Radu Rugina. 2008. Demand-driven Alias Analysis for C. In Proceedings of the 35th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’08). ACM, 197–208. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Optimal Dyck reachability for data-dependence and alias analysis

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader