skip to main content
research-article

Bias-variance tradeoffs in program analysis

Published: 08 January 2014 Publication History

Abstract

It is often the case that increasing the precision of a program analysis leads to worse results. It is our thesis that this phenomenon is the result of fundamental limits on the ability to use precise abstract domains as the basis for inferring strong invariants of programs. We show that bias-variance tradeoffs, an idea from learning theory, can be used to explain why more precise abstractions do not necessarily lead to better results and also provides practical techniques for coping with such limitations. Learning theory captures precision using a combinatorial quantity called the VC dimension. We compute the VC dimension for different abstractions and report on its usefulness as a precision metric for program analyses. We evaluate cross validation, a technique for addressing bias-variance tradeoffs, on an industrial strength program verification tool called YOGI. The tool produced using cross validation has significantly better running time, finds new defects, and has fewer time-outs than the current production version. Finally, we make some recommendations for tackling bias-variance tradeoffs in program analysis.

Supplementary Material

JPG File (d1_left_t7.jpg)
MP4 File (d1_left_t7.mp4)

References

[1]
G. Amato, M. Parton, and F. Scozzari. Discovering invariants via simple component analysis. J. Symb. Comput., 47(12):1533--1560, 2012.
[2]
S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40--79, 2010.
[3]
R. Bagnara, P. M. Hill, E. Ricci, and E. Zaffanella. Precise widening operators for convex polyhedra. Sci. Comput. Program., 58(1-2):28--56, 2005.
[4]
D. Beyer. Second competition on software verification - (summary of SV-COMP 2013). In TACAS, pages 594--609, 2013.
[5]
D. Beyer, T. A. Henzinger, and G. Théoduloz. Program analysis with dynamic precision adjustment. In ASE, pages 29--38, 2008.
[6]
C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006. ISBN 0387310738.
[7]
N. Bjørner, K. L. McMillan, and A. Rybalchenko. On solving universally quantified horn clauses. In SAS, pages 105--125, 2013.
[8]
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929--965, 1989.
[9]
N. H. Bshouty, S. A. Goldman, H. D. Mathias, S. Suri, and H. Tamaki. Noise-tolerant distribution-free learning of general geometric concepts. J. ACM, 45(5):863--890, 1998.
[10]
C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. In POPL, pages 289--300, 2009.
[11]
G. C. Cawley and N. L. C. Talbot. On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11:2079--2107, 2010.
[12]
S. Chaki, E. M. Clarke, A. Groce, and O. Strichman. Predicate abstraction with minimum predicates. In CHARME, pages 19--34, 2003.
[13]
E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In CAV, pages 154--169, 2000.
[14]
E. M. Clarke, D. Kroening, N. Sharygina, and K. Yorav. Predicate abstraction of ansi-c programs using sat. Formal Methods in System Design, 25(2-3):105--127, 2004.
[15]
M. Colón, S. Sankaranarayanan, and H. Sipma. Linear invariant generation using non-linear constraint solving. In CAV, pages 420--432, 2003.
[16]
P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In ISOP, pages 106--130, 1976.
[17]
P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238--252, 1977.
[18]
P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In PLILP, pages 269--295, 1992.
[19]
P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In POPL, pages 84--96, 1978.
[20]
P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, and X. Rival. Why does Astrée scale up? Formal Methods in System Design, 35(3): 229--264, 2009.
[21]
P. Domingos. A few useful things to know about machine learning. Commun. ACM, 55(10):78--87, 2012.
[22]
T. Gawlitza and H. Seidl. Precise fixpoint computation through strategy iteration. In ESOP, pages 300--315, 2007.
[23]
T. Gawlitza and H. Seidl. Precise relational invariants through strategy iteration. In CSL, pages 23--40, 2007.
[24]
S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1--58, 1992.
[25]
P. Godefroid, A. V. Nori, S. K. Rajamani, and S. Tetali. Compositional may-must program analysis: unleashing the power of alternation. In POPL, pages 43--56, 2010.
[26]
B. S. Gulavani and S. Gulwani. A numerical abstract domain based on expression abstraction and max operator with application in timing analysis. In CAV, pages 370--384, 2008.
[27]
S. Gulwani, S. Srivastava, and R. Venkatesan. Program analysis as constraint solving. In PLDI, pages 281--292, 2008.
[28]
A. Gupta, R. Majumdar, and A. Rybalchenko. From tests to proofs. In TACAS, pages 262--276, 2009.
[29]
J. Henry, D. Monniaux, and M. Moy. Pagai: A path sensitive static analyser. Electr. Notes Theor. Comput. Sci., 289:15--25, 2012.
[30]
J. Henry, D. Monniaux, and M. Moy. Succinct representations for abstract interpretation - combined analysis algorithms and experimental evaluation. In SAS, pages 283--299, 2012.
[31]
T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMillan. Abstractions from proofs. In POPL, pages 232--244, 2004.
[32]
R. Jhala and K. L. McMillan. A practical and complete approach to predicate refinement. In TACAS, pages 459--473, 2006.
[33]
M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11:152--162, 1997.
[34]
M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT Press, Cambridge, MA, USA, 1994. ISBN 0-262-11193-4.
[35]
G. Klein, J. Andronick, K. Elphinstone, G. Heiser, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. seL4: formal verification of an operatingsystem kernel. Commun. ACM, 53(6):107--115, 2010.
[36]
G. Lalire, M. Argoud, and B. Jeannet. The Interproc Analyzer. http://pop-art.inrialpes.fr/people/bjeannet/bjeannetforge/interproc/index.html.
[37]
P. Liang, O. Tripp, and M. Naik. Learning minimal abstractions. In POPL, pages 31--42, 2011.
[38]
K. L. McMillan. An interpolating theorem prover. Theoretical Computer Science, 345(1):101--121, 2005.
[39]
A. Minè. The octagon abstract domain. Higher-Order and Symbolic Computation, 19(1):31--100, 2006.
[40]
D. Monniaux and L. Gonnord. Using bounded model checking to focus fixpoint iterations. In SAS, pages 369--385, 2011.
[41]
D. Monniaux and J. L. Guen. Stratified static analysis based on variable dependencies. Electr. Notes Theor. Comput. Sci., 288:61--74, 2012.
[42]
A. Y. Ng. Preventing "overfitting" of cross-validation data. In ICML, pages 245--253, 1997.
[43]
A. V. Nori and S. K. Rajamani. An empirical study of optimizations in YOGI. In ICSE (1), pages 355--364, 2010.
[44]
J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In LICS, pages 55--74, 2002.
[45]
S. Sankaranarayanan, H. B. Sipma, and Z. Manna. Scalable analysis of linear systems using mathematical programming. In VMCAI, pages 25--41, 2005.
[46]
S. Sankaranarayanan, F. Ivancic, I. Shlyakhter, and A. Gupta. Static analysis in disjunctive numerical domains. In SAS, pages 3--17, 2006.
[47]
R. Sharma, S. Gupta, B. Hariharan, A. Aiken, and A. V. Nori. Verification as learning geometric concepts. In SAS, pages 388--411, 2013.
[48]
L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11): 1134--1142, 1984.
[49]
X. Zhang, M. Naik, and H. Yang. Finding optimum abstractions in parametric dataflow analysis. In PLDI, pages 365--376, 2013.

Cited By

View all
  • (2024)A Hierarchical-Based Learning Approach for Multi-Action Intent RecognitionSensors10.3390/s2423785724:23(7857)Online publication date: 9-Dec-2024
  • (2023)Heart and brain traumatic stress biomarker analysis with and without machine learning: A scoping reviewInternational Journal of Psychophysiology10.1016/j.ijpsycho.2023.01.009185(27-49)Online publication date: Mar-2023
  • (2022)Leveraging power consumption for anomaly detection on IoT devices in smart homesJournal of Ambient Intelligence and Humanized Computing10.1007/s12652-022-04110-614:10(14045-14056)Online publication date: 23-Jun-2022
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 49, Issue 1
POPL '14
January 2014
661 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2578855
Issue’s Table of Contents
  • cover image ACM Conferences
    POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
    January 2014
    702 pages
    ISBN:9781450325448
    DOI:10.1145/2535838
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 January 2014
Published in SIGPLAN Volume 49, Issue 1

Check for updates

Author Tags

  1. machine learning
  2. program analysis
  3. verification

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)2
Reflects downloads up to 15 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)A Hierarchical-Based Learning Approach for Multi-Action Intent RecognitionSensors10.3390/s2423785724:23(7857)Online publication date: 9-Dec-2024
  • (2023)Heart and brain traumatic stress biomarker analysis with and without machine learning: A scoping reviewInternational Journal of Psychophysiology10.1016/j.ijpsycho.2023.01.009185(27-49)Online publication date: Mar-2023
  • (2022)Leveraging power consumption for anomaly detection on IoT devices in smart homesJournal of Ambient Intelligence and Humanized Computing10.1007/s12652-022-04110-614:10(14045-14056)Online publication date: 23-Jun-2022
  • (2020)Empirical AbstractionRuntime Verification10.1007/978-3-030-60508-7_14(259-278)Online publication date: 6-Oct-2020
  • (2023)Demystifying Template-Based Invariant Generation for Bit-Vector Programs2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00069(673-685)Online publication date: 11-Sep-2023
  • (2022)Abstract interpretation repairProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523453(426-441)Online publication date: 9-Jun-2022
  • (2020)First-order quantified separatorsProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386018(703-717)Online publication date: 11-Jun-2020
  • (2020)Learning fast and precise numerical analysisProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386016(1112-1127)Online publication date: 11-Jun-2020
  • (2019)Overfitting in Synthesis: Theory and PracticeComputer Aided Verification10.1007/978-3-030-25540-4_17(315-334)Online publication date: 12-Jul-2019
  • (2017)Verifying Reliability Properties Using the Hyperball Abstract DomainACM Transactions on Programming Languages and Systems10.1145/315601740:1(1-29)Online publication date: 19-Dec-2017
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media