research-article

Bias-variance tradeoffs in program analysis

Authors:

Aditya V. Nori,

Alex AikenAuthors Info & Claims

ACM SIGPLAN Notices, Volume 49, Issue 1

Pages 127 - 137

https://doi.org/10.1145/2578855.2535853

Published: 08 January 2014 Publication History

Abstract

It is often the case that increasing the precision of a program analysis leads to worse results. It is our thesis that this phenomenon is the result of fundamental limits on the ability to use precise abstract domains as the basis for inferring strong invariants of programs. We show that bias-variance tradeoffs, an idea from learning theory, can be used to explain why more precise abstractions do not necessarily lead to better results and also provides practical techniques for coping with such limitations. Learning theory captures precision using a combinatorial quantity called the VC dimension. We compute the VC dimension for different abstractions and report on its usefulness as a precision metric for program analyses. We evaluate cross validation, a technique for addressing bias-variance tradeoffs, on an industrial strength program verification tool called YOGI. The tool produced using cross validation has significantly better running time, finds new defects, and has fewer time-outs than the current production version. Finally, we make some recommendations for tackling bias-variance tradeoffs in program analysis.

Supplementary Material

JPG File (d1_left_t7.jpg)

Download
15.82 KB

MP4 File (d1_left_t7.mp4)

Download
222.82 MB

References

[1]

G. Amato, M. Parton, and F. Scozzari. Discovering invariants via simple component analysis. J. Symb. Comput., 47(12):1533--1560, 2012.

Digital Library

[2]

S. Arlot and A. Celisse. A survey of cross-validation procedures for model selection. Statistics Surveys, 4:40--79, 2010.

[3]

R. Bagnara, P. M. Hill, E. Ricci, and E. Zaffanella. Precise widening operators for convex polyhedra. Sci. Comput. Program., 58(1-2):28--56, 2005.

Digital Library

[4]

D. Beyer. Second competition on software verification - (summary of SV-COMP 2013). In TACAS, pages 594--609, 2013.

Digital Library

[5]

D. Beyer, T. A. Henzinger, and G. Théoduloz. Program analysis with dynamic precision adjustment. In ASE, pages 29--38, 2008.

Digital Library

[6]

C. M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., 2006. ISBN 0387310738.

Digital Library

[7]

N. Bjørner, K. L. McMillan, and A. Rybalchenko. On solving universally quantified horn clauses. In SAS, pages 105--125, 2013.

[8]

A. Blumer, A. Ehrenfeucht, D. Haussler, and M. K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. J. ACM, 36(4):929--965, 1989.

Digital Library

[9]

N. H. Bshouty, S. A. Goldman, H. D. Mathias, S. Suri, and H. Tamaki. Noise-tolerant distribution-free learning of general geometric concepts. J. ACM, 45(5):863--890, 1998.

Digital Library

[10]

C. Calcagno, D. Distefano, P. W. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. In POPL, pages 289--300, 2009.

Digital Library

[11]

G. C. Cawley and N. L. C. Talbot. On over-fitting in model selection and subsequent selection bias in performance evaluation. Journal of Machine Learning Research, 11:2079--2107, 2010.

Digital Library

[12]

S. Chaki, E. M. Clarke, A. Groce, and O. Strichman. Predicate abstraction with minimum predicates. In CHARME, pages 19--34, 2003.

[13]

E. M. Clarke, O. Grumberg, S. Jha, Y. Lu, and H. Veith. Counterexample-guided abstraction refinement. In CAV, pages 154--169, 2000.

Digital Library

[14]

E. M. Clarke, D. Kroening, N. Sharygina, and K. Yorav. Predicate abstraction of ansi-c programs using sat. Formal Methods in System Design, 25(2-3):105--127, 2004.

Digital Library

[15]

M. Colón, S. Sankaranarayanan, and H. Sipma. Linear invariant generation using non-linear constraint solving. In CAV, pages 420--432, 2003.

[16]

P. Cousot and R. Cousot. Static determination of dynamic properties of programs. In ISOP, pages 106--130, 1976.

[17]

P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238--252, 1977.

Digital Library

[18]

P. Cousot and R. Cousot. Comparing the Galois connection and widening/narrowing approaches to abstract interpretation. In PLILP, pages 269--295, 1992.

Digital Library

[19]

P. Cousot and N. Halbwachs. Automatic discovery of linear restraints among variables of a program. In POPL, pages 84--96, 1978.

Digital Library

[20]

P. Cousot, R. Cousot, J. Feret, L. Mauborgne, A. Miné, and X. Rival. Why does Astrée scale up? Formal Methods in System Design, 35(3): 229--264, 2009.

Digital Library

[21]

P. Domingos. A few useful things to know about machine learning. Commun. ACM, 55(10):78--87, 2012.

Digital Library

[22]

T. Gawlitza and H. Seidl. Precise fixpoint computation through strategy iteration. In ESOP, pages 300--315, 2007.

Digital Library

[23]

T. Gawlitza and H. Seidl. Precise relational invariants through strategy iteration. In CSL, pages 23--40, 2007.

Digital Library

[24]

S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1--58, 1992.

Digital Library

[25]

P. Godefroid, A. V. Nori, S. K. Rajamani, and S. Tetali. Compositional may-must program analysis: unleashing the power of alternation. In POPL, pages 43--56, 2010.

Digital Library

[26]

B. S. Gulavani and S. Gulwani. A numerical abstract domain based on expression abstraction and max operator with application in timing analysis. In CAV, pages 370--384, 2008.

Digital Library

[27]

S. Gulwani, S. Srivastava, and R. Venkatesan. Program analysis as constraint solving. In PLDI, pages 281--292, 2008.

Digital Library

[28]

A. Gupta, R. Majumdar, and A. Rybalchenko. From tests to proofs. In TACAS, pages 262--276, 2009.

Digital Library

[29]

J. Henry, D. Monniaux, and M. Moy. Pagai: A path sensitive static analyser. Electr. Notes Theor. Comput. Sci., 289:15--25, 2012.

Digital Library

[30]

J. Henry, D. Monniaux, and M. Moy. Succinct representations for abstract interpretation - combined analysis algorithms and experimental evaluation. In SAS, pages 283--299, 2012.

Digital Library

[31]

T. A. Henzinger, R. Jhala, R. Majumdar, and K. L. McMillan. Abstractions from proofs. In POPL, pages 232--244, 2004.

Digital Library

[32]

R. Jhala and K. L. McMillan. A practical and complete approach to predicate refinement. In TACAS, pages 459--473, 2006.

Digital Library

[33]

M. Kearns and D. Ron. Algorithmic stability and sanity-check bounds for leave-one-out cross-validation. Neural Computation, 11:152--162, 1997.

Digital Library

[34]

M. J. Kearns and U. V. Vazirani. An introduction to computational learning theory. MIT Press, Cambridge, MA, USA, 1994. ISBN 0-262-11193-4.

Digital Library

[35]

G. Klein, J. Andronick, K. Elphinstone, G. Heiser, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. seL4: formal verification of an operatingsystem kernel. Commun. ACM, 53(6):107--115, 2010.

Digital Library

[36]

G. Lalire, M. Argoud, and B. Jeannet. The Interproc Analyzer. http://pop-art.inrialpes.fr/people/bjeannet/bjeannetforge/interproc/index.html.

[37]

P. Liang, O. Tripp, and M. Naik. Learning minimal abstractions. In POPL, pages 31--42, 2011.

Digital Library

[38]

K. L. McMillan. An interpolating theorem prover. Theoretical Computer Science, 345(1):101--121, 2005.

Digital Library

[39]

A. Minè. The octagon abstract domain. Higher-Order and Symbolic Computation, 19(1):31--100, 2006.

Digital Library

[40]

D. Monniaux and L. Gonnord. Using bounded model checking to focus fixpoint iterations. In SAS, pages 369--385, 2011.

Digital Library

[41]

D. Monniaux and J. L. Guen. Stratified static analysis based on variable dependencies. Electr. Notes Theor. Comput. Sci., 288:61--74, 2012.

Digital Library

[42]

A. Y. Ng. Preventing "overfitting" of cross-validation data. In ICML, pages 245--253, 1997.

Digital Library

[43]

A. V. Nori and S. K. Rajamani. An empirical study of optimizations in YOGI. In ICSE (1), pages 355--364, 2010.

Digital Library

[44]

J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In LICS, pages 55--74, 2002.

Digital Library

[45]

S. Sankaranarayanan, H. B. Sipma, and Z. Manna. Scalable analysis of linear systems using mathematical programming. In VMCAI, pages 25--41, 2005.

Digital Library

[46]

S. Sankaranarayanan, F. Ivancic, I. Shlyakhter, and A. Gupta. Static analysis in disjunctive numerical domains. In SAS, pages 3--17, 2006.

Digital Library

[47]

R. Sharma, S. Gupta, B. Hariharan, A. Aiken, and A. V. Nori. Verification as learning geometric concepts. In SAS, pages 388--411, 2013.

[48]

L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11): 1134--1142, 1984.

Digital Library

[49]

X. Zhang, M. Naik, and H. Yang. Finding optimum abstractions in parametric dataflow analysis. In PLDI, pages 365--376, 2013.

Digital Library

Cited By

Hollinger DPollard RSchall MChen HZabala M(2024)A Hierarchical-Based Learning Approach for Multi-Action Intent RecognitionSensors10.3390/s2423785724:23(7857)Online publication date: 9-Dec-2024
https://doi.org/10.3390/s24237857
Rountree-Harrison DBerkovsky SKangas M(2023)Heart and brain traumatic stress biomarker analysis with and without machine learning: A scoping reviewInternational Journal of Psychophysiology10.1016/j.ijpsycho.2023.01.009185(27-49)Online publication date: Mar-2023
https://doi.org/10.1016/j.ijpsycho.2023.01.009
Nimmy KDilraj MSankaran SAchuthan K(2022)Leveraging power consumption for anomaly detection on IoT devices in smart homesJournal of Ambient Intelligence and Humanized Computing10.1007/s12652-022-04110-614:10(14045-14056)Online publication date: 23-Jun-2022
https://doi.org/10.1007/s12652-022-04110-6
Show More Cited By

Index Terms

Bias-variance tradeoffs in program analysis
1. Theory of computation
  1. Semantics and reasoning
    1. Program reasoning
      1. Program analysis
      2. Program verification
    2. Program semantics

Recommendations

Bias-variance tradeoffs in program analysis
POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

It is often the case that increasing the precision of a program analysis leads to worse results. It is our thesis that this phenomenon is the result of fundamental limits on the ability to use precise abstract domains as the basis for inferring strong ...
Comparing and combining analysis-based and learning-based regression test selection
AST '22: Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test

Regression testing---rerunning tests on each code version to detect newly-broken functionality---is important and widely practiced. But, regression testing is costly due to the large number of tests and the high frequency of code changes. Regression ...
Gradient Bias to Solve the Generalization Limit of Genetic Algorithms Through Hybridization with Reinforcement Learning
Machine Learning, Optimization, and Data Science
Abstract
Genetic Algorithms have recently been successfully applied to the Machine Learning framework, being able to train autonomous agents and proving to be valid alternatives to state-of-the-art Reinforcement Learning techniques. Their attractiveness ...

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 49, Issue 1

POPL '14

January 2014

661 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2578855

Editor:
Mark W. Bailey
Hamilton College, Clinton, NY

Issue’s Table of Contents

POPL '14: Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2014
702 pages
ISBN:9781450325448
DOI:10.1145/2535838
General Chair:
Suresh Jagannathan
Purdue University, USA
,
Program Chair:
Peter Sewell
University of Cambridge, UK

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 08 January 2014

Published in SIGPLAN Volume 49, Issue 1

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

14
Total Citations
View Citations
418
Total Downloads

Downloads (Last 12 months)18
Downloads (Last 6 weeks)2

Reflects downloads up to 15 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Hollinger DPollard RSchall MChen HZabala M(2024)A Hierarchical-Based Learning Approach for Multi-Action Intent RecognitionSensors10.3390/s2423785724:23(7857)Online publication date: 9-Dec-2024
https://doi.org/10.3390/s24237857
Rountree-Harrison DBerkovsky SKangas M(2023)Heart and brain traumatic stress biomarker analysis with and without machine learning: A scoping reviewInternational Journal of Psychophysiology10.1016/j.ijpsycho.2023.01.009185(27-49)Online publication date: Mar-2023
https://doi.org/10.1016/j.ijpsycho.2023.01.009
Nimmy KDilraj MSankaran SAchuthan K(2022)Leveraging power consumption for anomaly detection on IoT devices in smart homesJournal of Ambient Intelligence and Humanized Computing10.1007/s12652-022-04110-614:10(14045-14056)Online publication date: 23-Jun-2022
https://doi.org/10.1007/s12652-022-04110-6
Ho VAlvin CMukhopadhyay SPeterson BLawson J(2020)Empirical AbstractionRuntime Verification10.1007/978-3-030-60508-7_14(259-278)Online publication date: 6-Oct-2020
https://dl.acm.org/doi/10.1007/978-3-030-60508-7_14
Yao PKe JSun JFu HWu RRen K(2023)Demystifying Template-Based Invariant Generation for Bit-Vector Programs2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE)10.1109/ASE56229.2023.00069(673-685)Online publication date: 11-Sep-2023
https://doi.org/10.1109/ASE56229.2023.00069
Bruni RGiacobazzi RGori RRanzato FJhala RDillig I(2022)Abstract interpretation repairProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523453(426-441)Online publication date: 9-Jun-2022
https://dl.acm.org/doi/10.1145/3519939.3523453
Koenig JPadon OImmerman NAiken ADonaldson ATorlak E(2020)First-order quantified separatorsProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386018(703-717)Online publication date: 11-Jun-2020
https://dl.acm.org/doi/10.1145/3385412.3386018
He JSingh GPüschel MVechev MDonaldson ATorlak E(2020)Learning fast and precise numerical analysisProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation10.1145/3385412.3386016(1112-1127)Online publication date: 11-Jun-2020
https://dl.acm.org/doi/10.1145/3385412.3386016
Padhi SMillstein TNori ASharma R(2019)Overfitting in Synthesis: Theory and PracticeComputer Aided Verification10.1007/978-3-030-25540-4_17(315-334)Online publication date: 12-Jul-2019
https://doi.org/10.1007/978-3-030-25540-4_17
Lidman JMckee S(2017)Verifying Reliability Properties Using the Hyperball Abstract DomainACM Transactions on Programming Languages and Systems10.1145/315601740:1(1-29)Online publication date: 19-Dec-2017
https://dl.acm.org/doi/10.1145/3156017
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents