Sigma*: symbolic learning of input-output specifications

Authors: Matko Botinčan, Domagoj Babić

POPL '13: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Pages 443 - 456

https://doi.org/10.1145/2429069.2429123

Published: 23 January 2013

Abstract

We present Sigma*, a novel technique for learning symbolic models of software behavior. Sigma* addresses the challenge of synthesizing models of software by using symbolic conjectures and abstraction. It combines dynamic symbolic execution, which discovers symbolic input-output steps of a program, with counterexample-guided abstraction refinement, which over-approximates program behavior; together these let Sigma* transform arbitrary source representations of programs into faithful input-output models. We define a class of stream filters (programs that process streams of data items) for which Sigma* converges to a complete model if abstraction refinement eventually builds up a sufficiently strong abstraction. In other words, Sigma* is complete relative to abstraction. To represent inferred symbolic models, we use a variant of symbolic transducers that can be effectively composed and equivalence checked. Thus, Sigma* enables fully automatic analysis of behavioral properties such as commutativity, reversibility, and idempotence, which is useful for web sanitizer verification and for compiler optimizations of stream programs, as we show experimentally. We also show how models inferred by Sigma* can boost the performance of stream programs through parallelized code generation.
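
To make the properties named in the abstract concrete, the following is a minimal sketch of a transducer whose transitions carry predicates rather than single letters, together with bounded checks for idempotence and commutativity of two sanitizer-like filters. It is not the Sigma* learning algorithm, and it does not perform the symbolic equivalence check the abstract refers to: the checks simply enumerate all streams up to a fixed length. All names (`SymbolicTransducer`, `replace_item`, `idempotent`, `commute`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterator, List, Tuple

Guard = Callable[[str], bool]   # predicate over one input item
Emit = Callable[[str], str]     # output produced for that item


@dataclass
class SymbolicTransducer:
    """Toy deterministic transducer with predicate-guarded transitions."""
    initial: int
    # state -> list of (guard, output function, next state)
    transitions: Dict[int, List[Tuple[Guard, Emit, int]]]

    def run(self, stream: str) -> str:
        state, out = self.initial, []
        for item in stream:
            # Take the first transition whose guard accepts the item.
            for guard, emit, nxt in self.transitions[state]:
                if guard(item):
                    out.append(emit(item))
                    state = nxt
                    break
            else:
                raise ValueError(f"no transition from {state} on {item!r}")
        return "".join(out)


def replace_item(target: str, replacement: str) -> SymbolicTransducer:
    """Single-state filter: rewrite `target`, copy every other item."""
    return SymbolicTransducer(0, {0: [
        (lambda c: c == target, lambda c: replacement, 0),
        (lambda c: c != target, lambda c: c, 0),
    ]})


def all_streams(alphabet: str, max_len: int) -> Iterator[str]:
    """Enumerate every stream over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for items in product(alphabet, repeat=n):
            yield "".join(items)


def idempotent(t: SymbolicTransducer, alphabet: str, max_len: int) -> bool:
    """Bounded check that t(t(s)) == t(s); a test, not a proof."""
    return all(t.run(t.run(s)) == t.run(s)
               for s in all_streams(alphabet, max_len))


def commute(a: SymbolicTransducer, b: SymbolicTransducer,
            alphabet: str, max_len: int) -> bool:
    """Bounded check that a(b(s)) == b(a(s)); a test, not a proof."""
    return all(a.run(b.run(s)) == b.run(a.run(s))
               for s in all_streams(alphabet, max_len))


escape_lt = replace_item("<", "&lt;")    # HTML-escape '<'
escape_amp = replace_item("&", "&amp;")  # HTML-escape '&'
drop_lt = replace_item("<", "")          # delete '<'

if __name__ == "__main__":
    print(idempotent(escape_lt, "<&a", 3))
    print(idempotent(escape_amp, "<&a", 3))
    print(commute(escape_lt, escape_amp, "<&a", 3))
    print(commute(drop_lt, escape_amp, "<&a", 3))
```

In this sketch, escaping `&` is not idempotent because its output reintroduces `&`, and the two escaping filters do not commute because the order decides whether the `&` in `&lt;` is escaped again. Order-sensitivity of this kind is the sort of behavioral property that matters for web sanitizers.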

Index Terms

Software and its engineering

Information

Published In

POPL '13: Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages

January 2013

586 pages

ISBN:9781450318327

DOI:10.1145/2429069

General Chair: Roberto Giacobazzi, Università di Verona, Italy

Program Chair: Radhia Cousot, École normale supérieure CNRS & INRIA, France

ACM SIGPLAN Notices Volume 48, Issue 1
POPL '13
January 2013
561 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2480359

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

In-Cooperation

SIGACT: ACM Special Interest Group on Algorithms and Computation Theory

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

POPL '13

Sponsor:

SIGPLAN

POPL '13: The 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages

January 23 - 25, 2013

Rome, Italy

Acceptance Rates

Overall Acceptance Rate 860 of 4,328 submissions, 20%

Bibliometrics

Total Citations: 48
Total Downloads: 683
Downloads (Last 12 months): 27
Downloads (Last 6 weeks): 2

Reflects downloads up to 19 Feb 2025

Cited By

Vaandrager F, Midya A (2022). A Myhill-Nerode theorem for register automata and symbolic trace languages. Theoretical Computer Science 912:C (37-55). Online publication date: 12-Apr-2022.
https://dl.acm.org/doi/10.1016/j.tcs.2022.01.015

Yogananda Jeppu N, Melham T, Kroening D (2022). Enhancing active model learning with equivalence checking using simulation relations. Formal Methods in System Design 61:2-3 (164-197). Online publication date: 1-Dec-2022.
https://dl.acm.org/doi/10.1007/s10703-023-00433-y

Wu Z, Johnson E, Yang W, Bastani O, Song D, Peng J, Xie T, Dumas M, Pfahl D, Apel S, Russo A (2019). REINAM: reinforcement learning for input-grammar inference. Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (488-498). Online publication date: 12-Aug-2019.
https://dl.acm.org/doi/10.1145/3338906.3338958

Jha S (2019). Trust, Resilience and Interpretability of AI Models. Numerical Software Verification (3-25). Online publication date: 3-Aug-2019.
https://doi.org/10.1007/978-3-030-28423-7_1

Angluin D, Fisman D (2018). Regular omega-Languages with an Informative Right Congruence. Electronic Proceedings in Theoretical Computer Science 277 (265-279). Online publication date: 7-Sep-2018.
https://doi.org/10.4204/EPTCS.277.19

Shrestha S, Panda S, Csallner C, Tichy W, Minku L (2018). Complementing machine learning classifiers via dynamic symbolic execution. Proceedings of the 6th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (15-20). Online publication date: 28-May-2018.
https://dl.acm.org/doi/10.1145/3194104.3194111

Tripakis S (2018). Data-driven and model-based design. 2018 IEEE Industrial Cyber-Physical Systems (ICPS) (103-108). Online publication date: May-2018.
https://doi.org/10.1109/ICPHYS.2018.8387644

Howar F, Steffen B (2018). Active Automata Learning in Practice. Machine Learning for Dynamic Software Analysis: Potentials and Limits (123-148). Online publication date: 20-Jul-2018.
https://doi.org/10.1007/978-3-319-96562-8_5

Bastani O, Sharma R, Aiken A, Liang P (2017). Synthesizing program input grammars. ACM SIGPLAN Notices 52:6 (95-110). Online publication date: 14-Jun-2017.
https://dl.acm.org/doi/10.1145/3140587.3062349

Moerman J, Sammartino M, Silva A, Klin B, Szynwelski M (2017). Learning nominal automata. ACM SIGPLAN Notices 52:1 (613-625). Online publication date: 1-Jan-2017.
https://dl.acm.org/doi/10.1145/3093333.3009879
