research-article

Program Boosting: Program Synthesis via Crowd-Sourcing

Authors:

Robert A. Cochran,

Loris D'Antoni,

Benjamin Livshits,

Margus VeanesAuthors Info & Claims

ACM SIGPLAN Notices, Volume 50, Issue 1

Pages 677 - 688

https://doi.org/10.1145/2775051.2676973

Published: 14 January 2015 Publication History

Abstract

In this paper, we investigate an approach to program synthesis that is based on crowd-sourcing. With the help of crowd-sourcing, we aim to capture the "wisdom of the crowds" to find good if not perfect solutions to inherently tricky programming tasks, which elude even expert developers and lack an easy-to-formalize specification.

We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness.

We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.

Supplementary Material

MPG File (p677-sidebyside.mpg)

Download
1876.82 MB

References

[1]

D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 1987.

Digital Library

[2]

W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming: An Introduction. 1997.

Digital Library

[3]

D. W. Barowy, C. Curtsinger, E. D. Berger, and A. McGregor. Automan: a platform for integrating human-based and digital computation. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2012.

Digital Library

[4]

D. F. Barrero, D. Camacho, and M. D.R-Moreno. Automatic web data extraction based on genetic algorithms and regular expressions. In Data Mining and Multi-agent Integration. 2009.

[5]

K. Chellapilla and D. Czarnecki. A preliminary investigation into evolving modular finite state machines. In Proceedings of the Congress on Evolutionary Computation, 1999.

[6]

B. Cody-Kenny and S. Barrett. Self-focusing genetic programming for software optimisation. In Proceedings of the Conference on Genetic and Evolutionary Computation, 2013.

Digital Library

[7]

P. Cousot and R. Cousot. Formal language, grammar and setconstraint-based program analysis by abstract interpretation. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture, 1995.

Digital Library

[8]

L. D'Antoni and M. Veanes. Minimization of symbolic automata. In Proceedings of the Symposium on Principles of Programming Languages, 2014.

Digital Library

[9]

L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. 1966.

Digital Library

[10]

S. Forrest, T. Nguyen, W. Weimer, and C. L. Goues. A genetic programming approach to automated software repair. In Proceedings of the Conference on Genetic and Evolutionary Computation, 2009.

Digital Library

[11]

U. Galassi and A. Giordana. Learning regular expressions from noisy sequences. In Proceedings of the Symposium on Abstraction, Reformulation and Approximation, 2005.

Digital Library

[12]

E. M. Gold. Language identification in the limit. Information and Control, 10(5), 1967.

[13]

M. Goldman, G. Little, and R. C. Miller. Collabode: Collaborative coding in the browser. In Proceedings of theWorkshop on Cooperative and Human Aspects of Software Engineering, 2011.

Digital Library

[14]

A. González-Pardo and D. Camacho. Analysis of grammatical evolutionary approaches to regular expression induction. In Proceedings of the Congress on Evolutionary Computation, 2011.

[15]

S. Gulwani. Automating string processing in spreadsheets using inputoutput examples. In Proceedings of the Symposium on Principles of Programming Languages, 2011.

Digital Library

[16]

T. Gvero, V. Kuncak, and R. Piskac. Interactive synthesis of code snippets. In Proceedings of the Converence on Computer Aided Verification, 2011.

Digital Library

[17]

P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with Bek. In Proceedings of the USENIX Security Symposium, 2011.

Digital Library

[18]

Y. Inagaki. On synchronized evolution of the network of automata. IEEE Transactions on Evolutionary Computation, 6(2), 2002.

Digital Library

[19]

M. Kearns and L. Valiant. Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM, 41(1), 1994.

Digital Library

[20]

R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 2009.

Digital Library

[21]

J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. 1992.

Digital Library

[22]

B. Lambeau, C. Damas, and P. Dupont. State-merging DFA induction algorithms with mandatory merge constraints. In Proceedings of the International Colloquim on Grammatical Inference, 2008.

Digital Library

[23]

K. J. Lang. Random DFA's can be approximately learned from sparse uniform examples. In Proceedings of Workshop on Computational Learning Theory, 1992.

Digital Library

[24]

K. J. Lang, B. A. Pearlmutter, and R. Price. Results of the abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In Proceedings of the International Colloquim on Grammatical Inference, 1998.

Digital Library

[25]

T. D. LaToza,W. B. Towne, C. M. Adriano, and A. van der Hoek. Microtask programming: Building software with a crowd. In Proceedings of the Symposium on User Interface Software and Technology, 2014.

Digital Library

[26]

Y. Li, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. V. Jagadish. Regular expression learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.

Digital Library

[27]

G. Little, L. B. Chilton, M. Goldman, and R. C. Miller. TurKit: tools for iterative tasks on mechanical turk. In Proceedings of the SIGKDD Workshop on Human Computation, 2009.

Digital Library

[28]

S. M. Lucas and T. J. Reynolds. Learning DFA: evolution versus evidence driven state merging. In Proceedings of the Congress on Evolutionary Computation, 2003.

[29]

S. M. Lucas and T. J. Reynolds. Learning deterministic finite automata with a smart state labeling evolutionary algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7), 2005.

Digital Library

[30]

O. Maler and I. E. Mens. Learning regular languages over large alphabets. In Tools and Algorithms for the Construction and Analysis of Systems, 2014.

[31]

J. V. Nickerson, Y. Sakamoto, and L. Yu. Structures for creativity: The crowdsourcing of design. In CHI Workshop on Crowd-sourcing and Human Computation, 2011.

[32]

J. Oncina and P. Garcíá. Identifying regular languages in polynomial time. In Advances in Structural and Syntactic Pattern Recognition, Series in Machine Perception and Artificial Intelligence. 1992.

[33]

A. G. Parameswaran, H. Park, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: declarative crowdsourcing. In Proceedings of the Conference on Information and Knowledge Management, 2012.

Digital Library

[34]

L. Pitt and M. K. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the ACM, 40(1), 1993.

Digital Library

[35]

R. Poli, W. B. Langdon, and N. F. McPhee. A Field Guide to Genetic Programming. Lulu Enterprises, 2008.

Digital Library

[36]

A. J. Quinn, B. B. Bederson, T. Yeh, and J. Lin. Crowdflow: Integrating machine learning with mechanical turk for speed-cost-quality flexibility. Technical Report HCIL-2010-09, University of Maryland, College Park, 2010.

[37]

R. E. Schapire. The boosting approach to machine learning: An overview. https://www.cs.princeton.edu/courses/ archive/spring07/cos424/papers/boosting-survey.pdf, 2002.

[38]

A. Solar-Lezama. Program sketching. STTT, 15(5--6):475--495, 2013.

[39]

S. Srivastava, S. Gulwani, and J. S. Foster. From program verification to program synthesis. In Proceedings of the Symposium on Principles of Programming Languages, 2010.

Digital Library

[40]

T. E. Uribe and M. E. Stickel. Ordered binary decision diagrams and the davis-putnam procedure. In Proceedings of the Conference on Constraints in Computational Linguistics, 1994.

Digital Library

[41]

M. Veanes and N. Bjørner. Symbolic automata: The toolkit. In Proceedings of the Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2012.

Digital Library

[42]

R. A. Wagner. Order-n correction for regular languages. Communications of the ACM, 17(5), 1974.

Digital Library

Cited By

Tseng SHemberg EO’Reilly U(2022)Synthesizing Programs from Program Pieces Using Genetic Programming and Refinement Type CheckingGenetic Programming10.1007/978-3-031-02056-8_13(197-211)Online publication date: 20-Apr-2022
https://dl.acm.org/doi/10.1007/978-3-031-02056-8_13
de Almeida Farzat Ade Oliveira Barros M(2021)Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithmGenetic Programming and Evolvable Machines10.1007/s10710-021-09411-x23:1(105-131)Online publication date: 1-Oct-2021
https://doi.org/10.1007/s10710-021-09411-x
Chen YHerskovitz JLasecki WOney S(2020)Bashon: A Hybrid Crowd-Machine Workflow for Shell Command Synthesis2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)10.1109/VL/HCC50065.2020.9127248(1-8)Online publication date: Aug-2020
https://doi.org/10.1109/VL/HCC50065.2020.9127248
Show More Cited By

Index Terms

Program Boosting: Program Synthesis via Crowd-Sourcing
1. Human-centered computing
  1. Collaborative and social computing
2. Theory of computation
  1. Formal languages and automata theory
    1. Grammars and context-free languages

Recommendations

Component-based synthesis for complex APIs

POPL '17: Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages

Component-based approaches to program synthesis assemble programs from a database of existing components, such as methods provided by an API. In this paper, we present a novel type-directed algorithm for component-based synthesis. The key novelty of ...

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices

ACM SIGPLAN Notices Volume 50, Issue 1

POPL '15

January 2015

682 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2775051

Editor:
Andy Gill
University of Kansas, Lawrence, KS

Issue’s Table of Contents

POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2015
716 pages
ISBN:9781450333009
DOI:10.1145/2676726
General Chair:
Sriram Rajamani
Microsoft Research, India
,
Program Chair:
David Walker
Princeton University, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 January 2015

Published in SIGPLAN Volume 50, Issue 1

Check for updates

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

32
Total Citations
View Citations
437
Total Downloads

Downloads (Last 12 months)15
Downloads (Last 6 weeks)3

Reflects downloads up to 19 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Tseng SHemberg EO’Reilly U(2022)Synthesizing Programs from Program Pieces Using Genetic Programming and Refinement Type CheckingGenetic Programming10.1007/978-3-031-02056-8_13(197-211)Online publication date: 20-Apr-2022
https://dl.acm.org/doi/10.1007/978-3-031-02056-8_13
de Almeida Farzat Ade Oliveira Barros M(2021)Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithmGenetic Programming and Evolvable Machines10.1007/s10710-021-09411-x23:1(105-131)Online publication date: 1-Oct-2021
https://doi.org/10.1007/s10710-021-09411-x
Chen YHerskovitz JLasecki WOney S(2020)Bashon: A Hybrid Crowd-Machine Workflow for Shell Command Synthesis2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)10.1109/VL/HCC50065.2020.9127248(1-8)Online publication date: Aug-2020
https://doi.org/10.1109/VL/HCC50065.2020.9127248
Wang HHong TCao JZhang HNie CCheng BHe YKuang LGong DChen WShi YHuang D(2020)COSINE:a software development model integrating collective intelligence, service and ecosystem2020 IEEE World Congress on Services (SERVICES)10.1109/SERVICES48979.2020.00038(122-127)Online publication date: Oct-2020
https://doi.org/10.1109/SERVICES48979.2020.00038
Zhang SHe LDragut EVucetic STeredesai AKumar VLi YRosales RTerzi EKarypis G(2019)How to Invest my TimeProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining10.1145/3292500.3330773(2305-2313)Online publication date: 25-Jul-2019
https://dl.acm.org/doi/10.1145/3292500.3330773
Sun PBrown CBeschastnikh IStolee K(2019)Mining Specifications from Documentation using a Crowd2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER.2019.8668025(275-286)Online publication date: Feb-2019
https://doi.org/10.1109/SANER.2019.8668025
Bartoli ADe Lorenzo AMedvet ETarlao F(2018)Active Learning of Regular Expressions for Entity ExtractionIEEE Transactions on Cybernetics10.1109/TCYB.2017.268046648:3(1067-1080)Online publication date: Mar-2018
https://doi.org/10.1109/TCYB.2017.2680466
Cody-Kenny BFenton MRonayne AConsidine EMcGuire TO'Neill MBosman P(2017)A search for improved performance in regular expressionsProceedings of the Genetic and Evolutionary Computation Conference10.1145/3071178.3071196(1280-1287)Online publication date: 1-Jul-2017
https://dl.acm.org/doi/10.1145/3071178.3071196
Argyros GStais IKiayias AKeromytis A(2016)Back in Black: Towards Formal, Black Box Analysis of Sanitizers and Filters2016 IEEE Symposium on Security and Privacy (SP)10.1109/SP.2016.14(91-109)Online publication date: May-2016
https://doi.org/10.1109/SP.2016.14
Yu SHan LIndulska MSadiq SDemartini G(2023)Human-in-the-loop Regular Expression Extraction for Single Column Format InconsistencyProceedings of the ACM Web Conference 202310.1145/3543507.3583515(3859-3867)Online publication date: 30-Apr-2023
https://dl.acm.org/doi/10.1145/3543507.3583515
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Issue’s Table of Contents