skip to main content
research-article

Program Boosting: Program Synthesis via Crowd-Sourcing

Published: 14 January 2015 Publication History

Abstract

In this paper, we investigate an approach to program synthesis that is based on crowd-sourcing. With the help of crowd-sourcing, we aim to capture the "wisdom of the crowds" to find good if not perfect solutions to inherently tricky programming tasks, which elude even expert developers and lack an easy-to-formalize specification.
We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness.
We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.

Supplementary Material

MPG File (p677-sidebyside.mpg)

References

[1]
D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 1987.
[2]
W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming: An Introduction. 1997.
[3]
D. W. Barowy, C. Curtsinger, E. D. Berger, and A. McGregor. Automan: a platform for integrating human-based and digital computation. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2012.
[4]
D. F. Barrero, D. Camacho, and M. D.R-Moreno. Automatic web data extraction based on genetic algorithms and regular expressions. In Data Mining and Multi-agent Integration. 2009.
[5]
K. Chellapilla and D. Czarnecki. A preliminary investigation into evolving modular finite state machines. In Proceedings of the Congress on Evolutionary Computation, 1999.
[6]
B. Cody-Kenny and S. Barrett. Self-focusing genetic programming for software optimisation. In Proceedings of the Conference on Genetic and Evolutionary Computation, 2013.
[7]
P. Cousot and R. Cousot. Formal language, grammar and setconstraint-based program analysis by abstract interpretation. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture, 1995.
[8]
L. D'Antoni and M. Veanes. Minimization of symbolic automata. In Proceedings of the Symposium on Principles of Programming Languages, 2014.
[9]
L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. 1966.
[10]
S. Forrest, T. Nguyen, W. Weimer, and C. L. Goues. A genetic programming approach to automated software repair. In Proceedings of the Conference on Genetic and Evolutionary Computation, 2009.
[11]
U. Galassi and A. Giordana. Learning regular expressions from noisy sequences. In Proceedings of the Symposium on Abstraction, Reformulation and Approximation, 2005.
[12]
E. M. Gold. Language identification in the limit. Information and Control, 10(5), 1967.
[13]
M. Goldman, G. Little, and R. C. Miller. Collabode: Collaborative coding in the browser. In Proceedings of theWorkshop on Cooperative and Human Aspects of Software Engineering, 2011.
[14]
A. González-Pardo and D. Camacho. Analysis of grammatical evolutionary approaches to regular expression induction. In Proceedings of the Congress on Evolutionary Computation, 2011.
[15]
S. Gulwani. Automating string processing in spreadsheets using inputoutput examples. In Proceedings of the Symposium on Principles of Programming Languages, 2011.
[16]
T. Gvero, V. Kuncak, and R. Piskac. Interactive synthesis of code snippets. In Proceedings of the Converence on Computer Aided Verification, 2011.
[17]
P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with Bek. In Proceedings of the USENIX Security Symposium, 2011.
[18]
Y. Inagaki. On synchronized evolution of the network of automata. IEEE Transactions on Evolutionary Computation, 6(2), 2002.
[19]
M. Kearns and L. Valiant. Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM, 41(1), 1994.
[20]
R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 2009.
[21]
J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. 1992.
[22]
B. Lambeau, C. Damas, and P. Dupont. State-merging DFA induction algorithms with mandatory merge constraints. In Proceedings of the International Colloquim on Grammatical Inference, 2008.
[23]
K. J. Lang. Random DFA's can be approximately learned from sparse uniform examples. In Proceedings of Workshop on Computational Learning Theory, 1992.
[24]
K. J. Lang, B. A. Pearlmutter, and R. Price. Results of the abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In Proceedings of the International Colloquim on Grammatical Inference, 1998.
[25]
T. D. LaToza,W. B. Towne, C. M. Adriano, and A. van der Hoek. Microtask programming: Building software with a crowd. In Proceedings of the Symposium on User Interface Software and Technology, 2014.
[26]
Y. Li, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. V. Jagadish. Regular expression learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
[27]
G. Little, L. B. Chilton, M. Goldman, and R. C. Miller. TurKit: tools for iterative tasks on mechanical turk. In Proceedings of the SIGKDD Workshop on Human Computation, 2009.
[28]
S. M. Lucas and T. J. Reynolds. Learning DFA: evolution versus evidence driven state merging. In Proceedings of the Congress on Evolutionary Computation, 2003.
[29]
S. M. Lucas and T. J. Reynolds. Learning deterministic finite automata with a smart state labeling evolutionary algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7), 2005.
[30]
O. Maler and I. E. Mens. Learning regular languages over large alphabets. In Tools and Algorithms for the Construction and Analysis of Systems, 2014.
[31]
J. V. Nickerson, Y. Sakamoto, and L. Yu. Structures for creativity: The crowdsourcing of design. In CHI Workshop on Crowd-sourcing and Human Computation, 2011.
[32]
J. Oncina and P. Garcíá. Identifying regular languages in polynomial time. In Advances in Structural and Syntactic Pattern Recognition, Series in Machine Perception and Artificial Intelligence. 1992.
[33]
A. G. Parameswaran, H. Park, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: declarative crowdsourcing. In Proceedings of the Conference on Information and Knowledge Management, 2012.
[34]
L. Pitt and M. K. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the ACM, 40(1), 1993.
[35]
R. Poli, W. B. Langdon, and N. F. McPhee. A Field Guide to Genetic Programming. Lulu Enterprises, 2008.
[36]
A. J. Quinn, B. B. Bederson, T. Yeh, and J. Lin. Crowdflow: Integrating machine learning with mechanical turk for speed-cost-quality flexibility. Technical Report HCIL-2010-09, University of Maryland, College Park, 2010.
[37]
R. E. Schapire. The boosting approach to machine learning: An overview. https://www.cs.princeton.edu/courses/ archive/spring07/cos424/papers/boosting-survey.pdf, 2002.
[38]
A. Solar-Lezama. Program sketching. STTT, 15(5--6):475--495, 2013.
[39]
S. Srivastava, S. Gulwani, and J. S. Foster. From program verification to program synthesis. In Proceedings of the Symposium on Principles of Programming Languages, 2010.
[40]
T. E. Uribe and M. E. Stickel. Ordered binary decision diagrams and the davis-putnam procedure. In Proceedings of the Conference on Constraints in Computational Linguistics, 1994.
[41]
M. Veanes and N. Bjørner. Symbolic automata: The toolkit. In Proceedings of the Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2012.
[42]
R. A. Wagner. Order-n correction for regular languages. Communications of the ACM, 17(5), 1974.

Cited By

View all
  • (2022)Synthesizing Programs from Program Pieces Using Genetic Programming and Refinement Type CheckingGenetic Programming10.1007/978-3-031-02056-8_13(197-211)Online publication date: 20-Apr-2022
  • (2021)Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithmGenetic Programming and Evolvable Machines10.1007/s10710-021-09411-x23:1(105-131)Online publication date: 1-Oct-2021
  • (2020)Bashon: A Hybrid Crowd-Machine Workflow for Shell Command Synthesis2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)10.1109/VL/HCC50065.2020.9127248(1-8)Online publication date: Aug-2020
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 50, Issue 1
POPL '15
January 2015
682 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/2775051
  • Editor:
  • Andy Gill
Issue’s Table of Contents
  • cover image ACM Conferences
    POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
    January 2015
    716 pages
    ISBN:9781450333009
    DOI:10.1145/2676726
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 January 2015
Published in SIGPLAN Volume 50, Issue 1

Check for updates

Author Tags

  1. crowd-sourcing
  2. program synthesis
  3. regular expressions
  4. symbolic automata

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)3
Reflects downloads up to 19 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2022)Synthesizing Programs from Program Pieces Using Genetic Programming and Refinement Type CheckingGenetic Programming10.1007/978-3-031-02056-8_13(197-211)Online publication date: 20-Apr-2022
  • (2021)Automatic generation of regular expressions for the Regex Golf challenge using a local search algorithmGenetic Programming and Evolvable Machines10.1007/s10710-021-09411-x23:1(105-131)Online publication date: 1-Oct-2021
  • (2020)Bashon: A Hybrid Crowd-Machine Workflow for Shell Command Synthesis2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)10.1109/VL/HCC50065.2020.9127248(1-8)Online publication date: Aug-2020
  • (2020)COSINE:a software development model integrating collective intelligence, service and ecosystem2020 IEEE World Congress on Services (SERVICES)10.1109/SERVICES48979.2020.00038(122-127)Online publication date: Oct-2020
  • (2019)How to Invest my TimeProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining10.1145/3292500.3330773(2305-2313)Online publication date: 25-Jul-2019
  • (2019)Mining Specifications from Documentation using a Crowd2019 IEEE 26th International Conference on Software Analysis, Evolution and Reengineering (SANER)10.1109/SANER.2019.8668025(275-286)Online publication date: Feb-2019
  • (2018)Active Learning of Regular Expressions for Entity ExtractionIEEE Transactions on Cybernetics10.1109/TCYB.2017.268046648:3(1067-1080)Online publication date: Mar-2018
  • (2017)A search for improved performance in regular expressionsProceedings of the Genetic and Evolutionary Computation Conference10.1145/3071178.3071196(1280-1287)Online publication date: 1-Jul-2017
  • (2016)Back in Black: Towards Formal, Black Box Analysis of Sanitizers and Filters2016 IEEE Symposium on Security and Privacy (SP)10.1109/SP.2016.14(91-109)Online publication date: May-2016
  • (2023)Human-in-the-loop Regular Expression Extraction for Single Column Format InconsistencyProceedings of the ACM Web Conference 202310.1145/3543507.3583515(3859-3867)Online publication date: 30-Apr-2023
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media