skip to main content
10.1145/2676726.2676973acmconferencesArticle/Chapter ViewAbstractPublication PagespoplConference Proceedingsconference-collections
research-article

Program Boosting: Program Synthesis via Crowd-Sourcing

Published: 14 January 2015 Publication History

Abstract

In this paper, we investigate an approach to program synthesis that is based on crowd-sourcing. With the help of crowd-sourcing, we aim to capture the "wisdom of the crowds" to find good if not perfect solutions to inherently tricky programming tasks, which elude even expert developers and lack an easy-to-formalize specification.
We propose an approach we call program boosting, which involves crowd-sourcing imperfect solutions to a difficult programming problem from developers and then blending these programs together in a way that improves their correctness.
We implement this approach in a system called CROWDBOOST and show in our experiments that interesting and highly non-trivial tasks such as writing regular expressions for URLs or email addresses can be effectively crowd-sourced. We demonstrate that carefully blending the crowd-sourced results together consistently produces a boost, yielding results that are better than any of the starting programs. Our experiments on 465 program pairs show consistent boosts in accuracy and demonstrate that program boosting can be performed at a relatively modest monetary cost.

Supplementary Material

MPG File (p677-sidebyside.mpg)

References

[1]
D. Angluin. Learning regular sets from queries and counterexamples. Information and Computation, 75(2), 1987.
[2]
W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone. Genetic Programming: An Introduction. 1997.
[3]
D. W. Barowy, C. Curtsinger, E. D. Berger, and A. McGregor. Automan: a platform for integrating human-based and digital computation. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2012.
[4]
D. F. Barrero, D. Camacho, and M. D.R-Moreno. Automatic web data extraction based on genetic algorithms and regular expressions. In Data Mining and Multi-agent Integration. 2009.
[5]
K. Chellapilla and D. Czarnecki. A preliminary investigation into evolving modular finite state machines. In Proceedings of the Congress on Evolutionary Computation, 1999.
[6]
B. Cody-Kenny and S. Barrett. Self-focusing genetic programming for software optimisation. In Proceedings of the Conference on Genetic and Evolutionary Computation, 2013.
[7]
P. Cousot and R. Cousot. Formal language, grammar and setconstraint-based program analysis by abstract interpretation. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture, 1995.
[8]
L. D'Antoni and M. Veanes. Minimization of symbolic automata. In Proceedings of the Symposium on Principles of Programming Languages, 2014.
[9]
L. J. Fogel, A. J. Owens, and M. J. Walsh. Artificial Intelligence through Simulated Evolution. 1966.
[10]
S. Forrest, T. Nguyen, W. Weimer, and C. L. Goues. A genetic programming approach to automated software repair. In Proceedings of the Conference on Genetic and Evolutionary Computation, 2009.
[11]
U. Galassi and A. Giordana. Learning regular expressions from noisy sequences. In Proceedings of the Symposium on Abstraction, Reformulation and Approximation, 2005.
[12]
E. M. Gold. Language identification in the limit. Information and Control, 10(5), 1967.
[13]
M. Goldman, G. Little, and R. C. Miller. Collabode: Collaborative coding in the browser. In Proceedings of theWorkshop on Cooperative and Human Aspects of Software Engineering, 2011.
[14]
A. González-Pardo and D. Camacho. Analysis of grammatical evolutionary approaches to regular expression induction. In Proceedings of the Congress on Evolutionary Computation, 2011.
[15]
S. Gulwani. Automating string processing in spreadsheets using inputoutput examples. In Proceedings of the Symposium on Principles of Programming Languages, 2011.
[16]
T. Gvero, V. Kuncak, and R. Piskac. Interactive synthesis of code snippets. In Proceedings of the Converence on Computer Aided Verification, 2011.
[17]
P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with Bek. In Proceedings of the USENIX Security Symposium, 2011.
[18]
Y. Inagaki. On synchronized evolution of the network of automata. IEEE Transactions on Evolutionary Computation, 6(2), 2002.
[19]
M. Kearns and L. Valiant. Cryptographic limitations on learning boolean formulae and finite automata. Journal of the ACM, 41(1), 1994.
[20]
R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 2009.
[21]
J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. 1992.
[22]
B. Lambeau, C. Damas, and P. Dupont. State-merging DFA induction algorithms with mandatory merge constraints. In Proceedings of the International Colloquim on Grammatical Inference, 2008.
[23]
K. J. Lang. Random DFA's can be approximately learned from sparse uniform examples. In Proceedings of Workshop on Computational Learning Theory, 1992.
[24]
K. J. Lang, B. A. Pearlmutter, and R. Price. Results of the abbadingo one DFA learning competition and a new evidence-driven state merging algorithm. In Proceedings of the International Colloquim on Grammatical Inference, 1998.
[25]
T. D. LaToza,W. B. Towne, C. M. Adriano, and A. van der Hoek. Microtask programming: Building software with a crowd. In Proceedings of the Symposium on User Interface Software and Technology, 2014.
[26]
Y. Li, R. Krishnamurthy, S. Raghavan, S. Vaithyanathan, and H. V. Jagadish. Regular expression learning for information extraction. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
[27]
G. Little, L. B. Chilton, M. Goldman, and R. C. Miller. TurKit: tools for iterative tasks on mechanical turk. In Proceedings of the SIGKDD Workshop on Human Computation, 2009.
[28]
S. M. Lucas and T. J. Reynolds. Learning DFA: evolution versus evidence driven state merging. In Proceedings of the Congress on Evolutionary Computation, 2003.
[29]
S. M. Lucas and T. J. Reynolds. Learning deterministic finite automata with a smart state labeling evolutionary algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7), 2005.
[30]
O. Maler and I. E. Mens. Learning regular languages over large alphabets. In Tools and Algorithms for the Construction and Analysis of Systems, 2014.
[31]
J. V. Nickerson, Y. Sakamoto, and L. Yu. Structures for creativity: The crowdsourcing of design. In CHI Workshop on Crowd-sourcing and Human Computation, 2011.
[32]
J. Oncina and P. Garcíá. Identifying regular languages in polynomial time. In Advances in Structural and Syntactic Pattern Recognition, Series in Machine Perception and Artificial Intelligence. 1992.
[33]
A. G. Parameswaran, H. Park, H. Garcia-Molina, N. Polyzotis, and J. Widom. Deco: declarative crowdsourcing. In Proceedings of the Conference on Information and Knowledge Management, 2012.
[34]
L. Pitt and M. K. Warmuth. The minimum consistent DFA problem cannot be approximated within any polynomial. Journal of the ACM, 40(1), 1993.
[35]
R. Poli, W. B. Langdon, and N. F. McPhee. A Field Guide to Genetic Programming. Lulu Enterprises, 2008.
[36]
A. J. Quinn, B. B. Bederson, T. Yeh, and J. Lin. Crowdflow: Integrating machine learning with mechanical turk for speed-cost-quality flexibility. Technical Report HCIL-2010-09, University of Maryland, College Park, 2010.
[37]
R. E. Schapire. The boosting approach to machine learning: An overview. https://www.cs.princeton.edu/courses/ archive/spring07/cos424/papers/boosting-survey.pdf, 2002.
[38]
A. Solar-Lezama. Program sketching. STTT, 15(5--6):475--495, 2013.
[39]
S. Srivastava, S. Gulwani, and J. S. Foster. From program verification to program synthesis. In Proceedings of the Symposium on Principles of Programming Languages, 2010.
[40]
T. E. Uribe and M. E. Stickel. Ordered binary decision diagrams and the davis-putnam procedure. In Proceedings of the Conference on Constraints in Computational Linguistics, 1994.
[41]
M. Veanes and N. Bjørner. Symbolic automata: The toolkit. In Proceedings of the Conference on Tools and Algorithms for the Construction and Analysis of Systems, 2012.
[42]
R. A. Wagner. Order-n correction for regular languages. Communications of the ACM, 17(5), 1974.

Cited By

View all
  • (2023)Human-in-the-loop Regular Expression Extraction for Single Column Format InconsistencyProceedings of the ACM Web Conference 202310.1145/3543507.3583515(3859-3867)Online publication date: 30-Apr-2023
  • (2022)Quality of Automated Program Repair on Real-World DefectsIEEE Transactions on Software Engineering10.1109/TSE.2020.299878548:2(637-661)Online publication date: 1-Feb-2022
  • (2021)Ensuring the Correctness of Regular Expressions: A ReviewInternational Journal of Automation and Computing10.1007/s11633-021-1301-418:4(521-535)Online publication date: 1-Aug-2021
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2015
716 pages
ISBN:9781450333009
DOI:10.1145/2676726
  • cover image ACM SIGPLAN Notices
    ACM SIGPLAN Notices  Volume 50, Issue 1
    POPL '15
    January 2015
    682 pages
    ISSN:0362-1340
    EISSN:1558-1160
    DOI:10.1145/2775051
    • Editor:
    • Andy Gill
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 14 January 2015

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. crowd-sourcing
  2. program synthesis
  3. regular expressions
  4. symbolic automata

Qualifiers

  • Research-article

Conference

POPL '15
Sponsor:

Acceptance Rates

POPL '15 Paper Acceptance Rate 52 of 227 submissions, 23%;
Overall Acceptance Rate 860 of 4,328 submissions, 20%

Upcoming Conference

POPL '26

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)15
  • Downloads (Last 6 weeks)3
Reflects downloads up to 19 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2023)Human-in-the-loop Regular Expression Extraction for Single Column Format InconsistencyProceedings of the ACM Web Conference 202310.1145/3543507.3583515(3859-3867)Online publication date: 30-Apr-2023
  • (2022)Quality of Automated Program Repair on Real-World DefectsIEEE Transactions on Software Engineering10.1109/TSE.2020.299878548:2(637-661)Online publication date: 1-Feb-2022
  • (2021)Ensuring the Correctness of Regular Expressions: A ReviewInternational Journal of Automation and Computing10.1007/s11633-021-1301-418:4(521-535)Online publication date: 1-Aug-2021
  • (2020)A New Approach to Fuzzy Regular Expression Parsers for Cybersecurity Logs2020 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)10.1109/FUZZ48607.2020.9177710(1-8)Online publication date: 19-Jul-2020
  • (2019)Automatic repair of regular expressionsProceedings of the ACM on Programming Languages10.1145/33605653:OOPSLA(1-29)Online publication date: 10-Oct-2019
  • (2019)Knowledge Graph Programming with a Human-in-the-LoopProceedings of the Workshop on Human-In-the-Loop Data Analytics10.1145/3328519.3329132(1-7)Online publication date: 5-Jul-2019
  • (2019)Replication can improve prior resultsProceedings of the 27th International Conference on Program Comprehension10.1109/ICPC.2019.00037(179-190)Online publication date: 25-May-2019
  • (2017)WearMailProceedings of the 30th Annual ACM Symposium on User Interface Software and Technology10.1145/3126594.3126603(807-815)Online publication date: 20-Oct-2017
  • (2017)Synthesis with Abstract ExamplesComputer Aided Verification10.1007/978-3-319-63387-9_13(254-278)Online publication date: 13-Jul-2017
  • (2016)FIDEX: filtering spreadsheet data using examplesACM SIGPLAN Notices10.1145/3022671.298403051:10(195-213)Online publication date: 19-Oct-2016
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media