Active learning for automatic classification of software behavior

Published: 01 July 2004

Abstract

A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded, which makes it especially suited to statistical analysis and machine learning techniques. The primary focus of this paper is the automatic classification of program behavior using execution data. Prior work on classifiers for software engineering adopts a classical batch-learning approach; in contrast, we explore an active-learning paradigm for behavior classification, in which the classifier is trained incrementally on a series of labeled data elements. We also explore the thesis that certain features of program behavior are stochastic processes that exhibit the Markov property, and that the resulting Markov models of individual program executions can be automatically clustered into effective predictors of program behavior. We present a technique that models program executions as Markov models, together with a clustering method that aggregates multiple program executions into effective behavior classifiers. We evaluate the use of active learning to efficiently refine these classifiers in three empirical studies that explore a scenario illustrating automated test-plan augmentation.
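To make the two ingredients of the approach concrete, the sketch below shows one plausible way to (1) estimate a first-order Markov model from an execution trace of profiled events and (2) drive an active-learning loop that asks an analyst to label the execution whose model is least similar to anything already labeled. This is a minimal illustration, not the authors' implementation: the event names, the L1 distance measure, the nearest-neighbour rule standing in for the paper's clustered classifiers, and the query heuristic are all assumptions.

```python
# Minimal sketch (assumed details): traces are lists of event IDs; the
# distance measure and query heuristic are illustrative, not the paper's.
from collections import defaultdict


def markov_model(trace):
    """Estimate first-order transition probabilities from an event trace."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    model = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        model[src] = {dst: n / total for dst, n in dsts.items()}
    return model


def model_distance(a, b):
    """L1 distance between the transition probabilities of two models."""
    dist = 0.0
    for src in set(a) | set(b):
        row_a, row_b = a.get(src, {}), b.get(src, {})
        for dst in set(row_a) | set(row_b):
            dist += abs(row_a.get(dst, 0.0) - row_b.get(dst, 0.0))
    return dist


def classify(model, labeled):
    """Nearest-neighbour stand-in for the paper's clustered classifiers."""
    return min(labeled, key=lambda pair: model_distance(model, pair[0]))[1]


def query_next(unlabeled, labeled):
    """Active-learning step: pick the execution least similar to every
    labeled model (the one the classifier is least sure about) and hand
    it to a human for labeling."""
    return max(unlabeled,
               key=lambda m: min(model_distance(m, lm) for lm, _ in labeled))


if __name__ == "__main__":
    passing = markov_model(["init", "read", "process", "write", "exit"])
    failing = markov_model(["init", "read", "error", "retry", "error", "exit"])
    labeled = [(passing, "pass"), (failing, "fail")]

    pool = [
        markov_model(["init", "read", "process", "process", "write", "exit"]),
        markov_model(["init", "read", "error", "abort"]),
    ]
    print(classify(pool[0], labeled))      # -> pass
    candidate = query_next(pool, labeled)  # execution most in need of a label
    print(classify(candidate, labeled))    # -> fail (until a human corrects it)
```

In the scenario the abstract describes (automated test-plan augmentation), the labeled pool would presumably be seeded from an existing test plan, each query would correspond to a tester inspecting one execution, and the paper's clustering of Markov models would replace the naive nearest-neighbour rule used here.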



Published In

ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 4
July 2004, 284 pages
ISSN: 0163-5948
DOI: 10.1145/1013886

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2004, 294 pages
ISBN: 1581138202
DOI: 10.1145/1007512

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2004
Published in SIGSOFT Volume 29, Issue 4

Author Tags

  1. Markov models
  2. machine learning
  3. software behavior
  4. software testing

Qualifiers

  • Article

Cited By

  • (2023) Human-in-the-Loop Automatic Program Repair. IEEE Transactions on Software Engineering 49(10), 4526-4549. DOI: 10.1109/TSE.2023.3305052
  • (2022) Devising optimal integration test orders using cost–benefit analysis. Frontiers of Information Technology & Electronic Engineering 23(5), 692-714. DOI: 10.1631/FITEE.2100466
  • (2022) Active Learning of Discriminative Subgraph Patterns for API Misuse Detection. IEEE Transactions on Software Engineering 48(8), 2761-2783. DOI: 10.1109/TSE.2021.3069978
  • (2022) DRE: density-based data selection with entropy for adversarial-robust deep learning models. Neural Computing and Applications 35(5), 4009-4026. DOI: 10.1007/s00521-022-07812-2
  • (2022) Near Failure Analysis Using Dynamic Behavioural Data. Product-Focused Software Process Improvement, 171-178. DOI: 10.1007/978-3-031-21388-5_12
  • (2021) Embedding and classifying test execution traces using neural networks. IET Software 16(3), 301-316. DOI: 10.1049/sfw2.12038
  • (2020) Understanding Static Code Warnings: an Incremental AI Approach. Expert Systems with Applications, 114134. DOI: 10.1016/j.eswa.2020.114134
  • (2019) Failure clustering without coverage. Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 134-145. DOI: 10.1145/3293882.3330561
  • (2019) An empirical study on practicality of specification mining algorithms on a real-world application. Proceedings of the 27th International Conference on Program Comprehension, 65-69. DOI: 10.1109/ICPC.2019.00020
  • (2019) SpyDetector. International Journal of Information Security 18(4), 393-422. DOI: 10.1007/s10207-018-0411-7
