Active learning for automatic classification of software behavior

Published: 01 July 2004

Abstract

A program's behavior is ultimately the collection of all its executions. This collection is diverse, unpredictable, and generally unbounded, which makes it especially suited to statistical analysis and machine learning techniques. The primary focus of this paper is the automatic classification of program behavior using execution data. Prior work on classifiers for software engineering adopts a classical batch-learning approach; in contrast, we explore an active-learning paradigm for behavior classification, in which the classifier is trained incrementally on a series of labeled data elements. We also explore the thesis that certain features of program behavior are stochastic processes that exhibit the Markov property, and that the resulting Markov models of individual program executions can be automatically clustered into effective predictors of program behavior. We present a technique that models program executions as Markov models, together with a clustering method that aggregates multiple program executions into effective behavior classifiers. We evaluate the use of active learning to efficiently refine these classifiers in three empirical studies that explore a scenario illustrating automated test-plan augmentation.
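To make the two ingredients of the approach concrete, the sketch below shows one plausible way to (1) estimate a first-order Markov model from an execution trace of profiled events and (2) drive an active-learning loop that asks an analyst to label the execution whose model is least similar to anything already labeled. This is a minimal illustration, not the authors' implementation: the event names, the L1 distance measure, the nearest-neighbour rule standing in for the paper's clustered classifiers, and the query heuristic are all assumptions.

```python
# Minimal sketch (assumed details): traces are lists of event IDs; the
# distance measure and query heuristic are illustrative, not the paper's.
from collections import defaultdict


def markov_model(trace):
    """Estimate first-order transition probabilities from an event trace."""
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(trace, trace[1:]):
        counts[src][dst] += 1
    model = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        model[src] = {dst: n / total for dst, n in dsts.items()}
    return model


def model_distance(a, b):
    """L1 distance between the transition probabilities of two models."""
    dist = 0.0
    for src in set(a) | set(b):
        row_a, row_b = a.get(src, {}), b.get(src, {})
        for dst in set(row_a) | set(row_b):
            dist += abs(row_a.get(dst, 0.0) - row_b.get(dst, 0.0))
    return dist


def classify(model, labeled):
    """Nearest-neighbour stand-in for the paper's clustered classifiers."""
    return min(labeled, key=lambda pair: model_distance(model, pair[0]))[1]


def query_next(unlabeled, labeled):
    """Active-learning step: pick the execution least similar to every
    labeled model (the one the classifier is least sure about) and hand
    it to a human for labeling."""
    return max(unlabeled,
               key=lambda m: min(model_distance(m, lm) for lm, _ in labeled))


if __name__ == "__main__":
    passing = markov_model(["init", "read", "process", "write", "exit"])
    failing = markov_model(["init", "read", "error", "retry", "error", "exit"])
    labeled = [(passing, "pass"), (failing, "fail")]

    pool = [
        markov_model(["init", "read", "process", "process", "write", "exit"]),
        markov_model(["init", "read", "error", "abort"]),
    ]
    print(classify(pool[0], labeled))      # -> pass
    candidate = query_next(pool, labeled)  # execution most in need of a label
    print(classify(candidate, labeled))    # -> fail (until a human corrects it)
```

In the scenario the abstract describes (automated test-plan augmentation), the labeled pool would presumably be seeded from an existing test plan, each query would correspond to a tester inspecting one execution, and the paper's clustering of Markov models would replace the naive nearest-neighbour rule used here.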



Published In

ACM SIGSOFT Software Engineering Notes, Volume 29, Issue 4
July 2004, 284 pages
ISSN: 0163-5948
DOI: 10.1145/1013886

ISSTA '04: Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis
July 2004, 294 pages
ISBN: 1581138202
DOI: 10.1145/1007512

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 July 2004
Published in SIGSOFT Volume 29, Issue 4

Author Tags

  1. Markov models
  2. machine learning
  3. software behavior
  4. software testing

Qualifiers

  • Article

Cited By

  • (2023) Human-in-the-Loop Automatic Program Repair. IEEE Transactions on Software Engineering 49(10), 4526-4549. DOI: 10.1109/TSE.2023.3305052
  • (2022) Devising optimal integration test orders using cost–benefit analysis. Frontiers of Information Technology & Electronic Engineering 23(5), 692-714. DOI: 10.1631/FITEE.2100466
  • (2022) Active Learning of Discriminative Subgraph Patterns for API Misuse Detection. IEEE Transactions on Software Engineering 48(8), 2761-2783. DOI: 10.1109/TSE.2021.3069978
  • (2022) DRE: density-based data selection with entropy for adversarial-robust deep learning models. Neural Computing and Applications 35(5), 4009-4026. DOI: 10.1007/s00521-022-07812-2
  • (2022) Near Failure Analysis Using Dynamic Behavioural Data. Product-Focused Software Process Improvement, 171-178. DOI: 10.1007/978-3-031-21388-5_12
  • (2021) Embedding and classifying test execution traces using neural networks. IET Software 16(3), 301-316. DOI: 10.1049/sfw2.12038
  • (2020) Understanding Static Code Warnings: an Incremental AI Approach. Expert Systems with Applications, 114134. DOI: 10.1016/j.eswa.2020.114134
  • (2019) Failure clustering without coverage. Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, 134-145. DOI: 10.1145/3293882.3330561
  • (2019) An empirical study on practicality of specification mining algorithms on a real-world application. Proceedings of the 27th International Conference on Program Comprehension, 65-69. DOI: 10.1109/ICPC.2019.00020
  • (2019) SpyDetector. International Journal of Information Security 18(4), 393-422. DOI: 10.1007/s10207-018-0411-7
