skip to main content
10.1145/1181775.1181808acmconferencesArticle/Chapter ViewAbstractPublication PagesfseConference Proceedingsconference-collections
Article

SMArTIC: towards building an accurate, robust and scalable specification miner

Published: 05 November 2006 Publication History

Abstract

Improper management of software evolution, compounded by imprecise, and changing requirements, along with the "short time to market" requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should be pruned from the input traces to a miner, and (2) Clustering related traces will localize inaccuracies and reduce over-generalizationin learning. Correspondingly, SMArTIC comprises four components: an erroneous-trace filtering block, a related-trace clustering block, a learner, and a merger. We show through experiments that the quality of specification mining can be significantly improved using SMArTIC.

References

[1]
G. Ammons, R. Bodik, and J. R. Larus. Mining specification. In Proc. of Principles of Programming Languages, 2002.
[2]
G. Ammons, D. Mandelin, R. Bodik, and J.R. Larus. Debugging temporal specifications with concept analysis. In Proc. of Programming Language Design and Implementation, 2003.
[3]
T. Arts and L. Fredlund. Trace analysis of Erlang program. In Proc. of Erlang Workshop, 2002.
[4]
A.W. Biermann and J.A. Feldman. On the synthesis of finite-state machines from samples of their behaviour. IEEE Transactions on Computers, 21:591--597, 1972.
[5]
R. Capilla and J.C. Dueñas. Light-weight product-lines for evolution and maintenance of web sites. In Proc. of European Conf. On Software Maintenance And Reengineering, 2003.
[6]
J. E. Cook and A. L. Wolf. Discovering models of software processes from event-based data. ACM Transactions on Software Engineering and Methodology, 7(3):215--249, July 1998.
[7]
T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.
[8]
S. Deelstra, M. Sinnema, and J. Bosch. Experiences in software product families: Problems and issues during product derivation. In Proc. of Software Product Line Conf., 2004.
[9]
D.Lo and S-C.Khoo. QUARK: Empirical assessment of automaton-based specification miners. In Proc. of Working Conf. on Reverse Engineering, 2006.
[10]
M. Dwyer, G. Avrunin, and J. Corbett. Patterns in property specifications for finite-state verification. In Proc. of Int. Conf. on Software Engineering, 1999.
[11]
M. El-Ramly, E. Stroulia, and P. Sorenson. Interaction-pattern mining: Extracting usage scenarios from run-time behavior traces. In Proc. of ACM-SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2002.
[12]
M.D. Ernst, J. Cockrell, W.G. Griswold, and D. Notkin. Dynamically discovering likely program invariants to support program evolution. IEEE Transaction on Software Engineering, 27(2):99--123, February 2001.
[13]
A. Foss and O.R. Zaiane. A parameterless method for efficiently discovering clusters of arbitrary shape in large datasets. In Proc. IEEE Int. Conf. on Data Mining, 2002.
[14]
A. Fox. Addressing software dependability with statistical and machine learning techniques. In Proc. of Int. Conf. of Software Engineering, 2005. Invited Talk.
[15]
N. Gupta. Generating test data for dynamically discovering likely program invariants. In Proc. of the workshop on dynamic analysis, 2003.
[16]
J. Han and M. Kamber. Data Mining Concepts and Techniques. Morgan Kaufmann, 2001.
[17]
M. Huth and M. Ryan. Logic in Computer Science. Cambridge, 2004.
[18]
The Java hotspot performance engine architecture. http://java.sun.com/products/hotspot/whitepaper.html.
[19]
J. Henkel and A. Diwan. Discovering algebraic specifications from java classes. In Proc. of the Euro. Conf. of Object Oriented Programming, 2003.
[20]
JRat. http://jrat.sourceforge.net/.
[21]
L. Kaufman and P.J. Rousseeuw. Finding Groups in Data: an Introduction to Cluster Analysis. John Wiley & Sons, 1990.
[22]
E. Keogh, S. Lonardi, and C.A. Ratanamahatana. Towards parameter-free data mining. Proc. of Int. Conf. on Knowledge Discovery and Data Mining, 2004.
[23]
J.R. Larus. Whole program paths. In Proc. of Programming Language Design and Implementation, 1999.
[24]
A.M. Law and W. D. Kelton. Simulation modelling and analysis. McGraw-Hill, 2000.
[25]
D. Lo and S-C. Khoo. SMArTIC: Specification mining architecture with trace filtering and clustering. In SoC-NUS tech. report, TRA 8/06, 2006.
[26]
R.B. Lyngsø, C.N.S. Pedersen, and H. Nielsen. Metrics and similarity measures for hidden Markov models. In Proc. of National Conf. on Artificial Intelligence, 1999.
[27]
D.S. Moore and G.P. McCabe. Introduction to the Practice of Statistics. W.H. Freeman & Co., Third Edition, 1998.
[28]
Jakarta Commons Net. http://jakarta.apache.org/commons/net/.
[29]
C.G. Nevill-Manning, I.H. Witten, and D.L. Maulsby. Compression by induction of hierarchical grammars. In Proc. of Data Compression Conf., 1994.
[30]
J. W. Nimmer and M. D. Ernst. Automatic generation of program specifications. In Proc. of Int. Symposium on Software Testing and Analysis, 2002.
[31]
S.C. Ntafos. A comparison of some structural testing strategies. IEEE Transactions on Software Enginering, 14(6):868--874, 1988.
[32]
A. V. Raman and J. D. Patrick. The sk-strings method for inferring PFSA. In Proc. of the workshop on automata induction, grammatical inference and language acquisition, 1997.
[33]
O. Raz, P. Koopman, and M. Shaw. Semantic anomaly detection in online data sources. In Proc. of Int. Conf. on Software Engineering, 2002.
[34]
S. P. Reiss and M. Renieris. Encoding program executions. In Proc. of the Int. Conf. on Software Engineering, 2001.
[35]
S. P. Reiss and M. Renieris. Generating java trace data. In Proc. of the ACM Conf. on Java Grande, 2000.
[36]
M. Tompa. Lecture notes on biological sequence analysis. Technical Report 2000-06-01 University of Washington, 2000.
[37]
C. J. van Rijsbergen. Information Retrieval. Butterworths, London, 1979.
[38]
J. Wang and J. Han. BIDE: Efficient mining of frequent closed sequences. In Proc. of Int. Conf. on Data Engineering, 2004.
[39]
J. Whaley, M. C. Martin, and M. S. Lam. Automatic extraction of object oriented component interfaces. In Proc. of the Int. Symposium on Software Testing and Analysis, 2002.
[40]
W. Weimer and G. Necula. Mining temporal specifications for error detection. In Proc. of Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems, 2005.
[41]
J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Perracotta: Mining temporal API rules from imperfect traces. In Proc. of Int. Conf. on Software Engineering, 2006.

Cited By

View all
  • (2024)Specification Mining Based on the Ordering Points to Identify the Clustering Structure Clustering Algorithm and Model CheckingAlgorithms10.3390/a1701002817:1(28)Online publication date: 10-Jan-2024
  • (2024)Rigorous Assessment of Model Inference Accuracy using Language CardinalityACM Transactions on Software Engineering and Methodology10.1145/364033233:4(1-39)Online publication date: 16-Jan-2024
  • (2024)Requirements Engineering for Trustworthy Human-AI Synergy in Software Engineering 2.02024 IEEE 32nd International Requirements Engineering Conference (RE)10.1109/RE59067.2024.00011(3-4)Online publication date: 24-Jun-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGSOFT '06/FSE-14: Proceedings of the 14th ACM SIGSOFT international symposium on Foundations of software engineering
November 2006
298 pages
ISBN:1595934685
DOI:10.1145/1181775
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 November 2006

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. clustering traces
  2. filtering errors
  3. specification mining

Qualifiers

  • Article

Conference

SIGSOFT06/FSE-14
Sponsor:

Acceptance Rates

Overall Acceptance Rate 17 of 128 submissions, 13%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)14
  • Downloads (Last 6 weeks)3
Reflects downloads up to 05 Mar 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Specification Mining Based on the Ordering Points to Identify the Clustering Structure Clustering Algorithm and Model CheckingAlgorithms10.3390/a1701002817:1(28)Online publication date: 10-Jan-2024
  • (2024)Rigorous Assessment of Model Inference Accuracy using Language CardinalityACM Transactions on Software Engineering and Methodology10.1145/364033233:4(1-39)Online publication date: 16-Jan-2024
  • (2024)Requirements Engineering for Trustworthy Human-AI Synergy in Software Engineering 2.02024 IEEE 32nd International Requirements Engineering Conference (RE)10.1109/RE59067.2024.00011(3-4)Online publication date: 24-Jun-2024
  • (2023)Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps2023 IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE)10.1109/ICSE-FoSE59343.2023.00010(69-85)Online publication date: 14-May-2023
  • (2023)PURLTL: Mining LTL Specification from Imperfect Traces in TestingProceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE56229.2023.00202(1766-1770)Online publication date: 11-Nov-2023
  • (2023)An empirical study on API usages from code search engine and local libraryEmpirical Software Engineering10.1007/s10664-023-10304-z28:3Online publication date: 13-Apr-2023
  • (2022)A Hybrid Approach for Inference between Behavioral Exception API Documentation and Implementations, and Its ApplicationsProceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering10.1145/3551349.3560434(1-13)Online publication date: 10-Oct-2022
  • (2022)Call Me Maybe: Using NLP to Automatically Generate Unit Test Cases Respecting Temporal ConstraintsProceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering10.1145/3551349.3556961(1-11)Online publication date: 10-Oct-2022
  • (2021)Scrutinizing Implementations of Smart Home IntegrationsIEEE Transactions on Software Engineering10.1109/TSE.2019.296069047:12(2667-2683)Online publication date: 1-Dec-2021
  • (2021)SOSRepair: Expressive Semantic Search for Real-World Program RepairIEEE Transactions on Software Engineering10.1109/TSE.2019.294491447:10(2162-2181)Online publication date: 1-Oct-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media