AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools

Authors:

Zachary P. Fry,

Giriprasad Sridhara,

K. Vijay-Shanker

MSR '08: Proceedings of the 2008 international working conference on Mining software repositories

Pages 79 - 88

https://doi.org/10.1145/1370750.1370771

Published: 10 May 2008

Abstract

When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations may never occur alongside their expanded word, or may occur more often in the code than the expanded word does. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that our scoped approach provides a 57% improvement in accuracy over the current state of the art.
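The idea of a scoped lookup can be illustrated with a minimal sketch. This is not the AMAP implementation: the matching rule in `is_candidate`, the frequency-based choice, and the names `is_candidate` and `expand` are all assumptions made for illustration. The sketch only shows the general shape of searching the narrowest scope (method) first and falling back to wider scopes (program, then general software vocabulary) when no candidate is found.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence


def is_candidate(abbrev: str, word: str) -> bool:
    """Return True if `word` could plausibly expand `abbrev`.

    Assumed, illustrative rule (not taken from the paper): the word is
    longer than the abbreviation, starts with the same letter, and contains
    the abbreviation's letters in order. This covers prefixes such as
    'str' -> 'string' and dropped letters such as 'msg' -> 'message'.
    """
    abbrev, word = abbrev.lower(), word.lower()
    if not abbrev or len(word) <= len(abbrev) or word[0] != abbrev[0]:
        return False
    remaining = iter(word)
    # Each `ch in remaining` consumes the iterator up to the match,
    # so this checks that abbrev is an in-order subsequence of word.
    return all(ch in remaining for ch in abbrev)


def expand(abbrev: str, scopes: Sequence[Iterable[str]]) -> Optional[str]:
    """Pick an expansion from the narrowest scope that offers a candidate.

    `scopes` is ordered from narrowest to widest, for example
    [method_words, program_words, general_software_words].
    """
    for words in scopes:
        counts = Counter(w.lower() for w in words if is_candidate(abbrev, w))
        if counts:
            # Most frequent candidate wins; ties are broken alphabetically.
            return min(counts, key=lambda w: (-counts[w], w))
    return None


# Hypothetical word lists standing in for the three scopes.
method_words = ["message", "queue", "send"]
program_words = ["messaging", "manager", "message"]
general_words = ["string", "stream", "string", "messages"]

print(expand("msg", [method_words, program_words, general_words]))
print(expand("mgr", [method_words, program_words, general_words]))
print(expand("str", [method_words, program_words, general_words]))
```

Ordering the scopes from local to global reflects the intuition that words appearing near an abbreviation are more likely to be its intended expansion than words drawn from a general vocabulary.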

Index Terms

AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools
1. Social and professional topics
  1. Professional topics
    1. Management of computing and information systems
      1. Software management
        Software maintenance
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
  2. Software notations and tools

Published In

MSR '08: Proceedings of the 2008 international working conference on Mining software repositories

May 2008

162 pages

ISBN: 9781605580241

DOI: 10.1145/1370750

General Chair: Ahmed E. Hassan (Queen's University, Canada)

Program Chairs: Michele Lanza (University of Lugano, Switzerland), Michael W. Godfrey (University of Waterloo, Canada)

Copyright © 2008 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICSE '08: International Conference on Software Engineering

May 10 - 11, 2008

Leipzig, Germany
