AMAP: automatically mining abbreviation expansions in programs to enhance software maintenance tools
International Conference on Software Engineering
Proceedings of the 2008 international working conference on Mining software repositories
Leipzig, Germany
SESSION: Mining 2
Pages 79-88
Year of Publication: 2008
ISBN: 978-1-60558-024-1
Authors
Emily Hill, University of Delaware, Newark, DE, USA
Zachary P. Fry, University of Delaware, Newark, DE, USA
Haley Boyd, University of Delaware, Newark, DE, USA
Giriprasad Sridhara, University of Delaware, Newark, DE, USA
Yana Novikova, University of Delaware, Newark, DE, USA
Lori Pollock, University of Delaware, Newark, DE, USA
K. Vijay-Shanker, University of Delaware, Newark, DE, USA
ABSTRACT
When writing software, developers often employ abbreviations in identifier names. In fact, some abbreviations never occur alongside their expanded word, or occur more often in the code than the expansion does. However, most existing program comprehension and search tools do little to address the problem of abbreviations, and therefore may miss meaningful pieces of code or relationships between software artifacts. In this paper, we present an automated approach to mining abbreviation expansions from source code to enhance software maintenance tools that utilize natural language information. Our scoped approach uses contextual information at the method, program, and general software level to automatically select the most appropriate expansion for a given abbreviation. We evaluated our approach on a set of 250 potential abbreviations and found that our scoped approach provides a 57% improvement in accuracy over the current state of the art.
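The idea of searching progressively wider scopes can be illustrated with a minimal Python sketch. This is not the AMAP implementation: the function names `is_candidate` and `expand`, the matching rule (same first letter, abbreviation letters appearing in order), and the frequency-based choice within a scope are all assumptions made for illustration. The abstract states only that context at the method, program, and general software level is used to select an expansion.

```python
from collections import Counter


def is_candidate(abbr, word):
    """Hypothetical matching rule: `word` is longer than `abbr`, starts with
    the same letter, and contains the letters of `abbr` in order."""
    if not abbr or len(word) <= len(abbr) or word[0] != abbr[0]:
        return False
    remaining = iter(word)
    # `ch in remaining` consumes the iterator, so order is enforced.
    return all(ch in remaining for ch in abbr)


def expand(abbr, method_words, program_words, general_words):
    """Return (expansion, scope_name) from the narrowest scope that yields a
    candidate, or None if no scope does. Within a scope, the most frequent
    candidate wins (an assumed heuristic)."""
    abbr = abbr.lower()
    scopes = (
        ("method", method_words),
        ("program", program_words),
        ("general", general_words),
    )
    for scope_name, words in scopes:
        counts = Counter(
            w.lower() for w in words if is_candidate(abbr, w.lower())
        )
        if counts:
            best_word, _ = counts.most_common(1)[0]
            return best_word, scope_name
    return None


if __name__ == "__main__":
    # Words that would be mined from the enclosing method, the whole
    # program, and a general software vocabulary (toy data).
    print(expand("btn", ["button", "click", "button"], ["bitten"], ["beaten"]))
    print(expand("cfg", ["value"], ["config", "configuration", "config"], []))
```

Trying the narrowest scope first reflects the intuition that words near an abbreviation are the likeliest expansions. The wider scopes act only as fallbacks when the local context offers no match.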