ABSTRACT
With as much as 60-90% of software life cycle resources spent on program maintenance, there is a critical need for automated software tools that help developers explore and understand today's large and complex software. One important source of information that software maintenance tools can draw on is the lexical information in comments and identifiers. Identifier names often communicate a programmer's intent when writing code, and help developers map real-world concepts to code during comprehension. My dissertation will develop specialized information retrieval techniques and natural language analyses for software, so that software maintenance tools can take full advantage of the wealth of information in program identifiers. It will then integrate these techniques into software tools to expedite the maintenance activities of program exploration, concern location, and fault localization.
Index Terms
- Developing natural language-based program analyses and tools to expedite software maintenance