Abstract
The article addresses the problem of concept location in source code by proposing an approach that combines Formal Concept Analysis and Information Retrieval. In the proposed approach, Latent Semantic Indexing, an advanced Information Retrieval approach, is used to map textual descriptions of software features or bug reports to relevant parts of the source code, presented as a ranked list of source code elements. Given the ranked list, the approach selects the most relevant attributes from the best ranked documents, clusters the results, and presents them as a concept lattice, generated using Formal Concept Analysis.
The approach is evaluated through a large case study on concept location in the source code on six open-source systems, using several hundred features and bugs. The empirical study focuses on the analysis of various configurations of the generated concept lattices and the results indicate that our approach is effective in organizing different concepts and their relationships present in the subset of the search results. In consequence, the proposed concept location method has been shown to outperform a standalone Information Retrieval based concept location technique by reducing the number of irrelevant search results across all the systems and lattice configurations evaluated, potentially reducing the programmers' effort during software maintenance tasks involving concept location.
- Antoniol, G. and Guéhéneuc, Y. G. 2006. Feature identification: An epidemiological metaphor. IEEE Trans. Softw. Engin. 32, 627--641. Google ScholarDigital Library
- Bacchelli, A., D'ambros, M., and Lanza, M. 2010a. Extracting source code from e-mails. In Proceedings of the 18th IEEE International Conference on Program Comprehension (ICPC'10). Google ScholarDigital Library
- Bacchelli, A., D'ambros, M., Lanza, M., and Robbes, R. 2009. Benchmarking lightweight techniques to link e-mails and source code. In Proceedings of the 16th Working Conference on Reverse Engineering (WCRE'09). Google ScholarDigital Library
- Bacchelli, A., Lanza, M., and Robbes, R. 2010b. Linking e-mails and source code artifacts. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10). Google ScholarDigital Library
- Baeza-Yates, R. A. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison-Wesley. Google ScholarDigital Library
- Biggerstaff, T. J., Mitbander, B. G., and Webster, D. E. 1994. The concept assignment problem in program understanding. In Proceedings of the 15th IEEE/ACM International Conference on Software Engineering (ICSE'94). 482--498. Google ScholarDigital Library
- Carpineto, C., Osinski, S., Romano, G., and Weiss, D. 2009. A survey of web clustering engines. ACM Comput. Surv. 41. Google ScholarDigital Library
- Chen, K. and Rajlich, V. 2000. Case study of feature location using dependence graph. In Proceedings of the 8th IEEE International Workshop on Program Comprehension (IWPC'00). 241--249. Google ScholarDigital Library
- Cigarran, J., Peñas, A., Gonzalo, J., and Verdejo, F. 2005. Evaluating hierarchical clustering of search results. In Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE'05). 49--54. Google ScholarDigital Library
- Cigarrán, J.M., Gonzalo, J., Peñas, A., and Verdejo, F. 2004. Browsing search results via formal concept analysis: Automatic selection of attributes. In Proceedings of the 2nd International Conference on Formal Concept Analysis (ICFCA'04). 74--87.Google Scholar
- Cleary, B., Exton, C., Buckley, J., and English, M. 2009. An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir. Softw. Engin. 14, 93--130. Google ScholarDigital Library
- Cubranic, D., Murphy, G.C., Singer, J., and Booth, K.S. 2005. Hipikat: A project memory for software development. IEEE Trans. Softw. Engin. 31, 446--465. Google ScholarDigital Library
- de Alwis, B., Murphy, G. C., and Robillard, M. 2007. A comparative study of three program exploration tools. In Proceedings of the 15th IEEE International Conference on Program Comprehension. 103--112. Google ScholarDigital Library
- De Lucia, A., Fasano, F., Oliveto, R., and Tortora, G. 2007. Recovering traceability links in software artefact management systems. ACM Trans. Softw. Engin. Methodol. 16. Google ScholarDigital Library
- De Lucia, A., Oliveto, R., and Vorraro, L. 2008. Using structural and semantic metrics to improve class cohesion. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM'08). 27--36.Google Scholar
- Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41, 391--407.Google ScholarCross Ref
- Dit, B., Guerrouj, L., Poshyvanyk, D., and Antoniol, G. 2011a. Can better identifier splitting techniques help feature location? InProceedings of the 19th IEEE International Conference on Program Comprehension (ICPC'11). 11--20. Google ScholarDigital Library
- Dit, B., Revelle, M., Gethers, M., and Poshyvanyk, D. 2011b. Feature location in source code: A taxonomy and survey. J. Softw. Main. Evolution: Resear. Pract.Google Scholar
- Eaddy, M., Aho, A. V., Antoniol, G., and Guéhéneuc, Y.G. 2008a. Cerberus: Tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. InProceedings of the 16th IEEE International Conference on Program Comprehension (ICPC'08). 53--62. Google ScholarDigital Library
- Eaddy, M., Zimmermann, T., Sherwood, K., Garg, V., Murphy, G., Nagappan, N., and Aho, A.V. 2008b. Do crosscutting concerns cause defects? IEEE Trans. Softw. Engin. 34, 497--515. Google ScholarDigital Library
- Eisenbarth, T., Koschke, R., and Simon, D. 2003. Locating features in source code. IEEE Trans. Softw. Engin. 29, 210--224. Google ScholarDigital Library
- Enslen, E., Hill, E., Pollock, L., and Vijay-Shanker, K. 2009. Mining source code to automatically split identifiers for software analysis. In Proceedings of the 6th IEEE Working Conference on Mining Software Repositories (MSR'09). 71--80. Google ScholarDigital Library
- Ferenc, R., Siket, I., and Gyimothy, T. 2004. Extracting facts from open source software. In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM'04). IEEE, 60--69. Google ScholarDigital Library
- Fluri, B., Würsch, M., Giger, E., and Gall, H. 2009. Analyzing the co-evolution of comments and source code. Softw. Quality J. 17, 367--394. Google ScholarDigital Library
- Fritz, T., Murphy, G. C., and Hill, E. 2007. Does a programmer's activity indicate knowledge of code? In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM Sigsoft Symposium on the Foundations of Software Engineering. 341--350. Google ScholarDigital Library
- Ganter, B. and Wille, R. 1996. Formal Concept Analysis. Springer.Google Scholar
- Gay, G., Haiduc, S., Marcus, M., and Menzies, T. 2009. On the use of relevance feedback in ir-based concept location. In Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM'09). 351--360.Google Scholar
- Gold, N., Harman, M., Li, Z., and Mahdavi, K. 2006. Allowing overlapping boundaries in source code using a search based approach to concept binding. In Proceedings of the 22nd IEEE International Conference on Software Maintenance (ICSM'06). 310--319. Google ScholarDigital Library
- Grant, S., Cordy, J. R., and Skillicorn, D. B. 2008. Automated concept location using independent component analysis In Proceedings of the 15th Working Conference on Reverse Engineering (WCRE'08). 138--142. Google ScholarDigital Library
- Haiduc, S., Aponte, J., Moreno, L., and Marcus, A. 2010. On The Use Of Automated Text Summarization Techniques For Summarizing Source Code. In Proceedings of the 17th IEEE Working Conference on Reverse Engineering (WCRE'10). Google ScholarDigital Library
- Hayes, J.H., Dekhtyar, A., and Sundaram, S.K. 2006. Advancing Candidate link generation for requirements tracing: The study of methods. IEEE Trans. Softw. Engin. 32, 4--19. Google ScholarDigital Library
- Hill, E., Fry, Z. P., Boyd, H., Sridhara, G., Novikova, Y., Pollock, L., and Vijay-Shanker, K. 2008. AMAP: Automatically Mining abbreviation expansions in programs to enhance software maintenance tools. In Proceedings of the 5th Working Conference on Mining Software Repositories. Google ScholarDigital Library
- Hill, E., Pollock, L., and Vijay-Shanker, K. 2007. Exploring the neighborhood with dora to expedite software maintenance. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE'07). 14--23. Google ScholarDigital Library
- Hill, E., Pollock, L., and Vijay-Shanker, K. 2009. Automatically capturing source code context of nl-queries for software maintenance and reuse. In Proceedings of the 31st IEEE/ACM International Conference on Software Engineering (ICSE'09). Google ScholarDigital Library
- Jiang, H., Nguyen, T., Che, I. X., Jaygarl, H., and Chang, C. 2008. Incremental latent semantic indexing for effective, automatic traceability link evolution management. In Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE'08). Google ScholarDigital Library
- Kersten, M. and Murphy, G. C. 2006. Using Task Context To Improve Programmer Productivity. In Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 1--11. Google ScholarDigital Library
- Ko, A. J., Myers, B. A., Coblenz, M. J., and Aung, H. H. 2006. An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Trans. Softw. Engin. 32, 971--987. Google ScholarDigital Library
- Kuhn, A. 2009. Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code. In Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories (MSR'09). Google ScholarDigital Library
- Kuhn, A., Ducasse, S., and Gîrba, T. 2007. Semantic Clustering: Identifying Topics In Source Code. Inf. Softw. Techn. 49, 230--243. Google ScholarDigital Library
- Lienhard, A., Ducasse, S., and Arevalo, G. 2005. Identifying traits with formal concept analysis. In Proceedings of the 20th IEEE/ACM international Conference on Automated Software Engineering (ASE'05). 66--75. Google ScholarDigital Library
- Liu, D., Marcus, A., Poshyvanyk, D., and Rajlich, V. 2007. Feature location via information retrieval based filtering of a single scenario execution trace. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE'07). 234--243. Google ScholarDigital Library
- Lo, K. K., Chan, M. K., and Baniassad, E. 2006. Isolating and relating concerns in requirements using latent semantic analysis. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA'06). 383--396 Google ScholarDigital Library
- Lormans, M. and Van Deursen, A. 2006. Can LSI help reconstructing requirements traceability in design and test? In Proceedings of the 10th European Conference on Software Maintenance and Reengineering (CSMR'06). 47--56. Google ScholarDigital Library
- Lukins, S., Kraft, N., and Etzkorn, L. 2008. Source code retrieval for bug location using latent dirichlet allocation. In Proceedings of the 15th Working Conference on Reverse Engineering (WCRE'08). 155--164. Google ScholarDigital Library
- Maletic, J. I., Collard, M. L., and Marcus, A. 2002. Source code files as structured documents. InProceedings of the 10th IEEE International Workshop on Program Comprehension (IWPC'02). 289--292. Google ScholarDigital Library
- Marcus, A. and Maletic, J. I. 2001. Identification Of High-Level Concept Clones In Source Code. In Automated Software Engineering (Ase'01). 107--114. Google ScholarDigital Library
- Marcus, A., Maletic, J. I., and Sergeyev, A. 2005. Recovery of traceability links between software documentation and source code. Int. J. Softw. Engin. Knowl. Engin. 15, 811--836.Google ScholarCross Ref
- Marcus, A., Poshyvanyk, D., and Ferenc, R. 2008. Using the conceptual cohesion of classes for fault prediction in object oriented systems. IEEE Trans. Soft. Engin. 34, 287--300. Google ScholarDigital Library
- Marcus, A., Rajlich, V., Buchta, J., Petrenko, M., and Sergeyev, A. 2005. Static techniques for concept location in object-oriented code. In Proceedings of the 13th IEEE International Workshop on Program Comprehension (IWPC'05). 33--42. Google ScholarDigital Library
- Marcus, A., Sergeyev, A., Rajlich, V., and Maletic, J. 2004. An information retrieval approach to concept location in source code. In Proceedings of the 11th IEEE Working Conference on Reverse Engineering (WCRE'04). 214--223. Google ScholarDigital Library
- Mens, K. and Tourwe, T. 2005. Delving source code with formal concept analysis. Comput. Lang. Syst. Struct. 31, 183--198. Google ScholarDigital Library
- Poshyvanyk, D., Guéhéneuc, Y. G., Marcus, A., Antoniol, G., and Rajlich, V. 2007. Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans. Softw. Engin. 33, 420--432. Google ScholarDigital Library
- Poshyvanyk, D., Marcus, A., and Dong, Y. 2006a. JIRiSS---An Eclipse plug-in for source code exploration. In Proceedings of the 14th IEEE International Conference on Program Comprehension (ICPC'06). 252--255. Google ScholarDigital Library
- Poshyvanyk, D., Marcus, A., Dong, Y., and Sergeyev, A. 2005. IRiSS---A source code exploration tool. In Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM'05). 69--72.Google Scholar
- Poshyvanyk, D., Marcus, A., Ferenc, R., and Gyimóthy, T. 2009. Using information retrieval based coupling measures for impact analysis. Empirical Softw. Engin. 14, 5--32. Google ScholarDigital Library
- Poshyvanyk, D. and Marcus, D. 2007. Combining formal concept analysis with information retrieval for concept location in source code. In Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC'07). 37--48. Google ScholarDigital Library
- Poshyvanyk, D., Petrenko, M., Marcus, A., Xie, X., and Liu, D. 2006b. Source code exploration with Google In Proceedings of the 22nd IEEE International Conference on Software Maintenance (ICSM'06). 334--338. Google ScholarDigital Library
- Rajlich, V. and Gosavi, P. 2004. Incremental change in object-oriented programming. IEEE Softw. 2--9. Google ScholarDigital Library
- Rastkar, S., Murphy, G., and Murray, G. 2010. Summarizing software artifacts: A case study of bug reports. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10). 505--514. Google ScholarDigital Library
- Ratiu, D. and Deissenboeck, F. 2007. From reality to programs and (not quite) back again. In Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC'07). 91--102. Google ScholarDigital Library
- Ren, X., Shah, F., Tip, F., Ryder, B. G., and Chesley, O. 2004. Chianti: A tool for change impact analysis of Java programs. In Proceedings of the 19th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'04). 432--448. Google ScholarDigital Library
- Revelle, M., Dit, B., and Poshyvanyk, D. 2010. Using data fusion and web mining to support feature location in software. InProceedings of the 18th Ieee International Conference on Program Comprehension (ICPC'10). 14--23. Google ScholarDigital Library
- Revelle, M. and Poshyvanyk, D. 2009. An exploratory study on assessing feature location techniques. InProceedings of the 17th IEEE International Conference on Program Comprehension (ICPC'09). 218--222.Google Scholar
- Robillard, M. 2005. Automatic generation of suggestions for program investigation. In Proceedings of the Joint European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering. 11--20 Google ScholarDigital Library
- Robillard, M. P. 2008. Topology Analysis Of Software Dependencies. ACM Trans. Softw. Engin. Methodol. 17. Google ScholarDigital Library
- Robillard, M. P., Coelho, W., and Murphy, G.C. 2004. How effective developers investigate source code: An exploratory study. IEEE Trans. Softw. Engin. 30, 889--903. Google ScholarDigital Library
- Salton, G. and McGill, M. 1983. Introduction To Modern Information Retrieval. McGraw-Hill. Google ScholarDigital Library
- Savage, T., Revelle, M., and Poshyvanyk, D. 2010. Flat3: Feature location and textual tracing tool. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10). 255--258. Google ScholarDigital Library
- Shepherd, D., Fry, Z., Gibson, E., Pollock, L., and Vijay-Shanker, K. 2007. Using natural language program analysis to locate and understand action-oriented concerns. In Proceedings of the 6th International Conference on Aspect Oriented Software Development (AOSD'07). 212--224. Google ScholarDigital Library
- Sillito, J., Murphy, G. C., and De Volder, K. 2008. Asking and answering questions during a programming change task. IEEE Trans. Softw. Engin. 34, 434--451. Google ScholarDigital Library
- Simmons, S., Edwards, D., Wilde, N., Homan, J., and Groble, M. 2006. Industrial tools for the feature location problem: An Exploratory Study. J. Softw. Maint. Resear. Pract. 18, 457--474. Google ScholarDigital Library
- Snelting, G. 2005. Concept lattices in software analysis. In Formal Concept Analysis. 272--287. Google ScholarDigital Library
- Sridhara, G., Hill, E., Muppaneni, D., Pollock, L., and Vijay-Shanker, K. 2010. Towards Automatically Generating Comments For Java Methods. In Proceedings of the 25th Ieee/Acm International Conference on Automated Software Engineering (Ase'10). Google ScholarDigital Library
- Starke, J., Luce, C., and Sillito, J. 2009. Searching and skimming: An exploratory study. In Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM'09).Google Scholar
- Tairas, R. and Gray, J. 2009. An information retrieval process to aid in the analysis of code clones. Empirical Softw. Engin. 14, 33--56. Google ScholarDigital Library
- Tonella, P. 2003. Using a concept lattice of decomposition slices for program understanding and impact analysis. IEEE Trans. Softw. Engin. 29, 495--509. Google ScholarDigital Library
- Tonella, P. and Ceccato, M. 2004. Aspect mining through the formal concept analysis of execution traces. InProceedings of the 11th IEEE Working Conference on Reverse Engineering (WCRE'04), 112--121 Google ScholarDigital Library
- Van Geet, J. and Demeyer, S. 2009. Feature location in Cobol mainframe systems: An Experience Report In Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM'09). 361--370.Google Scholar
- Wilde, N., Buckellew, M., Page, H., Rajlich, V., and Pounds, L. 2003. A comparison of methods for locating features in legacy software. J. Syst. Softw. 65, 105--114. Google ScholarDigital Library
- Wilde, N., Gomez, J. A., Gust, T., and Strasburg, D. 1992. Locating user functionality in old code. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM'92). 200--205.Google Scholar
- Yin, R. K. 2003. Applications Of Case Study Research. Sage Publications, Inc.Google Scholar
- Zhao, W., Zhang, L., Liu, Y., Sun, J., and Yang, F. 2006. SNIAFL: Towards a static non-interactive approach to feature location. ACM Trans. Softw. Engin. Methodol. 15, 195--226. Google ScholarDigital Library
Index Terms
- Concept location using formal concept analysis and information retrieval
Recommendations
Extending Bug Localization Using Information Retrieval and Code Clone Location Techniques
WCRE '11: Proceedings of the 2011 18th Working Conference on Reverse EngineeringBug localization involves the use of information about a bug to assist in locating sections of code that must be modified to fix the bug. Such a task can involve a considerable amount of time and effort on the part of software developers and/or ...
On the Effect of the Query in IR-based Concept Location
ICPC '11: Proceedings of the 2011 IEEE 19th International Conference on Program ComprehensionConcept location is an essential task during software maintenance and in particular program comprehension activities. One of the approaches to this task is the based on leveraging the lexical information found in the source code by means of Information ...
Using pattern structures to support information retrieval with formal concept analysis
FCA4AI'13: Proceedings of the 2nd International Conference on What can FCA do for Artificial Intelligence? - Volume 1058In this paper we introduce a novel approach to information retrieval (IR) based on Formal Concept Analysis (FCA). The use of concept lattices to support the task of document retrieval in IR has proven effective since they allow querying in the space of ...
Comments