skip to main content
research-article

Concept location using formal concept analysis and information retrieval

Published:07 February 2013Publication History
Skip Abstract Section

Abstract

The article addresses the problem of concept location in source code by proposing an approach that combines Formal Concept Analysis and Information Retrieval. In the proposed approach, Latent Semantic Indexing, an advanced Information Retrieval approach, is used to map textual descriptions of software features or bug reports to relevant parts of the source code, presented as a ranked list of source code elements. Given the ranked list, the approach selects the most relevant attributes from the best ranked documents, clusters the results, and presents them as a concept lattice, generated using Formal Concept Analysis.

The approach is evaluated through a large case study on concept location in the source code on six open-source systems, using several hundred features and bugs. The empirical study focuses on the analysis of various configurations of the generated concept lattices and the results indicate that our approach is effective in organizing different concepts and their relationships present in the subset of the search results. In consequence, the proposed concept location method has been shown to outperform a standalone Information Retrieval based concept location technique by reducing the number of irrelevant search results across all the systems and lattice configurations evaluated, potentially reducing the programmers' effort during software maintenance tasks involving concept location.

References

  1. Antoniol, G. and Guéhéneuc, Y. G. 2006. Feature identification: An epidemiological metaphor. IEEE Trans. Softw. Engin. 32, 627--641. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bacchelli, A., D'ambros, M., and Lanza, M. 2010a. Extracting source code from e-mails. In Proceedings of the 18th IEEE International Conference on Program Comprehension (ICPC'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bacchelli, A., D'ambros, M., Lanza, M., and Robbes, R. 2009. Benchmarking lightweight techniques to link e-mails and source code. In Proceedings of the 16th Working Conference on Reverse Engineering (WCRE'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Bacchelli, A., Lanza, M., and Robbes, R. 2010b. Linking e-mails and source code artifacts. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Baeza-Yates, R. A. and Ribeiro-Neto, B. 1999. Modern Information Retrieval. Addison-Wesley. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Biggerstaff, T. J., Mitbander, B. G., and Webster, D. E. 1994. The concept assignment problem in program understanding. In Proceedings of the 15th IEEE/ACM International Conference on Software Engineering (ICSE'94). 482--498. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Carpineto, C., Osinski, S., Romano, G., and Weiss, D. 2009. A survey of web clustering engines. ACM Comput. Surv. 41. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Chen, K. and Rajlich, V. 2000. Case study of feature location using dependence graph. In Proceedings of the 8th IEEE International Workshop on Program Comprehension (IWPC'00). 241--249. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Cigarran, J., Peñas, A., Gonzalo, J., and Verdejo, F. 2005. Evaluating hierarchical clustering of search results. In Proceedings of the 12th International Conference on String Processing and Information Retrieval (SPIRE'05). 49--54. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Cigarrán, J.M., Gonzalo, J., Peñas, A., and Verdejo, F. 2004. Browsing search results via formal concept analysis: Automatic selection of attributes. In Proceedings of the 2nd International Conference on Formal Concept Analysis (ICFCA'04). 74--87.Google ScholarGoogle Scholar
  11. Cleary, B., Exton, C., Buckley, J., and English, M. 2009. An empirical analysis of information retrieval based concept location techniques in software comprehension. Empir. Softw. Engin. 14, 93--130. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Cubranic, D., Murphy, G.C., Singer, J., and Booth, K.S. 2005. Hipikat: A project memory for software development. IEEE Trans. Softw. Engin. 31, 446--465. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. de Alwis, B., Murphy, G. C., and Robillard, M. 2007. A comparative study of three program exploration tools. In Proceedings of the 15th IEEE International Conference on Program Comprehension. 103--112. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. De Lucia, A., Fasano, F., Oliveto, R., and Tortora, G. 2007. Recovering traceability links in software artefact management systems. ACM Trans. Softw. Engin. Methodol. 16. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. De Lucia, A., Oliveto, R., and Vorraro, L. 2008. Using structural and semantic metrics to improve class cohesion. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM'08). 27--36.Google ScholarGoogle Scholar
  16. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. 1990. Indexing by latent semantic analysis. J. Amer. Soc. Inf. Sci. 41, 391--407.Google ScholarGoogle ScholarCross RefCross Ref
  17. Dit, B., Guerrouj, L., Poshyvanyk, D., and Antoniol, G. 2011a. Can better identifier splitting techniques help feature location? InProceedings of the 19th IEEE International Conference on Program Comprehension (ICPC'11). 11--20. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Dit, B., Revelle, M., Gethers, M., and Poshyvanyk, D. 2011b. Feature location in source code: A taxonomy and survey. J. Softw. Main. Evolution: Resear. Pract.Google ScholarGoogle Scholar
  19. Eaddy, M., Aho, A. V., Antoniol, G., and Guéhéneuc, Y.G. 2008a. Cerberus: Tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. InProceedings of the 16th IEEE International Conference on Program Comprehension (ICPC'08). 53--62. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Eaddy, M., Zimmermann, T., Sherwood, K., Garg, V., Murphy, G., Nagappan, N., and Aho, A.V. 2008b. Do crosscutting concerns cause defects? IEEE Trans. Softw. Engin. 34, 497--515. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Eisenbarth, T., Koschke, R., and Simon, D. 2003. Locating features in source code. IEEE Trans. Softw. Engin. 29, 210--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Enslen, E., Hill, E., Pollock, L., and Vijay-Shanker, K. 2009. Mining source code to automatically split identifiers for software analysis. In Proceedings of the 6th IEEE Working Conference on Mining Software Repositories (MSR'09). 71--80. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Ferenc, R., Siket, I., and Gyimothy, T. 2004. Extracting facts from open source software. In Proceedings of the 20th IEEE International Conference on Software Maintenance (ICSM'04). IEEE, 60--69. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Fluri, B., Würsch, M., Giger, E., and Gall, H. 2009. Analyzing the co-evolution of comments and source code. Softw. Quality J. 17, 367--394. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Fritz, T., Murphy, G. C., and Hill, E. 2007. Does a programmer's activity indicate knowledge of code? In Proceedings of the 6th Joint Meeting of the European Software Engineering Conference and the ACM Sigsoft Symposium on the Foundations of Software Engineering. 341--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Ganter, B. and Wille, R. 1996. Formal Concept Analysis. Springer.Google ScholarGoogle Scholar
  27. Gay, G., Haiduc, S., Marcus, M., and Menzies, T. 2009. On the use of relevance feedback in ir-based concept location. In Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM'09). 351--360.Google ScholarGoogle Scholar
  28. Gold, N., Harman, M., Li, Z., and Mahdavi, K. 2006. Allowing overlapping boundaries in source code using a search based approach to concept binding. In Proceedings of the 22nd IEEE International Conference on Software Maintenance (ICSM'06). 310--319. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Grant, S., Cordy, J. R., and Skillicorn, D. B. 2008. Automated concept location using independent component analysis In Proceedings of the 15th Working Conference on Reverse Engineering (WCRE'08). 138--142. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Haiduc, S., Aponte, J., Moreno, L., and Marcus, A. 2010. On The Use Of Automated Text Summarization Techniques For Summarizing Source Code. In Proceedings of the 17th IEEE Working Conference on Reverse Engineering (WCRE'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Hayes, J.H., Dekhtyar, A., and Sundaram, S.K. 2006. Advancing Candidate link generation for requirements tracing: The study of methods. IEEE Trans. Softw. Engin. 32, 4--19. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Hill, E., Fry, Z. P., Boyd, H., Sridhara, G., Novikova, Y., Pollock, L., and Vijay-Shanker, K. 2008. AMAP: Automatically Mining abbreviation expansions in programs to enhance software maintenance tools. In Proceedings of the 5th Working Conference on Mining Software Repositories. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Hill, E., Pollock, L., and Vijay-Shanker, K. 2007. Exploring the neighborhood with dora to expedite software maintenance. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE'07). 14--23. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Hill, E., Pollock, L., and Vijay-Shanker, K. 2009. Automatically capturing source code context of nl-queries for software maintenance and reuse. In Proceedings of the 31st IEEE/ACM International Conference on Software Engineering (ICSE'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jiang, H., Nguyen, T., Che, I. X., Jaygarl, H., and Chang, C. 2008. Incremental latent semantic indexing for effective, automatic traceability link evolution management. In Proceedings of the 23rd IEEE/ACM International Conference on Automated Software Engineering (ASE'08). Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Kersten, M. and Murphy, G. C. 2006. Using Task Context To Improve Programmer Productivity. In Proceedings of the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 1--11. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Ko, A. J., Myers, B. A., Coblenz, M. J., and Aung, H. H. 2006. An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Trans. Softw. Engin. 32, 971--987. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Kuhn, A. 2009. Automatic labeling of software components and their evolution using log-likelihood ratio of word frequencies in source code. In Proceedings of the 6th IEEE International Working Conference on Mining Software Repositories (MSR'09). Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Kuhn, A., Ducasse, S., and Gîrba, T. 2007. Semantic Clustering: Identifying Topics In Source Code. Inf. Softw. Techn. 49, 230--243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Lienhard, A., Ducasse, S., and Arevalo, G. 2005. Identifying traits with formal concept analysis. In Proceedings of the 20th IEEE/ACM international Conference on Automated Software Engineering (ASE'05). 66--75. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Liu, D., Marcus, A., Poshyvanyk, D., and Rajlich, V. 2007. Feature location via information retrieval based filtering of a single scenario execution trace. In Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering (ASE'07). 234--243. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Lo, K. K., Chan, M. K., and Baniassad, E. 2006. Isolating and relating concerns in requirements using latent semantic analysis. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA'06). 383--396 Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Lormans, M. and Van Deursen, A. 2006. Can LSI help reconstructing requirements traceability in design and test? In Proceedings of the 10th European Conference on Software Maintenance and Reengineering (CSMR'06). 47--56. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Lukins, S., Kraft, N., and Etzkorn, L. 2008. Source code retrieval for bug location using latent dirichlet allocation. In Proceedings of the 15th Working Conference on Reverse Engineering (WCRE'08). 155--164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Maletic, J. I., Collard, M. L., and Marcus, A. 2002. Source code files as structured documents. InProceedings of the 10th IEEE International Workshop on Program Comprehension (IWPC'02). 289--292. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Marcus, A. and Maletic, J. I. 2001. Identification Of High-Level Concept Clones In Source Code. In Automated Software Engineering (Ase'01). 107--114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Marcus, A., Maletic, J. I., and Sergeyev, A. 2005. Recovery of traceability links between software documentation and source code. Int. J. Softw. Engin. Knowl. Engin. 15, 811--836.Google ScholarGoogle ScholarCross RefCross Ref
  48. Marcus, A., Poshyvanyk, D., and Ferenc, R. 2008. Using the conceptual cohesion of classes for fault prediction in object oriented systems. IEEE Trans. Soft. Engin. 34, 287--300. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Marcus, A., Rajlich, V., Buchta, J., Petrenko, M., and Sergeyev, A. 2005. Static techniques for concept location in object-oriented code. In Proceedings of the 13th IEEE International Workshop on Program Comprehension (IWPC'05). 33--42. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Marcus, A., Sergeyev, A., Rajlich, V., and Maletic, J. 2004. An information retrieval approach to concept location in source code. In Proceedings of the 11th IEEE Working Conference on Reverse Engineering (WCRE'04). 214--223. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Mens, K. and Tourwe, T. 2005. Delving source code with formal concept analysis. Comput. Lang. Syst. Struct. 31, 183--198. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Poshyvanyk, D., Guéhéneuc, Y. G., Marcus, A., Antoniol, G., and Rajlich, V. 2007. Feature location using probabilistic ranking of methods based on execution scenarios and information retrieval. IEEE Trans. Softw. Engin. 33, 420--432. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. Poshyvanyk, D., Marcus, A., and Dong, Y. 2006a. JIRiSS---An Eclipse plug-in for source code exploration. In Proceedings of the 14th IEEE International Conference on Program Comprehension (ICPC'06). 252--255. Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. Poshyvanyk, D., Marcus, A., Dong, Y., and Sergeyev, A. 2005. IRiSS---A source code exploration tool. In Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM'05). 69--72.Google ScholarGoogle Scholar
  55. Poshyvanyk, D., Marcus, A., Ferenc, R., and Gyimóthy, T. 2009. Using information retrieval based coupling measures for impact analysis. Empirical Softw. Engin. 14, 5--32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Poshyvanyk, D. and Marcus, D. 2007. Combining formal concept analysis with information retrieval for concept location in source code. In Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC'07). 37--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Poshyvanyk, D., Petrenko, M., Marcus, A., Xie, X., and Liu, D. 2006b. Source code exploration with Google In Proceedings of the 22nd IEEE International Conference on Software Maintenance (ICSM'06). 334--338. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. Rajlich, V. and Gosavi, P. 2004. Incremental change in object-oriented programming. IEEE Softw. 2--9. Google ScholarGoogle ScholarDigital LibraryDigital Library
  59. Rastkar, S., Murphy, G., and Murray, G. 2010. Summarizing software artifacts: A case study of bug reports. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10). 505--514. Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. Ratiu, D. and Deissenboeck, F. 2007. From reality to programs and (not quite) back again. In Proceedings of the 15th IEEE International Conference on Program Comprehension (ICPC'07). 91--102. Google ScholarGoogle ScholarDigital LibraryDigital Library
  61. Ren, X., Shah, F., Tip, F., Ryder, B. G., and Chesley, O. 2004. Chianti: A tool for change impact analysis of Java programs. In Proceedings of the 19th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'04). 432--448. Google ScholarGoogle ScholarDigital LibraryDigital Library
  62. Revelle, M., Dit, B., and Poshyvanyk, D. 2010. Using data fusion and web mining to support feature location in software. InProceedings of the 18th Ieee International Conference on Program Comprehension (ICPC'10). 14--23. Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. Revelle, M. and Poshyvanyk, D. 2009. An exploratory study on assessing feature location techniques. InProceedings of the 17th IEEE International Conference on Program Comprehension (ICPC'09). 218--222.Google ScholarGoogle Scholar
  64. Robillard, M. 2005. Automatic generation of suggestions for program investigation. In Proceedings of the Joint European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering. 11--20 Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. Robillard, M. P. 2008. Topology Analysis Of Software Dependencies. ACM Trans. Softw. Engin. Methodol. 17. Google ScholarGoogle ScholarDigital LibraryDigital Library
  66. Robillard, M. P., Coelho, W., and Murphy, G.C. 2004. How effective developers investigate source code: An exploratory study. IEEE Trans. Softw. Engin. 30, 889--903. Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. Salton, G. and McGill, M. 1983. Introduction To Modern Information Retrieval. McGraw-Hill. Google ScholarGoogle ScholarDigital LibraryDigital Library
  68. Savage, T., Revelle, M., and Poshyvanyk, D. 2010. Flat3: Feature location and textual tracing tool. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering (ICSE'10). 255--258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  69. Shepherd, D., Fry, Z., Gibson, E., Pollock, L., and Vijay-Shanker, K. 2007. Using natural language program analysis to locate and understand action-oriented concerns. In Proceedings of the 6th International Conference on Aspect Oriented Software Development (AOSD'07). 212--224. Google ScholarGoogle ScholarDigital LibraryDigital Library
  70. Sillito, J., Murphy, G. C., and De Volder, K. 2008. Asking and answering questions during a programming change task. IEEE Trans. Softw. Engin. 34, 434--451. Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. Simmons, S., Edwards, D., Wilde, N., Homan, J., and Groble, M. 2006. Industrial tools for the feature location problem: An Exploratory Study. J. Softw. Maint. Resear. Pract. 18, 457--474. Google ScholarGoogle ScholarDigital LibraryDigital Library
  72. Snelting, G. 2005. Concept lattices in software analysis. In Formal Concept Analysis. 272--287. Google ScholarGoogle ScholarDigital LibraryDigital Library
  73. Sridhara, G., Hill, E., Muppaneni, D., Pollock, L., and Vijay-Shanker, K. 2010. Towards Automatically Generating Comments For Java Methods. In Proceedings of the 25th Ieee/Acm International Conference on Automated Software Engineering (Ase'10). Google ScholarGoogle ScholarDigital LibraryDigital Library
  74. Starke, J., Luce, C., and Sillito, J. 2009. Searching and skimming: An exploratory study. In Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM'09).Google ScholarGoogle Scholar
  75. Tairas, R. and Gray, J. 2009. An information retrieval process to aid in the analysis of code clones. Empirical Softw. Engin. 14, 33--56. Google ScholarGoogle ScholarDigital LibraryDigital Library
  76. Tonella, P. 2003. Using a concept lattice of decomposition slices for program understanding and impact analysis. IEEE Trans. Softw. Engin. 29, 495--509. Google ScholarGoogle ScholarDigital LibraryDigital Library
  77. Tonella, P. and Ceccato, M. 2004. Aspect mining through the formal concept analysis of execution traces. InProceedings of the 11th IEEE Working Conference on Reverse Engineering (WCRE'04), 112--121 Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. Van Geet, J. and Demeyer, S. 2009. Feature location in Cobol mainframe systems: An Experience Report In Proceedings of the 25th IEEE International Conference on Software Maintenance (ICSM'09). 361--370.Google ScholarGoogle Scholar
  79. Wilde, N., Buckellew, M., Page, H., Rajlich, V., and Pounds, L. 2003. A comparison of methods for locating features in legacy software. J. Syst. Softw. 65, 105--114. Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. Wilde, N., Gomez, J. A., Gust, T., and Strasburg, D. 1992. Locating user functionality in old code. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM'92). 200--205.Google ScholarGoogle Scholar
  81. Yin, R. K. 2003. Applications Of Case Study Research. Sage Publications, Inc.Google ScholarGoogle Scholar
  82. Zhao, W., Zhang, L., Liu, Y., Sun, J., and Yang, F. 2006. SNIAFL: Towards a static non-interactive approach to feature location. ACM Trans. Softw. Engin. Methodol. 15, 195--226. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Concept location using formal concept analysis and information retrieval

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Software Engineering and Methodology
        ACM Transactions on Software Engineering and Methodology  Volume 21, Issue 4
        November 2012
        197 pages
        ISSN:1049-331X
        EISSN:1557-7392
        DOI:10.1145/2377656
        Issue’s Table of Contents

        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 7 February 2013
        • Accepted: 1 May 2011
        • Revised: 1 January 2011
        • Received: 1 December 2009
        Published in tosem Volume 21, Issue 4

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader