research-article

Portfolio: Searching for relevant functions and their usages in millions of lines of code

Published: 22 October 2013

Abstract

Different studies show that programmers are more interested in finding definitions of functions and their uses than variables, statements, or ordinary code fragments. Therefore, developers require support in finding relevant functions and determining how these functions are used. Unfortunately, existing code search engines do not provide enough of this support to developers, thus reducing the effectiveness of code reuse. We provide this support to programmers in a code search system called Portfolio that retrieves and visualizes relevant functions and their usages. We have built Portfolio using a combination of models that address surfing behavior of programmers and sharing related concepts among functions. We conducted two experiments: first, an experiment with 49 C/C++ programmers to compare Portfolio to Google Code Search and Koders using a standard methodology for evaluating information-retrieval-based engines; and second, an experiment with 19 Java programmers to compare Portfolio to Koders. The results show with strong statistical significance that users find more relevant functions with higher precision with Portfolio than with Google Code Search and Koders. We also show that by using PageRank, Portfolio is able to rank returned relevant functions more efficiently.
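The abstract credits PageRank with improving how relevant functions are ranked. As a rough illustration only, PageRank over a function call graph could be sketched as below. The toy call graph, the function names, the damping factor of 0.85, and the iteration count are all assumptions made for this example and are not taken from Portfolio itself.

```python
def pagerank(call_graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a function call graph.

    call_graph maps each caller to the list of functions it calls.
    An edge caller -> callee passes rank from the caller to the callee.
    """
    # Collect every function that appears as a caller or a callee.
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)
    nodes = sorted(nodes)
    n = len(nodes)

    rank = {f: 1.0 / n for f in nodes}
    for _ in range(iterations):
        # Rank held by functions that call nothing is spread evenly.
        dangling = sum(rank[f] for f in nodes if not call_graph.get(f))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {f: base for f in nodes}
        for caller, callees in call_graph.items():
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
        rank = new_rank
    return rank


# Hypothetical call graph, for illustration only.
toy_graph = {
    "main": ["parse_args", "load_file", "render"],
    "load_file": ["read_bytes", "decode"],
    "render": ["decode", "draw"],
    "draw": ["decode"],
}

scores = pagerank(toy_graph)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In this toy graph, `decode` is called from three different functions and so ranks first, while `main`, which nothing calls, receives only the baseline score. A search engine could combine such a structural score with a textual match score, though how Portfolio weights the two is not stated in the abstract.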

      • Published in

        ACM Transactions on Software Engineering and Methodology  Volume 22, Issue 4
        Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance
        October 2013
        387 pages
        ISSN:1049-331X
        EISSN:1557-7392
        DOI:10.1145/2522920
        Copyright © 2013 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 22 October 2013
        • Accepted: 1 July 2012
        • Revised: 1 May 2012
        • Received: 1 July 2011

        Qualifiers

        • research-article
        • Research
        • Refereed