Abstract
Several studies show that programmers are more interested in finding definitions of functions and their uses than in finding variables, statements, or ordinary code fragments. Developers therefore need support in finding relevant functions and determining how these functions are used. Unfortunately, existing code search engines do not provide enough of this support, which reduces the effectiveness of code reuse. We provide this support in a code search system called Portfolio, which retrieves and visualizes relevant functions and their usages. We built Portfolio using a combination of models that address the surfing behavior of programmers and the sharing of related concepts among functions. We conducted two experiments: first, an experiment with 49 C/C++ programmers to compare Portfolio to Google Code Search and Koders using a standard methodology for evaluating information-retrieval-based engines; and second, an experiment with 19 Java programmers to compare Portfolio to Koders. The results show with strong statistical significance that users find more relevant functions with higher precision with Portfolio than with Google Code Search and Koders. We also show that by using PageRank, Portfolio is able to rank returned relevant functions more efficiently.
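The abstract names PageRank but does not say how it is applied to functions. As a minimal sketch, assume each function is a node in a call graph and each call is a directed edge from caller to callee, so that functions called from many places accumulate a higher score. The graph, the function names, the damping factor of 0.85, and the iteration count below are illustrative assumptions and are not taken from Portfolio.

```python
def pagerank(call_graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping caller -> list of callees.

    Generic textbook formulation; not Portfolio's actual implementation.
    """
    # Collect every function that appears as a caller or a callee.
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution over functions.
    rank = {f: 1.0 / n for f in nodes}

    for _ in range(iterations):
        # Rank held by functions that call nothing is spread evenly.
        dangling = sum(rank[f] for f in nodes if not call_graph.get(f))
        new_rank = {f: (1.0 - damping) / n + damping * dangling / n
                    for f in nodes}
        # Each caller splits its rank equally among its callees.
        for caller, callees in call_graph.items():
            if not callees:
                continue
            share = damping * rank[caller] / len(callees)
            for callee in callees:
                new_rank[callee] += share
        rank = new_rank
    return rank


# Hypothetical call graph, for illustration only.
call_graph = {
    "main": ["parse_args", "load_file"],
    "load_file": ["open_stream", "log"],
    "parse_args": ["log"],
    "open_stream": ["log"],
    "log": [],
}

ranks = pagerank(call_graph)
ranking = sorted(ranks, key=ranks.get, reverse=True)
for name in ranking:
    print(f"{name:12s} {ranks[name]:.4f}")
```

In this toy graph, `log` is called from three functions and ends up ranked first, while `main`, which nothing calls, ends up last. A search engine could combine such a structural score with a textual match score to order results, although the weighting Portfolio uses is not stated in the abstract.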