research-article

Portfolio: Searching for relevant functions and their usages in millions of lines of code

Authors:
Collin McMillan, University of Notre Dame, South Bend, IN
Denys Poshyvanyk, The College of William and Mary, Williamsburg, VA
Mark Grechanik, University of Illinois at Chicago, Chicago, IL
Qing Xie, Accenture Technology Labs
Chen Fu, Accenture Technology Labs

ACM Transactions on Software Engineering and Methodology, Volume 22, Issue 4, Article No. 37, pp. 1–30. https://doi.org/10.1145/2522920.2522930

Published: 22 October 2013

Abstract

Several studies show that programmers are more interested in finding definitions of functions and their uses than in finding variables, statements, or ordinary code fragments. Developers therefore need support for finding relevant functions and determining how those functions are used. Existing code search engines do not provide enough of this support, which reduces the effectiveness of code reuse. We provide it in a code search system called Portfolio, which retrieves and visualizes relevant functions and their usages. Portfolio is built on a combination of models that address the surfing behavior of programmers and the sharing of related concepts among functions. We conducted two experiments: first, an experiment with 49 C/C++ programmers comparing Portfolio to Google Code Search and Koders, using a standard methodology for evaluating information-retrieval-based engines; and second, an experiment with 19 Java programmers comparing Portfolio to Koders. The results show with strong statistical significance that users find more relevant functions, with higher precision, with Portfolio than with Google Code Search and Koders. We also show that by using PageRank, Portfolio is able to rank returned relevant functions more efficiently.
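
To make the PageRank idea in the abstract concrete, the following is a minimal sketch of power-iteration PageRank over a small function call graph. It is not Portfolio's implementation. It assumes that the graph being ranked is a directed call graph in which an edge points from a caller to a callee, so that a function invoked from many places accumulates rank the way a heavily linked web page does. The function names (`main`, `render`, `open_file`, `draw`, `export`), the damping factor of 0.85, and the iteration count are all hypothetical choices made for illustration.

```python
def pagerank(call_graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a directed call graph.

    call_graph maps each function name to the list of functions it calls.
    Returns a dict mapping every function name to its rank.
    """
    # Collect every function that appears as a caller or a callee.
    nodes = set(call_graph)
    for callees in call_graph.values():
        nodes.update(callees)
    nodes = sorted(nodes)
    n = len(nodes)
    if n == 0:
        return {}

    # Start from a uniform distribution.
    rank = {f: 1.0 / n for f in nodes}
    for _ in range(iterations):
        # Functions that call nothing have no outgoing edges; their rank
        # is spread evenly over all functions so the total stays at 1.
        dangling = sum(rank[f] for f in nodes if not call_graph.get(f))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {f: base for f in nodes}
        # Each caller splits its rank equally among the functions it calls.
        for caller in nodes:
            callees = call_graph.get(caller, [])
            if callees:
                share = damping * rank[caller] / len(callees)
                for callee in callees:
                    new_rank[callee] += share
        rank = new_rank
    return rank


# Hypothetical call graph: open_file is called from three different places.
call_graph = {
    "main": ["open_file", "render"],
    "render": ["open_file", "draw"],
    "export": ["open_file"],
    "open_file": [],
    "draw": [],
}

ranks = pagerank(call_graph)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.4f}")
```

In this toy graph the most frequently called function receives the highest rank. A search system could combine such a structural score with a textual relevance score when ordering results; how Portfolio combines its models is described in the article itself, not in this sketch.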

Index Terms

1. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Software evolution
      2. Software reverse engineering

Published in

ACM Transactions on Software Engineering and Methodology Volume 22, Issue 4
Testing, debugging, and error handling, formal methods, lifecycle concerns, evolution and maintenance
October 2013
387 pages
ISSN:1049-331X
EISSN:1557-7392
DOI:10.1145/2522920

Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 October 2013
- Accepted: 1 July 2012
- Revised: 1 May 2012
- Received: 1 July 2011
Author Tags
PageRank
Source-code search
information retrieval
natural language processing
user studies
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 66
- Total Downloads: 713
- Downloads (Last 12 months): 22
- Downloads (Last 6 weeks): 3