DOI: 10.1145/2491411.2491415
research-article

Diversity in software engineering research

Published: 18 August 2013

ABSTRACT

One of the goals of software engineering research is to achieve generality: Are the phenomena found in a few projects reflective of others? Will a technique perform as well on projects other than those it was evaluated on? While it is common sense to select a sample that is representative of a population, the importance of diversity is often overlooked, even though it is just as important. In this paper, we combine ideas from representativeness and diversity and introduce a measure called sample coverage, defined as the percentage of projects in a population that are similar to the given sample. We introduce algorithms to compute the sample coverage for a given set of projects and to select the projects that increase the coverage the most. We demonstrate our technique on research presented over the span of two years at ICSE and FSE with respect to a population of 20,000 active open source projects monitored by Ohloh.net. Knowing the coverage of a sample enhances our ability to reason about the findings of a study. Furthermore, we propose reporting guidelines for research: in addition to coverage scores, papers should discuss the target population of the research (universe) and the dimensions that can potentially influence the outcomes of the research (space).
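The abstract describes sample coverage and the selection algorithm only in words. As a rough, non-authoritative illustration, the following Python sketch shows one way such a measure and a greedy selection step could be computed; the dimension names, the relative-difference similarity rule, and the 30% threshold are illustrative assumptions, not the paper's actual configuration or algorithm.

    from typing import Dict, List

    # A project is described by a few numeric dimensions,
    # e.g. {"developers": 12, "commits": 3400, "loc": 150000}.
    Project = Dict[str, float]

    def similar(a: Project, b: Project, dims: List[str], tol: float = 0.3) -> bool:
        """Two projects count as 'similar' if every dimension differs by at most
        tol, relative to the larger value (an assumed rule, not the paper's)."""
        for d in dims:
            hi = max(a[d], b[d])
            if hi > 0 and abs(a[d] - b[d]) / hi > tol:
                return False
        return True

    def sample_coverage(sample: List[Project], population: List[Project],
                        dims: List[str]) -> float:
        """Fraction of the population similar to at least one project in the sample."""
        covered = sum(1 for p in population
                      if any(similar(p, s, dims) for s in sample))
        return covered / len(population)

    def select_projects(population: List[Project], dims: List[str],
                        k: int) -> List[Project]:
        """Greedily add, k times, the project that increases coverage the most
        (one natural reading of the abstract, not necessarily the paper's algorithm)."""
        sample: List[Project] = []
        for _ in range(k):
            candidates = [p for p in population if p not in sample]
            best = max(candidates,
                       key=lambda p: sample_coverage(sample + [p], population, dims))
            sample.append(best)
        return sample

For example, calling select_projects(population, ["developers", "commits", "loc"], 10) and then sample_coverage on the result would report how much of the population ten greedily chosen projects cover under this assumed similarity rule.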

References

  1. Basili, V.R., Shull, F., and Lanubile, F. Building knowledge through families of experiments. Software Engineering, IEEE Transactions on, 25 (1999), 456--473. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Robbes, R., Tanter, E., and Rothlisberger, D. How developers use the dynamic features of programming languages: the case of smalltalk. Proceedings of the International Working Conference on Mining Software Repositories (2011). Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Gabel, M. and Su, Z. A study of the uniqueness of source code. In FSE'10: Proceedings of the International Symposium on Foundations of Software Engineering (2010), 147-156. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. NIH. NIH Guideline on The Inclusion of Women and Minorities., 2001. http://grants.nih.gov/grants/funding/women_min/guideline s_amended_10_2001.htm.Google ScholarGoogle Scholar
  5. Allmark, P. Should research samples reflect the diversity of the population? Journal Medical Ethics, 30 (2004), 185- 189.Google ScholarGoogle ScholarCross RefCross Ref
  6. DEPARTMENT OF HEALTH. Research governance framework for health and social care., 2001.Google ScholarGoogle Scholar
  7. Mulrow, C.D., Thacker, S.B., and Pugh, J.A. A proposal for more informative abstracts of review articles. Annals of internal medicine, 108 (1988), 613--615.Google ScholarGoogle Scholar
  8. The R Project for Statistical Computing. http://www.rproject.org/.Google ScholarGoogle Scholar
  9. Kitchenham, B.A., Mendes, E., and Travassos, G.H. Cross versus Within-Company Cost Estimation Studies: A Systematic Review. IEEE Trans. Software Eng. (TSE), 33, 5 (2007), 316-329. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hill, P.R. Practical Software Project Estimation. McGraw-Hill Osborne Media, 2010.Google ScholarGoogle Scholar
  11. BLACK DUCK SOFTWARE. Ohloh, http://www.ohloh.net/.Google ScholarGoogle Scholar
  12. Sands, R. Measuring Project Activity. http://meta.ohloh.net/2012/04/measuring-project-activity/. 2012.Google ScholarGoogle Scholar
  13. Apel, S., Liebig, J., Brandl, B., Lengauer, C., and Kästner, C. Semistructured merge: rethinking merge in revision control systems. In ESEC/FSE'11: European Software Engineering Conference and Symposium on Foundations of Software Engineering (2011), 190-200. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Beck, F. and Diehl, S. On the congruence of modularity and code coupling. In ESEC/FSE'11: European Software Engineering Conference and Symposium on Foundations of Software Engineering (2011), 354-364. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Uddin, G., Dagenais, B., and Robillard, M.P. Temporal analysis of API usage concepts. In ICSE'12: Proceedings of 34th International Conference on Software Engineering (2012), 804-814. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Jin, W. and Orso, A. BugRedux: Reproducing field failures for in-house debugging. In ICSE'12: Proceedings of 34th International Conference on Software Engineering (2012), 474-484. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Zhou, J., Zhang, H., and Lo, D. Where should the bugs be fixed? More accurate information retrieval-based bug localization based on bug reports. In International Conference on Software Engineering (2012). Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kitchenham, B.A. and Mendes, E. A comparison of crosscompany and within-company effort estimation models for web applications. In Proceedings of the 8th International Conference on Empirical Assessment in Software Engineering (2004), 47-55.Google ScholarGoogle ScholarCross RefCross Ref
  19. Hall, T., Beecham, S., Bowes, D., Gray, D., and Counsell, S. A systematic review of fault prediction performance in software engineering. IEEE Transactions on Software Engineering, 99 (2011).Google ScholarGoogle Scholar
  20. Murphy-Hill, E., Murphy, G.C., and Griswold, W.G. Understanding Context: Creating a Lasting Impact in Experimental Software Engineering Research. In Proceedings of the Workshop on Future of Software Engineering (2010), 255-258. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Kahneman, D. and Tversky, A. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3 (1972), 430 - 454.Google ScholarGoogle ScholarCross RefCross Ref
  22. Tversky, A. and Kahneman, D. Judgment under Uncertainty: Heuristics and Biases. Science, 185 (1974), pp. 1124-1131.Google ScholarGoogle Scholar
  23. Nilsson, H., Juslin, P., and Olsson, H. Exemplars in the mist: The cognitive substrate of the representativeness heuristic. Scandinavian Journal of Psychology, 49, 201-- 212.Google ScholarGoogle ScholarCross RefCross Ref
  24. Robinson, D., Woerner, M.G., Pollack, S., and Lerner, G. Subject Selection Biases in Clinical Trials: Data From a Multicenter Schizophrenia Treatment Study. Journal of Clinical Psychopharmacology, 16, 2 (April 1996), 170-176.Google ScholarGoogle ScholarCross RefCross Ref
  25. Khan, K.S. et al., eds. NHS Centre for Reviews and Dissemination, University of York, 2001.Google ScholarGoogle Scholar
  26. Kitchenham, B. Procedures for undertaking systematic reviews. Technical Report TR/SE-0401, Department of Computer Science, Keele University and National ICT, Australia Ltd (2004).Google ScholarGoogle Scholar
  27. Brereton, P., Kitchenham, B.A., Budgen, D., Turner, M., and Khalil, M. Lessons from applying the systematic literature review process within the software engineering domain. Journal of Systems and Software, 80 (2007), 571 - 583. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Standards for Systematic Reviews.. www.iom.edu/Reports/2011/Finding-What-Works-in-Health-Care-Standards-for-Systematic-Reviews/Standards.aspx?page=2.Google ScholarGoogle Scholar
  29. Boehm, B.W., Abts, C., Brown, A.W., Chulani, S., Clark, B.K., Horowitz, E., Madachy, R., Reifer, D.J., and Steece, B.t.=.S.C.E.w.C.I. NHS Centre for Reviews and Dissemination, University of York, 2000.Google ScholarGoogle Scholar
  30. Center for Systems and Software Engineering.. http://csse.usc.edu/csse/research/COCOMOII/cocomo_mai n.html.Google ScholarGoogle Scholar
  31. Kemerer, C.F. An empirical validation of software cost estimation models. Commun. ACM, 30 (may 1987), 416-- 429. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Chen, Z., Menzies, T., Port, D., and Boehm, D. Finding the right data for software cost modeling. Software, IEEE, 22 (nov.-dec. 2005), 38 - 46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Wohlin, C., Runeson, P., Host, M., Ohlsson, M.C., Regnell, B., and Wesslen, A. Experimentation in software engineering: an introduction. Kluwer Academic Publishers, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. Kitchenham, B.A., Pfleeger, S.L., Pickard, L.M., Jones, P.W., Hoaglin, D.C., Emam, E.K., and Rosenberg, J. Preliminary Guidelines for Empirical Research in Software Engineering. IEEE Transactions on Software Engineering, 28 (aug 2002), 721--734. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Jedlitschka, A. and Pfahl, D. Reporting guidelines for controlled experiments in software engineering. In Empirical Software Engineering, 2005. 2005 International Symposium on (nov. 2005), 10 pp.Google ScholarGoogle ScholarCross RefCross Ref
  36. Kitchenham, B., Al-Khilidar, H., Babar, M.A., Berry, M., Cox, K., Keung, J., Kurniawati, F., Staples, M., Zhang, H., and Zhu, L. Evaluating guidelines for reporting empirical software engineering studies. Empirical Softw. Engg., 13 (feb 2008), 97--121. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Runeson, P. and Host, M. Guidelines for conducting and reporting case study research in software engineering. Empirical Softw. Engg., 14 (Apr 2009), 131--164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Harrold, M.J., Jones, J.A., Li, T., Lian, D., Orso, A., Pennings, M., Sinha, S., Spoon, S.A., and Gujarathi, A. Regression test selection for Java software. In OOPSLA '01: Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (2001). Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Briand, L.C., Labiche, Y., and Soccar, G. Automating impact analysis and regression test selection based on UML designs. In ICSM '02: Proceedings of the International Conference on Software Maintenance (2002), 252-261. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Marré, M. and Bertolino, A. Using spanning sets for coverage testing. IEEE Transactions on Software Engineering, 29, 11 (Nov 2003), 974-984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Li, Z., Harman, M., and Hierons, R.M. Search Algorithms for Regression Test Case Prioritization. IEEE Transactions on Software Engineering, 33, 4 (April 2007), 225-237. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Yoo, S. and Harman, M. Regression testing minimization, selection and prioritization: a survey. Software Testing, Verification and Reliability, 22, 2 (2012), 67-120. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Graves, T.L., Harrold, M.J., Kim, J.-M., Porter, A., and Rothermel, G. An empirical study of regression test selection techniques. In ICSE '98: Proceedings of the 20th International Conference on Software engineering (1998), 188-197. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Rothermel, G. and Harrold, M.J. Analyzing regression test selection techniques. IEEE Transactions on Software Engineering, 22, 8 (August 1996), 529-551. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Harrell, J.M. Orthogonal Array Testing Strategy (OATS)., 2001. http://www.51testing.com/ddimg/uploadsoft/20090113/O ATSEN.pdf.Google ScholarGoogle Scholar

Published in

ESEC/FSE 2013: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
August 2013, 738 pages
ISBN: 978-1-4503-2237-9
DOI: 10.1145/2491411

      Copyright © 2013 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 18 August 2013


      Qualifiers

      • research-article

      Acceptance Rates

Overall Acceptance Rate: 112 of 543 submissions, 21%
