ABSTRACT
It is often observed that the majority of the development work of an Open Source Software (OSS) project is contributed by a core team, i.e., a small subset of the pool of active developers. In fact, recent work has found that core development teams follow the Pareto principle: roughly 80% of the code contributions are produced by 20% of the active developers. However, those findings are based on samples of between one and nine studied systems. In this paper, we revisit prior studies about core developers using 2,496 projects hosted on GitHub. We find that even when we vary the heuristic for detecting core developers, and when we control for system size, team size, and project age: (1) the Pareto principle does not seem to apply for 40%-87% of GitHub projects; and (2) more than 88% of GitHub projects have fewer than 16 core developers. Moreover, we find that when we control for the quantity of contributions, bug fixing accounts for a similar proportion of the contributions of both core (18%-20%) and non-core developers (21%-22%). Our findings suggest that the Pareto principle is not compatible with the core teams of many GitHub projects. In fact, several of the studied GitHub projects are susceptible to the “bus factor,” where the impact of a core developer leaving would be quite harmful.
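To make the 80/20 criterion concrete, the following is a minimal sketch of one possible check, not the heuristic used in the paper. It assumes that contributions are measured as per-developer commit counts and that the core team is the smallest set of top contributors reaching 80% of all commits. The function names, parameters, and sample data are hypothetical.

```python
from typing import Dict, List


def core_developers(commits_by_dev: Dict[str, int], share: float = 0.8) -> List[str]:
    """Return the smallest set of top contributors whose commits reach `share` of the total.

    Assumption: contributions are counted as commits; other heuristics are possible.
    """
    total = sum(commits_by_dev.values())
    if total == 0:
        return []
    # Rank by commit count (descending); break ties by name for determinism.
    ranked = sorted(commits_by_dev.items(), key=lambda kv: (-kv[1], kv[0]))
    core: List[str] = []
    covered = 0
    for dev, count in ranked:
        if covered >= share * total:
            break
        core.append(dev)
        covered += count
    return core


def follows_pareto(commits_by_dev: Dict[str, int],
                   share: float = 0.8,
                   dev_fraction: float = 0.2) -> bool:
    """True if the core team is at most `dev_fraction` of the active developers."""
    active = [dev for dev, count in commits_by_dev.items() if count > 0]
    if not active:
        return False
    core = core_developers(commits_by_dev, share)
    return len(core) <= dev_fraction * len(active)


# Hypothetical project where one developer dominates the commit history.
skewed = {"d0": 85, "d1": 3, "d2": 3, "d3": 2, "d4": 2,
          "d5": 1, "d6": 1, "d7": 1, "d8": 1, "d9": 1}
# Hypothetical project where the work is spread across the whole team.
flat = {"a": 30, "b": 30, "c": 25, "d": 15}

print(core_developers(skewed), follows_pareto(skewed))
print(core_developers(flat), follows_pareto(flat))
```

In this sketch the skewed project passes the check with a core team of one, which is also the situation the abstract describes as exposure to the bus factor. The flat project fails because three of its four developers are needed to reach the 80% mark.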
Index Terms
- Revisiting the applicability of the pareto principle to core development teams in open source software projects