ABSTRACT
Using IDE usage data to analyze the behavior of software developers in the field, during the course of their daily work, can support (or dispute) the findings of laboratory studies of developers. This paper describes a technique that uses Hidden Markov Models (HMMs) to mine high-level developer behavior from the low-level IDE interaction traces of many developers in the field. An HMM is a doubly stochastic process: a sequence of hidden states models the higher-level behavior, and each state emits the observable events in the input sequence. We propose an interactive approach to mining interpretable HMMs, in which a human expert is guided to build a high-quality HMM iteratively, one state at a time. The final result is a model that both represents the field data and captures the field phenomena of interest. We apply our HMM construction approach to study debugging behavior, using a large IDE interaction dataset collected from nearly 200 developers at ABB, Inc. Our results highlight the different modes of debugging exhibited by the developers in our dataset, and the actions that constitute each mode.
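As a rough illustration of the doubly stochastic structure described above, the sketch below scores an IDE event trace with the forward algorithm and labels each event with its most likely hidden mode using Viterbi decoding. The two states (`Editing`, `Debugging`), the event names, and all probabilities are hypothetical values chosen for illustration; they are not taken from the ABB dataset, and the sketch does not reproduce the paper's interactive, state-by-state model construction.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical two-state HMM over IDE events. All state names, event names,
# and probabilities are illustrative assumptions, not values from the paper.
STATES = ("Editing", "Debugging")
START = {"Editing": 0.6, "Debugging": 0.4}
TRANS = {
    "Editing": {"Editing": 0.8, "Debugging": 0.2},
    "Debugging": {"Editing": 0.3, "Debugging": 0.7},
}
EMIT = {
    "Editing": {"Edit": 0.6, "Build": 0.3, "StepInto": 0.05, "Breakpoint": 0.05},
    "Debugging": {"Edit": 0.1, "Build": 0.1, "StepInto": 0.5, "Breakpoint": 0.3},
}


def log_likelihood(events: Sequence[str]) -> float:
    """Forward algorithm: log P(events | model), summed over all hidden paths."""
    alpha = {s: START[s] * EMIT[s][events[0]] for s in STATES}
    log_scale = 0.0
    for ev in events[1:]:
        # Rescale to avoid underflow on long traces; keep the log of each factor.
        total = sum(alpha.values())
        log_scale += math.log(total)
        alpha = {s: alpha[s] / total for s in STATES}
        alpha = {
            t: sum(alpha[s] * TRANS[s][t] for s in STATES) * EMIT[t][ev]
            for t in STATES
        }
    return log_scale + math.log(sum(alpha.values()))


def viterbi(events: Sequence[str]) -> List[str]:
    """Most likely hidden-state path for the events, computed in log space."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][events[0]]) for s in STATES}
    back: List[Dict[str, str]] = []
    for ev in events[1:]:
        prev = score
        ptr: Dict[str, str] = {}
        score = {}
        for t in STATES:
            # Best predecessor state for reaching state t at this step.
            best = max(STATES, key=lambda s: prev[s] + math.log(TRANS[s][t]))
            ptr[t] = best
            score[t] = prev[best] + math.log(TRANS[best][t]) + math.log(EMIT[t][ev])
        back.append(ptr)
    # Trace the back-pointers from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    trace = ["Edit", "Build", "Breakpoint", "StepInto", "StepInto", "Edit"]
    print(viterbi(trace))
    # ['Editing', 'Editing', 'Debugging', 'Debugging', 'Debugging', 'Editing']
    print(log_likelihood(trace))
```

In this toy model the breakpoint and step events pull the decoded path into the `Debugging` state until editing resumes. A model mined from field data would estimate `START`, `TRANS`, and `EMIT` from the traces (for example with Baum-Welch) rather than fixing them by hand.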
Index Terms
- Interactive exploration of developer interaction traces using a hidden markov model