research-article

On mining crowd-based speech documentation

Authors:
Parisa Moslehi (Concordia University, Montreal, Canada)
Bram Adams (École Polytechnique de Montréal, Montreal, Canada)
Juergen Rilling (Concordia University, Montreal, Canada)

MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories, May 2016, Pages 259–268. https://doi.org/10.1145/2901739.2901771

Published: 14 May 2016

ABSTRACT

Despite the globalization of software development, relevant project documentation, such as requirements and design documents, is often still missing, incomplete or outdated. However, parts of that documentation can be found outside the project, where it is fragmented across hundreds of textual web documents like blog posts, email messages and forum posts, as well as multimedia documents such as screencasts and podcasts. Since dissecting and filtering multimedia information based on its relevance to a given project is an inherently difficult task, it is necessary to provide an automated approach for mining this crowd-based documentation. In this paper, we are interested in mining the speech part of YouTube screencasts, since this part typically contains the rationale and insights of a screencast. We introduce a methodology that transcribes the speech and analyzes the transcribed text using various Information Extraction (IE) techniques, and we present a case study to illustrate the applicability of our mining methodology. In this case study, we extract use case scenarios from WordPress tutorial videos and show how their content can supplement existing documentation. We then evaluate how well existing rankings of video content are able to pinpoint the most relevant videos for a given scenario.

Index Terms

On mining crowd-based speech documentation
1. Information systems
  1. Information retrieval
    1. Document representation
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues

Published in
MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016
544 pages
ISBN:9781450341868
DOI:10.1145/2901739
General Chair: Miryung Kim (University of California, Los Angeles)
Program Chairs: Romain Robbes (University of Chile, Chile), Christian Bird (Microsoft Research)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
crowd-based documentation
information extraction
mining video content
software documentation
speech analysis
Qualifiers
- research-article