DOI: 10.1145/2901739.2901771
Research article

On mining crowd-based speech documentation

Published: 14 May 2016

ABSTRACT

Despite the globalization of software development, relevant project documentation, such as requirements and design documents, is often still missing, incomplete, or outdated. However, parts of that documentation can be found outside the project, fragmented across hundreds of textual web documents such as blog posts, email messages, and forum posts, as well as multimedia documents such as screencasts and podcasts. Since dissecting and filtering multimedia information based on its relevance to a given project is an inherently difficult task, an automated approach for mining this crowd-based documentation is needed. In this paper, we are interested in mining the speech part of YouTube screencasts, since this part typically contains the rationale and insights of a screencast. We introduce a methodology that transcribes screencasts and analyzes the transcribed text using various Information Extraction (IE) techniques, and we present a case study to illustrate the applicability of our mining methodology. In this case study, we extract use case scenarios from WordPress tutorial videos and show how their content can supplement existing documentation. We then evaluate how well existing rankings of video content are able to pinpoint the most relevant videos for a given scenario.
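The abstract's last step, ranking videos by their relevance to a given use case scenario, is not detailed here; as an illustrative sketch only, one common way to approximate such a ranking is TF-IDF weighting with cosine similarity over transcript text. The function names and the scoring scheme below are assumptions for illustration, not the authors' actual method:

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()  # document frequency of each term
    for toks in tokenized:
        df.update(set(toks))
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        length = max(len(toks), 1)
        vecs.append({t: (c / length) * math.log(n / df[t]) for t, c in tf.items()})
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_transcripts(scenario, transcripts):
    """Return transcript indices ordered by relevance to the scenario text."""
    # Vectorize the scenario together with the transcripts so that
    # document frequencies are computed over a single shared corpus.
    vecs = tfidf_vectors([scenario] + transcripts)
    query = vecs[0]
    return sorted(range(len(transcripts)),
                  key=lambda i: cosine(query, vecs[i + 1]),
                  reverse=True)
```

For example, given two transcripts, one about installing a plugin and one about changing theme colors, the query "how do I install a plugin" would rank the installation transcript first. The paper evaluates existing rankings rather than proposing this particular scheme.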


Published in:
MSR '16: Proceedings of the 13th International Conference on Mining Software Repositories
May 2016, 544 pages
ISBN: 9781450341868
DOI: 10.1145/2901739
Copyright © 2016 ACM


Publisher: Association for Computing Machinery, New York, NY, United States
