DOI:10.1145/1282280.1282286

VAST MM: multimedia browser for presentation video

Published: 09 July 2007

Abstract

In the domain of candidly captured student presentation videos, we examine and evaluate approaches for multi-modal analysis and indexing of audio and video. We apply visual segmentation techniques to unedited video to determine likely topic changes. Speaker segmentation methods are employed to determine individual student appearances, which are linked to extracted headshots to create a visual speaker index. Videos are augmented with time-aligned, filtered keywords and phrases from highly inaccurate speech transcripts. Our experimental user interface, the VAST MM Browser (Video Audio Structure Text Multi Media Browser), combines streaming videos with visual and textual indices for browsing and searching. We evaluate the UI and methods in a large engineering design course. We report on observations and statistics collected over four semesters and 598 student participants. Results suggest that our video indexing and retrieval approach is effective, and that our continuing improvements are reflected in increased precision and recall on user study tasks.
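
The abstract reports precision and recall on user study tasks, but this page does not say how they were computed. As a minimal sketch, assuming each task has a known set of relevant video segments and each participant returns a set of segments, the standard set-based definitions could be computed as below. The function name and segment identifiers are hypothetical and the data is illustrative only, not taken from the study.

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall.

    retrieved: items a participant selected for a task (hypothetical IDs).
    relevant:  ground-truth items for that task (hypothetical IDs).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # correctly retrieved items
    # Guard against empty sets so the function never divides by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: 2 of 3 retrieved segments are relevant,
# and 2 of 4 relevant segments were found.
p, r = precision_recall(["seg3", "seg7", "seg9"],
                        ["seg3", "seg9", "seg12", "seg15"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision falls when a participant selects segments that are not relevant, and recall falls when relevant segments are missed, so the two together describe both kinds of retrieval error.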

Published In

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval
July 2007
655 pages
ISBN:9781595937339
DOI:10.1145/1282280
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. automatic speech recognition
  2. presentation video
  3. speaker index
  4. speaker segmentation
  5. streaming video
  6. text augmentation
  7. transcript analysis
  8. video library
  9. visual segmentation

Qualifiers

  • Article

Conference

CIVR07