Article

VAST MM: multimedia browser for presentation video

Authors: Alexander Haubold, John R. Kender

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval

Pages 41 - 48

https://doi.org/10.1145/1282280.1282286

Published: 09 July 2007

Abstract

In the domain of candidly captured student presentation videos, we examine and evaluate approaches for multi-modal analysis and indexing of audio and video. We apply visual segmentation techniques to unedited video to determine likely changes of topic. Speaker segmentation methods are employed to determine individual student appearances, which are linked to extracted headshots to create a visual speaker index. Videos are augmented with time-aligned, filtered keywords and phrases from highly inaccurate speech transcripts. Our experimental user interface, the VAST MM Browser (Video Audio Structure Text Multi Media Browser), combines streaming video with visual and textual indices for browsing and searching. We evaluate the UI and methods in a large engineering design course and report on observations and statistics collected over 4 semesters from 598 student participants. Results suggest that our video indexing and retrieval approach is effective, and that our continuous improvements are reflected in increased precision and recall on user study tasks.
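The abstract reports results in terms of precision and recall but does not define them here. The following is a minimal sketch using the standard information-retrieval definitions, assuming a user study task can be scored as a set of retrieved video segments against a set of relevant ones. The function name, the segment identifiers, and this task framing are hypothetical illustrations, not taken from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one retrieval task.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    Both default to 0.0 when their denominator is empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical task: a participant returns four segments,
    # two of which are among the three relevant ones.
    found = ["seg_03", "seg_07", "seg_12", "seg_15"]
    truth = ["seg_03", "seg_07", "seg_09"]
    p, r = precision_recall(found, truth)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Treating the inputs as sets means a segment returned twice is counted once, which is one reasonable convention among several.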

Published In

CIVR '07: Proceedings of the 6th ACM international conference on Image and video retrieval

July 2007, 655 pages

ISBN: 9781595937339

DOI: 10.1145/1282280

General Chairs: Nicu Sebe (Univ. of Amsterdam, The Netherlands), Marcel Worring (Univ. of Amsterdam, The Netherlands)

In-Cooperation

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

Sponsor: SIGMM

CIVR07: International Conference on Image and Video Retrieval 2007

July 9 - 11, 2007

Amsterdam, The Netherlands
