DOI: 10.1145/1056808.1057030
Article

Automatic video editing system using stereo-based head tracking for multiparty conversation

Published: 02 April 2005

Abstract

This paper presents an automatic video editing system based on head tracking for multiparty conversations. Systems that record meetings and systems that support teleconferences are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances. However, they fail to adequately convey to the viewer who is talking to whom. We focus on the participants' head orientation, since this information is useful for detecting both the speaker and the person the speaker is addressing. To automatically estimate each participant's head orientation, our system combines several modules for stereo-based head tracking. The system then selects the shot of the participant whom most participants are looking at, using a majority decision. Experiments on several 3-participant conversations confirm the effectiveness of our system. The results show that our system can more successfully convey who is talking to whom, a crucial piece of information that helps the viewer better understand the conversation content.
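
The majority-decision rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `select_shot`, the dictionary input format, and the handling of ties and missing gaze estimates (keep the current shot) are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import Counter
from typing import Mapping, Optional


def select_shot(
    gaze_targets: Mapping[str, Optional[str]],
    current_shot: Optional[str] = None,
) -> Optional[str]:
    """Pick the participant that the most participants are looking at.

    gaze_targets maps each participant to the participant they are judged
    to be looking at (e.g. from head orientation), or None when no target
    could be estimated. Hypothetical input format, assumed for this sketch.
    """
    # Count one vote per participant, ignoring missing estimates and
    # self-targets.
    votes = Counter(
        target
        for person, target in gaze_targets.items()
        if target is not None and target != person
    )
    if not votes:
        # Assumption: with no usable votes, keep showing the current shot.
        return current_shot

    ranked = votes.most_common()
    top, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        # Assumption: on a tie, keep the current shot to avoid flicker.
        return current_shot
    return top


# Three-participant example: A and C both look at B, so B's shot is chosen.
frame = {"A": "B", "B": "A", "C": "B"}
print(select_shot(frame, current_shot="A"))
```

In a real system the gaze targets would come from the head-tracking modules on every frame, and some temporal smoothing would likely be needed before switching shots; neither is modeled here.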



Published In

CHI EA '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems
April 2005
1358 pages
ISBN:1595930027
DOI:10.1145/1056808
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. archiving meetings
  2. head tracking
  3. multiparty conversation
  4. teleconferencing
  5. video editing

Qualifiers

  • Article

Conference

CHI05
Acceptance Rates

Overall Acceptance Rate 6,164 of 23,696 submissions, 26%

Cited By

  • (2021) 3D Acoustic Processing and Filtering for Speech Localization in Multi-participant Remote Conversation. 2021 IEEE 10th Global Conference on Consumer Electronics (GCCE), pp. 400-401. DOI: 10.1109/GCCE53005.2021.9621772. Online publication date: 12-Oct-2021.
  • (2021) Context-based camera selection from multiple video streams. Multimedia Tools and Applications. DOI: 10.1007/s11042-021-11674-6. Online publication date: 5-Nov-2021.
  • (2014) Cloud-Based Automatic Video Editing Using Keywords. E-Business and Telecommunications, pp. 228-241. DOI: 10.1007/978-3-662-44791-8_14. Online publication date: 12-Sep-2014.
  • (2013) Automatic video programming using LSA and audio transcript. 2013 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), pp. 1-5. DOI: 10.1109/BMSB.2013.6621779. Online publication date: Jun-2013.
  • (2012) Enhancing communication and dramatic impact of online live performance with cooperative audience control. Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 103-112. DOI: 10.1145/2370216.2370234. Online publication date: 5-Sep-2012.
  • (2012) A Cloud-Based Collaborative and Automatic Video Editor. Proceedings of the 2012 IEEE International Symposium on Multimedia, pp. 380-381. DOI: 10.1109/ISM.2012.78. Online publication date: 10-Dec-2012.
  • (2010) Visual Attention, Speaking Activity, and Group Conversational Analysis in Multi-Sensor Environments. Handbook of Ambient Intelligence and Smart Environments, pp. 433-461. DOI: 10.1007/978-0-387-93808-0_16. Online publication date: 2010.
  • (2008) Improving meeting capture by applying television production principles with audio and motion detection. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 227-236. DOI: 10.1145/1357054.1357095. Online publication date: 6-Apr-2008.
  • (2006) The Subjective Evaluation Experiments on an Automatic Video Editing System Using Vision-based Head Tracking for Multiparty Conversations. IEEJ Transactions on Electronics, Information and Systems 126:4, pp. 435-442. DOI: 10.1541/ieejeiss.126.435. Online publication date: 2006.
  • (2006) A Feature-Augmented Grammar for Automated Media Production. Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-Channel Distribution, pp. 315-318. DOI: 10.1109/AXMEDIS.2006.5. Online publication date: 13-Dec-2006.
