Article

Automatic video editing system using stereo-based head tracking for multiparty conversation

Authors:

Yoshinao Takemae,

Kazuhiro Otsuka,

Junji Yamato

CHI EA '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems

Pages 1817 - 1820

https://doi.org/10.1145/1056808.1057030

Published: 02 April 2005

Abstract

This paper presents an automatic video editing system for multiparty conversations based on head tracking. Systems that record meetings and systems that support teleconferences are attracting considerable interest. Conventional systems use a fixed-viewpoint camera and simple camera selection based on participants' utterances; however, they fail to convey adequately to the viewer who is talking to whom. We focus on the participants' head orientation, since this information is useful for detecting the speaker and the person the speaker is addressing. To estimate each participant's head orientation automatically, our system combines several modules for stereo-based head tracking. The system then selects, by majority decision, the shot of the participant whom most participants are looking at. Experiments on several three-participant conversations confirm the effectiveness of our system. The results show that our system can convey who is talking to whom more successfully, and this is crucial information that allows the viewer to better understand the conversation content.
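The abstract states the shot-selection rule only at a high level: each participant's head orientation indicates whom they are looking at, and the participant looked at by the majority receives the shot. The following is a minimal sketch of that idea, not the authors' implementation. It assumes that the stereo tracker already supplies each participant's head yaw in a shared table-plane frame and that seat positions are known. The function names (`angle_to`, `gaze_target`, `select_shot`), the 30° matching threshold, and the rule of keeping the current shot on a tie are all hypothetical choices made for illustration.

```python
import math
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical seat layout: participant name -> (x, y) position on the table plane.
Position = Tuple[float, float]


def angle_to(src: Position, dst: Position) -> float:
    """Bearing (radians, from the +x axis) of dst as seen from src."""
    return math.atan2(dst[1] - src[1], dst[0] - src[0])


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def gaze_target(viewer: str, yaw: float, seats: Dict[str, Position],
                max_dev: float = math.radians(30)) -> Optional[str]:
    """Map a head yaw to the participant closest to that direction.

    Returns None when nobody lies within max_dev (an assumed threshold),
    i.e. the viewer is treated as looking at no one in particular.
    """
    best, best_dev = None, max_dev
    for other, pos in seats.items():
        if other == viewer:
            continue
        dev = angular_diff(yaw, angle_to(seats[viewer], pos))
        if dev <= best_dev:
            best, best_dev = other, dev
    return best


def select_shot(yaws: Dict[str, float], seats: Dict[str, Position],
                current: Optional[str] = None) -> Optional[str]:
    """Majority decision: return the participant looked at by the most others.

    On a tie, or when nobody is looked at, keep the current shot
    (an assumed rule, not taken from the paper).
    """
    targets = [gaze_target(v, y, seats) for v, y in yaws.items()]
    votes = Counter(t for t in targets if t is not None)
    if not votes:
        return current
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return current
    return ranked[0][0]


if __name__ == "__main__":
    seats = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 1.7)}
    # A and C face B, and B faces A, so B receives two of the three votes.
    yaws = {"A": angle_to(seats["A"], seats["B"]),
            "B": angle_to(seats["B"], seats["A"]),
            "C": angle_to(seats["C"], seats["B"])}
    print(select_shot(yaws, seats))  # prints B
```

Keeping the current shot on a tie is one plausible way to avoid rapid back-and-forth cuts in a three-person setting. A real system would likely also smooth the yaw estimates over time before voting.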


Published In

CHI EA '05: CHI '05 Extended Abstracts on Human Factors in Computing Systems

April 2005, 1358 pages

ISBN: 1595930027

DOI: 10.1145/1056808

Conference Chair: Gerrit van der Veer, Vrije Universiteit, The Netherlands

Program Chair: Carolyn Gale, Stanford University, Stanford, CA

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CHI05: CHI 2005 Conference on Human Factors in Computing Systems

April 2-7, 2005

Portland, OR, USA
