
Tools for placing cuts and transitions in interview video

Published: 01 July 2012

Abstract

We present a set of tools designed to help editors place cuts and create transitions in interview video. To help place cuts, our interface links a text transcript of the video to the corresponding locations in the raw footage. It also visualizes the suitability of cut locations by analyzing audio/visual features of the raw footage to find frames where the speaker is relatively quiet and still. With these tools, editors can highlight a segment of text, check whether its endpoints are suitable cut locations, and, if so, simply delete the text to make the edit. For each cut, our system generates visible transitions (e.g., jump cut, fade) as well as seamless, hidden transitions. We present a hierarchical, graph-based algorithm for efficiently generating hidden transitions that considers visual features specific to interview footage. We also describe a new data-driven technique for setting the timing of the hidden transition. Finally, our tools offer a one-click method for seamlessly removing 'ums' and repeated words, as well as for inserting natural-looking pauses to emphasize semantic content. We apply our tools to edit a variety of interviews and also show how they can be used to quickly compose multiple takes of an actor narrating a story.
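The cut-suitability analysis described above can be approximated with a simple per-frame score: normalize an audio loudness measure and an inter-frame motion measure, then favor frames where both are low. A minimal sketch (the feature choices, weights, and function names here are illustrative assumptions, not the paper's actual implementation):

```python
def cut_suitability(audio_rms, frame_diffs, w_audio=0.5, w_motion=0.5):
    """Score each frame for cut suitability: quieter, stiller frames score
    higher. audio_rms and frame_diffs are per-frame feature lists
    (hypothetical features standing in for the paper's audio/visual analysis)."""
    def norm(xs):
        # Min-max normalize a feature to [0, 1] so the weights are comparable.
        lo, hi = min(xs), max(xs)
        rng = hi - lo
        return [(x - lo) / rng if rng > 0 else 0.0 for x in xs]

    loudness = norm(audio_rms)
    motion = norm(frame_diffs)
    # Suitability is high exactly where both loudness and motion are low.
    return [1.0 - (w_audio * l + w_motion * m) for l, m in zip(loudness, motion)]

# Frame 2 is the quietest and stillest, so it scores highest.
scores = cut_suitability([0.8, 0.5, 0.05, 0.6], [0.7, 0.4, 0.02, 0.5])
best = scores.index(max(scores))  # → 2
```

An editor-facing interface could then threshold or color-code these scores along the transcript to mark good cut points.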

Supplementary Material

  • JPG File (tp168_12.jpg)
  • ZIP File (a67-berthouzoz.zip): supplemental material
  • MP4 File (tp168_12.mp4)



Published In

ACM Transactions on Graphics, Volume 31, Issue 4
July 2012, 935 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2185520

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. human-computer interfaces
  2. interaction

Qualifiers

  • Research-article


Cited By

  • (2024) ExpressEdit: Video Editing with Natural Language and Sketching. Proceedings of the 29th International Conference on Intelligent User Interfaces, 515-536. DOI: 10.1145/3640543.3645164. Online publication date: 18-Mar-2024.
  • (2024) Improving AI-assisted video editing: Optimized footage analysis through multi-task learning. Neurocomputing, 128485. DOI: 10.1016/j.neucom.2024.128485. Online publication date: Aug-2024.
  • (2024) Evaluating the Efficacy of Automated Video Editing in Educational Content Production: A Time Efficiency and Learner Perspective Study. Learning and Collaboration Technologies, 234-246. DOI: 10.1007/978-3-031-61672-3_15. Online publication date: 29-Jun-2024.
  • (2023) Eventfulness for Interactive Video Alignment. ACM Transactions on Graphics 42(4), 1-10. DOI: 10.1145/3592118. Online publication date: 26-Jul-2023.
  • (2023) AVscript: Accessible Video Editing with Audio-Visual Scripts. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-17. DOI: 10.1145/3544548.3581494. Online publication date: 19-Apr-2023.
  • (2023) Beyond Instructions: A Taxonomy of Information Types in How-to Videos. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1-21. DOI: 10.1145/3544548.3581126. Online publication date: 19-Apr-2023.
  • (2023) Match Cutting: Finding Cuts with Smooth Visual Transitions. 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2114-2124. DOI: 10.1109/WACV56688.2023.00215. Online publication date: Jan-2023.
  • (2023) LEMMS: Label Estimation of Multi-feature Movie Segments. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 3019-3027. DOI: 10.1109/ICCVW60793.2023.00325. Online publication date: 2-Oct-2023.
  • (2022) Record Once, Post Everywhere: Automatic Shortening of Audio Stories for Social Media. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1-11. DOI: 10.1145/3526113.3545680. Online publication date: 29-Oct-2022.
  • (2022) Synthesis-Assisted Video Prototyping From a Document. Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 1-10. DOI: 10.1145/3526113.3545676. Online publication date: 29-Oct-2022.
