DOI: 10.1145/1101149.1101153

Topic transition detection using hierarchical hidden Markov and semi-Markov models

Published: 06 November 2005

Abstract

In this paper we introduce a probabilistic framework that exploits hierarchy, structure sharing, and duration information for topic transition detection in videos. The framework combines a shot classification step with a detection phase based on hierarchical probabilistic models. We consider two models: the extended Hierarchical Hidden Markov Model (HHMM) and the Coxian Switching Hidden semi-Markov Model (S-HSMM). Both allow the natural decomposition of semantics in videos, including shared structures, to be modeled directly, which enables efficient inference and reduces the sample complexity of learning. The S-HSMM additionally incorporates duration information, so the modeling of long-term dependencies in videos is enriched through both hierarchical and duration modeling. The use of the Coxian distribution in the S-HSMM also makes it tractable to handle long video sequences. Our experiments with the proposed framework on twelve educational and training videos show that both models outperform the baseline cases (flat HMM and HSMM) and the performance reported in earlier work on topic detection. The superior performance of the S-HSMM over the HHMM confirms our belief that duration information is an important factor in video content modeling.
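
As a rough illustration of why a Coxian duration model stays compact for long segments, the sketch below computes duration probabilities for a discrete Coxian. It assumes the common parameterization as a left-to-right chain of M geometric phases with an initial phase distribution; this is an assumption about the general technique, not the paper's implementation. The function name `coxian_pmf` and the parameters `mu` and `lam` are hypothetical. The point is only that roughly 2M numbers describe durations of any length, whereas an explicit duration table needs one entry per possible duration.

```python
def coxian_pmf(mu, lam, max_d):
    """Return [P(D=1), ..., P(D=max_d)] for a discrete Coxian duration.

    Assumed parameterization (illustrative, not taken from the paper):
      mu[i]  : probability of starting in phase i
      lam[i] : probability of leaving phase i at each time step
    Leaving the last phase ends the segment, so a segment that starts
    in phase i lasts for the sum of geometric stays in phases i..M.
    """
    m = len(mu)
    alpha = list(mu)  # alpha[i]: probability of being in phase i at this step
    pmf = []
    for _ in range(max_d):
        # The segment ends now if we are in the last phase and leave it.
        pmf.append(alpha[-1] * lam[-1])
        # Either stay in the same phase ...
        nxt = [alpha[i] * (1.0 - lam[i]) for i in range(m)]
        # ... or advance from the previous phase.
        for i in range(1, m):
            nxt[i] += alpha[i - 1] * lam[i - 1]
        alpha = nxt
    return pmf
```

With a single phase the model reduces to a geometric duration: `coxian_pmf([1.0], [0.5], 3)` gives `[0.5, 0.25, 0.125]`. Each additional step costs O(M) work regardless of how long the segment has already lasted, which is the kind of saving that matters for long sequences.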

    Published In

    MULTIMEDIA '05: Proceedings of the 13th annual ACM international conference on Multimedia
    November 2005
    1110 pages
    ISBN: 1595930442
    DOI: 10.1145/1101149

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. coxian
    2. educational videos
    3. hierarchical Markov (Semi-Markov) models
    4. topic transition detection

    Qualifiers

    • Article

    Conference

    MM05

    Acceptance Rates

    MULTIMEDIA '05 Paper Acceptance Rate 49 of 312 submissions, 16%;
    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
