ABSTRACT
Labeling every person who appears in broadcast news video with a name detected in the video transcript improves access to news video content. In this paper, we approach this challenging problem with a statistical learning method. We explore two categories of information extracted from multiple video modalities: <i>features</i>, which help identify the true name of each person, and <i>constraints</i>, which capture the relationships among the names of different persons. We formulate person naming as a learning framework that predicts the most likely name for each person from the features and then refines the predictions using the constraints. Experiments on ABC World News Tonight and CNN Headline News videos show that this approach outperforms a non-learning alternative by a large margin.
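The two-stage idea in the abstract (predict a name per person from features, then refine with constraints) can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature names `overlap` and `caption`, the fixed linear weights standing in for a learned model, the single "these two persons must have different names" constraint, and the brute-force search used for refinement.

```python
from itertools import product

# Hypothetical weights standing in for a trained model (assumption).
WEIGHTS = {"overlap": 1.0, "caption": 2.0}


def score(features, weights=WEIGHTS):
    """Linear score of one (person, candidate name) pair."""
    return sum(weights[key] * value for key, value in features.items())


def predict_independently(candidates):
    """Stage 1: pick the highest-scoring name for each person on its own."""
    return {
        person: max(names, key=lambda name: score(names[name]))
        for person, names in candidates.items()
    }


def refine_with_constraints(candidates, must_differ):
    """Stage 2: pick the best joint assignment that satisfies every constraint.

    must_differ is a list of (person_a, person_b) pairs that may not share
    a name. Brute force is used only because the toy example is tiny.
    """
    people = sorted(candidates)
    best, best_total = None, float("-inf")
    for names in product(*(sorted(candidates[p]) for p in people)):
        assignment = dict(zip(people, names))
        if any(assignment[a] == assignment[b] for a, b in must_differ):
            continue  # violates a constraint
        total = sum(score(candidates[p][assignment[p]]) for p in people)
        if total > best_total:
            best, best_total = assignment, total
    return best


# Toy data: two persons, two candidate names each (all values invented).
candidates = {
    "p1": {
        "Alice Smith": {"overlap": 0.9, "caption": 1.0},
        "Bob Jones": {"overlap": 0.4, "caption": 0.0},
    },
    "p2": {
        "Alice Smith": {"overlap": 0.6, "caption": 0.0},
        "Bob Jones": {"overlap": 0.5, "caption": 0.0},
    },
}

independent = predict_independently(candidates)
refined = refine_with_constraints(candidates, must_differ=[("p1", "p2")])
print(independent)
print(refined)
```

In this toy setup the independent predictions give both persons the same name, and the constraint step reassigns the second person to the other candidate. A real system would learn the scoring function from labeled data and would need a more scalable refinement step than exhaustive search.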