ABSTRACT
The human face is one of the most important objects in video, since it provides rich information for spotting people of interest, such as government leaders in news video or the hero in a movie, and is the basis for interpreting facts. Detecting and recognizing faces that appear in video are therefore essential tasks in many video indexing and retrieval applications. Robust face matching remains a challenging problem because of large variations in pose, illumination, occlusion, hairstyle, and facial expression. In addition, when the dataset is huge, e.g. tens of millions of faces, a scalable matching method is needed. To this end, we propose an efficient method for face retrieval in large video datasets. To make retrieval robust, the faces of the same person appearing in an individual shot are grouped into a single face track by a reliable tracking method. Retrieval is done by computing the similarity between the input face track and each face track in the database. For each face track, we select one representative face, and the similarity between two face tracks is the similarity between their representative faces. The representative face is the mean face of a subset selected from the original face track. In this way, we achieve high retrieval accuracy while keeping the computational cost low. For the experiments, we extracted approximately 20 million faces from 370 hours of TRECVID video, a scale that previous work has not addressed. Results evaluated on a subset of 457,320 manually annotated faces show that the proposed method is effective and scalable.
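The matching scheme described in the abstract can be illustrated with a minimal sketch. It assumes each face is already a fixed-length feature vector, picks the subset by evenly spaced sampling, and compares representatives by Euclidean distance. The abstract does not specify the feature, the subset selection rule, or the similarity measure, so these choices, along with the names `representative_face`, `distance`, `retrieve`, `k`, and `top_n`, are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def representative_face(track: Sequence[Vector], k: int = 5) -> Vector:
    """Return the mean of up to k faces sampled from a face track.

    Evenly spaced sampling is an assumption; the abstract only says that
    the mean is taken over "a subset selected from the original face track".
    """
    if not track:
        raise ValueError("face track is empty")
    k = min(k, len(track))
    step = len(track) / k
    subset = [track[int(i * step)] for i in range(k)]
    dim = len(subset[0])
    return [sum(face[d] for face in subset) / k for d in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_track: Sequence[Vector],
             database: Dict[str, Sequence[Vector]],
             k: int = 5,
             top_n: int = 10) -> List[str]:
    """Rank database tracks by distance between representative faces."""
    query_rep = representative_face(query_track, k)
    reps = {tid: representative_face(t, k) for tid, t in database.items()}
    ranked = sorted(reps, key=lambda tid: distance(query_rep, reps[tid]))
    return ranked[:top_n]
```

Because each track is reduced to a single vector, comparing two tracks costs one distance computation regardless of track length. In a large database the representatives would be computed once and stored, not rebuilt per query as this sketch does for brevity.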
Index Terms
- An efficient method for face retrieval from large video datasets