ABSTRACT
A typical approach to video annotation is to classify video elements (e.g. events and objects) according to a pre-defined ontology of the video content domain. Ontologies are defined by establishing relationships between linguistic terms that specify domain concepts at different abstraction levels. However, although linguistic terms are appropriate for distinguishing event and object categories, they are inadequate for describing specific or complex patterns of events or video entities. In these cases, pattern specifications are better expressed using visual prototypes, either images or video clips, that capture the essence of the event or entity. Enhanced ontologies that include both visual and linguistic concepts can therefore support video annotation up to the level of detail of pattern specification.

This paper presents algorithms and techniques that employ enriched ontologies for video annotation and retrieval, and discusses their implementation for the soccer video domain. An unsupervised clustering method is proposed to create pictorially enriched ontologies: it defines visual prototypes that represent specific patterns of highlights and adds them to the ontology as visual concepts.

Two algorithms that use pictorially enriched ontologies to perform automatic soccer video annotation are proposed, and results for typical highlights are presented. Annotation is performed by associating occurrences of events or entities with higher-level concepts: a dynamic programming approach checks their similarity to visual concepts that are hierarchically linked to higher-level semantics.

The use of reasoning on the ontology is demonstrated, both to perform higher-level annotation of the clips using domain knowledge and to create complex queries that comprise visual prototypes of actions, their temporal evolution and relations.
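As a rough illustration of annotation by similarity to visual prototypes, the sketch below compares a clip with each prototype using a dynamic programming distance and assigns the concept of the closest one. The abstract does not specify the distance measure or the clip representation, so everything here is an assumption made for illustration: clips are modeled as sequences of symbolic descriptors, the distance is plain Levenshtein edit distance, and the concept names (`shot_on_goal`, `placed_kick`), descriptor labels, and the functions `edit_distance` and `annotate` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences (dynamic programming)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def annotate(clip: Sequence[str],
             prototypes: Dict[str, List[List[str]]]) -> Tuple[str, int]:
    """Return (concept, distance) for the visual prototype closest to the clip."""
    candidates = [(concept, edit_distance(clip, proto))
                  for concept, protos in prototypes.items()
                  for proto in protos]
    return min(candidates, key=lambda pair: pair[1])


# Hypothetical visual prototypes, each linked to a higher-level concept.
prototypes = {
    "shot_on_goal": [["midfield", "attack", "goal_box", "goal_box"]],
    "placed_kick": [["still", "still", "attack", "goal_box"]],
}

# Hypothetical descriptor sequence extracted from an unlabeled clip.
clip = ["midfield", "attack", "attack", "goal_box", "goal_box"]

concept, distance = annotate(clip, prototypes)
print(concept, distance)
```

In a real system the symbolic descriptors would be replaced by visual features extracted from the clips, and the substitution cost by a feature-level dissimilarity; the nearest-prototype structure of the lookup would stay the same.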