ABSTRACT
To better understand multimedia content, considerable research effort has gone into learning from multi-modal features. This paper studies the problem from a graph point of view: the features from each modality are represented as one independent graph, and the learning task is formulated as inference from the constraints in every graph together with supervision information (if available). For semi-supervised learning, two fusion schemes are proposed: a linear form and a sequential form. Each scheme is derived from an optimization point of view and further justified from two sides, similarity propagation and Bayesian interpretation. These three views reveal, respectively, the regularized optimization nature, the transductive learning nature, and the prior fusion nature of the proposed schemes. Moreover, the proposed method extends easily to unsupervised learning, including clustering and embedding. Systematic experimental results validate the effectiveness of the proposed method.
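As a rough illustration of what a linear fusion of per-modality graphs could look like in the semi-supervised setting, the sketch below makes several assumptions that are not taken from the paper: each modality's graph is given as a symmetric affinity matrix, the graphs are fused by a convex combination of their symmetrically normalized affinities, and labels are spread with the closed-form propagation F = (I - alpha S)^{-1} Y known from graph-based transductive learning. The function names, the fusion weights, the value of alpha, and the toy data are all hypothetical.

```python
import numpy as np


def normalize_affinity(W):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} of an affinity matrix."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    inv_sqrt = np.zeros_like(d)
    nonzero = d > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(d[nonzero])  # isolated nodes stay at 0
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


def linear_fusion_propagate(graphs, weights, Y, alpha=0.9):
    """Fuse graphs linearly, then propagate labels in closed form.

    graphs  : list of (n, n) symmetric affinity matrices, one per modality
    weights : non-negative fusion weights (assumed; rescaled to sum to 1)
    Y       : (n, c) initial label matrix, all-zero rows for unlabeled nodes
    alpha   : propagation parameter in (0, 1) (assumed value)
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Convex combination of the normalized graphs (assumed fusion rule).
    S = sum(w * normalize_affinity(W) for w, W in zip(weights, graphs))
    n = S.shape[0]
    # Solve (I - alpha * S) F = Y rather than forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - alpha * S, np.asarray(Y, dtype=float))


# Toy data: four items, two modalities, one labeled item per class.
graph_a = np.array([[0.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.1, 0.0],
                    [0.0, 0.1, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0]])
graph_b = np.array([[0.0, 1.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 1.0],
                    [0.0, 0.0, 1.0, 0.0]])
Y = np.array([[1.0, 0.0],   # item 0 labeled as class 0
              [0.0, 0.0],   # unlabeled
              [0.0, 0.0],   # unlabeled
              [0.0, 1.0]])  # item 3 labeled as class 1

F = linear_fusion_propagate([graph_a, graph_b], weights=[0.5, 0.5], Y=Y)
predicted = F.argmax(axis=1)
print(predicted)
```

In this sketch each unlabeled item takes the class with the largest propagated score. The fusion weights control how much each modality's graph contributes and would have to be chosen or tuned for real data.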