short-paper

Interactive Multimodal Learning on 100 Million Images

Authors:

Stevan Rudinac,

Björn Þór Jónsson,

Dennis C. Koelma,

Marcel Worring

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

Pages 333 - 337

https://doi.org/10.1145/2911996.2912062

Published: 06 June 2016

Abstract

This paper presents Blackthorn, an efficient interactive multimodal learning approach that facilitates analysis of multimedia collections of 100 million items on a single high-end workstation. This is achieved through efficient data compression and optimizations to the interactive learning process. The compressed i-I64 data representation costs tens of bytes per item yet preserves most of the visual and textual semantic information. The optimized interactive learning model scores the i-I64-compressed data directly, greatly reducing the computational requirements. The experiments show that Blackthorn is up to 105x faster than the conventional relevance feedback baseline and vastly outperforms it with respect to recall over time. Blackthorn reaches up to 92% of the precision achieved by the baseline, validating the efficacy of the i-I64 representation. On the YFCC100M dataset, Blackthorn performs one complete interaction round in 0.7 seconds. Blackthorn thus opens multimedia collections comprising 100 million items to learning-based analysis in fully interactive time.
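The abstract does not specify the layout of the i-I64 representation, so the following is only a minimal sketch of the general idea it describes: store a few tens of bytes per item and let a linear model score those bytes directly. Everything here is an assumption made for illustration, not the authors' method: the top-k sparse scheme, the 3-byte (index, quantized value) records, feature values in [0, 1], and the names `compress`, `score`, and `rank`.

```python
import struct
from typing import List, Sequence

TOP_K = 6  # hypothetical: features kept per item (6 records * 3 bytes = 18 bytes)


def compress(features: Sequence[float], k: int = TOP_K) -> bytes:
    """Keep the k largest features as (uint16 index, uint8 value) records.

    Assumes feature values lie in [0, 1], e.g. concept or topic scores.
    """
    top = sorted(range(len(features)), key=lambda i: features[i], reverse=True)[:k]
    packed = b""
    for i in top:
        clamped = min(max(features[i], 0.0), 1.0)
        packed += struct.pack("<HB", i, round(255 * clamped))
    return packed


def score(packed: bytes, weights: Sequence[float]) -> float:
    """Linear-model score read straight from the compressed records.

    Only the k stored features are touched; no dense vector is rebuilt.
    """
    total = 0.0
    for index, quantized in struct.iter_unpack("<HB", packed):
        total += weights[index] * (quantized / 255)
    return total


def rank(items: List[bytes], weights: Sequence[float], top_n: int) -> List[int]:
    """Return the positions of the top_n highest-scoring compressed items."""
    order = sorted(range(len(items)), key=lambda j: score(items[j], weights), reverse=True)
    return order[:top_n]
```

In an interaction round under these assumptions, the weights would come from a classifier retrained on the user's latest relevance judgments, and `rank` would pick the next items to show. The cost per item depends on k, not on the original feature dimensionality, which is the kind of saving the abstract attributes to scoring compressed data directly.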

Index Terms

Interactive Multimodal Learning on 100 Million Images
1. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search

Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval

June 2016

452 pages

ISBN:9781450343596

DOI:10.1145/2911996

General Chairs: John R. Kender (Columbia University, USA), John R. Smith (IBM Research, USA)

Program Chairs: Jiebo Luo (University of Rochester, USA), Susanne Boll (University of Oldenburg, Germany), Winston Hsu (National Taiwan University, Taiwan)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Stichting voor de Technische Wetenschappen

Conference

ICMR'16: International Conference on Multimedia Retrieval

June 6 - 9, 2016

New York, New York, USA

Acceptance Rates

ICMR '16 Paper Acceptance Rate 20 of 120 submissions, 17%;

Overall Acceptance Rate 254 of 830 submissions, 31%
