DOI: 10.1145/2911996.2912062

Interactive Multimodal Learning on 100 Million Images

Published: 06 June 2016

Abstract

This paper presents Blackthorn, an efficient interactive multimodal learning approach facilitating analysis of multimedia collections of 100 million items on a single high-end workstation. This is achieved by efficient data compression and optimizations to the interactive learning process. The compressed i-I64 data representation costs tens of bytes per item yet preserves most of the visual and textual semantic information. The optimized interactive learning model scores the i-I64-compressed data directly, greatly reducing the computational requirements. The experiments show that Blackthorn is up to 105x faster than the conventional relevance feedback baseline. Blackthorn is shown to vastly outperform the baseline with respect to recall over time. Blackthorn reaches up to 92% of the precision achieved by the baseline, validating the efficacy of the i-I64 representation. On the YFCC100M dataset, Blackthorn performs one complete interaction round in 0.7 seconds. Blackthorn thus opens multimedia collections comprising 100 million items to learning-based analysis in fully interactive time.
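
To make the notion of an "interaction round" concrete, the following is a minimal sketch of one round of generic relevance feedback: the user marks some items as relevant or not relevant, a linear scorer is fitted to those judgments, every unjudged item is scored, and the top-ranked items are suggested next. It is not Blackthorn's model and does not use the i-I64 representation, neither of which is specified in this abstract. As a stand-in it uses a Rocchio-style scorer on plain uncompressed feature vectors, and the names `feedback_round`, `centroid`, and the toy data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def feedback_round(items: Dict[int, Vector],
                   positives: List[int],
                   negatives: List[int],
                   k: int) -> List[int]:
    """Run one relevance feedback round and return up to k suggested item ids.

    Scoring model (an assumption, not the paper's): a Rocchio-style linear
    scorer whose weight vector is centroid(positives) - centroid(negatives).
    """
    pos_c = centroid([items[i] for i in positives])
    neg_c = centroid([items[i] for i in negatives])
    weights = [p - n for p, n in zip(pos_c, neg_c)]

    # Score only items the user has not judged yet.
    judged = set(positives) | set(negatives)
    scores = {
        item_id: sum(w * x for w, x in zip(weights, vec))
        for item_id, vec in items.items()
        if item_id not in judged
    }
    # Highest score first; ties broken by item id for determinism.
    ranked = sorted(scores, key=lambda i: (-scores[i], i))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical two-dimensional features for six items.
    toy = {
        0: [1.0, 0.0],
        1: [0.9, 0.1],
        2: [0.0, 1.0],
        3: [0.1, 0.9],
        4: [0.8, 0.2],
        5: [0.2, 0.8],
    }
    print(feedback_round(toy, positives=[0], negatives=[2], k=2))
```

Every round in this sketch touches every unjudged item, so the per-item scoring cost and the size of each item's representation govern how long a round takes on a large collection. Those are the two quantities the abstract says Blackthorn reduces, through compression and by scoring the compressed data directly.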

Published In

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
ISBN:9781450343596
DOI:10.1145/2911996
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. YFCC100M
  2. data compression
  3. interactive multimodal learning
  4. multimedia analytics

Qualifiers

  • Short-paper

Conference

ICMR'16
ICMR'16: International Conference on Multimedia Retrieval
June 6 - 9, 2016
New York, New York, USA

Acceptance Rates

ICMR '16 Paper Acceptance Rate 20 of 120 submissions, 17%;
Overall Acceptance Rate 254 of 830 submissions, 31%

Cited By

  • (2021) Impact of Interaction Strategies on User Relevance Feedback. Proceedings of the 2021 International Conference on Multimedia Retrieval, pp. 590-598. DOI: 10.1145/3460426.3463663. Online publication date: 24-Aug-2021
  • (2019) Data Storage and Management for Big Multimedia. Big Data Analytics for Large‐Scale Multimedia Search, pp. 209-238. DOI: 10.1002/9781119376996.ch8. Online publication date: 15-Mar-2019
  • (2018) Blackthorn. IEEE Transactions on Multimedia 20:3, pp. 687-698. DOI: 10.1109/TMM.2017.2755986. Online publication date: 1-Mar-2018
  • (2017) The Network Structure of Visited Locations According to Geotagged Social Media Photos. Collaboration in a Data-Rich World, pp. 276-283. DOI: 10.1007/978-3-319-65151-4_26. Online publication date: 22-Aug-2017
