research-article

Probabilistic models for topic learning from images and captions in online biomedical literatures

Authors:

Palakorn Achananuparp

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

Pages 495 - 504

https://doi.org/10.1145/1645953.1646017

Published: 02 November 2009

Abstract

Biomedical images and their captions are among the major sources of information in online biomedical publications. They often contain a paper's most important results and provide rich information about its main themes. The data mining and information retrieval communities have devoted much effort to extracting knowledge from the text of online biomedical publications with text mining and language modeling algorithms; knowledge extraction from biomedical images and captions, however, has not yet been fully studied. This paper introduces a hierarchical probabilistic topic model with a background distribution (HPB) that uncovers latent semantic topics from the co-occurrence patterns of caption words, visual words, and biomedical concepts. From downloaded biomedical figures, a restricted caption is extracted for each individual image panel. During indexing, the bag-of-words representation of captions is supplemented by ontology-based concept indexing to alleviate synonymy and polysemy problems. Visual words, the visual counterpart of text words, are extracted from the corresponding image panels and indexed. The model is estimated with a collapsed Gibbs sampling algorithm. We compare our model with an extension of the Correspondence LDA (Corr-LDA) model on the same biomedical image annotation task using cross-validation. Experimental results show that our model accurately extracts latent patterns from complex biomedical image-caption pairs and facilitates knowledge organization and understanding in online biomedical literature.
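To make the indexing and estimation steps in the abstract more concrete, the following is a minimal sketch, not the paper's HPB model. It assumes plain LDA with symmetric Dirichlet priors, with no background distribution and no hierarchy, and it simply pools caption words, concepts, and visual words of each panel into one bag of tokens. The function names, the toy panels, the concept labels, and the hyperparameter values are all hypothetical and chosen only for illustration.

```python
import random


def build_docs(panels):
    """Map each panel's caption words, concepts, and visual words to integer ids.

    Tokens are keyed by (modality, token), so a caption word and a visual word
    with the same spelling never collide. All three modalities share one vocabulary.
    """
    vocab = {}
    docs = []
    for panel in panels:
        doc = []
        for modality in ("words", "concepts", "visual"):
            for tok in panel[modality]:
                doc.append(vocab.setdefault((modality, tok), len(vocab)))
        docs.append(doc)
    return docs, vocab


def gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=100, seed=0):
    """Collapsed Gibbs sampling for plain LDA (an assumption; not the HPB model)."""
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                                 # total tokens per topic
    z = []                                                 # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


# Hypothetical toy panels; the concept labels are placeholders, not real ontology ids.
panels = [
    {"words": ["cell", "nucleus"], "concepts": ["C_CELL"], "visual": ["vw3", "vw7"]},
    {"words": ["protein", "cell"], "concepts": ["C_PROTEIN"], "visual": ["vw3", "vw1"]},
]
docs, vocab = build_docs(panels)
n_dk, n_kw, z = gibbs_lda(docs, num_topics=2, vocab_size=len(vocab))
```

After sampling, `n_dk` holds per-panel topic counts and `n_kw` holds per-topic token counts; smoothing and normalizing them with `alpha` and `beta` would give point estimates of the topic mixtures and topic-token distributions. A model closer to the one described would additionally route some tokens to a background distribution and treat the modalities separately, which this sketch does not attempt.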

Published In

CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management

November 2009

2162 pages

ISBN:9781605585123

DOI:10.1145/1645953

General Chairs: David Cheung (University of Hong Kong, Hong Kong), Il-Yeol Song (Drexel University, USA)

Program Chairs: Wesley Chu (UCLA, USA), Xiaohua Hu (Drexel University, USA), Jimmy Lin (University of Maryland, USA)

Copyright © 2009 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

CIKM '09: Conference on Information and Knowledge Management

November 2 - 6, 2009

Hong Kong, China

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
