research-article

Academic conference homepage understanding using constrained hierarchical conditional random fields

Authors:
Xin Xin (Tsinghua University, Beijing, China)
Juanzi Li (Tsinghua University, Beijing, China)
Jie Tang (Tsinghua University, Beijing, China)
Qiong Luo (Hong Kong University of Science and Technology, Hong Kong)

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management, October 2008, Pages 1301–1310
https://doi.org/10.1145/1458082.1458254

Published: 26 October 2008

ABSTRACT

We address the problem of academic conference homepage understanding for the Semantic Web. The problem consists of three labeling tasks: labeling conference function pages, function blocks, and attributes. Unlike traditional information extraction tasks, the data in academic conference homepages has complex structural dependencies that span multiple Web pages, and it is also subject to logical constraints. In this paper, we propose a unified approach, Constrained Hierarchical Conditional Random Fields, that performs the three labeling tasks simultaneously. The model describes the complex structural dependencies, and the constrained Viterbi algorithm used during inference avoids logical errors. Experimental results on real-world conference data show that this approach outperforms cascaded labeling methods by 3.6% in F1-measure and that the constrained inference process improves accuracy by 14.3%. Based on the proposed approach, we develop a prototype of a use-oriented semantic academic conference calendar. The user simply specifies the conferences of interest; the system then finds, extracts, and updates the semantic information from the Web and builds a calendar automatically. The semantic conference data can also be used in other applications, such as finding sponsors and finding experts, and the proposed approach can be applied to other information extraction tasks.
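The abstract does not spell out how the constrained inference step works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a plain linear chain rather than the hierarchical model, additive log-scores, and hard constraints that simply rule out certain labels or label transitions during the Viterbi search. The function name `constrained_viterbi`, the three labels, the scores, and the "no SUBMISSION directly after NOTIFICATION" rule are all hypothetical choices made for illustration.

```python
NEG_INF = float("-inf")

def constrained_viterbi(emit, trans, allowed_label=None, allowed_trans=None):
    """Best label path under hard constraints (illustrative sketch).

    emit[t][y]  : log-score of label y at position t
    trans[p][y] : log-score of moving from label p to label y
    allowed_label(t, y) / allowed_trans(p, y) : return False to forbid
    Returns (score, path), or (NEG_INF, []) if no path is feasible.
    """
    n, k = len(emit), len(emit[0])
    ok_label = allowed_label or (lambda t, y: True)
    ok_trans = allowed_trans or (lambda p, y: True)

    # Position 0: forbidden labels start at -inf and can never be chosen.
    score = [emit[0][y] if ok_label(0, y) else NEG_INF for y in range(k)]
    back = []
    for t in range(1, n):
        new_score, ptr = [], []
        for y in range(k):
            best_p, best_s = -1, NEG_INF
            if ok_label(t, y):
                for p in range(k):
                    # Skip unreachable predecessors and forbidden transitions.
                    if score[p] == NEG_INF or not ok_trans(p, y):
                        continue
                    s = score[p] + trans[p][y]
                    if s > best_s:
                        best_p, best_s = p, s
            new_score.append(best_s + emit[t][y] if best_p >= 0 else NEG_INF)
            ptr.append(best_p)
        score = new_score
        back.append(ptr)

    last = max(range(k), key=lambda y: score[y])
    if score[last] == NEG_INF:
        return NEG_INF, []
    # Follow back-pointers from the best final label to recover the path.
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return score[last], path

# Hypothetical labels: 0 = SUBMISSION, 1 = NOTIFICATION, 2 = OTHER.
emit = [[0.5, 1.0, 0.0],
        [1.0, 0.2, 0.0],
        [0.0, 0.0, 1.0]]
trans = [[0.0] * 3 for _ in range(3)]

# Assumed logical constraint: SUBMISSION may not directly follow NOTIFICATION.
no_sub_after_notif = lambda p, y: not (p == 1 and y == 0)

free_score, free_path = constrained_viterbi(emit, trans)
con_score, con_path = constrained_viterbi(emit, trans,
                                          allowed_trans=no_sub_after_notif)
print(free_path, free_score)  # [1, 0, 2] 3.0
print(con_path, con_score)    # [0, 0, 2] 2.5
```

In this toy run the unconstrained search picks the locally best label at each position and so produces the forbidden NOTIFICATION to SUBMISSION step, while the constrained search returns the highest-scoring path that respects the rule. Pruning inside the dynamic program, rather than filtering the output afterwards, is what keeps the result optimal among the feasible paths.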

Published in
CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN: 9781595939913
DOI: 10.1145/1458082
General Chair:
James G. Shanahan (Church and Duncan Group Inc, USA)
Program Chairs:
Sihem Amer-Yahia (Yahoo! Research, USA)
Ioana Manolescu (INRIA, France)
Yi Zhang (University of California, Santa Cruz, USA)
David A. Evans (JustSystems Evans Research, USA)
Alek Kolcz (Microsoft Live Labs, USA)
Key-Sun Choi (KAIST, Korea)
Abdur Chowdury (Twitter, USA)
Copyright © 2008 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
constrained hierarchical conditional random fields
information extraction
semantic conference information