DOI: 10.1145/1458082.1458254 (CIKM Conference Proceedings)
research-article

Academic conference homepage understanding using constrained hierarchical conditional random fields

Published: 26 October 2008

ABSTRACT

We address the problem of academic conference homepage understanding for the Semantic Web. The problem consists of three labeling tasks: labeling conference function pages, function blocks, and attributes. Unlike traditional information extraction tasks, the data in academic conference homepages has complex structural dependencies that span multiple Web pages, and it is also subject to logical constraints. In this paper, we propose a unified approach, Constrained Hierarchical Conditional Random Fields, that accomplishes the three labeling tasks simultaneously. The model describes the complex structural dependencies directly, and a constrained Viterbi algorithm in the inference process avoids logical errors. Experimental results on real-world conference data show that this approach outperforms cascaded labeling methods by 3.6% in F1-measure and that the constrained inference process improves accuracy by 14.3%. Based on the proposed approach, we develop a prototype system for a use-oriented semantic academic conference calendar. The user simply specifies the conferences they are interested in; the system then finds, extracts, and updates the semantic information from the Web and builds a calendar automatically for the user. The semantic conference data can be used in other applications, such as finding sponsors and finding experts, and the proposed approach can be applied to other information extraction tasks as well.
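To illustrate the idea of constrained inference mentioned in the abstract, here is a minimal sketch of Viterbi decoding on a plain linear chain in which labels and transitions that violate a constraint are excluded from the search. It is not the paper's hierarchical model or its actual algorithm. The function name `constrained_viterbi`, the `forced` and `forbidden` constraint formats, and the toy scores are all assumptions made for illustration.

```python
NEG_INF = float("-inf")


def constrained_viterbi(emit, trans, forced=None, forbidden=None):
    """Best label path under simple hard constraints (illustrative only).

    emit[t][k]  : log-score of label k at position t
    trans[j][k] : log-score of moving from label j to label k
    forced      : {position: label} the path must pass through (assumed format)
    forbidden   : set of (prev_label, label) transitions that are never allowed
    """
    forced = forced or {}
    forbidden = forbidden or set()
    T, K = len(emit), len(emit[0])

    def node(t, k):
        # A label that contradicts a forced position scores -inf,
        # so no surviving path can pass through it.
        if t in forced and forced[t] != k:
            return NEG_INF
        return emit[t][k]

    score = [[NEG_INF] * K for _ in range(T)]
    back = [[0] * K for _ in range(T)]
    for k in range(K):
        score[0][k] = node(0, k)

    for t in range(1, T):
        for k in range(K):
            e = node(t, k)
            if e == NEG_INF:
                continue
            for j in range(K):
                if (j, k) in forbidden:
                    continue  # skip transitions that break a constraint
                s = score[t - 1][j] + trans[j][k] + e
                if s > score[t][k]:
                    score[t][k] = s
                    back[t][k] = j

    last = max(range(K), key=lambda k: score[T - 1][k])
    if score[T - 1][last] == NEG_INF:
        raise ValueError("no label sequence satisfies the constraints")

    # Follow back-pointers from the best final label to recover the path.
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Toy scores: 3 positions, 2 labels, uniform transition scores.
emit = [[0.0, -1.0], [0.0, -1.0], [-1.0, 0.0]]
trans = [[0.0, 0.0], [0.0, 0.0]]
unconstrained = constrained_viterbi(emit, trans)
no_zero_to_one = constrained_viterbi(emit, trans, forbidden={(0, 1)})
first_is_one = constrained_viterbi(emit, trans, forced={0: 1})
```

Because the constraints are applied during the search rather than as a post-processing filter, the decoder returns the highest-scoring sequence among those that are logically valid instead of repairing an invalid one afterwards.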
