Article

Generic soft pattern models for definitional question answering

Authors:

Tat-Seng Chua

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 384 - 391

https://doi.org/10.1145/1076034.1076101

Published: 15 August 2005

Abstract

This paper explores probabilistic lexico-syntactic pattern matching, also known as soft pattern matching. While previous methods in soft pattern matching are ad hoc in computing the degree of match, we propose two formal matching models: one based on bigrams and the other on the Profile Hidden Markov Model (PHMM). Both models provide a theoretically sound method to model pattern matching as a probabilistic process that generates token sequences. We demonstrate the effectiveness of these models on definition sentence retrieval for definitional question answering. We show that both models significantly outperform state-of-the-art manually constructed patterns. A critical difference between the two models is that the PHMM technique handles language variations more effectively but requires more training data to converge. We believe that both models can be extended to other areas where lexico-syntactic pattern matching can be applied.
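
To make the idea of bigram-based soft matching concrete, the following is a minimal sketch and not the paper's actual formulation, which the abstract does not spell out. It assumes that a pattern instance is a short sequence of lexical or syntactic tokens around the search target, that bigram probabilities are estimated by maximum likelihood, and that they are linearly interpolated with add-alpha smoothed unigram probabilities. The class name `BigramSoftPattern`, the parameters `lam` and `alpha`, the `<TARGET>` placeholder, and the toy training sequences are all hypothetical.

```python
import math
from collections import Counter


class BigramSoftPattern:
    """Illustrative bigram model that scores token sequences probabilistically.

    Assumption: interpolation weight `lam` and smoothing constant `alpha`
    are free parameters chosen here for illustration only.
    """

    def __init__(self, lam=0.7, alpha=1.0):
        self.lam = lam          # weight on the bigram estimate
        self.alpha = alpha      # add-alpha smoothing for unigrams
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.vocab = set()
        self.total = 0

    def fit(self, sequences):
        """Count unigrams and bigrams over padded training sequences."""
        for seq in sequences:
            tokens = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(tokens)
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.total = sum(self.unigrams.values())
        return self

    def _p_unigram(self, tok):
        # One extra vocabulary slot reserves mass for unseen tokens.
        size = len(self.vocab) + 1
        return (self.unigrams[tok] + self.alpha) / (self.total + self.alpha * size)

    def _p_bigram(self, prev, tok):
        if self.unigrams[prev] == 0:
            return 0.0
        return self.bigrams[(prev, tok)] / self.unigrams[prev]

    def log_score(self, seq):
        """Average log-probability per transition; higher means a closer match."""
        tokens = ["<s>"] + list(seq) + ["</s>"]
        log_prob = 0.0
        for prev, tok in zip(tokens, tokens[1:]):
            p = (self.lam * self._p_bigram(prev, tok)
                 + (1.0 - self.lam) * self._p_unigram(tok))
            log_prob += math.log(p)
        return log_prob / (len(tokens) - 1)


# Toy pattern instances (hypothetical): tokens following a definition target.
training = [
    ["<TARGET>", "is", "a", "NN"],
    ["<TARGET>", ",", "a", "NN"],
    ["<TARGET>", "is", "the", "NN"],
]
model = BigramSoftPattern().fit(training)

good = model.log_score(["<TARGET>", "is", "a", "NN"])     # seen ordering
bad = model.log_score(["NN", "a", "is", "<TARGET>"])      # scrambled ordering
unseen = model.log_score(["<TARGET>", "was", "a", "NN"])  # one unseen token
print(good > bad, good > unseen)
```

Because every transition receives a nonzero probability, a candidate sentence that deviates from the training instances is penalized rather than rejected outright, which is the practical difference between soft matching and regular-expression hard matching.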

Index Terms

Generic soft pattern models for definitional question answering
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
2. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

SIGIR '05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval

August 2005

708 pages

ISBN:1595930345

DOI:10.1145/1076034

General Chairs: Ricardo Baeza-Yates (University of Chile, Chile), Nivio Ziviani (Federal University of Minas Gerais, Brazil)

Program Chairs: Gary Marchionini (University of North Carolina, USA), Alistair Moffat (University of Melbourne, Australia), John Tait (University of Sunderland, UK)

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Article

Conference

SIGIR05

Sponsor:

SIGIR

SIGIR05: The 28th ACM/SIGIR International Symposium on Information Retrieval 2005

August 15 - 19, 2005

Salvador, Brazil

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
