article

Free Access

PIC matrices: a computationally tractable class of probabilistic query operators

Authors:
Warren R. Greiff

Univ. of Massachusetts, Amherst

Univ. of Massachusetts, Amherst
View Profile

,
W. Bruce Croft

Univ. of Massachusetts, Amherst

Univ. of Massachusetts, Amherst
View Profile

,
Howard Turtle

West Publishing Co., Eagan, MN

West Publishing Co., Eagan, MN
View Profile

Authors Info & Claims

ACM Transactions on Information Systems Volume 17 Issue 4pp 367–405https://doi.org/10.1145/326440.326444

Published:01 October 1999Publication History

ACM Transactions on Information Systems

Abstract

The inference network model of information retrieval allows a probabilistic interpretation of query operators. In particular, Boolean query operators are conveniently modeled as link matrices of the Bayesian Network. Prior work has shown, however, that these operators do not perform as well as the pnorm operators used for modeling query operators in the context of the vector space model. This motivates the search for alternative probabilistic formulations for these operators. The design of such alternatives must contend with the issue of computational tractability, since the evaluation of an arbitrary operator requires exponential time. We define a flexible class of link matrices that are natural candidates for the implementation of query operators and an O(n²) algorithm (n = the number of parent nodes) for the computation of probabilities involving link matrices of this class. We present experimental results indicating that Boolean operators implemented in terms of link matrices from this class perform as well as pnorm operators in the context of the INQUERY inference network.

References

BELKIN,N.J.,COOL, C., CROFT,W.B.,AND CALLAN, J. P. 1993. The effect multiple query representations on information retrieval system performance. In Proceedings of the 16th Annual International ACM Conference on Research and Development in Information Re-trieval (SIGIR '93, Pittsburgh, PA, June 27-July), R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM Press, New York, NY, 339-346. Google Scholar
BELKIN,N.J.,KANTOR, P., FOX,E.A.,AND SHAW, J. A. 1995. Combining the evidence of multiple query representations for information retrieval. Inf. Process. Manage. 31,3 (May-June), 431-448. Google Scholar
BOOKSTEIN, A. 1980. Fuzzy requests: An approach to weighted Boolean searches. J. Am. Soc. Inf. Sci. 31, 4 (July), 240-247.Google Scholar
BOOKSTEIN, A. 1981. A comparison of two systems of weighted Boolean queries. J. Am. Soc. Inf. Sci. 32, 4 (July), 275-279.Google Scholar
BRIER, G. W. 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1 (Jan.), 1-3.Google Scholar
BUELL,D.A.AND KRAFT, D. H. 1981. Threshold values and Boolean retrieval system. Inf. Process. Manage. 17, 3, 127-136.Google Scholar
BALLAN,J.P.,CROFT,W.B.,AND HARDING, S. M. 1992. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications 78-83.Google Scholar
CHARNIAK, E. 1991. Bayesian networks without tears. AI Mag. 12, 4 (Apr.), 50-63. Google Scholar
COOPER, W. S. 1994. The formalism of probability theory in IR: A foundation or an encumbrance?. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 242-247. Google Scholar
DAWID, A. P. 1989. Probability forecasting. In Encyclopedia of Statistical Sciences John Wiley & Sons, Inc., New York, NY, 210-218.Google Scholar
FOX, E., BETRABET, S., KOUSHIK, M., AND LEE, W. 1992. Extended Boolean models. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 393-418. Google Scholar
FOX,E.A.AND SHAW, J. A. 1994. Combination of multiple searches. In Proceedings of the 2nd Text Retrieval Conference (TREC-2), D. K. Harman, Ed. 243-248.Google Scholar
GREIFF, W. R. 1996. Computationally tractable, conceptually plausible classes of link matrices for the INQUERY inference network. Tech. Rep. TR-96-66. Department of Computer Science, University of Massachusetts, Amherst, MA.Google Scholar
JANSEN,B.J.,SPINK, A., BATEMAN, J., AND SARACEVIC, T. 1998. Real life information retrieval: a study of user queries on the Web. SIGIR Forum 32, 1, 5-17. Google Scholar
KIM,J.H.AND PEARL, J. 1983. A computational model for combined causal and diagnostic reasoning in inference systems. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (Karlsruhe, Germany) 190-193.Google Scholar
LEE, J. H. 1995. Analyzing the effectivness of extended Boolean models in information retrieval. Tech. Rep. TR95-1501. Cornell University, Ithaca, NY. Google Scholar
NOREAULT, T., KOLL, M., AND MCGILL, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Am. Soc. Inf. Sci. 28, 6, 333-339.Google Scholar
ORTEGA, J. M. 1972. Numerical Analysis: A Second Course. Academic Press, Inc., New York, NY.Google Scholar
PEARL, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan-Kaufmann series in representation and reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google Scholar
ROBERTSON,S.E.AND WALKER, S. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 232-241. Google Scholar
ROBERTSON,S.E.,WALKER, S., AND HANCOCK-BEAULIEU, M. M. 1995. Large test collection experiments on an operational, interactive system: Okapi at TREC. Inf. Process. Manage. 31, 3 (May-June), 345-360. Google Scholar
SALTON, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA. Google Scholar
SALTON, G., BUCKLEY, C., AND FOX, E. A. 1983a. Automatic query formulations in information retrieval. J. Am. Soc. Inf. Sci. 34, 4 (July), 262-280.Google Scholar
SALTON, G., FOX,E.A.,AND WU, H. 1983b. Extended Boolean information retrieval. Commun. ACM 26, 11 (Nov.), 1022-1036. Google Scholar
SHAW,J.A.AND FOX, E. A. 1995. Combination of multiple searches. In Proceedings of the 3rd Text Retrieval Conference (TREC-3) 105-108.Google Scholar
TURTLE, H. 1994. Natural language vs. Boolean query evaluation: a comparison of retrieval performance. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 212-220. Google Scholar
TURTLE,H.AND CROFT, W. B. 1990. Inference networks for document retrieval. In Proceedings of the 13th International Conference on Research and Development in Informa-tion Retrieval (SIGIR '90, Brussels, Belgium, Sept. 5-7), J.-L. Vidick, Ed. ACM Press, New York, NY, 1-24. Google Scholar
TURTLE, H. R. 1991. Inference networks for document retrieval. Ph.D. Dissertation. Department of Computer Science, University of Massachusetts, Amherst, MA. Google Scholar
WALLER,W.G.AND KRAFT, D. H. 1979. A mathematical model for a weighted Boolean retrieval system. Inf. Process. Manage. 15, 5, 235-245.Google Scholar

Index Terms

PIC matrices: a computationally tractable class of probabilistic query operators
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Recommendations

Supporting top-k join queries in relational databases

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging ...
Read More
A simple graphical approach for understanding probabilistic inference in Bayesian networks

We present a simple graphical method for understanding exact probabilistic inference in discrete Bayesian networks (BNs). A conditional probability table (conditional) is depicted as a directed acyclic graph involving one or more black vertices and zero ...
Read More
Structural learning of mixed noisy-OR Bayesian networks
Abstract
In this paper we discuss learning Bayesian networks whose conditional probability tables are either Noisy-OR models or general conditional probability tables. We refer to these models as Mixed Noisy-OR Bayesian Networks. To learn their ...
Read More

Reviews

Reviewer: Donald Harris Kraft

This paper offers a most interesting view of researchers seeking to improve the probabilistic model of information retrieval in order to achieve improved performance over the leading model (the vector space model with p -norms for generalized Boolean queries). This improved model incorporates previous work on the employment of Bayesian inference nets to adapt to the Boolean query operators “and” and “or.” The paper offers a fine introduction to Bayesian inference nets for probabilistic retrieval modeling. It suggests the use of a link matrix as a data structure to facilitate the calculation of the probabilities. The authors then consider a user interested in queries that seek documents that contain a given number of the requested terms, that is, they satisfy the parent indifference criterion (PIC). The paper investigates some efficiencies for calculating the PIC matrix elements based on the use of piecewise linear functions. Moreover, the authors consider an extension of this model to incorporate term weights, so that not all terms are of equal importance in a given query. Finally, the authors test their model against some standard data, including the INSPEC and TIPSTER collections of text documents. Their model uses some additional computing time but produces improved performance that is seemingly worth the cost. The results indicate that this model can compete with the leading performer, the p -norm vector space model. The authors also indicate some future directions for research, including the incorporation of belief functions.

Access critical reviews of Computing literature here

Become a reviewer for Computing Reviews.

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

ACM Transactions on Information Systems Volume 17, Issue 4
Oct. 1999
123 pages
ISSN:1046-8188
EISSN:1558-2868
DOI:10.1145/326440
Issue’s Table of Contents

Copyright © 1999 ACM
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 October 1999
Published in tois Volume 17, Issue 4

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
Bayesian networks
Boolean queries
computational complexity
inference networks
link matrices
piecewise linear functions
pnorm
probabilistic information retrieval
query operators
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 5
  Total Citations
  View Citations
- 486
  Total Downloads
- Downloads (Last 12 months)22
- Downloads (Last 6 weeks)8
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

PIC matrices: a computationally tractable class of probabilistic query operators

ACM Transactions on Information Systems

Abstract

References

Cited By

Index Terms

Recommendations

Supporting top-k join queries in relational databases

A simple graphical approach for understanding probabilistic inference in Bayesian networks

Structural learning of mixed noisy-OR Bayesian networks

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

PIC matrices: a computationally tractable class of probabilistic query operators

ACM Transactions on Information Systems

Abstract

References

Cited By

Index Terms

Recommendations

Supporting top-k join queries in relational databases

A simple graphical approach for understanding probabilistic inference in Bayesian networks

Structural learning of mixed noisy-OR Bayesian networks

Reviews

Access critical reviews of Computing literature here

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media