Abstract
The inference network model of information retrieval allows a probabilistic interpretation of query operators. In particular, Boolean query operators are conveniently modeled as link matrices of the Bayesian Network. Prior work has shown, however, that these operators do not perform as well as the pnorm operators used for modeling query operators in the context of the vector space model. This motivates the search for alternative probabilistic formulations for these operators. The design of such alternatives must contend with the issue of computational tractability, since the evaluation of an arbitrary operator requires exponential time. We define a flexible class of link matrices that are natural candidates for the implementation of query operators and an O(n2) algorithm (n = the number of parent nodes) for the computation of probabilities involving link matrices of this class. We present experimental results indicating that Boolean operators implemented in terms of link matrices from this class perform as well as pnorm operators in the context of the INQUERY inference network.
- BELKIN,N.J.,COOL, C., CROFT,W.B.,AND CALLAN, J. P. 1993. The effect multiple query representations on information retrieval system performance. In Proceedings of the 16th Annual International ACM Conference on Research and Development in Information Re-trieval (SIGIR '93, Pittsburgh, PA, June 27-July), R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM Press, New York, NY, 339-346. Google Scholar
- BELKIN,N.J.,KANTOR, P., FOX,E.A.,AND SHAW, J. A. 1995. Combining the evidence of multiple query representations for information retrieval. Inf. Process. Manage. 31,3 (May-June), 431-448. Google Scholar
- BOOKSTEIN, A. 1980. Fuzzy requests: An approach to weighted Boolean searches. J. Am. Soc. Inf. Sci. 31, 4 (July), 240-247.Google Scholar
- BOOKSTEIN, A. 1981. A comparison of two systems of weighted Boolean queries. J. Am. Soc. Inf. Sci. 32, 4 (July), 275-279.Google Scholar
- BRIER, G. W. 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1 (Jan.), 1-3.Google Scholar
- BUELL,D.A.AND KRAFT, D. H. 1981. Threshold values and Boolean retrieval system. Inf. Process. Manage. 17, 3, 127-136.Google Scholar
- BALLAN,J.P.,CROFT,W.B.,AND HARDING, S. M. 1992. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications 78-83.Google Scholar
- CHARNIAK, E. 1991. Bayesian networks without tears. AI Mag. 12, 4 (Apr.), 50-63. Google Scholar
- COOPER, W. S. 1994. The formalism of probability theory in IR: A foundation or an encumbrance?. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 242-247. Google Scholar
- DAWID, A. P. 1989. Probability forecasting. In Encyclopedia of Statistical Sciences John Wiley & Sons, Inc., New York, NY, 210-218.Google Scholar
- FOX, E., BETRABET, S., KOUSHIK, M., AND LEE, W. 1992. Extended Boolean models. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 393-418. Google Scholar
- FOX,E.A.AND SHAW, J. A. 1994. Combination of multiple searches. In Proceedings of the 2nd Text Retrieval Conference (TREC-2), D. K. Harman, Ed. 243-248.Google Scholar
- GREIFF, W. R. 1996. Computationally tractable, conceptually plausible classes of link matrices for the INQUERY inference network. Tech. Rep. TR-96-66. Department of Computer Science, University of Massachusetts, Amherst, MA.Google Scholar
- JANSEN,B.J.,SPINK, A., BATEMAN, J., AND SARACEVIC, T. 1998. Real life information retrieval: a study of user queries on the Web. SIGIR Forum 32, 1, 5-17. Google Scholar
- KIM,J.H.AND PEARL, J. 1983. A computational model for combined causal and diagnostic reasoning in inference systems. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (Karlsruhe, Germany) 190-193.Google Scholar
- LEE, J. H. 1995. Analyzing the effectivness of extended Boolean models in information retrieval. Tech. Rep. TR95-1501. Cornell University, Ithaca, NY. Google Scholar
- NOREAULT, T., KOLL, M., AND MCGILL, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Am. Soc. Inf. Sci. 28, 6, 333-339.Google Scholar
- ORTEGA, J. M. 1972. Numerical Analysis: A Second Course. Academic Press, Inc., New York, NY.Google Scholar
- PEARL, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan-Kaufmann series in representation and reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google Scholar
- ROBERTSON,S.E.AND WALKER, S. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 232-241. Google Scholar
- ROBERTSON,S.E.,WALKER, S., AND HANCOCK-BEAULIEU, M. M. 1995. Large test collection experiments on an operational, interactive system: Okapi at TREC. Inf. Process. Manage. 31, 3 (May-June), 345-360. Google Scholar
- SALTON, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA. Google Scholar
- SALTON, G., BUCKLEY, C., AND FOX, E. A. 1983a. Automatic query formulations in information retrieval. J. Am. Soc. Inf. Sci. 34, 4 (July), 262-280.Google Scholar
- SALTON, G., FOX,E.A.,AND WU, H. 1983b. Extended Boolean information retrieval. Commun. ACM 26, 11 (Nov.), 1022-1036. Google Scholar
- SHAW,J.A.AND FOX, E. A. 1995. Combination of multiple searches. In Proceedings of the 3rd Text Retrieval Conference (TREC-3) 105-108.Google Scholar
- TURTLE, H. 1994. Natural language vs. Boolean query evaluation: a comparison of retrieval performance. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 212-220. Google Scholar
- TURTLE,H.AND CROFT, W. B. 1990. Inference networks for document retrieval. In Proceedings of the 13th International Conference on Research and Development in Informa-tion Retrieval (SIGIR '90, Brussels, Belgium, Sept. 5-7), J.-L. Vidick, Ed. ACM Press, New York, NY, 1-24. Google Scholar
- TURTLE, H. R. 1991. Inference networks for document retrieval. Ph.D. Dissertation. Department of Computer Science, University of Massachusetts, Amherst, MA. Google Scholar
- WALLER,W.G.AND KRAFT, D. H. 1979. A mathematical model for a weighted Boolean retrieval system. Inf. Process. Manage. 15, 5, 235-245.Google Scholar
Index Terms
- PIC matrices: a computationally tractable class of probabilistic query operators
Recommendations
Supporting top-k join queries in relational databases
Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging ...
A simple graphical approach for understanding probabilistic inference in Bayesian networks
We present a simple graphical method for understanding exact probabilistic inference in discrete Bayesian networks (BNs). A conditional probability table (conditional) is depicted as a directed acyclic graph involving one or more black vertices and zero ...
Structural learning of mixed noisy-OR Bayesian networks
AbstractIn this paper we discuss learning Bayesian networks whose conditional probability tables are either Noisy-OR models or general conditional probability tables. We refer to these models as Mixed Noisy-OR Bayesian Networks. To learn their ...
Comments