skip to main content
article
Free Access

PIC matrices: a computationally tractable class of probabilistic query operators

Published:01 October 1999Publication History
Skip Abstract Section

Abstract

The inference network model of information retrieval allows a probabilistic interpretation of query operators. In particular, Boolean query operators are conveniently modeled as link matrices of the Bayesian Network. Prior work has shown, however, that these operators do not perform as well as the pnorm operators used for modeling query operators in the context of the vector space model. This motivates the search for alternative probabilistic formulations for these operators. The design of such alternatives must contend with the issue of computational tractability, since the evaluation of an arbitrary operator requires exponential time. We define a flexible class of link matrices that are natural candidates for the implementation of query operators and an O(n2) algorithm (n = the number of parent nodes) for the computation of probabilities involving link matrices of this class. We present experimental results indicating that Boolean operators implemented in terms of link matrices from this class perform as well as pnorm operators in the context of the INQUERY inference network.

References

  1. BELKIN,N.J.,COOL, C., CROFT,W.B.,AND CALLAN, J. P. 1993. The effect multiple query representations on information retrieval system performance. In Proceedings of the 16th Annual International ACM Conference on Research and Development in Information Re-trieval (SIGIR '93, Pittsburgh, PA, June 27-July), R. Korfhage, E. Rasmussen, and P. Willett, Eds. ACM Press, New York, NY, 339-346. Google ScholarGoogle Scholar
  2. BELKIN,N.J.,KANTOR, P., FOX,E.A.,AND SHAW, J. A. 1995. Combining the evidence of multiple query representations for information retrieval. Inf. Process. Manage. 31,3 (May-June), 431-448. Google ScholarGoogle Scholar
  3. BOOKSTEIN, A. 1980. Fuzzy requests: An approach to weighted Boolean searches. J. Am. Soc. Inf. Sci. 31, 4 (July), 240-247.Google ScholarGoogle Scholar
  4. BOOKSTEIN, A. 1981. A comparison of two systems of weighted Boolean queries. J. Am. Soc. Inf. Sci. 32, 4 (July), 275-279.Google ScholarGoogle Scholar
  5. BRIER, G. W. 1950. Verification of forecasts expressed in terms of probability. Mon. Weather Rev. 78, 1 (Jan.), 1-3.Google ScholarGoogle Scholar
  6. BUELL,D.A.AND KRAFT, D. H. 1981. Threshold values and Boolean retrieval system. Inf. Process. Manage. 17, 3, 127-136.Google ScholarGoogle Scholar
  7. BALLAN,J.P.,CROFT,W.B.,AND HARDING, S. M. 1992. The INQUERY retrieval system. In Proceedings of the 3rd International Conference on Database and Expert Systems Applications 78-83.Google ScholarGoogle Scholar
  8. CHARNIAK, E. 1991. Bayesian networks without tears. AI Mag. 12, 4 (Apr.), 50-63. Google ScholarGoogle Scholar
  9. COOPER, W. S. 1994. The formalism of probability theory in IR: A foundation or an encumbrance?. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 242-247. Google ScholarGoogle Scholar
  10. DAWID, A. P. 1989. Probability forecasting. In Encyclopedia of Statistical Sciences John Wiley & Sons, Inc., New York, NY, 210-218.Google ScholarGoogle Scholar
  11. FOX, E., BETRABET, S., KOUSHIK, M., AND LEE, W. 1992. Extended Boolean models. In Information Retrieval: Data Structures and Algorithms, W. B. Frakes and R. Baeza-Yates, Eds. Prentice-Hall, Inc., Upper Saddle River, NJ, 393-418. Google ScholarGoogle Scholar
  12. FOX,E.A.AND SHAW, J. A. 1994. Combination of multiple searches. In Proceedings of the 2nd Text Retrieval Conference (TREC-2), D. K. Harman, Ed. 243-248.Google ScholarGoogle Scholar
  13. GREIFF, W. R. 1996. Computationally tractable, conceptually plausible classes of link matrices for the INQUERY inference network. Tech. Rep. TR-96-66. Department of Computer Science, University of Massachusetts, Amherst, MA.Google ScholarGoogle Scholar
  14. JANSEN,B.J.,SPINK, A., BATEMAN, J., AND SARACEVIC, T. 1998. Real life information retrieval: a study of user queries on the Web. SIGIR Forum 32, 1, 5-17. Google ScholarGoogle Scholar
  15. KIM,J.H.AND PEARL, J. 1983. A computational model for combined causal and diagnostic reasoning in inference systems. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (Karlsruhe, Germany) 190-193.Google ScholarGoogle Scholar
  16. LEE, J. H. 1995. Analyzing the effectivness of extended Boolean models in information retrieval. Tech. Rep. TR95-1501. Cornell University, Ithaca, NY. Google ScholarGoogle Scholar
  17. NOREAULT, T., KOLL, M., AND MCGILL, M. J. 1977. Automatic ranked output from Boolean searches in SIRE. J. Am. Soc. Inf. Sci. 28, 6, 333-339.Google ScholarGoogle Scholar
  18. ORTEGA, J. M. 1972. Numerical Analysis: A Second Course. Academic Press, Inc., New York, NY.Google ScholarGoogle Scholar
  19. PEARL, J. 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan-Kaufmann series in representation and reasoning. Morgan Kaufmann Publishers Inc., San Francisco, CA. Google ScholarGoogle Scholar
  20. ROBERTSON,S.E.AND WALKER, S. 1994. Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 232-241. Google ScholarGoogle Scholar
  21. ROBERTSON,S.E.,WALKER, S., AND HANCOCK-BEAULIEU, M. M. 1995. Large test collection experiments on an operational, interactive system: Okapi at TREC. Inf. Process. Manage. 31, 3 (May-June), 345-360. Google ScholarGoogle Scholar
  22. SALTON, G. 1989. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley Series in Computer Science. Addison-Wesley Longman Publ. Co., Inc., Reading, MA. Google ScholarGoogle Scholar
  23. SALTON, G., BUCKLEY, C., AND FOX, E. A. 1983a. Automatic query formulations in information retrieval. J. Am. Soc. Inf. Sci. 34, 4 (July), 262-280.Google ScholarGoogle Scholar
  24. SALTON, G., FOX,E.A.,AND WU, H. 1983b. Extended Boolean information retrieval. Commun. ACM 26, 11 (Nov.), 1022-1036. Google ScholarGoogle Scholar
  25. SHAW,J.A.AND FOX, E. A. 1995. Combination of multiple searches. In Proceedings of the 3rd Text Retrieval Conference (TREC-3) 105-108.Google ScholarGoogle Scholar
  26. TURTLE, H. 1994. Natural language vs. Boolean query evaluation: a comparison of retrieval performance. In Proceedings of the 17th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR '94, Dublin, Ireland, July 3-6), W. B. Croft and C. J. van Rijsbergen, Eds. Springer-Verlag, New York, NY, 212-220. Google ScholarGoogle Scholar
  27. TURTLE,H.AND CROFT, W. B. 1990. Inference networks for document retrieval. In Proceedings of the 13th International Conference on Research and Development in Informa-tion Retrieval (SIGIR '90, Brussels, Belgium, Sept. 5-7), J.-L. Vidick, Ed. ACM Press, New York, NY, 1-24. Google ScholarGoogle Scholar
  28. TURTLE, H. R. 1991. Inference networks for document retrieval. Ph.D. Dissertation. Department of Computer Science, University of Massachusetts, Amherst, MA. Google ScholarGoogle Scholar
  29. WALLER,W.G.AND KRAFT, D. H. 1979. A mathematical model for a weighted Boolean retrieval system. Inf. Process. Manage. 15, 5, 235-245.Google ScholarGoogle Scholar

Index Terms

  1. PIC matrices: a computationally tractable class of probabilistic query operators

    Recommendations

    Reviews

    Donald Harris Kraft

    This paper offers a most interesting view of researchers seeking to improve the probabilistic model of information retrieval in order to achieve improved performance over the leading model (the vector space model with p -norms for generalized Boolean queries). This improved model incorporates previous work on the employment of Bayesian inference nets to adapt to the Boolean query operators “and” and “or.” The paper offers a fine introduction to Bayesian inference nets for probabilistic retrieval modeling. It suggests the use of a link matrix as a data structure to facilitate the calculation of the probabilities. The authors then consider a user interested in queries that seek documents that contain a given number of the requested terms, that is, they satisfy the parent indifference criterion (PIC). The paper investigates some efficiencies for calculating the PIC matrix elements based on the use of piecewise linear functions. Moreover, the authors consider an extension of this model to incorporate term weights, so that not all terms are of equal importance in a given query. Finally, the authors test their model against some standard data, including the INSPEC and TIPSTER collections of text documents. Their model uses some additional computing time but produces improved performance that is seemingly worth the cost. The results indicate that this model can compete with the leading performer, the p -norm vector space model. The authors also indicate some future directions for research, including the incorporation of belief functions.

    Access critical reviews of Computing literature here

    Become a reviewer for Computing Reviews.

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader