skip to main content
10.1145/1804669.1804679acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Aggregate queries for discrete and continuous probabilistic XML

Published:23 March 2010Publication History

ABSTRACT

Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured setting. The latter is particularly well adapted to the management of uncertain data coming from a variety of automatic processes. An important problem, in the context of probabilistic XML databases, is that of answering aggregate queries (count, sum, avg, etc.), which has received limited attention so far. In a model unifying the various (discrete) semi-structured probabilistic models studied up to now, we present algorithms to compute the distribution of the aggregation values (exploiting some regularity properties of the aggregate functions) and probabilistic moments (especially, expectation and variance) of this distribution. We also prove the intractability of some of these problems and investigate approximation techniques. We finally extend the discrete model to a continuous one, in order to take into account continuous data values, such as measurements from sensor networks, and present algorithms to compute distribution functions and moments for various classes of continuous distributions of data values.

References

  1. S. Abiteboul, T.-H. H. Chan, E. Kharlamov, W. Nutt, and P. Senellart. Agrégation de documents XML probabilistes. In Proc. BDA, Namur, Belgium, Oct. 2009. Conference without formal proceedings.Google ScholarGoogle Scholar
  2. S. Abiteboul, B. Kimelfeld, Y. Sagiv, and P. Senellart. On the expressiveness of probabilistic XML models. VLDB Journal, 18(5):1041--1064, Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In Proc. EDBT, Munich, Germany, Mar. 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. F. N. Afrati and P. G. Kolaitis. Answering aggregate queries in data exchange. In Proc. PODS, Vancouver, BC, Canada, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. R. B. Ash and C. A. Doléans-Dade. Probability & Measure Theory. Academic Press, San Diego, CA, USA, 2000.Google ScholarGoogle Scholar
  6. D. Calvanese, E. Kharlamov, W. Nutt, and C. Thorne. Aggregate queries over ontologies. In Proc. ONISW, Napa, CA, USA, Oct. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proc. SIGMOD, San Diego, CA, USA, June 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. S. Cohen, B. Kimelfeld, and Y. Sagiv. Incorporating constraints in probabilistic XML. In Proc. PODS, Vancouver, BC, Canada, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. S. Cohen, B. Kimelfeld, and Y. Sagiv. Running tree automata on probabilistic XML. In Proc. PODS, Providence, RI, USA, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. S. Cohen, Y. Sagiv, and W. Nutt. Rewriting queries with arbitrary aggregation functions using views. TODS, 31(2):672--715, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. N. Dalvi, C. Ré, and D. Suciu. Probabilistic databases: Diamonds in the dirt. Communications of the ACM, 52(7), 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In Proc. VLDB, Toronto, ON, Canada, Aug. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. E. Grädel, Y. Gurevich, and C. Hirsch. The complexity of query reliability. In Proc. PODS, Seattle, WA, USA, June 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):16--30, 1963.Google ScholarGoogle ScholarCross RefCross Ref
  15. E. Hung, L. Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In Proc. ICDE, Bangalore, India, Mar. 2003.Google ScholarGoogle ScholarCross RefCross Ref
  16. E. Hung, L. Getoor, and V. S. Subrahmanian. Probabilistic interval XML. TOCL, 8(4), 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. T. Imieliński and W. Lipski. Incomplete information in relational databases. Journal of the ACM, 31(4):761--791, 1984.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. T. S. Jayram, S. Kale, and E. Vee. Efficient aggregation algorithms for probabilistic data. In Proc. SODA, New Orleans, LA, USA, Jan. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. B. Kimelfeld, Y. Kosharovsky, and Y. Sagiv. Query efficiency in probabilistic XML models. In Proc. SIGMOD, Vancouver, BC, Canada, June 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. B. Kimelfeld, Y. Kosharovsky, and Y. Sagiv. Query evaluation over probabilistic XML. VLDB Journal, 18(5):1117--1140, Oct. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. B. Kimelfeld and Y. Sagiv. Matching twigs in probabilistic XML. In Proc. VLDB, Vienna, Austria, Sept. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Lechtenbörger, H. Shu, and G. Vossen. Aggregate queries over conditional tables. Journal of Intelligent Information Systems, 19(3):343--362, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In Proc. VLDB, Hong Kong, China, Aug. 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. H. Papadimitriou. Computational Complexity. Addison Wesley, Reading, USA, 1994.Google ScholarGoogle Scholar
  25. J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal of Computing, 12(4):777--788, 1983.Google ScholarGoogle ScholarCross RefCross Ref
  26. C. Ré and D. Suciu. Efficient evaluation of HAVING queries on a probabilistic database. In Proc. DBPL, Vienna, Austria, Sept. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. P. Senellart and S. Abiteboul. On the complexity of managing probabilistic XML data. In Proc. PODS, Beijing, China, June 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In Proc. ICDE, Tokyo, Japan, Apr. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Aggregate queries for discrete and continuous probabilistic XML

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Other conferences
          ICDT '10: Proceedings of the 13th International Conference on Database Theory
          March 2010
          260 pages
          ISBN:9781605589473
          DOI:10.1145/1804669

          Copyright © 2010 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 23 March 2010

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader