research-article

Aggregate queries for discrete and continuous probabilistic XML

Authors:
Serge Abiteboul

INRIA Saclay, Orsay Cedex, France

INRIA Saclay, Orsay Cedex, France
View Profile

,
T.-H. Hubert Chan

The University of Hong Kong, Hong Kong

The University of Hong Kong, Hong Kong
View Profile

,
Evgeny Kharlamov

Free University of Bozen-Bolzano, Bozen, Italy

Free University of Bozen-Bolzano, Bozen, Italy
View Profile

,
Werner Nutt

Free University of Bozen-Bolzano, Bozen, Italy

Free University of Bozen-Bolzano, Bozen, Italy
View Profile

,
Pierre Senellart

Institut Télécom; Télécom ParisTech, Paris, France

Institut Télécom; Télécom ParisTech, Paris, France
View Profile

ICDT '10: Proceedings of the 13th International Conference on Database TheoryMarch 2010Pages 50–61https://doi.org/10.1145/1804669.1804679

Published:23 March 2010Publication History

ICDT '10: Proceedings of the 13th International Conference on Database Theory

Pages 50–61

ABSTRACT

Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured setting. The latter is particularly well adapted to the management of uncertain data coming from a variety of automatic processes. An important problem, in the context of probabilistic XML databases, is that of answering aggregate queries (count, sum, avg, etc.), which has received limited attention so far. In a model unifying the various (discrete) semi-structured probabilistic models studied up to now, we present algorithms to compute the distribution of the aggregation values (exploiting some regularity properties of the aggregate functions) and probabilistic moments (especially, expectation and variance) of this distribution. We also prove the intractability of some of these problems and investigate approximation techniques. We finally extend the discrete model to a continuous one, in order to take into account continuous data values, such as measurements from sensor networks, and present algorithms to compute distribution functions and moments for various classes of continuous distributions of data values.

References

S. Abiteboul, T.-H. H. Chan, E. Kharlamov, W. Nutt, and P. Senellart. Agrégation de documents XML probabilistes. In Proc. BDA, Namur, Belgium, Oct. 2009. Conference without formal proceedings.Google Scholar
S. Abiteboul, B. Kimelfeld, Y. Sagiv, and P. Senellart. On the expressiveness of probabilistic XML models. VLDB Journal, 18(5):1041--1064, Oct. 2009. Google ScholarDigital Library
S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In Proc. EDBT, Munich, Germany, Mar. 2006. Google ScholarDigital Library
F. N. Afrati and P. G. Kolaitis. Answering aggregate queries in data exchange. In Proc. PODS, Vancouver, BC, Canada, June 2008. Google ScholarDigital Library
R. B. Ash and C. A. Doléans-Dade. Probability & Measure Theory. Academic Press, San Diego, CA, USA, 2000.Google Scholar
D. Calvanese, E. Kharlamov, W. Nutt, and C. Thorne. Aggregate queries over ontologies. In Proc. ONISW, Napa, CA, USA, Oct. 2008. Google ScholarDigital Library
R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In Proc. SIGMOD, San Diego, CA, USA, June 2003. Google ScholarDigital Library
S. Cohen, B. Kimelfeld, and Y. Sagiv. Incorporating constraints in probabilistic XML. In Proc. PODS, Vancouver, BC, Canada, June 2008. Google ScholarDigital Library
S. Cohen, B. Kimelfeld, and Y. Sagiv. Running tree automata on probabilistic XML. In Proc. PODS, Providence, RI, USA, June 2009. Google ScholarDigital Library
S. Cohen, Y. Sagiv, and W. Nutt. Rewriting queries with arbitrary aggregation functions using views. TODS, 31(2):672--715, 2006. Google ScholarDigital Library
N. Dalvi, C. Ré, and D. Suciu. Probabilistic databases: Diamonds in the dirt. Communications of the ACM, 52(7), 2009. Google ScholarDigital Library
A. Deshpande, C. Guestrin, S. Madden, J. M. Hellerstein, and W. Hong. Model-driven data acquisition in sensor networks. In Proc. VLDB, Toronto, ON, Canada, Aug. 2004. Google ScholarDigital Library
E. Grädel, Y. Gurevich, and C. Hirsch. The complexity of query reliability. In Proc. PODS, Seattle, WA, USA, June 1998. Google ScholarDigital Library
W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):16--30, 1963.Google ScholarCross Ref
E. Hung, L. Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In Proc. ICDE, Bangalore, India, Mar. 2003.Google ScholarCross Ref
E. Hung, L. Getoor, and V. S. Subrahmanian. Probabilistic interval XML. TOCL, 8(4), 2007. Google ScholarDigital Library
T. Imieliński and W. Lipski. Incomplete information in relational databases. Journal of the ACM, 31(4):761--791, 1984.Google ScholarDigital Library
T. S. Jayram, S. Kale, and E. Vee. Efficient aggregation algorithms for probabilistic data. In Proc. SODA, New Orleans, LA, USA, Jan. 2007. Google ScholarDigital Library
B. Kimelfeld, Y. Kosharovsky, and Y. Sagiv. Query efficiency in probabilistic XML models. In Proc. SIGMOD, Vancouver, BC, Canada, June 2008. Google ScholarDigital Library
B. Kimelfeld, Y. Kosharovsky, and Y. Sagiv. Query evaluation over probabilistic XML. VLDB Journal, 18(5):1117--1140, Oct. 2009. Google ScholarDigital Library
B. Kimelfeld and Y. Sagiv. Matching twigs in probabilistic XML. In Proc. VLDB, Vienna, Austria, Sept. 2007. Google ScholarDigital Library
J. Lechtenbörger, H. Shu, and G. Vossen. Aggregate queries over conditional tables. Journal of Intelligent Information Systems, 19(3):343--362, 2002. Google ScholarDigital Library
A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In Proc. VLDB, Hong Kong, China, Aug. 2002. Google ScholarDigital Library
C. H. Papadimitriou. Computational Complexity. Addison Wesley, Reading, USA, 1994.Google Scholar
J. S. Provan and M. O. Ball. The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal of Computing, 12(4):777--788, 1983.Google ScholarCross Ref
C. Ré and D. Suciu. Efficient evaluation of HAVING queries on a probabilistic database. In Proc. DBPL, Vienna, Austria, Sept. 2007. Google ScholarDigital Library
P. Senellart and S. Abiteboul. On the complexity of managing probabilistic XML data. In Proc. PODS, Beijing, China, June 2007. Google ScholarDigital Library
M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In Proc. ICDE, Tokyo, Japan, Apr. 2005. Google ScholarDigital Library

Index Terms

Aggregate queries for discrete and continuous probabilistic XML
1. Information systems
  1. Data management systems
    1. Database design and models
    2. Query languages
2. Theory of computation
  1. Design and analysis of algorithms

Recommendations

Capturing continuous data and answering aggregate queries in probabilistic XML

Sources of data uncertainty and imprecision are numerous. A way to handle this uncertainty is to associate probabilistic annotations to data. Many such probabilistic database models have been proposed, both in the relational and in the semi-structured ...
Read More
Updating probabilistic XML
EDBT '10: Proceedings of the 2010 EDBT/ICDT Workshops

We investigate the complexity of performing updates on probabilistic XML data for various classes of probabilistic XML documents of different succinctness. We consider two elementary kinds of updates, insertions and deletions, that are defined with the ...
Read More
On the complexity of managing probabilistic XML data
PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

In [3], we introduced a framework for querying and updating probabilistic information over unordered labeled trees, the probabilistic tree model. The data model is based on trees where nodes are annotated with conjunctions of probabilistic event ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
ICDT '10: Proceedings of the 13th International Conference on Database Theory
March 2010
260 pages
ISBN:9781605589473
DOI:10.1145/1804669
Editor:
Luc Segoufin
INRIA & ENS Chacan, France
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 March 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
XML
aggregation
algorithms
complexity
probabilistic databases
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 18
  Total Citations
  View Citations
- 155
  Total Downloads
- Downloads (Last 12 months)2
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Aggregate queries for discrete and continuous probabilistic XML

ICDT '10: Proceedings of the 13th International Conference on Database Theory

ABSTRACT

References

Cited By

Index Terms

Recommendations

Capturing continuous data and answering aggregate queries in probabilistic XML

Updating probabilistic XML

On the complexity of managing probabilistic XML data

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Aggregate queries for discrete and continuous probabilistic XML

ICDT '10: Proceedings of the 13th International Conference on Database Theory

ABSTRACT

References

Cited By

Index Terms

Recommendations

Capturing continuous data and answering aggregate queries in probabilistic XML

Updating probabilistic XML

On the complexity of managing probabilistic XML data

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media