|
ABSTRACT
We present a formal framework for capturing the provenance of data appearing in XQuery views of XML. Building on previous work on relations and their (positive) query languages, we decorate unordered XML with annotations from commutative semirings and show that these annotations suffice for a large positive fragment of XQuery applied to this data. In addition to tracking provenance metadata, the framework can be used to represent and process XML with repetitions, incomplete XML, and probabilistic XML, and provides a basis for enforcing access control policies in security applications. Each of these applications builds on our semantics for XQuery, which we present in several steps: we generalize the semantics of the Nested Relational Calculus (NRC) to handle semiring-annotated complex values, we extend it with a recursive type and structural recursion operator for trees, and we define a semantics for XQuery on annotated XML by translation into this calculus.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995.
|
| |
2
|
S. Abiteboul, L. Segoufin, and V. Vianu. Representing and querying xml with incomplete information. TODS, 31(1):208--254, 2006.
|
| |
3
|
S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In EDBT, 2006.
|
| |
4
|
P. Buneman, J. Cheney, W.-C. Tan, and S. Vansummeren. Curated databases. In PODS, 2008.
|
| |
5
|
P. Buneman, J. Cheney, and S. Vansummeren. On the expressiveness of implicit provenance in query and update languages. In ICDT, 2007.
|
| |
6
|
P. Buneman, M. F. Fernandez, and D. Suciu. UnQL: A query language and algebra for semistructured data based on structural recursion. VLDB J., 9(1):76--110, 2000.
|
| |
7
|
P. Buneman, S. Khanna, and W. C. Tan. Why and where: A characterization of data provenance. In ICDT, 2001.
|
| |
8
|
P. Buneman, S. A. Naqvi, V. Tannen, and L. Wong. Principles of programming with complex objects and collection types. TCS, 149(1):3--48, 1995.
|
| |
9
|
P. Buneman and V. Tannen. A structural approach to query language design. In The Functional Approach to Data Management Modeling, Analyzing, and Integrating Heterogenous Data. Springer, 2004.
|
| |
10
|
J. V. den Bussche, D. V. Gucht, and S. Vansummeren. Well-definedness and semantic type-checking in the nested relational calculus and XQuery. In ICDT, 2005.
|
| |
11
|
D. Draper, P. Fankhauser, M. F. Fernández, A. Malhotra, K. Rose, M. Rys, J. Siméon, and P. Wadler. XQuery 1.0 and XPath 2.0 Formal Semantics. W3C, Jan. 2007.
|
| |
12
|
D. Florescu and D. Kossmann. Storing and querying XML data using an RDMBS. IEEE Data Engineering Bulletin, 22(3), 1999.
|
| |
13
|
J. N. Foster, T. J. Green, and V. Tannen. Annotated XML: Queries and provenance. Technical Report TR-CIS-08-06, University of Pennsylvania, 2008.
|
| |
14
|
G. Grahne, A. Thomo, and W. W. Wadge. Preferentially annotated regular path queries. In ICDT, 2007.
|
| |
15
|
T. J. Green, G. Karvounarakis, Z. G. Ives, and V. Tannen. Update exchange with mappings and provenance. In VLDB, 2007.
|
| |
16
|
T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, 2007.
|
| |
17
|
J. Hidders, N. Kwasnikowska, J. Sroka, J. Tyszkiewicz, and J. V. den Bussche. A formal model of dataflow repositories. In DILS, 2007.
|
| |
18
|
E. Hung, L. Getoor, and V. S. Subrahmanian. Probabilistic interval XML. ACM TOCL, 8(4), 2007.
|
| |
19
|
T. Imieliński and W. Lipski. Incomplete information in relational databases. JACM, 31(4), 1984.
|
| |
20
|
Y. Kanza, W. Nutt, and Y. Sagiv. Queries with incomplete answers over semistructured data. In PODS, 1999.
|
| |
21
|
S. K. Lellahi and V. Tannen. A calculus for collections and aggregates. In Category Theory and Computer Science, 1997.
|
| |
22
|
L. Libkin and L. Wong. Query languages for bags and aggregate functions. JCSS, 55(2):241--272, 1997.
|
| |
23
|
A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In VLDB, 2002.
|
| |
24
|
D. Olteanu, H. Meuss, T. Furche, and F. Bry. XPath: Looking forward. In EDBT Workshops, 2002.
|
| |
25
|
C. Ré, J. Simèon, and M. Fernández. A complete and efficient algebraic compiler for xquery. In ICDE, page 14, 2006.
|
| |
26
|
E. L. Robertson, L. V. Saxton, D. V. Gucht, and S. Vansummeren. Structural recursion on ordered trees and list-based complex objects. In ICDT, 2007.
|
| |
27
|
P. Senellart and S. Abiteboul. On the complexity of managing probabilistic XML data. In PODS, 2007.
|
| |
28
|
J. Shanmugasundaram, K. Tufte, C. Zhang, G. He, D. J. DeWitt, and J. F. Naughton. Relational databases for querying XML documents: Limitations and opportunities. In VLDB J., 1999.
|
| |
29
|
M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In ICDE, 2005.
|
|