ABSTRACT
Many data management applications must deal with data which is uncertain, incomplete, or noisy. However, on existing uncertain data representations, we cannot tractably perform the important query evaluation tasks of determining query possibility, certainty, or probability: these problems are hard on arbitrary uncertain input instances. We thus ask whether we could restrict the structure of uncertain data so as to guarantee the tractability of exact query evaluation. We present our tractability results for tree and tree-like uncertain data, and a vision for probabilistic rule reasoning. We also study uncertainty about order, proposing a suitable representation, and study uncertain data conditioned by additional observations.
- S. Abiteboul, D. Deutch, and V. Vianu. Deduction with contradictions in Datalog. In ICDT, 2014.Google Scholar
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of Databases. Addison-Wesley, 1995. Google ScholarDigital Library
- R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. In VLDB, 1994. Google ScholarDigital Library
- A. Amarilli, Y. Amsterdamer, and T. Milo. On the complexity of mining itemsets from the crowd using taxonomies. In Proc. ICDT, 2014.Google Scholar
- A. Amarilli, Y. Amsterdamer, and T. Milo. Uncertainty in crowd data sourcing under structural constraints. In Proc. UnCrowd, 2014.Google ScholarCross Ref
- A. Amarilli, L. M. Ba, D. Deutch, and P. Senellart. Querying order-incomplete data. Preprint: http://a3nm.net/publications/amarilli2015querying.pdf.Google Scholar
- A. Amarilli, P. Bourhis, and P. Senellart. Probabilities and provenance via tree decompositions. Preprint: http://a3nm.net/publications/amarilli2015probabilities.pdf, 2014.Google Scholar
- A. Amarilli and P. Senellart. Unsaid string: Uncertainty and structure in the access to intensional data. Preprint: http://a3nm.net/publications/amarilli2014unsaid.pdf.Google Scholar
- Y. Amsterdamer, Y. Grossman, T. Milo, and P. Senellart. Crowd mining. In SIGMOD, 2013. Google ScholarDigital Library
- M. L. Ba, T. Abdessalem, and P. Senellart. Uncertain version control in open collaborative editing of tree-structured documents. In DocEng, 2013. Google ScholarDigital Library
- J.-F. Baget, M. Leclère, M.-L. Mugnier, and E. Salvat. On rules with existential variables: Walking the decidability line. Artificial Intelligence, 175(9), 2011. Google ScholarDigital Library
- V. Bárány, B. ten Cate, B. Kimelfeld, D. Olteanu, and Z. Vagena. Declarative statistical modeling with Datalog. CoRR, abs/1412.2221, 2015.Google Scholar
- M. Benedikt, E. Kharlamov, D. Olteanu, and P. Senellart. Probabilistic XML via Markov chains. PVLDB, 3(1--2), 2010. Google ScholarDigital Library
- G. Brightwell and P. Winkler. Counting linear extensions. Order, 8(3), 1991.Google Scholar
- A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. H. Jr., and T. M. Mitchell. Toward an architecture for never-ending language learning. In AAAI, 2010.Google ScholarDigital Library
- S. Cohen, B. Kimelfeld, and Y. Sagiv. Incorporating constraints in probabilistic XML. TODS, 34(3), 2009. Google ScholarDigital Library
- S. Cohen, B. Kimelfeld, and Y. Sagiv. Running tree automata on probabilistic XML. In PODS, 2009. Google ScholarDigital Library
- B. Courcelle. Graph rewriting: An algebraic and logic approach. In Handbook of Theoretical Computer Science. Elsevier, 1990. Google ScholarDigital Library
- N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. VLDBJ, 16(4), 2007. Google ScholarDigital Library
- N. Dalvi and D. Suciu. The dichotomy of probabilistic inference for unions of conjunctive queries. JACM, 59(6), 2012. Google ScholarDigital Library
- D. Deutch, T. Milo, S. Roy, and V. Tannen. Circuits for Datalog provenance. In ICDT, 2014.Google Scholar
- X. Dong, A. Y. Halevy, and C. Yu. Data integration with uncertainty. In VLDB, 2007. Google ScholarDigital Library
- L. Galárraga, C. Teflioudi, K. Hose, and F. M. Suchanek. AMIE string: Association rule mining under incomplete evidence in ontological knowledge bases. In WWW, 2013. Google ScholarDigital Library
- G. Gottlob, N. Leone, and F. Scarcello. Hypertree decompositions: A survey. In MFCS. 2001. Google ScholarDigital Library
- G. Gottlob, T. Lukasiewicz, and G. I. Simari. Conjunctive query answering in probabilistic Datalog+- ontologies. In Web Reasoning and Rule Systems. Springer, 2011. Google ScholarDigital Library
- G. Gottlob, R. Pichler, and F. Wei. Monadic Datalog over finite structures of bounded treewidth. TOCL, 12(1), 2010. Google ScholarDigital Library
- E. Grädel. Efficient evaluation methods for guarded logics and Datalog LITE. In LPAR, 2000.Google ScholarCross Ref
- T. J. Green, G. Karvounarakis, and V. Tannen. Provenance semirings. In PODS, 2007. Google ScholarDigital Library
- T. J. Green and V. Tannen. Models for incomplete and probabilistic information. In IIDB, 2006. Google ScholarDigital Library
- J. Henriksen, J. Jensen, M. Jørgensen, N. Klarlund, B. Paige, T. Rauhe, and A. Sandholm. Mona: Monadic second-order logic in practice. In TACAS, 1995. Google ScholarDigital Library
- J. Huang, L. Antova, C. Koch, and D. Olteanu. MayBMS: a probabilistic database management system. In SIGMOD, 2009. Google ScholarDigital Library
- T. Imielinski and W. Lipski, Jr. Incomplete information in relational databases. JACM, 31(4), 1984. Google ScholarDigital Library
- A. Jha and D. Suciu. Probabilistic databases with MarkoViews. PVLDB, 5(11), 2012. Google ScholarDigital Library
- B. Kimelfeld, Y. Kosharovsky, and Y. Sagiv. Query efficiency in probabilistic XML models. In SIGMOD, 2008. Google ScholarDigital Library
- B. Kimelfeld and P. Senellart. Probabilistic XML\string: Models and complexity. In Z. Ma and L. Yan, editors, Advances in Probabilistic Databases for Uncertain Information Management. Springer, 2013.Google ScholarCross Ref
- L. V. S. Lakshmanan, N. Leone, R. B. Ross, and V. S. Subrahmanian. ProbView: A flexible probabilistic database system. TODS, 22(3), 1997. Google ScholarDigital Library
- S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems. J. Roy. Stat. Soc., Ser. B, 1988.Google Scholar
- S. Maniu, R. Cheng, and P. Senellart. ProbTree\string: A query-efficient representation of probabilistic graphs. In BUDA, June 2014. Workshop without formal proceedings.Google Scholar
- A. Parameswaran, H. Garcia-Molina, H. Park, N. Polyzotis, A. Ramesh, and J. Widom. Crowdscreen: Algorithms for filtering data with humans. In SIGMOD, 2012. Google ScholarDigital Library
- L. D. Raedt, A. Kimmig, and H. Toivonen. Problog: A probabilistic Prolog and its application in link discovery. In IJCAI, 2007. Google ScholarDigital Library
- M. Richardson and P. Domingos. Markov logic networks. Machine learning, 62(1--2), 2006. Google ScholarDigital Library
- N. Robertson and P. D. Seymour. Graph minors. II. Algorithmic aspects of tree-width. J. Algorithms, 7(3), 1986.Google Scholar
- K. Stefanidis, G. Koutrika, and E. Pitoura. A survey on representation, composition and application of preferences in database systems. ACM Transactions on Database Systems (TODS), 36(3), 2011. Google ScholarDigital Library
- R. Tang, R. Cheng, H. Wu, and S. Bressan. A framework for conditioning uncertain relational data. In Database and Expert Systems Applications. Springer, 2012.Google ScholarCross Ref
- J. W. Thatcher and J. B. Wright. Generalized finite automata theory with an application to a decision problem of second-order logic. Math. systems theory, 2(1), 1968.Google Scholar
- D. Vrandečić and M. Krötzsch. Wikidata: a free collaborative knowledgebase. CACM, 57(10), 2014. Google ScholarDigital Library
Index Terms
- Structurally Tractable Uncertain Data
Recommendations
Tractable Lineages on Treelike Instances: Limits and Extensions
PODS '16: Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database SystemsQuery evaluation on probabilistic databases is generally intractable (#P-hard). Existing dichotomy results have identified which queries are tractable (or safe), and connected them to tractable lineages. In our previous work, using different tools, we ...
Finding Probabilistic k-Skyline Sets on Uncertain Data
CIKM '15: Proceedings of the 24th ACM International on Conference on Information and Knowledge ManagementSkyline is a set of points that are not dominated by any other point. Given uncertain objects, probabilistic skyline has been studied which computes objects with high probability of being skyline. While useful for selecting individual objects, it is not ...
ELCA evaluation for keyword search on probabilistic XML data
As probabilistic data management is becoming one of the main research focuses and keyword search is turning into a more popular query means, it is natural to think how to support keyword queries on probabilistic XML data. With regards to keyword query ...
Comments