ABSTRACT
Graph databases with the property graph model are used in multiple domains including social networks, biology, and data integration. They provide schema-flexible storage for data of a different degree of a structure and support complex, expressive queries such as subgraph isomorphism queries. The exibility and expressiveness of graph databases make it difficult for the users to express queries correctly and can lead to unexpected query results, e.g. empty results. Therefore, we propose a relaxation approach for subgraph isomorphism queries that is able to automatically rewrite a graph query, such that the rewritten query is similar to the original query and returns a non-empty result set. In detail, we present relaxation operations applicable to a query, cardinality estimation heuristics, and strategies for prioritizing graph query elements to be relaxed. To determine the similarity between the original query and its relaxed variants, we propose a novel cardinality-based graph edit distance. The feasibility of our approach is shown by using real-world queries from the DBpedia query log.
- N. Bidoit, M. Herschel, and K. Tzompanaki. Query-based Why-Not provenance with NedExplain. In Proc. EDBT, pages 145--156, 2014.Google Scholar
- C. Bornhövd, R. Kubis, W. Lehner, H. Voigt, and H. Werner. Flexible information management, exploration and analysis in SAP HANA. In DATA, pages 15--28, 2012.Google Scholar
- A. Chapman and H. V. Jagadish. Why Not? In Proc. SIGMOD, pages 523--534. ACM, 2009. Google ScholarDigital Library
- S. Chaudhuri. Generalization and a framework for query modification. In Proc. ICDE, pages 138--145. IEEE Computer Society, 1990. Google ScholarDigital Library
- X. Gao, B. Xiao, D. Tao, and X. Li. A survey of graph edit distance. Pattern Anal. Appl., pages 113--129, 2010. Google ScholarDigital Library
- M. Herschel. Wondering why data are missing from query results?: Ask conseil why-not. In Proc. CIKM, pages 2213--2218. ACM, 2013. Google ScholarDigital Library
- M. Herschel and M. A. Hernández. Explaining missing answers to SPJUA queries. Proc. VLDB Endow., pages 185--196, 2010. Google ScholarDigital Library
- M. Herschel, M. A. Hernández, and W.-C. Tan. Artemis: A system for analyzing missing answers. Proc. VLDB Endow., pages 1550--1553, 2009. Google ScholarDigital Library
- J. Huang, T. Chen, A. Doan, and J. F. Naughton. On the provenance of non-answers to queries over extracted data. Proc. VLDB Endow., pages 736--747, 2008. Google ScholarDigital Library
- D. Jannach. Techniques for fast query relaxation in content-based recommender systems. In Proc. KI, pages 49--63. Springer-Verlag, 2007. Google ScholarDigital Library
- U. Junker. QUICKXPLAIN: preferred explanations and relaxations for over-constrained problems. In AAAI, pages 167--172, 2004. Google ScholarDigital Library
- F. Mandreoli, R. Martoglia, G. Villani, and W. Penzo. Flexible query answering on graph-modeled data. In Proc. EDBT, pages 216--227. ACM, 2009. Google ScholarDigital Library
- D. Mottin, A. Marascu, S. Basu Roy, G. Das, T. Palpanas, and Y. Velegrakis. IQR: An interactive query relaxation system for the empty-answer problem. In Proc. SIGMOD, pages 1095--1098. ACM, 2014. Google ScholarDigital Library
- D. Mottin, A. Marascu, S. B. Roy, G. Das, T. Palpanas, and Y. Velegrakis. A probabilistic optimization framework for the empty-answer problem. Proc. VLDB Endow., pages 1762--1773, 2013. Google ScholarDigital Library
- R. Myers, R. Wison, and E. R. Hancock. Bayesian graph edit distance. IEEE Trans. Pattern Anal. Mach. Intell., pages 628--635, 2000. Google ScholarDigital Library
- M. Neuhaus and H. Bunke. A probabilistic approach to learning costs for graph edit distance. In Proc. ICPR, pages 389--393. IEEE, 2004. Google ScholarDigital Library
- M. Neuhaus and H. Bunke. Self-organizing maps for learning the edit costs in graph matching. Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, pages 503--514, 2005. Google ScholarDigital Library
- M. Paradies, M. Rudolf, C. Bornhövd, and W. Lehner. GRATIN: Accelerating graph traversals in main-memory column stores. In GRADES, pages 9:1--9:6. ACM, 2014. Google ScholarDigital Library
- M. A. Rodriguez and P. Neubauer. Constructions from dots and lines. Bulletin of the American Society for Inf. Science and Technology, pages 35--41, 2010.Google ScholarCross Ref
- M. Rudolf, M. Paradies, C. Bornhövd, and W. Lehner. The graph story of the SAP HANA database. In Datenbanksysteme für Business, Technologie und Web (BTW). Proceedings, pages 403--420, 2013.Google Scholar
- T. Tran, H. Wang, S. Rudolph, and P. Cimiano. Top-k exploration of query candidates for efficient keyword search on graph-shaped (RDF) data. In Proc. ICDE, pages 405--416, 2009. Google ScholarDigital Library
- E. Vasilyeva, M. Thiele, C. Bornhövd, and W. Lehner. GraphMCS: Discover the unknown in large data graphs. In Workshops EDBT, pages 200--207, 2014.Google Scholar
- E. Vasilyeva, M. Thiele, C. Bornhovd, and W. Lehner. Top-k differential queries in graph databases. In Advances in Databases and Information Systems, Lecture Notes in Computer Science, pages 112--125. Springer International Publishing, 2014.Google ScholarCross Ref
- J. Wei. Markov edit distance. IEEE Trans. Pattern Anal. Mach. Intell., pages 311--321, 2003. Google ScholarDigital Library
- R. C. Wilson and E. R. Hancock. Structural matching by discrete relaxation. IEEE Trans. Pattern Anal. Mach. Intell., pages 634--648, 1997. Google ScholarDigital Library
- H. Zhou, A. A. Shaverdian, H. Jagadish, and G. Michailidis. Querying graphs with uncertain predicates. In Proc. Eighth Workshop on Mining and Learning with Graphs, pages 163--170. ACM, 2010. Google ScholarDigital Library
Index Terms
- Relaxation of subgraph queries delivering empty results
Recommendations
Cooperative treatment of failing queries over uncertain databases: a matrix-computation-based approach
AbstractA large number of applications such as sensor networks, RFID-based monitoring systems, mobile object management and location-based services manage data pervaded with uncertainty. Usually, users wish/prefer high quality results (i.e. with highest ...
Query Relaxation for Star Queries on RDF
11th International Conference on Web Information Systems Engineering --- WISE 2010 - Volume 6488Query relaxation is an important problem for querying RDF data flexibly. The previous work mainly uses ontology information for relaxing user queries. The ranking models proposed, however, are either non-quantifiable or imprecise. Furthermore, the ...
Empty versus overabundant answers to flexible relational queries
When retrieving and searching desired data over large databases accessible in the web, users might be confronted with two common problems: overabundant answers and empty answers. In the former, the user is provided with an avalanche of responses that ...
Comments