|
ABSTRACT
Querying large-scale graph-structured data with twig patterns is attracting growing interest. Generally, a twig pattern could have an extremely large, potentially exponential, number of matches in a graph. Retrieving and returning to the user this many answers may both incur high computational overhead and overwhelm the user. In this paper we propose two efficient algorithms, DP-B and DP-P, for retrieving top-ranked twig-pattern matches from large graphs. Our first algorithm, DP-B, is able to retrieve exact top-ranked answer matches from potentially exponentially many matches in time and space linear in the size of our data inputs even in the worst case. Further, beyond the linear-cost result of DP-B, our second algorithm, DP-P, could take far less than linear time and space cost in practice. To the best of our knowledge, our algorithms are the first to have these performance properties. Our experimental results demonstrate the high performance of both algorithms on large datasets. We also analyze and compare the performance trade-off between DP-B and DP-P from the theoretical and practical viewpoints.
REFERENCES
Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.
| |
1
|
R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. SIGMOD, 1989.
|
| |
2
|
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. ICDE, 2002.
|
| |
3
|
M. D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte. Min-max heaps and generalized priority queues. Communications of the ACM: 29(10), 1986.
|
| |
4
|
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE, 2002.
|
| |
5
|
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002.
|
| |
6
|
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SODA, 2002.
|
| |
7
|
S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB, 2003.
|
| |
8
|
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms (2nd edition). MIT Press, 2001.
|
| |
9
|
B. Dawes and D. Abrahams. Boost C++ Libraries. http://www.boost.org/.
|
| |
10
|
DBLP. DBLP bibliography. http://dblp.uni-trier.de/xml/.
|
| |
11
|
B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding top-k min-cost connected trees in databases. ICDE, 2007.
|
| |
12
|
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. PODS, 2001.
|
| |
13
|
G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2007.
|
| |
14
|
L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. SIGMOD, 2003.
|
| |
15
|
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. SIGMOD, 1996.
|
| |
16
|
H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. SIGMOD, 2007.
|
| |
17
|
V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. VLDB, 2002.
|
| |
18
|
V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB, 2005.
|
| |
19
|
W.-S. Li, K. S. Candan, Q. Vu, and D. Agrawal. Retrieving and organizing web pages by information unit. WWW, 2001.
|
| |
20
|
Oracle. Oracle Berkeley DB Java Edition (Version 3.2.44). http://www.oracle.com/technology/products/berkeley-db/ je/index.html.
|
| |
21
|
Y. Qi, K. S. Candan, and M. L. Sapino. Sum-Max monotonic ranked joins for evaluating top-k twig queries on weighted data graphs. VLDB, 2007.
|
| |
22
|
G. Reich and P. Widmayer. Beyond Steiner?s problem: A VLSI oriented generalization. WG Workshop, 1989.
|
| |
23
|
W3C. XML path language (XPath) Version 1.0. http://www.w3.org/TR/xpath, 1999.
|
| |
24
|
W3C. Resource Description framework (RDF). http://www.w3.org/RDF/, 2004.
|
| |
25
|
H.Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. ICDE, 2006.
|
|