ACM Home Page
Please provide us with feedback. Feedback
Efficient algorithms for exact ranked twig-pattern matching over graphs
Full text PdfPdf (615 KB)
Source
International Conference on Management of Data archive
Proceedings of the 2008 ACM SIGMOD international conference on Management of data table of contents
Vancouver, Canada
SESSION: Research Session 13: Graphs II table of contents
Pages 581-594  
Year of Publication: 2008
ISBN:978-1-60558-102-6
Authors
Gang Gou  North Carolina State University, Raleigh, NC, USA
Rada Chirkova  North Carolina State University, Raleigh, NC, USA
Sponsors
ACM: Association for Computing Machinery
SIGMOD: ACM Special Interest Group on Management of Data
Publisher
ACM  New York, NY, USA
Bibliometrics
Downloads (6 Weeks): 50,   Downloads (12 Months): 139,   Citation Count: 0
Additional Information:

abstract   references   index terms   collaborative colleagues  

Tools and Actions: Review this Article  
Save this Article to a Binder    Display Formats: BibTex  EndNote ACM Ref   
DOI Bookmark: Use this link to bookmark this Article: http://doi.acm.org/10.1145/1376616.1376676
What is a DOI?

ABSTRACT

Querying large-scale graph-structured data with twig patterns is attracting growing interest. Generally, a twig pattern could have an extremely large, potentially exponential, number of matches in a graph. Retrieving and returning to the user this many answers may both incur high computational overhead and overwhelm the user.

In this paper we propose two efficient algorithms, DP-B and DP-P, for retrieving top-ranked twig-pattern matches from large graphs. Our first algorithm, DP-B, is able to retrieve exact top-ranked answer matches from potentially exponentially many matches in time and space linear in the size of our data inputs even in the worst case. Further, beyond the linear-cost result of DP-B, our second algorithm, DP-P, could take far less than linear time and space cost in practice. To the best of our knowledge, our algorithms are the first to have these performance properties. Our experimental results demonstrate the high performance of both algorithms on large datasets. We also analyze and compare the performance trade-off between DP-B and DP-P from the theoretical and practical viewpoints.


REFERENCES

Note: OCR errors may be found in this Reference List extracted from the full text article. ACM has opted to expose the complete List rather than only correct and linked references.

 
1
R. Agrawal, A. Borgida, and H. V. Jagadish. Efficient management of transitive relationships in large data and knowledge bases. SIGMOD, 1989.
 
2
S. Agrawal, S. Chaudhuri, and G. Das. DBXplorer: A system for keyword-based search over relational databases. ICDE, 2002.
 
3
M. D. Atkinson, J.-R. Sack, N. Santoro, and T. Strothotte. Min-max heaps and generalized priority queues. Communications of the ACM: 29(10), 1986.
 
4
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. ICDE, 2002.
 
5
N. Bruno, N. Koudas, and D. Srivastava. Holistic twig joins: optimal XML pattern matching. SIGMOD, 2002.
 
6
E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. SODA, 2002.
 
7
S. Cohen, J. Mamou, Y. Kanza, and Y. Sagiv. XSEarch: A semantic search engine for XML. VLDB, 2003.
 
8
T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to algorithms (2nd edition). MIT Press, 2001.
 
9
B. Dawes and D. Abrahams. Boost C++ Libraries. http://www.boost.org/.
 
10
DBLP. DBLP bibliography. http://dblp.uni-trier.de/xml/.
 
11
B. Ding, J. X. Yu, S. Wang, L. Qin, X. Zhang, and X. Lin. Finding top-k min-cost connected trees in databases. ICDE, 2007.
 
12
R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. PODS, 2001.
 
13
G. Gou and R. Chirkova. Efficiently querying large XML data repositories: a survey. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2007.
 
14
L. Guo, F. Shao, C. Botev, and J. Shanmugasundaram. XRANK: Ranked keyword search over XML documents. SIGMOD, 2003.
 
15
V. Harinarayan, A. Rajaraman, and J. D. Ullman. Implementing data cubes efficiently. SIGMOD, 1996.
 
16
H. He, H. Wang, J. Yang, and P. S. Yu. BLINKS: ranked keyword searches on graphs. SIGMOD, 2007.
 
17
V. Hristidis and Y. Papakonstantinou. DISCOVER: Keyword search in relational databases. VLDB, 2002.
 
18
V. Kacholia, S. Pandit, S. Chakrabarti, S. Sudarshan, R. Desai, and H. Karambelkar. Bidirectional expansion for keyword search on graph databases. VLDB, 2005.
 
19
W.-S. Li, K. S. Candan, Q. Vu, and D. Agrawal. Retrieving and organizing web pages by information unit. WWW, 2001.
 
20
Oracle. Oracle Berkeley DB Java Edition (Version 3.2.44). http://www.oracle.com/technology/products/berkeley-db/ je/index.html.
 
21
Y. Qi, K. S. Candan, and M. L. Sapino. Sum-Max monotonic ranked joins for evaluating top-k twig queries on weighted data graphs. VLDB, 2007.
 
22
G. Reich and P. Widmayer. Beyond Steiner?s problem: A VLSI oriented generalization. WG Workshop, 1989.
 
23
W3C. XML path language (XPath) Version 1.0. http://www.w3.org/TR/xpath, 1999.
 
24
W3C. Resource Description framework (RDF). http://www.w3.org/RDF/, 2004.
 
25
H.Wang, H. He, J. Yang, P. S. Yu, and J. X. Yu. Dual labeling: Answering graph reachability queries in constant time. ICDE, 2006.