ABSTRACT
Speeding up query evaluation in large XML repositories has become a challenging and important problem as XML-related applications proliferate. Once hot XML query patterns have been discovered, indexing and caching can be applied effectively to improve query performance. Previous algorithms for finding hot query patterns essentially follow a straightforward generate-and-test strategy. In this paper, we present SOLARIA, an efficient algorithm for mining hot XML query patterns without candidate maintenance or costly tree-containment checking. SOLARIA uses an efficient sequence mining algorithm to discover frequent tree-structured patterns, replacing expensive containment testing with cheap parent-child checking in sequences. It prunes the unrelated search space for frequent pattern enumeration by means of a parent-child relationship constraint. Motivated by indexing and caching in XML query optimization, we also propose the derived algorithm SOLARIA for mining hot "closed" XML query patterns, which provide compact and complete structural information. Through a thorough experimental study on various real-life data, we demonstrate the efficiency and scalability of SOLARIA over the previously known alternative. SOLARIA also scales linearly with the size of the XML queries.
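To make the idea of parent-child checking in sequences concrete, the following is a minimal illustrative sketch, not the SOLARIA algorithm itself. It assumes a query pattern is given as a nested `(label, children)` tuple and flattens it into a preorder sequence in which each entry stores the position of its parent. The encoding and the names `encode`, `is_parent_child`, and `edge_support` are assumptions made for this example and do not come from the paper.

```python
from collections import Counter


def encode(tree, parent=0, out=None):
    """Flatten a (label, [children]) tree into a preorder list of
    (label, parent_position) pairs. Positions are 1-based; the root
    has parent position 0. This encoding is an assumption for the sketch."""
    if out is None:
        out = []
    label, children = tree
    out.append((label, parent))
    pos = len(out)  # 1-based position of the node just appended
    for child in children:
        encode(child, pos, out)
    return out


def is_parent_child(seq, i, j):
    """Return True if the node at position i is the parent of the node
    at position j. This is a single lookup, not a tree traversal."""
    return seq[j - 1][1] == i


def edge_support(queries):
    """Count, for each (parent_label, child_label) edge, the number of
    query trees that contain it at least once."""
    counts = Counter()
    for query in queries:
        seq = encode(query)
        edges = {(seq[p - 1][0], label) for label, p in seq if p != 0}
        counts.update(edges)
    return counts


q1 = ("book", [("title", []), ("author", [("name", [])])])
q2 = ("book", [("title", [])])
seq1 = encode(q1)
support = edge_support([q1, q2])
```

Here `seq1` holds one entry per node of `q1`, and `support` maps each labelled edge to the number of queries containing it. Under this assumed encoding, testing whether one node is the parent of another reduces to comparing a stored position, which is the kind of cheap check the abstract contrasts with full tree-containment testing.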
CITED BY 2
Liang Huai Yang , Mong Li Lee , Wynne Hsu , Decai Huang , Limsoon Wong, Efficient mining of frequent XML query patterns with repeating-siblings, Information and Software Technology, v.50 n.5, p.375-389, April, 2008