skip to main content
10.1145/1516241.1516322acmconferencesArticle/Chapter ViewAbstractPublication PagesicuimcConference Proceedingsconference-collections
research-article

XML data partitioning strategies to improve parallelism in parallel holistic twig joins

Published: 15 February 2009 Publication History

Abstract

Parallel XML query processing systems that process numerous queries over large heterogeneous XML documents often experience under-performance due to workload imbalance and low CPU/system utilization, because conventional partitioning strategies cannot serve well for state-of-the-art query processing algorithms, such as holistic twig joins. Consequently, partitioning and distributing heterogeneous XML documents onto a parallel cluster system have lead to such an intricacy issue for maintaining good query performance. In this paper, we propose XML data partitioning strategies that are able to alleviate system performance degradation due to workload imbalance, especially for parallel holistic twig joins processing. The proposed XML data partitioning strategies aim at improving workload balance on both static data distribution and dynamic data distribution. In the first strategy we refine an XML partition having a high cost by series of XML data partition refinements with various levels of granularities from document, query, and subquery, up to node streams. The selection of the granularity level for refining a high cost partition is contextually dependent on the overall workload balance in the system. In the second strategy for dynamic data distribution, we dynamically handle low system utilization when there are many idle nodes in the system. We propose an XML data redistribution approach by partitioning XML data on the fly at the stream nodes-based granularity.

References

[1]
Niagara query engine. http://www.cs.wisc.edu/niagara.
[2]
Stanford university infolab. http://infolab.stanford.edu/pub/movies/dtd.html.
[3]
S. Al-Khalifa, H. V. Jagadish, J. M. Patel, Y. Wu, N. Koudas, and D. Srivastava. Structural Joins: A Primitive for Efficient XML Query Pattern Matching. In Proc. of the 18th International Conference on Data Engineering (ICDE'02), pages 141--, 2002.
[4]
T. Amagasa, K. Kido, and H. Kitagawa. Querying XML Data Using PC Cluster System. In Proc. of the International Workshops on XML Data Management Tools and Techniques (XANTEC'07), pages 5--9, 2007.
[5]
J.-M. Bremer and M. Gertz. On Distributing XML Repositories. In International Workshop on the Web and Databases (WebDB), pages 73--78, 2003.
[6]
N. Bruno, N. Koudas, and D. Srivastava. Holistic Twig Joins: Optimal XML Pattern Matching. In Proc. of the 2002 ACM SIGMOD International Conference on Management of Data (SIGMOD'02), pages 310--321, 2002.
[7]
T. Chen, J. Lu, and T. W. Ling. On Boosting Holism in XML Twig Pattern Matching Using Structural Indexing Techniques. In Proc. of the 24th ACM SIGMOD International Conference on Management of Data (SIGMOD'05), pages 455--466, 2005.
[8]
R. Goldman and J. Widom. DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. In Proc. of 23rd International Conference on Very Large Data Bases (VLDB'97), pages 436--445, 1997.
[9]
G. Gou and R. Chirkova. Efficiently Querying Large XML Data Repositories: A Survey. IEEE Transactions on Knowledge and Data Engineering, 19(10):1381--1403, 2007.
[10]
R. Kaushik, R. Krishnamurthy, J. F. Naughton, and R. Ramakrishnan. On the Integration of Structure Indexes and Inverted Lists. In Proc. of the 23rd ACM SIGMOD International Conference on Management of Data (SIGMOD'04), pages 779--790, 2004.
[11]
K. Kido, T. Amagasa, and H. Kitagawa. Processing XPath Queries in PC Clusters Using XML Data Partitioning. In Proc. of the 22nd ICDE Workshops, page 114, 2006.
[12]
H. Kurita, K. Hatano, J. Miyazaki, and S. Uemura. Efficient Query Processing for Large XML Data in Distributed Environments. In Proc. of the 21st International Conference on Advanced Networking and Applications (AINA'07), 2007.
[13]
Y. G. Li, S. Bressan, G. Dobbie, Z. Lacroix, M. L. Lee, U. Nambiar, and B. Wadhwa. XOO7: Applying OO7 Benchmark to XML Query Processing Tool. In Proc. of the 10th International Conference on Information and Knowledge Management, pages 167--174, New York, NY, USA, 2001.
[14]
J. Lu, T. Chen, and T. W. Ling. Efficient Processing of XML Twig Patterns with Parent Child Edges: A Look-ahead Approach. In Proc. of 13th International Conference on Information and Knowledge Management (CIKM'04), pages 533--542, 2004.
[15]
K. Lu, Y. Zhu, and W. Sun. Parallel Processing XML Documents. In Proc. of International Database Engineering and Applications Symposium (IDEAS), pages 96--105, 2002.
[16]
I. Machdi, T. Amagasa, and H. Kitagawa. GMX: An XML Data Partitioning Scheme for Holistic Twig Joins. In Proc. of 10th International Conference on Information Integration and Web-based Applications and Services (iiWAS'08), Linz, Austria, 2008.
[17]
R. Sakellariou and J. R. Gurd. Compile-time Minimisation of Load Imbalance in Loop Nests. In Proc. of the 11th International Conference on Supercomputing, pages 277--284, 1997.
[18]
A. Schmidt, F. Waas, M. Kersten, M. J. Carey, I. Manolescu, and R. Busse. XMark: A Benchmark for XML Data Management. In Proceedings of the 28th International Conference on Very Large Data Bases, pages 974--985. VLDB Endowment, 2002.
[19]
N. Tang, G. Wang, X. J. Yu, K.-F. Wong, and G. Yu. WIN: An Efficient Data Placement Strategy for Parallel XML Databases. In Proc. of 11th International Conference on Parallel and Distributed Systems (ICPADS), pages 249--355, 2005.
[20]
H. Wang, S. Park, W. Fan, and P. S. Yu. Vist: A Dynamic Index Method for Querying XML Data by Tree Structures. In Proc. of the 22nd ACM SIGMOD International Conference on Management of Data (SIGMOD'03), pages 110--121, 2003.
[21]
C. Zhang, J. F. Naughton, D. J. DeWitt, Q. Luo, and G. M. Lohman. On Supporting Containment Queries in Relational Database Management Systems. In Proc. of the 2001 ACM SIGMOD International Conference on Management of data, pages 425--436, 2001.

Cited By

View all
  • (2017)Structural XML Query ProcessingACM Computing Surveys10.1145/309579850:5(1-41)Online publication date: 26-Sep-2017
  • (2017)Efficient Processing of Distributed Twig Queries Based on Node DistributionJournal of Computer Science and Technology10.1007/s11390-017-1707-132:1(78-92)Online publication date: 11-Jan-2017
  • (2016)TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce2016 IEEE International Congress on Big Data (BigData Congress)10.1109/BigDataCongress.2016.79(133-140)Online publication date: Jun-2016
  • Show More Cited By

Index Terms

  1. XML data partitioning strategies to improve parallelism in parallel holistic twig joins

    Recommendations

    Comments

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    ICUIMC '09: Proceedings of the 3rd International Conference on Ubiquitous Information Management and Communication
    February 2009
    704 pages
    ISBN:9781605584058
    DOI:10.1145/1516241
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 15 February 2009

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. XML data partition
    2. holistic twig joins
    3. inter query parallelism
    4. intra query parallelism

    Qualifiers

    • Research-article

    Funding Sources

    Conference

    ICUIMC '09
    Sponsor:

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)0
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 13 Feb 2025

    Other Metrics

    Citations

    Cited By

    View all
    • (2017)Structural XML Query ProcessingACM Computing Surveys10.1145/309579850:5(1-41)Online publication date: 26-Sep-2017
    • (2017)Efficient Processing of Distributed Twig Queries Based on Node DistributionJournal of Computer Science and Technology10.1007/s11390-017-1707-132:1(78-92)Online publication date: 11-Jan-2017
    • (2016)TwigStack-MR: An Approach to Distributed XML Twig Query Using MapReduce2016 IEEE International Congress on Big Data (BigData Congress)10.1109/BigDataCongress.2016.79(133-140)Online publication date: Jun-2016
    • (2015)Multi-Core Processing of XML Twig PatternsIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2014.234990727:4(1057-1070)Online publication date: 1-Apr-2015
    • (2014)A Survey on XML FragmentationACM SIGMOD Record10.1145/2694428.269443443:3(24-35)Online publication date: 4-Dec-2014
    • (2011)TwigPipe: A pipelining holistic solution of Twig Query2011 IEEE 3rd International Conference on Communication Software and Networks10.1109/ICCSN.2011.6014308(448-452)Online publication date: May-2011
    • (2010)Parallel holistic twig joins on a multi‐core systemInternational Journal of Web Information Systems10.1108/174400810110531316:2(149-177)Online publication date: 22-Jun-2010
    • (2009)Executing parallel TwigStack algorithm on a multi-core systemProceedings of the 11th International Conference on Information Integration and Web-based Applications & Services10.1145/1806338.1806376(176-184)Online publication date: 14-Dec-2009
    • (2009)XML data partitioning schemes for parallel holistic twig joinsInternational Journal of Web Information Systems10.1108/174400809109684455:2(151-194)Online publication date: 19-Jun-2009

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Figures

    Tables

    Media

    Share

    Share

    Share this Publication link

    Share on social media