Return specification inference and result clustering for keyword search on XML

Published: 03 May 2010

Abstract

Keyword search enables Web users to easily access XML data without the need to learn a structured query language and to study possibly complex data schemas. Existing work has addressed the problem of selecting qualified data nodes that match keywords and connecting them in a meaningful way, in the spirit of inferring the where clause in XQuery. However, how to infer the return clause for keyword searches is an open problem.
To address this challenge, we present XSeek, a keyword search engine for data-centric XML that infers the semantics of the search and identifies return nodes effectively. XSeek recognizes possible entities and attributes inherently represented in the data. It also distinguishes between predicates and return specifications in query keywords. Then, based on the analysis of both XML data structures and keyword patterns, XSeek generates return nodes. Furthermore, when the query is ambiguous and it is hard or impossible to determine the desired return information, XSeek clusters the query results according to their semantics at a user-specified granularity, and enables the user to easily browse and select the desired ones. Extensive experimental studies show the effectiveness and efficiency of XSeek.
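
The abstract does not spell out how entities, attributes, predicates, and return specifications are told apart, so the following is only a minimal sketch under stated assumptions, not XSeek's actual implementation. It assumes that a tag repeating under one parent denotes an entity, that a non-repeating tag with no child elements denotes an attribute, and that a keyword equal to a tag name is a return specification while any other keyword is a predicate. The sample document and the function names `classify_nodes` and `classify_keywords` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample data, not taken from the paper.
SAMPLE = """
<league>
  <team>
    <name>Rockets</name>
    <city>Houston</city>
    <player><name>Yao</name><position>center</position></player>
    <player><name>Brooks</name><position>guard</position></player>
  </team>
  <team>
    <name>Lakers</name>
    <city>Los Angeles</city>
    <player><name>Bryant</name><position>guard</position></player>
  </team>
</league>
"""


def classify_nodes(root):
    """Label each tag as 'entity', 'attribute', or 'connection'.

    Assumed heuristic: a tag that repeats under one parent is an entity,
    a non-repeating tag with no child elements is an attribute, and
    anything else is a connection node.
    """
    all_tags, repeating, with_children = set(), set(), set()
    for node in root.iter():
        all_tags.add(node.tag)
        # Count how often each tag occurs among this node's direct children.
        counts = Counter(child.tag for child in node)
        if counts:
            with_children.add(node.tag)
        repeating.update(tag for tag, n in counts.items() if n > 1)

    labels = {}
    for tag in all_tags:
        if tag in repeating:
            labels[tag] = "entity"
        elif tag not in with_children:
            labels[tag] = "attribute"
        else:
            labels[tag] = "connection"
    return labels


def classify_keywords(root, keywords):
    """Split keywords into return specifications and predicates.

    Assumed rule: a keyword equal to a tag name asks for that node type
    (return specification); any other keyword constrains values (predicate).
    """
    tag_names = {node.tag.lower() for node in root.iter()}
    return {
        kw: "return" if kw.lower() in tag_names else "predicate"
        for kw in keywords
    }


root = ET.fromstring(SAMPLE.strip())
node_labels = classify_nodes(root)
keyword_roles = classify_keywords(root, ["Rockets", "player"])
print(node_labels)
print(keyword_roles)
```

Under these assumptions, the query "Rockets player" treats "Rockets" as a predicate on values and "player" as the requested return node type. The sketch labels nodes by tag name alone, so a tag such as `name` that appears under both `team` and `player` receives a single label; a fuller treatment would label nodes by path.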

    Published In

    ACM Transactions on Database Systems, Volume 35, Issue 2
    April 2010
    336 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/1735886

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 03 May 2010
    Accepted: 01 January 2010
    Revised: 01 December 2009
    Received: 01 September 2008

    Author Tags

    1. XML
    2. keyword search
    3. result clustering

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2021) Key-core: cohesive keyword subgraph exploration in large graphs. World Wide Web. DOI: 10.1007/s11280-021-00926-y. Online publication date: 6-Aug-2021
    • (2018) Klustree. Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, 265-272. DOI: 10.1145/3152494.3152509. Online publication date: 11-Jan-2018
    • (2018) Processing keyword search on XML. World Wide Web 14:5-6, 671-707. DOI: 10.1007/s11280-011-0128-2. Online publication date: 25-Dec-2018
    • (2017) Constructing target-aware results for keyword search on knowledge graphs. Data & Knowledge Engineering 110, 1-23. DOI: 10.1016/j.datak.2017.02.001. Online publication date: Jul-2017
    • (2016) Keyword query with structure. Information Technology and Management 17:2, 151-163. DOI: 10.1007/s10799-015-0247-z. Online publication date: 1-Jun-2016
    • (2015) Reasoning with patterns to effectively answer XML keyword queries. The VLDB Journal — The International Journal on Very Large Data Bases 24:3, 441-465. DOI: 10.1007/s00778-015-0384-3. Online publication date: 1-Jun-2015
    • (2014) Finding patterns in a knowledge base using keywords to compose table answers. Proceedings of the VLDB Endowment 7:14, 1809-1820. DOI: 10.14778/2733085.2733088. Online publication date: 1-Oct-2014
    • (2014) Clustering Query Results to Support Keyword Search on Tree Data. Web-Age Information Management, 213-224. DOI: 10.1007/978-3-319-08010-9_24. Online publication date: 2014
    • (2013) Summarizing answer graphs induced by keyword queries. Proceedings of the VLDB Endowment 6:14, 1774-1785. DOI: 10.14778/2556549.2556561. Online publication date: 1-Sep-2013
    • (2013) MESSIAH. Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, 37-48. DOI: 10.1145/2463676.2463699. Online publication date: 22-Jun-2013