Differentiating search results on structured data

Published: 06 March 2012

Abstract

Studies show that about 50% of Web searches are for information exploration, where a user wants to investigate, compare, evaluate, and synthesize multiple relevant results. Because no general tools exist that can effectively analyze and differentiate multiple results, a user must manually read and comprehend potentially large results in an exploratory search. Such a process is time-consuming, labor-intensive, and error-prone. Interestingly, we find that the metadata embedded in structured data offers the potential to automate or semi-automate the comparison of multiple results.
In this article, we present an approach to differentiating search results on structured data. We define the differentiability of query results and quantify the degree of difference. We then define the problem of identifying a limited number of valid features in a result that maximally differentiate it from the others, and prove this problem NP-hard. We propose two local optimality conditions, single-swap and multi-swap, and design efficient algorithms that achieve them. We then present a feature-type-based approach that further improves the quality of the features identified for result differentiation. To show the usefulness of our approach, we implemented a system, CompareIt, which can be used to compare structured search results as well as any objects. Our empirical evaluation verifies the effectiveness and efficiency of the proposed approach.
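
The single-swap idea can be illustrated with a small sketch. This is not the article's algorithm or its formal definitions, which the abstract does not give. It is a toy model under stated assumptions: each result is a flat mapping from feature name to value, the degree of difference between two results counts the features that both selections show with different values, and the article's validity constraint on features is not modeled. The names `degree_of_difference`, `improving_swap`, and `single_swap_select` are hypothetical.

```python
from itertools import combinations


def degree_of_difference(results, selections):
    """Toy objective (an assumption, not the article's definition):
    for every pair of results, count the features shown in both
    selections whose values differ."""
    total = 0
    for i, j in combinations(range(len(results)), 2):
        shared = selections[i] & selections[j]
        total += sum(1 for f in shared if results[i][f] != results[j][f])
    return total


def improving_swap(results, selections, best):
    """Return (new_selections, new_score) for the first single swap that
    strictly raises the objective, or None if no such swap exists."""
    for i, result in enumerate(results):
        for out in sorted(selections[i]):
            for inn in sorted(set(result) - selections[i]):
                trial = list(selections)
                trial[i] = (selections[i] - {out}) | {inn}
                score = degree_of_difference(results, trial)
                if score > best:
                    return trial, score
    return None


def single_swap_select(results, limit):
    """Local search: start from an arbitrary selection of at most `limit`
    features per result and apply improving single swaps until none is left."""
    selections = [set(sorted(r)[:limit]) for r in results]
    best = degree_of_difference(results, selections)
    while True:
        step = improving_swap(results, selections, best)
        if step is None:
            return selections, best  # single-swap locally optimal
        selections, best = step


# Hypothetical toy data: three product-like results.
results = [
    {"brand": "A", "color": "red", "price": "10"},
    {"brand": "A", "price": "20", "size": "L"},
    {"brand": "A", "color": "blue", "price": "10"},
]
selections, score = single_swap_select(results, limit=2)
```

The loop terminates because every accepted swap strictly increases a bounded integer score. The result is only locally optimal: a selection may admit no improving single swap and still fall short of the global maximum, which is consistent with the problem being NP-hard and with the article proposing a second, multi-swap condition.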

      Published In

      ACM Transactions on Database Systems, Volume 37, Issue 1
      February 2012
      268 pages
      ISSN:0362-5915
      EISSN:1557-4644
      DOI:10.1145/2109196

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 06 March 2012
      Accepted: 01 October 2011
      Revised: 01 May 2011
      Received: 01 April 2010
      Published in TODS Volume 37, Issue 1

      Author Tags

      1. Keyword search
      2. XML data
      3. comparison
      4. databases
      5. differentiation
      6. result analysis
      7. structured data

      Qualifiers

      • Research-article
      • Research
      • Refereed
