Abstract
The need to manage diverse information sources has triggered the rise of very loosely structured data models, known as dataspace models. Such information management systems must allow querying in simple ways, mostly by a form of searching. Motivated by these developments, we propose a theory of search queries in a general model of dataspaces. In this model, a dataspace is a collection of data objects, where each data object is a collection of data items. Basic search queries are expressed using filters on data items, following the basic model of Boolean search in information retrieval. We characterize semantically the class of queries that can be expressed by searching. We apply our theory to classical relational databases, where we connect search queries to the known class of fully generic queries, and to dataspaces where data items are formed by attribute-value pairs. We also extend our theory to a more powerful, associative form of searching, where one can ask for objects that are similar to objects satisfying given search conditions. Such associative search queries are shown to correspond to a very limited kind of joins. We show that the basic search language extended with associative search can exactly define the queries definable in a restricted fragment of the semijoin algebra working on an explicit relational representation of the dataspace.
- Abiteboul, S., Hull, R., and Vianu, V. 1995. Foundations of Databases. Addison-Wesley. Google ScholarDigital Library
- Agrawal, R., Somani, A., and Xu, Y. 2001. Storage and querying of e-commerce data. In Proceedings of the 27th International Conference on Very Large Data Bases. 149--158. Google ScholarDigital Library
- Aho, A., and Ullman, J. 1979. Universality of data retrieval languages. In Conference Record of the 6th ACM Symposium on Principles of Programming Languages. 110--120. Google ScholarDigital Library
- Alkhateeb, F., Baget, J.-F., and Euzenat, J. 2009. Extending SPARQL with regular expression patterns (for querying RDF). J. Web Sem. 7, 2, 57--73. Google ScholarDigital Library
- Beeri, C., Milo, T., and Ta-Shma, P. 1996. On genericity and parametricity. In Proceedings of the 15th ACM Symposium on Principles of Database Systems. 104--116. Google ScholarDigital Library
- Beeri, C., Milo, T., and Ta-Shma, P. 1997. Towards a language for the fully generic queries. In Database Programming Languages, S. Cluet and R. Hull, Eds. Lecture Notes in Computer Science. Springer, 239--259. Google ScholarDigital Library
- Blackburn, P., de Rijke, M., and Venema, Y. 2001. Modal Logic. Cambridge University Press. Google ScholarDigital Library
- Blackburn, P., van Benthem, J., and Wolter, F., Eds. 2007. Handbook of Modal Logic. Elsevier.Google Scholar
- Chandra, A., and Harel, D. 1980. Computable queries for relational data bases. J. Comput. Syst. Sci. 21, 2, 156--178.Google ScholarCross Ref
- DeCandia, G., Hastorun, D. Jampani, M., Kakulapati, G., Lakshman, A., Pilahin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. 2007. Dynamo: Amazon's highly available key-value store. SIGOPS Oper. Syst. Rev. 41, 6, 205--220. Google ScholarDigital Library
- Dittrich, J.-P., and Vaz Salles, M. 2006. iDM: A unified and versatile data model for personal dataspace management. In Proceedings of the 32nd International Conference on Very Large Data Bases. 367--378. Google ScholarDigital Library
- Dong, X., and Halevy, A. 2007. Indexing dataspaces. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 43--54. Google ScholarDigital Library
- Fletcher, G. H. L. 2008. An algebra for basic graph patterns. Logic in Databases (LID). Informal Proceedings.Google Scholar
- Fletcher, G. H. L., Van den Bussche, J., Van Gucht, D., and Vansummeren, S. 2009. Towards a theory of search queries. In Proceedings of the 12th International Conference on Database Theory. 201--211. Google ScholarDigital Library
- Golenberg, K., Kimelfeld, B., and Sagiv, Y. 2008. Keyword proximity search in complex data graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 927--940. Google ScholarDigital Library
- Goranko, V., and Otto, M. 2007. Model theory of modal logic. in Handbook of Hoda Logic, P. Blackburn, J. van Benthem, and F. Wolter, Ed., Chapter 5.Google Scholar
- Gutierrez, C., Hurtado, C., and Mendelzon, A. 2004. Foundations of semantic Web databases. In Proceedings of the 23rd ACM Symposium on Principles of Database Systems. 95--106. Google ScholarDigital Library
- Halevy, A., Franklin, M., and Maier, D. 2006. Principles of dataspace systems. In Proceedings of the 25th ACM Symposium on Principles of Database Systems. 1--9. Google ScholarDigital Library
- Jagadish, H., Chapman A., Elkiss, A., Jayapandian, M., Li, Y., Nandi, A., and Yu, C. 2007. Making database systems usable. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 13--24. Google ScholarDigital Library
- Jain, M., Mendhekar, A., and Van Gucht, D. 1995. A uniform data model for relational data and meta-data query processing. In Advances in Data Management '95, Tata McGraw-Hill, 146--165.Google Scholar
- Leinders, D., Marx, M., Tyszkiewicz, J., and Van den Bussche, J. 2005. The semijoin algebra and the guarded fragment. J. Logic Lang. Inform. 14, 331--343. Google ScholarDigital Library
- Leinders, D., and Van den Bussche, J. 2007. On the complexity of division and set joins in the relational algebra. J. Comput. Syst. Sci. 73, 4, 538--549. Google ScholarDigital Library
- Litwin, W., Ketabchi, M., and Krishnamurthy, R. 1991. First order normal form for relational databases and multidatabases. SIGMOD Rec. 20, 4, 74--76. Google ScholarDigital Library
- Manning, C., Raghavan, P., and Shütze, H. 2008. Introduction to Information Retrieval. Cambridge University Press. Google ScholarDigital Library
- Olston, C., Reed, B., Srivastava, U., Kumar, R., and Tomkins, A. 2008. Pig latin: a not-so-foreign language for data processing. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 1099--1110. Google ScholarDigital Library
- Pérez, J., Arenas, M., and Gutierrez, C. 2008. nSPARQL: A navigational language for RDF. In Proceedings of the International Semantic Web Conference. 66--81. Google ScholarDigital Library
- Pérez, J., Arenas, M., and Gutierrez, C. 2009. Semantics and complexity of SPARQL. ACM Trans. Datab. Syst. 34, 3. Google ScholarDigital Library
- Qin, L., Yu, J.X., and Chang, L. 2009. Keyword search in databases: the power of RDBMS. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 681--694. Google ScholarDigital Library
- Ross, K., and Janevski, A. 2005. Querying faceted databases. In Proceedings of the 2nd International Workshop on Semantic Web and Databases. Lecture Notes in Computer Science, vol. 3372, Springer, 199--218. Google ScholarDigital Library
- Sarawagi, S., and Kirpal, A. 2004. Efficient set joins on similarity predicates. In Proceedings of the ACM SIGMOD International Conference on Management of Data. 743--754. Google ScholarDigital Library
- Ullman, J. 1989. Principles of Database and Knowledge-Base Systems. Vol. II. Computer Science Press. Google ScholarDigital Library
- W3C. 2004. RDF primer. W3C Recommendation.Google Scholar
- W3C. 2008. SPARQL query language for RDF. W3C Recommendation.Google Scholar
Index Terms
- Towards a theory of search queries
Recommendations
Towards a theory of search queries
ICDT '09: Proceedings of the 12th International Conference on Database TheoryThe need to manage diverse information sources has triggered the rise of very loosely structured data models, known as "dataspace models." Such information management systems must allow querying in simple ways, mostly by a form of searching. Motivated ...
Materialization and Decomposition of Dataspaces for Efficient Search
Dataspaces consist of large-scale heterogeneous data. The query interface of accessing tuples should be provided as a fundamental facility by practical dataspace systems. Previously, an efficient index has been proposed for queries with keyword ...
Identifying regional sensitive queries in web search
WWW '08: Proceedings of the 17th international conference on World Wide WebIn Web search ranking, the expected results for some queries could vary greatly depending upon location of the user. We name such queries regional sensitive queries. Identifying regional sensitivity of queries is important to meet users' needs. The ...
Comments