DOI: 10.1145/1142351.1142352

Principles of dataspace systems

Published: 26 June 2006

ABSTRACT

The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required services over dataspaces. Unlike data integration systems, DSSPs do not require full semantic integration of the sources in order to provide useful services. This paper lays out specific technical challenges to realizing DSSPs and ties them to existing work in our field. We focus on query answering in DSSPs, the DSSP's ability to introspect on its content, and the use of human attention to enhance the semantic relationships in a dataspace.

Published in

PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2006
382 pages
ISBN: 1595933182
DOI: 10.1145/1142351

Copyright © 2006 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

PODS '06 paper acceptance rate: 35 of 185 submissions, 19%. Overall acceptance rate: 642 of 2,707 submissions, 24%.
