ABSTRACT
The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes, and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required services over dataspaces. Unlike data integration systems, DSSPs do not require full semantic integration of the sources in order to provide useful services. This paper lays out specific technical challenges to realizing DSSPs and ties them to existing work in our field. We focus on query answering in DSSPs, the DSSP's ability to introspect on its content, and the use of human attention to enhance the semantic relationships in a dataspace.