Article

Principles of dataspace systems

Authors:
Alon Halevy, Google Inc., Mountain View, CA
Michael Franklin, UC Berkeley, Berkeley, CA
David Maier, Portland State University, Portland, OR

PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, June 2006, pages 1–9. https://doi.org/10.1145/1142351.1142352

Published: 26 June 2006

ABSTRACT

The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, "smart" homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required services over dataspaces. Unlike data integration systems, DSSPs do not require full semantic integration of the sources in order to provide useful services. This paper lays out specific technical challenges to realizing DSSPs and ties them to existing work in our field. We focus on query answering in DSSPs, the DSSP's ability to introspect on its content, and the use of human attention to enhance the semantic relationships in a dataspace.

Published in
PODS '06: Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2006
382 pages
ISBN: 1595933182
DOI: 10.1145/1142351
General Chair: Georg Gottlob, Vienna University of Technology
Program Chair: Jan Van den Bussche, Hasselt University
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
data integration
dataspaces
information retrieval and databases
personal information management

Acceptance Rates
PODS '06 paper acceptance rate: 35 of 185 submissions (19%)
Overall PODS acceptance rate: 642 of 2,707 submissions (24%)

Article Metrics
- Total citations: 203
- Total downloads: 2,769
- Downloads (last 12 months): 178
- Downloads (last 6 weeks): 19