DOI: 10.1145/3183713.3196910

Data Citation: Giving Credit Where Credit is Due

Published: 27 May 2018

Abstract

An increasing amount of information is being published in structured databases and retrieved using queries, raising the question of how query results should be cited. Since a database admits a large number of possible queries, one strategy is to specify citations for a small set of frequent queries, called citation views, and use these to construct citations for other "general" queries. We present three approaches to implementing citation views and describe alternative policies for the joint, alternate, and aggregated use of citation views. Extensive experiments using both synthetic and realistic citation views and queries show the trade-offs between the approaches in terms of the time to generate citations and the size of the resulting citation. They also show that the choice of policy has a large effect on both performance and citation size, leading to useful guidelines for which policies to use and how to specify citation views.
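
To make the idea of citation views concrete, here is a minimal Python sketch. It is not the paper's algorithm: it stands in for query rewriting with a much simpler assumption, namely that each view and each query result can be described by the set of row ids it touches. The names `CitationView`, `cite_joint`, and `cite_alternate` are hypothetical, and the two functions are only one loose reading of "joint" and "alternate" use. The joint case uses a greedy set-cover heuristic to keep the combined citation small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationView:
    """Hypothetical stand-in for a citation view: a named set of rows plus a citation."""
    name: str
    rows: frozenset  # ids of the database rows the view covers (an assumption)
    citation: str    # citation text attached to the view


def cite_joint(result_rows, views):
    """Joint use (assumed reading): combine the citations of several views
    that together cover the result, chosen greedily to keep the list short.
    Returns (sorted citations, rows that no view covers)."""
    uncovered = set(result_rows)
    chosen = []
    while uncovered and views:
        # Pick the view that covers the most still-uncovered rows.
        best = max(views, key=lambda v: len(v.rows & uncovered))
        if not best.rows & uncovered:
            break  # the remaining rows are covered by no view
        chosen.append(best)
        uncovered -= best.rows
    return sorted(v.citation for v in chosen), uncovered


def cite_alternate(result_rows, views):
    """Alternate use (assumed reading): when several single views each cover
    the whole result, pick one; here, the most specific (fewest rows)."""
    full = [v for v in views if set(result_rows) <= v.rows]
    if not full:
        return None
    return min(full, key=lambda v: (len(v.rows), v.name)).citation


views = [
    CitationView("V1", frozenset({1, 2, 3}), "Cite A"),
    CitationView("V2", frozenset({3, 4}), "Cite B"),
    CitationView("V3", frozenset({5, 6}), "Cite C"),
]

joint_citations, uncited = cite_joint({1, 2, 4}, views)
single_citation = cite_alternate({3}, views)
```

With these toy views, the result `{1, 2, 4}` needs V1 and V2 jointly, while the result `{3}` can be cited through either V1 or V2 alone, so a policy has to choose between them. The choice of policy changes both how much work is done and how long the final citation is, which is the trade-off the abstract describes.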

Published In

SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Author Tags

  1. data citation
  2. provenance
  3. scientific databases

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18
Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

