
Coding-Data Portability in Systematic Literature Reviews: a W3C's Open Annotation Approach


ABSTRACT

Systematic Literature Reviews (SLRs) are increasingly popular for categorizing research and identifying research gaps. Their reliability largely depends on the rigour with which evidence is identified, appraised, and aggregated through coding, i.e. the process of examining and organizing the data contained in primary studies in order to answer the research questions. Current Qualitative Data Analysis Software (QDAS) lacks a common format. This jeopardizes reuse (coding data is difficult to share among different tools), evolution (coding data is difficult to turn into living documents that evolve as new research is published), and replicability (coding data is difficult for third parties to access and query). Yet, the results of a recent survey indicate that 71.4% of participants (expert SLR reviewers) are ready to share SLR artifacts in a common repository. On the road towards open coding-data repositories, this work looks into W3C's Open Annotation as a way to RDFize coding data. The benefits include portability (W3C's standing encourages tool vendors to adopt the standard), webization (coding data becomes URL-addressable, hence openly reachable), and data linkage (RDFized coding data can be queried, reasoned over, and linked with external vocabularies using standard Web technologies). This paper rephrases coding practices as annotation practices whose data is captured as W3C Open Annotations. Using an open annotation repository (Hypothes.is), the paper illustrates how such a repository can be populated with coding data. Deployability is demonstrated through two clients built on top of this repository: (1) a write client that populates the repository through a color-coding highlighter, and (2) a read client that obtains traditional SLR spreadsheets by querying the so-populated repository.
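To make the coding-as-annotation mapping concrete, the sketch below expresses a single coding datum as a W3C Web Annotation and pushes it to Hypothes.is, roughly the direction of the paper's write client. It is a minimal illustration, not the authors' implementation: the reviewer URI, the "slr:" code values, and the target DOI are hypothetical placeholders, and Hypothes.is accepts its own JSON dialect (uri, text, tags, target) rather than raw JSON-LD, so the POST maps between the two.

```python
import requests  # third-party: pip install requests

# One coding datum expressed as a W3C Web Annotation (JSON-LD).
# Creator URI, "slr:" codes, and the target DOI are placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "classifying",  # coding a quote under a category
    "creator": "https://example.org/reviewers/jane",
    "body": [{
        "type": "TextualBody",
        "purpose": "classifying",
        "value": "slr:ValidationResearch",  # the code being applied
    }],
    "target": {
        "source": "https://doi.org/10.1145/0000000.0000000",  # primary study
        "selector": {
            "type": "TextQuoteSelector",  # anchors the code to a quote
            "exact": "we evaluated the approach in an industrial case study",
            "prefix": "To assess validity, ",
            "suffix": " with three practitioners.",
        },
    },
}

# Hypothes.is stores annotations via its REST API. Its native JSON is
# close to, but not literally, the W3C model, hence the mapping below.
API_TOKEN = "YOUR_HYPOTHESIS_API_TOKEN"  # from a Hypothes.is developer account
resp = requests.post(
    "https://api.hypothes.is/api/annotations",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "uri": annotation["target"]["source"],
        "text": annotation["body"][0]["value"],
        "tags": [annotation["body"][0]["value"]],  # tags make codes queryable
        "target": [annotation["target"]],
    },
)
resp.raise_for_status()
print("stored annotation:", resp.json()["id"])
```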



          Reviews

Reviewer: Soon Ae Chun

Systematic literature reviews (SLRs) involve several steps: the planning step, which identifies the scope of the literature according to the research goals and develops a coding protocol; the analysis step, which searches for relevant literature and performs analysis, coding, and data/evidence extraction; and the reporting step, which synthesizes and evaluates the reviews. The most challenging task is coding, which extracts from primary sources the data researchers need to address the SLR questions. This data includes publication metadata (for example, authors, year, title), context descriptions (for example, subjects, technologies, settings), and findings (for example, results, behaviors, actions). Some tasks, such as metadata extraction, can be easily automated; others require human qualitative coding linked to the textual parts of the sources. Spreadsheets or proprietary tools, for example, qualitative data analysis software (QDAS), have been used to record coding data by different reviewers, but these tools lack portability and reusability.

The authors propose an alternative: use the World Wide Web Consortium (W3C) web annotation data model (that is, the Resource Description Framework, RDF) and vocabulary to capture the coding data as web resources, as the open standard promotes data portability, interoperability, vendor neutrality, and data linkage to the code sources in the text passages. The coding data in RDF forms a linked dataset in which web-addressable primary studies (or entities) can be linked to diverse classifications of coding in different SLRs. To illustrate how the coding mechanism works using web annotation, the authors develop a browser extension that allows reviewers to create the code vocabulary ("codeBookDevelopment"), define links between category codes ("categorization"), annotate selected quotes in the text as codes ("classifying"), and validate the codes ("assessing").

The use of open standards to enable the coding of literature studies is shown to be easily deployable and fit for addressing coding needs. It would have been much more convincing if the codes in the open standard were portable to other tools, or vice versa, to emphasize the reuse and portability of existing codes. All coders must use a tool that is compliant with the W3C web annotation data model, which requires mass adoption of the proposed scheme. Also lacking is an analysis of how existing coding tools might inhibit adoption of the proposed web annotation model; existing tools are equipped with not only coding strategies but also evaluative functions and text analyses. The study can be useful for researchers and students who conduct systematic literature reviews, but the functionality of standard-compliant tools needs to mature enough to compete with existing tools to achieve wider adoption.
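As a counterpart to the write-client sketch above, the following is a minimal sketch of the read-client direction described in the paper: querying the Hypothes.is search API for annotations carrying coding tags and pivoting the results into a one-row-per-study spreadsheet. The code names and CSV layout are assumptions for illustration; /api/search with tag and limit parameters is part of the public Hypothes.is API.

```python
import csv
import requests  # third-party: pip install requests

# Illustrative codebook; real code names would come from the SLR protocol.
CODES = ["slr:ValidationResearch", "slr:SolutionProposal"]

# Filtering by tag retrieves every annotation that applied a given code.
table = {}  # primary-study URI -> {code: "x"}
for code in CODES:
    resp = requests.get(
        "https://api.hypothes.is/api/search",
        params={"tag": code, "limit": 200},
    )
    resp.raise_for_status()
    for row in resp.json()["rows"]:
        table.setdefault(row["uri"], {})[code] = "x"

# Pivot into the familiar SLR spreadsheet: one row per primary study,
# one column per code.
with open("slr.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["primary study"] + CODES)
    for uri, applied in sorted(table.items()):
        writer.writerow([uri] + [applied.get(code, "") for code in CODES])
```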

Published in

EASE '19: Proceedings of the 23rd International Conference on Evaluation and Assessment in Software Engineering
April 2019
345 pages
ISBN: 9781450371452
DOI: 10.1145/3319008

Copyright © 2019 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 15 April 2019


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

EASE '19 Paper Acceptance Rate: 20 of 73 submissions, 27%. Overall Acceptance Rate: 71 of 232 submissions, 31%.
