
Coding-Data Portability in Systematic Literature Reviews: a W3C's Open Annotation Approach


ABSTRACT

Systematic Literature Reviews (SLRs) are increasingly popular for categorizing research and identifying research gaps. Their reliability largely depends on the rigour with which evidence is identified, appraised, and aggregated through coding, i.e. the process of examining and organizing the data contained in primary studies in order to answer the research questions. Current Qualitative Data Analysis Software (QDAS) lacks a common format. This jeopardizes reuse (coding data is difficult to share among different tools), evolution (coding data is difficult to turn into living documents that evolve as new research is published), and replicability (coding data is difficult for third parties to access and query). Yet, the results of a recent survey indicate that 71.4% of participants (expert SLR reviewers) are ready to share SLR artifacts in a common repository. On the road towards open coding-data repositories, this work looks into W3C's Open Annotation as a way to RDFize coding data. The benefits include portability (W3C's standing encourages tool vendors to adopt the standard), webization (coding data becomes URL-addressable, hence openly reachable), and data linkage (RDFized coding data can be queried, reasoned over, and linked with external vocabularies using standard Web technologies). This paper rephrases coding practices as annotation practices whose data is captured as W3C Open Annotations. Using an open annotation repository (Hypothes.is), the paper illustrates how such a repository can be populated with coding data. Deployability is demonstrated through two clients built on top of this repository: (1) a write client that populates the repository through a color-coding highlighter, and (2) a read client that obtains traditional SLR spreadsheets by querying the so-populated repository.
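To make the coding-as-annotation mapping concrete, the sketch below expresses a single coding datum as a W3C Web Annotation and pushes it to Hypothes.is, roughly the direction of the paper's write client. It is a minimal illustration, not the authors' implementation: the reviewer URI, the "slr:" code values, and the target DOI are hypothetical placeholders, and Hypothes.is accepts its own JSON dialect (uri, text, tags, target) rather than raw JSON-LD, so the POST maps between the two.

```python
import requests  # third-party: pip install requests

# One coding datum expressed as a W3C Web Annotation (JSON-LD).
# Creator URI, "slr:" codes, and the target DOI are placeholders.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "type": "Annotation",
    "motivation": "classifying",  # coding a quote under a category
    "creator": "https://example.org/reviewers/jane",
    "body": [{
        "type": "TextualBody",
        "purpose": "classifying",
        "value": "slr:ValidationResearch",  # the code being applied
    }],
    "target": {
        "source": "https://doi.org/10.1145/0000000.0000000",  # primary study
        "selector": {
            "type": "TextQuoteSelector",  # anchors the code to a quote
            "exact": "we evaluated the approach in an industrial case study",
            "prefix": "To assess validity, ",
            "suffix": " with three practitioners.",
        },
    },
}

# Hypothes.is stores annotations via its REST API. Its native JSON is
# close to, but not literally, the W3C model, hence the mapping below.
API_TOKEN = "YOUR_HYPOTHESIS_API_TOKEN"  # from a Hypothes.is developer account
resp = requests.post(
    "https://api.hypothes.is/api/annotations",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={
        "uri": annotation["target"]["source"],
        "text": annotation["body"][0]["value"],
        "tags": [annotation["body"][0]["value"]],  # tags make codes queryable
        "target": [annotation["target"]],
    },
)
resp.raise_for_status()
print("stored annotation:", resp.json()["id"])
```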



          Reviews

Reviewer: Soon Ae Chun

Systematic literature reviews (SLRs) involve several steps: the planning step, which identifies the scope of the literature according to the research goals and develops a coding protocol; the analysis step, which searches for relevant literature and performs analysis, coding, and data/evidence extraction; and the reporting step, which synthesizes and evaluates the reviews. The most challenging task is coding, which extracts from primary sources the data researchers need to address the SLR questions. This data includes publication metadata (for example, authors, year, title), context descriptions (for example, subjects, technologies, settings), and findings (for example, results, behaviors, actions). Some tasks, such as metadata extraction, can be easily automated; others require human qualitative coding linked to the textual parts of the sources. Spreadsheets or proprietary tools, for example, qualitative data analysis software (QDAS), have been used to record coding data by different reviewers, but these tools lack portability and reusability.

The authors propose an alternative: use the World Wide Web Consortium (W3C) web annotation data model (that is, the Resource Description Framework, RDF) and vocabulary to capture the coding data as web resources, as the open standard promotes data portability, interoperability, vendor neutrality, and data linkage to the code sources in the text passages. The coding data in RDF forms a linked dataset in which web-addressable primary studies (or entities) can be linked to diverse classifications of coding in different SLRs. To illustrate how the coding mechanism works using web annotation, the authors develop a browser extension that allows reviewers to create the code vocabulary ("codeBookDevelopment"), define links between category codes ("categorization"), annotate selected quotes in the text as codes ("classifying"), and validate the codes ("assessing").

The use of open standards to enable the coding of literature studies is shown to be easily deployable and fit for addressing coding needs. It would have been much more convincing if the codes in the open standard were portable to other tools, or vice versa, to emphasize the reuse and portability of existing codes. All coders must use a tool that is compliant with the W3C web annotation data model, which requires mass adoption of the proposed scheme. Also lacking is an analysis of how existing coding tools might inhibit adoption of the proposed web annotation model; existing tools are equipped with not only coding strategies but also evaluative functions and text analyses. The study can be useful for researchers and students who conduct systematic literature reviews, but the functionality of standard-compliant tools needs to mature enough to compete with existing tools to achieve wider adoption.
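As a counterpart to the write-client sketch above, the following is a minimal sketch of the read-client direction described in the paper: querying the Hypothes.is search API for annotations carrying coding tags and pivoting the results into a one-row-per-study spreadsheet. The code names and CSV layout are assumptions for illustration; /api/search with tag and limit parameters is part of the public Hypothes.is API.

```python
import csv
import requests  # third-party: pip install requests

# Illustrative codebook; real code names would come from the SLR protocol.
CODES = ["slr:ValidationResearch", "slr:SolutionProposal"]

# Filtering by tag retrieves every annotation that applied a given code.
table = {}  # primary-study URI -> {code: "x"}
for code in CODES:
    resp = requests.get(
        "https://api.hypothes.is/api/search",
        params={"tag": code, "limit": 200},
    )
    resp.raise_for_status()
    for row in resp.json()["rows"]:
        table.setdefault(row["uri"], {})[code] = "x"

# Pivot into the familiar SLR spreadsheet: one row per primary study,
# one column per code.
with open("slr.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["primary study"] + CODES)
    for uri, applied in sorted(table.items()):
        writer.writerow([uri] + [applied.get(code, "") for code in CODES])
```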

Published in

EASE '19: Proceedings of the 23rd International Conference on Evaluation and Assessment in Software Engineering
April 2019
345 pages
ISBN: 9781450371452
DOI: 10.1145/3319008

Copyright © 2019 ACM. Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 15 April 2019


Qualifiers

• research-article
• Research
• Refereed limited

Acceptance Rates

EASE '19 Paper Acceptance Rate: 20 of 73 submissions, 27%. Overall Acceptance Rate: 71 of 232 submissions, 31%.
