DOI: 10.1145/2883851.2883950
research-article

Towards automated content analysis of discussion transcripts: a cognitive presence case

Published: 25 April 2016

ABSTRACT

In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and a Cohen's kappa of 0.63, both significantly higher than the values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less prone to overfitting because it uses only 205 classification features, roughly 100 times fewer than similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which offers additional insight into the nature of the cognitive presence learning cycle. Overall, our results show the great potential of the proposed approach, with the added benefit of further characterizing the cognitive presence coding scheme.
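
The abstract reports both classification accuracy and Cohen's kappa. As a minimal sketch of how the two measures relate, the Python snippet below computes them from a pair of label sequences. It is not the authors' evaluation code: the function names and the toy labels are hypothetical and chosen only for illustration, and the phase names are assumed from the CoI literature rather than taken from this page.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items on which the two label sequences agree."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def cohen_kappa(gold, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(gold)
    observed = accuracy(gold, predicted)
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Chance agreement: probability that both raters pick the same label
    # if each assigned labels independently at their own marginal rates.
    expected = sum(gold_counts[c] * pred_counts[c] for c in gold_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both sequences use one identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data (illustrative labels, not taken from the study).
gold = ["triggering", "exploration", "exploration", "integration",
        "integration", "resolution", "other", "exploration"]
predicted = ["triggering", "exploration", "integration", "integration",
             "integration", "resolution", "other", "other"]

print(f"accuracy = {accuracy(gold, predicted):.3f}")
print(f"kappa    = {cohen_kappa(gold, predicted):.3f}")
```

Because kappa discounts the agreement expected by chance, it is lower than raw accuracy whenever some agreement could arise from the label distribution alone, which is why the two figures in the abstract differ.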

Published in

LAK '16: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge
April 2016
567 pages
ISBN: 9781450341905
DOI: 10.1145/2883851

Copyright © 2016 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

LAK '16 paper acceptance rate: 36 of 116 submissions, 31%
Overall acceptance rate: 236 of 782 submissions, 30%