Towards automated content analysis of discussion transcripts: a cognitive presence case

Authors:
Vitomir Kovanović (The University of Edinburgh, Edinburgh, UK),
Srećko Joksimović (The University of Edinburgh, Edinburgh, UK),
Zak Waters (Queensland University of Technology, Brisbane, Australia),
Dragan Gašević (The University of Edinburgh, Edinburgh, UK),
Kirsty Kitto (Queensland University of Technology, Brisbane, Australia),
Marek Hatala (Simon Fraser University, Burnaby, Canada),
George Siemens (University of Texas at Arlington, Arlington)

LAK '16: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge, April 2016, Pages 15–24
https://doi.org/10.1145/2883851.2883950

Published: 25 April 2016

ABSTRACT

In this paper, we present the results of an exploratory study that examined the problem of automating content analysis of student online discussion transcripts. We looked at the problem of coding discussion transcripts for the levels of cognitive presence, one of the three main constructs in the Community of Inquiry (CoI) model of distance education. Using Coh-Metrix and LIWC features, together with a set of custom features developed to capture discussion context, we developed a random forest classification system that achieved 70.3% classification accuracy and 0.63 Cohen's kappa, which is significantly higher than values reported in previous studies. Besides the improvement in classification accuracy, the developed system is also less sensitive to overfitting, as it uses only 205 classification features, around 100 times fewer than similar systems based on bag-of-words features. We also provide an overview of the classification features most indicative of the different phases of cognitive presence, which gives additional insights into the nature of the cognitive presence learning cycle. Overall, our results show the great potential of the proposed approach, with the added benefit of providing further characterization of the cognitive presence coding scheme.
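
The two evaluation metrics named in the abstract, classification accuracy and Cohen's kappa, can be illustrated with a minimal Python sketch. This is not the authors' code or data: the label names, the toy `human` and `model` sequences, and the helper functions `accuracy` and `cohens_kappa` are all assumptions made for illustration. Kappa is computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the marginal label frequencies.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of messages where the predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)


def cohens_kappa(true_labels, predicted_labels):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement is exactly 1,
    i.e. when both sequences use one and the same single label.
    """
    n = len(true_labels)
    p_o = accuracy(true_labels, predicted_labels)
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    # Chance agreement: sum over labels of the product of marginal proportions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy data: codes from a human coder vs. a classifier (made up).
human = ["exploration", "exploration", "integration", "triggering",
         "other", "resolution", "integration", "exploration"]
model = ["exploration", "integration", "integration", "triggering",
         "other", "integration", "integration", "exploration"]

print(f"accuracy = {accuracy(human, model):.3f}")
print(f"kappa    = {cohens_kappa(human, model):.3f}")
```

Because kappa discounts agreement that would occur by chance, it is lower than raw accuracy whenever some labels are much more frequent than others, which is one reason to report both figures together.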

Index Terms

Towards automated content analysis of discussion transcripts: a cognitive presence case
1. Applied computing
  1. Education
    1. Distance learning
    2. E-learning
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification

Published in
LAK '16: Proceedings of the Sixth International Conference on Learning Analytics & Knowledge
April 2016
567 pages
ISBN: 9781450341905
DOI: 10.1145/2883851
General Chairs: Dragan Gašević (University of Edinburgh, United Kingdom), Grace Lynch (Society for Learning Analytics Research)
Program Chairs: Shane Dawson (University of South Australia), Hendrik Drachsler (University of the Netherlands), Carolyn Penstein Rosé (Carnegie Mellon University)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
community of inquiry (CoI) model
content analysis
content analytics
online discussions
text classification
Qualifiers
- research-article
Acceptance Rates
LAK '16 paper acceptance rate: 36 of 116 submissions, 31%
Overall acceptance rate: 236 of 782 submissions, 30%