DOI: 10.1145/1255175.1255178
Article

Categorization and analysis of text in computer mediated communication archives using visualization

Published: 18 June 2007

Abstract

Digital libraries (DLs) of online discourse contain large amounts of valuable information that is difficult to navigate and analyze. Visualization systems developed to improve the analysis and navigation of computer-mediated communication (CMC) archives focus primarily on interaction information, with little emphasis on textual content. In this paper we present a system that provides DL exploration services such as visualization, categorization, and analysis for CMC text. The system incorporates an extended feature set composed of stylistic, topical, and sentiment-related features to enable richer content representation. The system also includes the Ink Blot technique, which uses decision tree models and text overlay to visualize CMC messages. Ink Blots can be used for text categorization and analysis across forums, authors, threads, and messages, as well as over time. The system's analysis capabilities were evaluated with a series of examples and a qualitative user study. Empirical categorization experiments comparing the Ink Blot technique against a benchmark support vector machine classifier were also conducted. The results demonstrated the efficacy of the Ink Blot technique for text categorization and showed that the extended feature set improves categorization performance.
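The abstract does not spell out how the Ink Blot technique combines decision tree models with text overlay, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that features are plain word tokens, that they are ranked by information gain (the split criterion commonly used when growing decision trees), and that an overlay is a list of character spans tagged with a class. All function names and the toy messages are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Return lowercase word tokens with their character offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z']+", text)]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(token_sets, labels, token):
    """Entropy reduction from splitting messages on presence of `token`."""
    present = [lab for toks, lab in zip(token_sets, labels) if token in toks]
    absent = [lab for toks, lab in zip(token_sets, labels) if token not in toks]
    n = len(labels)
    remainder = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder


def select_features(messages, labels, k=3):
    """Pick the k tokens with the highest information gain.

    Each selected token is mapped to the class in which it occurs most often.
    Ties in gain are broken alphabetically (an arbitrary, assumed choice).
    """
    token_sets = [{t for t, _, _ in tokenize(msg)} for msg in messages]
    vocab = sorted(set().union(*token_sets))
    ranked = sorted(vocab,
                    key=lambda tok: (-information_gain(token_sets, labels, tok), tok))
    selected = {}
    for tok in ranked[:k]:
        counts = Counter(lab for toks, lab in zip(token_sets, labels) if tok in toks)
        selected[tok] = counts.most_common(1)[0][0]
    return selected


def overlay(text, selected):
    """Return (start, end, class) spans marking selected tokens in `text`."""
    return [(start, end, selected[tok])
            for tok, start, end in tokenize(text) if tok in selected]


if __name__ == "__main__":
    # Toy, made-up training messages for two classes.
    messages = ["great helpful answer", "helpful and great thread",
                "awful spam post", "awful rude spam"]
    labels = ["pos", "pos", "neg", "neg"]
    selected = select_features(messages, labels, k=3)
    print(selected)
    print(overlay("A great but awful idea", selected))
```

In a viewer, each returned span could be drawn as a colored blot over the message text, with the color chosen by class. A full decision tree would also capture interactions between features, which this single-level ranking ignores.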

      Published In

      JCDL '07: Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
      June 2007
      534 pages
      ISBN: 9781595936448
      DOI: 10.1145/1255175

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. computer mediated communication
      2. text mining
      3. visualization

      Qualifiers

      • Article

      Conference

      JCDL07: Joint Conference on Digital Libraries
      June 18-23, 2007
      Vancouver, BC, Canada

      Acceptance Rates

      Overall Acceptance Rate 415 of 1,482 submissions, 28%

      Cited By
      • (2022) Visualizing Large Collections of URLs Using the Hilbert Curve. Machine Learning and Knowledge Extraction, pp. 270-289. DOI: 10.1007/978-3-031-14463-9_18. Online publication date: 23-Aug-2022
      • (2019) Register in computational language research. Register Studies 1:1, pp. 100-135. DOI: 10.1075/rs.18015.arg. Online publication date: 26-Apr-2019
      • (2017) The State of the Art in Sentiment Visualization. Computer Graphics Forum 37:1, pp. 71-96. DOI: 10.1111/cgf.13217. Online publication date: 12-Jun-2017
      • (2016) Guidelines for Effective Usage of Text Highlighting Techniques. IEEE Transactions on Visualization and Computer Graphics 22:1, pp. 489-498. DOI: 10.1109/TVCG.2015.2467759. Online publication date: 31-Jan-2016
      • (2014) PerCon. Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 97-106. DOI: 10.5555/2740769.2740786. Online publication date: 8-Sep-2014
      • (2014) Descriptive Analytics. Proceedings of the 2014 IEEE Joint Intelligence and Security Informatics Conference, pp. 56-63. DOI: 10.1109/JISIC.2014.18. Online publication date: 24-Sep-2014
      • (2014) PerCon: A personal digital library for heterogeneous data. IEEE/ACM Joint Conference on Digital Libraries, pp. 97-106. DOI: 10.1109/JCDL.2014.6970155. Online publication date: Sep-2014
      • (2013) Rule-based visual mappings - with a case study on poetry visualization. Proceedings of the 15th Eurographics Conference on Visualization, pp. 381-390. DOI: 10.1111/cgf.12125. Online publication date: 17-Jun-2013
      • (2013) Fingerprint matrices. Proceedings of the 15th Eurographics Conference on Visualization, pp. 371-380. DOI: 10.1111/cgf.12124. Online publication date: 17-Jun-2013
      • (2012) Relative N-gram signatures. Proceedings of the 2012 IEEE Conference on Visual Analytics Science and Technology (VAST), pp. 103-112. DOI: 10.1109/VAST.2012.6400484. Online publication date: 14-Oct-2012