Sentiment analysis in multiple languages: Feature selection for opinion classification in Web forums

Authors:
Ahmed Abbasi, The University of Arizona, Tucson, AZ
Hsinchun Chen, The University of Arizona, Tucson, AZ
Arab Salem, The University of Arizona, Tucson, AZ

ACM Transactions on Information Systems, Volume 26, Issue 3, Article No. 12, pp. 1–34. https://doi.org/10.1145/1361684.1361685

Published: 20 June 2008

Abstract

The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
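
The information-gain heuristic mentioned in the abstract can be illustrated with a small sketch. This is only the standard entropy-based information gain used to rank discrete features, not the EWGA itself; how the paper combines this score with the genetic algorithm is not described in the abstract. The feature names, toy rows, and helper functions below are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted by descending information gain."""
    gains = [(name, information_gain([row[i] for row in rows], labels))
             for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: binary presence features for four documents.
names = ["has_exclamation", "has_great", "has_the"]
rows = [(1, 1, 1), (0, 1, 1), (1, 0, 1), (0, 0, 1)]
labels = ["pos", "pos", "neg", "neg"]

ranking = rank_features(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy example the feature that perfectly separates the two classes receives the highest score, while a feature present in every document receives none. A ranking of this kind is the sort of signal a hybrid method could use to bias its search toward informative features.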

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Language resources
  2. Machine learning
    1. Learning paradigms
      1. Unsupervised learning
        Cluster analysis

Published in

ACM Transactions on Information Systems Volume 26, Issue 3
June 2008
236 pages
ISSN: 1046-8188
EISSN: 1558-2868
DOI: 10.1145/1361684

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 June 2008
- Accepted: 1 July 2007
- Revised: 1 June 2007
- Received: 1 December 2006
Author Tags
Sentiment analysis
feature selection
opinion mining
text classification
Qualifiers
- research-article
- Research
- Refereed