ABSTRACT
In this work, we propose a method to improve the performance of biomedical article classification. We use Naïve Bayes and Maximum Entropy classifiers to classify real-world biomedical articles. We describe a technique based on the chi-square measure that discards irrelevant information from the data and identifies the keywords most relevant to the classification task. To improve classification performance, we use two merging operators, Max and Harmonic Mean, proposed by Jongwoo et al. (2010), to combine the results of the two classifiers. The results show that the Maximum Entropy classifier performs better when the 500 most relevant keywords are used. We also show that combining the results of the two classifiers improves classification performance on real-world biomedical data.
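The two ideas named in the abstract, chi-square keyword selection and merging of classifier outputs, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this assumes the standard chi-square statistic on a 2x2 term/class contingency table, and it assumes the Max and Harmonic Mean operators are applied to the per-class probabilities produced by the two classifiers. All function names (`chi_square`, `top_keywords`, `merge_max`, `merge_harmonic`) are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def chi_square(n11: int, n10: int, n01: int, n00: int) -> float:
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        # A degenerate table carries no evidence of association.
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def top_keywords(docs: List[Set[str]], labels: List[bool], k: int) -> List[str]:
    """Rank terms by chi-square against a binary label and keep the top k."""
    vocab: Set[str] = set().union(*docs)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    scores: Dict[str, float] = {}
    for term in vocab:
        n11 = sum(1 for d, y in zip(docs, labels) if y and term in d)
        n10 = sum(1 for d, y in zip(docs, labels) if not y and term in d)
        scores[term] = chi_square(n11, n10, n_pos - n11, n_neg - n10)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def merge_max(p_nb: float, p_me: float) -> float:
    """Assumed Max operator: keep the larger of the two class probabilities."""
    return max(p_nb, p_me)


def merge_harmonic(p_nb: float, p_me: float) -> float:
    """Assumed Harmonic Mean operator over the two class probabilities."""
    if p_nb + p_me == 0:
        return 0.0
    return 2 * p_nb * p_me / (p_nb + p_me)
```

In this sketch, `p_nb` and `p_me` stand for the probability that the Naïve Bayes and Maximum Entropy classifiers, respectively, assign to the same class for a given article. The harmonic mean rewards agreement between the classifiers, while the maximum trusts whichever classifier is more confident.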
Index Terms
- Toward Improving Classification of Real World Biomedical Articles