ABSTRACT
Subject or propositional content has been the focus of most classification research. Genre or style, on the other hand, is a different and important property of text, and automatic text genre classification is becoming important for classification and retrieval purposes as well as for some natural language processing research. In this paper, we present a method for automatic genre classification that is based on statistically selected features obtained from both subject-classified and genre-classified training data. The experimental results show that the proposed method outperforms a direct application of a statistical learner often used for subject classification. We also observe that the deviation formula and the discrimination formula using document frequency ratios work as expected. We conjecture that this dual feature set approach can be generalized to improve the performance of subject classification as well.
Index Terms
- Text genre classification with genre-revealing and subject-revealing features