ABSTRACT
A thesaurus is a reference work that groups words by similarity of meaning (listing synonyms and sometimes antonyms), in contrast to a dictionary, which provides definitions and pronunciations. A thesaurus typically uses three kinds of relationships: (1) equivalency, (2) hierarchy, and (3) association. This paper proposes a novel method for classifying general Persian texts with the help of a thesaurus. Our approach uses two of these relationship types: (1) equivalency and (2) hierarchy. Each relationship type has a tunable weight, and the paper searches over candidate weight values to find appropriate ones. A feature selection mechanism is then applied, and a range of machine learning algorithms are used as classifiers over the frequency-based features. Experimental results indicate that using the best weights for these relationships can lead to good classification results.
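The weighting idea above can be illustrated with a minimal sketch, assuming a term-frequency representation in which each occurrence of a term also credits its equivalent terms with a weight `w_eq` and its hierarchically related terms with a weight `w_hier`. Everything in the sketch is a hypothetical stand-in chosen for illustration: the thesaurus format, the toy English documents, the function names, the weight grid, and the nearest-centroid classifier with leave-one-out evaluation. The paper's actual weighting scheme, feature selection step, and classifiers are not reproduced here and may differ.

```python
from collections import Counter
from itertools import product
import math

# Hypothetical thesaurus format: term -> {"eq": equivalent terms, "hier": broader/narrower terms}.
Thesaurus = dict[str, dict[str, list[str]]]

# Toy data for illustration only (English stand-ins, not Persian texts from the paper).
TOY_THESAURUS: Thesaurus = {
    "car": {"eq": ["automobile"], "hier": ["vehicle"]},
    "automobile": {"eq": ["car"], "hier": ["vehicle"]},
    "football": {"eq": ["soccer"], "hier": ["sport"]},
    "soccer": {"eq": ["football"], "hier": ["sport"]},
}
TOY_DOCS = [["car", "engine"], ["automobile", "road"],
            ["football", "goal"], ["soccer", "match"]]
TOY_LABELS = ["auto", "auto", "sport", "sport"]


def weighted_features(tokens, thesaurus, w_eq, w_hier) -> Counter:
    """Term frequencies where each occurrence also credits equivalent terms
    with w_eq and hierarchically related terms with w_hier."""
    feats: Counter = Counter()
    for tok in tokens:
        feats[tok] += 1.0
        entry = thesaurus.get(tok, {})
        for syn in entry.get("eq", []):
            feats[syn] += w_eq
        for rel in entry.get("hier", []):
            feats[rel] += w_hier
    return feats


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: list[Counter]) -> Counter:
    """Component-wise mean of a list of sparse vectors."""
    total: Counter = Counter()
    for v in vectors:
        total.update(v)  # Counter.update adds values rather than replacing them
    return Counter({k: val / len(vectors) for k, val in total.items()})


def loo_accuracy(docs, labels, thesaurus, w_eq, w_hier) -> float:
    """Leave-one-out accuracy of a nearest-centroid (cosine) classifier."""
    vecs = [weighted_features(d, thesaurus, w_eq, w_hier) for d in docs]
    correct = 0
    for i in range(len(docs)):
        # Build class centroids from every document except the held-out one.
        by_class: dict[str, list[Counter]] = {}
        for j, (v, y) in enumerate(zip(vecs, labels)):
            if j != i:
                by_class.setdefault(y, []).append(v)
        cents = {y: centroid(vs) for y, vs in by_class.items()}
        pred = max(cents, key=lambda y: cosine(vecs[i], cents[y]))
        correct += pred == labels[i]
    return correct / len(docs)


def grid_search(docs, labels, thesaurus, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Try every (w_eq, w_hier) pair on the grid; return the first best pair."""
    best = max(product(grid, grid),
               key=lambda w: loo_accuracy(docs, labels, thesaurus, *w))
    return best, loo_accuracy(docs, labels, thesaurus, *best)


if __name__ == "__main__":
    print(grid_search(TOY_DOCS, TOY_LABELS, TOY_THESAURUS))
```

On this toy data the script prints `((0.0, 0.25), 1.0)`. With both weights at zero, documents that share no surface words with the rest of their class cannot be matched, while even a small hierarchy weight links "car" and "automobile" through "vehicle" and "football" and "soccer" through "sport". Because `max` keeps the first best pair, ties resolve to the earliest pair in grid order; a realistic study would tune the weights on held-out data rather than on a four-document toy set.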
Index Terms
- Exploring Weights of Hierarchical and Equivalency Relationship in General Persian Texts