ABSTRACT
In recent years, clustering has become a critical success factor for data analysis. Most clustering methods are sensitive to outliers, noise, presentation order, configuration architecture, Bellman's curse of dimensionality, and complex shapes. They use cost functions to reflect general knowledge about the internal structure and distribution of the target data, but they provide no mechanism to reflect the dynamics of the clustering environment on the data set. Hence, in the present study, an alternative numerical scheme (SC) is proposed to enhance the predictive accuracy of clustering. Our approach exploits variable selection techniques and Fuzzy Adaptive Resonance Theory to increase the productivity of knowledge extraction.
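To make the Fuzzy Adaptive Resonance Theory component more concrete, the following is a minimal sketch of the textbook Fuzzy ART update (complement coding, choice function, vigilance test, learning rule). It is not the authors' implementation: the class name `FuzzyART`, the parameter names `rho`, `alpha`, and `beta`, and the toy inputs are assumptions chosen for illustration, and the variable selection step mentioned in the abstract is omitted.

```python
from typing import List, Sequence


def complement_code(a: Sequence[float]) -> List[float]:
    """Map a in [0, 1]^d to the 2d-dimensional vector [a, 1 - a]."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(p, q) for p, q in zip(x, y)]


class FuzzyART:
    """Illustrative Fuzzy ART clusterer (hypothetical helper, not from the paper)."""

    def __init__(self, rho: float = 0.75, alpha: float = 0.001, beta: float = 1.0):
        self.rho = rho      # vigilance: higher values yield more, tighter categories
        self.alpha = alpha  # choice parameter, small and positive
        self.beta = beta    # learning rate; 1.0 corresponds to fast learning
        self.weights: List[List[float]] = []

    def _choice(self, i: List[float], w: List[float]) -> float:
        # T_j = |I ^ w_j| / (alpha + |w_j|), with |.| the L1 norm
        return sum(fuzzy_and(i, w)) / (self.alpha + sum(w))

    def train_one(self, a: Sequence[float]) -> int:
        """Present one input in [0, 1]^d and return its category index."""
        i = complement_code(a)
        norm_i = sum(i)
        # Visit existing categories from highest to lowest choice value.
        order = sorted(
            range(len(self.weights)),
            key=lambda j: self._choice(i, self.weights[j]),
            reverse=True,
        )
        for j in order:
            match = fuzzy_and(i, self.weights[j])
            # Vigilance test: |I ^ w_j| / |I| >= rho
            if sum(match) / norm_i >= self.rho:
                # Learning: w_j <- beta * (I ^ w_j) + (1 - beta) * w_j
                self.weights[j] = [
                    self.beta * m + (1.0 - self.beta) * w
                    for m, w in zip(match, self.weights[j])
                ]
                return j
        # No category resonates: commit a new one initialised to the input.
        self.weights.append(i)
        return len(self.weights) - 1


# Toy data (assumed for illustration): two tight groups in the unit square.
model = FuzzyART(rho=0.75)
samples = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.90], [0.88, 0.92]]
labels = [model.train_one(s) for s in samples]
```

With these toy inputs the two nearby pairs fall into two separate categories. Because categories are created on demand, the number of clusters does not have to be fixed in advance, but the result can depend on the order in which inputs are presented, which is one of the sensitivities the abstract lists.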
Index Terms
- A new conceptual model for dynamic text clustering Using unstructured text as a case