DOI: 10.1145/2641483.2641538
research-article

A new conceptual model for dynamic text clustering: Using unstructured text as a case

Published: 03 August 2014

ABSTRACT

In recent years, clustering has become a critical success factor for data analysis. Most clustering methods are sensitive to outliers, noise, presentation order, configuration architecture, Bellman's curse of dimensionality, and complex shapes. They use cost functions that reflect general knowledge about the internal structure and distribution of the target data, but they provide no mechanism for reflecting the dynamics of the clustering environment in the data set. In the present study, an alternative numerical scheme (SC) is therefore proposed to enhance the predictive accuracy of clustering. Our approach exploits variable selection techniques and Fuzzy Adaptive Resonance Theory to increase the productivity of knowledge extraction.
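The abstract does not describe the proposed scheme in detail, so the following is only a minimal sketch of the textbook Fuzzy Adaptive Resonance Theory procedure (complement coding, choice function, vigilance test, weight update), not the authors' model. The function names, the parameter names `rho`, `alpha`, and `beta`, and the toy two-dimensional data are illustrative assumptions; the variable selection step mentioned in the abstract is not shown.

```python
def complement_code(x):
    """Append 1 - v for every feature v; inputs are assumed to lie in [0, 1]."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(u, v) for u, v in zip(a, b)]


def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster samples with a basic Fuzzy ART pass (assumed defaults).

    rho   : vigilance threshold; higher values give more, tighter categories
    alpha : small positive constant in the choice function
    beta  : learning rate; 1.0 corresponds to fast learning
    """
    weights = []  # one weight vector per category
    labels = []   # category index assigned to each sample
    for x in samples:
        i = complement_code(x)
        norm_i = sum(i)

        # Rank existing categories by the choice function, best first.
        def choice(j):
            return sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j]))

        order = sorted(range(len(weights)), key=choice, reverse=True)

        chosen = None
        for j in order:
            # Vigilance test: accept the first category that matches well enough.
            if sum(fuzzy_and(i, weights[j])) / norm_i >= rho:
                chosen = j
                break

        if chosen is None:
            # No category passed the test: create a new one from this input.
            weights.append(list(i))
            chosen = len(weights) - 1
        else:
            # Move the winning weight vector toward the fuzzy AND of input and weight.
            w = weights[chosen]
            m = fuzzy_and(i, w)
            weights[chosen] = [beta * mv + (1.0 - beta) * wv for mv, wv in zip(m, w)]
        labels.append(chosen)
    return labels, weights


# Toy data (hypothetical): two well-separated groups in the unit square.
samples = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.90], [0.88, 0.92]]
labels, weights = fuzzy_art(samples, rho=0.75)
print(labels)  # [0, 0, 1, 1]
```

Because categories are created on demand whenever the vigilance test fails, the number of clusters is not fixed in advance, which is one reason this family of methods is associated with changing data. The result still depends on presentation order and on the choice of `rho`.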


    • Published in

      C3S2E '14: Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering
      August 2014
      201 pages
      ISBN:9781450327121
      DOI:10.1145/2641483

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      Overall Acceptance Rate: 12 of 42 submissions, 29%
