A new conceptual model for dynamic text clustering: Using unstructured text as a case

Author:
Choukri Djellali

LATECE, UQAM, Laboratory for research on technology for E-commerce, Montréal, Canada

C3S2E '14: Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering, August 2014, Article No. 13, Pages 1–7
https://doi.org/10.1145/2641483.2641538

Published: 03 August 2014

ABSTRACT

In recent years, clustering has become a critical success factor for data analysis. Most clustering methods are sensitive to outliers, noise, presentation order, configuration architecture, Bellman's curse of dimensionality, and complex shapes. They rely on cost functions that reflect general knowledge about the internal structure and distribution of the target data, and they provide no mechanism for reflecting the dynamics of the clustering environment in the data set. This study therefore proposes an alternative numerical scheme (SC) to enhance the predictive accuracy of clustering. The approach combines variable selection techniques with Fuzzy Adaptive Resonance Theory (Fuzzy ART) to increase the productivity of knowledge extraction.
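
To make the Fuzzy ART component concrete, the following is a minimal sketch of the standard Fuzzy ART clustering loop (complement coding, choice function, vigilance test, weight update). It is not the paper's SC scheme, whose details the abstract does not give. The function names, the parameter defaults (`rho`, `alpha`, `beta`), and the assumption that document features have already been selected and scaled to [0, 1] are illustrative choices, not taken from the source.

```python
def complement_code(a):
    """Append 1 - x for each feature so every coded input has the same L1 norm."""
    return list(a) + [1.0 - x for x in a]


def fuzzy_and(u, v):
    """Component-wise minimum, the fuzzy AND operator used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(u, v)]


def fuzzy_art(samples, rho=0.75, alpha=0.001, beta=1.0):
    """Cluster samples (features in [0, 1]) in presentation order.

    rho   : vigilance in [0, 1]; higher values yield more, tighter categories
    alpha : small positive choice parameter
    beta  : learning rate in (0, 1]; 1.0 is "fast learning"
    Returns (labels, weights). All names here are hypothetical.
    """
    weights = []  # one complement-coded weight vector per category
    labels = []
    for a in samples:
        i_vec = complement_code(a)
        norm_i = sum(i_vec)  # equals the number of original features

        # Rank existing categories by the choice function T_j.
        order = sorted(
            range(len(weights)),
            key=lambda j: sum(fuzzy_and(i_vec, weights[j]))
            / (alpha + sum(weights[j])),
            reverse=True,
        )

        # Accept the first ranked category that passes the vigilance test.
        winner = None
        for j in order:
            match = sum(fuzzy_and(i_vec, weights[j])) / norm_i
            if match >= rho:
                winner = j
                break

        if winner is None:
            # No category resonates: commit a new one at the input itself.
            weights.append(i_vec)
            winner = len(weights) - 1
        else:
            # Move the winning weight toward (input AND weight).
            w = weights[winner]
            m = fuzzy_and(i_vec, w)
            weights[winner] = [beta * x + (1.0 - beta) * y for x, y in zip(m, w)]
        labels.append(winner)
    return labels, weights


if __name__ == "__main__":
    # Toy 2-feature "documents"; real inputs would be selected, scaled term weights.
    docs = [[0.10, 0.10], [0.12, 0.09], [0.90, 0.90], [0.88, 0.92]]
    labels, weights = fuzzy_art(docs, rho=0.75)
    print(labels, len(weights))
```

Because categories are created on demand and inputs are processed one at a time, the number of clusters is not fixed in advance; the vigilance parameter `rho` controls how readily a new category is opened. Results depend on presentation order, which is one of the sensitivities the abstract lists.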

Index Terms

1. Computing methodologies
  1. Machine learning

Published in
C3S2E '14: Proceedings of the 2014 International C* Conference on Computer Science & Software Engineering
August 2014
201 pages
ISBN:9781450327121
DOI:10.1145/2641483
General Chair:
Bipin C. Desai
Concordia University, Canada
Copyright © 2014 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
indexation
information retrieval
learning
variables selection
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 12 of 42 submissions, 29%