ABSTRACT
Exploration and representation of data are important parts of data analysis. This article describes Principal Component Analysis (PCA) and cluster analysis with k-means, which are used to represent a data set in two spatial dimensions, group similar items, and find relationships between the two techniques. The data are extracted from the IEEE Xplore digital library, which lacks tools for processing and displaying information and therefore does not permit the analysis and identification of trends and patterns in a query.
The article closes by discussing how an unsupervised data analysis technique allows data to be grouped and organized by proximity based on variance, finding similar keywords between groups and principal components. This gives a temporal and evolutionary view of a set of keywords, which can later be interpreted as topics and areas of exploration and research.
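The combination of PCA and k-means described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pca_2d` and `kmeans` are hypothetical, the input is synthetic rather than keyword data from IEEE Xplore, and the choice of two clusters is an assumption made only for the example.

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)                    # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T                     # 2-D coordinates of each row
    explained = S**2 / np.sum(S**2)            # variance ratio per component
    return scores, explained[:2]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; initial centers are k randomly chosen points."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Synthetic stand-in for a keyword-by-feature matrix (assumption, not real data):
# two groups of 20 items, each described by 6 numeric features.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 6))
group_b = rng.normal(loc=5.0, scale=0.5, size=(20, 6))
X = np.vstack([group_a, group_b])

scores, explained = pca_2d(X)          # represent the data in two dimensions
labels, centers = kmeans(scores, k=2)  # group nearby items in that plane
```

Clustering the two-dimensional scores rather than the original features is one way to relate the two techniques: the groups found by k-means can be read directly against the principal component axes.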