DOI: 10.1145/2925995.2926007
research-article

Exploring data by PCA and k-means for IEEE Xplore digital library

Published: 25 July 2016

ABSTRACT

Exploring and representing data are important parts of data analysis. This article describes principal component analysis (PCA) and cluster analysis with k-means, which are used to represent a data set as two-dimensional spatial data and to group similar data, in order to find relationships between the two techniques. The data are extracted from the IEEE Xplore digital library, which lacks tools for processing and displaying information and therefore does not permit the analysis and identification of trends and patterns in a query.

The article closes by discussing how an unsupervised data analysis technique allows data to be grouped and organized by proximity based on variance, finding similar keywords between groups and principal components. This gives a temporal and evolutionary view of a set of keywords, which can later be interpreted as topics and areas of exploration and research.
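The combination described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword list and the keyword-by-year counts are invented toy data, the helper names `pca_2d`, `farthest_first_centers`, and `kmeans` are hypothetical, and the farthest-first initialization is a simple deterministic choice made for this sketch. It projects each keyword onto the first two principal components and then groups the projected points with k-means.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)  # center each column
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T
    explained = S**2 / np.sum(S**2)  # share of total variance per component
    return scores, explained[:2]

def farthest_first_centers(points, k):
    """Deterministic start: first point, then repeatedly the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        diffs = points[:, None, :] - np.array(centers)[None, :, :]
        nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
        centers.append(points[nearest.argmax()])
    return np.array(centers)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    centers = farthest_first_centers(points, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Invented toy data (assumption): rows are keywords, columns are yearly counts.
keywords = ["pca", "k-means", "clustering", "cloud", "iot", "big data"]
counts = np.array([
    [9, 8, 7, 3, 2],
    [8, 9, 6, 2, 3],
    [9, 7, 8, 3, 1],
    [1, 2, 3, 8, 9],
    [2, 1, 2, 9, 8],
    [1, 3, 2, 7, 9],
], dtype=float)

scores, explained = pca_2d(counts)
labels, centers = kmeans(scores, k=2)

for word, (pc1, pc2), label in zip(keywords, scores, labels):
    print(f"{word:12s} PC1={pc1:7.2f} PC2={pc2:7.2f} cluster={label}")
print("variance explained by PC1 and PC2:", np.round(explained, 3))
```

In this toy setup the keywords with falling counts and those with rising counts separate along the first principal component, so the k-means groups can be read against the components, which is the kind of relationship between the two techniques that the abstract refers to.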


  • Published in

    KMO '16: Proceedings of the 11th International Knowledge Management in Organizations Conference on The changing face of Knowledge Management Impacting Society
    July 2016
    339 pages
    ISBN:9781450340649
    DOI:10.1145/2925995

    Copyright © 2016 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited

    Acceptance Rates

    KMO '16 Paper Acceptance Rate: 47 of 96 submissions, 49%
    Overall Acceptance Rate: 47 of 96 submissions, 49%
