DOI: 10.1145/2254129.2254155
research-article

Features selection from high-dimensional web data using clustering analysis

Published: 13 June 2012

Abstract

Feature selection methodologies have become an important part of data preprocessing. These methods are applied to reduce the number of attributes in a dataset and so simplify its analysis. Classical techniques include wrapper approaches, heuristic functions and filters. The main problem with these approaches is that they are usually black-box, computationally expensive algorithms. This work presents a new, straightforward strategy to reduce the dimension of the attributes. The new methodology takes the distribution of the variables into account and is oriented to clustering analysis. It makes both the attribute selection strategy and the resulting clusters easier for humans to interpret. Finally, the approach has been experimentally tested on the FIFA World Cup web dataset, a well-known social-based statistical dataset with a high number of variables, to show how the feature selection strategy finds the most relevant variables.
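
The abstract does not spell out the selection procedure, so the following is only a rough illustration of what a simple, interpretable, distribution-aware filter might look like as a preprocessing step before clustering. It is not the authors' method. It is a generic sketch that drops near-constant variables and variables almost perfectly correlated with one already kept. The function names, the thresholds `min_std` and `max_corr`, and the toy feature names and values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def select_features(columns, min_std=1e-9, max_corr=0.95):
    """Return the names of the columns kept by a simple distribution-based filter.

    columns: dict mapping feature name -> list of numeric values.
    min_std, max_corr: illustrative thresholds (assumptions, not from the paper).
    """
    kept = []
    for name, values in columns.items():
        if pstdev(values) <= min_std:
            continue  # near-constant variable: cannot help separate clusters
        if any(abs(pearson(values, columns[k])) >= max_corr for k in kept):
            continue  # redundant with a variable that was already kept
        kept.append(name)
    return kept


# Hypothetical toy data; names and values are made up for illustration.
data = {
    "goals": [1, 3, 2, 5, 4],
    "shots": [2, 6, 4, 10, 8],   # exactly twice "goals", so redundant
    "matches": [3, 3, 3, 3, 3],  # constant, so uninformative
    "fouls": [9, 2, 7, 4, 6],
}
selected = select_features(data)
print(selected)
```

Each rejection in such a filter has a reason a person can read directly (constant, or redundant with a named variable), which is the kind of interpretability the abstract contrasts with black-box wrappers. The surviving columns would then be passed to whichever clustering algorithm is in use.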

Published In

WIMS '12: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
June 2012
571 pages
ISBN:9781450309158
DOI:10.1145/2254129
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

  • UCV: University of Craiova
  • WNRI: Western Norway Research Institute

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FIFA
  2. clustering techniques
  3. data projection
  4. feature selection
  5. football
  6. soccer
  7. web mining
  8. world cup

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 140 of 278 submissions, 50%

Cited By

  • (2022) A new intrusion detection system based on Moth–Flame Optimizer algorithm. Expert Systems with Applications 210 (118439). DOI: 10.1016/j.eswa.2022.118439. Online publication date: Dec-2022
  • (2016) A Clustering Approach for Optimization of Search Result. Journal of Image and Graphics 4:1 (63-66). DOI: 10.18178/joig.4.1.63-66. Online publication date: 2016
  • (2016) Application of Clustering for Improving Search Result of a Website. Information Systems Design and Intelligent Applications (349-356). DOI: 10.1007/978-81-322-2752-6_34. Online publication date: 3-Feb-2016
  • (2014) Association Rule Mining via Evolutionary Multi-objective Optimization. Proceedings of the 8th International Workshop on Multi-disciplinary Trends in Artificial Intelligence - Volume 8875 (35-46). DOI: 10.1007/978-3-319-13365-2_4. Online publication date: 8-Dec-2014
  • (2013) Hidden Topic Models for Multi-label Review Classification. Proceedings of the 5th International Conference on Computational Collective Intelligence. Technologies and Applications - Volume 8083 (603-611). DOI: 10.1007/978-3-642-40495-5_60. Online publication date: 11-Sep-2013
