ABSTRACT
Web Usage Mining (WUM), a relatively recent research field, is the process of knowledge discovery in databases (KDD) applied to Web usage data. The main problems in WUM are the volume of usage data to be analyzed and its poor quality, in particular the large number of features to be analyzed.
Considering the characteristics of Web log data and the function of each phase of data preprocessing, this paper establishes a Web log data preprocessing algorithm based on feature selection. The implemented Wrapper Evaluation feature selection method uses a Best First Search and a Greedy Stepwise Search, and evaluates each attribute subset with a Support Vector Machine learning scheme.
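The wrapper idea described above can be illustrated with a minimal sketch of a greedy stepwise (forward) search. This is not the paper's implementation: the names `centroid_cv_accuracy` and `greedy_stepwise`, the toy data, and the fold scheme are all assumptions made for illustration, and a simple nearest-centroid classifier stands in for the Support Vector Machine so that the example needs only the Python standard library. A Best First Search would differ mainly in keeping a queue of visited subsets so that it can backtrack.

```python
from typing import Callable, FrozenSet, Sequence, Tuple

# A wrapper evaluator maps a feature subset to an estimated accuracy.
Evaluator = Callable[[FrozenSet[int]], float]


def centroid_cv_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                         features: FrozenSet[int], folds: int = 3) -> float:
    """Cross-validated accuracy of a nearest-centroid classifier that sees
    only the columns in `features`. ASSUMPTION: this classifier is a
    stand-in for the SVM learning scheme named in the abstract."""
    if not features:
        return 0.0
    cols = sorted(features)
    correct = 0
    for k in range(folds):
        # Row i belongs to test fold (i % folds); the rest are training rows.
        train = [i for i in range(len(X)) if i % folds != k]
        test = [i for i in range(len(X)) if i % folds == k]
        centroids = {}
        for label in sorted({y[i] for i in train}):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]
        for i in test:
            point = [X[i][c] for c in cols]
            pred = min(centroids, key=lambda lab: sum(
                (p - m) ** 2 for p, m in zip(point, centroids[lab])))
            correct += int(pred == y[i])
    return correct / len(X)


def greedy_stepwise(n_features: int,
                    evaluate: Evaluator) -> Tuple[FrozenSet[int], float]:
    """Forward selection: repeatedly add the single feature that most
    improves the evaluator's score, and stop when no feature improves it."""
    selected: FrozenSet[int] = frozenset()
    best = evaluate(selected)
    while True:
        # (score, -index, index): ties are broken in favor of the lower index.
        candidates = [(evaluate(selected | {f}), -f, f)
                      for f in range(n_features) if f not in selected]
        if not candidates:
            break
        score, _, feature = max(candidates)
        if score <= best:
            break
        selected, best = selected | {feature}, score
    return selected, best


# Hypothetical toy data: column 0 carries the class signal, column 1 is
# unrelated to the class, column 2 is constant.
X = [[(i % 2) + 0.05 * (i % 3), float((i * 7) % 5), 1.0] for i in range(12)]
y = [i % 2 for i in range(12)]

selected, best = greedy_stepwise(3, lambda s: centroid_cv_accuracy(X, y, s))
print(sorted(selected), best)
```

On this toy data the search keeps only column 0, because adding either of the other columns cannot raise the cross-validated accuracy any further. The evaluator is passed in as a function, so the stand-in classifier could be swapped for an SVM without changing the search itself.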