DOI: 10.1145/2254129.2254156
research-article

Wrappers for web access logs feature selection

Published: 13 June 2012

ABSTRACT

Web Usage Mining (WUM), a relatively recent research field, applies the process of knowledge discovery in databases (KDD) to Web usage data. The main problems in WUM are the volume of usage data to be analyzed and its poor quality, in particular the abundance of features.

Considering the characteristics of Web log data and the function of each phase of data preprocessing, this paper presents a Web log data preprocessing algorithm based on feature selection. The implemented wrapper evaluation feature selection method uses Best First Search and Greedy Stepwise Search, and evaluates each attribute subset with a Support Vector Machine learning scheme.
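
The wrapper idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract scores subsets with a Support Vector Machine, whereas this sketch substitutes a nearest-centroid classifier with leave-one-out accuracy so that it runs on the standard library alone. The function names (`loo_centroid_accuracy`, `greedy_stepwise`) and the toy data are hypothetical, and only the forward variant of greedy stepwise search is shown.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Evaluator = Callable[[FrozenSet[int]], float]


def loo_centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                          features: FrozenSet[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier restricted
    to `features`. Assumption: a stand-in for the SVM-based evaluation."""
    if not features:
        return 0.0
    cols = sorted(features)
    correct = 0
    for i in range(len(X)):
        # Accumulate per-class sums from every row except the held-out row i.
        sums: dict = {}
        counts: dict = {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label: int) -> float:
            # Squared Euclidean distance from row i to the class centroid.
            return sum((X[i][c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        predicted = min(sorted(sums), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def greedy_stepwise(n_features: int,
                    evaluate: Evaluator) -> Tuple[FrozenSet[int], float]:
    """Forward greedy search: add the single feature that most improves the
    evaluator's score, and stop when no addition improves it."""
    selected: FrozenSet[int] = frozenset()
    best = evaluate(selected)
    while len(selected) < n_features:
        candidates: List[Tuple[float, int]] = [
            (evaluate(selected | {f}), f)
            for f in range(n_features) if f not in selected
        ]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(candidates, key=lambda t: (t[0], -t[1]))
        if score <= best:
            break
        selected, best = selected | {feature}, score
    return selected, best


# Toy data (hypothetical): column 0 separates the classes, columns 1-2 do not.
X = [
    [0.1, 5.0, 3.0],
    [0.2, 1.0, 7.0],
    [0.3, 4.0, 2.0],
    [0.9, 5.1, 3.1],
    [1.0, 1.2, 6.8],
    [1.1, 3.9, 2.2],
]
y = [0, 0, 0, 1, 1, 1]

selected, score = greedy_stepwise(3, lambda s: loo_centroid_accuracy(X, y, s))
print(sorted(selected), score)  # → [0] 1.0
```

Because the search only sees the evaluator's score, swapping in a cross-validated SVM would change the `evaluate` callable and nothing else. A best-first variant would additionally keep a queue of already-evaluated subsets and backtrack to them when the current path stops improving.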

Published in

WIMS '12: Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
June 2012
571 pages
ISBN: 9781450309158
DOI: 10.1145/2254129

Copyright © 2012 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 140 of 278 submissions, 50%