Feature bagging for outlier detection
Source: Conference on Knowledge Discovery in Data
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Chicago, Illinois, USA
SESSION: Research track paper
Pages: 157 - 166  
Year of Publication: 2005
ISBN:1-59593-135-X
Authors
Aleksandar Lazarevic  University of Minnesota, East Hartford, CT
Vipin Kumar  University of Minnesota, Minneapolis, MN
Sponsors
SIGKDD: ACM Special Interest Group on Knowledge Discovery in Data
ACM: Association for Computing Machinery
Publisher
ACM  New York, NY, USA
DOI: http://doi.acm.org/10.1145/1081870.1081891

ABSTRACT

Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel feature bagging approach for detecting outliers in very large, high-dimensional and noisy databases is proposed. It combines results from multiple outlier detection algorithms that are applied using different sets of features. Every outlier detection algorithm uses a small subset of features that are randomly selected from the original feature set. As a result, each outlier detector identifies different outliers, and thus assigns to all data records outlier scores that correspond to their probability of being outliers. The outlier scores computed by the individual outlier detection algorithms are then combined in order to find better-quality outliers. Experiments performed on several synthetic and real-life data sets show that the proposed methods for combining outputs from multiple outlier detection algorithms provide non-trivial improvements over the base algorithm.
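
The procedure the abstract describes (run a detector on random feature subsets, then combine the scores) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the base detector here is a simple k-th-nearest-neighbour distance score chosen for brevity, the subset-size rule and the averaging step are assumed choices, and all function names and parameters (`knn_score`, `feature_bagging`, `make_demo_data`, `rounds`, `k`) are hypothetical.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_score(points, k):
    """Stand-in base detector (an assumption, not the paper's base algorithm):
    a record's score is its distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def feature_bagging(data, rounds=10, k=3, seed=0):
    """Score every record in several random feature subspaces and average.

    Assumed details: the subset size is drawn uniformly between d // 2 and
    d - 1, and scores are combined by a plain average over rounds.
    """
    rng = random.Random(seed)
    d = len(data[0])
    totals = [0.0] * len(data)
    for _ in range(rounds):
        size = rng.randint(max(1, d // 2), max(1, d - 1))
        feats = rng.sample(range(d), size)  # random features, no repeats
        projected = [[row[f] for f in feats] for row in data]
        for i, s in enumerate(knn_score(projected, k)):
            totals[i] += s
    return [t / rounds for t in totals]


def make_demo_data(seed=1):
    """Hypothetical toy data: 40 Gaussian records plus one planted outlier."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
    data.append([10.0] * 6)  # planted outlier at index 40
    return data


if __name__ == "__main__":
    scores = feature_bagging(make_demo_data())
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    print("Highest-scoring record index:", ranked[0])
```

Averaging is only one simple way to merge the per-round scores; any rule that turns several score lists into one ranking could be substituted in the final step of `feature_bagging`.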



