research-article

Batch and online anomaly detection for scientific applications in a Kubernetes environment

Authors:
Sahand Hariri, University of Illinois at Urbana-Champaign
Matias Carrasco Kind, National Center for Supercomputing Applications

ScienceCloud'18: Proceedings of the 9th Workshop on Scientific Cloud Computing, June 2018, Article No.: 3, Pages 1–7, https://doi.org/10.1145/3217880.3217883

Published: 11 June 2018


ABSTRACT

We present a cloud-based anomaly detection service framework built on a containerized Spark cluster and ancillary user interfaces, all managed by Kubernetes. This technology stack provides a fast, reliable, resilient, and easily scalable service for both batch and streaming data. At the heart of the service is Extended Isolation Forest, an improved version of the Isolation Forest algorithm, which we use for robust and efficient anomaly detection. We showcase the design and a typical workflow of our infrastructure, which is ready to deploy on any Kubernetes cluster without extra technical knowledge. Through exposed APIs and simple graphical interfaces, users can load any data and detect anomalies either in the loaded set or in newly presented data points, in batch or streaming mode. In streaming mode, users can subscribe and receive notifications on the desired output. Our aim is to develop these techniques and apply them to scientific data. In particular, we are interested in finding anomalous objects within the overwhelming set of images and catalogs produced by current and future astronomical surveys, although the framework can easily be adapted to other fields.
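To make the batch and streaming modes concrete, the following is a minimal single-machine sketch of isolation-forest-style scoring with randomly oriented hyperplane cuts, which is our understanding of the general idea behind Extended Isolation Forest. It is not the paper's Spark or Kubernetes implementation. All function names, the parameter defaults, and the 0.6 alert threshold are assumptions made for illustration only.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over
    n points; used to normalize path lengths (Liu et al., 2008)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def _side(x, normal, intercept):
    """Signed position of x relative to the hyperplane through `intercept`."""
    return sum((xi - bi) * ni for xi, bi, ni in zip(x, intercept, normal))


def build_tree(points, depth, max_depth, rng):
    """Recursively split points with randomly oriented hyperplanes."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = len(points[0])
    # Random slope: each component of the normal is drawn from N(0, 1).
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Random intercept: uniform inside the bounding box of the node's points.
    intercept = [rng.uniform(min(p[i] for p in points), max(p[i] for p in points))
                 for i in range(dim)]
    left, right = [], []
    for p in points:
        (left if _side(p, normal, intercept) <= 0 else right).append(p)
    return {"normal": normal, "intercept": intercept,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def path_length(x, tree):
    """Depth at which x reaches a leaf, plus the leaf-size adjustment."""
    node, depth = tree, 0
    while "size" not in node:
        go_left = _side(x, node["normal"], node["intercept"]) <= 0
        node = node["left"] if go_left else node["right"]
        depth += 1
    return depth + c_factor(node["size"])


def fit_forest(data, n_trees=100, sample_size=256, seed=0):
    """Batch step: build the forest from subsamples of the loaded data set."""
    if len(data) < 2:
        raise ValueError("need at least two points to build a forest")
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size


def anomaly_score(x, forest):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    trees, sample_size = forest
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c_factor(sample_size))


def flag_stream(stream, forest, threshold=0.6):
    """Streaming step: yield (point, score) for points above the threshold.
    The threshold value is an arbitrary assumption, not from the paper."""
    for x in stream:
        s = anomaly_score(x, forest)
        if s > threshold:
            yield x, s


# Batch mode: fit on a loaded set and score every point in it.
rng = random.Random(42)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
forest = fit_forest(data)
batch_scores = [anomaly_score(p, forest) for p in data]

# Streaming mode: score newly presented points against the fitted forest.
flagged = list(flag_stream([(0.0, 0.0), (8.0, 8.0)], forest))
```

In a deployed service, the streaming generator would presumably be fed by a message queue or a Spark streaming source, and the flagged points would drive the user notifications; this sketch only shows the scoring logic.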



Published in

ScienceCloud'18: Proceedings of the 9th Workshop on Scientific Cloud Computing
June 2018
62 pages
ISBN: 9781450358637
DOI: 10.1145/3217880

Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Anomaly Detection
Apache Spark
Cloud Computing
Isolation Forest
Kubernetes
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 44 of 151 submissions, 29%