Abstract
The experimental evaluation of unsupervised anomaly detection algorithms is a constant
challenge across diverse research areas and application domains. However, little is known
about the strengths and weaknesses of online anomaly detection methods and the impact of
their parameters. This paper elaborates on the design and development of a benchmark
framework for an extensive experimental study of tree-based and nearest-neighbor-based
state-of-the-art unsupervised online outlier detectors (including their offline counterparts)
in a streaming setting, across a wide variety of multivariate datasets contaminated by
subspace and full-space outliers. We first present the semantics and functionality of the
detectors through a comprehensive example. We then introduce the benchmark environment,
providing a descriptive (meta-)analysis of the datasets and the implementation choices for
the detectors, together with the set of their hyperparameters and candidate values. A fair
evaluation of the detectors is ensured through a thorough analysis of critical methodological
questions such as stream simulation and partitioning, evaluation protocols and metrics, and
detector optimization and ranking. Through this study, we ascertain that online detectors
not only approximate offline detectors very well (ranking values of 2.296 vs. 2.266,
respectively) but also outperform them under certain conditions. In addition, we surprisingly
establish the robustness of the online detectors' dynamic model when scaling the data and
subspace dimensionality. Nevertheless, they show decreasing effectiveness as the window
parameters scale. We also examine the fundamentals of a dynamic model, highlighting the
need for a forgetting mechanism. To the best of our knowledge, this is the most complete
benchmark of online anomaly detection on multivariate data.