
Post-graduate theses

Identifier 000434070
Title Interpreting data anomalies: from descriptive to predictive explanations
Alternative Title Ερμηνεύοντας ανωμαλίες σε δεδομένα: από περιγραφικές σε προβλεπτικές εξηγήσεις
Author Μυρτάκης, Νικόλαος Ε.
Thesis advisor Χριστοφίδης, Βασίλης
Reviewer Τσαμαρδινός, Ιωάννης
Παλπάνας, Θέμης
Abstract In many data exploration tasks, abnormal and rarely occurring patterns called anomalies (outliers, novelties) are more interesting than the prevalent ones. For instance, they could represent systematic errors, fraud in bank transactions, intrusions in network and system monitoring, or other interesting phenomena. Numerous algorithms have been proposed for detecting anomalies. Unfortunately, unsupervised detectors in general do not explain why a given sample (record) was labelled as an anomaly, and thus do not help diagnose its root causes. Anomaly explanations often take the form of feature subsets of significantly lower dimensionality than the original feature space: examining only the features of an explaining subspace suffices to determine whether a sample is an anomaly or not according to a detector. Explanations can be categorized as (i) descriptive, in the sense that they explain the samples used to train the detector, and (ii) predictive, in the sense that they generalize to unseen data. In this thesis we experimentally evaluate the main descriptive explanation methods proposed in the literature and introduce the first predictive explanation method, which is inspired by recent advances in Automated Machine Learning (AutoML) systems. In the first part of our thesis, we present a thorough evaluation framework for unsupervised explanation algorithms for individual anomalies and groups of anomalies, aiming to uncover several insights missing from the literature, such as: (a) Is it effective to combine any explanation algorithm with any off-the-shelf outlier detector? (b) How is the behavior of an outlier detection and explanation pipeline affected by the number or the correlation of features in a dataset? and (c) What is the quality of summaries in the presence of outliers explained by subspaces of different dimensionality? A major drawback of the descriptive explanation methods stems from the fact that they must be recomputed for every new batch of data.
To address this limitation, in the second part of our thesis we present the design and experimental evaluation of the PROTEUS AutoML pipeline. PROTEUS produces global, predictive explanations using a surrogate model specifically designed for feature selection on imbalanced datasets, in order to best approximate the decision surface of any unsupervised detector. Computational experiments confirm the efficacy and robustness of PROTEUS in producing predictive explanations for different families of anomaly detectors, as well as its reliability in estimating their predictive performance on unseen data.
Language English
Subject Anomaly detection
Ανίχνευση ανωμαλιών (Anomaly detection)
Ανωμαλίες δεδομένων (Data anomalies)
Εξήγηση ανωμαλιών (Anomaly explanation)
Issue date 2020-11-27
Collection   Faculty/Department--Faculty of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/f/f/2/metadata-dlib-1605778084-394553-8606.tkl