Post-graduate theses
Identifier |
000460153 |
Title |
Do we need sophisticated imputation in AutoML? |
Alternative Title |
Χρειαζόμαστε σύνθετη συμπλήρωση ελλειπουσών τιμών στην Αυτοματοποιημένη Μηχανική Μάθηση; |
Author
|
Πατεράκης, Γιώργος Μ.
|
Thesis advisor
|
Τσαμαρδινός, Ιωάννης
|
Reviewer
|
Χριστοφίδης, Βασίλης
Τριανταφύλλου, Σοφία
|
Abstract |
Many real-world datasets contain missing values, whereas most Machine Learning (ML) algorithms assume complete data. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an AutoML setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural-Network-based) are really needed, or whether decent performance can be obtained with simple methods such as Mean/Mode (MM). In this paper, we experimentally compare 6 state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. Experiments were run on 25 real-world datasets with missing values and on 10 complete datasets in which synthetic missing values were introduced according to different missingness mechanisms, at varying missingness frequencies. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder (DAE) on real-world datasets and MissForest (MF) on simulated datasets, followed closely by Mean/Mode. In addition, binary indicator (BI) variables encoding missingness patterns improve predictive performance on average. Last but not least, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values to impute new samples.
|
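As an illustration of the simplest baseline named in the abstract, the sketch below combines Mean/Mode imputation with binary missingness indicators. It is a minimal, standard-library-only example and not the thesis's implementation: the function name `mean_mode_impute_with_indicators`, the `_missing` column suffix, and the toy rows are all hypothetical, and it assumes every column has at least one observed value.

```python
from statistics import mean, mode


def mean_mode_impute_with_indicators(rows, categorical):
    """Fill missing values (None) with the column mean or mode.

    rows: list of dicts sharing the same keys; None marks a missing value.
    categorical: set of column names to impute with the mode;
                 all other columns are imputed with the mean.
    Returns (imputed_rows, fill_values). Assumes each column has at
    least one observed value (hypothetical simplification).
    """
    columns = list(rows[0].keys())

    # Learn one fill value per column from the observed entries only.
    fill_values = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        fill_values[col] = mode(observed) if col in categorical else mean(observed)

    imputed_rows = []
    for r in rows:
        out = {}
        for col in columns:
            is_missing = r[col] is None
            out[col] = fill_values[col] if is_missing else r[col]
            # Binary indicator: 1 if the original value was missing, else 0.
            out[col + "_missing"] = int(is_missing)
        imputed_rows.append(out)
    return imputed_rows, fill_values


# Toy data (made up for illustration).
rows = [
    {"age": 30.0, "color": "red"},
    {"age": None, "color": "red"},
    {"age": 50.0, "color": None},
    {"age": 40.0, "color": "blue"},
]
imputed, fill_values = mean_mode_impute_with_indicators(rows, categorical={"color"})
```

The learned `fill_values` are returned so that the same statistics can be reused when filling in new samples, rather than being recomputed from them.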
Language |
English |
Subject |
Machine learning |
|
Missing data |
|
Ελλειπούσες τιμές |
|
Μηχανική μάθηση |
Issue date |
2023-12-01 |
Collection
|
School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
|
|
Type of Work--Post-graduate theses
|
Permanent Link |
https://elocus.lib.uoc.gr//dlib/a/b/a/metadata-dlib-1699011014-142912-6096.tkl
|
Digital Documents
|
|
No permission to view document.
It won't be available until: 2026-12-01
|