Post-graduate theses

Identifier 000460153
Title Do we need sophisticated imputation in AutoML?
Alternative Title Χρειαζόμαστε σύνθετη συμπλήρωση ελλειπουσών τιμών στην Αυτοματοποιημένη Μηχανική Μάθηση;
Author Πατεράκης, Γιώργος Μ.
Thesis advisor Τσαμαρδινός, Ιωάννης
Reviewer Χριστοφίδης, Βασίλης
Τριανταφύλλου, Σοφία
Abstract Many real-world datasets contain missing values, whereas most Machine Learning (ML) algorithms assume complete data. For this reason, several imputation algorithms have been proposed to predict and fill in the missing values. Given the advances in predictive modeling algorithms tuned in an AutoML setting, a question that naturally arises is to what extent sophisticated imputation algorithms (e.g., Neural-Network-based) are really needed, or whether decent performance can be obtained with simple methods like Mean/Mode (MM). In this paper, we experimentally compare 6 state-of-the-art representatives of different imputation algorithmic families from an AutoML predictive modeling perspective, including a feature selection step and combined algorithm and hyper-parameter selection. Experiments were run on 25 real-world incomplete datasets and on 10 complete datasets in which synthetic missing values were introduced according to different missingness mechanisms, at varying frequencies of missing values. The main conclusion drawn from our experiments is that the best method on average is the Denoise AutoEncoder (DAE) on real-world datasets and MissForest (MF) on simulated datasets, followed closely by Mean/Mode. In addition, binary indicator (BI) variables encoding missingness patterns improve predictive performance on average. Last but not least, although there are cases where Neural-Network-based imputation significantly improves predictive performance, this comes at a great computational cost and requires measuring all feature values to impute new samples.
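To illustrate the simple baseline named in the abstract, the following is a minimal sketch of Mean/Mode imputation combined with binary missingness indicators. It is not the thesis's implementation: the function name `mean_mode_impute`, the `_missing` column suffix, the use of `None` as the missing marker, and the toy rows are all assumptions made for this example.

```python
from collections import Counter
from statistics import mean


def mean_mode_impute(rows, add_indicators=True):
    """Fill missing values (None) with the column mean (numeric columns)
    or the column mode (all other columns). Hypothetical helper.

    rows: list of dicts sharing the same keys.
    Returns a new list of dicts; the input is left unchanged.
    """
    columns = list(rows[0].keys())

    # Learn one fill value per column from the observed entries only.
    fill = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            continue  # column is entirely missing; nothing to learn
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        if is_numeric:
            fill[col] = mean(observed)
        else:
            fill[col] = Counter(observed).most_common(1)[0][0]

    # Apply the fill values and, optionally, add a 0/1 indicator per column
    # recording whether the original value was missing.
    imputed = []
    for r in rows:
        new_row = {}
        for col in columns:
            missing = r[col] is None
            new_row[col] = fill.get(col) if missing else r[col]
            if add_indicators:
                new_row[col + "_missing"] = int(missing)
        imputed.append(new_row)
    return imputed


# Toy data, invented for illustration only.
rows = [
    {"age": 30, "city": "Heraklion"},
    {"age": None, "city": "Heraklion"},
    {"age": 50, "city": None},
    {"age": 40, "city": "Chania"},
]
result = mean_mode_impute(rows)
```

In this sketch the fill values are learned once and could be reused for new samples, which is what makes the approach cheap; for simplicity an indicator is added for every column, including those with no missing values.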
Language English
Subject Machine learning
Missing data
Ελλειπούσες τιμές
Μηχανική μάθηση
Issue date 2023-12-01
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/a/b/a/metadata-dlib-1699011014-142912-6096.tkl
Views 1156

Digital Documents
No preview available

No permission to view document.
It won't be available until: 2026-12-01