Post-graduate theses
Identifier | 000408060
Title | Bias correction of the cross-validation performance estimate and speed-up of its execution time
Alternative Title | Διόρθωση της μεροληψίας της εκτίμησης της απόδοσης του πρωτοκόλλου της διασταυρωμένης επικύρωσης και επιτάχυνση της εκτέλεσης του.
Author | Γρεασίδου, Ελισσάβετ Ε.
Thesis advisor | Τσαμαρδινός, Ιωάννης
Reviewer | Τόλλης, Ιωάννης; Στυλιανού, Ιωάννης
Abstract | Cross-Validation (CV) is a de facto standard in applied statistics and supervised machine learning, both for model selection and assessment. The procedure is applied to a set of candidate configurations (i.e., sequences of modelling steps with specified algorithms and hyper-parameter values for each step) for model production, and the one with the best performance, according to a pre-specified criterion, is selected. However, the "best" performance achieved during CV is known to be an optimistically biased estimate of the generalization performance of the final model. To date, relatively little research has been devoted to correcting this bias, and all proposed methods either tend to over-correct or have limitations that can make their use impractical.

In this thesis, we propose a Bootstrap-based Bias Correction method (BBC) which works regardless of the data analysis task (e.g., classification, regression) or the structure of the models involved, and requires only a small computational overhead with respect to the basic CV procedure. BBC corrects the bias in a conservative way, providing an almost unbiased estimate of performance. Its main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. In comparison to the alternatives, namely Nested Cross-Validation (NCV) and a method by Tibshirani and Tibshirani (TT), BBC is computationally more efficient, yields performance estimates competitive with those of NCV, and is applicable to any CV procedure. Subsequently, we also employ the idea of bootstrapping the out-of-sample predictions to speed up the execution time of the CV procedure. Specifically, using a bootstrap-based hypothesis test, we stop training models on new folds for configurations that are statistically significantly inferior. The Bootstrap-based Early Dropping (BED) method significantly reduces the computational time of CV with negligible or no effect on performance. The two methods can be combined, leading to the BED-BBC procedure, which is both efficient and provides accurate estimates of performance.
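
The BBC idea in the abstract lends itself to a compact illustration. The sketch below is illustrative only, not the thesis's exact algorithm: it assumes a pooled matrix of out-of-sample CV predictions (one column per candidate configuration) and a "higher is better" performance metric, and the names (bbc_estimate, oos_preds, metric, B) are hypothetical.

```python
import numpy as np

def bbc_estimate(oos_preds, y, metric, B=1000, rng=None):
    """Sketch of Bootstrap Bias Correction (BBC) of the CV estimate.

    oos_preds : (N, C) array of pooled out-of-sample predictions,
                one column per candidate configuration.
    y         : (N,) array of true targets.
    metric    : callable(y_true, y_pred) -> score, higher is better.
    """
    rng = np.random.default_rng(rng)
    N, C = oos_preds.shape
    scores = []
    for _ in range(B):
        # Resample the samples with replacement: the "in-bag" set.
        in_bag = rng.integers(0, N, size=N)
        out_bag = np.setdiff1d(np.arange(N), in_bag)
        if out_bag.size == 0:
            continue
        # Bootstrap the selection step: pick the best configuration
        # on the in-bag predictions (no model retraining needed) ...
        in_scores = [metric(y[in_bag], oos_preds[in_bag, c]) for c in range(C)]
        best = int(np.argmax(in_scores))
        # ... and score that configuration on the out-of-bag samples.
        scores.append(metric(y[out_bag], oos_preds[out_bag, best]))
    # Averaging over bootstraps gives the bias-corrected estimate.
    return float(np.mean(scores))
```

For example, with classification predictions one could pass `metric=lambda yt, yp: np.mean(yt == yp)` (accuracy). Note that only the already-computed out-of-sample predictions are resampled, which is why the overhead over plain CV is small.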
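Likewise, a minimal sketch of one BED step, under the same assumptions: after each CV fold, every surviving configuration is compared against the current best with a one-sided bootstrap test on the predictions pooled so far, and significantly inferior configurations are dropped from training on further folds. The names (bed_filter, alive, alpha) and the particular test form are illustrative, not the thesis's exact procedure.

```python
import numpy as np

def bed_filter(oos_preds, y, alive, metric, alpha=0.05, B=500, rng=None):
    """Sketch of one Bootstrap-based Early Dropping (BED) step.

    oos_preds : (n, C) predictions pooled over the folds seen so far.
    y         : (n,) true targets for those samples.
    alive     : boolean np.ndarray mask of configurations still in the race.
    Returns an updated mask with inferior configurations dropped.
    """
    rng = np.random.default_rng(rng)
    n, C = oos_preds.shape
    idx_alive = np.flatnonzero(alive)
    # Current best configuration on the data seen so far.
    cur = [metric(y, oos_preds[:, c]) for c in idx_alive]
    best = idx_alive[int(np.argmax(cur))]
    still = alive.copy()
    for c in idx_alive:
        if c == best:
            continue
        # Bootstrap the score difference (best - c); the p-value is
        # estimated as the fraction of resamples where c wins or ties.
        wins = 0
        for _ in range(B):
            s = rng.integers(0, n, size=n)
            d = metric(y[s], oos_preds[s, best]) - metric(y[s], oos_preds[s, c])
            wins += (d <= 0)
        if wins / B < alpha:
            still[c] = False  # significantly inferior: stop training it
    return still
```

Dropped configurations are simply not trained on the remaining folds, which is where the CV speed-up comes from; combining this filter with the BBC estimate above corresponds to the BED-BBC procedure mentioned in the abstract.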
Language | English
Subject | Dropping
Issue date | 2017-03-17
Collection | School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses; Type of Work--Post-graduate theses
Permanent Link | https://elocus.lib.uoc.gr//dlib/0/b/f/metadata-dlib-1491378124-496826-14534.tkl