
Post-graduate theses

Identifier 000408060
Title Bias correction of the cross-validation performance estimate and speed up of its execution time
Alternative Title Διόρθωση της μεροληψίας της εκτίμησης της απόδοσης του πρωτοκόλλου της διασταυρωμένης επικύρωσης και επιτάχυνση της εκτέλεσης του.
Author Γρεασίδου, Ελισσάβετ Ε.
Thesis advisor Τσαμαρδινός, Ιωάννης
Reviewer Τόλλης, Ιωάννης
Στυλιανού, Ιωάννης
Abstract Cross-validation (CV) is a de facto standard in applied statistics and supervised machine learning, both for model selection and assessment. The procedure is applied to a set of candidate configurations (i.e., sequences of modelling steps with specified algorithms and hyper-parameter values for each step) for model production, and the one with the best performance, according to a pre-specified criterion, is selected. However, the "best" performance achieved during CV is known to be an optimistically biased estimate of the generalization performance of the final model. To date, relatively little research has been devoted to correcting this bias, and all proposed methods either tend to over-correct or have limitations that can make their use impractical. In this thesis, we propose a Bootstrap-based Bias Correction (BBC) method that works regardless of the data analysis task (e.g., classification, regression) or the structure of the models involved, and requires only a small computational overhead with respect to the basic CV procedure. BBC corrects the bias in a conservative way, providing an almost unbiased estimate of performance. Its main idea is to bootstrap the whole process of selecting the best-performing configuration on the out-of-sample predictions of each configuration, without additional training of models. Compared to the alternatives, namely Nested Cross-Validation (NCV) and a method by Tibshirani and Tibshirani (TT), BBC is computationally more efficient, yields performance estimates competitive with those of NCV, and is applicable to any CV procedure. Subsequently, we also employ the idea of bootstrapping the out-of-sample predictions to speed up the execution time of the CV procedure. Specifically, using a bootstrap-based hypothesis test, we stop training models on new folds for configurations that are statistically significantly inferior. The resulting Bootstrap-based Early Dropping (BED) method significantly reduces the computational time of CV with negligible or no effect on performance. The two methods can be combined, leading to the BED-BBC procedure, which is both efficient and provides accurate estimates of performance.
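The core idea described in the abstract, bootstrapping the configuration-selection step over the pooled out-of-sample predictions, can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the function name `bbc_estimate`, the choice of accuracy as the performance metric, and the samples-by-configurations matrix layout are all assumptions made here for the example.

```python
import numpy as np

def bbc_estimate(preds, y, B=1000, rng=None):
    """Sketch of Bootstrap-based Bias Correction (BBC).

    preds : (n_samples, n_configs) pooled out-of-sample predictions,
            one column per candidate configuration (collected during CV).
    y     : (n_samples,) true labels; accuracy is used as the metric here.
    B     : number of bootstrap iterations.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    scores = []
    for _ in range(B):
        boot = rng.integers(0, n, size=n)           # bootstrap sample (with replacement)
        oob = np.setdiff1d(np.arange(n), boot)      # out-of-bag indices
        if oob.size == 0:
            continue
        # Accuracy of every configuration on the bootstrap sample ...
        acc_in = (preds[boot] == y[boot, None]).mean(axis=0)
        best = np.argmax(acc_in)                    # ... select the "best" one,
        # ... then score only the selected configuration on held-out samples.
        scores.append((preds[oob, best] == y[oob]).mean())
    # Averaging the out-of-bag scores yields the bias-corrected estimate.
    return float(np.mean(scores))
```

Note that no model is retrained inside the loop: only the already-computed out-of-sample predictions are resampled, which is why the overhead over plain CV is small. A BED-style early-dropping test would analogously bootstrap fold-wise performance differences and drop configurations whose deficit is statistically significant.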
Language English
Subject Dropping
Issue date 2017-03-17
Collection School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/0/b/f/metadata-dlib-1491378124-496826-14534.tkl
