|
Identifier |
000412226 |
Title |
Latent feature construction for gene expressions improves predictions |
Alternative Title |
Η κατασκευή κρυμμένων χαρακτηριστικών για γονιδιακές εκφράσεις βελτιώνει την προβλεπτική ικανότητα (Greek: "The construction of latent features for gene expressions improves predictive ability") |
Author
|
Τσέλας, Χρήστος Ρ.
|
Thesis advisor
|
Τσαμαρδινός, Ιωάννης
|
Reviewer
|
Τζιρίτας, Γεώργιος
Στυλιανού, Ιωάννης
|
Abstract |
Gene expression analysis aims to improve the understanding of intrinsic cellular processes
and to contribute towards the successful implementation of personalized medicine. The advent of
high-throughput gene expression technologies such as microarrays and RNA-sequencing
(RNAseq), together with the recent reduction in cost, has resulted in an explosion of publicly-available
datasets. The generated datasets are inevitably high-dimensional with typically small sample sizes,
which severely limits the potential for developing reproducible prognostic models. Being able to
increase the predictive power without losing the information of the measured genome on a
newly-produced dataset is of paramount importance. Although various studies
attempt to perform dimensionality reduction and dataset integration so as to increase
classification performance and robustness, challenging issues remain, primarily due to the
limited amount of data as well as the technological diversity and heterogeneity across the
datasets.
Exploiting the redundancy of genomics data, we constructed low-dimensional, universal, latent
feature spaces of the genome utilizing several dimensionality reduction approaches and a diverse
set of curated datasets. Standard Principal Component Analysis (PCA), kernel PCA and Neural
Network Autoencoders were applied to datasets from four different platforms. While linear
techniques showed better reconstruction performance, nonlinear approaches were able to
capture more complex gene interactions, and thus enjoyed stronger classification power. When
newly-seen gene expression datasets were projected to a latent space of 200 dimensions, the
classification power was improved. Moreover, we performed a large-scale experiment where the
dimensionality reduction methods were trained on an integrated set of 59864 unique samples.
The classification power was further improved, especially for the Autoencoder. Rather surprisingly,
the statistical variability of the additional datasets increased the classification performance,
implying that intricate biological features were better learned. We additionally tested the possibility
of cross-platform data augmentation by constructing an intermediate feature space, showing that
when platforms share common characteristics (such as GPL570 and GPL96) the predictive
performance was also improved.
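
The projection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it covers only standard PCA (not kernel PCA or the Autoencoder), uses synthetic data in place of real expression datasets, and uses a 20-dimensional latent space rather than 200 to keep the example small. All names (`fit_pca`, `project`, `reconstruct`, `make_dataset`) and all sizes are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Fit PCA on X (samples x genes); return the mean and the top components."""
    mean = X.mean(axis=0)
    # SVD of the centered matrix; the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Map samples into the latent space learned by fit_pca."""
    return (X - mean) @ components.T

def reconstruct(Z, mean, components):
    """Map latent coordinates back to gene space."""
    return Z @ components + mean

# Synthetic stand-in for expression data (assumption): low-rank signal plus noise.
rng = np.random.default_rng(0)
n_genes, n_latent = 500, 20
mixing = rng.normal(size=(n_latent, n_genes))

def make_dataset(n_samples):
    factors = rng.normal(size=(n_samples, n_latent))
    noise = 0.1 * rng.normal(size=(n_samples, n_genes))
    return factors @ mixing + noise

corpus = make_dataset(300)    # plays the role of the integrated training set
new_data = make_dataset(40)   # plays the role of a newly-seen dataset

# Learn the latent space once on the corpus, then reuse it for new data.
mean, components = fit_pca(corpus, n_latent)
Z_new = project(new_data, mean, components)
recon_error = np.mean((new_data - reconstruct(Z_new, mean, components)) ** 2)
```

In this sketch, `Z_new` is what a downstream classifier would be trained on in place of the raw gene measurements, and `recon_error` corresponds to the reconstruction performance the abstract compares across methods.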
|
Language |
English |
Subject |
Dimensional reduction |
|
Gene expression |
|
Machine learning |
|
Γονιδιακή έκφραση |
|
Μηχανική μάθηση |
|
Συμπίεση διαστάσεων |
Issue date |
2017-11-24 |
Collection
|
School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
|
|
Type of Work--Post-graduate theses
|
Permanent Link |
https://elocus.lib.uoc.gr//dlib/6/3/a/metadata-dlib-1513065632-757549-13451.tkl
|