E-Locus - Institutional Repository of the University of Crete

Identifier

000412226

Title

Latent feature construction for gene expressions improves predictions

Alternative Title

Η κατασκευή κρυμμένων χαρακτηριστικών για γονιδιακές εκφράσεις βελτιώνει την προβλεπτική ικανότητα

Author

Τσέλας, Χρήστος Ρ.

Thesis advisor

Τσαμαρδινός, Ιωάννης

Reviewer

Τζιρίτας, Γεώργιος
Στυλιανού, Ιωάννης

Abstract

Gene expression analysis aims to improve the understanding of intrinsic cellular processes and to contribute to the successful implementation of personalized medicine. The advent of high-throughput gene expression technologies such as microarrays and RNA-sequencing (RNAseq), together with their recent reduction in cost, has resulted in an explosion of publicly available datasets. The generated datasets are inevitably high-dimensional with typically small sample sizes, which severely limits the potential for developing reproducible prognostic models. Being able to increase the predictive power on a newly produced dataset without losing the information of the measured genome is of paramount importance. Although various studies attempt to perform dimensionality reduction and dataset integration to increase classification performance and robustness, challenging issues remain, primarily due to the limited amount of data as well as the technological diversity and heterogeneity across datasets. Exploiting the redundancy of genomics data, we constructed low-dimensional, universal, latent feature spaces of the genome using several dimensionality reduction approaches and a diverse set of curated datasets. Standard Principal Component Analysis (PCA), kernel PCA and Neural Network Autoencoders were applied to datasets from four different platforms. While linear techniques showed better reconstruction performance, nonlinear approaches were able to capture more complex gene interactions and thus achieved stronger classification power. When newly seen gene expression datasets were projected to a latent space of 200 dimensions, the classification power improved. Moreover, we performed a large-scale experiment in which the dimensionality reduction methods were trained on an integrated set of 59864 unique samples. The classification power improved further, especially for the Autoencoder.
Rather surprisingly, the statistical variability of the additional datasets increased the classification performance, implying that intricate biological features were better learned. We additionally tested the possibility of cross-platform data augmentation by constructing an intermediate feature space, showing that when platforms share common characteristics (such as GPL570 and GPL96) the predictive performance also improved.
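
The projection step described in the abstract can be illustrated with a minimal sketch. It covers only the linear (PCA) case, uses synthetic data in place of real expression matrices, and is not the thesis's implementation; the function names `fit_pca`, `project` and `reconstruct`, the matrix sizes, and the toy latent dimension of 5 (rather than 200) are all assumptions made for illustration.

```python
import numpy as np

def fit_pca(reference, n_components):
    """Learn a linear latent space from a reference matrix (samples x genes)."""
    mean = reference.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(data, mean, components):
    """Map samples measured on the same genes into the learned latent space."""
    return (data - mean) @ components.T

def reconstruct(latent, mean, components):
    """Map latent coordinates back to gene space."""
    return latent @ components + mean

# Synthetic stand-in data (assumption): low-rank structure plus small noise,
# mimicking the redundancy among genes that the abstract refers to.
rng = np.random.default_rng(0)
n_genes, n_latent = 50, 5
mixing = rng.normal(size=(n_latent, n_genes))
reference = rng.normal(size=(300, n_latent)) @ mixing \
    + 0.01 * rng.normal(size=(300, n_genes))
new_dataset = rng.normal(size=(20, n_latent)) @ mixing \
    + 0.01 * rng.normal(size=(20, n_genes))

# Fit on the large reference collection, then project a newly seen dataset.
mean, components = fit_pca(reference, n_latent)
latent = project(new_dataset, mean, components)
recon_error = np.mean((reconstruct(latent, mean, components) - new_dataset) ** 2)
```

In a setting like the one described, the rows of `latent` would replace the raw gene-level features as input to a classifier, and `recon_error` corresponds to the reconstruction performance that the abstract compares across methods.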

Language

English

Subject

Dimensional reduction

Gene expression

Machine learning

Γονιδιακή έκφραση

Μηχανική μάθηση

Συμπίεση διαστάσεων

Issue date

2017-11-24

Collection