E-Locus - Institutional Repository of the University of Crete

Identifier

000434163

Title

Learning biologically interpretable latent representations from gene expression data

Alternative Title

Μαθαίνοντας βιολογικά ερμηνεύσιμες κρυφές αναπαραστάσεις από δεδομένα γονιδιακών εκφράσεων

Author

Καραγιαννάκη, Ιουλία Ε.

Thesis advisor

Τσαμαρδινός, Ιωάννης

Reviewer

Τζιρίτας, Γεώργιος
Πανταζής, Γιάννης

Abstract

Gene expression data are typically high-dimensional with a low sample size. This leads to several statistical and analytical challenges that must be overcome in order to analyze such data and infer the underlying biological mechanisms. To this end, several dimensionality reduction techniques have been proposed. These techniques learn a lower-dimensional space (latent space) of newly constructed features and represent the data as a sum of those features (latent representations). Projecting the data onto the latent feature space compresses the data, retains the significant information, and reduces noise. Typical dimensionality reduction techniques, such as Principal Component Analysis, derive latent representations that are not biologically interpretable. To regain a degree of interpretability, other methods return sparse latent representations: the new features are constructed as linear combinations of only a few of the molecular quantities. However, sparse latent representations are still hard to interpret biologically, as they do not directly correspond to known biological pathways or other known gene sets. In this thesis, we present a novel algorithm for feature construction and dimensionality reduction called Pathway Activity Score Learning (PASL). The major novelty of PASL is that the constructed features are constrained to directly correspond to known molecular pathways and can be interpreted as pathway activity scores. PASL is evaluated on both simulated and real data. We show that PASL retains the predictive information for disease classification on new, unseen datasets. We also show that differential activation analysis provides complementary information to standard gene set enrichment analysis.
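
The idea of constraining each constructed feature to the genes of one known pathway can be illustrated with a minimal sketch. This is not the PASL algorithm from the thesis, whose details are not given in this record; it is a simple stand-in that, for each pathway, takes the first principal component of the expression sub-matrix restricted to that pathway's genes. The function name `pathway_scores`, the pathway names, the gene indices, and the random data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def pathway_scores(X, pathways):
    """Illustrative pathway-constrained features (NOT the thesis's PASL method).

    X        : (n_samples, n_genes) expression matrix.
    pathways : dict mapping a pathway name to a list of gene column indices.
    Returns (scores, loadings): per-pathway score vectors and loading vectors.
    """
    Xc = X - X.mean(axis=0)              # center each gene
    n_genes = Xc.shape[1]
    scores, loadings = {}, {}
    for name, idx in pathways.items():
        idx = np.asarray(idx)
        # First right singular vector of the sub-matrix = first principal axis
        _, _, vt = np.linalg.svd(Xc[:, idx], full_matrices=False)
        w = np.zeros(n_genes)
        w[idx] = vt[0]                   # loadings are zero outside the pathway
        loadings[name] = w
        scores[name] = Xc @ w            # one activity-like score per sample
    return scores, loadings

# Hypothetical toy data: 20 samples, 50 genes, two made-up pathways.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
pathways = {"pathway_A": [0, 1, 2, 3, 4], "pathway_B": [10, 11, 12]}
scores, loadings = pathway_scores(X, pathways)
```

Because every loading vector is zero outside its pathway, each score depends only on that pathway's genes, which is what makes such a feature readable as an activity score for the pathway, in contrast to an unconstrained principal component that mixes all genes.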

Language

English

Subject

Dimensionality reduction

Disease classification

Κατηγοριοποίηση ασθενειών

Μείωση διαστάσεων

Issue date

2020-11-27

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses