E-Locus - Institutional Repository of the University of Crete

School of Sciences and Engineering, Department of Computer Science

Doctoral theses

Identifier

000456441

Title

Learning deep generative models for the enhancement of imbalanced signal classification

Alternative Title

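Μάθηση βαθιών μοντέλων παραγωγής δεδομένων για τη βελτιστοποίηση της ταξινόμησης σημάτων με μη ισορροπημένη κατανομή κλάσεων (English: Learning deep data-generation models for optimizing the classification of signals with an imbalanced class distribution)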

Author

Τρουλλινού, Ειρήνη Ι

Thesis advisor

Τσακαλίδης, Παναγιώτης

Reviewer

Παπαδοπούλη, Μαρία
Ποϊράζη, Παναγιώτα
Φρουδάκης, Εμμανουήλ
Παπαχαριλάου, Γιάννης
Τσαγκατάκης, Γρηγόριος
Τζαγκαράκης, Γεώργιος

Abstract

Accurately classifying the different types of neuronal cells is crucial for understanding their roles in brain function. However, because of their biological complexity, automated and reliable classification of neuronal cell types remains challenging. The inherently imbalanced distribution of neuronal cell types in the brain is a further hurdle: it can lead to unstable predictions and poor performance for most classification algorithms. Imbalanced classification is not specific to neuronal cell types; it is common in many real-world applications that combine limited labeled data with high class-imbalance ratios, and it substantially degrades classification performance. This dissertation therefore addresses two problems: automated neuronal cell-type classification, and the design of robust generative models that tackle imbalanced classification by generating synthetic data. Typical approaches to neuronal cell-type classification rely on laborious and costly immunohistochemical analysis, which depends on molecular markers that may be expressed in several cell types. Alternatively, algorithms that extract features from cellular characteristics struggle to identify features that are unique to each class. Both approaches require substantial human intervention and are time-consuming. To overcome these challenges, this dissertation introduces the first automated neuronal cell-type classification method that is based on deep learning and uses the time series of calcium (Ca²⁺) activity signals, a previously unexplored feature. The study focuses on two real-world datasets, the Goal Oriented Learning (GOL) task and the Random Foraging (RF) task, which come from different experiments on test animals.
For the GOL task, we conduct a comparative analysis of 1-Dimensional Convolutional Neural Networks (1D-CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs). We also propose a simple data re-organization that significantly accelerates the training of RNNs and LSTMs, which typically require long training times when processing long time series. For the RF task, we employ a 2D-CNN model that additionally uses two novel features: the animal's velocity and the z-depth of each neuronal cell.

The imbalanced classification problem has led the research community to propose three main families of approaches: data-level methods, algorithmic-level methods, and hybrid methods that combine the two. Data-level methods typically rely on generative models, often based on Generative Adversarial Networks (GANs), which require large amounts of training data. Algorithmic-level methods require domain expertise to design effective learning objectives, which makes them less accessible to users without such expertise. Both families are usually applied to image data, less frequently to time series data, and seldom to both. To address these limitations, we present GENDA, a Generative Neighborhood-based Deep Autoencoder that is simple in design yet effective, and that can be applied successfully to both image and time series data. GENDA learns latent representations from the neighborhood of each sample in the embedding space and can generate as many samples as needed to balance the dataset, enabling efficient training of a classifier. Extensive experiments on a variety of widely used real-world datasets demonstrate the effectiveness of the proposed method.
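The abstract does not spell out how GENDA generates new samples, so the following Python sketch only illustrates the general idea of neighborhood-based oversampling in a latent space, under explicit assumptions: a linear PCA encoder/decoder (`fit_linear_autoencoder`) stands in for the trained deep autoencoder, and new latent codes are produced by SMOTE-style interpolation between a sample and one of its `k` nearest latent neighbors (`neighborhood_oversample`), a swapped-in technique rather than the thesis's procedure. All function names and parameters are hypothetical. `class_deficits` computes how many synthetic samples each class needs to reach the size of the largest class, which is what "as many samples as needed to balance the dataset" amounts to.

```python
import numpy as np


def class_deficits(labels):
    """Synthetic samples each class needs to match the largest class."""
    classes, counts = np.unique(labels, return_counts=True)
    return {int(c): int(counts.max() - n) for c, n in zip(classes, counts)}


def fit_linear_autoencoder(x, latent_dim):
    """Hypothetical stand-in for a trained deep autoencoder: a linear (PCA)
    encoder/decoder pair fitted with an SVD."""
    mean = x.mean(axis=0)
    # Rows of vt are orthonormal principal directions; keep the first latent_dim.
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    w = vt[:latent_dim]

    def encode(v):
        return (v - mean) @ w.T

    def decode(h):
        return h @ w + mean

    return encode, decode


def neighborhood_oversample(x_cls, n_new, encode, decode, k=3, rng=None):
    """Generate n_new samples for one class by interpolating between a latent
    code and one of its k nearest latent neighbors (SMOTE-style, assumed).
    Requires at least two samples in x_cls."""
    rng = np.random.default_rng() if rng is None else rng
    h = encode(x_cls)
    # Pairwise Euclidean distances between latent codes.
    dist = np.linalg.norm(h[:, None, :] - h[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbor
    k = min(k, len(h) - 1)
    neighbors = np.argsort(dist, axis=1)[:, :k]
    out = np.empty((n_new, x_cls.shape[1]))
    for s in range(n_new):
        i = rng.integers(len(h))            # random anchor sample
        j = neighbors[i, rng.integers(k)]   # one of its latent neighbors
        t = rng.random()                    # interpolation weight in [0, 1)
        out[s] = decode(h[i] + t * (h[j] - h[i]))
    return out


# Toy imbalanced dataset: 100 majority vs. 10 minority samples, 20 features.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 1.0, (100, 20)), rng.normal(3.0, 1.0, (10, 20))])
y = np.array([0] * 100 + [1] * 10)

deficits = class_deficits(y)  # {0: 0, 1: 90}
encode, decode = fit_linear_autoencoder(x, latent_dim=5)
x_new = neighborhood_oversample(x[y == 1], deficits[1], encode, decode, rng=rng)
x_bal = np.vstack([x, x_new])
y_bal = np.concatenate([y, np.full(len(x_new), 1)])
```

Interpolating in the latent space rather than the input space means every synthetic code is decoded back onto the learned data manifold (here a 5-dimensional affine subspace); with a real deep autoencoder that manifold would be nonlinear.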
Finally, to improve GENDA's performance and exploit the information that a classifier can provide while the generative model is being trained, we propose GENDA-XL, a Generative Neighborhood-based Deep Autoencoder with eXtended Loss, which extends GENDA. Compared with GENDA, GENDA-XL has a more robust loss function: it uses a supervised similarity metric to learn efficient latent representations from the neighborhood of each sample in the embedding space, and it incorporates a pre-trained classifier into its architecture that associates each generated sample with its specific label. Our experimental results show that GENDA-XL outperforms both GENDA and other methods designed to address the imbalanced classification problem.
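The abstract names the ingredients of the GENDA-XL loss but not their exact form. The Python sketch below shows one plausible way to combine them, assuming mean squared reconstruction error, a supervised contrastive loss (Khosla et al., 2020) as the swapped-in supervised similarity metric on the latent codes, and cross-entropy from a frozen classifier, here a hypothetical linear softmax model with weights `classifier_w`. The function names and the weights `lam_sim` and `lam_cls` are illustrative assumptions, not the thesis's definitions. The sketch only evaluates the loss with NumPy; training would additionally require gradients, for example from an automatic-differentiation framework.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax; entries equal to -inf stay -inf."""
    m = np.max(logits, axis=axis, keepdims=True)
    return logits - (m + np.log(np.sum(np.exp(logits - m), axis=axis, keepdims=True)))


def supervised_contrastive(h, y, tau=0.1):
    """Supervised contrastive loss on latent codes h: pulls together codes that
    share a label and pushes apart the rest. Assumes at least one anchor has a
    same-label partner in the batch."""
    z = h / np.linalg.norm(h, axis=1, keepdims=True)  # cosine similarity
    sim = z @ z.T / tau
    self_mask = np.eye(len(y), dtype=bool)
    # Exclude each anchor from its own softmax denominator.
    log_prob = log_softmax(np.where(self_mask, -np.inf, sim), axis=1)
    pos = (y[:, None] == y[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)
    # Mean log-probability of the positives, for anchors that have any.
    pos_sum = np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos]
    return -(pos_sum / pos.sum(axis=1)[has_pos]).mean()


def cross_entropy(logits, y):
    """Mean cross-entropy of integer labels y under the given logits."""
    return -log_softmax(logits, axis=1)[np.arange(len(y)), y].mean()


def extended_loss(x, x_rec, h, y, classifier_w, lam_sim=1.0, lam_cls=1.0):
    """Hypothetical GENDA-XL-style objective: reconstruction error, supervised
    similarity in latent space, and the label loss of a frozen classifier
    (a stand-in linear softmax model) applied to the reconstructions."""
    rec = np.mean((x - x_rec) ** 2)
    sim = supervised_contrastive(h, y)
    cls = cross_entropy(x_rec @ classifier_w, y)
    return rec + lam_sim * sim + lam_cls * cls


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 6))
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
h = rng.normal(size=(8, 3))                  # latent codes from a hypothetical encoder
x_rec = x + 0.1 * rng.normal(size=x.shape)   # reconstructions from a hypothetical decoder
classifier_w = rng.normal(size=(6, 2))       # frozen classifier weights (stand-in)
loss = extended_loss(x, x_rec, h, y, classifier_w)
```

The classifier term is what ties each reconstructed or generated sample to its intended label: if the decoder drifts toward another class, the frozen classifier's cross-entropy rises even when the reconstruction error stays small.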

Language

English

Subject

Artificial neural networks

Calcium imaging data

Data augmentation methods

Generative models

Image data

Imbalanced classification

Latent space

Timeseries data

Timeseries data Artificial neural networks