Abstract
Accurately classifying different types of neuronal cells is crucial for understanding their
roles in brain function. However, owing to their biological complexity, automated and
reliable classification of neuronal cell types remains challenging. In addition, the
inherently imbalanced distribution of neuronal cells in the brain poses another significant
hurdle: it can lead to unstable predictions and poor performance in most classification
algorithms. Imbalanced classification is not limited to neuronal cell types; it is a common
issue in many real-world applications with limited labeled data and high class-imbalance
ratios, where it significantly degrades performance. This dissertation therefore addresses
two challenges: automated neuronal cell-type classification, and the design of robust
generative models that tackle the imbalanced classification problem by generating synthetic
data.
Conventional approaches to neuronal cell-type classification involve laborious and costly
immunohistochemical analysis, which relies on molecular markers that may be expressed in
several cell types. Algorithms that extract features from cellular characteristics, in turn,
struggle to identify features unique to each class. Both approaches demand substantial human
intervention and are time-consuming. To overcome these challenges, this dissertation
introduces the first automated, deep-learning-based method for neuronal cell-type
classification that uses time series of calcium (Ca²⁺) activity signals, a previously
unexplored feature. The study focuses on two real-world datasets, from the Goal Oriented
Learning (GOL) task and the Random Foraging (RF) task, which correspond to different
experiments on test animals. For the GOL task, we conduct a comparative analysis of
one-dimensional Convolutional Neural Networks (1D-CNNs), Recurrent Neural Networks (RNNs),
and Long Short-Term Memory networks (LSTMs). We also propose a simple data reorganization
that significantly accelerates the training of RNNs and LSTMs, which typically require long
training times when processing long time series. For the RF task, we employ a 2D-CNN model
and additionally use two novel features: the animal’s velocity and the z-depth of each
neuronal cell.
The imbalanced classification problem has prompted the research community to propose three
main approaches: data-level methods, algorithmic-level methods, and hybrid methods that
combine both. Data-level methods typically involve generative models, often based on
Generative Adversarial Networks (GANs), which rely on large quantities of data, whereas
algorithmic-level methods require domain expertise to design effective learning objectives,
making them less accessible to users without such expertise. Moreover, these methods are
usually applied to image data, less frequently to time series data, and seldom to both. To
address these limitations, we present GENDA, a Generative Neighborhood-based Deep
Autoencoder with a straightforward yet effective design that can be applied to both image
and time series data. GENDA learns latent representations based on each sample’s
neighborhood in the embedding space and can generate as many samples as needed to balance
the dataset, allowing a classifier to be trained efficiently. Extensive experiments on a
variety of widely used real-world datasets demonstrate the effectiveness of the proposed
method.
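GENDA’s exact generation procedure is not specified in this abstract. As a rough illustration of the general idea of neighborhood-based generation in a latent space, the sketch below uses a SMOTE-style rule as a stand-in: each new latent code is an interpolation between a randomly chosen code and one of its same-class nearest neighbours. It also computes how many synthetic samples each class needs to match the largest class. The function names `balance_counts` and `neighborhood_oversample`, the interpolation rule, and the parameter `k` are hypothetical; the encoder and decoder are not shown, and random vectors stand in for encoder outputs.

```python
import numpy as np


def balance_counts(labels):
    """Number of synthetic samples each class needs to match the largest class."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    return {int(c): int(target - n) for c, n in zip(classes, counts)}


def neighborhood_oversample(z, n_new, k=5, rng=None):
    """Hypothetical SMOTE-style generator in latent space (not GENDA itself).

    z: (n, d) latent codes of ONE class; assumes n >= 2.
    Each new code lies on the segment between a random anchor and one of
    the anchor's k nearest same-class neighbours.
    """
    rng = np.random.default_rng(rng)
    n = z.shape[0]
    k = min(k, n - 1)
    # Pairwise Euclidean distances; a point is never its own neighbour.
    dists = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]  # (n, k) indices

    anchors = rng.integers(0, n, size=n_new)
    picks = neighbours[anchors, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return z[anchors] + lam * (z[picks] - z[anchors])


# Toy usage: a 9:1 imbalanced label vector and stand-in minority latent codes.
labels = np.array([0] * 90 + [1] * 10)
need = balance_counts(labels)  # {0: 0, 1: 80}
z_minority = np.random.default_rng(0).normal(size=(10, 8))
z_new = neighborhood_oversample(z_minority, need[1], k=3, rng=0)
print(z_new.shape)  # (80, 8)
```

In a full pipeline, a trained decoder would map `z_new` back to image or time series space before the balanced dataset is passed to the classifier.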
Finally, to improve GENDA’s performance and exploit the information a classifier can
provide while the generative model is trained, we propose GENDA-XL, a Generative
Neighborhood-based Deep Autoencoder with eXtended Loss, which extends GENDA. GENDA-XL has a
more robust loss function than GENDA: it uses a supervised similarity metric to learn
efficient latent representations based on each sample’s neighborhood in the embedding space,
and it incorporates into its architecture a pre-trained classifier that associates each
generated sample with its specific label. Our experimental results demonstrate that GENDA-XL
outperforms both GENDA and other methods designed to address the imbalanced classification
problem.
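The abstract names the ingredients of the extended loss but not its formula. One plausible composition, shown here as a hypothetical numpy sketch, sums three terms: an autoencoder reconstruction error, a supervised similarity term on the latent codes, and the cross-entropy of a frozen pre-trained classifier evaluated on the decoded samples. The hinge form of the similarity term, the weights `alpha` and `beta`, and all function names are assumptions, not GENDA-XL’s actual definition. The sketch only evaluates the loss; a real implementation would live in an autodiff framework so that the loss can be backpropagated through the autoencoder.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def supervised_similarity(z, y, margin=1.0):
    """Hypothetical hinge term: on average, same-label latent codes should be
    closer to each other than to different-label codes by at least `margin`."""
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    diff = y[:, None] != y[None, :]
    pos = np.where(same, d, 0.0).sum(axis=1) / np.maximum(same.sum(axis=1), 1)
    neg = np.where(diff, d, 0.0).sum(axis=1) / np.maximum(diff.sum(axis=1), 1)
    return float(np.maximum(0.0, pos - neg + margin).mean())


def extended_loss(x, x_hat, z, y, classifier_logits, alpha=1.0, beta=1.0):
    """Reconstruction + alpha * similarity + beta * frozen-classifier cross-entropy.

    classifier_logits: outputs of a pre-trained classifier applied to x_hat.
    """
    recon = float(np.mean((x - x_hat) ** 2))
    sim = supervised_similarity(z, y)
    probs = softmax(classifier_logits)
    ce = float(-np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12)))
    return recon + alpha * sim + beta * ce


# Toy check: well-separated latent codes, perfect reconstruction.
x = np.random.default_rng(0).normal(size=(4, 6))
z = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
y = np.array([0, 0, 1, 1])
good_logits = np.array([[8.0, 0.0], [8.0, 0.0], [0.0, 8.0], [0.0, 8.0]])
loss_good = extended_loss(x, x, z, y, good_logits)
loss_bad = extended_loss(x, x, z, y, good_logits[:, ::-1])  # classifier disagrees
print(loss_good < loss_bad)  # True
```

The classifier term is what ties each generated sample to its intended label: if the frozen classifier assigns a decoded sample to the wrong class, the loss rises sharply, as the `loss_bad` case shows.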