Post-graduate theses

Identifier 000456482
Title Few-shot deep learning algorithms for image classification
Alternative Title Αλγόριθμοι βαθιάς μάθησης για την κατηγοριοποίηση εικόνων με λίγα παραδείγματα
Author Τζεβελεκάκης, Κωνσταντίνος Ε.
Thesis advisor Κομοντάκης, Νίκος
Reviewer Στυλιανού, Γιάννης
Πανταζής, Ιωάννης
Abstract Deep learning, a thriving field of machine learning, has witnessed an unprecedented revolution during the last decade. The powerful idea of hierarchical representation learning, combined with the abundance of data the digital era effortlessly provides, has led to breathtaking achievements in numerous scientific fields. Nevertheless, applications exist where a plethora of annotated training data is not available due to privacy restrictions, annotation difficulties, or prohibitive costs. Developing deep learning approaches that can be effective in such low-data regime scenarios is still a largely open problem. In this work we consider such a low-data regime scenario for the problem of image classification, a fundamental problem of Computer Vision. In the literature this setting is also known as few-shot visual learning. In this case, given only a very small set of annotated images representing the available categories (e.g. even a single annotated image per category), the correct classification of an unlabeled image set is required. A common approach, termed metric learning, is to project both sets onto a space where samples are clustered with respect to their categories, in order to classify them using a similarity metric. Following the metric learning paradigm, we propose a methodology that utilizes deep embedding functions to project the samples onto the embedding space. To implement these embedding functions, we combine the representation power of vision transformers, a state-of-the-art deep learning architecture, amplified by employing pre-trained self-supervised foundation models. Undoubtedly, a few-shot learning algorithm should harness every bit of available information from the annotated data to be effective under this low-data regime. Hence, instead of just incorporating prior knowledge, encoded in the embedding functions' parameters, we additionally exploit the information exchange between those functions. Specifically, we conduct a case study that can be summarized in two main questions: (i) Is an exchange of information between the embedding functions beneficial for the problem at hand? (ii) In what way can this exchange of information be established? In an attempt to answer these questions, we propose three main methods, namely ParallelVits, ParallelVits+Encoder, and BlendedVits. The ParallelVits method serves as a performance baseline, since it restricts the information flow between the embedding functions, whereas the other methods enable information exchange by leveraging the flexibility of the vision transformer architecture. Moreover, several hyper-parameters of the employed meta-learning framework, the neural network architectures, and the aforementioned methods have been put under scrutiny. The evaluation of our methods has led to some interesting findings as well as very promising experimental results, reaching near state-of-the-art performance on the miniImageNet dataset.
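To make the metric-learning paradigm described in the abstract concrete, the sketch below shows a generic nearest-prototype few-shot classifier operating on precomputed image embeddings. It is an illustrative assumption, not the thesis's ParallelVits, ParallelVits+Encoder, or BlendedVits methods: random vectors stand in for the embeddings that the thesis would obtain from pre-trained self-supervised vision transformers, and all function names below are hypothetical.

import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def prototype_classify(support_emb, support_labels, query_emb, num_classes):
    # support_emb:    (n_support, d) embeddings of the few annotated images
    # support_labels: (n_support,)   integer category labels of the support set
    # query_emb:      (n_query, d)   embeddings of the unlabeled images
    support_emb = l2_normalize(support_emb)
    query_emb = l2_normalize(query_emb)
    # One prototype per category: the mean of that category's support embeddings.
    prototypes = np.stack([support_emb[support_labels == c].mean(axis=0)
                           for c in range(num_classes)])
    prototypes = l2_normalize(prototypes)
    # Cosine similarity between each query and each prototype; predict the most similar.
    return (query_emb @ prototypes.T).argmax(axis=1)


# Toy 5-way 1-shot episode; random vectors stand in for vision-transformer features.
rng = np.random.default_rng(0)
support = rng.normal(size=(5, 384))     # one annotated image per category
labels = np.arange(5)
queries = np.repeat(support, 3, axis=0) + 0.1 * rng.normal(size=(15, 384))
print(prototype_classify(support, labels, queries, num_classes=5))
# Should print 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 (each query matches its source category).

In this setup the "prior knowledge" the abstract refers to lives entirely in the embedding function; the thesis's contribution concerns how information is exchanged between such embedding functions, which this generic sketch does not attempt to reproduce.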
Language English
Subject Artificial intelligence
Neural networks
Transformers
Μετασχηματιστές
Νευρωνικά δίκτυα
Τεχνητή νοημοσύνη
Issue date 2023-07-21
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/1/2/6/metadata-dlib-1687331092-107361-6722.tkl

Digital Documents
No preview available

No permission to view the document; it will not be available until 2024-07-21.