Post-graduate theses

Identifier 000456482
Title Few-shot deep learning algorithms for image classification
Alternative Title Αλγόριθμοι βαθιάς μάθησης για την κατηγοριοποίηση εικόνων με λίγα παραδείγματα
Author Τζεβελεκάκης, Κωνσταντίνος Ε.
Thesis advisor Κομοντάκης, Νίκος
Reviewer Στυλιανού, Γιάννης
Πανταζής, Ιωάννης
Abstract Deep learning, a thriving field of machine learning, has witnessed an unprecedented revolution during the last decade. The powerful idea of hierarchical representation learning, combined with the abundance of data the digital era effortlessly provides, has led to breathtaking achievements in numerous scientific fields. Nevertheless, applications exist where a plethora of annotated training data is not available due to privacy restrictions, annotation difficulties, or prohibitive costs. Developing deep learning approaches that can be effective in such low-data regime scenarios is still a largely open problem. In this work we consider such a low-data regime scenario for the problem of image classification, a fundamental problem of Computer Vision. In the literature this setting is also known as few-shot visual learning. In this case, given only a very small set of annotated images representing the available categories (e.g. even a single annotated image per category), the correct classification of an unlabeled image set is required. A common approach, termed metric learning, is to project both sets onto a space where samples are clustered with respect to their categories, in order to classify them using a similarity metric. Following the metric learning paradigm, we propose a methodology that utilizes deep embedding functions to project the samples onto the embedding space. To implement these embedding functions, we combine the representation power of vision transformers, a state-of-the-art deep learning architecture, amplified by employing pre-trained self-supervised foundation models. Undoubtedly, a few-shot learning algorithm should harness every bit of available information from the annotated data to be effective under this low-data regime. Hence, instead of just incorporating prior knowledge, encoded in the embedding functions' parameters, we additionally exploit the information exchange between those functions. Specifically, we conduct a case study that can be summarized in two main questions: (i) Is an exchange of information between the embedding functions beneficial for the problem at hand? (ii) In what way can this exchange of information be established? In an attempt to answer these questions, we propose three main methods, namely ParallelVits, ParallelVits+Encoder, and BlendedVits. The ParallelVits method serves as a performance baseline, since it restricts the information flow between the embedding functions, whereas the other methods enable information exchange by leveraging the flexibility of the vision transformer architecture. Moreover, several hyper-parameters of the employed meta-learning framework, the neural network architectures, and the aforementioned methods have been put under scrutiny. The evaluation of our methods has led to some interesting findings as well as very promising experimental results, reaching near state-of-the-art performance on the miniImageNet dataset.
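To make the metric-learning paradigm described in the abstract concrete, the sketch below shows a generic nearest-prototype few-shot classifier operating on precomputed image embeddings. It is an illustrative assumption, not the thesis's ParallelVits, ParallelVits+Encoder, or BlendedVits methods: random vectors stand in for the embeddings that the thesis would obtain from pre-trained self-supervised vision transformers, and all function names below are hypothetical.

import numpy as np


def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so a dot product equals cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def prototype_classify(support_emb, support_labels, query_emb, num_classes):
    # support_emb:    (n_support, d) embeddings of the few annotated images
    # support_labels: (n_support,)   integer category labels of the support set
    # query_emb:      (n_query, d)   embeddings of the unlabeled images
    support_emb = l2_normalize(support_emb)
    query_emb = l2_normalize(query_emb)
    # One prototype per category: the mean of that category's support embeddings.
    prototypes = np.stack([support_emb[support_labels == c].mean(axis=0)
                           for c in range(num_classes)])
    prototypes = l2_normalize(prototypes)
    # Cosine similarity between each query and each prototype; predict the most similar.
    return (query_emb @ prototypes.T).argmax(axis=1)


# Toy 5-way 1-shot episode; random vectors stand in for vision-transformer features.
rng = np.random.default_rng(0)
support = rng.normal(size=(5, 384))     # one annotated image per category
labels = np.arange(5)
queries = np.repeat(support, 3, axis=0) + 0.1 * rng.normal(size=(15, 384))
print(prototype_classify(support, labels, queries, num_classes=5))
# Should print 0 0 0 1 1 1 2 2 2 3 3 3 4 4 4 (each query matches its source category).

In this setup the "prior knowledge" the abstract refers to lives entirely in the embedding function; the thesis's contribution concerns how information is exchanged between such embedding functions, which this generic sketch does not attempt to reproduce.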
Language English
Subject Artificial intelligence
Neural networks
Transformers
Μετασχηματιστές
Νευρωνικά δίκτυα
Τεχνητή νοημοσύνη
Issue date 2023-07-21
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/1/2/6/metadata-dlib-1687331092-107361-6722.tkl

Digital Documents
No preview available

No permission to view the document; it will not be available until 2024-07-21.