Identifier 000414839
Title Single shot 3D hand pose estimation using radial basis function networks trained on synthetic data
Alternative Title Εκτίμηση της 3Δ πόζας του ανθρώπινου χεριού από μια εικόνα χρησιμοποιώντας δίκτυα συναρτήσεων ακτινικής βάσης εκπαιδευμένα σε συνθετικά δεδομένα
Author Νικοδήμου, Βασίλειος - Κλείτος Β.
Thesis advisor Αργυρός, Αντώνης
Reviewer Στεφανίδης, Κωνσταντίνος
Ζαμπούλης, Ξενοφών
Abstract Human motion tracking and analysis forms an important category of problems in the field of Computer Vision. Within this category, the class of problems that deals with estimating the full pose of a human hand is especially interesting. This thesis addresses the problem of estimating the full pose of a human hand in real time, using only visual input. Many approaches have been proposed to solve this problem, including ones that apply machine learning techniques. The recent success of deep neural networks in computer vision tasks has led to new advances in this area. Despite the significant effort that has been devoted to the problem of 3D hand pose estimation, no method has succeeded in tackling the problem in its full generality. Machine learning approaches and, in particular, deep learning ones require large annotated datasets for training. Annotating real-world data is labor-intensive and time-consuming. Therefore, an automatic way of creating and annotating training data is preferable. Synthetic data provides an easy way to obtain large volumes of accurately annotated data. On the downside, synthetic data may not accurately simulate the details of real data. Existing machine learning techniques are sensitive to the distribution of the input data and may fail to generalize to real-world data when trained on synthetic data. In this thesis we present a novel framework for single shot 3D hand pose estimation from depth maps. More specifically, the input is assumed to be a single depth map depicting a single hand in isolation, that is, not occluded by its surroundings. The depth map is acquired using a depth sensor, and no visual aids (e.g., markers) are used to facilitate the task of localizing the hand or parts of it. The method follows a coarse-to-fine strategy, employing Radial Basis Function Networks (RBFNs) that are trained on a large synthetic dataset.
To synthesize the dataset used to train the RBFNs, we capture a real-world sequence of a human hand performing a set of diverse hand gestures. We then estimate the hand pose for each frame of the sequence using an offline hand tracking method with a high computational budget, which yields accurate estimates. From the set of all recovered hand poses, we select the most diverse ones. We use this representative set, along with a dense sampling of all possible rotations, as a seed to generate the synthetic training set. An initialization RBFN and multiple specialized RBFNs are trained on parts of this large synthetic dataset. There are two classes of specialized RBFNs. One class is trained to recover the global hand rotation given the hand articulation, and the second to recover the global hand articulation given the hand rotation. Given an input depth map, we use the trained models to recover a hand pose. To this end, the initialization RBFN is used to provide a rough pose estimate. Subsequently, the specialized RBFNs are employed in an iterative refinement scheme to improve the initial estimate. This refinement scheme runs for a predetermined number of iterations, after which the final estimate is returned. The overall computational cost of the proposed approach is dominated by the evaluation of several RBFNs, yielding in practice a system that achieves close to real-time performance. Furthermore, the proposed method is parallelizable, taking advantage of the inherent data-parallelism of RBFNs. The method requires little real-world data and virtually no manual annotation, and it has few hyper-parameters, which are experimentally investigated to identify their optimal values. We perform a quantitative evaluation of our method on a test sequence of our own.
Additionally, we present quantitative results on a public dataset that is commonly used to evaluate hand pose estimation and tracking methods. Qualitative results are also presented for both datasets. We show that our approach achieves promising results in all cases. In conclusion, this work shows that the proposed RBFN-based approach can generalize quite well when learning from synthetic data.
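Illustrative note (not part of the thesis): the coarse-to-fine scheme described in the abstract can be sketched on toy data as below. This is a minimal sketch under stated assumptions, not the author's implementation. The 1-D "rotation" and "articulation" values, the `toy_features` descriptor, the use of a normalized Gaussian RBF network with one center per training sample, and all names and parameter values are hypothetical stand-ins chosen only to show an initialization network followed by alternating specialized networks for a fixed number of iterations.

```python
import math

def rbf_predict(centers, targets, x, sigma=0.1):
    # Normalized Gaussian RBF network (assumed variant): the output is a
    # kernel-weighted average of the targets stored with each center.
    w = [math.exp(-sum((c - v) ** 2 for c, v in zip(center, x)) / (2 * sigma ** 2))
         for center in centers]
    total = sum(w) or 1.0  # guard against every kernel underflowing to zero
    return [sum(wi * t[k] for wi, t in zip(w, targets)) / total
            for k in range(len(targets[0]))]

def toy_features(rot, art):
    # Hypothetical 2-D stand-in for a descriptor computed from a depth map.
    return [rot + art, rot - art]

# Synthetic training set: a dense grid over toy rotation and articulation values.
poses = [(i / 10, j / 10) for i in range(11) for j in range(11)]
feats = [toy_features(r, a) for r, a in poses]

# Initialization network: features -> (rotation, articulation).
init_targets = [[r, a] for r, a in poses]
# Specialized network 1: (features, articulation) -> rotation.
rot_centers = [f + [a] for f, (r, a) in zip(feats, poses)]
rot_targets = [[r] for r, a in poses]
# Specialized network 2: (features, rotation) -> articulation.
art_centers = [f + [r] for f, (r, a) in zip(feats, poses)]
art_targets = [[a] for r, a in poses]

def estimate_pose(f, n_iters=3):
    # Coarse estimate from the initialization network.
    coarse = rbf_predict(feats, init_targets, f)
    rot, art = [coarse[0]], [coarse[1]]
    # Fixed number of alternating refinement steps.
    for _ in range(n_iters):
        rot = rbf_predict(rot_centers, rot_targets, f + art)
        art = rbf_predict(art_centers, art_targets, f + rot)
    return rot[0], art[0]

# Query with a pose that is not on the training grid.
rot, art = estimate_pose(toy_features(0.33, 0.58))
print(round(rot, 3), round(art, 3))
```

Each specialized network conditions on the other network's latest output, so every refinement step works on a narrower sub-problem than the initial estimate. The number of iterations is fixed in advance, mirroring the "predetermined number of repetitions" mentioned in the abstract.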
Language English
Subject Artificial neural networks
Hand pose estimation
Radial basis function networks
Synthetic data
Δίκτυα συναρτήσεων ακτινικής βάσης
Εκτίμηση πόζας χεριού
Συνθετικά δεδομένα
Τεχνητά νευρωνικά δίκτυα
Issue date 2018-03-23
Collection   Faculty/Department--Faculty of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses