E-Locus - Institutional Repository of the University of Crete

Doctoral theses

Identifier

000417143

Title

DRACOSS : a framework for direction of arrival estimation and counting of multiple sound sources with microphone arrays

Alternative Title

DRACOSS: ένα ολοκληρωμένο πλαίσιο για εκτίμηση πλήθους και κατεύθυνσης άφιξης πολλαπλών ηχητικών πηγών με συστοιχίες μικροφώνων

Author

Παυλίδη, Δέσποινα Απόστολος

Thesis advisor

Μούχταρης, Αθανάσιος

Reviewer

Τσακαλίδης, Παναγιώτης
Pulkki, Ville

Abstract

Technological advances have infiltrated our everyday life more than ever before. Highly intelligent devices and gadgets, equipped with cutting-edge algorithms, facilitate and empower our lifestyle. Smart-home automation, next-generation hearing aids, and robots with autonomous navigation systems have brought audio signal processing problems to the foreground of the research community. One such problem is the estimation of the number of sources and of the directions from which sound originates, most frequently called direction of arrival (DOA) estimation. The problem of DOA estimation has been active for more than thirty years; consequently, a plethora of algorithms has been proposed in the literature. Some of them can be considered classic and frequently come from the telecommunications research area. Beamforming techniques belong in this category: an appropriately weighted sum of the signals of a microphone array is used to form a receiving beam, which scans the space and detects areas of activity. Subspace approaches, such as the well-known MUSIC algorithm, formulate a spatial function that is maximized when activity is detected, relying on the decomposition of the array sample covariance matrix. Other algorithms stem from research on blindly separating mixtures of audio signals, i.e., the blind source separation (BSS) problem. Independent component analysis (ICA) methods, where the goal is to estimate a demixing matrix that reveals DOA information, and sparse component analysis methods, which exploit the sparsity of activity of the sources in some appropriately chosen domain, both fall into the BSS category. A recently emerging category is that of estimating the intensity vector, which points in the direction of the net flow of sound energy, hence revealing the DOA of the generating sound source.
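To make the beamforming idea above concrete, the following is a minimal narrowband sketch, not the thesis implementation. It assumes a far-field source, a single analysis frequency, and an 8-microphone uniform circular array; the function names (`uca_steering`, `das_scan`), the array radius, the frequency, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def uca_steering(theta, n_mics, radius, freq, c=343.0):
    """Far-field steering vector of a uniform circular array (assumed model)."""
    mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
    # Microphones facing the source receive the wavefront earlier.
    delays = -radius * np.cos(theta - mic_angles) / c
    return np.exp(-2j * np.pi * freq * delays)

def das_scan(snapshots, n_mics, radius, freq, grid):
    """Delay-and-sum output power for every candidate azimuth in grid."""
    # Sample covariance matrix of the microphone snapshots.
    cov = snapshots @ snapshots.conj().T / snapshots.shape[1]
    power = np.empty(len(grid))
    for i, theta in enumerate(grid):
        w = uca_steering(theta, n_mics, radius, freq) / n_mics
        power[i] = np.real(w.conj() @ cov @ w)
    return power

# Hypothetical test scene: one source at 60 degrees plus weak sensor noise.
rng = np.random.default_rng(0)
n_mics, radius, freq, n_snap = 8, 0.05, 2000.0, 200
true_doa = np.deg2rad(60.0)
s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
noise = 0.05 * (rng.standard_normal((n_mics, n_snap))
                + 1j * rng.standard_normal((n_mics, n_snap)))
x = np.outer(uca_steering(true_doa, n_mics, radius, freq), s) + noise

# Scan the azimuth plane in 1-degree steps and keep the strongest direction.
grid = np.deg2rad(np.arange(0, 360))
power = das_scan(x, n_mics, radius, freq, grid)
est_doa_deg = float(np.rad2deg(grid[np.argmax(power)]))
print(f"estimated DOA: {est_doa_deg:.0f} degrees")
```

With a single source the scan has one clear maximum. With several simultaneous sources the broad main lobe of such a small array tends to merge nearby peaks, which is the limitation the abstract attributes to beamforming.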
The aforementioned methods either fail to estimate DOAs accurately when multiple sources are simultaneously active, e.g., beamforming techniques, or are computationally heavy and significantly affected by the amount of available data, e.g., ICA and subspace approaches, while some are restricted to specific array geometries. We thus observe the lack of a methodology that can address the problem of DOA estimation holistically, tackling all of the aforementioned aspects of the problem. In this thesis we aim to fill this gap with our proposed DRACOSS framework, i.e., an integrated framework for DOA estimation and counting of multiple, simultaneously active sound sources using microphone arrays. DRACOSS is developed in two-dimensional (2D) and three-dimensional (3D) spaces, using a uniform circular array and a spherical microphone array, respectively. DRACOSS is a procedure of four distinct steps: (a) exploitation of the sparsity of sound signals, (b) local single-source DOA estimation, (c) histogram formation, and (d) post-processing of the histogram. We detect the sparsity of the involved sound signals in the time-frequency domain by utilizing a relaxed sparsity assumption, which relies on the estimation of a mean correlation coefficient between pairs of microphones. We proceed with the collection of local DOA estimates in the detected single-activity areas, which are then used to form histograms. For the 2D case we employ a local DOA estimator designed specifically for circular arrays and form one-dimensional histograms. For the 3D case we use an intensity vector estimator and form two-dimensional histograms. In both cases, by post-processing the histograms we provide counting and DOA estimation results for all active sound sources.
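Steps (a), (c), and (d) of the 2D case can be illustrated with a short sketch. The abstract does not give the exact definitions, so everything below is an assumption: the correlation measure (normalized cross-correlation of magnitude spectra between adjacent microphones), the threshold `eps`, the moving-average smoothing, and the greedy peak picking, which is a simple stand-in for the thesis's histogram post-processing. Step (b), the local DOA estimator, is replaced here by synthetic DOA estimates.

```python
import numpy as np

def mean_pair_correlation(zone):
    """Mean correlation of magnitude spectra over adjacent microphone pairs.

    zone: complex array (n_mics, n_bins) holding one time-frequency zone.
    """
    mag = np.abs(zone)
    n_mics = mag.shape[0]
    coeffs = []
    for i in range(n_mics):
        j = (i + 1) % n_mics  # neighbouring microphone on the circle
        den = np.sqrt(np.sum(mag[i] ** 2) * np.sum(mag[j] ** 2))
        coeffs.append(np.sum(mag[i] * mag[j]) / den if den > 0 else 0.0)
    return float(np.mean(coeffs))

def is_single_source_zone(zone, eps=0.05):
    """Relaxed sparsity test: keep zones whose mean correlation is near 1."""
    return mean_pair_correlation(zone) >= 1.0 - eps

def smooth_circular_histogram(doas_deg, window=15):
    """1-degree histogram of DOA estimates, smoothed with circular wrap-around."""
    hist, _ = np.histogram(np.mod(doas_deg, 360), bins=360, range=(0, 360))
    padded = np.concatenate([hist[-window:], hist, hist[:window]])
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="same")[window:-window]

def count_and_locate(smooth, min_rel_height=0.3, exclusion=20, max_sources=6):
    """Greedy peak picking: take the maximum, blank its neighbourhood, repeat."""
    h = smooth.astype(float).copy()
    first = h.max()
    peaks = []
    while len(peaks) < max_sources and first > 0:
        k = int(np.argmax(h))
        if h[k] < min_rel_height * first:
            break  # remaining peaks are too weak to count as sources
        peaks.append(k)
        h[np.arange(k - exclusion, k + exclusion + 1) % len(h)] = 0.0
    return sorted(peaks)

rng = np.random.default_rng(1)
# One source seen with different phases at 4 microphones: equal magnitudes.
source = rng.standard_normal(64) + 1j * rng.standard_normal(64)
single_zone = np.exp(1j * rng.uniform(0, 2 * np.pi, size=(4, 1))) * source
# Unrelated content at every microphone: low magnitude correlation.
mixed_zone = rng.standard_normal((4, 64)) + 1j * rng.standard_normal((4, 64))

# Synthetic local DOA estimates: two sources (50 and 200 degrees) plus outliers.
doas = np.concatenate([rng.normal(50, 3, 400), rng.normal(200, 3, 400),
                       rng.uniform(0, 360, 100)])
peaks = count_and_locate(smooth_circular_histogram(doas))
print(is_single_source_zone(single_zone), is_single_source_zone(mixed_zone))
print("estimated count:", len(peaks), "DOAs (degrees):", peaks)
```

The number of retained peaks serves as the source count and their positions as the DOA estimates. The relative-height threshold and exclusion width are arbitrary illustrative values and would need tuning for real recordings.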
DRACOSS performs robustly under a wide collection of simulated and real scenarios, across noise and reverberation conditions and numbers of simultaneously active sources, and in comparison with state-of-the-art methods. We also propose a formulation of two classic DOA methods, i.e., beamforming and MUSIC, through the DRACOSS framework, which significantly improves their performance. Aiming to keep improving our approach and to follow current technological trends, we show recent, very promising results on counting using deep neural networks.

Language

English

Subject

Histogram processing

Sound intensity vector

Sparsity

Spherical harmonic domain

Spherical microphone arrays

Time-frequency domain

Αραιότητα (Sparsity)

Διάνυσμα ηχητικής έντασης (Sound intensity vector)

Επεξεργασία ιστογραμμάτων (Histogram processing)