Abstract
Technological advances have infiltrated our everyday life more than ever before. Highly intelligent
devices and gadgets, equipped with cutting-edge algorithms, facilitate
and empower our lifestyle. Smart-home automation, next-generation hearing aids,
and robots with autonomous navigation systems have brought audio signal processing
problems to the foreground of the research community. One such problem is the estimation
of the number of active sound sources and of the directions from which the sound originates,
most frequently called direction of arrival (DOA) estimation.
The problem of DOA estimation has been active for more than thirty years; consequently, a
plethora of algorithms have been proposed in the literature. Some of them can be considered
classic and frequently originate from the telecommunications research area. Beamforming
techniques belong in this category: an appropriately weighted sum of the
signals of a microphone array is used to form a receiving beam, which scans the space and
detects areas of activity. Subspace approaches, such as the well-known MUSIC algorithm,
rely on the eigendecomposition of the array sample covariance matrix to formulate a spatial
function that is maximized in the directions of active sources. Other algorithms stem from
research on blindly separating mixtures of audio signals, i.e., the blind source separation
(BSS) problem. Both independent component analysis (ICA) methods, where the goal is to
estimate a demixing matrix that reveals DOA information, and sparse component analysis
methods, which exploit the sparsity of the sources' activity in some appropriately
chosen domain, fall into the BSS category. A recently emerging category is that of
estimating the intensity vector, which points along the net flow of sound energy, hence
revealing the DOA of the generating sound source.
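To make the subspace idea above concrete, the following is a minimal narrowband MUSIC sketch in Python with NumPy. It is an illustration under stated assumptions, not the formulation used in this thesis: a far-field plane-wave model, a single frequency bin, an 8-microphone uniform circular array of 5 cm radius, a known number of sources and synthetic data at 20 dB SNR. The helper names (`uca_steering`, `music_spectrum`, `top_peaks`) and all parameter values are hypothetical.

```python
import numpy as np

def uca_steering(azimuths_rad, n_mics, radius_m, freq_hz, c=343.0):
    """Far-field narrowband steering vectors of a uniform circular array (UCA).

    Returns an (n_mics, n_azimuths) matrix; column q models a plane wave
    arriving from azimuth azimuths_rad[q] (radians). Assumed model only.
    """
    mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
    k = 2 * np.pi * freq_hz / c  # wavenumber
    # Relative phase at mic m: k * r * cos(theta - phi_m)
    return np.exp(1j * k * radius_m * np.cos(np.subtract.outer(mic_angles, azimuths_rad)))

def music_spectrum(snapshots, n_sources, scan_az, n_mics, radius_m, freq_hz):
    """MUSIC pseudospectrum from (n_mics, n_snapshots) complex observations at one frequency."""
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance matrix
    _, eigvecs = np.linalg.eigh(R)  # eigenvalues returned in ascending order
    noise_subspace = eigvecs[:, : n_mics - n_sources]
    A = uca_steering(scan_az, n_mics, radius_m, freq_hz)
    # Steering vectors of the true directions are (nearly) orthogonal to the noise subspace
    proj = np.sum(np.abs(noise_subspace.conj().T @ A) ** 2, axis=0)
    return 1.0 / proj

def top_peaks(spectrum, n_peaks):
    """Indices of the n_peaks highest circular local maxima."""
    is_peak = (spectrum > np.roll(spectrum, 1)) & (spectrum >= np.roll(spectrum, -1))
    idx = np.flatnonzero(is_peak)
    return idx[np.argsort(spectrum[idx])[::-1][:n_peaks]]

# Synthetic example: two uncorrelated sources, 8 mics, 5 cm radius, 2 kHz bin, 20 dB SNR
rng = np.random.default_rng(0)
n_mics, radius, freq, n_snap = 8, 0.05, 2000.0, 500
true_az = np.deg2rad([60.0, 200.0])

def cgauss(shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

X = uca_steering(true_az, n_mics, radius, freq) @ cgauss((2, n_snap)) + 0.1 * cgauss((n_mics, n_snap))
scan = np.deg2rad(np.arange(360.0))  # 1-degree azimuth grid
P = music_spectrum(X, 2, scan, n_mics, radius, freq)
est = np.sort(np.rad2deg(scan[top_peaks(P, 2)]))
print(est)
```

Note that the number of sources is an input here; this is exactly the kind of assumption that a joint counting and localization framework aims to remove.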
The aforementioned methods either fail to estimate DOAs accurately when multiple
sources are simultaneously active, e.g., beamforming techniques, or are computationally
heavy and significantly affected by the amount of available data, e.g., ICA and
subspace approaches, while some are restricted to specific array geometries. We thus
observe the lack of a methodology that can address the problem of DOA estimation holistically,
tackling all the aforementioned aspects of the problem.
In this thesis we aim to fill this gap with the proposed DRACOSS framework, i.e., an
integrated framework for DOA estimation and counting of multiple,
simultaneously active sound sources using microphone arrays. DRACOSS is developed
in two-dimensional (2D) and three-dimensional (3D) space, using a uniform circular
array and a spherical microphone array, respectively. DRACOSS consists of four distinct steps: (a) exploitation of the sparsity of sound signals, (b) local single-source DOA
estimation, (c) histogram formation, and (d) post-processing of the histogram.
We exploit the sparsity of the involved sound signals in the time-frequency domain through
a relaxed sparsity assumption, which relies on the estimation of a mean correlation
coefficient between pairs of microphones. We then collect local DOA
estimates in the detected single-source areas, which are subsequently used to form histograms.
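The single-source detection idea can be sketched as follows. When one source dominates a time-frequency zone, every channel is approximately a scaled copy of the same signal, so the magnitude of the correlation coefficient between any two microphones approaches 1; when several sources overlap, it drops. This minimal sketch assumes that a zone is a block of STFT coefficients, that the pair coefficient is |R_ij| / sqrt(R_ii R_jj) with R_ij summed over the zone, and that a zone is declared single-source when the mean over all pairs is at least 1 - eps. The threshold `eps=0.05`, the 64-point zone, the array parameters and the helper names are hypothetical choices, not the values used in the thesis.

```python
import numpy as np
from itertools import combinations

def uca_steering(azimuth_rad, n_mics=8, radius_m=0.05, freq_hz=2000.0, c=343.0):
    """Far-field steering vector of a uniform circular array (assumed model)."""
    mic_angles = 2 * np.pi * np.arange(n_mics) / n_mics
    k = 2 * np.pi * freq_hz / c
    return np.exp(1j * k * radius_m * np.cos(azimuth_rad - mic_angles))

def mean_pairwise_correlation(zone):
    """zone: (n_mics, n_points) complex STFT coefficients of one time-frequency zone."""
    R = zone @ zone.conj().T  # cross-correlations summed over the zone
    power = np.real(np.diag(R))
    coeffs = [abs(R[i, j]) / np.sqrt(power[i] * power[j])
              for i, j in combinations(range(zone.shape[0]), 2)]
    return float(np.mean(coeffs))

def is_single_source_zone(zone, eps=0.05):
    """Hypothetical decision rule: mean pairwise coefficient >= 1 - eps."""
    return mean_pairwise_correlation(zone) >= 1.0 - eps

rng = np.random.default_rng(1)
n_points = 64

def cnoise(shape, scale):
    """Circular complex Gaussian samples with standard deviation `scale`."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

s1, s2 = cnoise(n_points, 1.0), cnoise(n_points, 1.0)
a1, a2 = uca_steering(np.deg2rad(60.0)), uca_steering(np.deg2rad(200.0))

one_src = np.outer(a1, s1) + cnoise((8, n_points), 0.03)  # one dominant source
two_src = np.outer(a1, s1) + np.outer(a2, s2) + cnoise((8, n_points), 0.03)  # overlap

print(round(mean_pairwise_correlation(one_src), 3), is_single_source_zone(one_src))
print(round(mean_pairwise_correlation(two_src), 3), is_single_source_zone(two_src))
```

In a full pipeline, a local DOA estimate would then be computed only for the zones that pass this test.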
For the 2D case we employ a local DOA estimator designed specifically for circular arrays
and form one-dimensional histograms. For the 3D case we use an intensity vector estimator
and form two-dimensional histograms. In both cases, post-processing of the
histograms yields counting and DOA estimation results for all active sound sources.
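For the 2D case, histogram formation and post-processing can be illustrated with the sketch below. It is a hypothetical stand-in rather than the post-processing proposed in the thesis: it builds a 1-degree histogram of local azimuth estimates, smooths it with a moving average that wraps around 0/360 degrees, and then greedily picks the highest peak, suppresses a ±20 degree neighbourhood around it, and repeats while the remaining maximum exceeds 30% of the global maximum. The number of picked peaks serves as the source count. The threshold, smoothing width, separation and the synthetic local estimates are all assumptions.

```python
import numpy as np

def doa_histogram(local_doas_deg):
    """Histogram of local azimuth estimates with 1-degree bins over [0, 360)."""
    edges = np.arange(0.0, 361.0, 1.0)
    hist, _ = np.histogram(np.mod(local_doas_deg, 360.0), bins=edges)
    return hist.astype(float)

def circular_smooth(hist, width=5):
    """Moving-average smoothing that wraps around 0/360 degrees."""
    if width % 2 == 0:
        raise ValueError("width must be odd")
    padded = np.pad(hist, width // 2, mode="wrap")
    return np.convolve(padded, np.ones(width) / width, mode="valid")

def count_and_localize(local_doas_deg, rel_threshold=0.3, min_sep_deg=20, width=5):
    """Greedy peak picking: returns (number of sources, sorted DOA estimates in degrees)."""
    h = circular_smooth(doa_histogram(local_doas_deg), width)
    if h.max() <= 0:
        return 0, []
    threshold = rel_threshold * h.max()
    residual = h.copy()
    bins = np.arange(h.size)
    doas = []
    while residual.max() >= threshold:
        peak = int(np.argmax(residual))
        doas.append(peak + 0.5)  # centre of the 1-degree bin
        # Circular distance (in bins) from every bin to the detected peak
        dist = np.abs(bins - peak)
        dist = np.minimum(dist, h.size - dist)
        residual[dist <= min_sep_deg] = 0.0  # suppress this peak's neighbourhood
    return len(doas), sorted(doas)

# Synthetic local estimates: three sources plus uniformly spread outliers
rng = np.random.default_rng(7)
true_doas = [45.0, 160.0, 290.0]
local = np.concatenate([
    rng.normal(45.0, 2.0, 600),
    rng.normal(160.0, 2.0, 500),
    rng.normal(290.0, 2.0, 400),
    rng.uniform(0.0, 360.0, 150),  # e.g., estimates from wrongly accepted zones
])
n_src, doas = count_and_localize(local)
print(n_src, doas)
```

The relative threshold trades missed weak sources against false detections caused by outliers; its value here is illustrative only.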
DRACOSS performs robustly under a wide collection of simulated and real scenarios, with
respect to noise and reverberation conditions and to the number of simultaneously
active sources, and in comparison with state-of-the-art methods. We also propose the formulation
of two classic DOA methods, i.e., beamforming and MUSIC, within the DRACOSS
framework, which significantly improves their performance. Aiming to continually
improve our approach and following current technological trends, we also present
recent, very promising counting results obtained with deep neural networks.