E-Locus - Institutional Repository of the University of Crete - Text-independent speaker identification using sparsely excited speech signals and compressed sensing

Identifier

000361529

Title

Text-independent speaker identification using sparsely excited speech signals and compressed sensing

Alternative Title

Text-independent speaker identification with the aid of sparse signals and the theory of compressed sensing

Author

Καραμιχάλη, Ελένη Ηλία

Thesis advisor

Μουχτάρης, Αθανάσιος

Abstract

Compressed Sensing (CS) is an emerging theory which holds that sampling according to the Nyquist sampling theorem yields far more samples than necessary. According to the Nyquist sampling theorem, the sampling rate of a signal must be at least twice its maximum frequency. In contrast, CS seeks to represent a signal using a small number of linear, non-adaptive measurements, far fewer than the signal’s bandwidth would dictate. Thus, CS accomplishes both compression and sampling in one low-complexity step. The only requirement for CS to be efficient is that the signal is sparse in some basis, meaning that it has only a few nonzero coefficients in that basis. Compressed sensing has been used for full signal reconstruction, but in our case it was used for feature recovery in order to perform text-independent speaker identification. Speaker identification is the task of recognizing a speaker, under the condition that the speaker belongs to a database that has been modeled beforehand using features extracted from each speaker’s training set. Specifically, we trained a Gaussian Mixture Model for each speaker in the database, using Line Spectral Frequencies as features. Text-independent speaker identification means that the test speech signals were not included in the training phase. We chose to use CS theory for speaker identification for two reasons. The first is that CS theory requires just a few samples to reconstruct a signal, which is very useful in environments such as sensor networks, where there are limits on the data traffic that can be sent between the sensor nodes. Thus, although traffic is limited, we are still able to avoid information loss. The second reason is that CS algorithms are robust to noise. These algorithms force the signals to be sparse in some basis, which results in noisy samples with low energy being neglected.
After experimenting with several CS algorithms for signal reconstruction, we decided to use Orthogonal Matching Pursuit for our research because of its low complexity and because it gave the lowest feature distortion after reconstruction. The results may not be as good as those obtained using features extracted from the original speech signals, but they are quite good given the number of samples that were used, and are very promising for future investigation and research.
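
The reconstruction step named in the abstract can be illustrated with a minimal sketch of Orthogonal Matching Pursuit in Python with NumPy. This is not the thesis implementation: the Gaussian measurement matrix, the signal that is sparse directly in the canonical basis, the noiseless measurements, and the names `omp` and `demo` with all their parameters are assumptions made only for illustration. No speech data, Line Spectral Frequencies, or Gaussian Mixture Models are involved.

```python
import numpy as np

def omp(A, y, k):
    """Greedy recovery of a k-sparse x such that y is approximately A @ x."""
    n = A.shape[1]
    residual = y.copy()
    support = []            # indices of the columns selected so far
    coeffs = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0  # do not reselect a column
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of y on the selected columns only.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    x = np.zeros(n)
    x[support] = coeffs
    return x

def demo(n=256, m=128, k=4, seed=0):
    """Hypothetical test case: m random measurements of a k-sparse vector."""
    rng = np.random.default_rng(seed)
    x_true = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    signs = rng.choice([-1.0, 1.0], size=k)
    x_true[idx] = signs * rng.uniform(1.0, 2.0, size=k)
    # Assumed measurement matrix: i.i.d. Gaussian entries, scaled by 1/sqrt(m).
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = A @ x_true          # m linear, non-adaptive measurements, m < n
    return x_true, omp(A, y, k)

x_true, x_hat = demo()
print("max abs error:", np.max(np.abs(x_true - x_hat)))
```

Each iteration costs one matrix-vector product and one small least-squares solve, which is in line with the low complexity the abstract cites as a reason for choosing this algorithm. In this sketch the sparsity level `k` is assumed to be known in advance.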

Language

English

Subject

Compressed sensing

Sparse signals

Speaker identification

Αναγνώριση ομιλητή

Αραιά σήματα

Συμπιεστική δειγματοληψία

Issue date

2010-11-19

Collection