
Doctoral theses

Identifier 000452048
Title Neural networks for the quality and intelligibility enhancement of speech
Author PV, Muhammed Shifas I.
Thesis advisor Στυλιανού, Ιωάννης
Reviewer King, Simon
Cooke, Martin
Τσακαλίδης, Παναγιώτης
Κατσαμάνης, Αθανάσιος
Κομοντάκης, Νικόλαος
Πανταζής, Γιάννης
Abstract Speech is the most effective way to communicate ideas generated in human minds. However, spoken communication in real life is often affected by surrounding noise, which can substantially reduce the intelligibility and perceived quality of the signal. Techniques to enhance communication have been proposed in the past and successfully tested in modern engines such as Amazon Alexa, allowing it to operate in adverse conditions. Ambient noise can disrupt both signal acquisition by a device and speech perception by the listener. Speech enhancement (SE) techniques are developed to restore speech from its disrupted observations, while listening enhancement (LE) techniques are designed to improve perceived intelligibility by altering the speech before it is presented in noise, since naturally produced speech is not always very intelligible. In modern devices, SE and LE systems are often operated as two independent modules, which limits their performance. The aim of this thesis is to combine SE and LE techniques into an end-to-end system for communication applications. We approach the problem from a neural network perspective. To this end, multiple novel architectures for SE and LE were proposed, and the concepts from those models were used to build the final end-to-end system. For speech enhancement (SE), three new architectures were proposed: two in the feature domain and one in the waveform domain. The feature-domain architectures formulate the enhancement task in the short-time Fourier transform (STFT) representation of speech and are therefore parametrically less complex. Features from the two-dimensional (2D) representation of speech are extracted with the gruCNN neural cell, which was found to be effective in isolating noises with high variance.
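The internals of the gruCNN cell are not given in this record. As a rough, hypothetical illustration only, the sketch below assumes a convolutional GRU: the update, reset and candidate computations of a standard GRU, with the dense matrix products replaced by 1-D convolutions along the frequency axis, applied frame by frame to STFT magnitude frames. The names `ConvGRUCell`, `conv1d_same` and `run_over_frames`, the kernel size and the random (untrained) weights are all assumptions; this is not the thesis implementation.

```python
import math
import random


def conv1d_same(x, kernel):
    """1-D cross-correlation over frequency bins, zero-padded so len(output) == len(x)."""
    k = len(kernel)
    pad = k // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            idx = i + j - pad
            if 0 <= idx < len(x):
                acc += kernel[j] * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class ConvGRUCell:
    """Hypothetical convolutional GRU cell: GRU gates built from convolutions instead of dense products."""

    def __init__(self, kernel_size=3, seed=0):
        rng = random.Random(seed)

        def kern():
            return [rng.uniform(-0.5, 0.5) for _ in range(kernel_size)]

        # Input (W) and recurrent (U) kernels for the update (z), reset (r) and candidate (h) paths.
        self.Wz, self.Uz = kern(), kern()
        self.Wr, self.Ur = kern(), kern()
        self.Wh, self.Uh = kern(), kern()

    def step(self, x, h):
        """One time step: x is the current STFT frame, h the previous hidden state (same length)."""
        z = [sigmoid(a + b) for a, b in zip(conv1d_same(x, self.Wz), conv1d_same(h, self.Uz))]
        r = [sigmoid(a + b) for a, b in zip(conv1d_same(x, self.Wr), conv1d_same(h, self.Ur))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(conv1d_same(x, self.Wh), conv1d_same(rh, self.Uh))]
        # New state interpolates between the old state and the candidate, bin by bin.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def run_over_frames(cell, frames):
    """Scan a (time x frequency) list of frames left to right, returning one hidden state per frame."""
    h = [0.0] * len(frames[0])
    states = []
    for x in frames:
        h = cell.step(x, h)
        states.append(h)
    return states


if __name__ == "__main__":
    # Toy "spectrogram": 5 frames x 8 frequency bins.
    frames = [[abs(math.sin(0.3 * t + 0.1 * f)) for f in range(8)] for t in range(5)]
    states = run_over_frames(ConvGRUCell(), frames)
    print(len(states), len(states[0]))
```

Because each new state is a convex combination of the previous state (starting at zero) and a tanh candidate, every hidden value stays inside (-1, 1). A trained model would learn the kernels, typically with many channels per cell, rather than draw them at random.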
The gruCNN-SE model outperformed state-of-the-art speech enhancement systems based on standard convolutional (CNN) and long short-term memory (LSTM) cells. Subsequently, a bidirectional extension of the gruCNN module (BigruCNN) is proposed, which includes backward dependencies among the 2D frames. In addition, a novel waveform-domain network with a characteristic dilation pattern (SE-FFTNet) is presented. SE-FFTNet was found to be efficient at learning the statistical dissimilarity between speech and noise in a noisy observation. For listening enhancement (LE), a novel WaveNet-like architecture to improve listeners' intelligibility in noise (wSSDRC) is proposed. The wSSDRC system performs both spectral shaping (SS) and dynamic range compression (DRC) of the input for intelligibility enhancement. Over unprocessed speech, the model was found to produce a median absolute intelligibility boost of 39% for normal-hearing and 38% for hearing-impaired listeners in stationary noise. Subsequently, a novel end-to-end system that combines the objectives of SE and LE is proposed to enhance the intelligibility of noisy observations. Compared with unprocessed speech, the end-to-end system increased listeners' keyword correct rate in stationary noise from 2.5% to 60% at 0 dB input SNR and from about 10% to 75% at 5 dB input SNR, while substantially outperforming the modular setup of SE followed by LE.
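The wSSDRC model learns spectral shaping and dynamic range compression with a neural network, and its details are not given here. To make the DRC idea concrete, the sketch below shows a generic static, frame-wise compressor: frames louder than a threshold are attenuated so that every dB above the threshold becomes 1/ratio dB, while quieter frames pass unchanged. This is not the SSDRC algorithm or the wSSDRC model; the function names, frame length, threshold and ratio are illustrative assumptions.

```python
import math


def frame_rms_db(x, frame_len):
    """RMS level in dB of consecutive non-overlapping frames of the signal x."""
    levels = []
    for start in range(0, len(x), frame_len):
        frame = x[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        levels.append(20 * math.log10(max(rms, 1e-12)))  # floor avoids log10(0)
    return levels


def compress(x, frame_len=160, threshold_db=-30.0, ratio=4.0):
    """Generic static DRC (illustrative, not SSDRC): reduce level above threshold by factor ratio."""
    y = []
    for i, level in enumerate(frame_rms_db(x, frame_len)):
        over = level - threshold_db
        # Attenuate only the part of the level that exceeds the threshold.
        gain_db = -over * (1 - 1 / ratio) if over > 0 else 0.0
        g = 10 ** (gain_db / 20)
        y.extend(s * g for s in x[i * frame_len:(i + 1) * frame_len])
    return y


if __name__ == "__main__":
    # One loud frame followed by one quiet frame (constant-amplitude toy signal).
    x = [0.5] * 160 + [0.01] * 160
    y = compress(x)
    before = frame_rms_db(x, 160)
    after = frame_rms_db(y, 160)
    print("level spread before: %.2f dB, after: %.2f dB" % (before[0] - before[1], after[0] - after[1]))
```

Here the loud/quiet spread shrinks from roughly 34 dB to roughly 16 dB. Practical compressors usually smooth the gain over time to avoid discontinuities at frame boundaries and often rescale the output to keep the overall loudness fixed.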
Language English
Issue date 2022-12-02
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses
  Type of Work--Doctoral theses
Permanent Link https://elocus.lib.uoc.gr//dlib/6/6/d/metadata-dlib-1667821856-979724-22866.tkl