E-Locus - Institutional Repository of the University of Crete - On the use of neural networks and dilation for speech enhancement in a Generative Adversarial Network environment

Identifier

000438710

Title

On the use of neural networks and dilation for speech enhancement in a Generative Adversarial Network environment

Alternative Title

On the use of neural networks for speech signal enhancement and of dilation in a Generative Adversarial Network environment

Author

Μπακαγιάννης, Λεωνίδας

Thesis advisor

Στυλιανού, Ιωάννης

Reviewer

Τσακαλίδης, Παναγιώτης
Πανταζής, Γιάννης

Abstract

Speech Enhancement (SE) is a field of speech processing that aims to improve the quality of noisy speech signals in order to increase their intelligibility and, as a consequence, to reduce the effort required to listen to them. Several speech enhancement algorithms were proposed throughout the 20th century, most of which relied mainly on the spectral characteristics of the noisy signal. With the use of Neural Networks (NNs) growing in recent years, several neural-network-based systems have been developed to enhance a signal and remove noise. A relatively recent class of machine learning frameworks based on neural networks is Generative Adversarial Networks (GANs), which use two separate neural networks, the Generator and the Discriminator, that compete with each other in order to achieve the system's goals. These two networks play a minimax zero-sum game in which the Generator tries to produce samples that seem real to the Discriminator, with the ultimate goal of producing samples for which the Discriminator cannot tell whether they originate from the Generator or from the real distribution. In this thesis, a study of the main neural-network-based systems for speech enhancement is presented, alongside a study of how a neural network concept, dilation, can be used to boost speech enhancement performance. Specifically, the architectures of three speech enhancement systems (SE-WaveNet, SEGAN, SE-FFTNet) are compared, and the systems are evaluated comparatively using objective (PESQ, STOI, CSIG, CBAK, CVAL, SSNR) and subjective (Mean Opinion Score) metrics. Additionally, experiments are presented on applying dilation in a Generative Adversarial Network environment, in an effort to reduce the number of parameters required by a Speech Enhancement Generative Adversarial Network.
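
To make the parameter argument concrete, the sketch below compares a stack of dilated 1-D convolutions with a stack of ordinary ones that covers the same span of input samples. It is an illustration only and not the architecture studied in the thesis: the kernel size, channel count, and dilation schedule are hypothetical values, and the helper names `receptive_field` and `weight_count` are invented for this example.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    field = 1
    for d in dilations:
        field += (kernel_size - 1) * d
    return field


def weight_count(kernel_size, channels, num_layers):
    """Convolution weights only (biases ignored), with equal in/out channels."""
    return num_layers * kernel_size * channels * channels


# Hypothetical settings, chosen only for illustration.
KERNEL_SIZE = 3
CHANNELS = 16

# Dilation doubles at every layer: 1, 2, 4, ..., 128.
dilated_schedule = [2 ** i for i in range(8)]
rf_dilated = receptive_field(KERNEL_SIZE, dilated_schedule)

# Number of undilated layers needed to cover the same receptive field.
plain_layers = (rf_dilated - 1) // (KERNEL_SIZE - 1)
rf_plain = receptive_field(KERNEL_SIZE, [1] * plain_layers)

weights_dilated = weight_count(KERNEL_SIZE, CHANNELS, len(dilated_schedule))
weights_plain = weight_count(KERNEL_SIZE, CHANNELS, plain_layers)

print(f"dilated: {len(dilated_schedule)} layers, field {rf_dilated}, weights {weights_dilated}")
print(f"plain:   {plain_layers} layers, field {rf_plain}, weights {weights_plain}")
```

Under these assumed settings, eight dilated layers see as much context as 255 undilated ones, which is the general reason dilation is a candidate for shrinking a network without narrowing what it can observe.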

Language

English

Issue date

2021-03-26

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses

Views

587
