Abstract
Speech Enhancement (SE) is a field of Speech Processing that aims to improve the quality
of noisy speech signals in order to increase their intelligibility and, as a consequence,
to reduce the effort a listener has to make to understand them.
Several algorithms for speech enhancement were proposed throughout the 20th century.
Most of them exploited the spectral characteristics of the noisy signal.
With the use of Neural Networks (NNs) growing rapidly in recent years, several
neural-network-based systems have been proposed to enhance speech signals and remove noise.
Generative Adversarial Networks (GANs) are a relatively recent class of machine learning
frameworks based on neural networks. They use two separate networks, the Generator and the
Discriminator, which compete with each other. The two networks play a minimax zero-sum game:
the Generator tries to produce samples that the Discriminator cannot distinguish from samples
drawn from the real data distribution, while the Discriminator tries to tell the two apart.
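For reference, the minimax game described above is usually written, in the original GAN formulation, as the objective below, where $G$ is the Generator, $D$ the Discriminator, $p_{\text{data}}$ the real data distribution and $p_z$ the noise prior fed to the Generator. This is the generic objective only; speech enhancement GANs such as SEGAN condition both networks on the noisy signal and may use a different loss (for example a least-squares variant), so it should be read as an illustration rather than the exact objective used in this thesis.

```latex
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\left[\log\left(1 - D(G(z))\right)\right]
```

The Discriminator maximises $V$ by assigning high probability to real samples and low probability to generated ones; the Generator minimises the same quantity, which is what makes the game zero-sum.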
In this thesis, the main neural-network-based systems for speech enhancement are studied,
together with how a neural network concept, dilation, can be used to boost speech
enhancement performance. Specifically, the architectures of three Speech Enhancement
systems (SE-WaveNet, SEGAN, SE-FFTNet) are compared, and the systems are evaluated using
both objective (PESQ, STOI, CSIG, CBAK, COVL, SSNR) and subjective (Mean Opinion Score)
metrics.
Additionally, experiments are presented on applying dilation in a Generative Adversarial
Network setting, in an effort to reduce the number of parameters required by a Speech
Enhancement Generative Adversarial Network.
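Why dilation can reduce parameters is easiest to see with a small calculation. The sketch below is not the thesis's implementation: it compares a stack of stride-1 1-D convolutions with and without dilation, using dilation rates that double per layer (the pattern popularised by WaveNet). The kernel size, channel count and depth are hypothetical values chosen only for illustration.

```python
import math
from typing import Sequence

# Hypothetical settings, chosen only for illustration.
KERNEL_SIZE = 3
CHANNELS = 64
NUM_LAYERS = 10


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field (in samples) of stacked stride-1 1-D convolutions."""
    rf = 1
    for d in dilations:
        # Each layer widens the field by (kernel_size - 1) * dilation samples.
        rf += (kernel_size - 1) * d
    return rf


def conv_params(kernel_size: int, channels: int, num_layers: int) -> int:
    """Weights + biases of num_layers convs with in_channels == out_channels.

    Dilation does not appear here: it spaces out the kernel taps
    without adding any weights.
    """
    return num_layers * (kernel_size * channels * channels + channels)


def layers_needed(target_rf: int, kernel_size: int) -> int:
    """Undilated layers required to reach a given receptive field."""
    return math.ceil((target_rf - 1) / (kernel_size - 1))


plain = [1] * NUM_LAYERS
dilated = [2 ** i for i in range(NUM_LAYERS)]  # 1, 2, 4, ..., 512

rf_plain = receptive_field(KERNEL_SIZE, plain)
rf_dilated = receptive_field(KERNEL_SIZE, dilated)
params = conv_params(KERNEL_SIZE, CHANNELS, NUM_LAYERS)

print(rf_plain, params)    # 21 123520
print(rf_dilated, params)  # 2047 123520

# Undilated depth (and parameters) needed to match the dilated receptive field.
n = layers_needed(rf_dilated, KERNEL_SIZE)
print(n, conv_params(KERNEL_SIZE, CHANNELS, n))  # 1023 12636096
```

Under these assumptions both ten-layer stacks have exactly the same number of parameters, but the dilated one covers about a hundred times more input samples. Matching that context without dilation would need about a hundred times as many layers, and parameters, which is the kind of saving that motivates using dilation in a speech enhancement Generator.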