Post-graduate theses
|
Identifier |
000452055 |
Title |
A neural-based sinusoidal vocoder |
Alternative Title |
Ένας νευρωνικός ημιτονοειδής φωνοκωδικοποιητής (Greek; "A neural sinusoidal vocoder") |
Author
|
Ραπτάκης, Μιχαήλ Γ.
|
Thesis advisor
|
Στυλιανού, Ιωάννης
|
Reviewer
|
Κομοντάκης, Νικόλαος
Πανταζής, Ιωάννης
|
Abstract |
The new era of voice encoding is dominated by neural network models capable of
producing natural-sounding synthetic speech, superior in quality to all previous
parametric methods. However, their exceptionally high quality comes at the cost of
large model size and heavy computational demands. Additionally, despite taking into
account some statistical traits of speech signals, state-of-the-art architectures
rarely consider fundamental and well-studied characteristics or methodologies
established in the earlier speech processing literature. In this work, instead of
synthesizing speech signals directly by relying solely on the “raw power” of neural
networks, the aim is to take advantage of speech’s quasi-periodic and sinusoidal
properties to show how a modern neural vocoder can generate speech based on a
sinusoidal representation. Using MelGAN as our starting vocoder model because of its
renowned speed and quality, we extend it by adding layers that, instead of directly
outputting the speech waveform, estimate the amplitudes and phases of a newly
proposed sinusoidal representation. Our results show that the produced quality is on
par with the original MelGAN model in terms of mean opinion score (MOS), indicating
that this novel and less expensive approach is indeed feasible. We further experiment
with these models and discuss the difficulty of finding a multi-resolution spectral
loss able to produce quality up to the standards of adversarially trained models.
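
As a rough illustration of what generating speech from estimated amplitudes and phases can mean, the sketch below synthesizes one short frame with the classic harmonic sinusoidal model. It is not the representation proposed in the thesis, whose details are not given in this record. The function name `synthesize_frame`, the assumption that the sinusoids are harmonics of a single fundamental frequency, and all numeric values are hypothetical. In a neural vocoder, the amplitudes and phases would be predicted by the network rather than supplied by hand.

```python
import math


def synthesize_frame(amplitudes, phases, f0_hz, sample_rate, num_samples):
    """Return one frame built as a sum of harmonically related sinusoids.

    amplitudes[k] and phases[k] belong to harmonic k + 1 of f0_hz.
    Generic harmonic model, assumed here for illustration only.
    """
    if len(amplitudes) != len(phases):
        raise ValueError("need exactly one phase per amplitude")
    frame = []
    for n in range(num_samples):
        t = n / sample_rate  # time of sample n in seconds
        sample = 0.0
        for k, (a, phi) in enumerate(zip(amplitudes, phases), start=1):
            # Harmonic k oscillates at k times the fundamental frequency.
            sample += a * math.cos(2.0 * math.pi * k * f0_hz * t + phi)
        frame.append(sample)
    return frame


# Hypothetical values: three harmonics of a 100 Hz voiced frame at 16 kHz.
frame = synthesize_frame(
    amplitudes=[1.0, 0.5, 0.25],
    phases=[0.0, 0.0, 0.0],
    f0_hz=100.0,
    sample_rate=16000,
    num_samples=160,
)
```

In this simplified setting, a model only has to supply a handful of amplitude and phase values per frame instead of every waveform sample, which is the general intuition behind sinusoidal parameterizations.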
|
Language |
English |
Subject |
Deep learning |
|
Neural networks |
|
Βαθιά μάθηση |
|
Νευρωνικά δίκτυα |
Issue date |
2022-12-02 |
Collection
|
School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
|
|
Type of Work--Post-graduate theses
|
Permanent Link |
https://elocus.lib.uoc.gr//dlib/b/0/0/metadata-dlib-1667906572-550601-12721.tkl
|
Digital Documents
|
|
The document will not be available until 2025-12-02.
|