E-Locus - Institutional Repository of the University of Crete

Home Collections School/Department School of Sciences and Engineering Department of Computer Science Doctoral theses

Doctoral theses

Current Record: 61 of 121

[Add to Basket]

Identifier

000385819

Title

Adaptive sinusoidal models for speech with applications in speech modifications and audio analysis

Alternative Title

Προσαρμόσιμα ημιτονοειδή μοντέλα για σήματα φωνής με εφαρμογές σε μετασχηματισμούς φωνής και ανάλυση σημάτων ήχου

Author

Καφεντζής, Γεώργιος Παναγιώτη

Thesis advisor

Στυλιανού, Ιωάννης

Abstract

Sinusoidal Modeling is one of the most widely used parametric methods for speech and audio signal processing. The accurate estimation of sinusoidal parameters (amplitudes, frequencies, and phases) is a critical task for close representation of the analyzed signal. In this thesis, based on recent advances in sinusoidal analysis, we propose high resolution adaptive sinusoidal models for analysis, synthesis, and modifications systems of speech. Our goal is to provide systems that represent speech in a highly accurate and compact way. Inspired by the recently introduced adaptive Quasi-Harmonic Model (aQHM) and adaptive Harmonic Model (aHM), we overview the theory of adaptive Sinusoidal Modeling and we propose a model named the extended adaptive Quasi-Harmonic Model (eaQHM), which is a non-parametric model able to adjust the instantaneous amplitudes and phases of its basis functions to the underlying time-varying characteristics of the speech signal, thus significantly alleviating the so-called local stationarity hypothesis. The eaQHM is shown to outperform aQHM in analysis and resynthesis of voiced speech. Based on the eaQHM, a hybrid analysis/synthesis system of speech is presented (eaQHNM), along with a hybrid version of the aHM (aHNM). Moreover, we present motivation for a full-band representation of speech using the eaQHM, that is, representing all parts of speech as high resolution AM-FM sinusoids. Experiments show that adaptation and quasi-harmonicity is sufficient to provide transparent quality in unvoiced speech resynthesis. The full-band eaQHM analysis and synthesis system is presented next, which outperforms state-of-the-art systems, hybrid or full-band, in speech reconstruction, providing transparent quality confirmed by objective and subjective evaluations. Regarding applications, the eaQHM and the aHM are applied on speech modifications (time and pitch scaling). The resulting modifications are of high quality, and follow very simple rules, compared to other state-of-the-art modification systems. The concepts of relative phase and relative phase delays are crucial for the development of artefact-free, shape-invariant, high quality modifications. Results show that harmonicity is preferred over quasi-harmonicity in speech modifications due to the embedded simplicity of representation. Moreover, the full-band eaQHM is applied on the problem of modeling audio signals, and specifically of musical instrument sounds. The eaQHM is evaluated and compared to state-of-the-art systems, and is shown to outperform them in terms of resynthesis quality, successfully representing the attack, transient, and stationary part of a musical instrument sound. Finally, another application is suggested, namely the analysis and classification of emotional speech. The eaQHM is applied on the analysis of emotional speech, providing its instantaneous parameters as features that can be used in recognition and Vector-Quantization-based classification of the emotional content of speech. Although the sinusoidal models are not often used in such tasks, results are promising.

Language

English

Subject

Adaptive guase-harmonic model

Προσαρμόσιμο σχεδόν αρμονικό μοντέλο

Issue date

2014-07-08

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses

Type of Work--Doctoral theses

Permanent Link

https://elocus.lib.uoc.gr//dlib/b/d/e/metadata-dlib-1405335037-106497-24638.tkl