


Identifier 000393660
Title Expressive speech analysis and classification using adaptive sinusoidal modeling
Alternative Title Ανάλυση και ταξινόμηση εκφραστικής ομιλίας χρησιμοποιώντας προσαρμόσιμα ημιτονοειδή μοντέλα
Author Γιακουμάκη, Θεοδώρα Ι.
Thesis advisor Στυλιανού, Ιωάννης
Reviewer Μουχτάρης, Αθανάσιος
Kotti, Margarita
Abstract Emotional (or stressed/expressive) speech can be defined as the speech style produced by an emotionally charged speaker. Speakers who feel sad, angry, or happy place a certain stress on their speech that is typically characterized as emotional. Processing of emotional speech is considered among the most challenging tasks in speech modeling, recognition, and classification. The emotional condition of speakers may be revealed by the analysis of their speech, and such knowledge could be valuable in emergency situations and health-care applications, and as a pre-processing step in recognition and classification systems, among others. Acoustic analysis of speech produced under different emotional conditions reveals a great number of speech characteristics that vary according to the emotional state of the speaker; these characteristics could therefore be used to identify and/or classify different emotional speech styles. There is little research on the parameters of the Sinusoidal Model (SM), namely amplitude, frequency, and phase, as features for separating different speaking styles. However, the estimation of these parameters is subject to an important constraint: they are derived under the assumption of local stationarity, that is, the speech signal is assumed to be stationary inside the analysis window. Speaking styles described as fast or angry may not satisfy this assumption. Recently, this problem has been addressed by adaptive Sinusoidal Models (aSMs), which project the signal onto a set of amplitude- and frequency-varying basis functions inside the analysis window, so the sinusoidal parameters are estimated more accurately. In this thesis, we propose the use of an adaptive Sinusoidal Model (aSM), the extended adaptive Quasi-Harmonic Model (eaQHM), for emotional speech analysis and classification. The eaQHM adapts the amplitude and the phase of the basis functions to the local characteristics of the signal.
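As a concrete illustration of the standard model described above, here is a minimal sketch of SM frame synthesis under the local-stationarity assumption; the function and parameter names are illustrative and not taken from the thesis:

```python
import numpy as np

def sm_frame(amps, freqs, phases, n_samples, fs):
    """Synthesize one analysis frame under the standard Sinusoidal Model.

    Amplitudes and frequencies are held constant over the frame, which is
    exactly the local-stationarity assumption that adaptive models relax.
    """
    t = np.arange(n_samples) / fs          # sample times within the frame
    frame = np.zeros(n_samples)
    for a, f, p in zip(amps, freqs, phases):
        frame += a * np.cos(2.0 * np.pi * f * t + p)
    return frame
```

An aSM such as eaQHM would instead let the amplitude and phase of each basis function vary inside the frame, tracking the local signal characteristics.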
Firstly, the eaQHM is employed to analyze emotional speech into accurate, robust, continuous, time-varying parameters (amplitude and frequency). It is shown that these parameters can adequately and accurately represent emotional speech content. Using a well-known database of pre-labeled narrowband expressive speech (SUSAS) and the Berlin emotional speech database, we show that very high Signal-to-Reconstruction-Error Ratio (SRER) values can be obtained compared to the standard Sinusoidal Model (SM). Specifically, eaQHM outperforms SM on average by 100% in SRER. Additionally, formal listening tests on a wideband custom emotional database of running speech show that eaQHM outperforms SM in terms of perceptual resynthesis quality. Since the parameters obtained from eaQHM represent an emotional speech signal more accurately, we propose their use in an application based on emotional speech: the classification of emotional speech. Using the SUSAS and Berlin databases, we develop two separate Vector Quantizers (VQs) for the classification, one for amplitude and one for frequency features. Finally, we suggest a combined amplitude-frequency classification scheme. Experiments show that both the single and the combined classification schemes achieve higher performance when the features are obtained from eaQHM.
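The SRER figure of merit quoted above can be sketched as follows. This is a minimal illustration of a common definition (the ratio of signal RMS to reconstruction-error RMS, in dB), which may differ in detail from the exact formulation used in the thesis:

```python
import numpy as np

def srer(original, reconstruction):
    """Signal-to-Reconstruction-Error Ratio in dB.

    Higher values mean the model reconstructs the signal more faithfully;
    a sketch of one common definition, not necessarily the thesis's exact one.
    """
    error = original - reconstruction
    return 20.0 * np.log10(np.std(original) / np.std(error))
```

For example, a reconstruction whose residual is one tenth of the signal's RMS yields an SRER of 20 dB.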
Language English, Greek
Subject Emotion classification
Emotional speech
Speech analysis (Ανάλυση ομιλίας)
Emotion classification (Ταξινόμηση συναισθήματος)
Issue date 2015-07-17
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
