E-Locus - Institutional Repository of the University of Crete - Spectral based short-time features for voice quality assessment

Identifier

000344043

Title

Spectral based short-time features for voice quality assessment

Alternative Title

Στιγμιαία Φασματικά Χαρακτηριστικά για την Εκτίμηση Ποιότητας Φωνής [Instantaneous Spectral Features for Voice Quality Assessment]

Author

Βασιλάκης, Μιλτιάδης Δ

Thesis advisor

Στυλιανού, Ιωάννης

Abstract

In the context of voice quality assessment, phoniatricians are aided by the measurement of several phenomena that may reveal the existence of pathology in voice. Among the most prominent of these phenomena are jitter and shimmer. Jitter is defined as perturbations of the glottal cycle, and shimmer is defined as perturbations of the glottal excitation amplitude. Both phenomena occur during voice production, especially in the case of vowel phonation. Acoustic analysis methods are usually employed to estimate jitter using the radiated speech signal as input. Most of these methods measure jitter in the time domain and are based on pitch period estimation; consequently, they are sensitive to errors in this estimation. Furthermore, the lack of robustness exhibited by pitch period estimators makes the use of continuous speech recordings as input problematic and essentially limits jitter measurement to sustained vowel signals. Similarly for shimmer, time domain acoustic analysis methods are usually used to estimate the phenomenon in speech signals, based on estimation of the peak amplitude per period. Moreover, these methods, for both phenomena, are affected by averaging and by explicit or implicit use of low-pass information. Using mathematical descriptions of jitter and shimmer to transfer the estimation from the time domain to the frequency domain may alleviate these problems.
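As an illustration of the time-domain approach described above, the following minimal sketch computes one common "local" perturbation measure: the mean absolute difference between consecutive cycles, divided by the mean value. Applied to pitch periods it gives a jitter figure; applied to per-period peak amplitudes it gives a shimmer figure. The function name and the sample values are hypothetical, and this is not necessarily the exact formula used by MDVP, Praat, or the thesis.

```python
def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical per-cycle measurements (illustrative only, not from the thesis).
periods_ms = [10.0, 10.2, 9.8, 10.2, 9.8]      # estimated pitch periods
peak_amplitudes = [1.0, 0.9, 1.0, 0.9, 1.0]    # peak amplitude per period

jitter = local_perturbation(periods_ms)
shimmer = local_perturbation(peak_amplitudes)
print(f"local jitter:  {100 * jitter:.2f}%")
print(f"local shimmer: {100 * shimmer:.2f}%")
```

Both figures depend entirely on the per-cycle estimates fed in, which is why errors in pitch period estimation propagate directly into the result.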
A mathematical model that couples two periodic events to achieve local aperiodicity allows jitter to be modeled as the shift of one of the two periodic events with respect to the other. This model, when transformed to the frequency domain, displays interesting spectral trends between the harmonic and subharmonic subspectra. The two spectral parts are shown to form a beat spectrum, with the number of intersections between them directly dependent on the shift related to jitter. This behavior was exploited to develop a short-time Spectral Jitter Estimator (SJE). Experiments with synthetic signals of jittered phonation showed that SJE provides accurate local estimates of jitter. Further evaluation was conducted on two databases of actual sustained vowel recordings from healthy and pathological voices. Comparison with corresponding estimations from the Multi-Dimensional Voice Program (MDVP) and the Praat system revealed that SJE outperforms both in normal versus pathological voice discrimination accuracy by at least 4%, as judged using Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) index. Examination of the short-time statistics of SJE showed a higher correlation with the existence of pathology in voice, due to the fact that SJE takes into account the full spectrum.
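The coupling of two periodic events can be illustrated with a toy signal. The sketch below assumes the simplest possible instance: two unit pulse trains, each with period 2P samples, with the second train shifted by eps samples away from the midpoint. This is an assumption made for illustration, not the thesis's exact formulation, and the function names are hypothetical. It computes the DFT of one 2P-sample cycle and separates the even bins (harmonics) from the odd bins (subharmonics).

```python
import cmath
import math


def cycle_spectrum(P, eps):
    """Magnitude DFT of one 2P-sample cycle holding two unit pulses.

    The first pulse sits at n = 0 and the second at n = P - eps, so eps is
    the shift (in samples) of the second periodic event.
    """
    N = 2 * P
    positions = (0, P - eps)
    mags = []
    for k in range(N // 2 + 1):
        value = sum(cmath.exp(-2j * math.pi * k * n / N) for n in positions)
        mags.append(abs(value))
    return mags


def split_parts(mags):
    """Even bins are harmonics of 1/P; odd bins lie halfway between them."""
    return mags[0::2], mags[1::2]


# With no shift, the signal is strictly periodic with period P.
harm0, sub0 = split_parts(cycle_spectrum(P=50, eps=0))
# With a shift of two samples, subharmonic energy appears.
harm2, sub2 = split_parts(cycle_spectrum(P=50, eps=2))

print("max subharmonic magnitude, eps=0:", max(sub0))
print("max subharmonic magnitude, eps=2:", max(sub2))
```

In this toy model the even bins follow 2|cos(πk·eps/(2P))| and the odd bins follow 2|sin(πk·eps/(2P))|, so the harmonic and subharmonic parts alternate in dominance as frequency increases, and they do so more often for larger eps.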
SJE was also shown to be robust against errors in pitch period estimation, which, combined with its ability to estimate jitter over short time intervals, makes SJE a very good candidate for measuring jitter in continuous speech. Through cross-database validation, a threshold of pathology for SJE was determined. By applying this threshold to a database of reading text recordings from normophonic and dysphonic speakers, a second threshold and new features were established specifically for monitoring jitter in continuous speech. In terms of AUC, the suggested features for reading text provide a discrimination score of about 95%, while the second threshold provides a Classification Rate (CR) of 87.8%. Furthermore, short-time jitter values estimated from reading text were found to confirm studies showing that jitter decreases with increasing fundamental frequency and that high jitter values become more frequent over time in pathological voices.
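For readers unfamiliar with the two evaluation measures mentioned above, the following sketch shows how an AUC and a classification rate could be computed from per-speaker feature values. The AUC is computed as the probability that a randomly chosen pathological score exceeds a randomly chosen healthy score, which is equivalent to the area under the ROC curve. All scores and the threshold are made-up illustrative numbers, not results from the thesis, and the function names are hypothetical.

```python
def auc(negatives, positives):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def classification_rate(negatives, positives, threshold):
    """Share of speakers labeled correctly when score >= threshold means pathological."""
    correct = sum(1 for n in negatives if n < threshold)
    correct += sum(1 for p in positives if p >= threshold)
    return correct / (len(negatives) + len(positives))


# Made-up jitter-like scores (illustrative only).
healthy = [0.4, 0.6, 0.8, 1.1]
pathological = [0.9, 1.3, 1.7, 2.4]

area = auc(healthy, pathological)
rate = classification_rate(healthy, pathological, threshold=1.0)
print("AUC:", area)
print("classification rate:", rate)
```

The AUC summarizes discrimination over all possible thresholds, whereas the classification rate reflects the performance of one fixed threshold.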
A mathematical model that combines two periodic events also allows shimmer to be modeled, by applying different amplitude deviations to the two events. Again, transforming the model from the time domain to the frequency domain reveals notable spectral properties. Using these properties, four features indicative of shimmer were created to evaluate the model. Experiments with synthetic shimmered phonation signals, as well as the two aforementioned databases of sustained vowel recordings, showed that the model correctly captures the shimmer phenomenon and that further development should be pursued.
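A toy version of the shimmer case can be sketched in the same spirit. The example below assumes two pulses per 2P-sample cycle, exactly P samples apart, with amplitudes a1 and a2; these choices and the function name are assumptions made for illustration. The ratio computed at the end is only a demonstration of the spectral effect and is not claimed to be one of the four features developed in the thesis.

```python
import cmath
import math


def two_pulse_bins(P, a1, a2):
    """DFT magnitudes of one 2P-sample cycle with pulses a1 at n=0 and a2 at n=P.

    Returns (harmonic, subharmonic): the even and odd DFT bins respectively.
    """
    N = 2 * P
    mags = []
    for k in range(N // 2 + 1):
        value = a1 + a2 * cmath.exp(-2j * math.pi * k * P / N)
        mags.append(abs(value))
    return mags[0::2], mags[1::2]


# Hypothetical amplitudes: every second pulse is 20% weaker.
harmonic, subharmonic = two_pulse_bins(P=50, a1=1.0, a2=0.8)

# In this toy model the harmonic level is a1 + a2 and the subharmonic level
# is |a1 - a2|, so their ratio grows with the amplitude difference.
ratio = subharmonic[0] / harmonic[0]
print("subharmonic-to-harmonic ratio:", ratio)
```

With equal amplitudes the subharmonic bins vanish, so any subharmonic energy in this setting reflects the amplitude difference between the two events.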

Physical description

xiv, 54 p. : ill. ; 30 cm.

Language

English

Issue date

2009-07-24

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses
