Identifier | 000344043
Title | Spectral based short-time features for voice quality assessment
Alternative Title | Instantaneous Spectral Features for Voice Quality Estimation (Greek original: Στιγμιαία Φασματικά Χαρακτηριστικά για την Εκτίμηση Ποιότητας Φωνής)
Author | Βασιλάκης, Μιλτιάδης Δ (Vasilakis, Miltiadis D.)
Thesis advisor | Στυλιανού, Ιωάννης (Stylianou, Ioannis)
Abstract |
A mathematical model that couples two periodic events to produce local aperiodicity allows jitter to be modeled as a shift of one of the two periodic events with respect to the other. When transformed to the frequency domain, this model displays notable spectral trends between the harmonic and subharmonic subspectra. The two spectral parts are shown to form a beat spectrum, in which the number of intersections between them depends directly on the jitter-related shift. This behavior was exploited to develop a short-time Spectral Jitter Estimator (SJE). Experiments with synthetic signals of jittered phonation showed that SJE provides accurate local estimates of jitter. Further evaluation was conducted on two databases of real sustained vowel recordings from healthy and pathological voices. Comparison with the corresponding estimates from the Multi-Dimensional Voice Program (MDVP) and the Praat system, judged using Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) index, showed that SJE outperforms both in discriminating normal from pathological voices by at least 4%. Examination of the short-time statistics of SJE showed a higher correlation with the presence of voice pathology, which is attributed to SJE taking the full spectrum into account. SJE was also shown to be robust against errors in pitch period estimation; combined with its ability to estimate jitter over short time intervals, this makes SJE a strong candidate for measuring jitter in continuous speech. Through cross-database validation, a pathology threshold for SJE was determined. By applying this threshold to a database of read-text recordings from normophonic and dysphonic speakers, a second threshold and new features were established specifically for monitoring jitter in continuous speech.
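The beat-spectrum behavior described above can be illustrated with an idealized version of the two-event model. Assume two interleaved unit impulse trains with common period 2T, the second shifted by T + ε, so that consecutive periods alternate between T + ε and T - ε. Under this assumption the harmonic lines (multiples of 1/T) have magnitude 2|cos(π f ε)| and the subharmonic lines have magnitude 2|sin(π f ε)|, so the two envelopes cross at f = (2n + 1)/(4ε), and the number of crossings below a bandwidth B is floor(2Bε + 1/2), which grows with the shift. The function names (`two_train_spectrum`, `envelope_crossings`, `estimate_shift`) and the first-crossing inversion ε ≈ 1/(4 f_1) are hypothetical choices for this sketch, not the SJE algorithm from the thesis.

```python
import numpy as np


def two_train_spectrum(T, eps, bandwidth):
    """Line spectrum of two interleaved unit impulse trains (idealized model).

    Train A fires at t = 2kT and train B at t = 2kT + T + eps, so consecutive
    periods alternate between T + eps and T - eps. Both trains repeat every
    2T, so all energy lies on the lines f_m = m / (2T), m = 1, 2, ...
    """
    m = np.arange(1, int(np.floor(2 * bandwidth * T + 1e-9)) + 1)
    f = m / (2.0 * T)
    # Add the two trains' phasors at each line frequency.
    mag = np.abs(1.0 + np.exp(-2j * np.pi * f * (T + eps)))
    return f, mag


def envelope_crossings(f, mag):
    """Frequencies where the harmonic and subharmonic envelopes intersect."""
    harm_f, harm = f[1::2], mag[1::2]  # even m: harmonics of f0 = 1/T
    sub_f, sub = f[0::2], mag[0::2]    # odd m: subharmonics between them
    # Resample the subharmonic envelope onto the harmonic frequencies.
    d = harm - np.interp(harm_f, sub_f, sub)
    idx = np.nonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    # Locate each sign change by linear interpolation between adjacent lines.
    return harm_f[idx] + (harm_f[idx + 1] - harm_f[idx]) * d[idx] / (d[idx] - d[idx + 1])


def estimate_shift(f, mag):
    """Invert the first crossing f_1 = 1 / (4 eps); None if no crossing in band."""
    crossings = envelope_crossings(f, mag)
    return None if crossings.size == 0 else 1.0 / (4.0 * crossings[0])


if __name__ == "__main__":
    T, eps = 0.005, 6e-5  # f0 = 200 Hz, shift of 0.06 ms
    f, mag = two_train_spectrum(T, eps, bandwidth=11000.0)
    eps_hat = estimate_shift(f, mag)
    print(f"shift: true {eps:.2e} s, estimated {eps_hat:.2e} s")
    # Alternating periods T +/- eps give a local jitter of 2 * eps / T.
    print(f"local jitter estimate: {2 * eps_hat / T:.4f}")
```

With T = 5 ms and ε = 0.06 ms the first crossing falls near 4167 Hz, so an 11 kHz band holds exactly one intersection. Smaller shifts push the first crossing to higher frequencies, and this toy estimator returns None when no crossing lies inside the band.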
In terms of AUC, the proposed features for read text achieve a discrimination score of about 95%, while the second threshold yields a Classification Rate (CR) of 87.8%. Furthermore, short-time jitter values estimated from read text were consistent with studies showing that jitter decreases as fundamental frequency increases, and that high jitter values occur more frequently over time in pathological voices.

The same two-event model also allows shimmer to be modeled, by applying different amplitude deviations to the two events. Again, transforming the model from the time domain to the frequency domain reveals notable spectral properties. Using these properties, four features indicative of shimmer were created to evaluate the model. Experiments with synthetic shimmered phonation signals, as well as with the two aforementioned databases of sustained vowel recordings, showed that the model correctly captures the shimmer phenomenon and that its further development is worth pursuing.
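The shimmer variant of the two-event model can be illustrated in the same idealized way. Assume two interleaved impulse trains at t = 2kT and t = 2kT + T with positive amplitudes a1 and a2 and no timing shift. The harmonic lines then have magnitude a1 + a2 and the subharmonic lines |a1 - a2|, so twice their ratio equals the local shimmer |a1 - a2| / ((a1 + a2)/2). The sketch below computes that ratio from the line spectrum and compares it with a time-domain local shimmer. The function names are hypothetical, and this single ratio is only an example of a spectral shimmer cue, not one of the four features defined in the thesis.

```python
import numpy as np


def alternating_amplitude_spectrum(T, a1, a2, bandwidth):
    """Line spectrum of two interleaved impulse trains with amplitudes a1, a2.

    Train A (amplitude a1) fires at t = 2kT and train B (amplitude a2) at
    t = 2kT + T, so consecutive pulses alternate in amplitude with no jitter.
    """
    m = np.arange(1, int(np.floor(2 * bandwidth * T + 1e-9)) + 1)
    f = m / (2.0 * T)
    # exp(-2j*pi*f*T) = (-1)^m, so even m add the amplitudes and odd m subtract them.
    mag = np.abs(a1 + a2 * np.exp(-2j * np.pi * f * T))
    return f, mag


def shimmer_from_spectrum(mag):
    """Relative shimmer from the subharmonic-to-harmonic magnitude ratio.

    For positive a1, a2 the harmonic lines (even m) equal a1 + a2 and the
    subharmonic lines (odd m) equal |a1 - a2|, so 2 * sub / harm equals
    |a1 - a2| divided by the mean amplitude (a1 + a2) / 2.
    """
    harm, sub = mag[1::2], mag[0::2]
    return 2.0 * sub.mean() / harm.mean()


def local_shimmer(amplitudes):
    """Time-domain reference: mean |A_i - A_{i+1}| divided by the mean amplitude."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)


if __name__ == "__main__":
    T, a1, a2 = 0.005, 1.0, 0.9
    _, mag = alternating_amplitude_spectrum(T, a1, a2, bandwidth=4000.0)
    print(f"spectral estimate: {shimmer_from_spectrum(mag):.4f}")
    print(f"time-domain value: {local_shimmer([a1, a2] * 50):.4f}")
```

For a1 = 1.0 and a2 = 0.9 both functions give 0.1/0.95 ≈ 0.105, which shows that, in this idealized case, the subharmonic energy alone carries the amplitude perturbation.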
Physical description | xiv, 54 p. : ill. ; 30 cm.
Language | English
Issue date | 2009-07-24
Collection | School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
Type of Work | Post-graduate theses