Abstract
Speech, together with hearing, is one of the most important human abilities. It is the primary
way in which we integrate into society. Our voice reveals a great deal about us to other
people: our energy level, our emotions, our personality, and our artistry. Voice
abnormalities may cause social isolation or create problems in professional life. Given
this significance of the voice, the early detection of voice pathology is essential.
A well-known voice abnormality is Spasmodic Dysphonia (SD). SD is a neurological
disease that primarily affects the regular contraction of the muscles around the vocal cords,
causing them to vibrate undesirably. This abnormal vibration of the muscles of the glottis
affects speech: a person who suffers from SD speaks with a more tremulous voice and with
interruptions during speech. Similar symptoms also appear in normophonic speakers, usually
related to stress, voice fatigue, etc. Even in normophonic cases, these symptoms may be a first
sign of a neurological disease, so early diagnosis is necessary. Therefore, algorithms that
measure the intensity of the symptoms are very useful.
Traditional methods for detecting and quantifying voice pathologies use the amplitude information
of the speech signal. More refined approaches require the isolation of the glottal source
signal, since the glottis is related to voice abnormalities. In both cases, however, amplitude-based
methods are not very reliable, because the amplitude spectrum cannot capture the characteristics of
the glottis. Phase information is a better indicator of voice irregularities. Nevertheless, very
few studies use it, because phase is difficult to manipulate. Moreover, the studies that do
work with phase information use inverse filtering to extract the glottal source signal and
then extract features from the phase spectrogram of the glottal source. In this thesis, an
innovative phase-based method for voice quality assessment is presented.
The proposed method is less complex than state-of-the-art methods, which use inverse
filtering to extract the glottal source. First, the instantaneous amplitudes, phases,
and frequencies are estimated from the speech signal using an adaptive harmonic model. From the
instantaneous phases of the speech signal, a new phase spectrum, the Phase Distortion (PD)
spectrum, is then derived; it is highly correlated with the shape of the glottal
source. From the time variance of the PD spectrum (PDD), a new metric called the Regularity Ratio
(RR) is proposed to capture the irregularities of the glottal source.
Finally, the effectiveness of our method is validated on a database containing speakers with SD
before and after botulinum toxin injection. The results show that the obtained ranking is
highly correlated with the subjective evaluations provided by medical doctors, not only for the
overall severity of SD but also for other features such as tremor and jitter, suggesting that our
proposed feature, the RR, can be applied to other voice pathologies.