
Post-graduate theses

Identifier 000456885
Title Speech rhythm detection and its application in speech perception
Alternative Title Ανίχνευση του ρυθμού ομιλίας και η εφαρμογή του στην αντίληψη της ομιλίας
Author Λυδάκη, Ελευθερία Ε.
Thesis advisor Τσακαλίδης, Παναγιώτης
Reviewer Στυλιανού, Ιωάννης
Τσαγκατάκης, Γρηγόριος
Abstract Speech rhythm refers to the rhythmic patterns and timing variations that occur in spoken language. It encompasses the natural flow, stress patterns, and timing of speech sounds, syllables, and words. Rhythm constitutes an important dynamic prosodic feature of speech that is linked with speech perception. The detection of speech rhythm is a significant task with diverse applications. In this study, the focus is on using rhythmic measures to estimate voice preference. The motivation behind this research arises from the belief that voices demonstrating specific rhythm patterns are generally preferred by individuals. In this thesis, speech rhythm was studied as a possible predictor of listener preference. Even though rhythm can be perceived by humans, there is no universally accepted definition or measure of speech rhythm in the scientific community. In the literature, there is strong evidence that rhythm is encoded in the amplitude envelope of a signal. Typically, the envelope is decomposed into partials, and the corresponding instantaneous frequency, which is assumed to carry the information about the signal's rhythmicity, is then extracted. Two techniques were used to decompose the envelope into meaningful components. The first technique, which was proposed in a previous study, extracts rhythmic measures via an Empirical Mode Decomposition (EMD) of the envelope. Here, it is suggested to extract the same measures by using an AM-FM decomposition of the envelope instead of EMD. This modification has the potential to improve the accuracy of the resulting values, since EMD is not mathematically robust. The envelope, although informative to some extent, is a simplified representation of the speech signal. It lacks important elements such as pitch, which could potentially contribute to the understanding of rhythm. Relying solely on the envelope may overlook relevant rhythmic features present in the speech signal.
We hypothesize that the rhythmicity of speech is closely related to the manner in which individuals transition between syllables. Therefore, an approach that directly captures the rhythmicity of speech was introduced by considering the segment of the speech signal associated with syllable transitions. This effectively addresses the concern of information loss that occurs during envelope extraction. During this research, data consisting of speech signals from multiple speakers were used. Information about which speakers listeners preferred was also available. This knowledge allowed the investigation of the underlying factors contributing to voice preference and the analysis of the specific characteristics that make certain speakers more preferred than others. The experiments were extended beyond the natural speaking rate to a fast speaking style, and preference and rhythm in fast speech were explored as well. Statistical analyses were conducted to evaluate the suitability of rhythmic metrics derived from envelope-based and signal-based techniques for the task at hand. Findings revealed that the envelope-derived metrics are heavily influenced by speech rate and are not well suited for accurately capturing rhythm. In contrast, syllable-transition metrics derived directly from the speech signal showed promising results. A satisfactory separation between preferred and non-preferred speakers was achieved, effectively capturing certain characteristics that influence listeners' preference. One-way ANOVA and pairwise comparison tests were performed to validate the statistical significance of the differences between speakers. The results based on syllable transitions indicate promising avenues for future research. Considering the multi-component nature of preference, the exploration of additional metrics becomes crucial for improving overall performance, which will lead to a comprehensive and reliable evaluation of listener preference.
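As a rough illustration of the envelope-based idea summarized in the abstract (an amplitude envelope whose instantaneous frequency is taken to reflect rhythm), the following minimal sketch may help. It is not the thesis's EMD or AM-FM decomposition: it assumes a single-component toy signal, obtains the envelope from an FFT-based analytic signal, and skips the decomposition into partials entirely. The function names, the 4 Hz modulation rate, and the 200 Hz carrier are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (discrete Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist, if present), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def envelope_instantaneous_frequency(signal, fs):
    """Return the amplitude envelope and its instantaneous frequency in Hz.

    Assumption: the envelope is treated as ONE oscillatory component.
    A real pipeline would first split it into partials (e.g. EMD or AM-FM).
    """
    envelope = np.abs(analytic_signal(signal))
    centered = envelope - envelope.mean()  # remove the DC offset
    phase = np.unwrap(np.angle(analytic_signal(centered)))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)
    return envelope, inst_freq

# Toy stand-in for speech: a 200 Hz carrier modulated at a hypothetical
# syllable-like rate of 4 Hz.
fs = 1000
t = np.arange(0, 4.0, 1.0 / fs)
toy = (1.0 + 0.5 * np.cos(2 * np.pi * 4.0 * t)) * np.cos(2 * np.pi * 200.0 * t)

envelope, inst_freq = envelope_instantaneous_frequency(toy, fs)
median_rate = float(np.median(inst_freq))
print(f"median envelope rate: {median_rate:.2f} Hz")
```

On this synthetic input the median instantaneous frequency of the envelope recovers the modulation rate. Real speech envelopes are multi-component, which is why a decomposition step is needed before this kind of frequency estimate is meaningful.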
Language English
Subject AM-FM
Amplitude envelope
EMD
Listener preference
Speech rate
Syllable transitions
Μετάβαση συλλαβών
Προτίμηση ακροατών
Ρυθμός ομιλίας
Ταχύτητα ομιλίας
Χρονική περιβάλλουσα
Issue date 2023-07-21
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/e/b/9/metadata-dlib-1688463184-381107-26400.tkl
