E-Locus - Institutional Repository of the University of Crete

Identifier

000456885

Title

Speech rhythm detection and its application in speech perception

Alternative Title

Ανίχνευση του ρυθμού ομιλίας και η εφαρμογή του στην αντίληψη της ομιλίας

Author

Λυδάκη, Ελευθερία Ε.

Thesis advisor

Τσακαλίδης, Παναγιώτης

Reviewer

Στυλιανού, Ιωάννης
Τσαγκατάκης, Γρηγόριος

Abstract

Speech rhythm refers to the rhythmic patterns and timing variations that occur in spoken language. It encompasses the natural flow, stress patterns, and timing of speech sounds, syllables, and words. Rhythm constitutes an important dynamic prosodic feature of speech that is linked with speech perception. The detection of speech rhythm is a significant task with diverse applications. In this study, the focus is on using rhythmic measures to estimate voice preference. The motivation behind this research arises from the belief that voices demonstrating specific rhythm patterns are generally preferred by individuals. In this thesis, speech rhythm was studied as a possible predictor of listener preference. Even though rhythm can be perceived by humans, there is no universally accepted definition or measure of speech rhythm in the scientific community. In the literature, there is strong evidence that rhythm is encoded in the amplitude envelope of a signal. Typically, the envelope is decomposed into partials, and the corresponding instantaneous frequency, which is assumed to carry the information regarding the signal’s rhythmicity, is then extracted. Two techniques were utilized to decompose the envelope into meaningful components. The first technique, which was proposed in a previous study, extracts rhythmic measures via an Empirical Mode Decomposition (EMD) of the envelope. Here, it is suggested to extract the same measures by using an AM-FM decomposition of the envelope instead of EMD. This modification has the potential to improve the accuracy of the resulting values, since EMD is not mathematically robust. The envelope, although informative to some extent, is a simplified representation of the speech signal. It lacks important elements like pitch, which could potentially contribute to the understanding of rhythm. Relying solely on the envelope may overlook relevant rhythmic features present in the speech signal.
We hypothesize that the rhythmicity of speech is closely related to the manner in which individuals transition between syllables. Therefore, an approach that directly captures the rhythmicity of speech was introduced by considering the segment of the speech signal associated with syllable transitions. This effectively addresses the concern of information loss that occurs during envelope extraction. During this research, data consisting of speech signals from multiple speakers were utilized. The information regarding the preferred speakers, as determined by listeners, was also available. This knowledge allowed the investigation of the underlying factors contributing to voice preference and the analysis of the specific characteristics that make certain speakers more preferred than others. The experiments were extended beyond the natural speaking rate to a fast speaking style, and preference and rhythm in fast speech were explored as well. Statistical analyses were conducted to evaluate the suitability of rhythmic metrics derived from envelope-based and signal-based techniques for the task at hand. Findings revealed that the envelope-derived metrics are heavily influenced by speech rate and are not well suited for accurately capturing rhythm. In contrast, syllable-transition metrics derived directly from the speech signal showed promising results. A satisfactory separation between preferred and non-preferred speakers was achieved, effectively capturing certain characteristics that influence listeners’ preference. One-way ANOVA and pairwise comparison tests were performed to validate the statistical significance of the differences between speakers. The results based on syllable transitions indicate promising avenues for future research. Considering the multi-component nature of preference, the exploration of additional metrics becomes crucial for improving the overall performance, which will lead to a comprehensive and reliable evaluation of listener preference.

Language

English

Subject

AM-FM

Amplitude envelope

EMD

Listener preference

Speech rate

Syllable transitions

Μετάβαση συλλαβών

Προτίμηση ακροατών

Ρυθμός ομιλίας