E-Locus - Institutional Repository of the University of Crete - On speaker interpolation and speech conversion for parallel corpora

Identifier

000362223

Title

On speaker interpolation and speech conversion for parallel corpora

Alternative Title

Παρεμβολή ομιλητών και μετατροπή φωνής δια παράλληλα κείμενα (Speaker interpolation and voice conversion for parallel texts)

Author

Γρέκας, Γεώργιος Αντωνίου

Thesis advisor

Στυλιανού, Γιάννης

Abstract

In everyday speech, linguistic information plays a major role in communication between people. However, voice quality and individuality are also important in speech recognition and understanding. For instance, it is essential to be able to identify and discriminate between two or more speakers in a radio or television program. Beyond these communicative advantages, voice individuality enriches our daily life with variety. For a number of modern applications, such as gaming, text-to-speech synthesis and animated films, it is important to create and maintain databases of different speakers. Depending on the requirements of the application, this may be time-consuming and expensive.

Speaker interpolation (SI) is the process of producing an intermediate voice between two or more speakers, while voice conversion (VC) is the technique of processing the voice of one person, the source speaker, so that it resembles the voice of another person, the target speaker. In both cases, the converted or interpolated speech should sound natural and intelligible. Despite extensive research in VC, high-quality voice conversion has not yet been achieved. The main reasons for this shortcoming are a) the oversmoothing effect caused by statistical modeling, b) inaccurate estimation of the speaker-dependent features and c) the inadequacy of the synthesis methods used. Voice conversion methods are based on spectral envelope information, which represents the vocal tract, because the vocal tract plays an important role in speech individuality. In conventional VC, the excitation signal of the source speaker is first extracted by inverse filtering; this excitation signal is then filtered through the vocal tract of the target speaker. In speaker interpolation, the excitation signal is filtered through an interpolated vocal tract of the given speakers.
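The source-filter processing described above can be illustrated with a minimal pure-Python sketch. It is not the thesis's method: linear prediction (LPC, via the Levinson-Durbin recursion) stands in for the true-envelope estimate, a synthetic two-pole signal stands in for a speech frame, and every function name (`convert_frame`, `ar2_signal`, etc.) is hypothetical. The source frame is inverse filtered to obtain its excitation, which is then passed through the target's all-pole vocal tract filter (`alpha = 1.0`, conversion) or through a filter whose reflection coefficients are interpolated between the two speakers (`0 < alpha < 1`, interpolation).

```python
import random

def autocorr(x, order):
    """Biased autocorrelation r[0..order] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, k): the inverse-filter polynomial a = [1, a1, ..., ap] of
    A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the reflection coefficients k.
    """
    a = [1.0] + [0.0] * order
    k = []
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k.append(ki)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki
    return a, k

def reflection_to_lpc(k):
    """Step-up recursion: reflection (lattice) coefficients -> A(z) polynomial."""
    a = [1.0]
    for i, ki in enumerate(k, start=1):
        prev = a + [0.0]  # pad to length i + 1
        a = [prev[j] + ki * prev[i - j] for j in range(i)] + [ki]
    return a

def inverse_filter(x, a):
    """FIR filter A(z): e[n] = sum_j a[j] x[n - j], i.e. the excitation (residual)."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]

def synthesis_filter(e, a):
    """All-pole vocal tract filter 1/A(z): y[n] = e[n] - sum_{j>=1} a[j] y[n - j]."""
    y = []
    for n, en in enumerate(e):
        y.append(en - sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0))
    return y

def convert_frame(src_frame, k_src, k_tgt, alpha=1.0):
    """alpha = 1.0: source excitation through the target vocal tract (VC);
    0 < alpha < 1: through an interpolated vocal tract (SI)."""
    excitation = inverse_filter(src_frame, reflection_to_lpc(k_src))
    # A convex combination of reflection coefficients with |k| < 1 keeps |k| < 1,
    # so the interpolated all-pole filter stays stable.
    k_mix = [(1.0 - alpha) * ks + alpha * kt for ks, kt in zip(k_src, k_tgt)]
    return synthesis_filter(excitation, reflection_to_lpc(k_mix))

def ar2_signal(a1, a2, n, seed):
    """Hypothetical toy 'voice': white noise through a stable two-pole resonator."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n):
        y.append(rng.gauss(0.0, 1.0) - a1 * y[-1] - a2 * y[-2])
    return y[2:]

ORDER = 10
src = ar2_signal(-1.3, 0.8, 400, seed=1)
tgt = ar2_signal(-0.5, 0.6, 400, seed=2)
a_src, k_src = levinson(autocorr(src, ORDER), ORDER)
a_tgt, k_tgt = levinson(autocorr(tgt, ORDER), ORDER)
converted = convert_frame(src, k_src, k_tgt, alpha=1.0)  # "voice conversion"
halfway = convert_frame(src, k_src, k_tgt, alpha=0.5)    # "speaker interpolation"
```

Interpolating reflection coefficients rather than the predictor coefficients is a deliberate choice in this sketch: values inside (-1, 1) stay inside that interval under convex combination, so the interpolated lattice filter remains stable, whereas linearly mixing higher-order predictor polynomials offers no such guarantee in general. A real system would also frame, window and overlap-add the signal, which is omitted here.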
The aim of this thesis is to address this research gap and achieve high-quality speaker interpolation and voice conversion of parallel corpora using accurate methods for spectral envelope estimation (true envelope), time and frequency alignment (piecewise linear time and frequency warping) and speech synthesis (interpolated lattice filter or overlap-add). Using precise methods at each processing step was expected to reduce the artifacts currently encountered in voice conversion. In speaker interpolation, the produced vocal tract is not merely an interpolation between the given speakers: its length can also be altered, producing a broad range of voices. Hence, from a limited database, a substantially larger one containing distinct speakers for any use can be created.
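Piecewise linear frequency warping, which the abstract lists for alignment and for altering vocal tract length, can be sketched in a few lines of pure Python. This is an illustrative assumption, not the thesis's implementation: the anchor pairs (for example, matched formant frequencies of two speakers) and the names `piecewise_linear` and `warp_envelope` are hypothetical. The warp keeps 0 Hz and the Nyquist frequency fixed and moves each source anchor frequency to its target frequency; the same kind of monotonic piecewise linear map could equally align time instants between two parallel utterances.

```python
from bisect import bisect_right

def piecewise_linear(points):
    """Monotonic piecewise linear map through (x, y) points sorted by x,
    clamped to the end values outside [x0, xN]."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def f(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x) - 1  # segment with xs[i] <= x < xs[i + 1]
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return ys[i] + t * (ys[i + 1] - ys[i])

    return f

def warp_envelope(env, fs, anchors):
    """Warp a spectral envelope sampled uniformly on [0, fs/2].

    anchors: (source_hz, target_hz) pairs. The warped envelope is
    out(f) = env(W^-1(f)), so a peak at source_hz moves to target_hz.
    """
    nyq = fs / 2.0
    pts = [(0.0, 0.0)] + sorted(anchors) + [(nyq, nyq)]
    if any(b[0] <= a[0] or b[1] <= a[1] for a, b in zip(pts, pts[1:])):
        raise ValueError("warping function must be strictly increasing")
    inverse = piecewise_linear([(y, x) for x, y in pts])  # target Hz -> source Hz
    step = nyq / (len(env) - 1)
    sample = piecewise_linear([(i * step, v) for i, v in enumerate(env)])
    return [sample(inverse(i * step)) for i in range(len(env))]

fs = 8000
# Toy envelope on a 50 Hz grid with a single "formant" peak at 1 kHz.
env = [max(0.0, 1.0 - abs(i * 50.0 - 1000.0) / 200.0) for i in range(81)]
# Raising the formant, as a shorter vocal tract would.
shorter_tract = warp_envelope(env, fs, [(1000.0, 1200.0)])
peak_hz = 50.0 * max(range(81), key=lambda i: shorter_tract[i])  # 1200.0
```

Warping through the inverse map, rather than pushing each envelope sample forward, guarantees that every output frequency bin receives exactly one value, which is why the sketch requires a strictly increasing warp.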

Language

English

Subject

Speaker interpolation

Speech conversion

Μετατροπή φωνής (Voice conversion)

Παρεμβολή ομιλητών (Speaker interpolation)

Issue date

2010-11-19

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses
