Identifier 000362223
Title On speaker interpolation and speech conversion for parallel corpora
Alternative Title Παρεμβολή ομιλητών και μετατροπή φωνής δια παράλληλα κείμενα
Author Γρέκας, Γεώργιος Αντωνίου
Thesis advisor Στυλιανού, Γιάννης
Abstract In daily speech, linguistic information plays a major role in communication between people. However, voice quality and individuality are also important in speech recognition and understanding. For instance, it is particularly important to be able to distinguish between two or more speakers in a radio or television program. Voice individuality, apart from providing these advantages in communication, enriches our daily life with variety. For a number of modern applications, such as gaming, text-to-speech synthesis, and cartoon movies, it is important to create and maintain databases for different speakers. This may be time-consuming and expensive, depending on the requirements of the application. Speaker interpolation (SI) is the process of producing an intermediate voice between two or more speakers, while voice conversion (VC) is the technique of processing the voice of one person, the source speaker, so that it resembles the voice of another person, the target speaker. Moreover, the converted or interpolated speech should sound natural and intelligible. Despite extensive research in VC, high-quality voice conversion has not been achieved yet. A number of reasons explain this shortcoming, the main ones being (a) the oversmoothing effect caused by statistical modeling, (b) inaccurate estimation of speaker-dependent features, and (c) the inadequacy of the synthesis methods used. Voice conversion methods are based on spectral envelope information, which represents the vocal tract, since it plays an important role in speech individuality. In conventional VC, the excitation signal of the source speaker is first extracted by inverse filtering. This excitation signal is then filtered by the vocal tract of the target speaker. In speech interpolation, the excitation signal is filtered by an interpolated vocal tract of the given speakers.
The scope of this thesis is to address this research gap and achieve high-quality speech interpolation and voice conversion on parallel corpora, using accurate methods for spectral envelope estimation (true envelope), time and frequency alignment (piecewise linear time and frequency warping), and speech synthesis (interpolated lattice filter or overlap and add). With precise methods at each processing step, the artifacts currently encountered in voice conversion were expected to be reduced. In speech interpolation, the produced vocal tract is not just an interpolation between the given speakers: the vocal tract length can also be altered, producing a broad range of voices. Hence, from a limited database, a substantially larger one containing distinct speakers for every use can be created.
Language English
Subject Speaker interpolation
Speech conversion
Μετατροπή φωνής
Παρεμβολή ομιλητών
Issue date 2010-11-19
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
