
Results - Details
Identifier uch.csd.msc//2002sideris
Title Μετατροπή της ομιλίας σε κινήσεις του στόματος
Alternative Title Conversion of speech into mouth movements
Creator Sideris, Christos S
Abstract Human speech is a multi-dimensional signal consisting of acoustical and optical components. The presence of both contributes to better communication quality. A very important parameter of the optical component is mouth movement, since everybody is able to perceive that the mouth movement of a person who talks is synchronized with speech. Given only the acoustical part of speech, we developed four different approaches to a lip-sync algorithm that converts human speech into mouth movements. All of the approaches are based on the Linear Prediction method of analysis, which is commonly used in sound processing. We used a total of eight distinct mouth positions (visemes), corresponding to the most distinct positions of the mouth during speech. The input to every approach of the algorithm is a speech signal, and the output is a sequence of visemes. We analyze the need for smoothing the sequence of visemes in order to improve the realism of the output, and we present smoothing methods that assume a number of the following visemes is known at each instant. In the algorithm's first approach we assign one viseme per 20 ms frame (Frame) of the speech signal. In the second approach we assign one viseme per 40 ms frame (Big Frame). In the third approach we regard the next viseme as known and smooth the sequence of visemes according to a heuristic algorithm. Finally, in the fourth approach we assume that up to four following visemes are known, so we use a more complex heuristic algorithm with more smoothing rules for the sequence of visemes. The first three approaches can be implemented in real time; the last approach gives the best lip-sync results. In all of the approaches we use the energy and the number of zero-crossings from the time domain, and the linear-prediction smooth spectrum from the frequency domain.
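The time-domain features the abstract names, per-frame energy and zero-crossing count, can be sketched as follows. This is an illustration only, not code from the thesis; the 8 kHz sample rate and the function name are assumptions:

```python
import numpy as np

def frame_features(signal, sample_rate=8000, frame_ms=20):
    """Per-frame energy and zero-crossing count of a speech signal.

    Illustrative sketch: the thesis computes these features per 20 ms
    frame (or 40 ms "Big Frame"); sample rate here is assumed.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        # short-time energy: sum of squared samples in the frame
        energy = float(np.sum(frame ** 2))
        # zero-crossings: sign changes between consecutive samples
        zc = int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zc))
    return feats
```

In a viseme classifier of this kind, high energy with few zero-crossings typically indicates voiced speech, while low energy with many zero-crossings indicates unvoiced fricatives; the linear-prediction smooth spectrum then refines the choice among the eight visemes.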
All of the lip-sync approaches were implemented in a MATLAB application, which presents the results as a talking face on the user's screen. Every new user who wants to use our application has to follow a short and easy training procedure. Finally, more "vivid" applications are presented, in which our MATLAB application is combined with 3D modeling and animation programs.
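The abstract does not spell out the heuristic smoothing rules, so purely as an illustration, here is a minimal lookahead-of-one rule in the spirit of the third approach: a viseme that differs from both of its neighbours while those neighbours agree is treated as flicker and replaced.

```python
def smooth_visemes(visemes):
    """Remove one-frame viseme 'flicker' using one frame of lookahead.

    Hypothetical rule for illustration; the thesis's actual heuristic
    algorithms (and the fourth approach's four-frame lookahead) are
    not detailed in the abstract.
    """
    out = list(visemes)
    for i in range(1, len(out) - 1):
        # neighbours agree but the middle frame disagrees -> smooth it
        if out[i - 1] == out[i + 1] != out[i]:
            out[i] = out[i - 1]
    return out
```

With more lookahead, as in the fourth approach, rules of this shape can also suppress runs of two or three spurious frames at the cost of added latency, which is why that variant is not real-time.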
Issue date 2002-07-01
Date available 2002-11-29
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
