
Results - Details
Identifier uch.csd.msc//2002sideris
Title Μετατροπή της ομιλίας σε κινήσεις του στόματος
Alternative Title Conversion of speech into mouth movements
Creator Sideris, Christos S
Abstract Human speech is a multi-dimensional signal consisting of acoustical and optical components. The presence of both contributes to better communication quality. A very important parameter of the optical component is mouth movement, since everybody is able to perceive that the mouth movement of a person who talks is synchronized with speech. Given only the acoustical part of speech, we developed four different approaches to a lip-sync algorithm that converts human speech into mouth movements. All of the approaches are based on the Linear Prediction method of analysis, which is commonly used in sound processing. We used a total of eight distinct mouth positions (visemes), corresponding to the most distinct positions of the mouth during speech. The input to every approach of the algorithm is a speech signal, and the output is a sequence of visemes. We analyze the need for smoothing the sequence of visemes in order to improve the realism of the output, and we present smoothing methods that assume a number of the following visemes is known at each instant. In the algorithm's first approach we assign one viseme per 20 ms frame (Frame) of the speech signal. In the second approach we assign one viseme per 40 ms frame (Big Frame). In the third approach we regard the next viseme as known and smooth the sequence of visemes according to a heuristic algorithm. Finally, in the fourth approach we assume that up to four following visemes are known, so we use a more complex heuristic algorithm with more smoothing rules for the sequence of visemes. The first three approaches can be implemented in real time; the last approach gives the best lip-sync results. In all of the approaches we use the energy and the number of zero-crossings from the time domain, and the linear-prediction smooth spectrum from the frequency domain.
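The time-domain features the abstract names, per-frame energy and zero-crossing count, can be sketched as follows. This is an illustration only, not code from the thesis; the 8 kHz sample rate and the function name are assumptions:

```python
import numpy as np

def frame_features(signal, sample_rate=8000, frame_ms=20):
    """Per-frame energy and zero-crossing count of a speech signal.

    Illustrative sketch: the thesis computes these features per 20 ms
    frame (or 40 ms "Big Frame"); sample rate here is assumed.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    feats = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        # short-time energy: sum of squared samples in the frame
        energy = float(np.sum(frame ** 2))
        # zero-crossings: sign changes between consecutive samples
        zc = int(np.sum(np.abs(np.diff(np.sign(frame))) > 0))
        feats.append((energy, zc))
    return feats
```

In a viseme classifier of this kind, high energy with few zero-crossings typically indicates voiced speech, while low energy with many zero-crossings indicates unvoiced fricatives; the linear-prediction smooth spectrum then refines the choice among the eight visemes.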
All of the lip-sync approaches were implemented in a MATLAB application, which presents the results as a talking face on the user's screen. Every new user who wants to use our application has to follow a short and easy training procedure. Finally, more "vivid" applications are presented, in which our MATLAB application is combined with 3D modeling and animation programs.
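The abstract does not spell out the heuristic smoothing rules, so purely as an illustration, here is a minimal lookahead-of-one rule in the spirit of the third approach: a viseme that differs from both of its neighbours while those neighbours agree is treated as flicker and replaced.

```python
def smooth_visemes(visemes):
    """Remove one-frame viseme 'flicker' using one frame of lookahead.

    Hypothetical rule for illustration; the thesis's actual heuristic
    algorithms (and the fourth approach's four-frame lookahead) are
    not detailed in the abstract.
    """
    out = list(visemes)
    for i in range(1, len(out) - 1):
        # neighbours agree but the middle frame disagrees -> smooth it
        if out[i - 1] == out[i + 1] != out[i]:
            out[i] = out[i - 1]
    return out
```

With more lookahead, as in the fourth approach, rules of this shape can also suppress runs of two or three spurious frames at the cost of added latency, which is why that variant is not real-time.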
Issue date 2002-07-01
Date available 2002-11-29
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
