Abstract
The sinusoidal model and its variants are commonly used in speech processing. In the
literature, there are various methods for the estimation of the unknown parameters of
the sinusoidal model. Among them, the best-known are those based on the Fast Fourier
Transform (FFT), on Analysis-by-Synthesis (ABS) approaches, and on Least Squares (LS)
methods.
LS methods are more accurate, and in fact optimal under Gaussian noise, and are thus
better suited to high-quality estimation. In addition, LS methods prove able to cope
with short analysis windows. In contrast, the FFT- and ABS-based methods cannot handle
overlapping frequency responses; in other words, they cannot handle short analysis
windows. This matters because the stationarity assumption for the signal holds better
over short analysis windows. However, LS solutions are in general slower than FFT-based
algorithms and optimized implementations of ABS schemes. In the present thesis, our goal
is to alleviate the computational burden of LS-based techniques, so that both the
increased accuracy and a faster implementation can be achieved.
The four models whose amplitude coefficients are to be estimated, namely the Harmonic,
Sinusoidal, Quasi-Harmonic and Generalized Quasi-Harmonic models, are first reintroduced.
Each model is then studied individually, and the straightforward LS solution for
amplitude estimation is presented.
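As a reminder of what the straightforward LS solution looks like, the following is the textbook unweighted form, not a formula quoted from the thesis. The symbols are assumptions introduced here for illustration: $\mathbf{s}$ is the vector of signal samples in the analysis frame, $\mathbf{E}$ is the matrix whose columns are the complex exponentials of the model, and $\mathbf{a}$ is the vector of unknown complex amplitudes.

```latex
\hat{\mathbf{a}}
  = \arg\min_{\mathbf{a}} \,\lVert \mathbf{s} - \mathbf{E}\mathbf{a} \rVert^{2}
  = \left( \mathbf{E}^{H}\mathbf{E} \right)^{-1} \mathbf{E}^{H}\mathbf{s}
```

In this form the cost is dominated by building $\mathbf{E}$, forming the products $\mathbf{E}^{H}\mathbf{E}$ and $\mathbf{E}^{H}\mathbf{s}$, and inverting $\mathbf{E}^{H}\mathbf{E}$.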
The sources of computational load in an LS solution are identified, and several
improvements are introduced for each model to reduce its computational complexity and
execution time. The first speed-up consists of carrying out the matrix multiplications
by hand, which yields a direct formula for every element of the result. For the second,
we show how a certain matrix of exponentials can be computed using primarily
multiplications. For the final acceleration, we observe that certain elements of a
matrix that must be computed and then inverted play a less important role in deriving
the solution, and we therefore approximate the matrix by omitting the calculation of
those elements.
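The idea of building a matrix of exponentials mostly from multiplications can be sketched as follows. This is a minimal illustration, not the thesis's own algorithm: it assumes a harmonic model with fundamental `f0`, sampling rate `fs`, harmonics `-K..K` and a frame of `2*half_len + 1` samples centred at zero, and all function and parameter names are hypothetical. Each row then needs a single call to the exponential function; the remaining entries follow from repeated multiplication and complex conjugation.

```python
import cmath
import math


def exp_matrix_direct(f0, fs, num_harmonics, half_len):
    """Reference version: one complex exponential call per matrix element."""
    return [
        [cmath.exp(2j * math.pi * k * f0 * n / fs)
         for k in range(-num_harmonics, num_harmonics + 1)]
        for n in range(-half_len, half_len + 1)
    ]


def exp_matrix_recursive(f0, fs, num_harmonics, half_len):
    """Same matrix, using one exponential call per row.

    Relies on exp(j*2*pi*k*f0*t) = exp(j*2*pi*f0*t) ** k, built up by
    repeated multiplication. Entries for negative harmonics are the
    complex conjugates of the positive ones (the base has unit modulus).
    """
    rows = []
    for n in range(-half_len, half_len + 1):
        base = cmath.exp(2j * math.pi * f0 * n / fs)  # only exp call in this row
        positive = [1 + 0j]                           # harmonic k = 0
        for _ in range(num_harmonics):
            positive.append(positive[-1] * base)      # k = 1 .. K
        negative = [z.conjugate() for z in reversed(positive[1:])]  # k = -K .. -1
        rows.append(negative + positive)
    return rows
```

Because each entry is obtained from its neighbour by one multiplication, rounding error accumulates along a row; for the small harmonic counts typical of speech frames this stays close to machine precision.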
Finally, we demonstrate that following the suggested steps reduces both the complexity
and the execution time of the LS-based solution. The methods are evaluated by analyzing
and re-synthesizing randomly generated synthetic signals and computing the Mean Square
Error, the Signal-to-Reconstruction-Error Ratio and the CPU-time improvement for each
step. Next, to test the robustness of the acceleration methods, we illustrate their
competence in analyzing noisy synthetic signals. As a final test, we check the ability
of our amplitude estimation mechanisms to analyze and synthesize real-world voiced
speech signals.
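The two reconstruction-quality measures named above can be computed as in the sketch below. It assumes one common definition of the Signal-to-Reconstruction-Error Ratio, 20 log10 of the ratio between the standard deviation of the original signal and that of the residual; the abstract does not state the exact formula used in the thesis, and the function names are hypothetical.

```python
import math
import statistics


def mean_square_error(x, x_hat):
    """Average squared difference between original and reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def srer_db(x, x_hat):
    """SRER in dB, assumed here to be 20*log10(std(x) / std(x - x_hat))."""
    residual = [a - b for a, b in zip(x, x_hat)]
    return 20.0 * math.log10(statistics.pstdev(x) / statistics.pstdev(residual))
```

Under this definition, a reconstruction that scales every sample by 0.9 leaves a residual one tenth the size of the signal and therefore scores 20 dB; higher values indicate a closer reconstruction.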