E-Locus - Institutional Repository of the University of Crete

Doctoral theses

Identifier

uch.csd.phd//2007agiomyrgiannakis

Title

Sinusoidal Coding of Speech for Voice over IP

Alternative Title

Ημιτονοειδής κωδικοποίηση σημάτων φωνής για μετάδοση μέσω δικτύων IP (Sinusoidal coding of voice signals for transmission over IP networks)

Author

Αγιομυργιαννάκης, Ιωάννης

Thesis advisor

Στυλιανού, Ι.

Abstract

It is widely accepted that Voice-over-Internet-Protocol (VoIP) will dominate wireless and wireline voice communications in the near future. Traditionally, a minimum level of Quality-of-Service is achieved by careful traffic monitoring and network fine-tuning. However, this solution is not feasible when the parameters of the network cannot be controlled or monitored. For example, when speech traffic is routed through the Internet, packet losses increase because of network delays and the strict end-to-end delay requirements of voice communication. Most of today's speech codecs were not initially designed to cope with such conditions. One solution is to introduce channel coding at the expense of end-to-end delay. Another solution is to perform joint source/channel coding of speech by designing speech codecs that are natively robust to increased packet losses. This thesis proposes a framework for developing speech codecs that are robust to packet losses. The thesis addresses the problem at two levels: at the basic source/channel coding level, where novel methods are proposed for introducing controlled redundancy into the bitstream, and at the signal representation/coding level, where a novel speech parameterization/modeling is presented that is amenable to efficient quantization using the proposed source coding methods. The speech codec is designed to facilitate high-quality Packet Loss Concealment (PLC). The speech signal is modeled with harmonically related sinusoids, a representation that enables fine time-frequency resolution, which is vital for high-quality PLC. Furthermore, each packet is encoded independently of the previous packets in order to avoid desynchronization between the encoder and the decoder upon a packet loss. This allows some redundancy to exist in the bitstream. A number of contributions are made to well-known harmonic speech models.
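As a rough illustration of what "harmonically related sinusoids" means, the sketch below synthesizes one frame as a sum of cosines at integer multiples of a fundamental frequency. The function name, parameter names and numeric values are hypothetical and chosen only for illustration; this is a generic textbook harmonic model, not the analysis/synthesis method of the thesis.

```python
import math

def synthesize_harmonic_frame(f0_hz, amplitudes, phases, sample_rate_hz, num_samples):
    """Generic harmonic model: s[n] = sum_k A_k * cos(2*pi*k*f0*n/fs + phi_k).

    amplitudes[k-1] and phases[k-1] belong to the k-th harmonic (k = 1, 2, ...).
    All names here are illustrative assumptions, not taken from the thesis.
    """
    frame = []
    for n in range(num_samples):
        t = n / sample_rate_hz  # time of sample n in seconds
        sample = 0.0
        for k, (a_k, phi_k) in enumerate(zip(amplitudes, phases), start=1):
            sample += a_k * math.cos(2.0 * math.pi * k * f0_hz * t + phi_k)
        frame.append(sample)
    return frame

# Illustrative values: 100 Hz pitch, three harmonics, 20 ms at 8 kHz (narrowband).
frame = synthesize_harmonic_frame(
    f0_hz=100.0,
    amplitudes=[1.0, 0.5, 0.25],
    phases=[0.0, 0.0, 0.0],
    sample_rate_hz=8000.0,
    num_samples=160,
)
```

Because every parameter in this sketch describes the frame on its own, a frame built this way can be reconstructed without reference to earlier frames, which is the property the abstract relies on for independent packet encoding.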
A fast analysis/synthesis method is proposed and used in the construction of an Analysis-by-Synthesis (AbS) pitch detector. Harmonic codecs tend to rely on phase models for the reconstruction of the harmonic phases, introducing artifacts that affect the quality of the reconstructed speech signal. High-quality speech reconstruction requires quantization of the phases. Unfortunately, phase quantization is not a trivial problem because phases are circular variables. A novel phase-quantization algorithm is proposed to address this problem. Harmonic phases are properly aligned and modeled with a Wrapped Gaussian Mixture Model (WGMM) capable of handling parameters that belong to circular spaces. The WGMM is estimated with a suitable Expectation-Maximization (EM) algorithm. Phases are then quantized by extending the efficient GMM-based quantization techniques for linear spaces to WGMM and circular spaces. When packet losses increase, additional redundancy can be introduced using Multiple Description Coding (MDC). In MDC, each frame is encoded in two descriptions; receiving both descriptions provides a high-quality reconstruction, while receiving one description provides a lower-quality reconstruction. With current GMM-based MDC schemes it is possible to quantize the amplitudes of the harmonics, which represent an important portion of the information of the speech signal. A novel WGMM-based MDC scheme is proposed and used for MDC of the harmonic phases. It is shown that it is possible to construct high-quality MDC codecs based on harmonic models. Furthermore, it is shown that the redundancy between the MDC descriptions can be used to "correct" bit errors that may have occurred during transmission. At the source coding level, a scheme for Multiple Description Transform Coding (MDTC) of multivariate Gaussian sources using Parseval frame expansions and a source coding technique referred to as Conditional Vector Quantization (CVQ) are proposed.
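To make the "circular variables" point concrete, the following sketch evaluates a one-dimensional wrapped Gaussian density and a mixture of such densities. It uses the standard definition of a wrapped Gaussian (an ordinary Gaussian summed over shifts of 2π), truncated to a few wraps. The function names, the truncation parameter `num_wraps` and the numeric values are assumptions made for illustration; the thesis's multivariate WGMM, its EM estimation and its quantizer are not reproduced here.

```python
import math

def wrapped_gaussian_pdf(theta, mean, std, num_wraps=5):
    """Wrapped Gaussian density on the circle, truncated to 2*num_wraps+1 terms.

    p(theta) = sum_w N(theta + 2*pi*w; mean, std^2), for w = -num_wraps..num_wraps.
    The truncation is an approximation that is accurate when std is small
    relative to 2*pi*num_wraps.
    """
    total = 0.0
    for w in range(-num_wraps, num_wraps + 1):
        z = (theta - mean + 2.0 * math.pi * w) / std
        total += math.exp(-0.5 * z * z)
    return total / (std * math.sqrt(2.0 * math.pi))

def wgmm_pdf(theta, weights, means, stds):
    """Mixture of wrapped Gaussians; weights are assumed to sum to 1."""
    return sum(
        w * wrapped_gaussian_pdf(theta, m, s)
        for w, m, s in zip(weights, means, stds)
    )

# A phase of -3.1 rad is close to a mean of +3.1 rad on the circle,
# so the wrapped density is high even though the linear distance is 6.2 rad.
density_near_wrap = wrapped_gaussian_pdf(-3.1, 3.1, 0.2)
```

An ordinary Gaussian would assign almost zero density to the phase in the last line, which is why linear-space GMM techniques need to be extended before they can be applied to phases.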
The MDTC algorithm is extended to generic sources that can be modeled with a GMM. The proposed frame facilitates a computationally efficient Optimal Consistent Reconstruction (OCR) algorithm and Cooperative Encoding (CE). In CE, the two MDTC encoders cooperate in order to provide better central/side distortion tradeoffs. The proposed scheme provides scalability, low complexity and storage requirements, excellent performance at low redundancies and competitive performance at high redundancies. In CVQ, the focus is on correcting the most frequent types of errors: single and double packet losses. Furthermore, CVQ finds application in Bandwidth Expansion (BWE), the extension of the bandwidth of narrowband speech to wideband. Finally, two proof-of-concept harmonic codecs are constructed: a single-description codec and a multiple-description codec. Both codecs are narrowband and variable-rate, similar in quality to the state-of-the-art iLBC (internet Low Bit-Rate Codec) under perfect channel conditions, and better than iLBC when packet losses occur. The single-description codec requires 14 kbps and tolerates 20% packet losses with minimal quality degradation, while the multiple-description codec operates at 21 kbps and tolerates 40% packet losses without significant quality degradation.
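The central/side distortion tradeoff mentioned above can be illustrated with a toy two-description scalar quantizer. The sketch below is a simple textbook-style construction with two uniform quantizers whose grids are offset by half a step; it is not the Parseval-frame MDTC scheme, OCR or CE of the thesis. All names and the step size are hypothetical.

```python
import math

STEP = 1.0  # hypothetical quantizer step size, chosen for illustration

def encode_two_descriptions(x, step=STEP):
    """Quantize x twice, on two uniform grids offset by step/2."""
    index_a = math.floor(x / step + 0.5)               # grid at k * step
    index_b = math.floor((x - step / 2) / step + 0.5)  # grid at k * step + step/2
    return index_a, index_b

def decode(index_a=None, index_b=None, step=STEP):
    """Reconstruct from whichever descriptions arrived (None means lost)."""
    recon_a = None if index_a is None else index_a * step
    recon_b = None if index_b is None else index_b * step + step / 2
    if recon_a is not None and recon_b is not None:
        return 0.5 * (recon_a + recon_b)  # central decoder: both descriptions
    if recon_a is not None:
        return recon_a                    # side decoder: description A only
    if recon_b is not None:
        return recon_b                    # side decoder: description B only
    return None                           # both descriptions lost

index_a, index_b = encode_two_descriptions(0.3)
central = decode(index_a, index_b)   # both packets received
side_a = decode(index_a=index_a)     # packet B lost
side_b = decode(index_b=index_b)     # packet A lost
```

In this toy scheme either description alone bounds the error by half a step, while both together bound it by a quarter step; the cost is that two indices are transmitted instead of one, which is the kind of controlled redundancy that MDC trades against loss robustness.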

Language

English

Issue date

2007-02-01

Date available

2007-10-11

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses

Type of Work--Doctoral theses

Permanent Link

https://elocus.lib.uoc.gr//dlib/9/4/6/metadata-dlib-2007agiomyrgiannakis.tkl