Abstract
In the last few years, multichannel audio has gradually begun to displace stereophonic audio systems because it offers significant advantages for audio reproduction compared to stereo. The large number of channels gives the listener the sensation of being “surrounded” by sound and immerses them in a realistic acoustic scene. The main problem with the increased number of channels is the demand for higher data rates for storage and transmission. Consequently, multichannel audio compression algorithms have been developed to reduce the data rate requirements by exploiting the similarities among the multiple channels. These compression algorithms achieve a significant coding gain, but the resulting bitrates remain too demanding for many practical low-bandwidth applications.
Our objective is to propose a modeling and coding method that achieves the lowest possible bitrate requirements for multichannel and, furthermore, immersive audio applications, such as remote mixing of a multichannel recording and remote collaboration of geographically distributed musicians. This translates into deriving a model that can take advantage of the similarities among the various microphone signals of a given multichannel recording.
In this thesis, we propose encoding one audio channel, which can be either one of the microphone signals of a multichannel recording or a downmix sum signal, while for the remaining microphone signals we retain only the parameters that allow resynthesis of their content at the decoder. This scheme is implemented via an enhanced adaptation of the sinusoids plus noise model. According to this model, an audio signal can be decomposed into a deterministic (sinusoidal) part and a stochastic (noise) part. The proposed approach is based on the observation that the noise part of each microphone signal can be obtained by transforming the noise part of one of the signals (the reference), using the noise envelope of each of the remaining (side) microphone signals.
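As a rough illustration of the noise transformation described above, the following Python sketch whitens the noise part of the reference signal and reshapes it with the noise envelope of a side signal. It assumes that the noise envelope is represented by linear prediction (LPC) coefficients and a gain; this representation, the prediction order, and all function names are assumptions made here for illustration and are not taken from the text. The sketch operates on a single frame and omits windowing, overlap-add, and the sinusoidal part.

```python
import math


def autocorr(x, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err), where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order
    is the prediction-error filter and err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def whiten(a, x):
    """Analysis filter A(z): removes the spectral envelope from x."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]


def shape(a, gain, e):
    """Synthesis filter gain / A(z): imposes a spectral envelope on e."""
    y = []
    for n in range(len(e)):
        acc = gain * e[n]
        for j in range(1, len(a)):
            if n - j >= 0:
                acc -= a[j] * y[n - j]
        y.append(acc)
    return y


def transplant_noise(ref_noise, side_noise, order=10):
    """Approximate the side noise from the reference noise.

    Only (a_side, err_side) depend on the side signal; under the stated
    assumption they play the role of the per-microphone side information.
    """
    a_ref, err_ref = levinson(autocorr(ref_noise, order), order)
    a_side, err_side = levinson(autocorr(side_noise, order), order)
    residual = whiten(a_ref, ref_noise)      # roughly flat spectrum
    gain = math.sqrt(err_side / err_ref)     # match the side noise energy
    return shape(a_side, gain, residual), a_side, err_side
```

In this sketch the decoder would need only the fully coded reference signal plus the envelope parameters of each side signal, which mirrors the idea of transmitting a single audio channel with a small amount of side information per microphone.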
The coding process can be divided into coding of the sinusoidal parameters and coding of the noise spectral envelopes. Coding of the sinusoidal parameters is based on a high-rate quantization scheme, while coding of the noise spectral envelopes is based on vector quantization methods used in speech coding. The coding performance is evaluated using subjective listening tests. The results show that good-quality reproduction can be achieved with the proposed approach by fully encoding only a single audio channel, with side information for each microphone signal on the order of 18 kbps. This thesis applies the sinusoidal model to high-quality audio coding in the multichannel audio domain for the first time.