Identifier 000394828
Title Voicing detection in spontaneous and real-life recordings from music lessons
Alternative Title Ανίχνευση φωνής σε ερασιτεχνικές καταγραφές μουσικών σεμιναρίων υπό πραγματικές συνθήκες
Author Γιαννικάκη, Σοφία Ελπινίκη Σ.
Thesis advisor Στυλιανού, Ιωάννης
Reviewer Μουχτάρης, Αθανάσιος
Μπενέτος, Εμμανουήλ
Abstract Speech is one of the most important abilities we have, since it is one of the principal ways of communicating with the world. In the past few years, much interest has been shown in developing voice-based applications. Such applications involve isolating speech from an audio file; the algorithms that achieve this are called Voice Detection algorithms. From the analysis of a given input audio signal, the parts containing voice are kept while the other parts (noise, silence, etc.) are discarded, greatly reducing the amount of information to be processed further. The task of Voice Detection is closely related to Speech/Nonspeech Classification; in addition, Singing Voice Detection and Speech/Music Discrimination can be seen as subclasses of what we generally call Voice Detection. In such tasks, an audio signal is given as input to a system and then processed. The signal is usually analysed in frames, from which features are extracted; the frame duration depends mostly on the application and sometimes on the features being used. Many features have been proposed to date, and they can be divided into two categories: time-domain and frequency-domain features. In the time domain, the short-time energy, the zero-crossing rate and autocorrelation-based features are used most often. In the frequency domain, cepstral features are used most frequently, owing to the useful information they carry about speech presence. More specifically, in Singing Voice Detection and in Speech/Music Discrimination the state-of-the-art features are the Mel-Frequency Cepstral Coefficients (MFCCs), which have been reported to provide the best performance in the majority of cases. In this thesis, an algorithm is developed that performs voice detection in spontaneous and real-life recordings from music lessons.
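The time-domain features named in the abstract (short-time energy and zero-crossing rate) can be sketched in a few lines of NumPy. This is an illustrative sketch only: the 25 ms frame length, 10 ms hop and 16 kHz rate below are common choices, not values taken from the thesis.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping fixed-length frames."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes in each frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

# Toy input: one second of a 440 Hz tone followed by one second of silence.
sr = 16000
t = np.arange(sr) / sr
x = np.concatenate([np.sin(2 * np.pi * 440 * t), np.zeros(sr)])

frames = frame_signal(x, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
ste = short_time_energy(frames)
zcr = zero_crossing_rate(frames)
```

Voiced frames show high energy and a moderate zero-crossing rate, while silent frames score zero on both, which is why these cheap features are a common first pass before the cepstral analysis described above.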
The content of the recordings was such that the proposed algorithm was challenged to discriminate both speech and singing voice from music and other noises. A classic approach to this problem would use MFCCs as the discriminating feature and an SVM classifier for the classification into "speech" or "nonspeech". In our work, this methodology is expanded by preserving the MFCCs as the main feature and incorporating three other features, namely the Cepstral Flux, the Clarity and the Harmonicity. Cepstral Flux is extracted from the cepstrum, while Clarity and Harmonicity are time-domain autocorrelation-based features. The goal is to improve, with these additional features, the performance of the system that uses only the MFCCs, so different combinations of the three additional features with the MFCCs were examined and evaluated. A 10-fold cross-validation is applied on segments labelled as "speech" or "nonspeech". The database used for training and testing consists of three seminars: two cover traditional Cretan music classes with lyra, and the third traditional Cretan music classes with lute; each recording was carried out under different environmental conditions. Performance evaluation was conducted using Detection Error Tradeoff (DET) and Receiver Operating Characteristic (ROC) curves as visual evaluation tools, and the Equal Error Rate (EER), the Efficiency and the Area Under the Curve (AUC) were computed in each case. Each seminar was evaluated separately as well as all together, and training and testing sets drawn from different seminars were also combined in order to provide reliable results. It is shown that the additional features enhance the performance of the classic MFCC-only algorithm by about 0.5% to 20%.
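The classification pipeline described above, an SVM scored with 10-fold cross-validation, can be sketched with scikit-learn. The 13-dimensional Gaussian clusters below are synthetic stand-ins for MFCC feature vectors, not data from the thesis.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Hypothetical 13-dim MFCC-like vectors: two classes with shifted means.
speech = rng.normal(loc=0.5, scale=1.0, size=(100, 13))
nonspeech = rng.normal(loc=-0.5, scale=1.0, size=(100, 13))
X = np.vstack([speech, nonspeech])
y = np.array([1] * 100 + [0] * 100)  # 1 = "speech", 0 = "nonspeech"

clf = SVC(kernel="rbf")
scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
mean_acc = scores.mean()
```

In the same spirit as the thesis, one would concatenate the extra features (Cepstral Flux, Clarity, Harmonicity) as additional columns of `X` and rerun the same cross-validation to compare feature combinations.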
Specifically, it is observed that three of the five combinations stand out, reducing the miss probability by about 20% at a false alarm probability of 5%.
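The miss/false-alarm trade-off behind the DET curve and the EER can be computed directly from classifier scores. The following sketch uses synthetic Gaussian scores rather than the thesis data, and a simple nearest-crossing estimate of the EER.

```python
import numpy as np

def roc_points(scores, labels):
    """Miss and false-alarm probabilities as the decision threshold sweeps down."""
    order = np.argsort(scores)[::-1]          # highest score first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                    # true accepts so far
    fp = np.cumsum(1 - labels)                # false accepts so far
    p_miss = 1.0 - tp / labels.sum()
    p_fa = fp / (len(labels) - labels.sum())
    return p_fa, p_miss

def equal_error_rate(scores, labels):
    """EER: the operating point where miss and false-alarm rates coincide."""
    p_fa, p_miss = roc_points(scores, labels)
    idx = np.argmin(np.abs(p_fa - p_miss))
    return (p_fa[idx] + p_miss[idx]) / 2.0

# Toy scores: "speech" segments tend to score higher than "nonspeech" ones.
rng = np.random.default_rng(1)
scores = np.concatenate([rng.normal(1.0, 1.0, 500), rng.normal(-1.0, 1.0, 500)])
labels = np.array([1] * 500 + [0] * 500)
eer = equal_error_rate(scores, labels)
```

Plotting `p_miss` against `p_fa` on normal-deviate axes gives the DET curve used in the evaluation; reading off `p_miss` at `p_fa = 0.05` corresponds to the 5% false-alarm operating point quoted above.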
Language English, Greek
Subject Audio processing
Cepstral flux
Clarity
Harmonicity
MFCC
SVM
Speech / music discrimination
Speech / nonspeech classification
Αρμονικότητα
Διαχωρισμός ομιλίας / μουσικής
Επεξεργασία ήχου
Μηχανές διανυσματικής υποστήριξης
Ταξινόμηση ομιλίας και μη ομιλίας
Issue date 2015-07-17
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Permanent Link https://elocus.lib.uoc.gr//dlib/e/2/f/metadata-dlib-1435132310-965498-27655.tkl