E-Locus - Institutional Repository of the University of Crete - Assessing the quality of audio in musical concert recordings using deep neural networks


Identifier

000425807

Title

Assessing the quality of audio in musical concert recordings using deep neural networks

Alternative Title

Quality assessment of recordings from musical concerts using deep learning techniques

Author

Σίμου, Νίκων Χ.

Thesis advisor

Τσακαλίδης, Παναγιώτης

Reviewer

Στεφανάκης, Νίκος
Δημητρόπουλος, Ξενοφώντας
Πανταζής, Γιάννης

Abstract

The era in which we live is characterized by an enormous flow of multimedia information. Using portable multimedia devices such as drones and smartphones, we can capture every moment of our lives and of the public events that we attend. A large proportion of the audiovisual recordings from these events becomes available through social media and the many websites that provide video and audio content. The availability of such a massive amount of User Generated Recordings (UGRs) has triggered new research directions related to the search, organization and management of this content. In this thesis, we use Deep Neural Networks (DNNs) to create a tool that automatically assesses the audio quality of musical concert recordings that users upload to multimedia platforms such as YouTube. It is well known that DNNs require many training samples, which means that listening to each audio sample and assigning it a subjective quality score would take an enormous amount of time. We tackle this problem by treating quality assessment as a binary classification problem, where class 0 consists of the UGRs from a certain event and class 1 consists of the professional-quality recordings from the same event. Furthermore, we use an automatic synchronization process to match every UGR with its corresponding segment from a professional-quality recording, which helps make the process invariant to audio content. Experiments conducted with different DNN architectures and acoustic features are presented, showing that the UGR class can be discriminated from the professional-quality class with high accuracy.
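
The pairing and labelling scheme described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the synchronization is performed; the raw cross-correlation search below is an assumption chosen for simplicity, and the names `find_offset` and `make_pairs` are hypothetical, not taken from the thesis.

```python
from typing import List, Sequence, Tuple


def find_offset(reference: Sequence[float], clip: Sequence[float]) -> int:
    """Return the start index in `reference` where `clip` correlates best.

    Assumption: plain (unnormalised) cross-correlation stands in for the
    synchronization step, whose actual method the abstract does not specify.
    """
    if len(clip) > len(reference):
        raise ValueError("clip must not be longer than reference")
    best_start, best_score = 0, float("-inf")
    for start in range(len(reference) - len(clip) + 1):
        window = reference[start:start + len(clip)]
        # Correlation score of the clip against this window of the reference.
        score = sum(r * c for r, c in zip(window, clip))
        if score > best_score:
            best_start, best_score = start, score
    return best_start


def make_pairs(
    reference: Sequence[float], clips: List[Sequence[float]]
) -> List[Tuple[Sequence[float], int]]:
    """Label each user clip 0 and its matched reference segment 1."""
    samples: List[Tuple[Sequence[float], int]] = []
    for clip in clips:
        start = find_offset(reference, clip)
        samples.append((clip, 0))  # class 0: user generated recording
        samples.append((reference[start:start + len(clip)], 1))  # class 1: professional
    return samples
```

Because every class 0 sample is paired with a class 1 segment covering the same musical passage, a classifier trained on such pairs sees the same content in both classes, which is the sense in which the abstract calls the process invariant to audio content. A practical system would likely operate on acoustic features rather than raw samples.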

Language

English

Subject

Audio processing

Deep learning


Issue date

2019-11-22

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses
