E-Locus - Institutional Repository of the University of Crete


Identifier

000438712

Title

A scalable data science platform built on open source technologies with application of predictive analytics on acute Respiratory Distress Syndrome disease

Alternative Title

Μια επεκτάσιμη πλατφόρμα επιστήμης δεδομένων, βασισμένη σε τεχνολογίες ανοιχτού κώδικα με εφαρμογή προγνωστικής ανάλυσης για τη νόσο του Συνδρόμου Οξείας Αναπνευστικής Δυσχέρειας

Author

Χανιωτάκης, Ευάγγελος

Thesis advisor

Πλεξουσάκης, Δημήτρης

Reviewer

Τσικνάκης, Μανώλης
Μαγκούτης, Κώστας

Abstract

The continuous growth of biomedical data in healthcare poses significant challenges for its efficient management. This has made it inevitable for healthcare organizations to adopt big data infrastructures and related techniques, so that they can efficiently explore the wealth of real-world data being generated and improve the quality of healthcare services. The healthcare industry has various big data sources, which are characterized by heterogeneity. These include hospital information systems (HIS) and medical records of patients (EHRs), results of laboratory procedures and examinations residing in Laboratory Information Systems (LIS), data from continuous patient monitoring (e.g. in an Intensive Care Unit, ICU), and data from smart devices such as wearables. Very large data sets are also generated by genomics-related clinical and research work. In genomics, the rate of growth over the last decade has been astonishing, with the total amount of sequence data produced doubling approximately every seven months. This data requires efficient management and analysis in order to derive meaningful and actionable information. Developing such solutions means addressing a range of challenges and complications associated with each step of the pipeline for handling healthcare big data sets. These can only be resolved by using high-quality computing solutions for big data analysis. In the current COVID-19 pandemic in particular, the complications that may occur after the onset of the disease are of great importance. One such complication is Acute Respiratory Distress Syndrome (ARDS), a serious respiratory condition with high mortality and associated morbidity. A large number of basic and clinical studies have demonstrated that early diagnosis and intervention are key to improving the survival rate of patients with ARDS.
Therefore, there is a pressing need for the development and clinical testing of predictive models for ARDS events, which might improve the clinical diagnosis or management of ARDS. In the present thesis, we focus on two distinct objectives: a) to design a scalable data science platform built on open source technologies, and b) to use the platform and publicly available big healthcare datasets to develop machine learning models that predict ARDS events from commonly available parameters, including baseline characteristics and clinical and laboratory parameters. The thesis is divided into two main parts. The first part presents and analyzes in detail all the procedures, materials, and methods adopted to develop this big data management platform. We report on the complications and difficulties that arise in creating and using such systems with large biomedical datasets, such as the MIMIC-III dataset. The second part describes how we use this clinical database to evaluate our platform on a real-world clinical scenario for ARDS. The objective of the study was to develop and evaluate a novel application of machine learning models for predicting ARDS. We employ random forest and logistic regression models, trained on patient health record data, for the early prediction and diagnosis of ARDS. Our approach achieves better results in all AUC-based metrics when compared to relevant published efforts that use the MIMIC-III dataset to develop predictive models of ARDS. Specifically, both of our models outperform those efforts in ARDS prediction, with the 10-fold cross-validated Random Forest being dominant in terms of AUC (95.1%), Accuracy (98.0%), Specificity (98.62%) and Sensitivity (96.25%).
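
The four evaluation metrics named in the abstract have standard definitions. As a minimal sketch, and not the thesis's own code, they could be computed from binary labels and classifier scores roughly as follows. The toy labels, the scores, the 0.5 decision threshold, and all function names are illustrative assumptions; none of them come from the thesis or from MIMIC-III.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives that were detected (recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives that were correctly rejected."""
    return tn / (tn + fp)


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive case scores higher
    than a random negative case; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 3 positive cases, 5 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
print("accuracy:", accuracy(tp, fp, tn, fn))
print("AUC:", auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC is computed from the raw scores and is threshold-independent. Under 10-fold cross-validation, such metrics would typically be computed on each held-out fold and then summarized.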

Language

English

Issue date

2021-03-26

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses

Permanent Link

https://elocus.lib.uoc.gr//dlib/f/8/7/metadata-dlib-1617104541-393830-8562.tkl
