Identifier 000430233
Title Services for connecting and integrating big number of linked datasets
Alternative Title Υπηρεσίες για τη διασύνδεση και ολοκλήρωση μεγάλου πλήθους σημασιολογικών συνολοδεδομένων
Author Μουνταντωνάκης, Μιχαήλ Ε
Thesis advisor Τζίτζικας, Ιωάννης
Reviewer Πλεξουσάκης, Δημήτρης
Μαγκούτης, Κώστας
Αντωνίου, Γρηγόρης
Κουμπαράκης, Μανώλης
Φλουρής, Γιώργος
Auer, Soren
Abstract Linked Data is a method for publishing structured data that facilitates its sharing, linking, searching and re-use. A large number of such datasets (or sources) have already been published, and their number and size keep increasing. Although the main objective of Linked Data is linking and integration, this target has not yet been satisfactorily achieved. Even seemingly simple tasks, such as finding all the available information for an entity, are challenging, since this presupposes knowing the contents of all datasets and performing cross-dataset identity reasoning, i.e., computing the symmetric and transitive closure of the equivalence relationships that exist among entities and schemas. Another major challenge is Dataset Discovery, since current approaches exploit only the metadata of datasets, without taking their contents into consideration. In this dissertation, we analyze the research work done in the area of Linked Data Integration, with emphasis on methods that can be used at large scale. Specifically, we factorize the integration process according to various dimensions, to better understand the overall problem and to identify the open challenges. Then, we propose indexes and algorithms for tackling the above challenges, i.e., methods for performing cross-dataset identity reasoning, for finding all the available information for an entity, for offering content-based Dataset Discovery, and others. Due to the large number and volume of datasets, we propose techniques that include incremental and parallelized algorithms. We show that content-based Dataset Discovery reduces to solving optimization problems, and we propose techniques for solving them efficiently. The aforementioned indexes and algorithms have been implemented in a suite of services that we have developed, called LODsyndesis, which offers all these services in real time.
Furthermore, we present an extensive connectivity analysis for a large subset of LOD cloud datasets. In particular, we introduce measurements (concerning connectivity and efficiency) for 2 billion triples, 412 million URIs and 44 million equivalence relationships derived from 400 datasets, using from 1 to 96 machines for indexing the datasets. Indicatively, by using the proposed indexes and algorithms, with 96 machines it takes less than 10 minutes to compute the closure of 44 million equivalence relationships, and 81 minutes to index 2 billion triples. Furthermore, the dedicated indexes, along with the proposed incremental algorithms, enable the computation of connectivity metrics for 1 million subsets of datasets in 1 second (three orders of magnitude faster than conventional methods), while the provided services offer responses in a few seconds. These services enable the implementation of other high-level services, such as services for Data Enrichment, which can be exploited for Machine-Learning tasks, and techniques for Knowledge Graph Embeddings, and we show that this enrichment improves prediction in machine-learning problems.
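The cross-dataset identity reasoning mentioned in the abstract (the symmetric and transitive closure of equivalence relationships) can be illustrated with a minimal single-machine sketch. It uses a standard union-find structure, which is an assumption made here for illustration and not the dedicated indexes or parallel algorithms of LODsyndesis. The function name `equivalence_classes` and the example.org URIs are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        # Register unseen items as singleton classes.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        # Locate the root of x's class.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller class under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def equivalence_classes(pairs):
    """Group URIs into classes implied by equivalence pairs.

    Merging both members of each pair into one class makes the result
    symmetric and transitive by construction.
    """
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    classes = defaultdict(set)
    for uri in list(uf.parent):
        classes[uf.find(uri)].add(uri)
    return list(classes.values())


# Hypothetical equivalence statements drawn from three datasets.
pairs = [
    ("http://example.org/d1/Aristotle", "http://example.org/d2/Aristotle"),
    ("http://example.org/d2/Aristotle", "http://example.org/d3/Q868"),
    ("http://example.org/d1/Plato", "http://example.org/d3/Q859"),
]
for cls in equivalence_classes(pairs):
    print(sorted(cls))
```

Once the classes are known, collecting all available information for an entity amounts to looking up its class and gathering the triples of every URI in it. A sketch like this keeps everything in memory, so it says nothing about how the reported scale (44 million relationships across up to 96 machines) is reached.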
Language English
Subject Big data
Connectivity
Data integration
Data quality
Dataset discovery and selection
Lattice of measurements
Linked data
RDF
Ανακάλυψη και επιλογή πηγών δεδομένων
Διασυνδεδεμένα δεδομένα
Μεγάλα δεδομένα
Ολοκλήρωση δεδομένων
Πλέγμα μετρήσεων
Ποιότητα δεδομένων
Συνδεσιμότητα
Issue date 2020-07-24
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses
  Type of Work--Doctoral theses
Permanent Link https://elocus.lib.uoc.gr//dlib/a/8/b/metadata-dlib-1593426546-876833-20060.tkl
Views 589