
Doctoral theses

Identifier 000463792
Title Efficient query answering for RDF knowledge graphs
Alternative Title Αποτελεσματική απάντηση επερωτήσεων σε RDF γράφους γνώσης
Author Τρουλλινού, Γεωργία Σπυρίδων
Thesis advisor Πλεξουσάκης, Δημήτριος
Reviewer Φλουρής, Γιώργος
Χριστοφίδης, Βασίλης
Μαγκούτης, Κώστας
Μαρκάτος, Ευάγγελος
Πιτουρά, Ευαγγελία
Τζίτζικας, Γιάννης
Abstract RDF Knowledge Bases now available online scale to millions or even billions of triples that must be processed and queried effectively and efficiently. The ever-increasing size and number of RDF data collections dictate the use of distributed data management systems to query them efficiently. Apache Spark is one of the most widely used distributed engines for big data processing, and more and more systems adopt it for efficient query answering. Existing approaches exploit Spark for querying RDF data and adopt partitioning techniques that reduce the data that must be accessed, in order to improve efficiency. However, simplistic data partitioning fails both to minimize data access and to group together data that are usually queried together. This translates into limited efficiency gains in query answering. Further, it is common for queries not to terminate due to the complexity of the RDF datasets. In this thesis, we present novel schema-based partitioning techniques that accept an RDF dataset as input and partition it effectively, exploiting schema information in order to provide efficient query answering. We first focus on exact query answering. As RDF datasets are weakly structured, schema information might be incomplete or absent. We present the first incremental and hybrid RDF type discovery system for RDF datasets, enabling type discovery in datasets where type declarations are either partially available or completely missing. Using this identified schema, we explore summarization techniques for effectively partitioning data, concluding with a partitioning scheme that enables fine-tuning of data distribution and significantly reduces data access for query answering. We then focus on progressive query answering, offering an alternative solution to long-running queries and presenting the first system for progressive query answering over knowledge graphs (KGs).
We again rely on a mined hierarchical schema structure, which we exploit for effectively partitioning data. The corresponding partitioning scheme enables the progressive evaluation of input queries with minimal latency and allows trading query accuracy for efficiency. An extensive experimental study on both real-world and synthetic datasets, with varied query workloads, shows the effectiveness and efficiency of our solutions for both exact and progressive query answering, and of their internal components (i.e., schema discovery and summarization), as well as their superiority over baselines.
Language English
Subject Data Partitioning
Query Answering
Spark
Summaries
Αποτίμηση Επερωτήσεων
Κατεκερματισμός Δεδομένων
Σπαρκ
Συνόψεις
Issue date 2024-03-22
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses
  Type of Work--Doctoral theses
Permanent Link https://elocus.lib.uoc.gr//dlib/d/1/4/metadata-dlib-1712298725-643932-2548.tkl