E-Locus - Institutional Repository of the University of Crete

Doctoral theses

Identifier

000463792

Title

Efficient query answering for RDF knowledge graphs

Alternative Title

Αποτελεσματική απάντηση επερωτήσεων σε RDF γράφους γνώσης

Author

Τρουλλινού, Γεωργία Σπυρίδων

Thesis advisor

Πλεξουσάκης, Δημήτριος

Reviewer

Φλουρής, Γιώργος
Χριστοφίδης, Βασίλης
Μαγκούτης, Κώστας
Μαρκάτος, Ευάγγελος
Πιτουρά, Ευαγγελία
Τζίτζικας, Γιάννης

Abstract

RDF knowledge bases now available online scale to millions or even billions of triples, which must be processed and queried effectively and efficiently. The ever-increasing size and number of RDF data collections dictate the use of distributed data management systems to query them efficiently. Apache Spark is one of the most widely used distributed engines for big data processing, and a growing number of systems adopt it for efficient query answering. Existing approaches exploit Spark for querying RDF data and adopt partitioning techniques that reduce the data that must be accessed in order to improve efficiency. However, simplistic data partitioning fails both to minimize data access and to group together data that are usually queried together. This translates into limited efficiency improvements in query answering. Further, queries commonly fail to terminate because of the complexity of the RDF datasets. In this thesis, we present novel schema-based partitioning techniques that accept an RDF dataset as input and partition it effectively, exploiting schema information to provide efficient query answering. We first focus on exact query answering. As RDF datasets are weakly structured, schema information might be incomplete or absent. We present the first incremental and hybrid type discovery system for RDF datasets, enabling type discovery in datasets where type declarations are either partially available or completely missing. Using the discovered schema, we explore summarization techniques for effectively partitioning data, concluding with a partitioning scheme that enables fine-tuning of data distribution and significantly reduces data access for query answering. We then focus on progressive query answering, offering an alternative solution for long-running queries and presenting the first system for progressive query answering over knowledge graphs (KGs). We again rely on a mined hierarchical schema structure that we exploit for effectively partitioning data. The corresponding partitioning scheme enables the progressive evaluation of input queries with minimal latency and allows trading query accuracy for efficiency. The extensive experimental study on both real-world and synthetic datasets, with varied query workloads, shows the effectiveness and efficiency of our solutions for both exact and progressive query answering, along with those of their internal components (i.e., schema discovery and summarization), as well as their superiority over the baselines.

Language

English

Subject

Data Partitioning

Query Answering

Spark

Summaries

Αποτίμηση Επερωτήσεων

Κατεκερματισμός Δεδομένων

Σπαρκ

Συνόψεις

Issue date