|
Identifier |
000463792 |
Title |
Efficient query answering for RDF knowledge graphs |
Alternative Title |
Αποτελεσματική απάντηση επερωτήσεων σε RDF γράφους γνώσης |
Author
|
Τρουλλινού, Γεωργία Σπυρίδων
|
Thesis advisor
|
Πλεξουσάκης, Δημήτριος
|
Reviewer
|
Φλουρής, Γιώργος
Χριστοφίδης, Βασίλης
Μαγκούτης, Κώστας
Μαρκάτος, Ευάγγελος
Πιτουρά, Ευαγγελία
Τζίτζικας, Γιάννης
|
Abstract |
RDF Knowledge Bases now available online scale to millions or even billions of triples
that should be effectively and efficiently processed and queried. The ever-increasing size
and number of RDF data collections dictate the use of distributed data management
systems to query them efficiently. Apache Spark is one of the most widely used
distributed engines for big data processing, with more and more systems adopting it for
efficient query answering. Existing approaches exploit Spark for querying RDF data, and
adopt partitioning techniques for reducing the data that need to be accessed in order to
improve efficiency. However, simplistic data partitioning fails both to minimize data access and to group data that are usually queried together. This
translates into limited improvements in query-answering efficiency. Furthermore,
queries commonly fail to terminate because of the complexity of the RDF datasets.
In this thesis, we present novel schema-based partitioning techniques that take an RDF dataset as input and partition it effectively, exploiting schema information
to provide efficient query answering.
We first focus on exact query answering. As RDF datasets are weakly structured, schema
information might be incomplete or absent. We present the first incremental and hybrid
RDF type discovery system for RDF datasets, enabling type discovery in datasets where
type declarations are either partially available or completely missing. Using the discovered
schema, we explore summarization techniques for effectively partitioning data, concluding with a partitioning scheme that enables fine-tuning of data distribution, significantly
reducing data access for query answering.
Then we focus on progressive query answering, offering an alternative solution to long-running queries and presenting the first system for progressive query answering over knowledge graphs (KGs).
We again rely on a mined hierarchical schema structure that we exploit for effectively partitioning data. The corresponding partitioning scheme enables the progressive evaluation
of input queries with minimal latency and allows trading query accuracy for efficiency.
The extensive experimental study on both real-world and synthetic datasets, with varied query workloads, demonstrates the effectiveness and efficiency of our solutions for both
exact and progressive query answering, along with their internal components (i.e., schema
discovery and summarization), as well as their superiority over the baselines.
|
Language |
English |
Subject |
Data Partitioning |
|
Query Answering |
|
Spark |
|
Summaries |
|
Αποτίμηση Επερωτήσεων |
|
Κατακερματισμός Δεδομένων |
|
Σπαρκ |
|
Συνόψεις |
Issue date |
2024-03-22 |
Collection
|
School/Department--School of Sciences and Engineering--Department of Computer Science--Doctoral theses
|
|
Type of Work--Doctoral theses
|
Permanent Link |
https://elocus.lib.uoc.gr//dlib/d/1/4/metadata-dlib-1712298725-643932-2548.tkl
|