|
Identifier |
000421162 |
Title |
LAWA : locality aware partitioning for efficient query answering over RDF data |
Alternative Title |
Κατακερματισμός δεδομένων RDF με σκοπό την αποτελεσματική απάντηση επερωτήσεων εκμεταλλευόμενοι την τοπικότητα των δεδομένων |
Author
|
Αγαθάγγελος, Ιωάννης Κ.
|
Thesis advisor
|
Πλεξουσάκης, Δημήτρης
|
Reviewer
|
Τζίτζικας, Ιωάννης
Στεφανίδης, Κωνσταντίνος
|
Abstract |
The explosion of the web and the abundance of linked data demand effective
and efficient methods for storage, management and querying. Apache Spark is one of the
most active big-data approaches, with more and more systems adopting it for efficient
querying over distributed data. However, most Spark-based RDF query answering
approaches so far exploit simplistic horizontal and/or vertical partitioning of
triples, resulting in poor query performance. The main reason is that simplistic
data partitioning approaches fail to identify data locality and to group together data that are
usually queried together.
To mitigate this problem, in this thesis we present LAWA, a novel platform that
accepts an RDF dataset as input and effectively partitions the data, ensuring data locality. To
achieve this, we identify the dataset’s important nodes as centroids and then
assign each of the other nodes to the centroid it depends on most.
This scheme ensures data locality and in most cases results in a balanced data
distribution, although without offering any guarantee of balance. To study the design
choices and isolate the impact of data locality and data distribution, we introduce two
variants: one focusing purely on data locality, the Locality Aware Partitioning
(LAP) approach, and another that also enforces a balanced data distribution, the
Bounded Locality Aware Partitioning (BLAP) approach.
We show that our approach offers an optimal trade-off between data distribution,
replication and data reduction, dominating existing approaches. More specifically, we
evaluate our approach using both synthetic and real workloads, showing that we improve
query answering by orders of magnitude over the existing state of the art.
|
Language |
English |
Subject |
Data partitioning |
|
Distributed query processing |
Issue date |
2019-03-29 |
Collection
|
School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
|
|
Type of Work--Post-graduate theses
|
Permanent Link |
https://elocus.lib.uoc.gr//dlib/0/5/d/metadata-dlib-1550481806-994956-19484.tkl
|