Identifier 000421162
Title LAWA : locality aware partitioning for efficient query answering over RDF data
Alternative Title Κατακερματισμός δεδομένων RDF με σκοπό την αποτελεσματική απάντηση επερωτήσεων εκμεταλλευόμενοι την τοπικότητα των δεδομένων [Partitioning RDF data for efficient query answering by exploiting data locality]
Author Αγαθάγγελος, Ιωάννης Κ.
Thesis advisor Πλεξουσάκης, Δημήτρης
Reviewer Τζίτζικας, Ιωάννης
Στεφανίδης, Κωνσταντίνος
Abstract The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most active big-data approaches, and more and more systems adopt it for efficient querying over distributed data. However, most Spark-based RDF query answering approaches so far rely on simplistic horizontal and/or vertical partitioning of triples, resulting in poor query performance. The main reason is that simplistic data partitioning approaches fail to identify data locality and to group together data that are usually queried together. To mitigate this problem, in this thesis we present LAWA, a novel platform that accepts an RDF dataset as input and partitions the data effectively, ensuring data locality. To achieve this, we identify the dataset’s important nodes as centroids and then assign each of the other nodes to the centroid it depends on most. This scheme ensures data locality and in most cases results in a balanced data distribution, although it offers no guarantees of balance. To study the design choices and isolate the impact of data locality and data distribution, we introduce two variants: one focusing purely on data locality, the Locality Aware Partitioning (LAP) approach, and another that also enforces balanced data distribution, the Bounded Locality Aware Partitioning (BLAP) approach. We show that our approach offers an optimal fine-tuning between data distribution, replication, and data reduction, dominating existing approaches. More specifically, we evaluate our approach using both synthetic and real workloads, showing that we improve query answering by orders of magnitude over the existing state of the art.
Language English
Subject Data partitioning
Distributed query processing
Issue date 2019-03-29
Collection   Faculty/Department--Faculty of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses