Identifier 000370106
http://elocus.lib.uoc.gr//dlib/2/c/c/metadata-dlib-1322123539-434628-19267.tkl
Title Heuristic Optimization of SPARQL queries over Column-Store DBMS
Alternative Title Ευρετική βελτιστοποίηση ερωτήσεων SPARQL σε ΣΔΒΔ βασισμένα σε κολόνες (Heuristic optimization of SPARQL queries on column-based DBMSs)
Author Αναγνωστόπουλος-Τσιαλιαμάνης, Πέτρος Βασίλειος
Thesis advisor Χριστοφίδης, Βασίλης
Abstract During the last decade we have witnessed a tremendous increase in the amount of semantic data available on the Web in almost every field of human activity. More and more corporate, governmental, or even user-generated datasets break the walls of "private" management within their production site, are published, and become available to potential data consumers, i.e., applications/services, individual users, and communities. In this context, the Web of Data extends the current Web to a global data space connecting data from diverse domains. This gives added value to decision support and business intelligence applications, and enables new types of services that operate on top of an unbounded, global data space rather than on a fixed set of data sources as in Web 2.0 mashups. A central issue in this respect is the manipulation and usage of data based on their meaning, which requires effective and efficient support for storing, querying, and manipulating semantic RDF data, the lingua franca of Linked Open Data and hence the default data model for the Web of Data. In this thesis we focus on the problem of scalable processing and optimization of semantic queries expressed in SPARQL using modern relational engines. Existing native or SQL-based engines for processing SPARQL queries rely heavily on statistics about the stored RDF graphs, as well as on adequate cost-based planning algorithms, to optimize complex join queries. Extensive data statistics are quite expensive to compute and maintain for large-scale, evolving semantic data on the Web. The main challenge in this respect is to devise heuristics-based query optimization techniques that generate near-optimal execution plans without any knowledge of the underlying datasets. For this reason we propose the first heuristics-based SPARQL planner (HSP), which is capable of exploring the syntactic variations of triple patterns in a query in order to choose a near-optimal execution plan without the use of a cost model.
More precisely, HSP tries to produce plans that maximize the number of merge joins over the ordered variables shared among the triple patterns of a query, and relies on various heuristics to decide which ordered variables will be used in selections and joins, as well as which underlying access paths (essentially sorted triple relations in MonetDB) will be exploited for evaluating the triple patterns. Furthermore, we have implemented HSP plans on top of the MonetDB column-based DBMS. We have paid particular attention to the efficient mapping of HSP logical plans onto the underlying MonetDB query execution engine by translating them into MonetDB's physical algebra (MAL). Finally, we have experimentally compared the quality and execution time of the plans produced by HSP against those of the state-of-the-art Cost-based Dynamic Programming (CDP) algorithm employed by RDF-3X, using synthetically generated and real RDF datasets. In all queries of our workload, HSP produces plans with the same number of merge and hash joins as CDP. Their differences lie in the ordered variables employed and in the execution order of joins, which essentially affect the size of intermediate results. With the exception of queries that do not differ substantially in their syntax, HSP plans executed on MonetDB outperform those of CDP executed on RDF-3X by up to three orders of magnitude.
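The idea of favouring merge joins over variables shared among triple patterns can be illustrated with a small sketch. This is not the HSP algorithm from the thesis: the example query, the function names, and the tie-breaking rule are all assumptions made for illustration. It only shows how a planner might inspect the syntax of a query, with no statistics or cost model, to find the variable that occurs in the most triple patterns.

```python
from collections import Counter

# Hypothetical query: each triple pattern is (subject, predicate, object).
# Terms starting with "?" are variables; everything else is a constant.
patterns = [
    ("?person", "rdf:type", "foaf:Person"),
    ("?person", "foaf:name", "?name"),
    ("?person", "foaf:knows", "?friend"),
    ("?friend", "foaf:name", "?friend_name"),
]


def is_variable(term):
    """Return True if the term is a SPARQL-style variable."""
    return term.startswith("?")


def shared_variables(triple_patterns):
    """Map each variable to the number of patterns it occurs in,
    keeping only variables that occur in at least two patterns."""
    counts = Counter()
    for pattern in triple_patterns:
        # Use a set so a variable repeated inside one pattern counts once.
        for var in {term for term in pattern if is_variable(term)}:
            counts[var] += 1
    return {var: n for var, n in counts.items() if n >= 2}


def pick_merge_join_variable(triple_patterns):
    """Pick the variable shared by the most patterns (assumed heuristic).
    Ties are broken alphabetically so the choice is deterministic.
    Returns None when no variable is shared."""
    shared = shared_variables(triple_patterns)
    if not shared:
        return None
    return min(shared, key=lambda var: (-shared[var], var))


if __name__ == "__main__":
    print(shared_variables(patterns))
    print(pick_merge_join_variable(patterns))
```

In this sketch, the patterns sharing the chosen variable would be the candidates for merge joins, provided each is read from a relation sorted on that variable; the remaining joins would fall back to hash joins.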
Language English
Subject Query Optimization
Query Processing
SPARQL
Semantic Web
Βελτιστοποίηση επερωτήσεων (Query optimization)
Σημασιολογικός ιστός (Semantic Web)
Issue date 2011-11-18
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
Views 583
