Identifier 000337668
Title Data quality metrics in RDF based RDMSs
Alternative Title Μετρικές Ποιότητας Δεδομένων σε Δυομότιμα Συστήματα Διαχείρισης Δεδομένων βασισμένα στην RDF (Data Quality Metrics in RDF-Based Peer-to-Peer Data Management Systems)
Author Κίτσου, Γεωργία Γεώργιος
Thesis advisor Χριστοφίδης, Βασίλης
Abstract Scientific and educational communities are nowadays striving for highly autonomous infrastructures that enable the exchange of queries and the integration of (semi-)structured data hosted by peers. In this context, there is a need for P2P data management systems (PDMSs) capable of supporting loosely coupled communities of databases, in which each peer base can join and leave the network at will, while groups of peers can collaborate on the fly to provide advanced data management services on a very large scale.
In our work, we consider that peers advertise their local bases using fragments of community RDF/S schemas. These advertisements are specified by appropriate RDF/S views, called RVL views. In this setting, due to the highly distributed nature of a PDMS, we need an efficient lookup service for identifying, in a decentralized fashion, which peer views can fully or partially contribute to the answer of a specific query. In addition, because a very large number of peers in a PDMS can contribute to the answer of a query, interleaved query routing and planning is required to obtain the first answers from the most relevant peers as fast as possible while the query is further processed by others.
However, as the number of peers in a PDMS increases and queries become more complex, the number of plans that must be optimized and executed under interleaved query routing and planning becomes huge. Most previous work has pruned the space of plans with respect to either a cost model or some quality metric. Pruning can be even more efficient if cost and quality metrics are considered at the same time. A threshold combining both cost and data quality metrics could be set either by the user or by the system.
In this thesis, we provide formulae for estimating data quality metrics such as coverage, density, and completeness of the view instances published by the peers with respect to the PDMS schema. The same metrics can be used to measure the quality of query plans produced by the PDMS optimizer. In estimating these data quality metrics, the notion of overlap between the data of two or more peers is important.
Moreover, we introduce formulae for cardinality estimation of the two most important operators in our framework, namely the union and join operators, and present a variation of an existing cost model, based on the response time of plans, for use in the query planning phase. Finally, we enrich existing query planning algorithms proposed for RDF-based PDMSs with the data quality metrics we defined. Our objective is to discard plans that are ranked below a specific threshold combining cost with data quality metrics, and thus to reduce planning time as much as possible while ensuring that the final plan to be executed is the best possible one with respect to the enforced threshold.
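The threshold-based pruning described in the abstract can be illustrated with a minimal sketch. It is not the thesis's actual formulae or algorithm, which the abstract does not give. The `Plan` class, the `score` function, the linear weighting, and the normalization of cost by the most expensive plan are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Hypothetical query plan summary (not a structure from the thesis)."""
    name: str
    cost: float     # assumed: estimated response time, arbitrary units
    quality: float  # assumed: a data quality score in [0, 1]


def score(plan, max_cost, weight=0.5):
    """Assumed combined score in [0, 1]; higher is better.

    Cost is normalized against the most expensive candidate plan, so a
    cheap plan gets a cost term close to 1 and the costliest plan gets 0.
    """
    if max_cost <= 0:
        cost_term = 1.0
    else:
        cost_term = 1.0 - min(plan.cost / max_cost, 1.0)
    return weight * plan.quality + (1.0 - weight) * cost_term


def prune(plans, threshold, weight=0.5):
    """Discard plans scoring below the threshold; return the rest, best first."""
    if not plans:
        return []
    max_cost = max(p.cost for p in plans)
    kept = [p for p in plans if score(p, max_cost, weight) >= threshold]
    return sorted(kept, key=lambda p: score(p, max_cost, weight), reverse=True)


if __name__ == "__main__":
    candidates = [
        Plan("a", cost=10.0, quality=0.9),
        Plan("b", cost=100.0, quality=0.95),
        Plan("c", cost=20.0, quality=0.2),
    ]
    for plan in prune(candidates, threshold=0.6):
        print(plan.name)
```

In this sketch a single `weight` parameter trades quality against cost, and the threshold could be supplied by the user or the system, as the abstract suggests. A real optimizer would apply such a test during plan generation rather than after enumerating all plans.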
Physical description 130 p. : ill. ; 30 cm.
Language English
Issue date 2008-12-04
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses