Identifier | 000337668
Title | Data quality metrics in RDF based PDMSs
Alternative Title | Data Quality Metrics in RDF-based Peer-to-Peer Data Management Systems (translated from the Greek title)
Author | Κίτσου, Γεωργία Γεώργιος
Thesis advisor | Χριστοφίδης, Βασίλης
Abstract |
In our work, we consider peers that advertise their local bases using fragments of community RDF/S schemas. These advertisements are specified by appropriate RDF/S views, called RVL views. In this setting, due to the highly distributed nature of a peer data management system (PDMS), we need an efficient lookup service that identifies, in a decentralized fashion, which peer views can fully or partially contribute to the answer of a specific query. In addition, because a very large number of peers in a PDMS can actually contribute to the answer of a query, interleaved query routing and planning is required to obtain the first answers from the most relevant peers as fast as possible while the query is further processed by others. However, as the number of peers in a PDMS increases and queries become more complex, the number of plans produced by interleaved query routing and planning, all of which must be optimized and executed, becomes huge. Most previous work has pruned the space of plans with respect to either a cost model or a quality metric. Pruning can be even more efficient, however, if cost and quality metrics are considered at the same time. A threshold combining both cost and data quality metrics could be set either by the user or by the system. In this thesis, we provide formulae for estimating data quality metrics such as coverage, density and completeness of the view instances published by the peers with respect to the PDMS schema. The same metrics can be used to measure the quality of query plans produced by the PDMS optimizer. In estimating these data quality metrics, the notion of overlap between the data of two or more peers is important. Moreover, we introduce formulae for estimating the cardinality of the two most important operators in our framework, i.e., the union and join operators, and present a variation of an existing cost model, based on the response time of plans, for use in the query planning phase.
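The abstract names the metrics but does not give their formulae. As a rough illustration only, the sketch below uses definitions common in the data-integration literature, which may well differ from the RDF/S-specific formulae derived in the thesis: coverage as the share of the (estimated) class instances in the PDMS that a view publishes, density as the share of (resource, property) slots that carry a value, and completeness as their product. It also shows why overlap matters: the union of two peer views counts shared resources only once. The join estimate is the textbook uniformity-based formula, swapped in here for illustration. `PeerView`, `universe_size` and all function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PeerView:
    """Hypothetical summary of one peer's view instances for a single schema class."""
    name: str
    resources: frozenset[str]        # URIs of the class instances the peer publishes
    property_values: dict[str, int]  # property -> number of resources having a value for it


def coverage(view: PeerView, universe_size: int) -> float:
    """Share of the (estimated) class instances in the whole PDMS that the view contains."""
    if universe_size <= 0:
        raise ValueError("universe_size must be positive")
    return len(view.resources) / universe_size


def density(view: PeerView, schema_properties: list[str]) -> float:
    """Share of (resource, property) slots of the schema fragment that carry a value."""
    if not view.resources or not schema_properties:
        return 0.0
    filled = sum(view.property_values.get(p, 0) for p in schema_properties)
    return filled / (len(view.resources) * len(schema_properties))


def completeness(view: PeerView, universe_size: int, schema_properties: list[str]) -> float:
    # Assumed definition (not taken from the thesis): completeness = coverage x density.
    return coverage(view, universe_size) * density(view, schema_properties)


def union_cardinality(a: PeerView, b: PeerView) -> int:
    # Inclusion-exclusion: resources in the overlap of the two peers count once.
    return len(a.resources) + len(b.resources) - len(a.resources & b.resources)


def join_cardinality(card_r: int, card_s: int, distinct_r: int, distinct_s: int) -> float:
    # Textbook estimate assuming uniform values and containment of join-key domains:
    # |R join S| ~= |R| * |S| / max(V(R, a), V(S, a)).
    largest = max(distinct_r, distinct_s)
    return 0.0 if largest == 0 else card_r * card_s / largest
```

For example, if peer P1 publishes `ex:a`, `ex:b` and `ex:c` out of an estimated 10 instances, its coverage is 0.3; with values for `ex:title` on all three and `ex:year` on one, its density is 4/6 and its completeness is roughly 0.2. If P2 publishes `ex:c` and `ex:d`, `union_cardinality` returns 4 rather than 5 because `ex:c` is shared.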
Finally, we enrich existing query planning algorithms proposed for RDF-based PDMSs with the data quality metrics we defined. Our objective is to discard plans ranked below a specific threshold that combines cost with data quality metrics, and thus reduce the planning time as much as possible, while ensuring that the final plan to be executed is the best possible one with respect to the enforced threshold.
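A minimal sketch of threshold-based plan pruning, assuming each candidate plan already carries an estimated response time and a quality value in [0, 1] (for instance, an estimated completeness). The linear weighting of quality against normalized response time, and the rule that parallel subplans finish together with the slowest one, are illustrative assumptions rather than the cost model used in the thesis; `Plan`, `response_time`, `score`, `prune`, `weight` and `max_time` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical candidate plan carrying precomputed estimates."""
    peers: tuple[str, ...]
    response_time: float  # estimated time until the plan's answer is complete
    quality: float        # data quality estimate in [0, 1], e.g. completeness


def response_time(subplan_times: list[float], local_cost: float) -> float:
    # Assumption: subplans sent to different peers run in parallel, so the plan
    # finishes when its slowest subplan does, plus the local cost of combining results.
    return max(subplan_times, default=0.0) + local_cost


def score(plan: Plan, max_time: float, weight: float) -> float:
    # Assumed linear trade-off: `weight` rewards quality, `1 - weight` rewards speed.
    normalized_cost = min(plan.response_time / max_time, 1.0)
    return weight * plan.quality + (1.0 - weight) * (1.0 - normalized_cost)


def prune(plans: list[Plan], threshold: float, max_time: float, weight: float = 0.5) -> list[Plan]:
    """Drop plans scoring below `threshold`; return the remaining plans best-first."""
    scored = [(score(p, max_time, weight), p) for p in plans]
    kept = [(s, p) for s, p in scored if s >= threshold]
    kept.sort(key=lambda sp: sp[0], reverse=True)  # sort on the score only
    return [p for _, p in kept]
```

With `max_time=10.0` and equal weights, a two-peer plan finishing in 3.5 s with quality 0.8 scores about 0.725, a one-peer plan finishing in 2 s with quality 0.3 scores about 0.55, and a three-peer plan that waits 10 s on its slowest peer scores about 0.45, so a threshold of 0.5 discards only the last one. In an interleaved planner the same test could be applied to each plan as it is generated rather than to a finished list.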
Physical description | 130 p. : ill. ; 30 cm.
Language | English
Issue date | 2008-12-04
Collection | School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
Type of Work | Post-graduate theses