Identifier 000368498
http://elocus.lib.uoc.gr//dlib/1/1/6/metadata-dlib-772e0fdc23fd0dfc32ef7fc65c241c56_1314687153.tkl
Title Compact Archiving of Multiple (possibly Versioned) RDF/S Triple Sets
Alternative Title Συμπαγής αρχειοθέτηση πολλαπλών εκδόσεων από σύνολα τριπλετών RDF/S
Author Ψαράκη, Μαρία-Γεωργία Ευτύχιος
Thesis advisor Τζίτζικας, Ιωάννης
Abstract The Semantic Web (SW) is an evolving extension of the World Wide Web in which content can be expressed not only in natural language but also in languages (e.g. RDF/S) that can be interpreted formally, enabling more advanced searching, sharing and integration services. However, since knowledge is not static but evolves over time, techniques are needed for managing this evolution. One such technique is archiving past versions. Archiving is useful for various reasons (interoperability, traceability, provenance). For instance, in e-science, failure to keep the previous states of data (on which other experiments were based) jeopardizes scientific evidence and our ability to verify findings. In this work we use the term RDF KB (KB for short) to refer to any set of RDF/S triples. POI (Partial Order Index) is a recently proposed structure for storing multiple KBs, whether versioned or not. POI exploits the fact that RDF is based on a graph data model, and hence an RDF KB does not have a unique serialization (unlike texts). This characteristic justifies exploring directions that have not been pursued by classical versioning systems for texts (e.g. versioning systems for software). In brief, POI offers notable space savings in comparison to the differential storage of KBs (storing deltas), as well as efficiency in various cross-version operations, especially when the contents of the KBs are subset-related. This thesis focuses on methods for further reducing the space requirements of POI. Specifically, we introduce a variation of POI that we call CPOI (Compact POI), which relies on gapped triple identifiers with variable-length encoding schemes for natural numbers. For this structure we identified conditions under which CPOI guarantees space gains (over POI and other storage options).
Since these are sufficient (not necessary) conditions, we conducted an extensive experimental evaluation, both to measure the compression ratio achieved and to compare various identifier assignment policies. The results showed significant storage savings: on large, realistic synthetic datasets, the total space required is on average around 25 times smaller than the original dataset and 3 times smaller than a differential (delta-based) storage. The total size of CPOI is about 60%-80% of the size of plain POI, while the size of the compressed sets is 8% of the size of the uncompressed sets.
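The abstract does not say which variable-length encoding schemes CPOI uses, so the following is only a minimal sketch of the general idea of gapped identifiers: sort a set of triple identifiers, store the differences between neighbours instead of the identifiers themselves, and write each difference with a variable-byte code (7 data bits per byte, high bit as a continuation flag). The function names `encode_varint`, `compress_ids` and `decompress_ids` are hypothetical and are not taken from the thesis.

```python
def encode_varint(n: int) -> bytes:
    """Encode a natural number in 7-bit groups, low-order group first."""
    if n < 0:
        raise ValueError("only natural numbers can be encoded")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte of this number
            return bytes(out)


def compress_ids(ids) -> bytes:
    """Sort the identifiers and store the gap from each one to its predecessor.

    The first gap is measured from 0, so it equals the smallest identifier.
    """
    out = bytearray()
    previous = 0
    for current in sorted(set(ids)):
        out += encode_varint(current - previous)
        previous = current
    return bytes(out)


def decompress_ids(data: bytes) -> list:
    """Invert compress_ids: decode each gap and accumulate a running sum."""
    ids = []
    value = 0      # gap being assembled from 7-bit groups
    shift = 0      # bit position of the next group
    previous = 0   # last identifier emitted
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            previous += value
            ids.append(previous)
            value = 0
            shift = 0
    return ids


# Illustrative identifiers (assumed values, not data from the thesis).
triple_ids = [1000, 1001, 1003, 1010, 5000]
packed = compress_ids(triple_ids)
restored = decompress_ids(packed)
```

In this toy case the five identifiers occupy 7 bytes, against 20 bytes as fixed 32-bit integers. The gain depends on identifiers within a set lying close together, which is presumably why the evaluation compares identifier assignment policies.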
Language English
Subject Archiving
Compression
Knowledge Base
RDF data
RDF δεδομένα
Semantic Web
Storage
Triples
Αρχειοθέτηση
Βάση Γνώσης
Σημασιολογικός Ιστός
Συμπίεση
Τριπλέτες
Issue date 2011-07-15
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
