E-Locus - Institutional Repository of the University of Crete - On Computing Deltas of RDF/S Knowledge Bases with Blank Nodes

Home On Computing Deltas of RDF/S Knowledge Bases with Blank Nodes

Results - Details

[Add to Basket]

Identifier

000379744

Title

On Computing Deltas of RDF/S Knowledge Bases with Blank Nodes

Alternative Title

Υπολογίζοντας τις διαφορές μεταξύ βάσεων γνώσεων RDF με ανώνυμους κόμβους.

Author

Λαντζάκη, Χριστίνα Αντώνιος

Thesis advisor

Τζίτζικας, Ιωάννης

Abstract

The Semantic Web (SW) is an evolving extension of the World Wide Web in which the content can be expressed not only in natural language, but also in formal lan- guages (e.g. RDF/S) that can be read and used by software agents, permitting them to find, share and integrate information more easily. The statement of Hera- clitus ”Everything flows, nothing stands still” holds also in the context of the SW since everything changes (the resources themselves, the ontologies, the resource descriptions, etc). Consequently, the ability to compute the differences that exist between two RDF/S Knowledge Bases (KBs), hereafter Delta, is very important. In particular, Deltas can be employed to (a) aid humans understand the evolution of knowledge, and (b) reduce the amount of data that need to be exchanged and managed over the network in order to build SW synchronization, versioning and replication services. The comparison problem is not simple because RDF allows anonymous nodes. A anonymous node, else called blank node, is a node in an RDF graph which is not identified by a URI and is not a literal. Several RDF KBs rely heavily on blank nodes (e.g. 7.5% of Linked Data are estimated to be blank nodes) as they are convenient for representing complex attributes or resources whose identity is unknown but their attributes (either literals or associations with other resources) are known. Considering blank nodes as ”constants” unique to both graphs does not help either in detecting equivalence between graphs nor in reducing the Delta. On the contrary, matching the blank nodes of the two graphs can significantly reduce the produced delta. This work is the first that focuses on methods of matching the blank nodes of the one graph with the blank nodes of the other graph, by approaching it as an optimization problem aiming at finding the mapping that yields the minimum in size Delta (with the least number of triples to delete or add to make the graphs equivalent). We prove that finding the optimal blank node mapping is NP-Hard in the general case by reducing to the sub-graph isomorphism problem. When graphs do not contain directly connected blank nodes (no triples with more than one blank nodes exists), we show that the polynomial Hungarian algorithm can be used to find the optimal blank node mapping. For the general case we present various polynomial algorithms returning approximate solutions. For making the application of our method feasible also to very large KBs we present a signature-based mapping algorithm with NlogN time complexity. Finally, for the proposed algorithms we report extensive comparative experi- mental results, over real and synthetic KBs, regarding delta reduction (and its de- viation from the optimal), equivalence detection, and computational requirements. The results are very interesting. Indicatively the signature-based algorithm can match KBs with up to 150,000 bnodes in a few seconds with Y % deviation from the optimal.

Language

English

Subject

Bases RDF

Blank Nodes

Delta

Knowledge

RDF

Ανώνυμοι κόμβοι

Βάσεις γνώσης

Issue date

2013