E-Locus - Institutional Repository of the University of Crete - SPIMBench : a scalable, schema-aware instance matching benchmark for the semantic publishing domain

Identifier

000388638

Title

SPIMBench : a scalable, schema-aware instance matching benchmark for the semantic publishing domain

Alternative Title

A scalable, schema-aware evaluation framework for instance matching systems for the publication of semantically enriched data

Author

Σαβέτα, Τζανίνα Α.

Thesis advisor

Πλεξουσάκης, Δημήτρης

Reviewer

Τζίτζικας, Ιωάννης
Φουντουλάκη, Ειρήνη

Abstract

Instance matching systems and methods need to be tested against well-defined and widely accepted benchmarks, both to determine their weak and strong points and to motivate the development of more complete systems. A benchmark should test the overall quality of an instance matching system in terms of measures such as precision, recall, and F-measure, as well as its ability to handle large and diverse datasets. A number of benchmarks have already been proposed to test the performance of instance matching techniques, mostly for XML and relational data but, more recently, also for RDF, the type of data prevalent in the Web of Data. Instance matching benchmarks for RDF data are the first to consider the problem of instance matching when a real-world object is represented in different ways that do not all conform to the same RDFS or OWL schema. That is, in addition to lexical differences among entities representing the same object, these benchmarks consider structural differences such as property splitting or aggregation. However, to the best of our knowledge, none of the benchmarks proposed to date considers the more complex logical constructs that can be expressed using rich OWL constructs. The logical transformations proposed by existing benchmarks all remain at the level of simple RDFS constraints. In this thesis we propose the Semantic Publishing Instance Matching Benchmark, in short SPIMBench, inspired by the Semantic Publishing domain. SPIMBench is based on the BBC (http://www.bbc.com/) ontologies that represent information about creative works (called journalistic assets) created by the publisher's editorial team.
SPIMBench proposes and implements i) a scalable data generator, ii) a set of transformations on source data to obtain the target data that include, in addition to the standard value and structural transformations, logical ones that go beyond the standard RDFS constructs and include expressive OWL constructs, namely instance (in)equality, equivalence of classes and properties, property constraints and complex class definitions, iii) a weighted gold standard that can be used for debugging instance matching systems and finally, iv) a set of metrics used to assess the performance of an instance matching system.
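
For readers unfamiliar with the measures named in the abstract, the following is a minimal sketch of how precision, recall, and F-measure are conventionally computed for a set of reported matches against a gold standard. It is not taken from the thesis: the function name `evaluate`, the pair representation, and the example identifiers are all hypothetical, and the sketch uses the standard unweighted definitions, not the weighted gold standard that SPIMBench describes.

```python
def evaluate(found, gold):
    """Return (precision, recall, f_measure) for reported matches.

    `found` and `gold` are iterables of (source_id, target_id) pairs.
    Hypothetical helper: standard unweighted definitions only.
    """
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)  # pairs that are both reported and correct

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    # F-measure is the harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative data only; these identifiers are made up.
gold_pairs = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
found_pairs = {("s1", "t1"), ("s2", "t2"), ("s5", "t9")}

precision, recall, f_measure = evaluate(found_pairs, gold_pairs)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.3f}")
```

In this made-up example two of the three reported pairs are correct and two of the four gold pairs are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.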

Language

English, Greek

Issue date

2014-11-21

Collection

School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses

Type of Work--Post-graduate theses
