
Results - Details

Identifier 000388638
Title SPIMBench : a scalable, schema-aware instance matching benchmark for the semantic publishing domain
Alternative Title A scalable, schema-aware evaluation framework for instance matching systems for the publication of semantically enriched data (translated from Greek)
Author Σαβέτα, Τζανίνα Α.
Thesis advisor Πλεξουσάκης, Δημήτρης
Reviewer Τζίτζικας, Ιωάννης
Φουντουλάκη, Ειρήνη
Abstract Instance matching systems and methods need to be tested using well-defined and widely accepted benchmarks, both to determine their weak and strong points and to motivate the development of more complete systems. A benchmark should test the overall quality of an instance matching system in terms of measures such as precision, recall, and F-measure, as well as its ability to handle large and diverse datasets. A number of benchmarks have already been proposed to test the performance of instance matching techniques, mostly for XML and relational data but, more recently, also for RDF, the type of data prevalent in the Web of Data. Instance matching benchmarks for RDF data are the first to consider the problem of instance matching when a real-world object is represented in different ways that do not all conform to the same RDFS or OWL schema. That is, in addition to lexical differences among entities representing the same object, these benchmarks consider structural differences such as property splitting or aggregation. However, to the best of our knowledge, none of the benchmarks proposed to date considers the more complex logical constructs that can be expressed in terms of rich OWL constructs. The logical transformations proposed by existing benchmarks all remain at the level of simple RDFS constraints. In this thesis we propose the Semantic Publishing Instance Matching Benchmark, in short SPIMBench, inspired by the Semantic Publishing domain. SPIMBench is based on the BBC (http://www.bbc.com/) ontologies that represent information about creative works (called journalistic assets) created by the publisher's editorial team.
SPIMBench proposes and implements i) a scalable data generator, ii) a set of transformations on source data to obtain the target data that include, in addition to the standard value and structural transformations, logical ones that go beyond the standard RDFS constructs and include expressive OWL constructs, namely instance (in)equality, equivalence of classes and properties, property constraints and complex class definitions, iii) a weighted gold standard that can be used for debugging instance matching systems and finally, iv) a set of metrics used to assess the performance of an instance matching system.
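The quality measures named in the abstract (precision, recall, and F-measure) can be illustrated with a minimal sketch. It assumes that a system's output and the gold standard are both plain sets of (source, target) identifier pairs. The function name `evaluate_matches` and the sample identifiers are hypothetical, and the sketch ignores the weights of SPIMBench's weighted gold standard; it is not the thesis's implementation.

```python
def evaluate_matches(predicted, gold):
    """Return (precision, recall, f_measure) for predicted vs. gold match pairs.

    Both arguments are iterables of (source_id, target_id) tuples.
    This is an unweighted illustration, not SPIMBench's own metric code.
    """
    predicted_pairs = set(predicted)
    gold_pairs = set(gold)

    # A true positive is a predicted pair that also appears in the gold standard.
    true_positives = len(predicted_pairs & gold_pairs)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0

    # F-measure is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
    predicted = {("s1", "t1"), ("s2", "t2"), ("s3", "t9")}
    p, r, f = evaluate_matches(predicted, gold)
    print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

In this example two of the three predicted pairs are correct and two of the four gold pairs are found, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.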
Language English, Greek
Issue date 2014-11-21
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses
