Abstract |
Curated ontologies and semantic annotations are increasingly being used in e-science to
reflect the current terminology and
conceptualization of various scientific domains. Such curated Knowledge Bases (KBs) are
usually backed by relational databases using adequate
schemas. Schemas may be generic or application/domain specific and in many cases are
required to satisfy a wide range of integrity
constraints. As curated KBs continuously evolve, such constraints are often violated and
thus KBs need to be frequently \emph{repaired}.
Motivated by the fact that consistency is nowadays mostly enforced manually by the
scientists acting as curators, we propose a
\emph{generic} and \emph{personalized} repairing framework for assisting them in this
arduous task. Modeling integrity constraints using the
class of Disjunctive Embedded Dependencies (DEDs), we are capable of supporting a variety
of useful integrity constraints presented in the
literature. Moreover, we rely on complex curator preferences over various interesting
features of the resulting repairs that can capture diverse
notions of \emph{minimality} in repairs. As a result, other repair policies presented in
the literature can be emulated within our
framework.
In addition, we propose a novel \emph{exhaustive} repair-finding algorithm which, unlike
existing greedy frameworks, is not sensitive to the
resolution order and syntax of violated constraints and can \emph{correctly compute
globally optimal repairs for different kinds of
constraints and preferences}. Despite its exponential nature, the performance and memory
requirements of the exhaustive algorithm are
experimentally demonstrated to be satisfactory for real world curation cases, thanks to a
series of optimizations. Finally, we propose
the corresponding ``greedy'' algorithm, which computes \emph{locally optimal repairs} by
considering each violation individually and keeping only
the preferred-per-violation repairs.
Last but not least, we propose possible extensions of our framework to describe policies
where the inconsistencies are resolved during their
introduction (e.g., belief revision, belief merging). This can be achieved by carefully
designing operations which modify the KB's status
in order to prevent the inconsistencies from creeping into the system.
|