| Field | Value |
|---|---|
| Identifier | 000334421 |
| Title | Trial & error : data transformation and provenance techniques in scientific workflows |
| Alternative Title | Trial & Error: Τεχνικές Μετατροπής και Προέλευσης Δεδομένων σε Επιστημονικά Συστήματα Ροής (English: Trial & Error: Data Transformation and Provenance Techniques in Scientific Workflow Systems) |
| Author | Τσισπαράς, Βασίλης Κ |
| Thesis advisor | Χριστοφίδης, Βασίλης |
Abstract
In many cases the data sources to be integrated are manually formatted text files, such as dictionaries and so-called "corpora and encyclopedic material", with very complex encoding rules and many exceptions. These must be transformed into a database-compatible format conforming to a global schema so that all information can be accessed and queried uniformly. Most past systems, and many recent ones, employ single-step data transformation procedures. Single-step transformations are mostly idiosyncratic, i.e., different for each source, and require the implementation of very specific tools. Building such transformation software requires repeated testing, yet the resulting software may be used only once.

In this thesis we created an application called Trial & Error, which supports a multi-step data transformation technique and thereby enables wider use of generic components. Working from a set of examples, we found empirically that the data transformation process can be broken down into many small steps of a more generic nature. The tools used in these steps were designed to be as elementary as possible to increase the chance of reuse: the smaller the steps, the more generic they can be. Using generic components and semi-automatic execution, we reduced execution time and human intervention and improved error handling.

Trial & Error uses an existing Workflow Management System (WFMS) and associates every data transformation step with a workflow task. We extended the WFMS by embedding code fragments in it to support control flow. For a particular transformation procedure, we select existing software applications or create small software components suited to our requirements and integrate them into the WFMS as tasks. Our application supports both workflow instance creation and execution.
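
The decomposition into small generic steps can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each step is a plain text-to-text function and that a hypothetical `run_pipeline` driver stands in for the WFMS: it runs the steps in order and, at the first step that raises `ValueError`, stops and reports that step's name so a person can fix it and rerun. The step functions (`normalize_whitespace`, `drop_blank_lines`, `split_fields`) and the sample input are invented and are not the thesis's actual components.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional

# A step is a small, generic text -> text transformation (hypothetical).
Step = Callable[[str], str]


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs inside each line and strip the line."""
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())


def drop_blank_lines(text: str) -> str:
    """Remove empty lines left over from manual formatting."""
    return "\n".join(line for line in text.splitlines() if line)


def split_fields(text: str) -> str:
    """Turn 'Key: value' lines into 'key<TAB>value' lines; raise on malformed lines."""
    out = []
    for n, line in enumerate(text.splitlines(), start=1):
        if ":" not in line:
            raise ValueError(f"line {n} has no 'Key: value' separator: {line!r}")
        key, value = line.split(":", 1)
        out.append(f"{key.strip().lower()}\t{value.strip()}")
    return "\n".join(out)


@dataclass
class RunResult:
    output: Optional[str]       # final text, or None if a step failed
    failed_step: Optional[str]  # name of the step that raised ValueError, if any
    error: Optional[str]        # that step's error message


def run_pipeline(text: str, steps: List[Step]) -> RunResult:
    """Run steps in order; stop at the first ValueError so only that step needs attention."""
    for step in steps:
        try:
            text = step(text)
        except ValueError as exc:
            return RunResult(None, step.__name__, str(exc))
    return RunResult(text, None, None)


if __name__ == "__main__":
    steps = [normalize_whitespace, drop_blank_lines, split_fields]
    ok = run_pipeline("Find:   Amphora  \n\nSite:\tKnossos\n", steps)
    print(repr(ok.output))  # 'find\tAmphora\nsite\tKnossos'
    bad = run_pipeline("no separator here", steps)
    print(bad.failed_step, "-", bad.error)
```

Keeping each step this small is what lets `normalize_whitespace` or `drop_blank_lines` be reused for another source, while only a step like `split_fields` encodes source-specific rules.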
Trial & Error also supports storing and querying provenance information for each workflow instance, which is very important in this domain. We demonstrate the application by converting archaeological corpora written in Microsoft Word format into an RDF format compatible with CIDOC CRM. This thesis presents a novel application for data transformation and cleaning and proposes a solution for any scientific domain that needs to convert its data from books and corpora into digital form.
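
Provenance capture for a workflow instance can be sketched in the same spirit, here for a run whose last step emits RDF. The sketch assumes, for illustration only, an in-memory list of hypothetical `ProvenanceRecord` entries, one per executed step, holding the instance id, the step name, and SHA-256 digests of the step's input and output; `lineage` is a hypothetical query over that log. The final step, `to_ntriples`, is likewise hypothetical: it maps tab-separated `id`/`note` lines to N-Triples using the CIDOC CRM 5.x names `E22_Man-Made_Object` and `P3_has_note` under `http://www.cidoc-crm.org/cidoc-crm/`, with `http://example.org/find/` as a placeholder URI base. None of this reproduces the thesis's storage schema or mapping rules.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/find/"  # placeholder URI base (assumption, not from the thesis)


def sha256(text: str) -> str:
    """Content digest used to link one step's output to the next step's input."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    instance_id: str    # workflow instance that executed the step
    step: str           # name of the task/step
    input_sha256: str   # digest of the text the step consumed
    output_sha256: str  # digest of the text the step produced


def run_with_provenance(instance_id: str, text: str,
                        steps: List[Callable[[str], str]],
                        log: List[ProvenanceRecord]) -> str:
    """Run steps in order, appending one provenance record per executed step."""
    for step in steps:
        out = step(text)
        log.append(ProvenanceRecord(instance_id, step.__name__, sha256(text), sha256(out)))
        text = out
    return text


def lineage(log: List[ProvenanceRecord], instance_id: str) -> List[str]:
    """Provenance query: the ordered step names that produced an instance's output."""
    return [r.step for r in log if r.instance_id == instance_id]


def strip_text(text: str) -> str:
    """Trivial generic step: trim surrounding whitespace."""
    return text.strip()


def escape_literal(s: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (s.replace("\\", "\\\\").replace('"', '\\"')
             .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(text: str) -> str:
    """Map 'id<TAB>note' lines to CIDOC CRM 5.x triples; ids are assumed URI-safe."""
    triples = []
    for row in text.splitlines():
        obj_id, note = row.split("\t", 1)
        subject = f"<{BASE}{obj_id}>"
        triples.append(f"{subject} <{RDF_TYPE}> <{CRM}E22_Man-Made_Object> .")
        triples.append(f'{subject} <{CRM}P3_has_note> "{escape_literal(note)}" .')
    return "\n".join(triples)


if __name__ == "__main__":
    log: List[ProvenanceRecord] = []
    nt = run_with_provenance("run-1", "  A1\tRed-figure amphora \n",
                             [strip_text, to_ntriples], log)
    print(nt)
    print(lineage(log, "run-1"))  # ['strip_text', 'to_ntriples']
```

Because each record stores input and output digests, consecutive records chain together (one step's `output_sha256` equals the next step's `input_sha256`), which is enough to check which steps touched a given result. Newer CIDOC CRM releases rename E22 to `E22_Human-Made_Object`, so the class name should match the CRM version in use.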
| Field | Value |
|---|---|
| Physical description | viii, 118 p. : ill. ; 30 cm. |
| Language | English |
| Issue date | 2008-07-22 |
| Collection | School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses |
| Type of Work | Post-graduate theses |