Abstract
Data Mining, Data Analysis, and Machine Learning have been used in recent years
by people with varying levels of experience and from different fields
to provide solutions in a wide range of domains and applications. Machine Learning
algorithms can be applied to any dataset, regardless of its type, content, or
source (e.g. real-world or simulated observations), and can generate a model
describing it.
The space of available data mining algorithms and methodologies grows day by day,
making it difficult even for experts to keep up with these changes. As different dataset
types may demand a different approach in terms of methodology or algorithmic analysis,
the Data Analysis process is becoming even more complex. Considerable effort has been
devoted to the development of Data Mining Assistants (usually referred to as IDAs,
Intelligent Data Assistants) in order to overcome these obstacles.
In this Thesis we designed and developed an automated intelligent system, the RB-DMA
(Rule Based Data Mining Assistant), which, based on an extension of the OntoDM data
mining ontology [1] combined with a set of rules written in Drools [2], proposes
the most appropriate data mining workflows, ranked by their efficiency for a given
analysis.
Our approach provides all the decisions the end user needs, regardless of their experience
or knowledge, in order to conduct an analysis with trustworthy results.
Depending on the amount and complexity of the data, Data Analysis usually requires a
considerable amount of time to produce results. Our system addresses this issue by
proposing to the user the n best workflows, where n is kept small and optimized
to produce close-to-best results.
Last but not least, the system covers up to 200 data analysis scenarios (binary classification
and regression on a variety of data types and sizes).