Abstract
Named Entity Extraction (NEE) is the process of identifying entities in texts and, very commonly, linking them to related (Web) resources. This task is useful in several applications, e.g. question answering, document annotation, and processing of search results.
However, it is quite common for an entity name to correspond to more than one semantic category; e.g., Argentina may refer either to the fish species Argentina or to the country Argentina. This is the well-known Named Entity Disambiguation (NED) problem.
In addition, existing NEE and NED tools lack an open or easy configuration, although this is very important for building domain-specific applications. For example, supporting a new category of entities, or specifying how to link the detected entities with online resources, is either impossible or very laborious.
In this thesis we show how we can exploit semantic information (Linked Data) in real time for configuring a NEE system and disambiguating the mined entities. We introduce an RDF/S vocabulary, called Open NEE Configuration Model, which allows a NEE service to describe (and publish as Linked Data) its entity mining capabilities, but also to be dynamically configured.
We present X-Link, a NEE framework that realizes this model and, contrary to existing tools, allows the user to easily define the categories of entities that are of interest for the application at hand (by exploiting Linked Data).
Then we focus on the problem of NED in this context, i.e. on the problem of selecting the right category for each extracted entity. To this end, we introduce three methods, each approaching the problem from a different perspective.
The first method is based exclusively on the NEE results and selects as the most probable category the one with the highest occurrence frequency. The second method goes a step further and exploits the semantic relations between the mined entities, using their semantic resources, and returns the semantic resource that is closest to the others in the semantic graph. The last method uses machine learning algorithms to classify the entire document into a specific category based on a training set.
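As an illustrative sketch (not the thesis implementation), the first method's selection rule can be expressed as a simple frequency count over the candidate categories that a NEE step might return for an ambiguous name; the input format and names here are hypothetical:

```python
from collections import Counter

def most_frequent_category(candidate_categories):
    """Select the category that occurs most often among the
    candidate categories proposed for an ambiguous entity name.
    (Hypothetical input: a flat list of category labels.)"""
    counts = Counter(candidate_categories)
    category, _ = counts.most_common(1)[0]
    return category

# Hypothetical NEE output for the ambiguous name "Argentina":
candidates = ["Country", "Country", "Fish Species", "Country"]
print(most_frequent_category(candidates))  # → Country
```

The second and third methods would replace this counting step with, respectively, a distance computation over the semantic graph and a trained document classifier.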
Then we report the results of a thorough comparative experimental evaluation using search results from the Bing search engine. We evaluated the introduced methods over collections of documents of different sizes, measuring the achieved precision and the time required for disambiguation.
The results allowed us to identify the strong and weak aspects of each method. Overall, the third method works well in most cases, apart from small snippets, e.g. tweets, where it achieves almost the same precision as the second method.