Identifier 000399691
Title Using linked data for named entity extraction and disambiguation
Alternative Title Χρήση διασυνδεδεμένων δεδομένων για εξόρυξη και αποσαφήνιση οντοτήτων (Greek: "Using linked data for entity extraction and disambiguation")
Author Μπαριτάκης, Εμμανουήλ Μ.
Thesis advisor Τζίτζικας, Ιωάννης
Reviewer Πλεξουσάκης, Δημήτριος
Φουντουλάκη, Ειρήνη
Abstract Named Entity Extraction (NEE) is the process of identifying entities in texts and, very commonly, linking them to related (Web) resources. This task is useful in several applications, e.g. question answering, document annotation, and the processing of search results. However, it is quite common for an entity name to correspond to more than one semantic category, e.g. Argentina may refer either to the fish species Argentina or to the country Argentina. This is the well-known Named Entity Disambiguation (NED) problem. In addition, existing NEE and NED tools lack an open or easy configuration, although this is very important for building domain-specific applications. For example, supporting a new category of entities, or specifying how to link the detected entities with online resources, is either impossible or very laborious. In this thesis we show how we can exploit semantic information (Linked Data) in real time for configuring a NEE system and disambiguating the mined entities. We introduce an RDF/S vocabulary, called Open NEE Configuration Model, which allows a NEE service to describe (and publish as Linked Data) its entity mining capabilities, but also to be dynamically configured. We present X-Link, a NEE framework that realizes this model and, contrary to existing tools, allows the user to easily define the categories of entities that are interesting for the application at hand (by exploiting Linked Data). Then we focus on the problem of NED in this context, i.e. on the problem of selecting the right category for each extracted entity. To this end we introduce three methods, each approaching the problem from a different perspective. The first method is based exclusively on NEE results and selects as the most probable category the one with the highest occurrence frequency. The second method goes a step further and exploits the semantic relations between the mined entities, using their semantic resources, and returns the semantic resource that is closest to the others in the semantic graph. The last method uses machine learning algorithms to classify the entire document into a specific category based on a training set. Then we report the results of a thorough comparative experimental evaluation using search results from the Bing search engine. We evaluated the introduced methods over collections of documents of different sizes and measured the achieved precision and the time required for disambiguation. The results allowed us to identify the strong and weak aspects of each method. Overall, the third method works well in most cases apart from small snippets, e.g. tweets, where it achieves almost the same precision as the second method.
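The first disambiguation method in the abstract (pick the category with the highest occurrence frequency) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function name `disambiguate_by_frequency`, the input shape (a dictionary from entity name to its candidate categories), and the reading of "occurrence frequency" as the number of extracted entities per category within one document are all assumptions made for illustration.

```python
from collections import Counter


def disambiguate_by_frequency(candidates):
    """Pick one category per entity (hypothetical helper, not from the thesis).

    candidates: dict mapping an entity name to a non-empty list of
    candidate categories returned by an entity extraction step.
    """
    # Count how often each category occurs across all extracted entities.
    counts = Counter(cat for cats in candidates.values() for cat in cats)
    # For each entity keep the candidate category with the highest count.
    # On ties, max() returns the first candidate in the entity's list.
    return {
        name: max(cats, key=lambda c: counts[c])
        for name, cats in candidates.items()
    }


# Illustrative input only: "Argentina" is ambiguous, the others are not.
extracted = {
    "Argentina": ["Country", "Fish Species"],
    "Chile": ["Country"],
    "Brazil": ["Country"],
}
result = disambiguate_by_frequency(extracted)
print(result["Argentina"])
```

Here the ambiguous name resolves to the category that dominates the rest of the document; in a text that mostly mentions fish, the same name would resolve to the fish species instead.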
Language English
Subject Linked data (Διασυνδεδεμένα δεδομένα)
Issue date 2016-03-18
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses