Results - Details

Identifier 000453963
Title A Machine Learning method to classify sentences containing biomedical entities in academic text
Alternative Title Μια μέθοδος μηχανικής μάθησης για την ταξινόμηση προτάσεων που περιέχουν βιοϊατρικές οντότητες σε ακαδημαϊκά κείμενα
Author Μπουμπάκης, Απόστολος
Thesis advisor Καντεράκης, Αλέξανδρος
Abstract The volume of biomedical literature keeps growing, and Natural Language Processing (NLP) research on clinical documents is becoming increasingly important as a result. Automated analysis of biomedical literature is expanding rapidly, stimulating the development of several techniques for automatic Named Entity Recognition (NER) and document classification. However, despite the many techniques available for classifying sentences that contain biomedical entities, only a few entity types can be recognized easily. The aim of this study is to present the state-of-the-art Named Entity Recognition technique, Bidirectional Encoder Representations from Transformers (BERT), and to use it to recognize and extract Disease, Gene, SNP and Chemical entities from biomedical texts. BERT was chosen because it is the most widespread neural network architecture for training language models and has led to considerable improvements in various NLP tasks. In general, the more parameters a BERT model has, the better the results obtained. Because memory consumption increases with model size, however, the lighter BERT variant, DistilBERT, was applied. This technique was evaluated on two NER tasks for each entity. In outline, hundreds of biomedical papers were parsed in XML format, split into sentences, and classified and labeled accordingly to create different datasets. Finally, these datasets were passed through the BERT model to recognize sentences that do or do not include the aforementioned entities. The results showed that, with appropriate pre-training of the BERT model, strong recognition performance can be achieved without extensive fine-tuning or optimization, while outperforming previous models on the biomedical NER text mining task. Nevertheless, there is room for further tuning, and considerable future work and new challenges remain.
Language English
Subject Document classification
Entity recognition
Αναγνώριση οντοτητων
Ταξινόμηση κειμένων
Issue date 2023-04-05
Collection   School/Department--School of Medicine--Department of Medicine--Post-graduate theses
  Type of Work--Post-graduate theses
Views 368