Identifier uch.csd.msc//2006antonakaki
Title Εξόρυξη γνώσεων από Βιοϊατρική Βιβλιογραφία – Το Σύστημα ΜINEBIOΤEXT: Ανακάλυψη συσχετίσεων μεταξύ γονιδίων, πρωτεϊνών και ασθενειών
Alternative Title Mining the Biomedical Literature – The MineBioText system: Discovery of Gene, Protein and Disease Correlations
Creator Antonakaki, Despoina
Abstract Automatic knowledge discovery from biomedical free text has become a necessity given the rapid growth of the biomedical scientific literature. The task is made more challenging by the overabundance and diversity of genomic/proteomic ontologies and of the corresponding gene and protein terminologies. Specifically, genomic/proteomic terms (e.g., genes, proteins and their functional descriptions), as well as diseases, are referred to in many different ways in scientific documents, depending on the organization, the research context and the naming conventions that the authors adhere to. The work reported in this thesis presents methods and tools for the efficient and reliable mining of biomedical literature, based on advanced text-mining techniques. Specifically, it covers the following R&D challenges: (a) Identification of gene/protein--gene/protein and gene/protein--disease correlations following a text-mining approach. The approach uses data-mining and statistical techniques, algorithms and metrics to address two problems: (i) identification and recognition of terms in text references, based on a purpose-built algorithmic process that uses the Trie data structure; and (ii) ranking of terms and their (potential) relations or links, based on the MIM entropic metric (Mutual Information Metric), which measures the association strength between terms. (b) Construction of a gene association network, based on the assessed association strengths between terms (genes, proteins, diseases). (c) Categorization/classification of text references (mainly from the PubMed abstracts repository) into categories, using a purpose-built classification metric and procedure together with the most descriptive (i.e., strongest) associations between terms. Text references (i.e., PubMed abstracts) are pre-assigned to categories by posting the corresponding queries to PubMed; for example, the documents retrieved by querying PubMed with "breast cancer" are considered to belong to the "breast cancer" category. (d) Assessment of the text categorization/classification results, based on the respective PubMed abstract collections, their pre-categorization, and a careful experimental set-up that measures prediction performance, i.e., accuracy and precision. (e) Design and development of a tool, MineBioText (Mining Biomedical Texts), that encompasses all of the aforementioned operations and adds functionality for setting up the domain of reference and study, e.g., gene/protein and disease names, their synonyms and free-text descriptions, text collections, and the parameterization of the built-in algorithmic processes.
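The two steps named in item (a) of the abstract, Trie-based term recognition and ranking of term pairs by a mutual-information score, can be illustrated roughly as follows. This is only a minimal sketch and not the MineBioText implementation: the abstract does not give the exact MIM formula, so the code assumes pointwise mutual information over document-level co-occurrence, and the names `TermTrie`, `find_terms`, `pmi_scores`, the tokenizer, and the sample synonyms and texts are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations

_END = object()  # sentinel key marking the end of a stored term


def tokenize(text):
    """Lowercase alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TermTrie:
    """Token-level trie mapping synonym phrases to a canonical term name."""

    def __init__(self):
        self.root = {}

    def add(self, phrase, canonical):
        node = self.root
        for token in tokenize(phrase):
            node = node.setdefault(token, {})
        node[_END] = canonical

    def find_terms(self, text):
        """Scan left to right, keeping the longest match at each position."""
        tokens = tokenize(text)
        found, i = [], 0
        while i < len(tokens):
            node, match, end = self.root, None, i
            for j in range(i, len(tokens)):
                node = node.get(tokens[j])
                if node is None:
                    break
                if _END in node:
                    match, end = node[_END], j + 1
            if match is None:
                i += 1
            else:
                found.append(match)
                i = end
        return found


def pmi_scores(documents, trie):
    """Pointwise mutual information of term pairs co-occurring in a document.

    Assumed stand-in for the thesis's MIM metric:
    log2( P(a, b) / (P(a) * P(b)) ), with probabilities estimated
    from document frequencies.
    """
    n_docs = len(documents)
    term_df, pair_df = Counter(), Counter()
    for text in documents:
        terms = sorted(set(trie.find_terms(text)))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return {
        (a, b): math.log2(n_ab * n_docs / (term_df[a] * term_df[b]))
        for (a, b), n_ab in pair_df.items()
    }


# Hypothetical synonym dictionary and toy "abstracts".
trie = TermTrie()
trie.add("BRCA1", "BRCA1")
trie.add("breast cancer 1", "BRCA1")
trie.add("breast cancer", "breast cancer")
trie.add("TP53", "TP53")
trie.add("lung cancer", "lung cancer")

docs = [
    "BRCA1 mutations are linked to breast cancer.",
    "The breast cancer 1 gene interacts with TP53.",
    "TP53 is studied in lung cancer.",
    "Lung cancer risk and smoking.",
]

for pair, score in sorted(pmi_scores(docs, trie).items()):
    print(pair, round(score, 3))
```

Higher scores indicate pairs that co-occur more often than their individual frequencies would suggest. Scores of this kind could serve as edge weights in an association network such as the one described in item (b), although the abstract does not specify how the thesis builds that network.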
Issue date 2006-04-01
Date available 2006-07-19
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses