|
Identifier |
000383048 |
Title |
BioTextQuest: Ένα διαδικτυακό εργαλείο εξόρυξης δεδομένων με στόχο την ανακάλυψη καινούργιας πληροφορίας. Κοινωνικοποίηση γονιδίων: Μια μελέτη της γονιδιακής θέσης, της περιεκτικότητας σε GC και της σίγασης γονιδίων στη Salmonella |
Alternative Title |
BioTextQuest: a data mining tool for concept discovery. Gene socialization: gene order, GC content and gene silencing in Salmonella |
|
Κοινωνικοποίηση γονιδίων: Μια μελέτη της γονιδιακής θέσης, της περιεκτικότητας σε GC και της σίγασης γονιδίων στη Salmonella |
Author
|
Παπανικολάου, Νικόλας
|
Thesis advisor
|
Ηλιόπουλος, Ιωάννης
|
Reviewer
|
Σαββάκης, Χαράλαμπος
Μαυροθαλασσίτης, Γεώργιος
Καραγωγέος, Δόμνα
Ηλιόπουλος, Αριστείδης
Ποϊράζη, Παναγιώτα
Προμπονάς, Βασίλης
|
Abstract |
This thesis describes research carried out at the Medical School of the
University of Crete under the supervision of Professor Charalambos Savakis
and in collaboration with Dr Ioannis Iliopoulos. The thesis comprises two
distinct parts. The first part describes a text mining method that groups
PubMed abstracts into meaningful clusters, and the second part describes a
whole-genome comparison between two bacterial genomes.
Part 1: bioTextQuest
bioTextQuest is an online tool that allows the user to perform a
specialized keyword search in PubMed. The matching abstracts, which are
stored locally in the bioTextQuest database, are collected and analyzed.
The analysis is performed in the following stages:
1. Various predefined words (stoplist) are excluded from the abstracts.
2. Each word of each abstract is weighted for its importance (based on a
dictionary) using a variation of the TF.IDF term-weighting algorithm.
Less ‘important’ terms are pruned. Terms with high TF.IDF and
terms not appearing in the dictionary pass through.
3. Remaining terms comprise the Li.S.T. (List of Significant Terms).
4. Based on Li.S.T., each abstract is represented by a vector.
5. Various clustering algorithms act on the vectors and group them in
clusters.
6. Each cluster is annotated using Gene Ontology (molecular function,
cellular component and biological process annotation) and Reflect
(protein annotation).
7. Each cluster is presented to users using the respective Significant
Terms in a Tag Cloud format that represents the contribution of each
term to the corresponding cluster.
The clusters can be altered by adjusting several parameters and can
be studied further with the aid of their functional enrichment. Clustering
can help in quickly assessing a scientific field, in concept discovery, etc.
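Stages 1 to 4 above can be illustrated with a minimal Python sketch. It
uses the textbook TF.IDF formula (term frequency times log(N/df)), not the
variation used by bioTextQuest, whose details the abstract does not give.
The stoplist, the toy abstracts, the threshold and all function names are
hypothetical and chosen only for illustration; the dictionary check of
stage 2 is omitted.

```python
import math
from collections import Counter

# Hypothetical stoplist; the tool's actual stoplist is not given in the abstract.
STOPLIST = {"the", "of", "in", "and", "a", "is", "to"}

def tokenize(text):
    """Lowercase, strip simple punctuation, and drop stoplist words (stage 1)."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPLIST]

def tf_idf(docs):
    """Return one {term: weight} dict per document, using plain TF * log(N/df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    weights = []
    for doc in tokenized:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n_docs / doc_freq[t])
                        for t, c in counts.items()})
    return weights

def significant_terms(weights, threshold):
    """Keep terms whose weight exceeds the threshold in at least one document."""
    return sorted({t for w in weights for t, v in w.items() if v > threshold})

def vectorize(weights, terms):
    """Represent each document as a vector over the significant terms (stage 4)."""
    return [[w.get(t, 0.0) for t in terms] for w in weights]

def cosine(u, v):
    """Cosine similarity, a common input to clustering algorithms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy abstracts, invented for this example.
docs = [
    "Salmonella gene silencing in the genome",
    "Gene order and GC content in Salmonella",
    "Text mining of PubMed abstracts",
]
weights = tf_idf(docs)
terms = significant_terms(weights, threshold=0.0)
vectors = vectorize(weights, terms)
```

With these toy inputs the first two abstracts share weighted terms and so
have a positive cosine similarity, while the first and third share none. A
clustering algorithm (stage 5) would operate on such vectors.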
Part 2: Gene Socialization
We performed a genome-wide comparison of two bacterial genomes
(Salmonella Typhimurium and Escherichia coli), focusing on gene order
conservation. We studied synteny in conjunction with GC content, gene
duplication, gene essentiality, gene silencing, horizontal gene transfer and
synonymous vs. non-synonymous single-point mutations.
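As a rough illustration of what a gene order conservation test can look
like, the sketch below marks a gene as position-conserved when it keeps at
least one immediate neighbour in both genomes. This criterion is an
assumption made for the example and is not the definition used in the
thesis. The gene identifiers are invented, orthologs are assumed to share
the same identifier, and the genomes are treated as linear lists although
bacterial chromosomes are circular.

```python
def neighbours(order):
    """Map each gene to the set of its immediate neighbours in a linear gene list."""
    result = {}
    for i, gene in enumerate(order):
        adjacent = set()
        if i > 0:
            adjacent.add(order[i - 1])
        if i < len(order) - 1:
            adjacent.add(order[i + 1])
        result[gene] = adjacent
    return result

def conserved_order(order_a, order_b):
    """Return genes that share at least one immediate neighbour in both genomes."""
    near_a = neighbours(order_a)
    near_b = neighbours(order_b)
    return {g for g in near_a if g in near_b and near_a[g] & near_b[g]}

# Invented gene orders: g4 has moved to the front in the second genome.
order_a = ["g1", "g2", "g3", "g4", "g5"]
order_b = ["g4", "g1", "g2", "g3", "g5"]
conserved = conserved_order(order_a, order_b)
```

Under this simple criterion a gene can lose its conserved status because a
neighbour moved, even if the gene itself did not, so a real analysis would
likely use larger neighbourhoods or synteny blocks.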
We found that genes that conserve their gene order tend to be
more conserved, have higher GC content and a lower
nonsynonymous/synonymous ratio. Genes that lose their original position
tend to be silenced. Also, duplicated genes follow different evolutionary
paths depending on whether they conserve their original position or not:
duplicates that remain in their original position tend to be more conserved
than the ones that leave their genomic neighborhood. The latter tend to
accumulate more AT mutations. Additionally, essential genes tend to remain
in their original genomic location.
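GC content, one of the quantities compared above, is the fraction of G and
C bases in a sequence. The sketch below computes it and averages it over
two groups of genes. The sequences and group names are invented for
illustration and are not data from the thesis.

```python
def gc_content(seq):
    """Fraction of bases that are G or C; returns 0.0 for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def mean_gc(seqs):
    """Average GC content over a group of sequences."""
    return sum(gc_content(s) for s in seqs) / len(seqs) if seqs else 0.0

# Invented toy sequences, grouped by a hypothetical position-conservation label.
position_conserved = ["ATGGCGCC", "GCGCATGC"]
position_lost = ["ATATTAGC", "TTAAATGC"]

gc_conserved = mean_gc(position_conserved)
gc_lost = mean_gc(position_lost)
```

A real comparison would use full coding sequences and a statistical test
rather than a difference of two means.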
|
Language |
Greek |
Subject |
Bioinformatics |
|
Data integration |
|
Text mining |
|
Whole genome analysis |
|
Βιοπληροφορική |
|
Γονιδιωματική ανάλυση |
|
Εξόρυξη κειμένου |
|
Συγχώνευση γνώσης |
Issue date |
2014-01-22 |
Collection
|
School/Department--School of Medicine--Department of Medicine--Doctoral theses
|
|
Type of Work--Doctoral theses
|
Permanent Link |
https://elocus.lib.uoc.gr//dlib/e/3/1/metadata-dlib-1393585002-267763-23161.tkl
|