Identifier 000342573
Title Mature miRNA identification via the use of a Naive Bayes classifier
Author Γκίρτζου, Αικατερίνη
Thesis advisor Τσακαλίδης, Πάνος
Abstract MicroRNAs (miRNAs) are small, single-stranded RNAs, on average 22 nt long, that are generated from endogenous hairpin-shaped transcripts and have post-transcriptional activity. Although many computational methods are currently available for identifying miRNA genes in the genomes of various species, very few algorithms can accurately predict the functional part of the miRNA gene, namely the mature miRNA. We introduce a computational method that uses a Naive Bayes classifier to identify mature miRNA candidates based on sequence and secondary-structure information from the miRNA precursor. Specifically, for each mature miRNA we generate a set of negative examples of equal length on the respective precursor(s). The true and negative sets are then used to estimate probability distributions of the sequence and secondary-structure composition at each position along the mature miRNA and in the flanking regions around it, as well as distributions of the distances of the mature miRNA's start and end positions from the precursor's hairpin and ends. The divergence between the true and negative distributions is estimated with the symmetric Kullback-Leibler metric. Features for which the two distributions differ significantly and consistently across a 10-fold cross-validation procedure are used to train the Naive Bayes classifier. We trained the classifier on experimentally verified human and mouse miRNA data and achieved an AUC of approximately 0.88, using consensus averaging over a 10-fold cross-validation procedure. Moreover, we examined four strategies for selecting the most accurate candidate mature miRNA from the ranking provided by our model. For each strategy, the confidence that the predicted mature miRNA lay within ±6 nt of the true mature miRNA was:
a) 86.88% for the top scorer,
b) 88.25% for the middle point of the 4 top scorers,
c) 89.34% for the mean value of the 4 top scorers, and
d) 87.83% for the top scorer and its duplex.
Our findings suggest that position-specific sequence and structure information and distance features, combined with a simple Naive Bayes classifier, achieve good performance on the challenging task of mature miRNA identification.
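The feature-selection step compares, position by position, the symbol distribution of true mature miRNAs with that of the negative windows. The sketch below shows that comparison for aligned sequence windows only. It rests on several assumptions that the record does not specify: the alphabet is ACGU (structure symbols could be substituted), probabilities are Laplace-smoothed with a pseudocount of 1, the symmetric Kullback-Leibler divergence is taken as D(P||Q) + D(Q||P), and `threshold` is a hypothetical cutoff standing in for the thesis's significance and cross-validation consistency test.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # assumption: sequence symbols; secondary-structure symbols could be used instead


def position_distribution(seqs, pos, alphabet=ALPHABET, pseudocount=1.0):
    """Estimate P(symbol at position `pos`) over aligned windows, with Laplace smoothing."""
    counts = Counter(s[pos] for s in seqs)
    total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


def symmetric_kl(p, q):
    """Symmetric Kullback-Leibler divergence D(P||Q) + D(Q||P), in nats."""
    return sum(p[a] * math.log(p[a] / q[a]) + q[a] * math.log(q[a] / p[a]) for a in p)


def divergent_positions(positives, negatives, threshold):
    """Return (position, divergence) pairs whose divergence exceeds a hypothetical threshold."""
    selected = []
    for pos in range(len(positives[0])):
        d = symmetric_kl(position_distribution(positives, pos),
                         position_distribution(negatives, pos))
        if d > threshold:
            selected.append((pos, d))
    return selected
```

With toy windows such as `["UGAG", "UGAC", "UGAU"]` against `["ACGU", "CAGA", "GCUA"]`, the first three positions diverge strongly while the last does not, so only positions 0 to 2 pass a threshold of 0.8. In the full method this test would be repeated on every cross-validation fold, keeping only positions that pass consistently.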
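Once informative positions are chosen, a Naive Bayes classifier treats them as conditionally independent and scores a candidate window by summing per-feature log-likelihood ratios. The following is a simplified sketch under stated assumptions: it uses sequence features only, whereas the thesis also uses secondary-structure and distance features. It assumes that candidates are fixed-length windows slid along the precursor and that windows contain only A, C, G and U. The names `fit_naive_bayes`, `log_odds` and `rank_candidates` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # assumption: windows contain only these symbols


def _position_table(seqs, positions, pseudocount):
    """Laplace-smoothed P(symbol | position) for one class."""
    table = {}
    for pos in positions:
        counts = Counter(s[pos] for s in seqs)
        total = sum(counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
        table[pos] = {a: (counts[a] + pseudocount) / total for a in ALPHABET}
    return table


def fit_naive_bayes(positives, negatives, positions, pseudocount=1.0):
    """Fit class-conditional per-position likelihoods on the selected positions."""
    return {
        "log_prior": math.log(len(positives) / len(negatives)),
        "pos": _position_table(positives, positions, pseudocount),
        "neg": _position_table(negatives, positions, pseudocount),
    }


def log_odds(model, window):
    """Naive Bayes log-odds that `window` is a mature miRNA (features treated as independent)."""
    score = model["log_prior"]
    for pos, probs in model["pos"].items():
        symbol = window[pos]
        score += math.log(probs[symbol]) - math.log(model["neg"][pos][symbol])
    return score


def rank_candidates(precursor, model, length):
    """Score every window of `length` nt along the precursor; return (score, start), best first."""
    scored = [(log_odds(model, precursor[i:i + length]), i)
              for i in range(len(precursor) - length + 1)]
    return sorted(scored, reverse=True)
```

For example, a model fitted on the toy windows `["UGAG", "UGAC", "UGAU"]` (true) and `["ACGU", "CAGA", "GCUA"]` (negative) over positions 0 to 2 ranks the window starting at index 2 of `"AAUGAGCC"` first. Working in log space avoids underflow when many features are multiplied.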
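The candidate-selection strategies turn a ranked list of windows into a single predicted position, which is then checked against the ±6 nt tolerance. The sketch below covers strategies a) to c) and the tolerance check. It assumes that the ranking is a list of `(score, start)` pairs sorted best first, that "middle point" means the midpoint between the smallest and largest start among the top 4 candidates, and that the tolerance is measured on start positions. These are interpretations, not details confirmed by the abstract. Strategy d) is omitted because locating the duplex partner requires the precursor's base-pairing information.

```python
def top_scorer(ranked):
    """Strategy a): start of the highest-scoring candidate."""
    return ranked[0][1]


def middle_point(ranked, k=4):
    """Strategy b), as interpreted here: midpoint of the span covered by the top-k starts."""
    starts = [start for _, start in ranked[:k]]
    return (min(starts) + max(starts)) / 2


def mean_value(ranked, k=4):
    """Strategy c): arithmetic mean of the top-k start positions."""
    starts = [start for _, start in ranked[:k]]
    return sum(starts) / len(starts)


def within_tolerance(predicted_start, true_start, tol=6):
    """True if the prediction lies within ±tol nt of the annotated start."""
    return abs(predicted_start - true_start) <= tol


def fraction_within_tolerance(predictions, truths, tol=6):
    """Share of predictions that fall within ±tol nt of the true starts."""
    hits = sum(within_tolerance(p, t, tol) for p, t in zip(predictions, truths))
    return hits / len(truths)
```

For a hypothetical ranking `[(5.0, 10), (4.0, 12), (3.5, 30), (3.0, 11)]` with a true start of 12, the top scorer (10) and the mean (15.75) fall within ±6 nt, but the midpoint (20.0) does not. This illustrates how a single outlying candidate affects the span-based strategy more than the averaged one.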
Physical description xiii, 66 p. : ill. ; 30 cm.
Language English
Issue date 2009-04-02
Collection   School/Department--School of Sciences and Engineering--Department of Computer Science--Post-graduate theses
  Type of Work--Post-graduate theses