SciELO - Scientific Electronic Library Online

 


Revista Cubana de Ciencias Informáticas

Online version ISSN 2227-1899

Abstract

DIAZ-BARRIOS, Heidy et al. Machine Learning algorithms for Splice Sites classification in genomic sequences. Rev cuba cienc informat [online]. 2015, vol.9, n.4, pp. 155-170. ISSN 2227-1899.

Classification techniques are frequently used to solve a range of bioinformatics problems. In most genes, the DNA sequence is transcribed into messenger RNA (mRNA), which is then translated into protein. Within genes, DNA contains coding segments (exons) and non-coding segments (introns). During processing of the transcript, the introns are cut out by a mechanism called splicing, which joins the exons of the gene one after another so that the resulting sequence can be translated into the amino acid chain that makes up the protein. At a splice site, the beginning of an intron is called the donor site (GT dinucleotide) and the end is called the acceptor site (AG dinucleotide). Only a small fraction of the occurrences of these dinucleotides are true splice sites. The present work addresses splice site prediction. It applies machine learning techniques suited to describing biological domains, using two databases of nucleotide sequences with 7000 cases (6000 false and 1000 true), to classify candidate splice sites as true or false. A series of algorithms is tested and compared using WEKA (Waikato Environment for Knowledge Analysis) to find the best classifiers. To select the best classifier, well-known measures based on the confusion matrix are applied: accuracy, true positive (TP) rate, area under the Receiver Operating Characteristic (ROC) curve, etc. The study concludes that Bayesian methods maximize the number of true positives and the area under the ROC curve, making them the recommended candidates for classifying splice sites.
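The abstract selects classifiers by accuracy, TP rate and ROC AUC, and favours Bayesian methods. With the class proportions reported (6000 false, 1000 true), a trivial classifier that labels every candidate as false already reaches an accuracy of 6000/7000 ≈ 0.857 while its TP rate is 0, which is why TP rate and AUC matter for this problem. The Python sketch below is only an illustration under stated assumptions, not the authors' WEKA pipeline: it assumes fixed-length nucleotide windows centred on a candidate GT donor dinucleotide, uses a hypothetical `PositionalNaiveBayes` class (a categorical naive Bayes with Laplace smoothing, one feature per window position), and computes the three metrics from scratch. The toy sequences are invented for demonstration.

```python
import math
from collections import defaultdict

NUCLEOTIDES = "ACGT"  # assumes windows contain only A, C, G, T (no ambiguity codes such as N)


def confusion_metrics(y_true, y_pred):
    """Return (accuracy, true positive rate) for binary labels, 1 = true splice site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, tp_rate


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


class PositionalNaiveBayes:
    """Hypothetical categorical naive Bayes: each window position is an independent feature."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing pseudo-count per nucleotide

    def fit(self, windows, labels):
        self.classes = sorted(set(labels))  # expected to be [0, 1]
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            rows = [w for w, y in zip(windows, labels) if y == c]
            self.log_prior[c] = math.log(len(rows) / len(windows))
            denom = len(rows) + self.alpha * len(NUCLEOTIDES)
            table = []
            for i in range(len(windows[0])):
                counts = defaultdict(int)
                for w in rows:
                    counts[w[i]] += 1
                table.append({b: math.log((counts[b] + self.alpha) / denom) for b in NUCLEOTIDES})
            self.log_lik[c] = table
        return self

    def score(self, window):
        """Log posterior odds of class 1 over class 0; used as the ranking score for ROC AUC."""
        s = {c: self.log_prior[c] + sum(self.log_lik[c][i][b] for i, b in enumerate(window))
             for c in self.classes}
        return s[1] - s[0]

    def predict(self, window):
        return 1 if self.score(window) > 0 else 0


# Invented toy windows: the candidate GT donor dinucleotide sits at positions 2-3.
windows = ["AGGTAAGT", "CAGTAAGA", "AGGTGAGT", "TTGTCCTA", "GCGTTTCA", "ACGTACGA"]
labels = [1, 1, 1, 0, 0, 0]

# Evaluated on the training windows only for brevity; a real study would use held-out data.
model = PositionalNaiveBayes().fit(windows, labels)
preds = [model.predict(w) for w in windows]
scores = [model.score(w) for w in windows]
print("accuracy, TP rate:", confusion_metrics(labels, preds))
print("ROC AUC:", roc_auc(labels, scores))
```

Computing AUC as a pairwise ranking probability keeps the sketch dependency-free, but it costs O(P·N) comparisons; for larger datasets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` would be preferable.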

Keywords: acceptor; classifiers; donor; machine learning; splicing.

        · Abstract in Spanish     · Text in Spanish     · Spanish (PDF)

 

All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.