SciELO - Scientific Electronic Library Online

 

Revista Cubana de Ciencias Informáticas

On-line version ISSN 2227-1899

Abstract

GONZALEZ VALLE, Yadelis; GALPERT, Deborah; MOLINA-RUIZ, Reinaldo and AGUERO-CHAPIN, Guillermin. Feature integration and semi-supervised learning for functional enzyme classification by using Spark K-means. Rev cuba cienc informat [online]. 2020, vol.14, n.4, pp.134-161. Epub 01-Dec-2020. ISSN 2227-1899.

The functional classification of enzymes has been a field of great interest in bioinformatics for several years. This classification must take into account the scarce information available for some classes, the imbalance between classes, and the increasing number of enzymes to be classified. In this article we investigate the use of semi-supervised and unsupervised clustering algorithms to group similar enzyme sequences, based on the integration of alignment-free protein descriptors computed with the k-mers method for different values of k. Four algorithms that group enzymes according to their enzymatic function were implemented in Spark. They are based on transformations of existing methods: the Global Logic Combinatorial, K-means, and Ensemble Clustering. Clustering quality was measured using the silhouette index as an internal measure and the F-measure as an external measure. In the experiment, the 58 functionally characterized sequences among 501 enzymes of the Glycoside Hydrolase 70 (GH-70) family from the CAZy database were taken as the reference for comparing the results of the implemented clustering methods. This family has high biotechnological value and can cause losses in the millions in sugar production. The silhouette index values obtained were moderate, but better than those obtained with the K-means method. The best F-measure value, 0.9, was achieved by the Ensemble Clustering method combined with semi-supervised learning.
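As an illustration of the feature integration described above, the following minimal sketch builds an alignment-free descriptor by concatenating normalized k-mer frequencies for several values of k. It is not the authors' implementation: the function names `kmer_frequencies` and `integrated_descriptor`, the 20-letter amino acid alphabet, the choice of k values, and the normalization are all assumptions made for this example.

```python
from itertools import product

# Assumption: the standard 20-letter amino acid alphabet; other symbols are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(vocabulary, 0)
    total = 0
    # Slide a window of length k over the sequence and count valid k-mers.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[v] / total for v in vocabulary]


def integrated_descriptor(sequence, k_values=(1, 2)):
    """Concatenate the k-mer frequency vectors for each k (hypothetical integration)."""
    features = []
    for k in k_values:
        features.extend(kmer_frequencies(sequence, k))
    return features


if __name__ == "__main__":
    vector = integrated_descriptor("ACDA")
    print(len(vector))  # 20 + 400 features for k = 1 and k = 2
```

Vectors of this kind have a fixed length regardless of sequence length, which is what lets a centroid-based method such as K-means compare enzymes without aligning them.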

Keywords: Enzyme clustering; K-mean centroids; unsupervised learning; semi-supervised learning.
